
Problem with RewriteRule when url contains percent character

Hi,

I'm having problems with a RewriteRule that's applied to URLs with the %
character in them, hope someone can help. The % character is a result of
url-encoding non-ASCII words, as in the example below:

1. the word "sécurité" comes out of my db

2. I construct the following link, using the php urlencode function:
<a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

3. the url created should be interpreted by a RewriteRule:
RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

However the RewriteRule doesn't match on my url, and I see this in the
RewriteLog:

init rewrite engine with requested uri /search/sécurité

So it seems like some kind of decoding is going on so that the RewriteRule
never even sees the % character. I have set everything I can think of
(MySQL SET NAMES, Apache AddDefaultCharset) to UTF-8.

Any ideas?

TIA,

JON
Feb 4 '07 #1
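
The mismatch described in the question can be reproduced outside Apache. The sketch below is Python, not Apache configuration, and it assumes the pattern sees the path without its leading slash (as it would in a per-directory .htaccess context). It only illustrates that the ASCII-only character class accepts the percent-encoded path but rejects the decoded one.

```python
import re
from urllib.parse import unquote

# The pattern from the original RewriteRule, anchored the same way.
PATTERN = re.compile(r"^search/([a-zA-Z0-9-+%]+)$")

encoded_path = "search/s%C3%A9curit%C3%A9"  # what the link in step 2 sends
decoded_path = unquote(encoded_path)         # percent-decoded, read as UTF-8

# The ASCII-only class accepts the raw, still-encoded form ...
matches_encoded = PATTERN.match(encoded_path) is not None
# ... but not the decoded form, because "é" is outside a-z, A-Z and 0-9.
matches_decoded = PATTERN.match(decoded_path) is not None

print(decoded_path, matches_encoded, matches_decoded)
```

If the rule is matched against the decoded path, as the RewriteLog line suggests, the second case is the one that applies.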
"Jon Maz" <pp****************@gmx.removethistoo.net> wrote in message
news:eq**********@aioe.org...

> I'm having problems with a RewriteRule that's applied to URLs with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySQL SET NAMES, Apache AddDefaultCharset) to UTF-8.

So php has encoded the url to some ISO8859 variant and apache is decoding
those to some utf ... next to wonder is the charset used by your OS to
store the file name ...

In general, just forget diacritical, language-specific, fancy characters and
just use 'securite' for the filename.
It keeps you from dozens of cross-platform and cross-language traps, easing
migration of a website tenfold.

http://czyborra.com/charsets/iso8859.html 'The ISO 8859 Alphabet Soup'

HansH

Feb 4 '07 #2

Hi Hans,

Thanks for your answer. I guess I'm best off just avoiding the whole thing.

What got me wondering was the fact that my php application can cope fine
when this encoded word is passed in the query string:

/pages/search.php?word=s%C3%A9curit%C3%A9

But perhaps it's simply that different rules apply to a url and a query
string parameter?

Thanks,

JON
Feb 5 '07 #3
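
On the path versus query string question: Apache's mod_rewrite documentation describes the RewriteRule pattern as being matched against the %-decoded URL-path, with the query string left out of the match. The sketch below separates the two parts in Python. Here `parse_qs` is only a stand-in for whatever decoding the PHP application applies to its parameters, which is an assumption about the setup.

```python
from urllib.parse import urlsplit, parse_qs, unquote

url = "/pages/search.php?word=s%C3%A9curit%C3%A9"
parts = urlsplit(url)

# The part a RewriteRule pattern would be tested against (decoded path only).
path_seen_by_pattern = unquote(parts.path)

# The query string travels separately and is still percent-encoded here.
raw_query = parts.query

# The application decodes the parameter itself when it reads it.
params = parse_qs(raw_query)

print(path_seen_by_pattern, raw_query, params["word"][0])
```

In this URL the non-ASCII word never appears in the path, so an ASCII-only pattern is never tested against it.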

rh

"Jon Maz" <pp****************@gmx.removethistoo.net> wrote in message
news:eq**********@aioe.org...

> Hi,
>
> I'm having problems with a RewriteRule that's applied to URLs with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

How do you get s%C3%A9curit%C3%A9 from sécurité?

sécurité, URL-encoded, is s%E9curit%E9.

s%C3%A9curit%C3%A9 decoded is sÃ©curitÃ©, as is correctly reported in your rewrite log.
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

A hyphen in a character class specifies a range unless it's the first or last character in
the class.

What range are you looking for with 9-+?
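
How the hyphen in `0-9-+` is read can be checked directly. The sketch below uses Python's `re` module, where a hyphen placed immediately after a completed range is taken as a literal. It is an assumption that the regex engine inside a given Apache build behaves the same way, so moving the hyphen to the end of the class is the unambiguous spelling.

```python
import re

original = re.compile(r"[a-zA-Z0-9-+%]")  # hyphen directly after the 0-9 range
explicit = re.compile(r"[a-zA-Z0-9+%-]")  # hyphen moved to the end of the class

# "," sits between "+" and "-" in ASCII, so it would only be accepted
# if "+" and "-" were somehow treated as the endpoints of a range.
samples = ["a", "Z", "5", "-", "+", "%", ",", "é"]

accepted_original = [c for c in samples if original.fullmatch(c)]
accepted_explicit = [c for c in samples if explicit.fullmatch(c)]

print(accepted_original)
print(accepted_explicit)
```

Both spellings accept the same characters in this engine, and neither accepts "é".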
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité

The rewrite rule works correctly: the URI contains Ã and ©, and the regex doesn't allow for
these.
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySQL SET NAMES, Apache AddDefaultCharset) to UTF-8.

The URI is decoded before the server tries to resolve it; why would it not be?

Why are you trying to do the heavy lifting with mod_rewrite? Just pass the search term to
the script and validate it there; you should validate all user input in your scripts.

RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]

Rich
Feb 5 '07 #4
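
The "validate it in the script" suggestion might look roughly like the sketch below. It is written in Python rather than PHP, and the function name `validate_search_term`, the allowed character set and the 64-character limit are all hypothetical choices for illustration, not anything given in the thread.

```python
import re
from urllib.parse import unquote

# Hypothetical whitelist: Unicode letters, digits, underscore, "+" and "-".
# In Python 3, \w matches non-ASCII letters such as "é" for str patterns.
ALLOWED = re.compile(r"[\w+-]+")


def validate_search_term(raw, max_len=64):
    """Return the decoded term if it passes the whitelist, otherwise None."""
    term = unquote(raw)  # percent-decode, reading the bytes as UTF-8
    if len(term) > max_len:
        return None
    return term if ALLOWED.fullmatch(term) else None


print(validate_search_term("s%C3%A9curit%C3%A9"))
print(validate_search_term("%3Cscript%3E"))
```

With the permissive `(.+)` rule in front of it, a check like this decides which terms reach the search itself.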

On Sun, 4 Feb 2007 21:49:08 -0000
"Jon Maz" <pp****************@gmx.removethistoo.net> wrote:

> So it seems like some kind of decoding is going on so that the
> RewriteRule never even sees the % character. I have set everything I
> can think of (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

No you haven't. The expression in your RewriteRule is firmly in
ASCII, so it fails to match the non-ASCII characters in the URL.

> Any ideas?

Don't faff about with mod_rewrite like that. Or if you
really must, fix your regexp. Or as someone else said,
stick to ASCII.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Feb 5 '07 #5

Thanks to everybody for their help on this one!
Feb 7 '07 #6


"rh" <di*************@cableone.net> wrote:
> "Jon Maz" <pp****************@gmx.removethistoo.net> wrote:
>> I'm having problems with a RewriteRule that's applied to url's with the %
>> character in them, hope someone can help. The % character is a result of
>> url-encoding non-ASCII words, as in the example below:
>>
>> 1. the word "sécurité" comes out of my db
>>
>> 2. I construct the following link, using the php urlencode function:
>> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> How do you get s%C3%A9curit%C3%A9 from sécurité
>
> sécurité, url encoded, is s%E9curit%E9

Only in ISO-8859-1. In UTF-8, the OP's encoding is correct.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Feb 7 '07 #8
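
The two encodings discussed in the thread can be compared side by side. The sketch below uses Python's `urllib.parse` as a stand-in for PHP's urlencode, which is an assumption about equivalence for this input. It shows the same word percent-encoded from UTF-8 bytes and from ISO-8859-1 bytes, plus what happens when the UTF-8 form is decoded as ISO-8859-1.

```python
from urllib.parse import quote, unquote

word = "sécurité"

# "é" is two bytes in UTF-8 (C3 A9) and one byte in ISO-8859-1 (E9).
utf8_encoded = quote(word, encoding="utf-8")
latin1_encoded = quote(word, encoding="latin-1")

# Decoding the UTF-8 form with the wrong charset turns each "é" into "Ã©".
misread = unquote(utf8_encoded, encoding="latin-1")

print(utf8_encoded)    # s%C3%A9curit%C3%A9
print(latin1_encoded)  # s%E9curit%E9
print(misread)         # sÃ©curitÃ©
```

Both encoded forms are valid; which one is produced depends on the charset of the string handed to the encoder.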
