urllib.quote fails on Unicode URL

The code in urllib.quote fails on Unicode input, when
called by robotparser.

That bit of code needs some attention.
- It still assumes every character code fits in a byte (0-255), which hasn't
been true in Python for a while now.
- The initialization may not be thread-safe; a table is being initialized
on first use. The code is too clever and uncommented.

"robotparser" was trying to check whether the URL
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.
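For reference, `%E2%80%9D` is the UTF-8 encoding of U+201D (RIGHT DOUBLE QUOTATION MARK) and `%22` is an ordinary ASCII double quote, so the URL as shown is already percent-encoded. A quick check with Python 3's `urllib.parse`, whose `quote()` encodes `str` input as UTF-8 by default (the `safe=":/"` choice is just to keep the scheme and slashes literal for this comparison):

```python
from urllib.parse import quote, unquote

url = ("http://www.highbeam.com/DynamicContent/%E2%80%9D"
       "/mysaved/privacyPref.asp%22")

# Decode the percent-escapes; UTF-8 is unquote()'s default encoding.
decoded = unquote(url)

# The decoded URL holds a code point far above 255.
print("\u201d" in decoded, ord("\u201d"))  # True 8221

# Re-quoting, keeping ':' and '/' literal, restores the original URL.
requoted = quote(decoded, safe=":/")
print(requoted == url)  # True
```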

Logged as bug #1712522.

John Nagle

May 4 '07
John Nagle wrote:
> The code in urllib.quote fails on Unicode input, when
> called by robotparser.
> [...]

There has been a related discussion:

http://groups.google.com/group/comp....6e6a3c0635e340

IIRC, the outcome was that, while UTF-8 is recommended,
urllib.quote()/unquote() should not guess the encoding.
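To illustrate why guessing is risky: the same character percent-encodes differently under UTF-8 and Latin-1, and decoding with the wrong guess silently corrupts the text rather than raising an error. A sketch using today's Python 3 `urllib.parse`, where the caller passes the encoding explicitly:

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "café"

# The caller states the encoding; nothing is guessed.
as_utf8 = quote(text, encoding="utf-8")      # 'caf%C3%A9'
as_latin1 = quote(text, encoding="latin-1")  # 'caf%E9'

# Decoding with the matching encoding round-trips.
assert unquote(as_utf8, encoding="utf-8") == text
assert unquote(as_latin1, encoding="latin-1") == text

# A wrong guess corrupts the text instead of failing: 0xE9 alone is not
# valid UTF-8, so unquote()'s default errors="replace" gives U+FFFD, and
# reading the UTF-8 bytes as Latin-1 gives mojibake.
wrong_utf8 = unquote(as_latin1)                      # 'caf\ufffd'
wrong_latin1 = unquote(as_utf8, encoding="latin-1")  # 'caf\u00c3\u00a9'
```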

What changes that would imply for robotparser I don't know...
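For what it's worth, Python 3's `urllib.robotparser` accepts such URLs; as far as I can tell, `can_fetch()` unquotes the URL and re-quotes its path with `urllib.parse.quote()`, i.e. as UTF-8. A minimal check, assuming made-up robots.txt rules parsed from memory instead of fetched, and a hypothetical user agent `ExampleBot`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
rules = [
    "User-agent: *",
    "Disallow: /DynamicContent/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse lines directly; no network access

encoded = ("http://www.highbeam.com/DynamicContent/%E2%80%9D"
           "/mysaved/privacyPref.asp%22")
raw = "http://www.highbeam.com/DynamicContent/\u201d/mysaved/privacyPref.asp\""

# Both spellings of the URL fall under the same Disallow rule.
print(rp.can_fetch("ExampleBot", encoded))  # False
print(rp.can_fetch("ExampleBot", raw))      # False
print(rp.can_fetch("ExampleBot", "http://www.highbeam.com/"))  # True
```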

Peter
May 4 '07