By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,089 Members | 2,159 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,089 IT Pros & Developers. It's quick & easy.

urllib.quote fails on Unicode URL

P: n/a
The code in urllib.quote fails on Unicode input, when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python
for a while now.
- The initialization may not be thread-safe; a table is being initialized
on first use. The code is too clever and uncommented.

"robotparser" was trying to check if a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some wierd characters in there. Unicode
URLs are legal, so this is a real bug.

Logged in as Bug #1712522.

John Nagle

May 4 '07 #1
Share this Question
Share on Google+
1 Reply


P: n/a
John Nagle wrote:
The code in urllib.quote fails on Unicode input, when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in
Python
for a while now.
- The initialization may not be thread-safe; a table is being
initialized
on first use. The code is too clever and uncommented.

"robotparser" was trying to check if a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some wierd characters in there. Unicode
URLs are legal, so this is a real bug.

Logged in as Bug #1712522.
There has been a related discussion:

http://groups.google.com/group/comp....6e6a3c0635e340

IIRC the outcome was that while UTF-8 is recommended
urllib.quote()/unquote() should not guess the encoding.

What changes that would imply for robotparser I don't know...

Peter
May 4 '07 #2

This discussion thread is closed

Replies have been disabled for this discussion.