urllib.quote fails on Unicode URL

John Nagle

The code in urllib.quote fails on Unicode input when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python for a while now.
- The initialization may not be thread-safe; a table is being initialized on first use. The code is too clever and uncommented.

"robotparser" was trying to check whether a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.
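
For anyone wondering what the "weird characters" are, here is a minimal sketch that decodes the escapes in that URL. It assumes Python 3's urllib.parse rather than the Python 2 urllib.quote discussed in this thread, and it assumes the escapes are UTF-8; the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

url = ("http://www.highbeam.com/DynamicContent/"
       "%E2%80%9D/mysaved/privacyPref.asp%22")

# Python 3's unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(url)

# Collect the characters outside the ASCII range.
non_ascii = sorted({ch for ch in decoded if ord(ch) > 127})
print([hex(ord(ch)) for ch in non_ascii])

# Re-quoting the decoded text reproduces the original URL, because
# Python 3's quote() encodes str input as UTF-8 unless told otherwise.
requoted = quote(decoded, safe=":/")
print(requoted == url)
```

The bytes E2 80 9D are the UTF-8 encoding of U+201D (right double quotation mark), and %22 is a plain ASCII double quote, so the URL most likely came from a link with stray quote marks around it.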

Logged in as Bug #1712522.

John Nagle

May 4 '07 #1

Peter Otten

John Nagle wrote:

The code in urllib.quote fails on Unicode input when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python for a while now.
- The initialization may not be thread-safe; a table is being initialized on first use. The code is too clever and uncommented.

"robotparser" was trying to check whether a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.

Logged in as Bug #1712522.

There has been a related discussion:

http://groups.google.com/group/comp....6e6a3c0635e340

IIRC the outcome was that, while UTF-8 is recommended,
urllib.quote()/unquote() should not guess the encoding.

I don't know what changes that would imply for robotparser...
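
One way to read "should not guess the encoding" is that the caller encodes the text to bytes itself before quoting. A rough sketch of that idea follows; it assumes Python 3's urllib.parse (not the Python 2 module under discussion), and quote_path is a hypothetical helper, not something in the standard library or in robotparser.

```python
from urllib.parse import quote


def quote_path(path, encoding="utf-8"):
    """Percent-encode a path, encoding text explicitly first.

    Hypothetical helper: the caller chooses the encoding, so the
    quoting step never has to guess one.
    """
    if isinstance(path, str):
        path = path.encode(encoding)
    # quote() accepts bytes and escapes every byte that is not safe.
    return quote(path, safe="/")


# The same character produces different escapes under different encodings.
print(quote_path("/DynamicContent/\u201d/page"))
print(quote_path("\u00b1"))
print(quote_path("\u00b1", encoding="latin-1"))
```

The last two calls differ (%C2%B1 versus %B1), which is why a quoting function that silently picked an encoding could produce URLs the server does not expect.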

Peter

May 4 '07 #2
