Bytes IT Community

raise UnicodeError, "label too long"

Hi, I am having a problem with urllib2.urlopen.

I get this error when I try to pass a unicode string to it:

raise UnicodeError, "label too long"

Is this problem avoidable? No browser or program such as wget seems to
have a problem with these strings.

Jan 24 '07 #1
6 Replies


In <11**********************@l53g2000cwa.googlegroups.com>, Flavio wrote:
> Hi, I am having a problem with urllib2.urlopen.
>
> I get this error when I try to pass a unicode string to it:
>
> raise UnicodeError, "label too long"
>
> Is this problem avoidable? No browser or program such as wget seems to
> have a problem with these strings.

What exactly are you doing? What does a (unicode?) string that
triggers this exception look like?

Ciao,
Marc 'BlackJack' Rintsch

Jan 24 '07 #2

What I am doing is very simple:

I fetch a URL (an HTML page), parse it using BeautifulSoup, extract the
links, and try to open each of the links, repeating the cycle.

BeautifulSoup converts the HTML to unicode. That's why I get this error
when I try to open the links extracted from the page.

This is bad, since some links do contain strings with non-ASCII
characters.
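
For links that really do contain non-ASCII characters, one common approach is to turn the unicode link into a plain ASCII URL before opening it: IDNA-encode the host name and percent-encode the rest as UTF-8. The following is only a minimal sketch under stated assumptions: it targets Python 3 (`urllib.parse`) rather than the Python 2.4 `urllib2` used in this thread, the helper name `iri_to_uri` is hypothetical, the example link is made up for illustration, and it assumes an absolute URL with a host and no user:password part.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def iri_to_uri(iri):
    """Return an ASCII-only form of an absolute URL (hypothetical helper)."""
    parts = urlsplit(iri)
    # Host names go through the idna codec, the same codec named in the traceback.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    # Everything else is percent-encoded as UTF-8. "%" is kept safe so that
    # escapes already present (such as %28) are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# Illustrative link only, not taken from the thread.
print(iri_to_uri("http://pt.wikipedia.org/wiki/Amoníaco"))
```

The result can then be passed to `urllib.request.urlopen` as an ordinary ASCII string. Note that this does nothing for a host name that is itself malformed; the codec will still reject it.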

thanks,

Flávio
Jan 24 '07 #3

Flavio wrote:
> What I am doing is very simple:
>
> I fetch a URL (an HTML page), parse it using BeautifulSoup, extract the
> links, and try to open each of the links, repeating the cycle.
>
> BeautifulSoup converts the HTML to unicode. That's why I get this error
> when I try to open the links extracted from the page.
>
> This is bad, since some links do contain strings with non-ASCII
> characters.

Please try answering the exact question that Marc asked:
what is an example of a unicode string that triggers the
exception?

Regards,
Martin
Jan 24 '07 #4

Something like this, for instance:
http://.wikipedia.org/wiki/Copper%28II%29_hydroxide

But even URLs without any non-ASCII characters, such as this one:

http://.wikipedia.org/wiki/Ammonia

also fail when passed to urlopen:
  File "/usr/lib/python2.4/encodings/idna.py", line 72, in ToASCII
    raise UnicodeError, "label too long"
UnicodeError: label too long

This is very strange, because I tried other unicode URLs from the Python
console, like this:

urllib2.urlopen(u'www.google.com')

and they work normally.

Jan 25 '07 #5

Flavio wrote:
> Something like this, for instance:
> http://.wikipedia.org/wiki/Copper%28II%29_hydroxide
>
> But even URLs without any non-ASCII characters, such as this one:
>
> http://.wikipedia.org/wiki/Ammonia
>
> also fail when passed to urlopen:
>   File "/usr/lib/python2.4/encodings/idna.py", line 72, in ToASCII
>     raise UnicodeError, "label too long"
> UnicodeError: label too long
>
> This is very strange, because I tried other unicode URLs from the Python
> console, like this

It's the host name starting with a dot that makes it fail:

py> u".wikipedia.org".encode("idna")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "encodings/idna.py", line 163, in encode
  File "encodings/idna.py", line 72, in ToASCII
UnicodeError: label too long
py> u"wikipedia.org".encode("idna")
'wikipedia.org'

The exception is certainly misleading; I'll have to find out
whether there is a bug beyond that (i.e. whether host names
with empty labels should be accepted).
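
The same check can be tried without urlopen at all. The sketch below assumes Python 3, where str objects are unicode and the idna codec still rejects a host name with an empty label (the exact error text varies between Python versions, so it is not shown here). The helper name `idna_host_error` is hypothetical.

```python
def idna_host_error(host):
    """Return None if host encodes with the idna codec, else the error text.

    Hypothetical helper for illustration; it only wraps str.encode("idna").
    """
    try:
        host.encode("idna")
    except UnicodeError as exc:
        # A leading dot produces an empty first label, which the codec rejects.
        return str(exc)
    return None


for host in ("wikipedia.org", ".wikipedia.org"):
    print(repr(host), "->", idna_host_error(host))
```

Running this on a suspicious host name shows whether the failure comes from the name itself rather than from anything urlopen does with it.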

Regards,
martin
Jan 25 '07 #6

Guys, I am sorry; I wrote these messages very late at night.

Naturally, what came before the dot is the two-letter language code
that is usual in Wikipedia URLs.

Something in my code is obviously gobbling that up. Thanks for pointing
that out, and my apologies again for not seeing this obvious bug.
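
A cheap guard in the crawl loop would have flagged this kind of link before it reached urlopen. This is only a sketch, assuming Python 3 and a hypothetical helper name `host_looks_broken`; it says nothing about where in the code the language prefix was lost, it just reports URLs whose host is missing or contains an empty label.

```python
from urllib.parse import urlsplit


def host_looks_broken(url):
    """Return True if the URL has no host or a host with an empty label."""
    host = urlsplit(url).hostname
    if not host:
        return True  # relative link or missing host
    if host.endswith("."):
        host = host[:-1]  # tolerate a single trailing dot (fully qualified form)
    return any(label == "" for label in host.split("."))


links = [
    "http://.wikipedia.org/wiki/Ammonia",    # from the thread: prefix lost
    "http://en.wikipedia.org/wiki/Ammonia",  # assumed intact form, for comparison
]
for link in links:
    if host_looks_broken(link):
        print("skipping suspicious link:", link)
```

Logging the skipped links together with the page they came from makes it easier to see which extraction step dropped the prefix.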

On Jan 25, 4:39 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On 24 Jan 2007 16:25:19 -0800, "Flavio" <fccoe...@gmail.com> declaimed
> the following in comp.lang.python:
>> something like this, for instance:
>> http://.wikipedia.org/wiki/Copper%28II%29_hydroxide
>
> Was there some text between the // and .wikipedia? As written, this
> and the next one both lock up Firefox. Take out the . and they work (or
> put www before the . ).
>
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfr...@ix.netcom.com wulfr...@bestiaria.com
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: web-a...@bestiaria.com)
> HTTP://www.bestiaria.com/
Jan 25 '07 #7

This discussion thread is closed
