Unexpected exception from socket.getaddrinfo on Unicode URL

Here's a strange little bug. "socket.getaddrinfo" blows up
if given a bad domain name containing ".." in Unicode. The
same string in ASCII produces the correct "gaierror" exception.

Actually, this deserves a documentation mention. The "socket" module,
given a Unicode string, calls the International Domain Name parser,
"idna.py", which has a whole error system of its own. The IDNA
documentation says that "Furthermore, the socket module transparently converts
Unicode host names to ACE, so that applications need not be concerned about
converting host names themselves when they pass them to the socket module."
However, that's not quite true; the IDNA rules say that syntax errors must
be treated as errors, so you have to be prepared for IDNA exceptions.
They are all "UnicodeError" exceptions.
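
The conversion described above can be tried without touching the network. What follows is a minimal sketch, assuming Python 3 (where every host name string is Unicode) and the standard "idna" codec; the helper name to_ace and the host "bücher.example" are made up for illustration and do not come from the original post.

```python
# Python 3 sketch. "to_ace" is a hypothetical helper, not part of the socket module.

def to_ace(hostname):
    """Return the ACE (ASCII-compatible) form of hostname, or None if IDNA rejects it."""
    try:
        # The standard "idna" codec performs the Unicode -> ACE conversion.
        return hostname.encode("idna")
    except UnicodeError:
        # Syntax errors such as an empty label ("..") end up here.
        return None


if __name__ == "__main__":
    print(to_ace("bücher.example"))      # b'xn--bcher-kva.example'
    print(to_ace("www.gallery84..com"))  # None
```

A well-formed name converts silently, while the ".." name is rejected by the codec before any DNS lookup would take place.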

It's worth a mention in the documentation for "socket".

John Nagle

D:\>/python25/python.exe
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ss = 'www.gallery84..com'
>>> uss = unicode(ss)
>>> import socket
>>> socket.getaddrinfo(ss,"http")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: (11001, 'getaddrinfo failed')
>>> socket.getaddrinfo(uss,"http")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\python25\lib\encodings\idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
>>>
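
One way a caller might guard against both failure modes shown in the session above is to catch them together. This is only a sketch, assuming Python 3; the names HostLookupError and lookup are hypothetical and exist only for this illustration.

```python
import socket


class HostLookupError(Exception):
    """Hypothetical application-level error covering both failure modes."""


def lookup(host, port):
    """Call socket.getaddrinfo and report any failure as HostLookupError."""
    try:
        return socket.getaddrinfo(host, port)
    except UnicodeError as exc:
        # The IDNA conversion rejected the name before any lookup was made.
        raise HostLookupError("invalid host name %r: %s" % (host, exc)) from exc
    except socket.gaierror as exc:
        # The resolver itself reported a failure.
        raise HostLookupError("cannot resolve %r: %s" % (host, exc)) from exc
```

With this wrapper, lookup("www.gallery84..com", "http") raises HostLookupError whichever of the two paths rejects the name, so calling code needs only one except clause.
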
Apr 21 '07 #1
I took a look at the documentation but couldn't see where to add what,
given that the documentation for socket already says:

"""All errors raise exceptions. The normal exceptions for invalid
argument types and out-of-memory conditions can be raised; errors
related to socket or address semantics raise the error socket.error.
""".

Do we really need to specifically mention Unicode errors?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Apr 21 '07 #2
Steve Holden wrote:
> Do we really need to specifically mention Unicode errors?

It says "errors related to socket or address semantics raise the
error 'socket.error'", so, yes. The error really has nothing to
do with Unicode; it's that a different parser is used when a domain
name is in Unicode. It really shouldn't be a "Unicode error" at
all.
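
The practical consequence can be checked from the exception hierarchy. The sketch below assumes Python 3, where socket.error is an alias of OSError; the helper name caught_by_socket_error is made up for this example.

```python
import socket


def caught_by_socket_error(exc_type):
    """Return True if an 'except socket.error' clause would catch exc_type."""
    return issubclass(exc_type, socket.error)


if __name__ == "__main__":
    print(caught_by_socket_error(socket.gaierror))  # True
    print(caught_by_socket_error(UnicodeError))     # False
```

Under that assumption, code that catches only socket.error handles the resolver failure but lets the IDNA failure propagate, because UnicodeError derives from ValueError.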

When Python goes to Unicode by default, this is likely to break
some existing code. Python's IDNA support is good, but not entirely
invisible. The socket module documentation should mention IDNA
support. It's not clear, for example, when you call "getnameinfo()",
whether you get back the name in Unicode or in Punycode.
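
Until the documentation settles that question, a caller can normalize whatever comes back. This is a minimal sketch, assuming Python 3 and the standard "idna" codec; it makes no claim about which form "getnameinfo()" actually returns, and both_forms and the host "bücher.example" are hypothetical names.

```python
def both_forms(name):
    """Return (unicode_form, ace_form) for a host name given in either form."""
    # Encoding with the "idna" codec yields the ACE bytes (Punycode labels);
    # a name that is already in ACE form passes through unchanged.
    ace = name.encode("idna")
    # Decoding the ACE bytes with the same codec recovers the Unicode labels.
    return ace.decode("idna"), ace.decode("ascii")


if __name__ == "__main__":
    print(both_forms("xn--bcher-kva.example"))
    print(both_forms("bücher.example"))
```

Both calls should produce the same pair, so the rest of the application can pick one form and compare names consistently.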

John Nagle
Apr 21 '07 #3
