Unexpected exception from socket.getaddrinfo on Unicode URL

Here's a strange little bug. "socket.getaddrinfo" blows up
if given a bad domain name containing ".." in Unicode. The
same string in ASCII produces the correct "gaierror" exception.

Actually, this deserves a documentation mention. The "socket" module,
given a Unicode string, calls the International Domain Name parser,
"idna.py", which has a whole error system of its own. The IDNA
documentation says that "Furthermore, the socket module transparently converts
Unicode host names to ACE, so that applications need not be concerned about
converting host names themselves when they pass them to the socket module."
However, that's not quite true; the IDNA rules say that syntax errors must
be treated as errors, so you have to be prepared for IDNA exceptions.
They are all "UnicodeError" exceptions.

It's worth a mention in the documentation for "socket".

John Nagle

D:\>/python25/python.exe
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ss = 'www.gallery84..com'
>>> uss = unicode(ss)
>>> import socket
>>> socket.getaddrinfo(ss,"http")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: (11001, 'getaddrinfo failed')
>>> socket.getaddrinfo(uss,"http")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\python25\lib\encodings\idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
>>>
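
For anyone calling getaddrinfo with names that come from users or from crawled pages, here is a minimal sketch of a call site that is prepared for both failure modes shown in the session above. It is written for Python 3, where every str is Unicode, rather than the Python 2.5 used in the session; the helper name "lookup" and its return convention are made up for illustration and are not part of the socket module.

```python
import socket


def lookup(host, service):
    """Hypothetical helper: return (addresses, error_message).

    Resolver failures and IDNA syntax failures are both reported
    through the second element instead of escaping as exceptions.
    """
    try:
        return socket.getaddrinfo(host, service), None
    except UnicodeError as exc:
        # Raised while the host name is being encoded with the IDNA codec.
        return [], "invalid host name: %s" % exc
    except socket.gaierror as exc:
        # Raised by the platform resolver, as in the ASCII case above.
        return [], "lookup failed: %s" % exc


addresses, error = lookup("www.gallery84..com", "http")
print(addresses, error)
```

Which of the two branches fires for a given name may depend on the Python version and platform, so the sketch treats them as equivalent from the caller's point of view.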
Apr 21 '07 #1
I took a look at the documentation but couldn't see where to add what,
given that the documentation for socket already says:

"""All errors raise exceptions. The normal exceptions for invalid
argument types and out-of-memory conditions can be raised; errors
related to socket or address semantics raise the error socket.error.
""".

Do we really need to specifically mention Unicode errors?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Apr 21 '07 #2
Steve Holden wrote:
> I took a look at the documentation but couldn't see where to add what,
> given that the documentation for socket already says:
>
> """All errors raise exceptions. The normal exceptions for invalid
> argument types and out-of-memory conditions can be raised; errors
> related to socket or address semantics raise the error socket.error.
> """.
>
> Do we really need to specifically mention Unicode errors?

It says "errors related to socket or address semantics raise the
error 'socket.error'", so, yes. The error really has nothing to
do with Unicode; it's that a different parser is used when a domain
name is in Unicode. It really shouldn't be a "Unicode error" at
all.
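
To make the point concrete, here is a rough Python 3 sketch. It first shows that a handler written for socket.error cannot see a UnicodeError, then wraps getaddrinfo so that IDNA syntax failures surface as socket.gaierror. The wrapper name "getaddrinfo_strict" and the choice of EAI_NONAME as the error code are assumptions made for illustration; nothing in the socket module does this translation itself.

```python
import socket

# In Python 3, socket.error is an alias of OSError and gaierror derives
# from it. UnicodeError derives from ValueError instead, so a handler
# written as "except socket.error" never sees it.
print(issubclass(socket.gaierror, socket.error))  # True
print(issubclass(UnicodeError, socket.error))     # False


def getaddrinfo_strict(host, port, *args, **kwargs):
    """Hypothetical wrapper: report IDNA syntax errors as socket.gaierror."""
    try:
        return socket.getaddrinfo(host, port, *args, **kwargs)
    except UnicodeError as exc:
        # EAI_NONAME is the resolver's own "name not known" code.
        message = "invalid host name: %s" % exc
        raise socket.gaierror(socket.EAI_NONAME, message) from exc
```

With a wrapper like this, existing code that only catches socket.error or gaierror keeps working when a malformed name arrives as Unicode.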

When Python goes to Unicode by default, this is likely to break
some existing code. Python's IDNA support is good, but not entirely
invisible. The socket module documentation should mention IDNA
support. It's not clear, for example, when you call "getnameinfo()",
whether you get back the name in Unicode or in Punycode.
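
Whichever form a lookup hands back, the two can be converted into each other with the built-in "idna" codec. The sketch below (Python 3) shows the round trip on an example name and a small helper, "to_unicode", which is a made-up name for illustration. It says nothing about what getnameinfo() itself returns; it only shows how a caller could normalize the result either way.

```python
name = "bücher.example"

# Unicode -> ACE (the Punycode-based "xn--" form sent over the wire).
ace = name.encode("idna")
print(ace)                  # b'xn--bcher-kva.example'

# ACE -> Unicode.
print(ace.decode("idna"))   # bücher.example


def to_unicode(host):
    """Hypothetical helper: accept either form as str, return the Unicode form."""
    return host.encode("idna").decode("idna")


print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

The same codec is the one that raises UnicodeError on empty or overlong labels, so normalizing names this way before a lookup also surfaces syntax errors early.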

John Nagle
Apr 21 '07 #3
