More urllib timeout issues.

I thought I had all the timeout problems with urllib worked around,
but no.

socket.setdefaulttimeout is useful, but not always effective.
I'm setting that to 15 seconds.
If the host end won't open the connection within 15 seconds,
urllib times out. But if the host end opens the connection,
then never sends anything, urllib waits for many minutes before
timing out. Any idea how to deal with this? And don't just
say "use urllib2" unless you KNOW it works better there and
can explain why. I finally have M2Crypto and urllib playing
well together, and don't want to mess with that.
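
The "opens the connection, then never sends anything" case can be reproduced locally without depending on a remote site. What follows is only a minimal sketch in modern Python 3 (not the Python 2 urllib discussed in this thread); the helper names start_silent_server and probe are made up for illustration. It shows what a per-socket timeout does at the bare socket layer when the peer accepts and then stays silent.

```python
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept one connection, never send."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    stop = threading.Event()

    def serve():
        conn, _ = listener.accept()
        stop.wait()  # hold the connection open, sending nothing
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1], stop


def probe(port, timeout):
    """Connect, send a request, and report what happens on the read."""
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        client.recv(1024)
        return "got data"
    except socket.timeout:
        return "read timed out"
    finally:
        client.close()


port, stop = start_silent_server()
result = probe(port, timeout=0.5)
stop.set()
print(result)
```

A fixture like this gives a repeatable stand-in for the stalling sites, so the same probe can be pointed at higher layers to see where the timeout gets lost.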

For some weird reason, several UK academic sites have this
behavior, including "soton.ac.uk". If you try to open that
in a browser, the browser just sits there, and eventually,
after several minutes, displays "The site is taking too
long to respond".

What's the current status in this area? Some patches to sockets
were proposed a while back. There's a long history of trouble
in this area, and some fixes, but nothing that just works.
The sockets module has two timeout settings (socket.setdefaulttimeout and
sock.settimeout), the M2Crypto module has two (sock.set_socket_read_timeout and
sock.set_socket_write_timeout), and none of them play well together
or with the urllib/urllib2/httplib level and the blocking/non-blocking
socket distinction.

What we really should have is something like this:

Sockets should have
set_socket_connect_timeout
set_socket_read_timeout
set_socket_write_timeout

which set an upper limit on how long a socket can go with a request for
a connect, read or write pending but without progress on the connection.
This needs to be independent of select poll timeouts, and these timeouts
should work on blocking sockets.

The existing socket function

settimeout

should set all of the above, and

socket.setdefaulttimeout

should set the default value for settimeout to be used on new sockets.
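
To make the proposal concrete, here is a rough sketch of how the three timeouts could be layered on top of the existing per-socket settimeout. It is written in modern Python 3, and the TimedSocket class is purely hypothetical; only the three set_socket_*_timeout names come from the proposal above. It approximates "no progress" with a per-operation timeout, which is an assumption and not quite the same thing.

```python
import socket


class TimedSocket:
    """Hypothetical wrapper: one timeout each for connect, read and write."""

    def __init__(self):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Start from the process-wide default, as the proposal suggests.
        default = socket.getdefaulttimeout()
        self.connect_timeout = default
        self.read_timeout = default
        self.write_timeout = default

    def settimeout(self, seconds):
        """Set all three timeouts at once."""
        self.connect_timeout = seconds
        self.read_timeout = seconds
        self.write_timeout = seconds

    def set_socket_connect_timeout(self, seconds):
        self.connect_timeout = seconds

    def set_socket_read_timeout(self, seconds):
        self.read_timeout = seconds

    def set_socket_write_timeout(self, seconds):
        self.write_timeout = seconds

    def connect(self, address):
        # Apply the connect timeout just before the blocking call.
        self._sock.settimeout(self.connect_timeout)
        self._sock.connect(address)

    def recv(self, bufsize):
        self._sock.settimeout(self.read_timeout)
        return self._sock.recv(bufsize)

    def sendall(self, data):
        self._sock.settimeout(self.write_timeout)
        self._sock.sendall(data)

    def close(self):
        self._sock.close()


sock = TimedSocket()
sock.settimeout(15)               # connect, read and write all 15 seconds
sock.set_socket_read_timeout(60)  # then allow slow responses only
sock.close()
```

The wrapper simply re-applies the relevant timeout before each operation, so it works on an ordinary blocking-style socket and needs no select loop of its own.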

SSL and M2Crypto, which wrap socket functionality,
should understand all the above functions.

httplib, urllib, and urllib2 objects should understand

settimeout

Making the connect/read/write timeout distinction at that level
probably isn't worth the trouble.

Then we'd have a reasonable network timeout system.
We have about half of the above now, but it's not consistent.

Comments?

John Nagle
Apr 27 '07 #1
John Nagle wrote:
Comments?
The only comments I'll make for now are

1) There is work afoot to build timeout arguments into network libraries
for 2.6, and I know Facundo Batista has been involved; you might want to
Google or email Facundo about that.

2) The main reason why socket.setdefaulttimeout is unsuitable for many
purposes is its thread-unsafe property, so all threads must use the same
default timeout or have it randomly change according to the whim of the
last thread to alter it.

3) This is important and sensible work and if properly followed through
will likely lead to serious quality improvements in the network libraries.
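
Point 2 is easy to see in a few lines. This is a small sketch in modern Python 3 (variable names are illustrative only): the default timeout is a single process-wide value, so a change made in one thread is what every other thread's new sockets pick up, whereas settimeout on an individual socket affects only that socket.

```python
import socket
import threading

original = socket.getdefaulttimeout()

# The main thread asks for a 15 second default...
socket.setdefaulttimeout(15)

# ...and another thread quietly changes it for the whole process.
other = threading.Thread(target=socket.setdefaulttimeout, args=(120,))
other.start()
other.join()

seen_by_main = socket.getdefaulttimeout()

# A socket created now inherits whatever the last writer left behind.
inherited = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
inherited_timeout = inherited.gettimeout()
inherited.close()

# A per-socket timeout is unaffected by other threads.
private = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
private.settimeout(15)
private_timeout = private.gettimeout()
private.close()

# Restore the process-wide default.
socket.setdefaulttimeout(original)

print(seen_by_main, inherited_timeout, private_timeout)
```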

regards
Steve

Apr 28 '07 #2
Steve Holden wrote:
> John Nagle wrote:
>> Then we'd have a reasonable network timeout system.
>> We have about half of the above now, but it's not consistent.
>>
>> Comments?
>
> The only comments I'll make for now are
>
> 1) There is work afoot to build timeout arguments into network libraries
> for 2.6, and I know Facundo Batista has been involved; you might want to
> Google or email Facundo about that.
>
> 2) The main reason why socket.setdefaulttimeout is unsuitable for many
> purposes is its thread-unsafe property, so all threads must use the same
> default timeout or have it randomly change according to the whim of the
> last thread to alter it.

It has other problems. If you set that value, it affects
socket blocking/non-blocking modes. It can mess up M2Crypto, causing
it to report "Peer did not return certificate".

> 3) This is important and sensible work and if properly followed through
> will likely lead to serious quality improvements in the network libraries.

Agreed.

> regards
> Steve

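
The coupling between timeouts and blocking mode mentioned above can be observed from the socket object itself. A short sketch in modern Python 3 (the list name modes is just for illustration): the timeout and the blocking flag are one piece of state, so changing either one overwrites the other.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
modes = []

sock.settimeout(15)        # timeout mode
modes.append(sock.gettimeout())

sock.setblocking(False)    # same as settimeout(0.0): the 15 seconds are gone
modes.append(sock.gettimeout())

sock.setblocking(True)     # same as settimeout(None): no timeout at all
modes.append(sock.gettimeout())

sock.close()
print(modes)
```

Any library that calls setblocking on a socket it was handed will therefore silently discard a timeout set earlier, and the reverse is also true.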
I took a look at Facundo Batista's work in the tracker, and he
currently seems to be trying to work out a good way to test the
existing SSL module. It has to connect to something to be tested,
of course. Testing network functionality is tough; to do it right,
you need a little test network to talk to, one that forces some of
the error cases. And network testing doesn't have the repeatability
upon which the Python test system/buildbot depends.

It's really tough to test this stuff properly. The best I've
been able to do so far is to run the 11,000 site list from the
Webspam Challenge through our web spider.

Here's a list of URLs from our error log which
have given us connection trouble of one kind or another.
Most of these open an HTTP transaction, but for some reason,
don't carry it through to completion properly, resulting in
a long stall in urllib.
blaby.gov.uk
boys-brigade.org.uk
cam.ac.uk
essex.ac.uk
gla.ac.uk

open.ac.uk
soton.ac.uk
uea.ac.uk
ulster.ac.uk
So that's a short, but useful, set of timeout test cases. Those are the ones
that timed out after, not during, TCP connection opening.

It's interesting that this problem appears for the root domains of many
English universities. They must all run the same server software.

Some of these fail because "robotparser", which uses "urllib", hangs
for minutes trying to read the "robots.txt" file associated with the
domain.
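
One way to sidestep the robots.txt stall, sketched here for modern Python 3 (where the module lives at urllib.robotparser) and not for the Python 2 code in this thread: fetch the file yourself with an explicit timeout, then hand the text to the parser's parse() method so the parser never does its own network read. The helper names fetch_text and parser_from_text, the sample rules, and the example.com URLs are assumptions for illustration.

```python
import urllib.request
import urllib.robotparser


def fetch_text(url, timeout=15):
    """Download a URL with an explicit timeout (not called in this demo)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parser_from_text(text):
    """Build a robots.txt parser from text that was already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Offline demonstration with a made-up robots.txt body.
sample = "User-agent: *\nDisallow: /private/\n"
rules = parser_from_text(sample)
allowed_public = rules.can_fetch("*", "http://example.com/index.html")
allowed_private = rules.can_fetch("*", "http://example.com/private/data")
print(allowed_public, allowed_private)
```

In a spider, fetch_text would be wrapped in a try/except so that a timeout on robots.txt is logged and handled as a policy decision instead of stalling the crawl.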

This isn't something that requires a major redesign. These are bugs.
John Nagle
Apr 28 '07 #3
John Nagle wrote:
> I took a look at Facundo Batista's work in the tracker, and he
> currently seems to be trying to work out a good way to test the
> existing SSL module. It has to connect to something to be tested,

Right now, test_socket_ssl.py has, besides the previous tests, the
capability of executing openssl's s_server and connecting to it.

I'm less than an SSL beginner, so I do not have the knowledge to make
interesting tests; I just made the obvious ones...

If you know SSL, you can take that code and add new tests very easily.

Regards,

--
.. Facundo
..
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
May 2 '07 #4
Steve Holden wrote:
> 1) There is work afoot to build timeout arguments into network libraries
> for 2.6, and I know Facundo Batista has been involved, you might want to
> Google or email Facundo about that.

Right now (in svn trunk) httplib, ftplib, telnetlib, etc. have a timeout
argument.

If you use it, the socket timeout will be set (through s.settimeout()).
How the socket behaves once the timeout is set is beyond the scope of
these changes, though.

BTW, I still need to make the final step here, that is adding a timeout
argument to urllib2.urlopen().
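
For readers on a current Python, this is roughly what the timeout argument looks like in use. It is a sketch against Python 3's http.client (the successor to httplib), not the 2007 trunk code; the local listener exists only so the connect has something to reach. It checks that the value passed to the constructor ends up on the underlying socket.

```python
import http.client
import socket

# A local listening socket; the connect succeeds via the listen backlog.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# The timeout given here is applied to the connection's socket.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=15)
conn.connect()
applied = conn.sock.gettimeout()

conn.close()
listener.close()
print(applied)
```

urllib2.urlopen did gain a matching timeout argument in Python 2.6, and urllib.request.urlopen(url, timeout=...) is the same call in Python 3.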

Regards,

--
.. Facundo
..
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
May 2 '07 #5
Facundo Batista wrote:
> BTW, I still need to make the final step here, that is adding a timeout
> argument to urllib2.urlopen().

urllib, robotparser, and M2Crypto also need to be updated to match.

John Nagle
May 3 '07 #6
