Bytes | Software Development & Data Engineering Community

urllib timeout issues

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

Traceback (most recent call last):
  File "ftp_20070326_Downloads_cooperc_FetchLibreMapProjectDRGs.py", line 108, in ?
    urllib.urlretrieve(fullurl, localfile)
  File "C:\Python24\lib\urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python24\lib\urllib.py", line 222, in retrieve
    fp = self.open(url, data)
  File "C:\Python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "C:\Python24\lib\urllib.py", line 322, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "C:\Python24\lib\urllib.py", line 335, in http_error
    result = method(url, fp, errcode, errmsg, headers)
  File "C:\Python24\lib\urllib.py", line 593, in http_error_302
    data)
  File "C:\Python24\lib\urllib.py", line 608, in redirect_internal
    return self.open(newurl)
  File "C:\Python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "C:\Python24\lib\urllib.py", line 313, in open_http
    h.endheaders()
  File "C:\Python24\lib\httplib.py", line 798, in endheaders
    self._send_output()
  File "C:\Python24\lib\httplib.py", line 679, in _send_output
    self.send(msg)
  File "C:\Python24\lib\httplib.py", line 646, in send
    self.connect()
  File "C:\Python24\lib\httplib.py", line 630, in connect
    raise socket.error, msg
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didn't.

Thanks.

<--- CODE --->

# (imports implied by the excerpt; AddPrintMessage, StartFinishMessage and
# Timer are helper functions defined elsewhere in the full script)
import os, sys, socket, string, time, urllib
from time import strftime, localtime

images = [['34095e3','Clayton'],
          ['35096d2','Clearview'],
          ['34095d1','Clebit'],
          ['34095c3','Cloudy'],
          ['34096e2','Coalgate'],
          ['34096e1','Coalgate SE'],
          ['35095g7','Concharty Mountain'],
          ['34096d6','Connerville'],
          ['34096d5','Connerville NE'],
          ['34096c5','Connerville SE'],
          ['35094f8','Cookson'],
          ['35095e6','Council Hill'],
          ['34095f5','Counts'],
          ['35095h6','Coweta'],
          ['35097h2','Coyle'],
          ['35096c4','Cromwell'],
          ['35095a6','Crowder'],
          ['35096h7','Cushing']]

exts = ['tif', 'tfw']
envir = 'DEV'
# URL of our image(s) to grab
url = 'http://www.archive.org/download/'
logRoot = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'
logFile = os.path.join(logRoot, 'FetchLibreDRGs_' + strftime('%m_%d_%Y_%H_%M_%S', localtime()) + '_' + envir + '.log')

# Local dir to store files in
fetchdir = logRoot
# Entire process start time
start = time.clock()

msg = envir + ' - ' + "Script: " + os.path.join(sys.path[0], sys.argv[0]) + \
      ' - Start time: ' + strftime('%m/%d/%Y %I:%M:%S %p', localtime()) + \
      '\n--------------------------------------------------------------------------------------------------------------\n\n'
AddPrintMessage(msg)
StartFinishMessage('Start')

# Loop thru image list, grab each tif and tfw
for image in images:
    # Try and set socket timeout default to none
    # Create a new socket connection for every time through list loop
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('archive.org', 80))
    s.settimeout(None)

    s2 = time.clock()
    msg = '\nProcessing ' + image[0] + ' --' + image[1]
    AddPrintMessage(msg)
    print msg
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        localfile = fetchdir + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext
        urllib.urlretrieve(fullurl, localfile)
    e2 = time.clock()
    msg = '\nDone processing ' + image[0] + ' --' + image[1] + '\nProcess took ' + Timer(s2, e2)
    AddPrintMessage(msg)
    print msg
    # Close socket connection, only to reopen with next run thru loop
    s.close()

end = time.clock()
StartFinishMessage('Finish')
msg = '\n\nDone! Process completed in ' + Timer(start, end)
AddPrintMessage(msg)
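One thing worth noting about the script above: the socket `s` created in the loop is never handed to `urllib.urlretrieve`, which opens its own connection internally, so `s.settimeout(None)` has no effect on the download. What urllib's internal sockets do inherit is the module-level default. A minimal sketch of that (Python 3 syntax shown; the 30-second value is illustrative, not from the post):

```python
import socket

# Set the module-wide default timeout; every socket created afterwards
# (including those urllib opens internally) inherits it.
socket.setdefaulttimeout(30.0)

# Any socket made from here on starts life with that timeout already set.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(probe.gettimeout())   # 30.0
probe.close()
```

The same `socket.setdefaulttimeout(...)` call exists in Python 2.4, which the traceback shows is in use here.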

Mar 27 '07 #1
On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <su*********@gmail.com>
wrote:
> I am downloading images using the script below. Sometimes it will go
> for 10 mins, sometimes 2 hours before timing out with the following
> error:
>
> urllib.urlretrieve(fullurl, localfile)
> IOError: [Errno socket error] (10060, 'Operation timed out')
>
> I have searched this forum extensively and tried to avoid timing out,
> but to no avail. Anyone have any ideas as to why I keep getting a
> timeout? I thought setting the socket timeout did it, but it didn't.

You should do the opposite: time out *early* - not waiting 2 hours - and
handle the error (maybe using a queue to hold pending requests).

--
Gabriel Genellina
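That early-timeout-and-retry idea can be sketched concretely: set a short global timeout, keep a queue of pending URLs, and re-queue any fetch that times out. The sketch below is Python 3 with a stand-in `flaky_fetch`; the 15-second timeout, helper names, and retry count are illustrative, not from the thread:

```python
import socket
from collections import deque

socket.setdefaulttimeout(15)   # fail fast instead of hanging for hours

def fetch_all(urls, fetch, max_attempts=3):
    """Drain a queue of URLs, re-queueing any that time out."""
    pending = deque((u, 1) for u in urls)
    done, failed = [], []
    while pending:
        url, attempt = pending.popleft()
        try:
            fetch(url)                     # e.g. urllib.request.urlretrieve
        except socket.timeout:
            if attempt < max_attempts:
                pending.append((url, attempt + 1))   # put it back, retry later
            else:
                failed.append(url)
        else:
            done.append(url)
    return done, failed

# Stand-in fetch that "times out" on the first attempt for each URL:
calls = {}
def flaky_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    if calls[url] < 2:
        raise socket.timeout()

done, failed = fetch_all(['a.tif', 'b.tif'], flaky_fetch)
print(done, failed)   # ['a.tif', 'b.tif'] []
```

Each URL fails once, goes back on the queue, and succeeds on its second attempt; nothing ends up in `failed`.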

Mar 27 '07 #2
On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com>
> wrote:
>> I am downloading images using the script below. Sometimes it will go
>> for 10 mins, sometimes 2 hours before timing out with the following
>> error:
>> urllib.urlretrieve(fullurl, localfile)
>> IOError: [Errno socket error] (10060, 'Operation timed out')
>> I have searched this forum extensively and tried to avoid timing out,
>> but to no avail. Anyone have any ideas as to why I keep getting a
>> timeout? I thought setting the socket timeout did it, but it didn't.
>
> You should do the opposite: time out *early* - not waiting 2 hours - and
> handle the error (maybe using a queue to hold pending requests)
>
> --
> Gabriel Genellina
Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

chad

Mar 27 '07 #3
On Tue, 27 Mar 2007 17:41:44 -0300, supercooper <su*********@gmail.com>
wrote:
> On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>> You should do the opposite: time out *early* - not waiting 2 hours - and
>> handle the error (maybe using a queue to hold pending requests)
>
> Gabriel, thanks for the input. So are you saying there is no way to
> realistically *prevent* the timeout from occurring in the first
> place?

Exactly. The error is out of your control: maybe the server is down,
unresponsive or overloaded, a proxy has a problem, any other network
issue, etc.

> And by timing out early, do you mean to set the timeout for x
> seconds and if and when the timeout occurs, handle the error and start
> the process again somehow on the pending requests? Thanks.

Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)

--
Gabriel Genellina

Mar 27 '07 #4
On Mar 27, 4:41 pm, "supercooper" <supercoo...@gmail.com> wrote:
> On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>> You should do the opposite: time out *early* - not waiting 2 hours - and
>> handle the error (maybe using a queue to hold pending requests)
>
> Gabriel, thanks for the input. So are you saying there is no way to
> realistically *prevent* the timeout from occurring in the first
> place? And by timing out early, do you mean to set the timeout for x
> seconds and if and when the timeout occurs, handle the error and start
> the process again somehow on the pending requests? Thanks.
>
> chad
Chad,

Just run the retrieval in a Thread. If the thread is not done after x
seconds, then handle it as a timeout and then retry, ignore, quit or
anything else you want.

Even better, what I did for my program is first gather all the URLs (I
assume you can do that), then group them by server, i.e. n images
from foo.com, m from bar.org, and so on. Then start a thread for each
server (with some possible maximum number of threads); each one of
those threads will be responsible for retrieving images from only one
server (this is to prevent a DoS pattern). Let each of the server
threads start a 'small' retriever thread for each image (this is to
handle the timeout you mention).

So you have two kinds of threads: one per server to parallelize
downloading, each of which in turn spawns one thread per download to
handle timeouts. This way you will (ideally) saturate your bandwidth,
but you only get one image per server at a time, so you still 'play
nice' with each of the servers. If you want to cap the number of server
threads running (in case you have way too many servers to deal with),
then run batches of server threads.

Hope this helps,
Nick Vatamaniuc
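The per-download watchdog thread described above might look like the sketch below (Python 3; the function names and deadline values are illustrative, and the per-server grouping layer is omitted). Note that a Python thread cannot actually be killed, only abandoned, so a timed-out download keeps running in the background as a daemon thread:

```python
import threading
import time

def fetch_with_deadline(fetch, url, seconds):
    """Run fetch(url) in a daemon thread; report a timeout if it outlives the deadline."""
    result = {}

    def worker():
        try:
            result['value'] = fetch(url)
        except Exception as exc:          # real code would narrow this
            result['error'] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(seconds)                        # wait at most `seconds`
    if t.is_alive():
        # Thread is abandoned, not killed; the caller can re-queue the URL.
        return ('timeout', None)
    if 'error' in result:
        return ('error', result['error'])
    return ('ok', result['value'])

# A fast fetch completes; a slow one is reported as a timeout.
print(fetch_with_deadline(lambda u: 'data', 'fast-url', 1.0))        # ('ok', 'data')
print(fetch_with_deadline(lambda u: time.sleep(5), 'slow-url', 0.1)[0])  # timeout
```

Each server thread would call something like this once per image, re-queueing the URL whenever it gets `'timeout'` back.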

Mar 28 '07 #5
On Mar 27, 4:50 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> Exactly. The error is out of your control: maybe the server is down,
> unresponsive or overloaded, a proxy has a problem, any other network
> issue, etc.
>
> Exactly!
> Another option: Python is cool, but there is no need to reinvent the
> wheel. Use wget instead :)
>
> --
> Gabriel Genellina
Gabriel...thanks for the tip on wget...it's awesome! I even built it on
my Mac. It has been working like a champ for hours on end...

Thanks!

chad


import os, shutil, string

images = [['34095d2','Nashoba'],
          ['34096c8','Nebo'],
          ['36095a4','Neodesha'],
          ['33095h7','New Oberlin'],
          ['35096f3','Newby'],
          ['35094e5','Nicut'],
          ['34096g2','Non'],
          ['35096h6','North Village'],
          ['35095g3','Northeast Muskogee'],
          ['35095g4','Northwest Muskogee'],
          ['35096f2','Nuyaka'],
          ['34094e6','Octavia'],
          ['36096a5','Oilton'],
          ['35096d3','Okemah'],
          ['35096c3','Okemah SE'],
          ['35096e2','Okfuskee'],
          ['35096e1','Okmulgee Lake'],
          ['35095f7','Okmulgee NE'],
          ['35095f8','Okmulgee North'],
          ['35095e8','Okmulgee South'],
          ['35095e4','Oktaha'],
          ['34094b7','Old Glory Mountain'],
          ['36096a4','Olive'],
          ['34096d3','Olney'],
          ['36095a6','Oneta'],
          ['34097a2','Overbrook']]

wgetDir = 'C:/Program Files/wget/o'
exts = ['tif', 'tfw']
url = 'http://www.archive.org/download/'
home = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'

for image in images:
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        # -t 10: retry each download up to 10 times; -a: append to the log file
        os.system('wget %s -t 10 -a log.log' % fullurl)
        shutil.move(wgetDir + image[0] + '.' + ext,
                    home + 'o' + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext)

Mar 29 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.
