
urllib timeout issues

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

Traceback (most recent call last):
File "ftp_20070326_Downloads_cooperc_FetchLibreMapProje ctDRGs.py",
line 108, i
n ?
urllib.urlretrieve(fullurl, localfile)
File "C:\Python24\lib\urllib.py", line 89, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Python24\lib\urllib.py", line 222, in retrieve
fp = self.open(url, data)
File "C:\Python24\lib\urllib.py", line 190, in open
return getattr(self, name)(url)
File "C:\Python24\lib\urllib.py", line 322, in open_http
return self.http_error(url, fp, errcode, errmsg, headers)
File "C:\Python24\lib\urllib.py", line 335, in http_error
result = method(url, fp, errcode, errmsg, headers)
File "C:\Python24\lib\urllib.py", line 593, in http_error_302
data)
File "C:\Python24\lib\urllib.py", line 608, in redirect_internal
return self.open(newurl)
File "C:\Python24\lib\urllib.py", line 190, in open
return getattr(self, name)(url)
File "C:\Python24\lib\urllib.py", line 313, in open_http
h.endheaders()
File "C:\Python24\lib\httplib.py", line 798, in endheaders
self._send_output()
File "C:\Python24\lib\httplib.py", line 679, in _send_output
self.send(msg)
File "C:\Python24\lib\httplib.py", line 646, in send
self.connect()
File "C:\Python24\lib\httplib.py", line 630, in connect
raise socket.error, msg
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Does anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout would fix it, but it didn't.

Thanks.

<--- CODE --->

import os, sys, socket, string, time, urllib
from time import strftime, localtime

# AddPrintMessage, StartFinishMessage and Timer are logging/timing
# helpers defined elsewhere in the full script.

images = [['34095e3','Clayton'],
['35096d2','Clearview'],
['34095d1','Clebit'],
['34095c3','Cloudy'],
['34096e2','Coalgate'],
['34096e1','Coalgate SE'],
['35095g7','Concharty Mountain'],
['34096d6','Connerville'],
['34096d5','Connerville NE'],
['34096c5','Connerville SE'],
['35094f8','Cookson'],
['35095e6','Council Hill'],
['34095f5','Counts'],
['35095h6','Coweta'],
['35097h2','Coyle'],
['35096c4','Cromwell'],
['35095a6','Crowder'],
['35096h7','Cushing']]

exts = ['tif', 'tfw']
envir = 'DEV'
# URL of our image(s) to grab
url = 'http://www.archive.org/download/'
logRoot = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'
logFile = os.path.join(logRoot, 'FetchLibreDRGs_' + strftime('%m_%d_%Y_%H_%M_%S', localtime()) + '_' + envir + '.log')

# Local dir to store files in
fetchdir = logRoot
# Entire process start time
start = time.clock()

msg = envir + ' - ' + 'Script: ' + os.path.join(sys.path[0], sys.argv[0]) + \
      ' - Start time: ' + strftime('%m/%d/%Y %I:%M:%S %p', localtime()) + \
      '\n--------------------------------------------------------------------------------------------------------------\n\n'
AddPrintMessage(msg)
StartFinishMessage('Start')

# Loop thru image list, grab each tif and tfw
for image in images:
    # Try and set socket timeout default to none
    # Create a new socket connection for every time through list loop
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('archive.org', 80))
    s.settimeout(None)

    s2 = time.clock()
    msg = '\nProcessing ' + image[0] + ' --' + image[1]
    AddPrintMessage(msg)
    print msg
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        localfile = fetchdir + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext
        urllib.urlretrieve(fullurl, localfile)
    e2 = time.clock()
    msg = '\nDone processing ' + image[0] + ' --' + image[1] + '\nProcess took ' + Timer(s2, e2)
    AddPrintMessage(msg)
    print msg
    # Close socket connection, only to reopen with next run thru loop
    s.close()

end = time.clock()
StartFinishMessage('Finish')
msg = '\n\nDone! Process completed in ' + Timer(start, end)
AddPrintMessage(msg)

Mar 27 '07 #1
5 Replies


On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <su*********@gmail.com>
wrote:

> I am downloading images using the script below. Sometimes it will go
> for 10 mins, sometimes 2 hours before timing out with the following
> error:
>
> urllib.urlretrieve(fullurl, localfile)
> IOError: [Errno socket error] (10060, 'Operation timed out')
>
> I have searched this forum extensively and tried to avoid timing out,
> but to no avail. Does anyone have any ideas as to why I keep getting a
> timeout? I thought setting the socket timeout would fix it, but it didn't.
You should do the opposite: time out *early* - not waiting 2 hours - and
handle the error (maybe using a queue to hold pending requests).
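
A minimal sketch of that approach (the 30-second timeout, the 5-try cap
and the two queue entries are only placeholders; build the queue from
your own images list):

import socket
import urllib

socket.setdefaulttimeout(30)   # give up after 30 seconds instead of hanging

# queue of (url, local filename, tries so far)
pending = [('http://www.archive.org/download/a.tif', 'a.tif', 0),
           ('http://www.archive.org/download/b.tif', 'b.tif', 0)]
MAX_TRIES = 5

while pending:
    fullurl, localfile, tries = pending.pop(0)
    try:
        urllib.urlretrieve(fullurl, localfile)
    except IOError:
        if tries + 1 < MAX_TRIES:
            # requeue at the back and move on to the next download
            pending.append((fullurl, localfile, tries + 1))
        else:
            print 'giving up on', fullurl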

--
Gabriel Genellina

Mar 27 '07 #2

On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com>
> wrote:
> [original report snipped]
>
> You should do the opposite: time out *early* - not waiting 2 hours - and
> handle the error (maybe using a queue to hold pending requests).
>
> --
> Gabriel Genellina
Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

chad

Mar 27 '07 #3

On Tue, 27 Mar 2007 17:41:44 -0300, supercooper <su*********@gmail.com>
wrote:

> [earlier exchange snipped]
>
> Gabriel, thanks for the input. So are you saying there is no way to
> realistically *prevent* the timeout from occurring in the first

Exactly. The error is out of your control: maybe the server is down,
unresponsive or overloaded, a proxy has problems, there is some network
trouble, etc.

> place? And by timing out early, do you mean to set the timeout for x
> seconds and if and when the timeout occurs, handle the error and start
> the process again somehow on the pending requests? Thanks.

Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)
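
For instance (a sketch only: -t caps the retries, -T sets a per-attempt
timeout in seconds, -a appends wget's output to a log file; the values
and the log name are illustrative, and fullurl is built as in your script):

import os

# 10 tries, 30-second timeout per attempt, progress appended to fetch.log
os.system('wget -t 10 -T 30 -a fetch.log %s' % fullurl)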

--
Gabriel Genellina

Mar 27 '07 #4

On Mar 27, 4:41 pm, "supercooper" <supercoo...@gmail.com> wrote:
> [earlier exchange snipped]
>
> Gabriel, thanks for the input. So are you saying there is no way to
> realistically *prevent* the timeout from occurring in the first
> place? And by timing out early, do you mean to set the timeout for x
> seconds and if and when the timeout occurs, handle the error and start
> the process again somehow on the pending requests? Thanks.
>
> chad
Chad,

Just run the retrieval in a Thread. If the thread is not done after x
seconds, then handle it as a timeout and then retry, ignore, quit or
anything else you want.

Even better, what I did for my program is first gather all the URLs (I
assume you can do that), then group them by server, i.e. n images from
foo.com, m from bar.org, and so on. Then start a thread for each server
(with some possible maximum number of threads); each of those threads is
responsible for retrieving images from only one server (this is to avoid
a DoS pattern). Let each server thread start a 'small' retriever thread
for each image (this is to handle the timeout you mention); a sketch of
such a retriever thread is below.

So you have two kinds of threads -- one per server to parallelize
downloading, each of which in turn spawns one thread per download to
handle the timeout. This way you will (ideally) saturate your bandwidth,
but you only fetch one image per server at a time, so you still 'play
nice' with each of the servers. If you want a maximum number of server
threads running at once (in case you have way too many servers to deal
with), then run batches of server threads.
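
A bare-bones sketch of the per-download watchdog (the URL, filename and
60-second limit are placeholders):

import threading
import urllib

fullurl = 'http://www.archive.org/download/example.tif'   # placeholder
localfile = 'example.tif'

def fetch(url, filename):
    urllib.urlretrieve(url, filename)

t = threading.Thread(target=fetch, args=(fullurl, localfile))
t.start()
t.join(60)                     # wait at most 60 seconds for this download
if t.isAlive():
    # treat it as a timeout: log it and requeue or skip the file.
    # Note that Python cannot kill the hung thread; it will linger
    # until its socket call finally returns.
    print 'timed out:', fullurl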

Hope this helps,
Nick Vatamaniuc

Mar 28 '07 #5

On Mar 27, 4:50 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> [earlier exchange snipped]
>
> Exactly!
> Another option: Python is cool, but there is no need to reinvent the
> wheel. Use wget instead :)
>
> --
> Gabriel Genellina
Gabriel...thanks for the tip on wget...it's awesome! I even built it on
my Mac. It is working like a champ for hours on end...

Thanks!

chad


import os, shutil, string

images = [['34095d2','Nashoba'],
['34096c8','Nebo'],
['36095a4','Neodesha'],
['33095h7','New Oberlin'],
['35096f3','Newby'],
['35094e5','Nicut'],
['34096g2','Non'],
['35096h6','North Village'],
['35095g3','Northeast Muskogee'],
['35095g4','Northwest Muskogee'],
['35096f2','Nuyaka'],
['34094e6','Octavia'],
['36096a5','Oilton'],
['35096d3','Okemah'],
['35096c3','Okemah SE'],
['35096e2','Okfuskee'],
['35096e1','Okmulgee Lake'],
['35095f7','Okmulgee NE'],
['35095f8','Okmulgee North'],
['35095e8','Okmulgee South'],
['35095e4','Oktaha'],
['34094b7','Old Glory Mountain'],
['36096a4','Olive'],
['34096d3','Olney'],
['36095a6','Oneta'],
['34097a2','Overbrook']]

wgetDir = 'C:/Program Files/wget/o'
exts = ['tif', 'tfw']
url = 'http://www.archive.org/download/'
home = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'

for image in images:
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        os.system('wget %s -t 10 -a log.log' % fullurl)
        shutil.move(wgetDir + image[0] + '.' + ext,
                    home + 'o' + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext)

Mar 29 '07 #6
