Bytes | Software Development & Data Engineering Community
urllib timeout issues

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

Traceback (most recent call last):
File "ftp_20070326_Downloads_cooperc_FetchLibreMapProjectDRGs.py", line 108, in ?
urllib.urlretrieve(fullurl, localfile)
File "C:\Python24\lib\urllib.py", line 89, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Python24\lib\urllib.py", line 222, in retrieve
fp = self.open(url, data)
File "C:\Python24\lib\urllib.py", line 190, in open
return getattr(self, name)(url)
File "C:\Python24\lib\urllib.py", line 322, in open_http
return self.http_error(url, fp, errcode, errmsg, headers)
File "C:\Python24\lib\urllib.py", line 335, in http_error
result = method(url, fp, errcode, errmsg, headers)
File "C:\Python24\lib\urllib.py", line 593, in http_error_302
data)
File "C:\Python24\lib\urllib.py", line 608, in redirect_internal
return self.open(newurl)
File "C:\Python24\lib\urllib.py", line 190, in open
return getattr(self, name)(url)
File "C:\Python24\lib\urllib.py", line 313, in open_http
h.endheaders()
File "C:\Python24\lib\httplib.py", line 798, in endheaders
self._send_output()
File "C:\Python24\lib\httplib.py", line 679, in _send_output
self.send(msg)
File "C:\Python24\lib\httplib.py", line 646, in send
self.connect()
File "C:\Python24\lib\httplib.py", line 630, in connect
raise socket.error, msg
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Does anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didn't.

Thanks.

<--- CODE --->

# Imports reconstructed for completeness; AddPrintMessage, StartFinishMessage
# and Timer are helper functions defined elsewhere in the original script.
import os, sys, socket, string, time, urllib
from time import strftime, localtime

images = [['34095e3','Clayton'],
['35096d2','Clearview'],
['34095d1','Clebit'],
['34095c3','Cloudy'],
['34096e2','Coalgate'],
['34096e1','Coalgate SE'],
['35095g7','Concharty Mountain'],
['34096d6','Connerville'],
['34096d5','Connerville NE'],
['34096c5','Connerville SE'],
['35094f8','Cookson'],
['35095e6','Council Hill'],
['34095f5','Counts'],
['35095h6','Coweta'],
['35097h2','Coyle'],
['35096c4','Cromwell'],
['35095a6','Crowder'],
['35096h7','Cushing']]

exts = ['tif', 'tfw']
envir = 'DEV'
# URL of our image(s) to grab
url = 'http://www.archive.org/download/'
logRoot = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'
logFile = os.path.join(logRoot, 'FetchLibreDRGs_' + strftime('%m_%d_%Y_%H_%M_%S', localtime()) + '_' + envir + '.log')

# Local dir to store files in
fetchdir = logRoot
# Entire process start time
start = time.clock()

msg = envir + ' - ' + "Script: " + os.path.join(sys.path[0], sys.argv[0]) + \
      ' - Start time: ' + strftime('%m/%d/%Y %I:%M:%S %p', localtime()) + \
      '\n--------------------------------------------------------------------------------------------------------------\n\n'
AddPrintMessage(msg)
StartFinishMessage('Start')

# Loop thru image list, grab each tif and tfw
for image in images:
    # Try and set socket timeout default to none
    # Create a new socket connection for every time through list loop
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('archive.org', 80))
    s.settimeout(None)

    s2 = time.clock()
    msg = '\nProcessing ' + image[0] + ' --' + image[1]
    AddPrintMessage(msg)
    print msg
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        localfile = fetchdir + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext
        urllib.urlretrieve(fullurl, localfile)
    e2 = time.clock()
    msg = '\nDone processing ' + image[0] + ' --' + image[1] + '\nProcess took ' + Timer(s2, e2)
    AddPrintMessage(msg)
    print msg
    # Close socket connection, only to reopen with next run thru loop
    s.close()

end = time.clock()
StartFinishMessage('Finish')
msg = '\n\nDone! Process completed in ' + Timer(start, end)
AddPrintMessage(msg)

Mar 27 '07 #1
On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <su*********@gmail.com> wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')

I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.
You should do the opposite: time out *early* (don't wait 2 hours) and
handle the error (maybe using a queue to hold pending requests).
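A minimal sketch of that queue-of-pending-requests idea (the names `fetch_all` and `fetch` are illustrative; with the script above you would pass `urllib.urlretrieve` as the `fetch` argument):

```python
import socket
from collections import deque

# Give up on any single connection after 15 seconds instead of
# waiting for the OS-level timeout, which can take hours.
socket.setdefaulttimeout(15)

def fetch_all(pairs, fetch, max_attempts=3):
    """Try every (url, localfile) pair; on IOError, push the pair back
    onto the pending queue and retry later, up to max_attempts times.
    Returns the list of URLs that never succeeded."""
    pending = deque((url, dest, 0) for url, dest in pairs)
    failed = []
    while pending:
        url, dest, tries = pending.popleft()
        try:
            fetch(url, dest)                # e.g. urllib.urlretrieve
        except IOError:
            if tries + 1 < max_attempts:
                pending.append((url, dest, tries + 1))  # retry it later
            else:
                failed.append(url)          # give up on this one
    return failed
```

The queue means one slow or dead server only costs you 15 seconds per attempt instead of stalling the whole run.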

--
Gabriel Genellina

Mar 27 '07 #2
On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com> wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.

You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)

--
Gabriel Genellina
Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and, if and when the timeout occurs, handle the error and somehow
restart the process on the pending requests? Thanks.

chad

Mar 27 '07 #3
On Tue, 27 Mar 2007 17:41:44 -0300, supercooper <su*********@gmail.com> wrote:
On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
>On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com> wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.

You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)

--
Gabriel Genellina

Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
Exactly. The error is out of your control: maybe the server is down,
unresponsive, or overloaded; maybe a proxy has problems, or there is some
other network problem, etc.
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.
Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)
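For the record, wget can do the early-timeout-and-retry work itself via its `-T`/`--timeout` and `-t`/`--tries` options; a sketch of driving it from Python (the helper names are illustrative, and wget is assumed to be on the PATH):

```python
import subprocess

def build_wget_cmd(url, tries=10, timeout=30, logfile='log.log'):
    # -T: per-connection timeout in seconds; -t: maximum number of tries;
    # -a: append wget's output to a log file instead of the console.
    return ['wget', url, '-T', str(timeout), '-t', str(tries), '-a', logfile]

def fetch(url, **opts):
    # Returns wget's exit status: 0 on success, non-zero on failure.
    return subprocess.call(build_wget_cmd(url, **opts))
```

With this, a hung connection costs at most `timeout` seconds per try, and wget handles the re-queueing on its own.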

--
Gabriel Genellina

Mar 27 '07 #4
On Mar 27, 4:41 pm, "supercooper" <supercoo...@gmail.com> wrote:
On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com> wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.
You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)
--
Gabriel Genellina

Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

chad
Chad,

Just run the retrieval in a Thread. If the thread is not done after x
seconds, then handle it as a timeout and then retry, ignore, quit or
anything else you want.

Even better, what I did for my program is to first gather all the URLs (I
assume you can do that), then group them by server, i.e. n images from
foo.com, m images from bar.org, and so on. Then start a thread for each
server (with some possible maximum number of threads); each one of those
threads is responsible for retrieving images from only one server (this is
to prevent a DoS pattern). Let each of the server threads start a 'small'
retriever thread for each image (this is to handle the timeout you
mention).

So you have two different kinds of threads: one per server to parallelize
downloading, each of which in turn spawns one thread per download to handle
timeouts. This way you will (ideally) saturate your bandwidth, but you only
fetch one image per server at a time, so you still 'play nice' with each of
the servers. If you want a maximum number of server threads running (in
case you have way too many servers to deal with), then run batches of
server threads.

Hope this helps,
Nick Vatamaniuc

Mar 28 '07 #5
On Mar 27, 4:50 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Tue, 27 Mar 2007 17:41:44 -0300, supercooper <supercoo...@gmail.com> wrote:
On Mar 27, 3:13 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <supercoo...@gmail.com> wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.
You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)
--
Gabriel Genellina
Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first

Exactly. The error is out of your control: maybe the server is down,
irresponsive, overloaded, a proxy has any problems, any network problem,
etc.
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)

--
Gabriel Genellina
Gabriel... thanks for the tip on wget... it's awesome! I even built it on
my Mac. It has been working like a champ for hours on end...

Thanks!

chad


import os, shutil, string

images = [['34095d2','Nashoba'],
['34096c8','Nebo'],
['36095a4','Neodesha'],
['33095h7','New Oberlin'],
['35096f3','Newby'],
['35094e5','Nicut'],
['34096g2','Non'],
['35096h6','North Village'],
['35095g3','Northeast Muskogee'],
['35095g4','Northwest Muskogee'],
['35096f2','Nuyaka'],
['34094e6','Octavia'],
['36096a5','Oilton'],
['35096d3','Okemah'],
['35096c3','Okemah SE'],
['35096e2','Okfuskee'],
['35096e1','Okmulgee Lake'],
['35095f7','Okmulgee NE'],
['35095f8','Okmulgee North'],
['35095e8','Okmulgee South'],
['35095e4','Oktaha'],
['34094b7','Old Glory Mountain'],
['36096a4','Olive'],
['34096d3','Olney'],
['36095a6','Oneta'],
['34097a2','Overbrook']]

wgetDir = 'C:/Program Files/wget/o'
exts = ['tif', 'tfw']
url = 'http://www.archive.org/download/'
home = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'

for image in images:
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        os.system('wget %s -t 10 -a log.log' % fullurl)
        shutil.move(wgetDir + image[0] + '.' + ext,
                    home + 'o' + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext)

Mar 29 '07 #6
