
urllib2 - safe way to download something

Hi,

I wonder if there is a safe way to download a page with urllib2. I've
constructed the following method to catch all possible exceptions.

import logging
import socket
import urllib2
from httplib import HTTPException

log = logging.getLogger(__name__)  # configured elsewhere in the crawler

def retrieve(url):
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        handler = urllib2.urlopen(request)
        data = handler.read()
        handler.close()
    except urllib2.HTTPError, e:
        log.warning("Server couldn't fulfill the request: %s, %s"
                    % (url, e.code))
        return None
    except urllib2.URLError, e:
        log.warning("Failed to reach a server: %s, %s" % (url, e.reason))
        return None
    except HTTPException, e:
        log.warning("HTTP exception: %s, %s" % (url, e.__class__.__name__))
        return None
    except socket.timeout:
        log.warning("Timeout expired: %s" % (url,))
        return None
    return data
But suddenly I got the following traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/light/prj/ym-crawl/shops/dispatcher.py", line 122, in run
    self.task(self.queue, item)
  File "scrawler.py", line 24, in spider
    data = retrieve(url)
  File "scrawler.py", line 44, in retrieve
    data = handler.read()
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 563, in _read_chunked
    value += self._safe_read(chunk_left)
  File "/usr/lib/python2.5/httplib.py", line 602, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python2.5/socket.py", line 309, in read
    data = self._sock.recv(recv_size)
error: (104, 'Connection reset by peer')

What did I miss? I don't really want to catch all errors. Thanks!
Nov 14 '08 #1
3 Replies


I mean I don't want to catch all unexpected errors with a bare
"except:" :).
Nov 14 '08 #2

On Fri, 14 Nov 2008 06:35:27 -0800, konstantin wrote:
> Hi,
>
> I wonder if there is a safe way to download a page with urllib2. I've
> constructed the following method to catch all possible exceptions.

See here:

http://niallohiggins.com/2008/04/05/...documentation-
urllib2urlopen-exception-layering-problems/

There are probably others as well... I seem to recall getting
socket.error at some point myself.
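For what it's worth, here is a rough sketch of retrieve() with a
socket.error clause added. Untested, and the logger name is made up:

import logging
import socket
import urllib2
from httplib import HTTPException

log = logging.getLogger("scrawler")  # stand-in for your logger

def retrieve(url):
    request = urllib2.Request(url)
    try:
        handler = urllib2.urlopen(request)
        data = handler.read()
        handler.close()
    except urllib2.URLError, e:
        # HTTPError is a subclass of URLError, so both land here.
        log.warning("urllib2 error: %s, %s" % (url, e))
        return None
    except HTTPException, e:
        log.warning("HTTP exception: %s, %s" % (url, e.__class__.__name__))
        return None
    except socket.error, e:
        # The clause your version is missing: read() raises socket.error
        # directly when the peer resets the connection mid-transfer.
        # socket.timeout is a subclass, so it is covered here as well.
        log.warning("Socket error: %s, %s" % (url, e))
        return None
    return data
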
--
Steven
Nov 14 '08 #3

On 14 Nov, 18:12, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Fri, 14 Nov 2008 06:35:27 -0800, konstantin wrote:
>> Hi,
>> I wonder if there is a safe way to download a page with urllib2. I've
>> constructed the following method to catch all possible exceptions.
>
> See here:
>
> http://niallohiggins.com/2008/04/05/...documentation-
> urllib2urlopen-exception-layering-problems/
>
> There are probably others as well... I seem to recall getting
> socket.error at some point myself.
>
> --
> Steven
Thanks. It's a nice post, but it seems there is no clear solution.
I remember catching IOError and ValueError as well.
I think urllib2 needs some unification of its exception handling. The
current layering breaks simplicity, and that's no good.
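Something like a single consolidated clause is what I have in mind.
Just a sketch, Python 2.5 style, with an assumed logger:

import logging
import socket
import urllib2
from httplib import HTTPException

log = logging.getLogger("scrawler")  # assumed

def retrieve(url):
    try:
        return urllib2.urlopen(url).read()
    except (IOError, HTTPException, socket.error, ValueError), e:
        # IOError already covers urllib2.URLError and urllib2.HTTPError
        # (URLError subclasses IOError in Python 2); ValueError shows up
        # for things like malformed URLs.
        log.warning("retrieve failed: %s, %s: %s"
                    % (url, e.__class__.__name__, e))
        return None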

But anyway, thanks.

PS: Maybe I could contribute to this module, but I don't really know
how or where to start.

Nov 14 '08 #4
