
[urllib2 + Tor] How to handle 404?

Hello

I'm using the urllib2 module and Tor as a proxy to download data
from the web.

Occasionally, urllib2 returns 404, probably because of some issue
with the Tor network. This code doesn't solve the issue, as it just
loops through the same error indefinitely:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
=====

Any idea of what I should do to handle this error properly?

Thank you.
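[Editor's note: the snippet doesn't show how urllib2 is pointed at Tor. Tor itself speaks SOCKS, which urllib2 doesn't support out of the box, so a common arrangement at the time was to run a local HTTP forwarding proxy such as Privoxy in front of it. A hypothetical sketch of that wiring, not taken from this thread; 127.0.0.1:8118 is simply Privoxy's default listening address.]

=====
import urllib2

# Route plain-HTTP requests through a local forwarding proxy that in turn
# talks to Tor; 8118 is Privoxy's default port (assumption, adjust to taste).
proxy_support = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8118'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)

# From here on, urllib2.urlopen() goes through the proxy transparently.
=====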
Nov 7 '08 #1
2 Replies


On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <no****@nospam.com> wrote:
Hello

I'm using the urllib2 module and Tor as a proxy to download data
from the web.

Occasionally, urllib2 returns 404, probably because of some issue
with the Tor network. This code doesn't solve the issue, as it just
loops through the same error indefinitely:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
        else:  # should align with the `except`
            break
    handle_success(response)  # should align with the `url =` line
=====

Any idea of what I should do to handle this error properly?

Thank you.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
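[Editor's note: putting Chris's corrections together, a self-contained version of the loop could look like the sketch below. The `rows`, `headers` and `handle_success` names are placeholders standing in for the poster's own objects, and catching `URLError` alongside `HTTPError` is an addition not from the thread, since transient Tor failures can surface as either.]

=====
import time
import urllib2
from urllib2 import HTTPError, URLError

headers = {'User-Agent': 'Mozilla/5.0'}      # placeholder request headers
rows = [('ABC123',), ('XYZ789',)]            # placeholder result set

def handle_success(data):                    # placeholder success handler
    print 'fetched %d bytes' % len(data)

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except (HTTPError, URLError), e:
            print 'Error: ', e
            time.sleep(2)
            continue
        else:
            break                            # success: leave the retry loop
    handle_success(response)
=====

[Like the original, this still retries the same URL forever when the 404 is persistent, which is what the next reply addresses.]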
Nov 7 '08 #2

On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <cl*@rebertia.com> wrote:
On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <no****@nospam.com> wrote:
Hello

I'm using the urllib2 module and Tor as a proxy to download data
from the web.

Occasionally, urllib2 returns 404, probably because of some issue
with the Tor network. This code doesn't solve the issue, as it just
loops through the same error indefinitely:

=====
*snip*
=====

Any idea of what I should do to handle this error properly?

Thank you.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
It sounds like Gilles may be having an issue with persistent 404s, in
which case something like this could be more appropriate:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else:  # should align with the `except`
            break
    else:
        print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
    handle_success(response)  # should align with the `url =` line
=====
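[Editor's note: one caveat with the snippet above is that when all ten attempts fail, the `while ... else` clause prints the message but execution still falls through to `handle_success(response)`, where `response` is either stale or unbound. A variant that only processes successful fetches is sketched below; `fetch` is a hypothetical helper name, and `rows`, `headers` and `handle_success` are assumed to be defined as in the earlier sketch.]

=====
import time
import urllib2
from urllib2 import URLError    # HTTPError is a subclass of URLError

MAX_RETRIES = 10
RETRY_DELAY = 2                 # seconds to wait between attempts

def fetch(url, headers):
    """Return the page body, or None if every attempt failed."""
    for attempt in range(MAX_RETRIES):
        try:
            req = urllib2.Request(url, None, headers)
            return urllib2.urlopen(req).read()
        except URLError, e:
            print 'Attempt %d of %d failed: %s' % (attempt + 1, MAX_RETRIES, e)
            time.sleep(RETRY_DELAY)
    return None

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    response = fetch(url, headers)
    if response is None:
        print 'Giving up on ' + url
    else:
        handle_success(response)
=====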
Nov 7 '08 #3
