[urllib2 + Tor] How to handle 404?

Hello

I'm using the urllib2 module and Tor as a proxy to download data
from the web.

Occasionally, urllib2 returns 404, probably because of some issue
with the Tor network. This code doesn't solve the issue, as it just
loops through the same error indefinitely:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError,e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
=====

Any idea of what I should do to handle this error properly?

Thank you.
Nov 7 '08 #1
On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <no****@nospam.com> wrote:
> This code doesn't solve the issue, as it just
> loops through the same error indefinitely:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
        else:  # aligned with the `except`: runs only if no exception was raised
            break
    handle_success(response)  # aligned with the `url =` line
=====

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
Nov 7 '08 #2
On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <cl*@rebertia.com> wrote:
*snip*

It sounds like Gilles may be having an issue with persistent 404s, in
which case something like this could be more appropriate:

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else:  # aligned with the `except`: the fetch succeeded
            break
    else:
        # The while loop ended without a break, so every attempt failed.
        print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
        continue  # skip handle_success for this url
    handle_success(response)  # aligned with the `url =` line
Nov 7 '08 #3
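
For anyone reading this on Python 3, where urllib2 was split into urllib.request and urllib.error, the bounded-retry idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name fetch_with_retries and its opener and sleep parameters are hypothetical, added so the logic can be tried without a network connection or real delays.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, headers=None, max_retries=10, delay=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body as bytes, or None if every attempt fails.

    `opener` and `sleep` are hypothetical injection points (not part of
    urllib) that let the retry logic run without network access.
    """
    req = urllib.request.Request(url, None, headers or {})
    for attempt in range(1, max_retries + 1):
        try:
            # Success: hand the body back immediately, ending the loop.
            return opener(req).read()
        except urllib.error.HTTPError as e:
            print('Error code: %d (attempt %d of %d)'
                  % (e.code, attempt, max_retries))
            # Wait before the next attempt, but not after the last one.
            if attempt < max_retries:
                sleep(delay)
    # Reached only when no attempt returned a body.
    print('Fetch of %s failed after %d tries.' % (url, max_retries))
    return None
```

Inside the `for id in rows:` loop this would be called once per url, and handle_success would only run when the result is not None. Returning from inside the `try` replaces the `else: break` pair, and a persistent 404 now ends after max_retries attempts instead of looping forever.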
