[urllib2 + Tor] How to handle 404?

Gilles Ganault

Hello

I'm using the urllib2 module and Tor as a proxy to download data
from the web.

Occasionally, urllib2 returns 404, probably because of some issue
with the Tor network. This code doesn't solve the issue, as it just
loops through the same error indefinitely:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError,e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
=====

Any idea of what I should do to handle this error properly?

Thank you.

Nov 7 '08 #1


Chris Rebert

On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <no****@nospam.com> wrote:

=====
for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    while True:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError,e:
            print 'Error code: ', e.code
            time.sleep(2)
            continue
        else: #should align with the `except`
            break
    handle_success(response) #should align with `url =` line
=====

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com

Nov 7 '08 #2

Steven McKay

On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <cl*@rebertia.com> wrote:

*snip*

It sounds like Gilles may be having an issue with persistent 404s, in
which case something like this could be more appropriate:

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError,e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else: #should align with the `except`
            break
    else: #runs only if the while loop ended without a `break`
        print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
        continue #skip handle_success for this url
    handle_success(response) #should align with `url =` line
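
For anyone reading this on Python 3, where urllib2 was split into urllib.request and urllib.error, the same bounded-retry idea might look roughly like the sketch below. The names fetch_with_retries, max_tries, delay and opener are hypothetical and not from the posts above; the opener parameter exists only so the retry logic can be tried without a network connection, and the Tor proxy setup is left out entirely.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, headers=None, max_tries=10, delay=2.0,
                       opener=urllib.request.urlopen):
    """Return the response body, or None if every attempt raised HTTPError.

    Hypothetical helper: `opener` defaults to urllib.request.urlopen and can
    be swapped for a stand-in when exercising the retry logic offline.
    """
    req = urllib.request.Request(url, None, headers or {})
    for attempt in range(1, max_tries + 1):
        try:
            body = opener(req).read()
        except urllib.error.HTTPError as e:
            print('Error code: %d (attempt %d of %d)'
                  % (e.code, attempt, max_tries))
            if attempt < max_tries:
                time.sleep(delay)  # wait before the next attempt
        else:
            return body  # success: stop retrying
    # Reached only when every attempt failed.
    print('Fetch of %s failed after %d tries.' % (url, max_tries))
    return None
```

A caller would then check for None before processing, for example `body = fetch_with_retries(url, headers)` followed by `if body is not None: handle_success(body)`. Returning None instead of falling through keeps a failed URL from being handled with a stale or undefined response.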

Nov 7 '08 #3
