Bytes | Software Development & Data Engineering Community

Network failure when using urllib2

I have a script that uses urllib2 to repeatedly look up web pages (in a
spider sort of way). It appears to function normally, but if it runs
too long I start to get 404 responses. If I try to use the internet
through any other program (Outlook, Firefox, etc.), it also fails.
If I stop the script, the internet returns.

Has anyone observed this behavior before? I am relatively new to
Python and would appreciate any suggestions.

Shuad

Jan 8 '07 #1

I am assuming that you are fetching the full page every little while.
You are not supposed to do that. The admin of the web site you are
constantly hitting has probably configured the server to block you
temporarily when that happens. But don't feel bad :-). This is a common
beginner's mistake.

Read this for the proper way to do it:
http://diveintopython.org/http_web_services/review.html
especially section 11.3.3, Last-Modified/If-Modified-Since, on the next page.
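A minimal sketch of the Last-Modified/If-Modified-Since idea, assuming Python 3, where urllib2 was split into urllib.request and urllib.error. The function names are hypothetical, not from the thread or the linked chapter. Note that a conditional GET only saves work when the same page is requested again; it does nothing for a crawler that fetches each page exactly once.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None):
    """Build a GET that lets the server answer 304 if the page hasn't changed."""
    headers = {}
    if last_modified:
        # Echo back the Last-Modified value the server sent on the previous fetch.
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_modified(url, last_modified=None, timeout=10):
    """Return (body, last_modified); body is None if the server replied 304."""
    req = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as exc:
        if exc.code == 304:
            # Not Modified: urllib raises 304 as an HTTPError, so catch it,
            # close the wrapped response, and keep the previously saved copy.
            exc.close()
            return None, last_modified
        raise
```

A crawler using this would store the returned Last-Modified string per URL and pass it back on the next visit to that URL.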

Ravi Teja.

Jan 9 '07 #2
I am fetching different web pages (never the same one) from a web
server. Does that make a difference to whether they would block me?
Also, if it were only that site blocking me, why does the internet
stop working in other programs when this happens in the script? It is
almost as if something sees a lot of traffic from my computer and
cuts it off, thinking it is some kind of virus or worm. I am
starting to suspect my firewall. Has this happened to anyone else?

I am going to read over that documentation you suggested to see if I
can get any ideas. Thanks for the link.

Shuad

Jan 9 '07 #3
At Monday 8/1/2007 21:30, jd****@gmail.com wrote:
> I am fetching different web pages (never the same one) from a web
> server. Does that make a difference to whether they would block me?
> Also, if it were only that site blocking me, why does the internet
> stop working in other programs when this happens in the script? It is
> almost as if something sees a lot of traffic from my computer and
> cuts it off, thinking it is some kind of virus or worm. I am
> starting to suspect my firewall. Has this happened to anyone else?

Perhaps you're not closing connections once you've finished with them?
Try netstat -an from the command line and see how many open
connections you have.
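A minimal sketch of what closing every connection can look like, assuming Python 3 (where urllib2 became urllib.request and urllib.error). `fetch_all` and its `opener` parameter are hypothetical; `opener` exists only so the loop can be exercised without a network. In Python 2, wrapping the urlopen call in contextlib.closing(...), or calling .close() in a finally block, does the same job.

```python
import urllib.error
import urllib.request


def fetch_all(urls, timeout=10, opener=urllib.request.urlopen):
    """Fetch each URL, making sure every response is closed before the next one.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be tested with a fake (hypothetical design choice).
    """
    results = {}
    for url in urls:
        try:
            # Leaving the with-block closes the response and releases its
            # socket, even if read() raises.
            with opener(url, timeout=timeout) as resp:
                results[url] = resp.read()
        except urllib.error.HTTPError as exc:
            # An HTTPError also wraps an open response, so close it too.
            results[url] = exc.code
            exc.close()
    return results
```

Other URLError exceptions (DNS failures, refused connections) are left to propagate to the caller.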
--
Gabriel Genellina
Softlab SRL

Jan 9 '07 #4

jd****@gmail.com wrote:
> I am fetching different web pages (never the same one) from a web
> server. Does that make a difference to whether they would block me?
> Also, if it were only that site blocking me, why does the internet
> stop working in other programs when this happens in the script? It is
> almost as if something sees a lot of traffic from my computer and
> cuts it off, thinking it is some kind of virus or worm. I am
> starting to suspect my firewall. Has this happened to anyone else?
>
> I am going to read over that documentation you suggested to see if I
> can get any ideas. Thanks for the link.
>
> Shuad
No! What I suggested should not affect your traffic to other servers. I
would go with Gabriel's suggestion and check for open connections, just
in case. Although I can't see why that would give you a 404 response,
since a 404 is a server response (it implies a successful connection).
I would expect a client-side error in that case.
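The 404-versus-client-error distinction maps onto urllib's exception classes: HTTPError (a subclass of URLError) means the server was reached and answered with an error status, while a plain URLError means no HTTP response came back at all. A small sketch for telling them apart, assuming Python 3; `describe_failure` is a hypothetical helper name.

```python
import urllib.error


def describe_failure(exc):
    """Say whether a urlopen() failure came from the server or never reached it."""
    # HTTPError subclasses URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        # A status line came back: the connection worked, the server refused the page.
        return f"server answered {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: DNS failure, refused connection, connect timeout, ...
        return f"no response: {exc.reason}"
    return f"unexpected: {exc!r}"
```

Logging `describe_failure(exc)` inside the crawl loop's except clause would show whether the failures after a long run are real 404 replies or connections that never opened.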

Of course, you can always check whether the cause is local (briefly
turn off your security software, or try from a different machine).
The exception would be if your ISP runs safeguards against DoS attacks
originating from its network, tuned with normal users in mind.
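If the cutoff does come from flood protection somewhere along the way, spacing requests out is the obvious thing to try. A minimal sketch of a request throttle, assuming Python 3; the `Throttle` class and its `min_interval` value are hypothetical, not something anyone in the thread proposed, and `clock`/`sleep` are injectable only for testing.

```python
import time


class Throttle:
    """Enforce a minimum gap, in seconds, between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous wait() call, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a crawl loop, create one `Throttle(2.0)` (an arbitrary example interval) before the loop and call its `wait()` right before each `urlopen`.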

Ravi Teja.

Jan 9 '07 #5


