"HTTP error -1" from urllib2

I'm getting a weird error from urllib2 when opening certain
URLs. The code works for most sites, but not all of them.
Here's the traceback:

[Thread-2] InfoSitePage EXCEPTION while processing page "http://www.fourmilab.ch": Problem with page "http://www.fourmilab.ch": HTTP error -1 - ..
Traceback (most recent call last):
  File "D:\projects\sitetruth\InfoSitePage.py", line 318, in httpfetch
    fd = url_opener.open(self.requestedurl) # open file by url
  File "D:\projects\sitetruth\miscutils.py", line 149, in open
    result = urllib.FancyURLopener.open(self, url, *args)
  File "D:\python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "D:\python24\lib\urllib.py", line 322, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "D:\python24\lib\urllib.py", line 339, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "D:\projects\sitetruth\miscutils.py", line 144, in http_error_default
    raise InfoException.InfoException(self.url, 'HTTP error %s - %s.' % (errcode, errmsg))
InfoException: Problem with page "http://www.fourmilab.ch": HTTP error -1 - ..

This fails identically using Python 2.4 on a Windows desktop and on Python 2.5
on a Linux server.

The site being accessed reads fine in a browser. It's not a redirect, and it
doesn't insist on cookies.

See "http://mail.python.org/pipermail/python-list/2005-March/314301.html"
for another problem involving "HTTP error -1".

John Nagle
Apr 13 '07 #1
John Nagle <na***@animats.com> writes:
I'm getting a weird error from urllib2 when opening certain
URLs. The code works for most sites, but not all of them.
Here's the traceback:
[...]
InfoException: Problem with page "http://www.fourmilab.ch": HTTP error -1 - ..

This fails identically using Python 2.4 on a Windows desktop and on Python 2.5
on a Linux server.

The site being accessed reads fine in a browser. It's not a redirect,
and it doesn't insist on cookies.

See "http://mail.python.org/pipermail/python-list/2005-March/314301.html"
for another problem involving "HTTP error -1".
Can you create an example (preferably small) that fails? Feel free to
email it to me if it includes something you don't want to post.

Simply fetching the URL you mention with urllib2.urlopen() works for
me, so I guess something extra is needed to reproduce the bug:

import urllib2
r = urllib2.urlopen("http://www.fourmilab.ch")
print r.read()
John
Apr 14 '07 #2
The crash is a known bug, and is fixed in the Subversion repository,
but not in any released version. The problem is that if the server
returns a blank line, instead of "HTTP 1", httplib goes off into
some old HTTP 0.9 code that's broken.
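
To illustrate how a blank first line can surface as "HTTP error -1", here is a minimal Python 3 sketch of a status-line parser. It is not the httplib code. The function name parse_status_line and the use of -1 as a sentinel for an unparseable line are assumptions for illustration, chosen only to match the error code in the traceback.

```python
def parse_status_line(line):
    """Parse the first line of an HTTP response.

    Returns (version, status, reason). A status of -1 is a hypothetical
    sentinel meaning "this was not a valid HTTP/1.x status line".
    """
    text = line.strip()
    parts = text.split(None, 2)
    # A valid status line looks like: HTTP/1.1 200 OK
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return ("unknown", -1, text)
    try:
        status = int(parts[1])
    except ValueError:
        return (parts[0], -1, text)
    reason = parts[2] if len(parts) == 3 else ""
    return (parts[0], status, reason)


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("\r\n"))  # blank line where a status line was expected
```

A caller that formats the result as 'HTTP error %s - %s.' would then report -1 with an empty message for the blank line.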

John Nagle
Apr 14 '07 #3
John J. Lee wrote:
John Nagle <na***@animats.com> writes:
Can you create an example (preferably small) that fails? Feel free to
email it to me if it includes something you don't want to post.
It's not a Python problem, as it turns out. It's a problem in,
surprisingly, Coyote Point load balancers.

This fails:
====
telnet www.coyotepoint.com 80
GET / HTTP/1.0
Host: www.fourmilab.ch
User-agent: am

====

This works:
====
telnet www.coyotepoint.com 80
GET / HTTP/1.0
Host: www.fourmilab.ch
User-agent: an

====

Note the difference in the "User-agent" field; "m" vs. "n".

There's some problem in Coyote Point Equalizer load balancers
in USER-AGENT parsing. If it sees a USER-AGENT string ending in
"m" but with no earlier "m" in the string, and the USER-AGENT field
is the last field in the HTTP header, it drops the packet. One can make
this happen talking to the HTTP server with a Telnet client.
If you paste the sections between "===" lines above into a Windows
command line window, you can demonstrate this too. (Remember to
copy the blank line that ends the header.)
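
For anyone who would rather script the Telnet test above, here is a rough Python 3 sketch. The request text mirrors the two sections between the "====" lines. The function names build_request and probe, the 10 second timeout, and the 4096 byte read size are assumptions for illustration, and whether a given server still shows this behavior is not something the sketch can promise.

```python
import socket


def build_request(host_header, user_agent):
    """Build the same HTTP/1.0 request that was typed into Telnet.

    User-agent is deliberately the last header, followed by the blank
    line that ends the header block.
    """
    lines = [
        "GET / HTTP/1.0",
        "Host: " + host_header,
        "User-agent: " + user_agent,
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(server, host_header, user_agent, timeout=10.0):
    """Send one request to port 80 of `server`.

    Returns the first line of the response, or None if nothing came
    back before the timeout or the connection closed without data.
    """
    with socket.create_connection((server, 80), timeout=timeout) as sock:
        sock.sendall(build_request(host_header, user_agent))
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return None
    if not data:
        return None
    return data.split(b"\r\n", 1)[0].decode("latin-1")


# Example use (needs network access, so it is left commented out):
# for agent in ("am", "an"):
#     print(agent, probe("www.coyotepoint.com", "www.fourmilab.ch", agent))
```

Comparing the two results side by side gives the same "m" vs. "n" check as the manual Telnet session.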

We found this because we were using a user agent string of
"SiteTruth.com rating system", which ends in "m" but doesn't
contain any other "m" characters. A site run by people we know
wouldn't respond, and we've been working to figure out why. They
own a Coyote Point Equalizer, and after much digging through
log files, it became clear that the load balancer was dropping
these packets, even though it wasn't configured to do so.
So we tried Coyote Point's own site, and it has exactly the
same problem. It's thus probably a generic problem with
Coyote Point load balancers. It's not a configuration
problem; we've checked the load balancer's configuration file.
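
Since the trigger here was the client's user agent string, it may help to show where that string is set. This is a small sketch using Python 3's urllib.request (the successor to urllib2). The helper name make_request is hypothetical, and the sketch only builds the request object without sending it.

```python
import urllib.request


def make_request(url, user_agent):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("http://www.fourmilab.ch", "SiteTruth.com rating system")
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# To actually fetch the page (needs network access):
# with urllib.request.urlopen(req, timeout=10) as response:
#     print(response.status)
```

Swapping the user agent string passed to make_request is then enough to compare how a server treats different values.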

That load balancer uses regular expressions to parse HTTP
headers. My guess is that we're going to find a "\m" somewhere
that a "\n" was intended.

I'll be on the phone to Coyote Point on Monday.

John Nagle
Apr 14 '07 #4