I've got a problem using urllib2 to get a web page.
I'm going through a proxy that requires user/password authentication,
and I'm trying to get a page that itself asks for HTTP authentication.
I'm using Python 2.3.
Here is an example of the code I use:
import urllib2

# Proxy handler
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://proxyuser:proxypassword@myproxy:8050"})

# Site auth handler
site_auth_handler = urllib2.HTTPBasicAuthHandler()
site_auth_handler.add_password("This Realm", "www.mysite.com",
                               "siteuser", "sitepassword")

opener = urllib2.build_opener(site_auth_handler,
                              urllib2.HTTPRedirectHandler,
                              urllib2.HTTPHandler, proxy_handler)
urllib2.install_opener(opener)

req = urllib2.Request('http://www.mysite.com/protectedpage')
page = urllib2.urlopen(req)
I got a 401 error.
Analyzing the request using 'strace' I can see the following request
sent to the proxy:
GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\nAuthorization: Basic
bWlyYXM6bWlyYXMwMDE=\n\r\n\r\n
As you can see, an additional \n is sent just after the values of
the Proxy-authorization and Authorization fields. I think that in
this case the web server only reads this part:
GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\n
and so it sends back a 401 error, since I'm not authenticated for the
site.
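A minimal sketch of that reading of the trace, assuming the receiving side accepts a bare \n as a line ending (an assumption; the trace does not confirm how the proxy or server actually parses the request). The read_headers helper is hypothetical and only mimics a lenient header reader; it is written for a current Python 3 interpreter:

```python
import io

# Header bytes as they appear in the trace, after the request line.
raw = (b"Host: www.mysite.com\r\n"
       b"User-agent: Python-urllib/2.0a1\r\n"
       b"Proxy-authorization: Basic bWlyYXNiOm1pcjAz\n\r\n"
       b"Authorization: Basic bWlyYXM6bWlyYXMwMDE=\n\r\n"
       b"\r\n")

def read_headers(stream):
    """Hypothetical lenient reader: collect headers until a blank line."""
    headers = {}
    while True:
        line = stream.readline()  # splits on "\n", so "\r\n" alone is a blank line
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers

headers = read_headers(io.BytesIO(raw))
print(sorted(headers))
```

Under that assumption the stray \n ends the Proxy-authorization line early, the following \r\n is then taken as the blank line that closes the header section, and the Authorization header is never read.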
I had a look at urllib2.py. I think base64.encodestring adds
a \n at the end of the string. That is what happens in the
'proxy_open' method:
    def proxy_open(self, req, proxy, type):
        orig_type = req.get_type()
        type, r_type = splittype(proxy)
        host, XXX = splithost(r_type)
        if '@' in host:
            user_pass, host = host.split('@', 1)
            if ':' in user_pass:
                user, password = user_pass.split(':', 1)
                user_pass = base64.encodestring('%s:%s' %
                                                (unquote(user),
                                                 unquote(password)))
                req.add_header('Proxy-authorization', 'Basic ' +
                               user_pass)
        host = unquote(host)
        req.set_proxy(host, type)
        ...
I think it should be:
user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                           unquote(password))).strip()
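The trailing newline is easy to reproduce in isolation. A minimal sketch, written for a current Python 3 interpreter rather than 2.3: base64.encodebytes is the Python 3 name for encodestring, and the credentials below are placeholders, not the real ones.

```python
import base64

# Placeholder credentials, not the ones from the trace.
credentials = b"proxyuser:proxypassword"

# encodebytes() behaves like the old encodestring(): the encoded
# output ends with a newline.
with_newline = base64.encodebytes(credentials)

# Stripping surrounding whitespace leaves a value that is safe in a header.
stripped = with_newline.strip()

# b64encode() does not append a newline in the first place.
no_newline = base64.b64encode(credentials)

header_value = "Basic " + stripped.decode("ascii")
print(repr(with_newline))
print(repr(header_value))
```

Whatever removes the newline has to leave a string behind, because the result is concatenated onto 'Basic ' when the header is built.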
Do you have any other ideas?
Thank you!
Bastien