problem using urllib2: \n

I've got a problem using urllib2 to get a web page.
I'm going through a proxy that requires user/password authentication,
and I'm trying to get a page that asks for HTTP authentication.
I'm using Python 2.3.

Here is an example of the code I use:

import urllib2

# Proxy handler
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://proxyuser:proxypassword@myproxy:8050"})

# Site auth handler
site_auth_handler = urllib2.HTTPBasicAuthHandler()
site_auth_handler.add_password("This Realm", "www.mysite.com",
                               "siteuser", "sitepassword")

opener = urllib2.build_opener(site_auth_handler,
                              urllib2.HTTPRedirectHandler,
                              urllib2.HTTPHandler, proxy_handler)
urllib2.install_opener(opener)
req = urllib2.Request('http://www.mysite.com/protectedpage')
page = urllib2.urlopen(req)

I got a 401 error.

Analyzing the request using 'strace' I can see the following request
sent to the proxy:

GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\nAuthorization: Basic
bWlyYXM6bWlyYXMwMDE=\n\r\n\r\n

As you can see, there is an additional \n sent to the server just after
the Proxy-authorization and the Authorization fields. I think that in
this case the web server gets only this part:
GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\n

and so sends me back a 401 error, since I'm not authenticated for the
site.

I had a look at urllib2.py. I think that base64.encodestring adds
a \n at the end of the string. That is the case in the method
'proxy_open':

def proxy_open(self, req, proxy, type):
    orig_type = req.get_type()
    type, r_type = splittype(proxy)
    host, XXX = splithost(r_type)
    if '@' in host:
        user_pass, host = host.split('@', 1)
        if ':' in user_pass:
            user, password = user_pass.split(':', 1)
            user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                                       unquote(password)))
            req.add_header('Proxy-authorization', 'Basic ' + user_pass)
    host = unquote(host)
    req.set_proxy(host, type)
    ...

I think it should be:

user_pass = base64.encodestring('%s:%s' % (unquote(user),
unquote(password))).split()

Do you have any other clues?
Thank you!

Bastien
Jul 18 '05 #1
bm****@yahoo.com writes:
> I've got a problem using urllib2 to get a web page.
> I'm going through a proxy that requires user/password authentication,
> and I'm trying to get a page that asks for HTTP authentication.
> I'm using Python 2.3.
>
> Here is an example of the code I use:
>
> import urllib2
>
> # Proxy handler
> proxy_handler = urllib2.ProxyHandler(
>     {"http": "http://proxyuser:proxypassword@myproxy:8050"})
>
> # Site auth handler
> site_auth_handler = urllib2.HTTPBasicAuthHandler()
> site_auth_handler.add_password("This Realm", "www.mysite.com",
>                                "siteuser", "sitepassword")
>
> opener = urllib2.build_opener(site_auth_handler,
>                               urllib2.HTTPRedirectHandler,
>                               urllib2.HTTPHandler, proxy_handler)
> urllib2.install_opener(opener)

Looks OK (but I don't use a proxy, nor basic auth very often...).

Just as a BTW: you don't need to pass HTTPHandler or
HTTPRedirectHandler in there: build_opener adds them whether you ask
for them or not.
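
A quick way to check this is to build an opener without those two handlers and list what it ends up containing. This is only a sketch, and it assumes Python 3, where urllib2 became urllib.request (the thread itself uses Python 2.3). The proxy URL is the placeholder from the original post, and nothing is contacted because the opener is only constructed:

```python
import urllib.request

# Placeholder proxy host from the original post; no request is sent here.
proxy_handler = urllib.request.ProxyHandler({"http": "http://myproxy:8050"})
site_auth_handler = urllib.request.HTTPBasicAuthHandler()

# Only the two non-default handlers are passed in.
opener = urllib.request.build_opener(site_auth_handler, proxy_handler)

# List the handler classes the opener actually holds.
handler_names = sorted(type(h).__name__ for h in opener.handlers)
for name in handler_names:
    print(name)
```

HTTPHandler and HTTPRedirectHandler should appear in the printed list even though they were not passed to build_opener.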

> req = urllib2.Request('http://www.mysite.com/protectedpage')
> page = urllib2.urlopen(req)
>
> I got a 401 error.

So presumably your proxy is happy, but the site is not. Could you
test that theory by urlopen()ing a URL that *doesn't* require any
authentication? Just:

# ...your code up to install_opener goes here...
print urllib2.urlopen("http://www.python.org/").read()

> Analyzing the request using 'strace' I can see the following request
> sent to the proxy:
>
> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
> www.mysite.com\r\nUser-agent:
> Python-urllib/2.0a1\r\nProxy-authorization: Basic
> XXX\n\r\nAuthorization: Basic
> YYY\n\r\n\r\n

(You probably didn't want to post your usernames and passwords to a
public newsgroup. They're reversibly encoded, so anyone can decode
them. I've replaced them with XXX and YYY in the quote above.)
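
To illustrate why: the token after "Basic " is plain base64 of "user:password", with no key involved. A minimal Python 3 sketch, using the made-up placeholder credentials from the original post rather than any real ones:

```python
import base64

# Made-up credentials, matching the placeholders in the original post.
credentials = "siteuser:sitepassword"

# This is the value that follows "Basic " in an Authorization header.
token = base64.b64encode(credentials.encode("ascii")).decode("ascii")

# Anyone who sees the header can reverse it without any secret.
recovered = base64.b64decode(token).decode("ascii")

print(token)
print(recovered)
```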

> As you can see, there is an additional \n sent to the server just after
> the Proxy-authorization and the Authorization fields. I think that in
> this case the web server gets only this part:
>
> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
> www.mysite.com\r\nUser-agent:
> Python-urllib/2.0a1\r\nProxy-authorization: Basic
> XXX\n\r\n
>
> and so sends me back a 401 error, since I'm not authenticated for the
> site.

Hmm. That \n does seem likely to be wrong, but I'm not certain.
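
One way to see why the stray \n could matter: a parser that accepts a bare \n as a line ending reads "\n\r\n" as a header line followed by an empty line, and an empty line ends the header block. The function below is a hypothetical, deliberately lenient parser written only for illustration (it is not the proxy's or the server's actual code), in Python 3, applied to the redacted request from this thread:

```python
def lenient_header_names(raw_request):
    # Hypothetical parser: treats both CRLF and a bare LF as line endings
    # and returns the header names seen before the first empty line.
    names = []
    lines = raw_request.split("\n")
    for line in lines[1:]:          # skip the request line
        line = line.rstrip("\r")
        if line == "":              # empty line: end of headers
            break
        names.append(line.split(":", 1)[0])
    return names


bad = (
    "GET http://www.mysite.com/protectedpage HTTP/1.0\r\n"
    "Host: www.mysite.com\r\n"
    "User-agent: Python-urllib/2.0a1\r\n"
    "Proxy-authorization: Basic XXX\n\r\n"   # stray LF after the value
    "Authorization: Basic YYY\r\n"
    "\r\n"
)
good = bad.replace("XXX\n\r\n", "XXX\r\n")

print(lenient_header_names(bad))
print(lenient_header_names(good))
```

With the stray \n, this parser stops after Proxy-authorization and never sees Authorization. Whether a real proxy or server behaves this way depends on its implementation.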

The urllib2 code appears to duplicate the code for base64 encoding for
proxy basic authorization (in ProxyBasicAuthHandler and ProxyHandler),
and the code differs between the two classes :-(. [It looks like PBAH
responds to 407, and ProxyHandler always sends Proxy-Authorization if
it's in the proxy's URL.] And in fact, only one of them does a
.strip() on the base64 encoded string (they also differ in quoting).
However, the Authorization: header appears to be generated only in one
place (AbstractBasicAuthHandler.retry_http_basic_auth), which *does*
strip, but you've got a \n there, too. So, I don't understand where
that \n is coming from. I'd try sticking some print statements in
there to find out what's going on.

> I had a look at urllib2.py. I think that base64.encodestring adds
> a \n at the end of the string. That is the case in the method
> 'proxy_open':
>
> def proxy_open(self, req, proxy, type):
>     orig_type = req.get_type()
>     type, r_type = splittype(proxy)
>     host, XXX = splithost(r_type)
>     if '@' in host:
>         user_pass, host = host.split('@', 1)
>         if ':' in user_pass:
>             user, password = user_pass.split(':', 1)
>             user_pass = base64.encodestring('%s:%s' % (unquote(user),
>                                                        unquote(password)))
>             req.add_header('Proxy-authorization', 'Basic ' + user_pass)
>     host = unquote(host)
>     req.set_proxy(host, type)
>     ...
>
> I think it should be:
>
> user_pass = base64.encodestring('%s:%s' % (unquote(user),
>                                            unquote(password))).split()

You mean strip, not split?
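
The difference matters because split() returns a list while strip() returns a string, and only a string can be concatenated onto 'Basic '. A small Python 3 sketch, using base64.encodebytes (the Python 3 counterpart of encodestring, which was removed in Python 3.9) and the placeholder credentials from the original post:

```python
import base64

encoded = base64.encodebytes(b"proxyuser:proxypassword").decode("ascii")

print(repr(encoded))          # ends with a newline
print(repr(encoded.split()))  # a list of strings, not a header value
print(repr(encoded.strip()))  # the bare token

# strip() gives a string that can be appended to the scheme name.
header_value = "Basic " + encoded.strip()
print(repr(header_value))
```

Note that encodebytes also breaks long output into lines of at most 76 characters, so for long credentials strip() would not remove the interior newlines; base64.b64encode avoids newlines altogether.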

Try debugging a bit, find out what's really going on. Just copy
urllib2.py to your current directory (so it'll override the installed
standard library's copy), and stick some print statements in there.

> Do you have any other clues?

[...]

You could try sniffing what Mozilla sends, too.

If you get this working, please look at the doc patch here

http://www.python.org/sf/798244
test it, and post a comment to say whether or not it's correct (and
which examples you tried -- preferably all of them ;).
John
Jul 18 '05 #2
[bm****@yahoo.com wrote]
> Here is an example of the code I use:
>
> import urllib2
>
> # Proxy handler
> proxy_handler = urllib2.ProxyHandler(
>     {"http": "http://proxyuser:proxypassword@myproxy:8050"})


Might you need to change that URL? It looks like this URL indicates
that the proxy is running on port 8050 on host "myproxy".

Unless the host on which the proxy is running is named "myproxy", try
changing the proxy URL to one of the following values

http://proxyuser:proxypassword@localhost:8050
http://proxyuser:proxypassword@127.0.0.1:8050

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #3
jj*@pobox.com (John J. Lee) wrote in message news:<87************@pobox.com>...
> bm****@yahoo.com writes:
>> I've got a problem using urllib2 to get a web page.
>> I'm going through a proxy that requires user/password authentication,
>> and I'm trying to get a page that asks for HTTP authentication.
>> I'm using Python 2.3.
>>
>> Here is an example of the code I use:
>>
>> import urllib2
>>
>> # Proxy handler
>> proxy_handler = urllib2.ProxyHandler(
>>     {"http": "http://proxyuser:proxypassword@myproxy:8050"})
>>
>> # Site auth handler
>> site_auth_handler = urllib2.HTTPBasicAuthHandler()
>> site_auth_handler.add_password("This Realm", "www.mysite.com",
>>                                "siteuser", "sitepassword")
>>
>> opener = urllib2.build_opener(site_auth_handler,
>>                               urllib2.HTTPRedirectHandler,
>>                               urllib2.HTTPHandler, proxy_handler)
>> urllib2.install_opener(opener)
>
> Looks OK (but I don't use a proxy, nor basic auth very often...).
>
> Just as a BTW: you don't need to pass HTTPHandler or
> HTTPRedirectHandler in there: build_opener adds them whether you ask
> for them or not.
>
>> req = urllib2.Request('http://www.mysite.com/protectedpage')
>> page = urllib2.urlopen(req)
>>
>> I got a 401 error.
>
> So presumably your proxy is happy, but the site is not. Could you
> test that theory by urlopen()ing a URL that *doesn't* require any
> authentication? Just:
>
> # ...your code up to install_opener goes here...
> print urllib2.urlopen("http://www.python.org/").read()

It works fine with a URL that doesn't require authentication.

>> Analyzing the request using 'strace' I can see the following request
>> sent to the proxy:
>>
>> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
>> www.mysite.com\r\nUser-agent:
>> Python-urllib/2.0a1\r\nProxy-authorization: Basic
>> XXX\n\r\nAuthorization: Basic
>> YYY\n\r\n\r\n
>
> (You probably didn't want to post your usernames and passwords to a
> public newsgroup. They're reversibly encoded, so anyone can decode
> them. I've replaced them with XXX and YYY in the quote above.)
>
>> As you can see, there is an additional \n sent to the server just after
>> the Proxy-authorization and the Authorization fields. I think that in
>> this case the web server gets only this part:
>>
>> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
>> www.mysite.com\r\nUser-agent:
>> Python-urllib/2.0a1\r\nProxy-authorization: Basic
>> XXX\n\r\n
>>
>> and so sends me back a 401 error, since I'm not authenticated for the
>> site.
>
> Hmm. That \n does seem likely to be wrong, but I'm not certain.
>
> The urllib2 code appears to duplicate the code for base64 encoding for
> proxy basic authorization (in ProxyBasicAuthHandler and ProxyHandler),
> and the code differs between the two classes :-(. [It looks like PBAH
> responds to 407, and ProxyHandler always sends Proxy-Authorization if
> it's in the proxy's URL.] And in fact, only one of them does a
> .strip() on the base64 encoded string (they also differ in quoting).
> However, the Authorization: header appears to be generated only in one
> place (AbstractBasicAuthHandler.retry_http_basic_auth), which *does*
> strip, but you've got a \n there, too. So, I don't understand where
> that \n is coming from. I'd try sticking some print statements in
> there to find out what's going on.



I made a copy/paste mistake:
there is no additional \n after the Authorization field,
but there is an additional \n after Proxy-Authorization.

I've used HTTPBasicAuthHandler since you said the code is different,
and it worked fine!
I think the conclusion is that the strip call is missing from the
ProxyHandler code. Is it necessary to report it as a bug?
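
For readers who want to try the same kind of workaround: if the proxy credentials are handled by ProxyBasicAuthHandler (the class John mentioned) instead of being embedded in the proxy URL, the setup might look roughly like the sketch below. It assumes Python 3 (urllib.request rather than urllib2), and every host name, realm and credential is a placeholder taken from this thread. The opener is only constructed, never used, so nothing is sent:

```python
import urllib.request

# Proxy URL without embedded credentials (placeholder host from this thread).
proxy_handler = urllib.request.ProxyHandler({"http": "http://myproxy:8050"})

# Proxy credentials, offered when the proxy answers 407.
proxy_passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
proxy_passwords.add_password(None, "myproxy:8050", "proxyuser", "proxypassword")
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(proxy_passwords)

# Site credentials, offered when the site answers 401.
site_auth_handler = urllib.request.HTTPBasicAuthHandler()
site_auth_handler.add_password("This Realm", "www.mysite.com",
                               "siteuser", "sitepassword")

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler,
                                     site_auth_handler)

# opener.open("http://www.mysite.com/protectedpage") would send the request;
# it is left commented out because the hosts above are placeholders.
names = [type(h).__name__ for h in opener.handlers]
print(names)
```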

>> I had a look at urllib2.py. I think that base64.encodestring adds
>> a \n at the end of the string. That is the case in the method
>> 'proxy_open':
>>
>> def proxy_open(self, req, proxy, type):
>>     orig_type = req.get_type()
>>     type, r_type = splittype(proxy)
>>     host, XXX = splithost(r_type)
>>     if '@' in host:
>>         user_pass, host = host.split('@', 1)
>>         if ':' in user_pass:
>>             user, password = user_pass.split(':', 1)
>>             user_pass = base64.encodestring('%s:%s' % (unquote(user),
>>                                                        unquote(password)))
>>             req.add_header('Proxy-authorization', 'Basic ' + user_pass)
>>     host = unquote(host)
>>     req.set_proxy(host, type)
>>     ...
>>
>> I think it should be:
>>
>> user_pass = base64.encodestring('%s:%s' % (unquote(user),
>>                                            unquote(password))).split()
>
> You mean strip, not split?

Yes, strip, sorry.

> Try debugging a bit, find out what's really going on. Just copy
> urllib2.py to your current directory (so it'll override the installed
> standard library's copy), and stick some print statements in there.
>
>> Do you have any other clues?
>
> [...]
>
> You could try sniffing what Mozilla sends, too.

I've done better: telnet myproxy 8050

GET http://www.mysite.com/protectedpage HTTP/1.0
Host: www.mysite.com
User-agent: Python-urllib/2.0a1
Proxy-authorization: Basic XXX
Authorization: Basic YYY

And it worked fine.

> If you get this working, please look at the doc patch here
>
> http://www.python.org/sf/798244
> test it, and post a comment to say whether or not it's correct (and
> which examples you tried -- preferably all of them ;).
> John

Jul 18 '05 #4
bm****@yahoo.com writes:
> jj*@pobox.com (John J. Lee) wrote in message news:<87************@pobox.com>...
> [...]
> I made a copy/paste mistake:
> there is no additional \n after the Authorization field,
> but there is an additional \n after Proxy-Authorization.
>
> I've used HTTPBasicAuthHandler since you said the code is different,
> and it worked fine!
> I think the conclusion is that the strip call is missing from the
> ProxyHandler code. Is it necessary to report it as a bug?


Yes. Please report it to sourceforge, remembering to check that
nobody else already has. The correct version of the duplicated code
should be factored out into a function.
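
As a rough idea of what such a shared function could look like, here is a minimal sketch. The name basic_auth_value is hypothetical (it is not part of urllib2 or urllib.request), it is written for Python 3, and it assumes the caller has already unquoted any values taken from a URL:

```python
import base64


def basic_auth_value(user, password):
    # Hypothetical helper: builds the value for an Authorization or
    # Proxy-Authorization header from already-unquoted credentials.
    raw = "%s:%s" % (user, password)
    # b64encode, unlike encodestring/encodebytes, adds no newlines,
    # so no strip() call is needed afterwards.
    token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return "Basic " + token


value = basic_auth_value("proxyuser", "proxypassword")
print(repr(value))
```

Both the proxy code path and the site code path could then call the same function, so they could no longer disagree about stripping.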

To help future users, it would be really useful if you could do this
too:

> [...]
>> If you get this working, please look at the doc patch here
>>
>> http://www.python.org/sf/798244
>> test it, and post a comment to say whether or not it's correct (and
>> which examples you tried -- preferably all of them ;).


Won't take you long, since you already have your code working.
John
Jul 18 '05 #5
