problem using urllib2: \n

I've got a problem using urllib2 to get a web page.
I'm going through a proxy that uses user/password authentication,
and I'm trying to get a page that asks for HTTP authentication.
I'm using Python 2.3.

Here is an example of the code I use:

import urllib2

# Proxy handler
proxy_handler = urllib2.ProxyHandler({"http":
    "http://proxyuser:proxypassword@myproxy:8050"})

# Site auth handler
site_auth_handler = urllib2.HTTPBasicAuthHandler()
site_auth_handler.add_password("This Realm", "www.mysite.com",
                               "siteuser", "sitepassword")
opener = urllib2.build_opener(site_auth_handler,
    urllib2.HTTPRedirectHandler, urllib2.HTTPHandler, proxy_handler)
urllib2.install_opener(opener)
req = urllib2.Request('http://www.mysite.com/protectedpage')
page = urllib2.urlopen(req)

I got a 401 error.

Analyzing the request using 'strace' I can see the following request
sent to the proxy:

GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\nAuthorization: Basic
bWlyYXM6bWlyYXMwMDE=\n\r\n\r\n

As you can see, an additional \n is sent to the server just after
the Proxy-authorization and the Authorization fields. I think that in
this case the web server gets only this part:

GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\n

and so sends me back a 401 error, since I'm not authenticated for the
site.

I had a look at urllib2.py. I think that base64.encodestring adds
a \n at the end of the string. That's the case in the method
'proxy_open':

def proxy_open(self, req, proxy, type):
    orig_type = req.get_type()
    type, r_type = splittype(proxy)
    host, XXX = splithost(r_type)
    if '@' in host:
        user_pass, host = host.split('@', 1)
        if ':' in user_pass:
            user, password = user_pass.split(':', 1)
            user_pass = base64.encodestring('%s:%s' %
                                            (unquote(user),
                                             unquote(password)))
            req.add_header('Proxy-authorization', 'Basic ' +
                           user_pass)
    host = unquote(host)
    req.set_proxy(host, type)
    ...

I think it should be:

user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                           unquote(password))).split()
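
The trailing newline is easy to reproduce in isolation. The following is a minimal sketch, assuming a current Python 3 interpreter, where base64.encodebytes is the counterpart of the Python 2 base64.encodestring used in proxy_open. The credentials are placeholders, and build_header_line is a hypothetical helper, not part of urllib2.

```python
import base64

# Placeholder credentials, not the ones from the trace above.
raw = b"proxyuser:proxypassword"

# encodebytes() is the Python 3 counterpart of Python 2's encodestring();
# it terminates its output with a newline.
encoded = base64.encodebytes(raw)


def build_header_line(value):
    """Hypothetical helper: format one header line as it goes on the wire."""
    return b"Proxy-authorization: Basic " + value + b"\r\n"


header_line = build_header_line(encoded)

print(repr(encoded))      # the value ends with b"\n"
print(repr(header_line))  # so the header line ends with b"\n\r\n"
```

The \n\r\n at the end of the header line is the same sequence that strace showed after the Proxy-authorization field.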

have you any other clue?
thank you!

Bastien
Jul 18 '05 #1
bm****@yahoo.com writes:
> I've got a problem using urllib2 to get a web page.
> I'm going through a proxy that uses user/password authentication,
> and I'm trying to get a page that asks for HTTP authentication.
> I'm using Python 2.3.
>
> Here is an example of the code I use:
>
> import urllib2
> # Proxy handler
> proxy_handler = urllib2.ProxyHandler({"http":
>     "http://proxyuser:proxypassword@myproxy:8050"})
>
> # Site auth handler
> site_auth_handler = urllib2.HTTPBasicAuthHandler()
> site_auth_handler.add_password("This Realm", "www.mysite.com",
>                                "siteuser", "sitepassword")
> opener = urllib2.build_opener(site_auth_handler,
>     urllib2.HTTPRedirectHandler, urllib2.HTTPHandler, proxy_handler)
> urllib2.install_opener(opener)

Looks OK (but I don't use a proxy, nor basic auth very often...).

Just as a BTW: you don't need to pass HTTPHandler or
HTTPRedirectHandler in there: build_opener adds them whether you ask
for them or not.
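
That default behaviour can be checked without sending any request. A minimal sketch, assuming Python 3, where urllib2 lives on as urllib.request; the handler list is read from the opener's handlers attribute:

```python
import urllib.request

# Pass only the site auth handler; no HTTPHandler or HTTPRedirectHandler.
site_auth_handler = urllib.request.HTTPBasicAuthHandler()
opener = urllib.request.build_opener(site_auth_handler)

# Collect the class names of every handler the opener ended up with.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

Both HTTPHandler and HTTPRedirectHandler should show up in the printed list even though neither was passed in.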

> req = urllib2.Request('http://www.mysite.com/protectedpage')
> page = urllib2.urlopen(req)
>
> I got a 401 error.

So presumably your proxy is happy, but the site is not. Could you
test that theory by urlopen()ing a URL that *doesn't* require any
authentication? Just:

# ...your code up to install_opener goes here...
print urllib2.urlopen("http://www.python.org/").read()

> Analyzing the request using 'strace' I can see the following request
> sent to the proxy:
>
> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
> www.mysite.com\r\nUser-agent:
> Python-urllib/2.0a1\r\nProxy-authorization: Basic
> XXX\n\r\nAuthorization: Basic
> YYY\n\r\n\r\n

(You probably didn't want to post your usernames and passwords to a
public newsgroup. They're reversibly encoded, so anyone can decode
them. I've replaced them with XXX and YYY in the quote above.)
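
To see why Basic credentials should be treated as plain text, here is a small sketch in Python 3. The user name and password are placeholders, and encode_basic and decode_basic are hypothetical helper names.

```python
import base64


def encode_basic(user, password):
    """Build the token that follows 'Basic ' in an Authorization header."""
    raw = "%s:%s" % (user, password)
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_basic(token):
    """Recover the user name and password from a Basic token."""
    raw = base64.b64decode(token).decode("utf-8")
    user, _, password = raw.partition(":")
    return user, password


token = encode_basic("siteuser", "sitepassword")
print(token)
print(decode_basic(token))
```

No key is involved at any point, so anyone who sees the header can run the second function.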

> As you can see, an additional \n is sent to the server just after
> the Proxy-authorization and the Authorization fields. I think that in
> this case the web server gets only this part:
>
> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
> www.mysite.com\r\nUser-agent:
> Python-urllib/2.0a1\r\nProxy-authorization: Basic
> XXX\n\r\n
>
> and so sends me back a 401 error, since I'm not authenticated for the
> site.

Hmm. That \n does seem likely to be wrong, but I'm not certain.

The urllib2 code appears to duplicate the code for base64 encoding for
proxy basic authorization (in ProxyBasicAuthHandler and ProxyHandler),
and the code differs between the two classes :-(. [It looks like PBAH
responds to 407, and ProxyHandler always sends Proxy-Authorization if
it's in the proxy's URL.] And in fact, only one of them does a
.strip() on the base64 encoded string (they also differ in quoting).
However, the Authorization: header appears to be generated only in one
place (AbstractBasicAuthHandler.retry_http_basic_auth), which *does*
strip, but you've got a \n there, too. So, I don't understand where
that \n is coming from. I'd try sticking some print statements in
there to find out what's going on.

> I had a look at urllib2.py. I think that base64.encodestring adds
> a \n at the end of the string. That's the case in the method
> 'proxy_open':
>
> def proxy_open(self, req, proxy, type):
>     orig_type = req.get_type()
>     type, r_type = splittype(proxy)
>     host, XXX = splithost(r_type)
>     if '@' in host:
>         user_pass, host = host.split('@', 1)
>         if ':' in user_pass:
>             user, password = user_pass.split(':', 1)
>             user_pass = base64.encodestring('%s:%s' %
>                                             (unquote(user),
>                                              unquote(password)))
>             req.add_header('Proxy-authorization', 'Basic ' +
>                            user_pass)
>     host = unquote(host)
>     req.set_proxy(host, type)
>     ...
>
> I think it should be:
>
> user_pass = base64.encodestring('%s:%s' % (unquote(user),
>                                            unquote(password))).split()

You mean strip, not split?
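
The difference matters because the two methods return different types. A quick sketch, assuming Python 3 (with base64.encodebytes standing in for encodestring) and placeholder credentials:

```python
import base64

encoded = base64.encodebytes(b"proxyuser:proxypassword")

stripped = encoded.strip()  # bytes, with the trailing newline removed
pieces = encoded.split()    # a list of bytes objects, split on whitespace

print(type(stripped).__name__, type(pieces).__name__)

# Concatenating the header prefix works with strip() ...
header_value = b"Basic " + stripped

# ... but fails with split(), because bytes and list cannot be added.
try:
    b"Basic " + pieces
except TypeError as exc:
    print("split() result cannot be concatenated:", exc)
```

With split(), the 'Basic ' + user_pass line in proxy_open would raise an exception before any request was sent.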

Try debugging a bit, find out what's really going on. Just copy
urllib2.py to your current directory (so it'll override the installed
standard library's copy), and stick some print statements in there.

> have you any other clue?

[...]

You could try sniffing what Mozilla sends, too.

If you get this working, please look at the doc patch here

http://www.python.org/sf/798244
test it, and post a comment to say whether or not it's correct (and
which examples you tried -- preferably all of them ;).
John
Jul 18 '05 #2
[bm****@yahoo.com wrote]
> Here is an example of the code I use:
>
> import urllib2
> # Proxy handler
> proxy_handler = urllib2.ProxyHandler({"http":
>     "http://proxyuser:proxypassword@myproxy:8050"})


Might you need to change that URL? It looks like this URL indicates
that the proxy is running on port 8050 on host "myproxy".

Unless the host on which the proxy is running is named "myproxy", try
changing the proxy URL to one of the following values

http://proxyuser:proxypassword@localhost:8050
http://proxyuser:pr***********@127.0.0.1:8050
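
How such a proxy URL breaks down into credentials, host and port can be checked with the standard URL parser. A minimal sketch, assuming Python 3's urllib.parse and the placeholder URL from the original post:

```python
from urllib.parse import urlsplit

proxy_url = "http://proxyuser:proxypassword@myproxy:8050"
parts = urlsplit(proxy_url)

# The part before '@' holds the credentials, the part after it host and port.
print("user:    ", parts.username)
print("password:", parts.password)
print("host:    ", parts.hostname)
print("port:    ", parts.port)
```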

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #3
jj*@pobox.com (John J. Lee) wrote in message news:<87************@pobox.com>...
> bm****@yahoo.com writes:
>> I've got a problem using urllib2 to get a web page.
>> I'm going through a proxy that uses user/password authentication,
>> and I'm trying to get a page that asks for HTTP authentication.
>> I'm using Python 2.3.
>>
>> Here is an example of the code I use:
>>
>> import urllib2
>> # Proxy handler
>> proxy_handler = urllib2.ProxyHandler({"http":
>>     "http://proxyuser:proxypassword@myproxy:8050"})
>>
>> # Site auth handler
>> site_auth_handler = urllib2.HTTPBasicAuthHandler()
>> site_auth_handler.add_password("This Realm", "www.mysite.com",
>>                                "siteuser", "sitepassword")
>> opener = urllib2.build_opener(site_auth_handler,
>>     urllib2.HTTPRedirectHandler, urllib2.HTTPHandler, proxy_handler)
>> urllib2.install_opener(opener)
>
> Looks OK (but I don't use a proxy, nor basic auth very often...).
>
> Just as a BTW: you don't need to pass HTTPHandler or
> HTTPRedirectHandler in there: build_opener adds them whether you ask
> for them or not.

>> req = urllib2.Request('http://www.mysite.com/protectedpage')
>> page = urllib2.urlopen(req)
>>
>> I got a 401 error.
>
> So presumably your proxy is happy, but the site is not. Could you
> test that theory by urlopen()ing a URL that *doesn't* require any
> authentication? Just:
>
> # ...your code up to install_opener goes here...
> print urllib2.urlopen("http://www.python.org/").read()

It works fine with a URL that doesn't require authentication.

>> Analyzing the request using 'strace' I can see the following request
>> sent to the proxy:
>>
>> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
>> www.mysite.com\r\nUser-agent:
>> Python-urllib/2.0a1\r\nProxy-authorization: Basic
>> XXX\n\r\nAuthorization: Basic
>> YYY\n\r\n\r\n
>
> (You probably didn't want to post your usernames and passwords to a
> public newsgroup. They're reversibly encoded, so anyone can decode
> them. I've replaced them with XXX and YYY in the quote above.)
>
>> As you can see, an additional \n is sent to the server just after
>> the Proxy-authorization and the Authorization fields. I think that in
>> this case the web server gets only this part:
>>
>> GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
>> www.mysite.com\r\nUser-agent:
>> Python-urllib/2.0a1\r\nProxy-authorization: Basic
>> XXX\n\r\n
>>
>> and so sends me back a 401 error, since I'm not authenticated for the
>> site.
>
> Hmm. That \n does seem likely to be wrong, but I'm not certain.
>
> The urllib2 code appears to duplicate the code for base64 encoding for
> proxy basic authorization (in ProxyBasicAuthHandler and ProxyHandler),
> and the code differs between the two classes :-(. [It looks like PBAH
> responds to 407, and ProxyHandler always sends Proxy-Authorization if
> it's in the proxy's URL.] And in fact, only one of them does a
> .strip() on the base64 encoded string (they also differ in quoting).
> However, the Authorization: header appears to be generated only in one
> place (AbstractBasicAuthHandler.retry_http_basic_auth), which *does*
> strip, but you've got a \n there, too. So, I don't understand where
> that \n is coming from. I'd try sticking some print statements in
> there to find out what's going on.


I made a mistake in my copy/paste:
there is no additional \n after the Authorization field,
but there is an additional \n after Proxy-Authorization.

I've used HTTPBasicAuthHandler since you said the code is different,
and it worked fine!!!
I think the conclusion is that the strip call is missing in the
ProxyHandler code. Is it necessary to report it as a bug?
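
For reference, a setup that keeps the credentials out of the proxy URL might look like the sketch below. It assumes Python 3 (urllib.request in place of urllib2), and it assumes that the handler doing the proxy authentication is ProxyBasicAuthHandler, the class mentioned earlier in the thread as answering 407 responses. Host names and credentials are the placeholders from the original post.

```python
import urllib.request

# Proxy without credentials in the URL, so ProxyHandler adds no header itself.
proxy_handler = urllib.request.ProxyHandler({"http": "http://myproxy:8050"})

# Proxy credentials are supplied when the proxy answers 407.
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(
    urllib.request.HTTPPasswordMgrWithDefaultRealm()
)
proxy_auth_handler.add_password(None, "myproxy:8050", "proxyuser", "proxypassword")

# Site credentials are supplied when the site answers 401.
site_auth_handler = urllib.request.HTTPBasicAuthHandler()
site_auth_handler.add_password(
    "This Realm", "www.mysite.com", "siteuser", "sitepassword"
)

opener = urllib.request.build_opener(
    proxy_handler, proxy_auth_handler, site_auth_handler
)
# opener.open("http://www.mysite.com/protectedpage") would send the request.
```

Nothing is sent until opener.open() is called, so the sketch can be run without a proxy to check that the handlers are wired up.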

>> I had a look at urllib2.py. I think that base64.encodestring adds
>> a \n at the end of the string. That's the case in the method
>> 'proxy_open':
>>
>> def proxy_open(self, req, proxy, type):
>>     orig_type = req.get_type()
>>     type, r_type = splittype(proxy)
>>     host, XXX = splithost(r_type)
>>     if '@' in host:
>>         user_pass, host = host.split('@', 1)
>>         if ':' in user_pass:
>>             user, password = user_pass.split(':', 1)
>>             user_pass = base64.encodestring('%s:%s' %
>>                                             (unquote(user),
>>                                              unquote(password)))
>>             req.add_header('Proxy-authorization', 'Basic ' +
>>                            user_pass)
>>     host = unquote(host)
>>     req.set_proxy(host, type)
>>     ...
>>
>> I think it should be:
>>
>> user_pass = base64.encodestring('%s:%s' % (unquote(user),
>>                                            unquote(password))).split()
>
> You mean strip, not split?

Yes, strip, sorry.

> Try debugging a bit, find out what's really going on. Just copy
> urllib2.py to your current directory (so it'll override the installed
> standard library's copy), and stick some print statements in there.
>
>> have you any other clue? [...]
>
> You could try sniffing what Mozilla sends, too.

I've done better: telnet myproxy 8050

GET http://www.mysite.com/protectedpage HTTP/1.0
Host: www.mysite.com
User-agent: Python-urllib/2.0a1
Proxy-authorization: Basic XXX
Authorization: Basic YYY

And it worked fine.
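
The request typed into telnet can also be written out as bytes, which makes the line endings explicit. A minimal sketch in Python 3; build_request is a hypothetical helper, and XXX and YYY stand for the base64 tokens as in the quotes above. Nothing is sent over the network here.

```python
def build_request(url, host, proxy_token, site_token):
    """Assemble the raw HTTP/1.0 request; every line ends in CRLF only."""
    lines = [
        "GET %s HTTP/1.0" % url,
        "Host: %s" % host,
        "User-agent: Python-urllib/2.0a1",
        # strip() drops any newline left over from base64 encoding.
        "Proxy-authorization: Basic %s" % proxy_token.strip(),
        "Authorization: Basic %s" % site_token.strip(),
        "",  # empty line that terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request(
    "http://www.mysite.com/protectedpage", "www.mysite.com", "XXX\n", "YYY"
)
print(request)
```

Even though the proxy token is passed in with a trailing \n, the Proxy-authorization line is followed directly by the Authorization line, as in the hand-typed request.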

> If you get this working, please look at the doc patch here
>
> http://www.python.org/sf/798244
>
> test it, and post a comment to say whether or not it's correct (and
> which examples you tried -- preferably all of them ;).
>
> John

Jul 18 '05 #4
bm****@yahoo.com writes:
> jj*@pobox.com (John J. Lee) wrote in message news:<87************@pobox.com>...
[...]
> I made a mistake in my copy/paste:
> there is no additional \n after the Authorization field,
> but there is an additional \n after Proxy-Authorization.
>
> I've used HTTPBasicAuthHandler since you said the code is different,
> and it worked fine!!!
> I think the conclusion is that the strip call is missing in the
> ProxyHandler code. Is it necessary to report it as a bug?


Yes. Please report it to sourceforge, remembering to check that
nobody else already has. The correct version of the duplicated code
should be factored out into a function.
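
Such a shared function could look roughly like the sketch below. It is written for Python 3 and is not the code that ended up in the standard library; basic_auth_value is a hypothetical name. It uses base64.b64encode, which does not append a newline, and still strips defensively.

```python
import base64
from urllib.parse import unquote


def basic_auth_value(user, password):
    """Return the value for an Authorization or Proxy-authorization header.

    user and password may be percent-quoted, as they are when taken
    from a proxy URL such as http://user:pass@host:port.
    """
    raw = "%s:%s" % (unquote(user), unquote(password))
    token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return "Basic " + token.strip()


print(basic_auth_value("proxyuser", "proxy%20password"))
```

Both the proxy handler and the basic auth handlers could then call the same function, so the quoting and stripping cannot drift apart again.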

To help future users, it would be really useful if you could do this
too:

[...]
>> If you get this working, please look at the doc patch here
>>
>> http://www.python.org/sf/798244
>>
>> test it, and post a comment to say whether or not it's correct (and
>> which examples you tried -- preferably all of them ;).


Won't take you long, since you already have your code working.
John
Jul 18 '05 #5
