urllib2 through basic auth'ed proxy

I see from googling around that this is a popular topic, but I haven't seen
anyone saying "ah, yes, that works", so here it goes.

How does one connect through a proxy which requires basic authorisation?
The following code, stolen from somewhere, fails with a 407:

import urllib2

proxy_handler = urllib2.ProxyHandler({"http": "http://the.proxy.address:3128"})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password(
    "The name of the realm sniffed from telnetting to the proxy and doing a get",
    'the.proxy.address', 'theusername', 'thepassword')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.google.com/')

I still get a 407 if I set the realm to None, or if I change the host to the
'http://the.proxy.address/' form or even the 'http://the.proxy.address:3128'
form.

The proxy is squid. Python version is 2.3.4 (I read that this version has a
problem in that it introduces an extra return after the authorisation, but
it isn't even getting to that bit). And yes, going through firefox,
everything works fine.

Can anyone explain to me why this fails, or more importantly, code that would
work?

Thanks,
alejandro

Mar 29 '06 #1
Alejandro Dubrovsky <du*******@physics.uq.edu.au> writes:
[...]
How does one connect through a proxy which requires basic authorisation?
The following code, stolen from somewhere, fails with a 407:
[...code involving urllib2.ProxyBasicAuthHandler()...]
Can anyone explain to me why this fails, or more importantly, code that would
work?


OK, I finally installed squid and had a look at the urllib2 proxy
basic auth support (which I've steered clear of for years despite
doing quite a bit with urllib2). Seems quite broken. Appears to have
been broken back in December 2004, with revision 38092 (note there's a
little revision number oddness in the Python SVN repo, BTW:
http://mail.python.org/pipermail/pyt.../058269.html):

--- urllib2.py (revision 38091)
+++ urllib2.py (revision 38092)
@@ -720,7 +720,10 @@
             return self.retry_http_basic_auth(host, req, realm)
 
     def retry_http_basic_auth(self, host, req, realm):
-        user,pw = self.passwd.find_user_password(realm, host)
+        # TODO(jhylton): Remove the host argument? It depends on whether
+        # retry_http_basic_auth() is consider part of the public API.
+        # It probably is.
+        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
         if pw is not None:
             raw = "%s:%s" % (user, pw)
....

That can't be right, can it? With a proxy, you're always
authenticating yourself for the whole proxy, and you want to look up
the credentials for the proxy itself, not for the URL being fetched
(RFC 2617 section 3.2.1). The ProxyBasicAuthHandler subclass
dutifully passes in the right thing for the host argument, but
AbstractBasicAuthHandler ignores it, which means that it never finds
the password -- e.g. if you're trying to connect to python.org through
myproxy.com, it'll be looking for a username/password for python.org
instead of the needed myproxy.com.
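
The mismatch described above can be reproduced with the password manager alone. What follows is only a hedged sketch, not the urllib2 code itself: it assumes Python 3, where urllib2's HTTPPasswordMgr lives in urllib.request, and the realm name, host names and credentials are made up for illustration.

```python
from urllib.request import HTTPPasswordMgr

# Hypothetical realm, proxy address and credentials, for illustration only.
REALM = "Squid proxy"
PROXY = "myproxy.com:3128"

mgr = HTTPPasswordMgr()
mgr.add_password(REALM, PROXY, "john", "blah")

# Lookup keyed on the URL being fetched, as the patched handler does.
by_target = mgr.find_user_password(REALM, "http://python.org/")

# Lookup keyed on the proxy's own host, which is what proxy auth needs.
by_proxy = mgr.find_user_password(REALM, PROXY)

print(by_target)
print(by_proxy)
```

The first lookup comes back empty and the second returns the stored pair, which matches the symptom in the thread: the credentials are registered, but the handler asks for them under the wrong key.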

Obviously nobody else uses authenticating proxies either, or at least
nobody who can be bothered to fix urllib2 :-(

A workaround is to supply a stupid HTTPPasswordMgr that always returns
the proxy credentials regardless of what the handler asks it for (only
tested with a perhaps-broken 2.5 install, since I've broken my 2.4
install):

import urllib2

class DumbProxyPasswordMgr:
    # Remembers a single user/password pair and ignores realm and URI.
    def __init__(self):
        self.user = self.passwd = None
    def add_password(self, realm, uri, user, passwd):
        self.user = user
        self.passwd = passwd
    def find_user_password(self, realm, authuri):
        # Always hand back the proxy credentials, whatever was asked for.
        return self.user, self.passwd

proxy_auth_handler = urllib2.ProxyBasicAuthHandler(DumbProxyPasswordMgr())
proxy_handler = urllib2.ProxyHandler({"http": "http://localhost:3128"})
proxy_auth_handler.add_password(None, None, 'john', 'blah')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
f = opener.open('http://python.org/')
print f.read()

Yuck, yuck, yuck! I had realised the auth/proxies code in urllib2 was
buggy, but... And all those hoops to jump through.
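
For readers on Python 3, where urllib2 became urllib.request, the same workaround might look roughly like the sketch below. This is an assumption-laden translation rather than anything tested in this thread: whether the workaround is still needed on a given Python 3 release is not established here, and the proxy address and credentials are the placeholders from the post. No request is sent; the last lines only show what the handler would be given when it asks for a password.

```python
import urllib.request

class DumbProxyPasswordMgr:
    """Remembers one user/password pair and returns it for every lookup."""

    def __init__(self):
        self.user = self.passwd = None

    def add_password(self, realm, uri, user, passwd):
        self.user = user
        self.passwd = passwd

    def find_user_password(self, realm, authuri):
        # Realm and URI are deliberately ignored.
        return self.user, self.passwd

proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(DumbProxyPasswordMgr())
proxy_handler = urllib.request.ProxyHandler({"http": "http://localhost:3128"})
proxy_auth_handler.add_password(None, None, "john", "blah")
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)

# Nothing goes over the network here: just ask the manager directly.
creds = proxy_auth_handler.passwd.find_user_password("any realm", "http://python.org/")
print(creds)
```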

Also, if you're using 2.5 SVN HEAD, it seems revision 42133 broke
ProxyHandler in an attempt to fix the URL host:port syntax!

I'll try to get some fixes in tomorrow so that 2.5 isn't broken (or at
least flag the issues to let somebody else fix them), but no promises
as usual...
John

Mar 30 '06 #2
Alejandro Dubrovsky <du*******@physics.uq.edu.au> writes:
[...]
The proxy is squid. Python version is 2.3.4 (I read that this version has a
problem in that it introduces an extra return after the authorisation, but
it isn't even getting to that bit). And yes, going through firefox,
everything works fine.

[...]

FWIW, at a glance, Python 2.3.4 has neither of the bugs I mentioned,
but the code I posted seems to work with 2.3.4. I'm not particularly
interested in what's wrong with 2.3.4's version or your usage of it
(probably both), since bugfix releases for 2.3 are no longer
happening, I believe.

I think the Examples section of the docs on this is wrong too, though
that's a bit of a moot point when the code is as broken as it seems...
John

Mar 31 '06 #3
John J. Lee wrote:
FWIW, at a glance, Python 2.3.4 has neither of the bugs I mentioned,
but the code I posted seems to work with 2.3.4. I'm not particularly
interested in what's wrong with 2.3.4's version or your usage of it
(probably both), since bugfix releases for 2.3 are no longer
happening, I believe.


"ah, yes, that works" on 2.3.4. Excellent. (I don't see what's so ugly
about that code, but I'm mostly accustomed to my own)
Thanks,
alejandro
Mar 31 '06 #4
Alejandro Dubrovsky <du*******@physics.uq.edu.au> writes:
John J. Lee wrote:
FWIW, at a glance, Python 2.3.4 has neither of the bugs I mentioned,
but the code I posted seems to work with 2.3.4. I'm not particularly
interested in what's wrong with 2.3.4's version or your usage of it
(probably both), since bugfix releases for 2.3 are no longer
happening, I believe.


"ah, yes, that works" on 2.3.4. Excellent. (I don't see what's so ugly
about that code, but I'm mostly accustomed to my own)


supplying a password surely shouldn't be that complicated...
John

Mar 31 '06 #5
jj*@pobox.com (John J. Lee) writes:
Alejandro Dubrovsky <du*******@physics.uq.edu.au> writes:

[...Alejandro complains about non-working HTTP proxy auth in urllib2...]

[...John notes urllib2 bug...]
A workaround is to supply a stupid HTTPPasswordMgr that always returns
the proxy credentials regardless of what the handler asks it for (only
tested with a perhaps-broken 2.5 install, since I've broken my 2.4
install):

[...snip ugly code]
Yuck, yuck, yuck! I had realised the auth/proxies code in urllib2 was
buggy, but... And all those hoops to jump through.

Also, if you're using 2.5 SVN HEAD, it seems revision 42133 broke
ProxyHandler in an attempt to fix the URL host:port syntax!

[...]

In fact the following also works with Python 2.3.4:

import urllib2
proxy_handler = urllib2.ProxyHandler({"http": "http://john:blah@localhost:3128"})
print urllib2.build_opener(proxy_handler).open('http://python.org/').read()

....but only just barely skirts around the bugs!-) :-(

(at least, the current bugs: I've no reason to work out what things
were like back in 2.3.4, but the above certainly works with that
version)
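
As far as I can tell, the reason the URL form sidesteps the password manager is that ProxyHandler pulls the credentials out of the proxy URL and sends them up front in a Proxy-Authorization header, rather than waiting for a 407. The sketch below only illustrates that idea using Python 3's standard library; it is not the library's own code, and the URL is the placeholder from the post.

```python
import base64
from urllib.parse import urlsplit

proxy_url = "http://john:blah@localhost:3128"
parts = urlsplit(proxy_url)

# Basic auth joins user and password with a colon, then base64-encodes it.
raw = "%s:%s" % (parts.username, parts.password)
token = base64.b64encode(raw.encode("ascii")).decode("ascii")
header = ("Proxy-Authorization", "Basic " + token)

print(parts.hostname, parts.port)
print(header)
```

Note that b64encode adds no trailing newline, which is relevant to the "extra return" problem mentioned at the start of the thread.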
John

Apr 1 '06 #6
John J. Lee wrote:
jj*@pobox.com (John J. Lee) writes:
Alejandro Dubrovsky <du*******@physics.uq.edu.au> writes:

[...Alejandro complains about non-working HTTP proxy auth in urllib2...]

[...John notes urllib2 bug...]
A workaround is to supply a stupid HTTPPasswordMgr that always returns
the proxy credentials regardless of what the handler asks it for (only
tested with a perhaps-broken 2.5 install, since I've broken my 2.4
install):

[...snip ugly code]
Yuck, yuck, yuck! I had realised the auth/proxies code in urllib2 was
buggy, but... And all those hoops to jump through.

Also, if you're using 2.5 SVN HEAD, it seems revision 42133 broke
ProxyHandler in an attempt to fix the URL host:port syntax!

[...]

In fact the following also works with Python 2.3.4:

import urllib2
proxy_handler = urllib2.ProxyHandler({"http": "http://john:blah@localhost:3128"})
print urllib2.build_opener(proxy_handler).open('http://python.org/').read()

It does too. Thanks again. (I think this version is uglier, but easier to
insert into third party code)
Apr 3 '06 #7
