Deficiency in urllib/socket for https?

I think I've found a deficiency in the design of urllib related to https.

In order to complete an https connection, it appears that URLopener and
hence FancyURLopener require the key and cert files. Or at least, it's not
clear from the description of socket.ssl what it does if they're omitted.

However, urlopen has no way to specify such things. Nor should it - for
typical uses, a person simply trying to retrieve data from an ssl site
really doesn't want to know or care about keys and certificate directories.
One just wants to provide an https url and have it work. Ideally, there
should be defaults for the certificate files.

This implies that somewhere in the function hierarchy, I suspect in
socket.ssl, there need to be some clever defaults. I don't know if the
folks maintaining the Python distribution really want to be in the business
of maintaining key and certificate directories (probably not), but there at
least ought to be a way to specify default directories (oh, no, another
environment variable?). Thinking idealistically, it would be great if it
could share the default certs on the system (i.e. on UNIX, find a Netscape
or Mozilla install directory and use those, and on MS Windows, do whatever
it takes to use the Windows mechanism).
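
For readers on current Python: the standard library's ssl module, which postdates this thread and is not the socket.ssl call discussed here, can load the operating system's trusted CA certificates by default. A minimal sketch, assuming Python 3.4 or later; the helper name default_verified_context is made up for illustration and is not part of any library:

```python
import ssl


def default_verified_context():
    """Return a client-side SSL context backed by the system's CA store."""
    # create_default_context() loads the platform's default CA certificates
    # and turns on certificate and hostname verification.
    return ssl.create_default_context()


if __name__ == "__main__":
    ctx = default_verified_context()
    # Shows where OpenSSL looks for CA files and directories on this machine.
    print(ssl.get_default_verify_paths())
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Such a context can be passed as urllib.request.urlopen(url, context=ctx), which is roughly the "just provide an https url and have it work" behaviour asked for above, with verification switched on.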

It's possible my analysis is flawed. I haven't taken the time to download
and read the _ssl code, just the socket.py code (and urllib and httplib).
So corrections are appreciated as much as comments.

Gary
Jul 18 '05 #1
Gary Feldman <ga*************@ziplink.stopallspam.net> writes:
> I think I've found a deficiency in the design of urllib related to https.
>
> In order to complete an https connection, it appears that URLopener and
> hence FancyURLopener require the key and cert files. Or at least, it's not
> clear from the description of socket.ssl what it does if they're omitted.

Nor from urllib -- see below. In fact, it seems that verification is
just skipped if they're not there.

> However, urlopen has no way to specify such things. Nor should it - for
> typical uses, a person simply trying to retrieve data from an ssl site
> really doesn't want to know or care about keys and certificate directories.
> One just wants to provide an https url and have it work. Ideally, there
> should be defaults for the certificate files.

Hmm, looking at both urllib and urllib2, I see urllib2 doesn't use any
key or certificate files at all. So, two points: this is a deficiency
in urllib2 that should be fixed, and, if you're not bothered about key
verification, I'd guess just not providing key / cert files will work.

Hmm, urllib documentation seems wrong here:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

The fact that https works in urllib2 (which does not provide key /
cert files) seems to demonstrate that they're *not* required, and that
verification is skipped if they're not supplied.

If you *are* bothered about verification, use the x509 arg to
FancyURLopener (which is documented, see above). The urlopen function
is just a convenience -- just cut-n-paste the trivial code from
urllib.py and adapt it to your needs if you need something more
complicated.
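
In later Python versions the same idea (supplying a client key and certificate, or otherwise customising HTTPS) is expressed through an ssl.SSLContext handed to urllib.request.HTTPSHandler, rather than through the x509 keywords discussed here. A rough sketch under that assumption; make_https_opener and any file paths passed to it are hypothetical:

```python
import ssl
import urllib.request


def make_https_opener(cert_file=None, key_file=None):
    """Build an opener whose HTTPS requests use an explicit SSL context."""
    ctx = ssl.create_default_context()  # verifies the server certificate
    if cert_file is not None:
        # Optional client certificate, the modern counterpart of the
        # key_file / cert_file keywords mentioned in the urllib docs.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return urllib.request.build_opener(urllib.request.HTTPSHandler(context=ctx))


if __name__ == "__main__":
    opener = make_https_opener()
    # opener.open("https://example.org/") would perform a verified request.
    print([type(h).__name__ for h in opener.handlers])
```

Building a custom opener like this plays the role that copying and adapting the body of urlopen played in the advice above.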

> This implies that somewhere in the function hierarchy, I suspect in
> socket.ssl, there need to be some clever defaults. I don't know if the
> folks maintaining the Python distribution really want to be in the business
> of maintaining key and certificate directories (probably not), but there at
> least ought to be a way to specify default directories (oh, no, another
> environment variable?). Thinking idealistically, it would be great if it
> could share the default certs on the system (i.e. on UNIX, find a Netscape
> or Mozilla install directory and use those, and on MS Windows, do whatever
> it takes to use the Windows mechanism).


That sounds great if you have the time to write the code. Nobody else
is likely to.
John
Jul 18 '05 #2
jj*@pobox.com (John J. Lee) writes:
[...]
> Would you mind submitting a doc patch (both urllib and urllib2 docs
> appear to need fixing -- urllib2 to say that it never verifies, urllib
> to say that it skips verification if an appropriate x509 mapping isn't
> supplied)?


Hmm, maybe I've got this wrong: the fact that key/cert args are passed
to httplib.HTTPS by urllib doesn't mean authentication happens, and
the fact that they're not passed by urllib2 doesn't mean
authentication doesn't happen. I'll investigate.
John
Jul 18 '05 #3
jj*@pobox.com (John J. Lee) writes:
[...]
> You're right -- with the caveat that it is useful to have https even
> without authentication (essentially all https traffic on the internet
> proves that ;-).

[...]

I should have said "...it is useful to have *support* for https...".

The utility of https itself is another matter...
John
Jul 18 '05 #4
jj*@pobox.com (John J. Lee) writes:
> jj*@pobox.com (John J. Lee) writes:
> [...]
>> Would you mind submitting a doc patch (both urllib and urllib2 docs
>> appear to need fixing -- urllib2 to say that it never verifies, urllib
>> to say that it skips verification if an appropriate x509 mapping isn't
>> supplied)?
>
> Hmm, maybe I've got this wrong: the fact that key/cert args are passed
> to httplib.HTTPS by urllib doesn't mean authentication happens, and
> the fact that they're not passed by urllib2 doesn't mean
> authentication doesn't happen. I'll investigate.


Bah! *After* reading the source, I found this in the ssl module docs:

| Warning: This does not do any certificate verification!

(which the _ssl.c source confirms: it uses SSL_VERIFY_NONE, and
never calls SSL_get_verify_result).

So the urllib docs are wrong:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

They're not needed, and they're never used for authentication (if you
don't count just checking the key without verifying it against the
certificate). Given this, the fact that urllib2 doesn't have
arguments for this starts to look like a feature, not a bug! Actually
(dredging up very hazy memories here) aren't you supposed to check a
revocation list, too? Is that given in a URL in the certificate? No
idea how this SSL stuff is supposed to work, really...
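
To make the distinction concrete: in today's ssl module (again, not the socket.ssl API under discussion) the behaviour described above, encrypting without checking the peer, corresponds to CERT_NONE, while real authentication corresponds to CERT_REQUIRED plus hostname checking. A small sketch; the two helper names are invented for illustration:

```python
import ssl


def unverified_context():
    """Encrypts traffic but accepts any server certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # counterpart of OpenSSL's SSL_VERIFY_NONE
    return ctx


def verified_context():
    """Requires a certificate chain that validates and matches the host."""
    return ssl.create_default_context()


if __name__ == "__main__":
    for make in (unverified_context, verified_context):
        ctx = make()
        print(make.__name__, ctx.verify_mode, ctx.check_hostname)
```

As far as I know, neither context checks certificate revocation lists by default; that has to be requested separately through the context's verify_flags.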

I'll upload a doc patch in a minute.

So, in summary, none of httplib, urllib and urllib2 in standard Python
do proper authentication (because the socket module doesn't). There
are third-party SSL libraries for Python: m2crypto is one. If you
need it, and assuming m2crypto has an ssl function with the same
interface that *does* do better auth, I suppose you could probably do


    import socket
    from m2crypto import ssl  # or whatever
    socket.ssl = ssl  # swap in the replacement for the non-verifying built-in

and have urllib magically start working, with any luck.
John
Jul 18 '05 #5
