Python Scripts to logon to websites

New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.

The following example is from the urllib2 module.

What are "realm" and "host" in this example.

import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')

Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?

Thanks very much for any help.

rpd

Jan 11 '06 #1
BartlebyScrivener wrote:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.

The following example is from the urllib2 module.

What are "realm" and "host" in this example.
http://www.ietf.org/rfc/rfc2617.txt probably provides more background
than you want on that topic, but googling for "basic authentication" and
maybe "realm" and/or "host" will find you other sites with less
technically detailed material. The first hit has a little summary
amidst some Apache-specific detail.
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?


"realm" and "host" are associated with "basic authentication" and not
all sites use that. If the browser pops up a little dialog box of its
own (i.e. not some JavaScript-triggered thing) and you have to enter your
username and password there, that's probably a "basic auth" (or "digest
auth") site. If you fill that info into a form (as on gmail.com) you
don't want any of that "realm/host" stuff.
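
One way to tell the two cases apart from a script: a site using HTTP authentication answers an unauthenticated request with status 401 and a WWW-Authenticate header naming the scheme and the realm, while a form-based site just returns an ordinary HTML page. Below is a minimal sketch of picking that header apart. The function name `parse_challenge` is made up for illustration, and the parser is deliberately simplified (one challenge per header, quoted realm only).

```python
import re


def parse_challenge(header_value):
    """Split a WWW-Authenticate header value into (scheme, realm).

    Simplified sketch: assumes a single challenge and a double-quoted realm.
    """
    scheme, _, params = header_value.strip().partition(" ")
    match = re.search(r'realm="([^"]*)"', params, re.IGNORECASE)
    realm = match.group(1) if match else None
    return scheme.lower(), realm


# Example header values of the kind a server sends with a 401 response.
print(parse_challenge('Basic realm="Private Area"'))
print(parse_challenge('Digest realm="testrealm@host.com", qop="auth", nonce="abc"'))
```

The realm string found this way is what goes in the first argument of add_password.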

I'll leave it to others more expert in this to provide a more directly
useful answer.

-Peter

Jan 11 '06 #2
"BartlebyScrivener" <rp*******@gmail.com> writes:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.
A common enough thing to want to do.
The following example is from the urllib2 module.

What are "realm" and "host" in this example.
Host is a domain name that can be mapped to an IP address. Realm is
from HTTP authentication schemes. When the server asks for
authentication, it gives out a "realm" name as well, so that different
parts of the host can use different authentication systems.
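
To make the realm/host pairing concrete, here is a small sketch using the password manager from Python 3's urllib.request (the successor to urllib2). Nothing is sent over the network. The realm "Private Area", the host, and the credentials are placeholders, not values from any real site.

```python
import urllib.request

# The password manager keys credentials on (realm, host-or-URL).
password_mgr = urllib.request.HTTPPasswordMgr()
password_mgr.add_password("Private Area", "www.example.com", "alice", "secret")


def lookup(realm, url):
    """Return the (user, password) pair registered for this realm and URL."""
    return password_mgr.find_user_password(realm, url)


# Same host, matching realm: the credentials are found.
print(lookup("Private Area", "http://www.example.com/login.html"))
# Same host, different realm: nothing matches, so (None, None) comes back.
print(lookup("Other Area", "http://www.example.com/login.html"))
```

If the realm is not known in advance, urllib.request.HTTPPasswordMgrWithDefaultRealm accepts None as a catch-all realm.
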
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?


Yes, but it's not clear how much good it'll do you. As Peter indicated,
not everyone uses HTTP-based authentication. In fact, pretty much
anyone who wants to control how the authentication boxes look (which
seems to be 99% of the people writing web apps, never mind that they
can't really do that) uses something other than HTTP-based
authentication. How you go about dealing with such sites depends on
where they put the user name/login information, and how they encode the
fact that you've authenticated as user "xxxx".
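
For a form-based site, the usual shape is: POST the form fields to the login URL, and keep whatever cookies come back so later requests count as logged in. The sketch below only builds the pieces and does not contact any server. The URL and the field names "username" and "password" are hypothetical; a real site's form has to be inspected to find the actual names.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://www.example.com/login"  # hypothetical login endpoint


def build_opener_with_cookies():
    """Return an opener that remembers cookies, plus the jar holding them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build the POST request a browser would send when the form is submitted."""
    # Field names are assumptions; real sites often add hidden fields too.
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(LOGIN_URL, data=form.encode("ascii"))


opener, jar = build_opener_with_cookies()
request = build_login_request("alice", "secret")
print(request.get_method(), request.full_url)
# Against a real site: response = opener.open(request)
# After that, opener.open(other_url) would send the session cookies along.
```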

So I could show you my script for accessing yahoo. However, it
probably won't work on another site without changes to accommodate the
other site.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 11 '06 #3
BartlebyScrivener wrote:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.
[snip]
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?


I see your example uses HTTP authentication, but I still recommend
checking out mechanoid [1] if you want to access a site with a
form-based login system. The source contains an example that retrieves
and sends email through Yahoo.

[1] http://cheeseshop.python.org/pypi/mechanoid/

--
dOb
Jan 11 '06 #4
> but googling for "basic authentication" and
maybe "realm" and/or "host" will find you other sites with less
technically detailed material.
This looks promising, but it'll take me a week to understand it :)

http://www.voidspace.org.uk/python/a...ntication.shtm

Thanks for your help with the search terms.

rpd
Jan 11 '06 #5
BartlebyScrivener wrote:
but googling for "basic authentication" and
maybe "realm" and/or "host" will find you other sites with less
technically detailed material.


This looks promising, but it'll take me a week to understand it :)

http://www.voidspace.org.uk/python/a...ntication.shtm


(Minor typo... needs an extra "l" on the end:

http://www.voidspace.org.uk/python/a...tication.shtml
)

By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.

-Peter

Jan 11 '06 #6
Thanks, Peter.

Jan 12 '06 #7
Peter Hansen <pe***@engcorp.com> writes:
By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.


To be clear, the HTTP authentication schemes don't provide any
security for the *content* that gets passed back and forth, and they
don't claim to. If someone can intercept that content, they can read
it. For some applications, this is really important. For others, it
doesn't matter at all.

Basic auth doesn't (quite) pass the user name and password in
cleartext. It uses rot-13. For all the protection it provides, it
might as well be cleartext.

Digest passes around md5 sums of various bits and pieces. While md5 has
been compromised, I don't believe that's happened in a way that
compromises the security of digest auth. The password and username
that pass over the wire are about as secure as they're going to get
without noticeably heavier mechanisms than digest auth requires. On the
downside, the server has to have the clear text password available.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 12 '06 #8
Mike Meyer wrote:
Peter Hansen <pe***@engcorp.com> writes:
By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.
To be clear, the HTTP authentication schemes don't provide any
security for the *content* that gets passed back and forth, and they
don't claim to. If someone can intercept that content, they can read
it. For some applications, this is really important. For others, it
doesn't matter at all.


If someone can see the content, they can also see the userid and
password. If they can see the password, they will (with how most people
operate) now have a userid and password that will work on many other
sites, including possibly someone's banking site, no matter how secure
even the content might be for that site.

Most people on the web are simply too ignorant of security issues for
those of us building systems that require passwords to ignore this
issue. To do so is to endanger the security and privacy of the very
people you are hoping to have as users and customers, which is lazy and
careless (and perhaps in some countries even criminal these days).
Basic auth doesn't (quite) pass the user name and password in
cleartext. It uses rot-13. For all the protection it provides, it
might as well be cleartext.
It's actually base64 encoding, but it amounts to the same thing, as you
say, as cleartext, since it's trivially reversible. The protection is
useless against all but honest people who might otherwise accidentally
see it while looking at packet monitoring dumps or such.
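
To see how thin that protection is, here is a short sketch that builds a Basic Authorization header value and then reverses it. The credentials are placeholders and the helper names are made up for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an 'Authorization' header for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a captured Basic header value."""
    _scheme, _, token = header_value.partition(" ")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("alice", "secret")
print(header)
print(decode_basic_auth(header))
```

No key or secret is involved in the decoding step, which is why the header is as good as cleartext to anyone who can read the traffic.
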
Digest passes around md5 sums of various bits and pieces. While md5 has
been compromised, I don't believe that's happened in a way that
compromises the security of digest auth. The password and username
that pass over the wire are about as secure as they're going to get
without noticeably heavier mechanisms than digest auth requires. On the
downside, the server has to have the clear text password available.


My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.

In my own opinion, however, requiring that passwords be stored in clear
text on the server is still quite a bad thing to do. I don't think even
system administrators should ever have access to user passwords. But
many people don't seem to agree (or at least, are more than happy to be
lazy rather than diligent in protecting their users' privacy).

-Peter

Jan 12 '06 #9
Peter Hansen <pe***@engcorp.com> writes:
My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.
Digest is actually rarely used, since sites with enough security
requirements to make it worthwhile generally use SSL/TLS with either
basic auth, or with some login mechanism implemented by the
application. Actually, HTTP authentication (basic or digest) is not
used all that much in general these days, since nontrivial web apps
generally prefer to do their own authentication. It was more common
in the early days of the web when most pages were static.
In my own opinion, however, requiring that passwords be stored in
clear text on the server is still quite a bad thing to do.


Digest auth, like basic auth, doesn't require storing the cleartext
password; only a hash of the password needs to be stored. See RFC
2617 for details.
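
A rough sketch of how that works for digest with qop=auth, following the formulas in RFC 2617: the server keeps only MD5(username:realm:password), often called HA1, and recomputes the expected response from it. The sample values are the worked example from RFC 2617 itself; the function names are made up for illustration.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 of a string, as used throughout RFC 2617."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def stored_ha1(username, realm, password):
    """What the server can keep on disk instead of the cleartext password."""
    return md5_hex(f"{username}:{realm}:{password}")


def expected_response(ha1, method, uri, nonce, nc, cnonce, qop="auth"):
    """Recompute the client's 'response' value from the stored HA1."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Values from the example in RFC 2617.
ha1 = stored_ha1("Mufasa", "testrealm@host.com", "Circle Of Life")
response = expected_response(
    ha1,
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

Note that the stored HA1 is itself enough to answer challenges for that realm, so it still needs protecting like a password.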
Jan 12 '06 #10
Peter Hansen <pe***@engcorp.com> writes:
Mike Meyer wrote:
Peter Hansen <pe***@engcorp.com> writes:
By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.
To be clear, the HTTP authentication schemes don't provide any
security for the *content* that gets passed back and forth, and they
don't claim to. If someone can intercept that content, they can read
it. For some applications, this is really important. For others, it
doesn't matter at all.

If someone can see the content, they can also see the userid and
password.


Only if the userid and password are part of the content. If you're
doing the usual form-based authentication, then they are. If you're
doing an HTTP-based authentication, then they aren't - the
authentication information is in the headers, and can be protected
however the protocol designers want it to be.
Most people on the web are simply too ignorant of security issues for
those of us building systems that require passwords to ignore this
issue. To do so is to endanger the security and privacy of the very
people you are hoping to have as users and customers, which is lazy
and careless (and perhaps in some countries even criminal these days).
Most of the people building systems that require passwords on the web
are too ignorant of security issues for me to trust anything crucial
to them. I don't bank online, because the banking systems I've looked
at don't meet *my* minimal requirements for security.
Digest passes around md5 sums of various bits and pieces. While md5 has
been compromised, I don't believe that's happened in a way that
compromises the security of digest auth. The password and username
that pass over the wire are about as secure as they're going to get
without noticeably heavier mechanisms than digest auth requires. On the
downside, the server has to have the clear text password available.

My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.


Back when I was dealing with this on a regular basis, the major
browser and server vendors were all pushing encrypted session
mechanisms of various kinds. Given that, a secure authentication
mechanism is a waste of time, and would provide competition for their
product in some application domains. So those vendors typically didn't
implement digest authentication. This sucked if you were exchanging
content that didn't need security, but wanted to authenticate
identity.
In my own opinion, however, requiring that passwords be stored in
clear text on the server is still quite a bad thing to do. I don't
think even system administrators should ever have access to user
passwords. But many people don't seem to agree (or at least, are more
than happy to be lazy rather than diligent in protecting their users'
privacy).


Paul Rubin indicates that this isn't required - so my information is
out of date as well.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 12 '06 #11
Mike Meyer <mw*@mired.org> writes:
Only if the userid and password are part of the content. If you're
doing the usual form-based authentication, then they are. If you're
doing an HTTP-based authentication, then they aren't - the
authentication information is in the headers, and can be protected
however the protocol designers want it to be.
Well, HTTP Basic and HTTP Digest authentication both send the userid
in the clear. Basic also sends the password in the clear, while
Digest sends a hash of the (salted) password in the clear. Digest is
better than Basic, but since the attacker can see both the salt and
the password hash, he can still run a dictionary attack. Therefore,
using form-based authentication over SSL is more secure than using
HTTP Digest without SSL. (Special tip from Paranoid Pete: have the
downloaded page include some javascript that inserts some padding
chars into a hidden form field, making the form post have constant
length and thereby prevent leaking the password length).
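
The dictionary attack mentioned above can be sketched in a few lines. This assumes the simpler digest form without qop (response = MD5(HA1:nonce:HA2)), and every value here, including the "captured" exchange and the word list, is invented for illustration.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response in the legacy form without qop."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(captured, wordlist):
    """Try each candidate password against a sniffed exchange, offline."""
    for candidate in wordlist:
        guess = digest_response(
            captured["username"], captured["realm"], candidate,
            captured["method"], captured["uri"], captured["nonce"],
        )
        if guess == captured["response"]:
            return candidate
    return None


# Everything an eavesdropper sees on the wire (hypothetical values).
captured = {
    "username": "alice",
    "realm": "Private Area",
    "method": "GET",
    "uri": "/index.html",
    "nonce": "abc123",
}
captured["response"] = digest_response(password="letmein", **captured)

print(dictionary_attack(captured, ["password", "123456", "letmein"]))
```

The nonce stops replay of a captured response but does nothing to slow this kind of guessing, since the attacker sees it too.
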
Most of the people building systems that require passwords on the web
are too ignorant of security issues for me to trust anything crucial
to them. I don't bank online, because the banking systems I've looked
at don't meet *my* minimal requirements for security.
Worse than that, the user agreements typically make security failures
the customer's problem even if they're the bank's fault.
Back when I was dealing with this on a regular basis, the major
browser and server vendors were all pushing encrypted session
mechanisms of various kinds. Given that, a secure authentication
mechanism is a waste of time, and would provide competition for their
product in some application domains. So those vendors typically didn't
implement digest authentication. This sucked if you were exchanging
content that didn't need security, but wanted to authenticate
identity.


I don't have the impression that it was that nefarious. It took a
while for the standards for both encryption and digest authentication
to settle. By the time digest authentication was ready for prime
time, SSL was also widely deployed, and anyone doing anything serious
used SSL. So digest authentication was simply not needed.
Jan 12 '06 #12
Peter Hansen wrote:
BartlebyScrivener wrote:
but googling for "basic authentication" and
maybe "realm" and/or "host" will find you other sites with less
technically detailed material.


This looks promising, but it'll take me a week to understand it :)

http://www.voidspace.org.uk/python/a...ntication.shtm

(Minor typo... needs an extra "l" on the end:

http://www.voidspace.org.uk/python/a...tication.shtml
)

By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.

Underlining your point, the difference between the two is that digest
offers *strong* authentication (i.e. is not subject to replay attacks)
while basic doesn't (anyone can capture the traffic and use the same
tokens to authorize against the site).

Sometimes strong authentication without confidentiality is a legitimate
requirement.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 12 '06 #13
Steve Holden <st***@holdenweb.com> writes:
Underlining your point, the difference between the two is that digest
offers *strong* authentication (i.e. is not subject to replay attacks)


As I mentioned in another post, that's really not enough, since digest
still exposes the password hash to offline dictionary attacks, which
are sure to nab some passwords if you have a lot of users being
sniffed and you don't impose severe amounts of password discipline on
them. There's also usually no way to log out from an http
authenticated session except by completely closing the browser. All
in all, if you have nontrivial security requirements there's not much
point in using Digest. Use form-based authentication over SSL/TLS
instead. Make sure that the application locks out the user account
(at least temporarily) after too many failed login attempts, something
http authentication implementations that I know of don't bother to do.
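
A minimal sketch of the temporary lockout idea, kept in memory for brevity. The class name, the limits, and the injectable clock are all assumptions for illustration; a real application would persist this state and probably also track attempts per client address.

```python
import time


class LoginThrottle:
    """Lock an account for a while after too many consecutive failures."""

    def __init__(self, max_failures=5, lockout_seconds=300, clock=time.monotonic):
        self._max_failures = max_failures
        self._lockout_seconds = lockout_seconds
        self._clock = clock
        self._failures = {}      # username -> consecutive failed attempts
        self._locked_until = {}  # username -> clock value when the lock ends

    def is_locked(self, username):
        until = self._locked_until.get(username)
        return until is not None and self._clock() < until

    def record_failure(self, username):
        count = self._failures.get(username, 0) + 1
        if count >= self._max_failures:
            # Start the lockout window and reset the counter.
            self._locked_until[username] = self._clock() + self._lockout_seconds
            count = 0
        self._failures[username] = count

    def record_success(self, username):
        self._failures.pop(username, None)
        self._locked_until.pop(username, None)


throttle = LoginThrottle(max_failures=3, lockout_seconds=60)
for _ in range(3):
    throttle.record_failure("alice")
print(throttle.is_locked("alice"), throttle.is_locked("bob"))
```

The login handler would call is_locked before checking the password, and reject the attempt outright while the lock is active.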

For higher security applications (e.g. extranets, admin interfaces,
etc), use client certificates on hardware tokens.
Jan 12 '06 #14
