
Help with cookies/authentication

Hi, I am trying to pull some data from a Web site: http://schoolfinder.com

The issue is that I want to use the advanced search feature, which requires logging into the Web site. I have a username and password, but I want to connect programmatically from Python. I have done data capture from the Web before, so the only new part for me is the authentication. I need cookies, as this page describes: http://schoolfinder.com/login/login.asp

I already know how to add POST/GET data to a request, but how do I handle cookies and authentication? I have read a few articles without success:

urllib2:
http://www.voidspace.org.uk/python/a...lib2.shtml#id6

urllib2 Cookbook:
http://personalpages.tds.net/~kent37/kk/00010.html

basic authentication:
http://www.voidspace.org.uk/python/a...ion.shtml#id19

cookielib:
http://www.voidspace.org.uk/python/a...ookielib.shtml

Is there some other resource I am missing? Could someone set up a basic script that would let me connect to schoolfinder.com with my username and password? My username is "greenman", password is "greenman". All I need to know is how to access pages as if I had logged in through a Web browser.

Thank you very much.
Aug 10 '08 #1
Try this code; it will log in and save all the cookies registered by schoolfinder.com to a file:

#!/usr/local/bin/python

COOKIEFILE = 'cookies.lwp'          # the path and filename to save your cookies in

import os.path
import sys
from urllib import urlencode

cj = None
ClientCookie = None
cookielib = None

try:                                # Let's see if cookielib is available
    import cookielib
except ImportError:
    pass
else:
    import urllib2
    urlopen = urllib2.urlopen
    cj = cookielib.LWPCookieJar()   # a subclass of FileCookieJar with useful load and save methods
    Request = urllib2.Request

if not cookielib:                   # If importing cookielib fails, let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:             # Neither is available: fall back to plain urllib2, without cookie support
        import urllib2
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        urlopen = ClientCookie.urlopen
        cj = ClientCookie.LWPCookieJar()
        Request = ClientCookie.Request

####################################################
# We've now imported the relevant library. Whichever library is being used,
# urlopen is bound to the right function for retrieving URLs and Request is
# bound to the right class for creating Request objects.
# Let's load the cookies, if they exist, and install our CookieJar so that it
# is used as the default CookieProcessor in the default opener.

if cj is not None:
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
    if cookielib:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
    else:
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

# If one of the cookie libraries is available, any call to urlopen will now
# handle cookies using the CookieJar instance we've created.
# (Note that if we are using ClientCookie we haven't explicitly imported urllib2.)

theurl = 'http://schoolfinder.com/login/login.asp'    # the login page, which sets the session cookie
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urlencode(body)            # encode the POST data dictionary
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}    # fake a user agent; some websites don't like automated exploration

try:
    req = Request(theurl, txdata, txheaders)    # create a request object
    handle = urlopen(req)                       # and open it to return a handle on the url
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()
else:
    print 'Here are the headers of the page :'
    print handle.info()             # handle.read() returns the page; handle.geturl() returns the true url of the page fetched (in case urlopen has followed any redirects)

print
if cj is None:
    print "We don't have a cookie library available - sorry."
    print "I can't show you any cookies."
else:
    print 'These are the cookies we have received so far :'
    for index, cookie in enumerate(cj):
        print index, ' : ', cookie
    cj.save(COOKIEFILE)             # save the cookies again
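
Once the login POST succeeds, install_opener means every later urlopen call goes through the cookie-aware opener, so protected pages come back as if you had logged in with a browser. For example (the advanced-search URL below is only a guess; substitute the real one from the site):

handle = urlopen('http://schoolfinder.com/search/advanced.asp')    # hypothetical URL, check the site for the real one
print handle.read()    # should now be the logged-in version of the page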
Aug 10 '08 #2
Thanks for the help. Your code by itself did not work, but it pushed me in the right direction. Here is what worked for me and let me see the protected pages:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com')    # visit the site first so any initial cookies get set

theurl = 'http://schoolfinder.com/login/login.asp'    # the login page, which sets the session cookie
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urllib.urlencode(body)    # encode the POST data dictionary
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}    # fake a user agent; some websites don't like automated exploration

try:
    req = urllib2.Request(theurl, txdata, txheaders)    # create a request object
    handle = opener.open(req)                           # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()

else:
    print 'Here are the headers of the page :'
    print handle.info()    # handle.read() returns the page; handle.geturl() returns the true url of the page fetched (in case of redirects)
Aug 30 '08 #3
Your script works for me, but the version below, adapted for another site, does not. The test.html it saves is not the logged-in page, the way it is when I run your script.

The only lines of code I changed are:
resp = opener.open('http://www.amm.com/')
theurl = 'http://www.amm.com/login.asp'
body={'username':'AMMT54590570','password':'AMMT32564288'}

What am I doing wrong?

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://www.amm.com/login.asp')    # visit the site first so any initial cookies get set

theurl = 'http://www.amm.com/login.asp'    # the login page
body = {'username': 'AMMT54590570', 'password': 'AMMT32564288'}
txdata = urllib.urlencode(body)    # encode the POST data dictionary
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}    # fake a user agent; some websites don't like automated exploration

try:
    req = urllib2.Request(theurl, txdata, txheaders)    # create a request object
    handle = opener.open(req)                           # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()

else:
    print 'Here are the headers of the page :'
    print handle.info()    # handle.read() returns the page; handle.geturl() returns the true url of the page fetched (in case of redirects)
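
In case it helps, here is how I have been checking whether the login took, just dumping the cookies and the final URL (I am only guessing that the form wants different field names, or a hidden field, than the ones I am posting):

for index, cookie in enumerate(cj):
    print index, ' : ', cookie    # what the server actually set
print handle.geturl()             # where we ended up after any redirects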
Oct 29 '08 #4
