Help with cookies/authentication

Hi, I am trying to pull some data from a Web site: http://schoolfinder.com

The issue is that I want to use the advanced search feature, which requires logging in to the Web site. I have a username and password, but I want to connect programmatically from Python. I have done data capture from the Web before, so the only new thing for me is the authentication. The site needs cookies, as this page describes: http://schoolfinder.com/login/login.asp

I already know how to add POST/GET data to a request, but how do I deal with cookies/authentication? I have read a few articles without success:

urllib2:
http://www.voidspace.org.uk/python/a...lib2.shtml#id6

urllib2 Cookbook:
http://personalpages.tds.net/~kent37/kk/00010.html

basic authentication:
http://www.voidspace.org.uk/python/a...ion.shtml#id19

cookielib:
http://www.voidspace.org.uk/python/a...ookielib.shtml

Is there some other resource I am missing? Could someone set up a basic script that connects to schoolfinder.com with my username and password? My username is "greenman" and my password is "greenman". All I need to know is how to access pages as if I had logged in through a Web browser.

Thank you very much.
Aug 10 '08 #1
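For readers on Python 3: cookielib became `http.cookiejar`, and urllib2 was split into `urllib.request` and `urllib.error`. The core answer to "how do I deal with cookies" is that a `CookieJar` attached to an opener stores every `Set-Cookie` it receives and sends matching cookies back on later requests made through that same opener. The minimal sketch below shows that round trip against a throwaway local server so it runs offline; the `/set` and `/echo` paths and the `SESSIONID` cookie are hypothetical, not anything schoolfinder.com is known to use.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /set issues a cookie; every path echoes the Cookie header it got."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/set":
            # A session cookie: no Expires/Max-Age, like a typical login session ID.
            self.send_header("Set-Cookie", "SESSIONID=abc123; Path=/")
        self.end_headers()
        self.wfile.write(self.headers.get("Cookie", "").encode())

    def log_message(self, *args):
        pass  # silence per-request logging


def cookie_round_trip():
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),          # talk to the local server directly
            urllib.request.HTTPCookieProcessor(jar),  # store and resend cookies via jar
        )
        opener.open(base + "/set").read()                      # Set-Cookie lands in jar
        echoed = opener.open(base + "/echo").read().decode()  # jar adds the Cookie header
        return [c.name for c in jar], echoed
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(cookie_round_trip())
```

Calling `cookie_round_trip()` should return `(['SESSIONID'], 'SESSIONID=abc123')`: the first response filled the jar, and the second request carried the cookie without any manual header handling.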
Formula
Try this code. It saves all the cookies you receive from schoolfinder.com to a file:

#!/usr/local/bin/python

COOKIEFILE = 'cookies.lwp'  # the path and filename you want to save your cookies in

import os.path
import sys

cj = None
ClientCookie = None
cookielib = None

try:                                # Let's see if cookielib is available
    import cookielib
except ImportError:
    pass
else:
    import urllib2
    urlopen = urllib2.urlopen
    cj = cookielib.LWPCookieJar()   # a subclass of FileCookieJar with useful load and save methods
    Request = urllib2.Request

if not cookielib:                   # If importing cookielib fails, try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        import urllib2
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        urlopen = ClientCookie.urlopen
        cj = ClientCookie.LWPCookieJar()
        Request = ClientCookie.Request

####################################################
# We've now imported the relevant library. Whichever library is in use,
# urlopen is bound to the right function for retrieving URLs and
# Request is bound to the right function for creating Request objects.

# Load the cookies if they exist, then install our CookieJar so that it is
# used as the cookie processor of the default opener.
if cj is not None:
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
    if cookielib:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
    else:
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

# If one of the cookie libraries is available, any call to urlopen will now
# handle cookies using the CookieJar instance we've created.
# (Note that if we are using ClientCookie we haven't explicitly imported urllib2.)

# As an example:
theurl = 'http://schoolfinder.com/login/login.asp'  # the login page; try other URLs and see which cookies you collect
body = {'usr': 'greenman', 'pwd': 'greenman'}

from urllib import urlencode

txdata = urlencode(body)  # passing data to Request makes this a POST; urlencode turns the dict into form data
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}  # fake a user agent; some websites (like Google) don't like automated exploration

try:
    req = Request(theurl, txdata, txheaders)  # create a request object
    handle = urlopen(req)                     # and open it to return a handle on the URL
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()
else:
    print 'Here are the headers of the page :'
    print handle.info()  # handle.read() returns the page; handle.geturl() returns the true URL fetched (in case urlopen followed redirects)

print
if cj is None:
    print "We don't have a cookie library available - sorry."
    print "I can't show you any cookies."
else:
    print 'These are the cookies we have received so far :'
    for index, cookie in enumerate(cj):
        print index, '  :  ', cookie
    cj.save(COOKIEFILE)  # save the cookies again
  167.  
Aug 10 '08 #2
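One detail worth checking in the cookies.lwp approach above: `FileCookieJar.save()` and `load()` skip session cookies (cookies with no expiry date, such as typical ASP session IDs) unless you pass `ignore_discard=True`, so a login session may never reach the file. A minimal Python 3 sketch of the difference, using a hand-built cookie whose name `ASPSESSIONID`, value, and domain are hypothetical:

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie(name, value, domain):
    """Hand-build a session cookie: no expiry and discard=True."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_and_reload(path):
    """Return (count reloaded with defaults, names reloaded with ignore_discard=True)."""
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(make_session_cookie("ASPSESSIONID", "hypothetical", "schoolfinder.com"))

    jar.save()                              # default: session cookies are NOT written
    plain = http.cookiejar.LWPCookieJar(path)
    plain.load()

    jar.save(ignore_discard=True)           # keep session cookies in the file
    kept = http.cookiejar.LWPCookieJar(path)
    kept.load(ignore_discard=True)          # and keep them when reading it back
    return len(plain), [c.name for c in kept]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(save_and_reload(os.path.join(d, "cookies.lwp")))
```

With these hypothetical values it should return `(0, ['ASPSESSIONID'])`. The same keyword exists on cookielib's `save()` and `load()` in Python 2, so `cj.save(COOKIEFILE, ignore_discard=True)` and `cj.load(COOKIEFILE, ignore_discard=True)` are the corresponding changes to the script above.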
Thanks for the help. Your code by itself did not work, but it pushed me in the right direction. Here is what worked for me and let me see the protected pages:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com')  # visit the home page first so the server sets a session cookie

theurl = 'http://schoolfinder.com/login/login.asp'  # the login page
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urllib.urlencode(body)  # passing data to Request makes this a POST request
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}  # fake a user agent; some websites (like Google) don't like automated exploration

try:
    req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
    handle = opener.open(req)  # open it through the cookie-aware opener
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()
else:
    print 'Here are the headers of the page :'
    print handle.info()  # handle.geturl() returns the true URL fetched (in case the opener followed redirects)
Aug 30 '08 #3
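The fix in the reply above comes down to three steps through one cookie-aware opener: fetch a page so the server issues its session cookie, POST the login form, then request protected pages with the same opener. A Python 3 sketch of that flow, run against a hypothetical local server so it works offline; the paths `/`, `/login.asp`, `/protected`, the `SID` cookie, and the rule that a login only counts for an existing session are assumptions for illustration, not how schoolfinder.com is known to behave:

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request
import uuid

SESSIONS = {}  # hypothetical server state: session id -> logged-in flag


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site with a cookie-based login form."""

    def _session(self):
        # Return the client's SID if it names a known session, else None.
        for part in self.headers.get("Cookie", "").split(";"):
            name, _, value = part.strip().partition("=")
            if name == "SID" and value in SESSIONS:
                return value
        return None

    def _reply(self, text, sid=None):
        self.send_response(200)
        if sid:
            self.send_header("Set-Cookie", "SID=%s; Path=/" % sid)
        self.end_headers()
        self.wfile.write(text.encode())

    def do_GET(self):
        sid = self._session()
        if self.path == "/":
            if sid is None:                  # new visitor: start a session
                sid = uuid.uuid4().hex
                SESSIONS[sid] = False
            self._reply("home", sid)
        elif self.path == "/protected":
            self._reply("secret data" if sid and SESSIONS[sid] else "please log in")
        else:
            self.send_error(404)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        sid = self._session()
        ok = form.get("usr") == ["greenman"] and form.get("pwd") == ["greenman"]
        if self.path == "/login.asp" and sid and ok:
            SESSIONS[sid] = True             # mark this session as logged in
            self._reply("logged in")
        else:
            self._reply("login failed")

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_protected(visit_home_first=True):
    server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),          # talk to the local server directly
            urllib.request.HTTPCookieProcessor(jar),
        )
        if visit_home_first:
            opener.open(base + "/").read()            # step 1: receive the session cookie
        data = urllib.parse.urlencode({"usr": "greenman", "pwd": "greenman"}).encode()
        req = urllib.request.Request(base + "/login.asp", data=data,
                                     headers={"User-Agent": "Mozilla/4.0 (compatible)"})
        opener.open(req).read()                       # step 2: POST credentials (data => POST)
        return opener.open(base + "/protected").read().decode()  # step 3: same opener, same jar
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_protected())
```

Here `fetch_protected()` should return "secret data", while `fetch_protected(visit_home_first=False)` returns "please log in", which mirrors how a login POST can silently fail on a server that expects a session cookie to exist first.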
Your script works for me, but the one below, for another site, does not: the test.html file it writes is not the logged-in page, as it is when I run your script.

The only lines of code I changed are:
resp = opener.open('http://www.amm.com/')
theurl = 'http://www.amm.com/login.asp'
body={'username':'AMMT54590570','password':'AMMT32 564288'}

What am I doing wrong?

-----------------------------------
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://www.amm.com/login.asp') # save a cookie

theurl = 'http://www.amm.com/login.asp'
# an example url that sets a cookie, try different urls here and see the cookie collection you can make !
body={'username':'AMMT54590570','password':'AMMT32564288'}
txdata = urllib.urlencode(body)
# if we were making a POST type request, we could encode a dictionary of values here - using urllib.urlencode
txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
# fake a user agent, some websites (like google) don't like automated exploration

try:
    req = urllib2.Request(theurl, txdata, txheaders) # create a request object
    handle = opener.open(req) # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()
else:
    print 'Here are the headers of the page :'
    print handle.info() # handle.read() returns the page, handle.geturl() returns the true url of the page fetched (in case urlopen has followed any redirects, which it sometimes does)
Oct 29 '08 #4
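When a login script works for one site but not another, one frequent cause (an assumption here, not a diagnosis of amm.com) is that the request does not match the site's real login form: the field names differ, the form posts to a different `action` URL than the page it sits on, or it carries hidden inputs the server expects back. A minimal Python 3 sketch that lists each form's action, method, and named inputs using the standard `html.parser` module; the sample HTML, its field names, and its action path are hypothetical:

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect each <form>'s action and method plus the names/values of its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs often carry tokens the server expects back unchanged.
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def find_forms(html):
    parser = FormFinder()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical login page markup, for illustration only.
SAMPLE = """
<form action="/members/dologin.asp" method="POST">
  <input type="hidden" name="token" value="9f2c">
  <input type="text" name="user_name">
  <input type="password" name="user_pass">
  <input type="submit" value="Log in">
</form>
"""

if __name__ == "__main__":
    for form in find_forms(SAMPLE):
        print(form["method"], form["action"], form["fields"])
```

Feeding it the HTML of the real login page shows which names belong in `body` and which URL to POST to; a relative action can be resolved with `urllib.parse.urljoin(page_url, form["action"])`.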
