Help with cookies/authentication

trihaitran
7 New Member
Hi, I am trying to pull some data from a Web site: http://schoolfinder.com

The issue is that I want to use the advanced search feature, which requires logging into the Web site. I have a username and password, but I want to connect programmatically from Python. I have done data capture from the Web before, so the only new thing here for me is the authentication. I need cookies, as this page describes: http://schoolfinder.com/login/login.asp

I already know how to enter POST/GET data to a request, but how do I deal with cookies/authentication? I have read a few articles without success:

urllib2:
http://www.voidspace.org.uk/python/a...lib2.shtml#id6

urllib2 Cookbook:
http://personalpages.tds.net/~kent37/kk/00010.html

basic authentication:
http://www.voidspace.org.uk/python/a...ion.shtml#id19

cookielib:
http://www.voidspace.org.uk/python/a...ookielib.shtml

Is there some other resource I am missing? Could someone set up a basic script that would allow me to connect to schoolfinder.com with my username and password? My username is "greenman", password is "greenman". All I need to know is how to access pages as if I had logged in with a Web browser.

Thank you very much.
Aug 10 '08 #1
Formula
11 New Member
Try this code. It saves all the cookies it receives from schoolfinder.com to a file.

```python
#!/usr/local/bin/python

import os.path
import sys
from urllib import urlencode

COOKIEFILE = 'cookies.lwp'  # the path and filename used to save the cookies

cj = None
ClientCookie = None
cookielib = None

try:  # let's see if cookielib is available
    import cookielib
except ImportError:
    pass
else:
    import urllib2
    urlopen = urllib2.urlopen
    cj = cookielib.LWPCookieJar()  # a subclass of FileCookieJar with useful load and save methods
    Request = urllib2.Request

if not cookielib:  # if importing cookielib fails, let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        import urllib2
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        urlopen = ClientCookie.urlopen
        cj = ClientCookie.LWPCookieJar()
        Request = ClientCookie.Request

####################################################
# We've now imported the relevant library. Whichever library is being used,
# urlopen is bound to the right function for retrieving URLs and Request is
# bound to the right function for creating Request objects.

if cj is not None:
    # Load the cookies, if they exist
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
    # Install our CookieJar so that it is used by the default opener
    if cookielib:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
    else:
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

# If one of the cookie libraries is available, any call to urlopen will handle
# cookies using the CookieJar instance we've created.
# (Note that if we are using ClientCookie we haven't explicitly imported urllib2.)

theurl = 'http://schoolfinder.com/login/login.asp'  # the login URL, which sets a cookie
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urlencode(body)  # supplying encoded form data makes this a POST request
# fake a user agent; some websites (like google) don't like automated exploration
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

try:
    req = Request(theurl, txdata, txheaders)  # create a request object
    handle = urlopen(req)                     # and open it to return a handle on the url
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()
else:
    print 'Here are the headers of the page :'
    # handle.read() returns the page; handle.geturl() returns the true url of
    # the page fetched (in case urlopen has followed any redirects)
    print handle.info()

print
if cj is None:
    print "We don't have a cookie library available - sorry."
    print "I can't show you any cookies."
else:
    print 'These are the cookies we have received so far :'
    for index, cookie in enumerate(cj):
        print index, '  :  ', cookie
    cj.save(COOKIEFILE)  # save the cookies again
```
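One thing worth checking with the `cj.save(COOKIEFILE)` call above: by default, `save()` skips cookies marked as session cookies (those without an expiry), so the file can end up empty even though a login cookie was received. The sketch below illustrates that behaviour with the Python 3 standard library (`http.cookiejar` is the renamed `cookielib`). The cookie name `ASPSESSIONID`, its value, and the domain `example.invalid` are made up for the demonstration. Whether schoolfinder.com actually issues a session cookie is an assumption, not something this thread confirms.

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie(name, value, domain):
    """Build a session cookie (no expiry, discard=True) by hand, for demonstration only."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_and_reload(cookie, ignore_discard):
    """Save one cookie to a temporary LWP file and return the cookie names read back."""
    path = os.path.join(tempfile.mkdtemp(), "cookies.lwp")
    jar = http.cookiejar.LWPCookieJar()
    jar.set_cookie(cookie)
    jar.save(path, ignore_discard=ignore_discard)

    reloaded = http.cookiejar.LWPCookieJar()
    reloaded.load(path, ignore_discard=ignore_discard)
    return [c.name for c in reloaded]


# Hypothetical cookie; not taken from any real site.
cookie = make_session_cookie("ASPSESSIONID", "abc123", "example.invalid")
default_names = save_and_reload(cookie, ignore_discard=False)  # session cookie is dropped
kept_names = save_and_reload(cookie, ignore_discard=True)      # session cookie is kept
print(default_names, kept_names)
```

If the saved file turns out to be empty, passing `ignore_discard=True` to both `save()` and `load()` is one thing to try.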
Aug 10 '08 #2
trihaitran
7 New Member
Thanks for the help. Your code by itself did not work, but it pushed me in the right direction. Here is what worked for me and let me see the protected pages:

```python
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com')  # visit the site first so the cookie jar picks up a cookie

theurl = 'http://schoolfinder.com/login/login.asp'  # the login URL
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urllib.urlencode(body)  # supplying encoded form data makes this a POST request
# fake a user agent; some websites (like google) don't like automated exploration
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

try:
    req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
    handle = opener.open(req)  # open it with the cookie-aware opener
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()

else:
    print 'Here are the headers of the page :'
    # handle.geturl() returns the true url of the page fetched
    # (in case the opener has followed any redirects)
    print handle.info()
```
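For readers on Python 3, a rough port of the approach above might look like the following. It is only a sketch: `cookielib` became `http.cookiejar` and `urllib2` was folded into `urllib.request`. The helper names `build_cookie_opener`, `build_login_request` and `log_in` are made up here, and the form field names `usr` and `pwd` are copied from the post rather than verified against the site. The network call is left commented out.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://schoolfinder.com/login/login.asp"  # URL quoted in the post above
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_cookie_opener():
    """Return an opener that stores and re-sends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Encode the form fields as bytes; passing data makes the request a POST."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, headers={"User-Agent": USER_AGENT})


def log_in(opener, url, fields):
    """Send the login form through the cookie-aware opener and return the page bytes."""
    with opener.open(build_login_request(url, fields)) as response:
        return response.read()


opener, jar = build_cookie_opener()
request = build_login_request(LOGIN_URL, {"usr": "greenman", "pwd": "greenman"})

# Uncomment to actually contact the site; later opener.open() calls reuse the cookies.
# html = log_in(opener, LOGIN_URL, {"usr": "greenman", "pwd": "greenman"})
```

The important detail in both versions is that every request goes through the same `opener`, since that is the object holding the cookie jar.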
Aug 30 '08 #3
johnpollard
2 New Member
Your script works for me, but the one below, for another site, does not. The resulting test.html file is not the logged-in page, as it is when I run your script.

The only lines of code I changed are:
resp = opener.open('http://www.amm.com/')
theurl = 'http://www.amm.com/login.asp'
body={'username':'AMMT54590570','password':'AMMT32564288'}

What am I doing wrong?

```python
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://www.amm.com/login.asp')  # visit the site first so the cookie jar picks up a cookie

theurl = 'http://www.amm.com/login.asp'  # the login URL
body = {'username': 'AMMT54590570', 'password': 'AMMT32564288'}
txdata = urllib.urlencode(body)  # supplying encoded form data makes this a POST request
# fake a user agent; some websites (like google) don't like automated exploration
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

try:
    req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
    handle = opener.open(req)  # open it with the cookie-aware opener
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()

else:
    print 'Here are the headers of the page :'
    # handle.geturl() returns the true url of the page fetched
    # (in case the opener has followed any redirects)
    print handle.info()
```
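Nothing in this thread confirms why the second site behaves differently. One common cause is that the login form posts to a different URL than the page that displays it, or expects extra (often hidden) fields besides the username and password. A possible first diagnostic step is to list what the form actually contains. The sketch below does that with Python 3's `html.parser`. The class name `FormFieldCollector` and the sample HTML are invented for illustration and are not taken from amm.com.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the first form's action URL and the name/value of every named input."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attributes.get("action")
        elif tag == "input" and attributes.get("name"):
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Made-up login page; in practice this would be the HTML fetched from the login URL.
SAMPLE_LOGIN_PAGE = """
<form action="/checklogin.asp" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="token" value="xyz">
  <input type="submit" name="go" value="Log in">
</form>
"""

collector = FormFieldCollector()
collector.feed(SAMPLE_LOGIN_PAGE)
print(collector.action)
print(collector.fields)
```

If the real form's `action` or field names differ from what the script sends, `theurl` and `body` would need to be adjusted to match.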
Oct 29 '08 #4
