Using mechanize to do website authentication

I am trying to write a web scraper and am having trouble accessing pages that require authentication. I am attempting to use the mechanize library but cannot get it to work. The site I am trying to log in to is http://www.princetonreview.com/Login3.aspx?uidbadge=

user: bugmenot2008@yahoo.com
pass: letmeinalready

Previously I did something similar for another site, schoolfinder.com. Here is my code for that:

    import cookielib
    import sys
    import urllib
    import urllib2

    # Keep cookies across requests so the login session persists.
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    resp = opener.open('http://schoolfinder.com')  # first request saves a cookie

    theurl = 'http://schoolfinder.com/login/login.asp'  # URL the login form posts to
    body = {'usr': 'greenman', 'pwd': 'greenman'}
    txdata = urllib.urlencode(body)  # passing encoded data makes this a POST request
    txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}  # fake a user agent; some websites (like google) don't like automated exploration

    try:
        req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
        handle = opener.open(req)  # open it to get a handle on the response
        HTMLSource = handle.read()
        f = open('test.html', 'w')
        f.write(HTMLSource)
        f.close()

    except IOError, e:
        print 'We failed to open "%s".' % theurl
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :", e.reason
            print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
            sys.exit()

    else:
        print 'Here are the headers of the page :'
        # handle.read() returns the page; handle.geturl() returns the true URL
        # of the page fetched (in case any redirects were followed).
        print handle.info()

This method does not work on the Princeton Review site, however. Interestingly, I cannot even get mechanize to access the schoolfinder.com site. Here is the mechanize code I am using:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    import mechanize

    theurl = 'http://www.princetonreview.com/Login3.aspx?uidbadge='
    mech = mechanize.Browser()
    mech.open(theurl)

    mech.select_form(nr=0)
    mech["ctl00$MasterMainBodyContent$txtUsername"] = "bugmenot2008@yahoo.com"
    mech["ctl00$MasterMainBodyContent$txtPassword"] = "letmeinalready"
    results = mech.submit().read()

    f = open('test.html', 'w')
    f.write(results)  # write to a test file
    f.close()

This code is so short and I just cannot figure out what I am doing wrong. What is incorrect about this? Thank you in advance.
Sep 5 '08 #1
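
One possible explanation for why the plain urllib2 approach fails on the Princeton Review page, offered as a guess rather than a confirmed diagnosis: the `.aspx` URL and the `ctl00$...` field names suggest an ASP.NET WebForms page, and such pages usually expect hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION` to be posted back along with the username and password. The sketch below shows one way to collect every `<input>` from a fetched login page and merge in the credentials before encoding the POST body. The names `FormFieldParser`, `build_login_body`, and `SAMPLE_PAGE` are made up for illustration, and the sample HTML is a stand-in, not the real page.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlencode
except ImportError:  # Python 2, as used in the post
    from HTMLParser import HTMLParser
    from urllib import urlencode


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from every <input> tag on a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attrs.get('value') or ''


def build_login_body(html, overrides):
    """Return a urlencoded POST body: the page's own fields plus overrides."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(overrides)
    return urlencode(fields)


# Hypothetical, trimmed-down login page used only to exercise the parser.
SAMPLE_PAGE = '''
<form method="post" action="Login3.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$MasterMainBodyContent$txtUsername" />
  <input type="password" name="ctl00$MasterMainBodyContent$txtPassword" />
</form>
'''

body = build_login_body(SAMPLE_PAGE, {
    'ctl00$MasterMainBodyContent$txtUsername': 'user@example.com',
    'ctl00$MasterMainBodyContent$txtPassword': 'secret',
})
print(body)
```

In the urllib2 script above, the string returned by `build_login_body` would take the place of `txdata`, with `html` being the body of a first GET of the login page made through the same cookie-aware opener. A real page may also need the submit button's name and value, which this sketch picks up only if the button is an `<input>` element.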

