Using mechanize to do website authentication

I am trying to write a web scraper and am having trouble accessing pages that require authentication. I am attempting to use the mechanize library, but am having difficulties. The site I am trying to log in to is http://www.princetonreview.com/Login3.aspx?uidbadge=

user: bugmenot2008@yahoo.com
pass: letmeinalready

Previously I did something similar with another site: schoolfinder.com. Here is my code for that:

    import cookielib
    import sys
    import urllib
    import urllib2

    # Keep cookies across requests so the login session persists.
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    resp = opener.open('http://schoolfinder.com')  # first visit saves a cookie

    theurl = 'http://schoolfinder.com/login/login.asp'  # URL the login form posts to
    body = {'usr': 'greenman', 'pwd': 'greenman'}
    txdata = urllib.urlencode(body)  # encode the form fields for a POST request
    # Fake a user agent; some websites don't like automated exploration.
    txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

    try:
        req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
        handle = opener.open(req)  # open it to get a handle on the URL
        HTMLSource = handle.read()
        f = file('test.html', 'w')
        f.write(HTMLSource)
        f.close()

    except IOError, e:
        print 'We failed to open "%s".' % theurl
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :", e.reason
            print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
            sys.exit()

    else:
        print 'Here are the headers of the page :'
        # handle.read() returns the page; handle.geturl() returns the final
        # URL of the page fetched, in case any redirects were followed.
        print handle.info()

This method does not work on the Princeton Review site, however. Interestingly, I cannot even get mechanize to access the schoolfinder.com site. Here is the code I am using:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    import mechanize

    theurl = 'http://www.princetonreview.com/Login3.aspx?uidbadge='
    mech = mechanize.Browser()
    mech.open(theurl)

    mech.select_form(nr=0)  # select the first form on the page
    mech["ctl00$MasterMainBodyContent$txtUsername"] = "bugmenot2008@yahoo.com"
    mech["ctl00$MasterMainBodyContent$txtPassword"] = "letmeinalready"
    results = mech.submit().read()

    f = file('test.html', 'w')
    f.write(results)  # write to a test file
    f.close()

This code is so short and I just cannot figure out what I am doing wrong. What is incorrect about this? Thank you in advance.
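
One thing that may be worth checking, offered as a sketch rather than a diagnosis: the `ctl00$...` field names suggest an ASP.NET WebForms page, and such pages normally expect hidden inputs like `__VIEWSTATE` to be posted back together with the credentials. A hand-built body like the one in the urllib2 script would leave them out. The Python 3, standard-library sketch below lists every form on a page, finds the one that holds the username box (a way to check that `nr=0` really is the login form), and builds a POST body that keeps the hidden fields. `SAMPLE_PAGE`, the `btnLogin` button name, the hidden-field values and the helper names are all invented for illustration; the real page has not been inspected here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Field names taken from the mechanize script in the post.
USER_FIELD = "ctl00$MasterMainBodyContent$txtUsername"
PASS_FIELD = "ctl00$MasterMainBodyContent$txtPassword"

# Invented stand-in for a login page; NOT the real Princeton Review markup.
SAMPLE_PAGE = """
<form method="get" action="/search">
  <input type="text" name="q">
</form>
<form method="post" action="Login3.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgKx" />
  <input type="text" name="ctl00$MasterMainBodyContent$txtUsername" />
  <input type="password" name="ctl00$MasterMainBodyContent$txtPassword" />
  <input type="checkbox" name="remember" value="on" />
  <input type="submit" name="ctl00$MasterMainBodyContent$btnLogin" value="Log In" />
</form>
"""


class FormCollector(HTMLParser):
    """Record the <input> name/value pairs of each <form>, in page order."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one {name: value} dict per form
        self._current = None   # dict of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            # Unchecked boxes and plain buttons are not sent by a browser.
            unchecked = kind in ("checkbox", "radio") and "checked" not in attrs
            if name and not unchecked and kind not in ("button", "reset"):
                self._current[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def collect_forms(page_html):
    """Return a list of field dicts, one per form found in page_html."""
    parser = FormCollector()
    parser.feed(page_html)
    parser.close()
    return parser.forms


def build_login_body(page_html, username, password):
    """Return (form_index, fields) for the form containing the username box."""
    for index, fields in enumerate(collect_forms(page_html)):
        if USER_FIELD in fields:
            body = dict(fields)          # keeps hidden fields such as __VIEWSTATE
            body[USER_FIELD] = username
            body[PASS_FIELD] = password
            return index, body
    raise ValueError("no form contains %r" % USER_FIELD)


if __name__ == "__main__":
    index, body = build_login_body(SAMPLE_PAGE, "user@example.com", "secret")
    print("login form index:", index)
    print("fields to post:", sorted(body))
    print("encoded body:", urlencode(body))
```

The returned index is the value one would pass as `nr` to `select_form`, and `urlencode(body)` gives data in the same shape as `txdata` in the urllib2 script. mechanize fills in hidden fields itself when it submits a selected form, so for the mechanize script the form index is probably the more relevant check.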
Sep 5 '08 #1

