Using mechanize to do website authentication

I am trying to write a web scraper and am having trouble accessing pages that require authentication. I am attempting to use the mechanize library, but am having difficulties. The site I am trying to log in to is http://www.princetonreview.com/Login3.aspx?uidbadge=

user: bugmenot2008@yahoo.com
pass: letmeinalready

Previously I did something similar with another site, schoolfinder.com. Here is my code for that:

    import cookielib
    import sys
    import urllib
    import urllib2

    # Keep cookies across requests so the login session persists.
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    resp = opener.open('http://schoolfinder.com')  # first visit, to pick up a cookie

    theurl = 'http://schoolfinder.com/login/login.asp'  # the login URL
    body = {'usr': 'greenman', 'pwd': 'greenman'}
    txdata = urllib.urlencode(body)  # supplying data makes this a POST request
    # Fake a browser user agent; some websites don't like automated exploration.
    txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

    try:
        req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
        handle = opener.open(req)  # open it to get a handle on the URL
        HTMLSource = handle.read()
        f = open('test.html', 'w')
        f.write(HTMLSource)
        f.close()

    except IOError, e:
        print 'We failed to open "%s".' % theurl
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :", e.reason
            print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit(1)

    else:
        print 'Here are the headers of the page :'
        # handle.geturl() returns the final URL, in case any redirects were followed.
        print handle.info()

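For readers on Python 3, where `cookielib` and `urllib2` no longer exist under those names, the same cookie-plus-POST approach can be sketched roughly as follows. This is a minimal sketch and not the original code: the helper name `build_login` and the default user-agent string are made up for illustration, and nothing is sent over the network until the opener is actually used.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login(url, fields, user_agent="Mozilla/5.0 (compatible; example-scraper)"):
    """Return (opener, request) for a cookie-aware form POST. Nothing is sent yet.

    build_login and the default user_agent are hypothetical, for illustration only.
    """
    jar = http.cookiejar.CookieJar()  # holds session cookies between requests
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    data = urllib.parse.urlencode(fields).encode("ascii")  # bytes body => POST
    request = urllib.request.Request(url, data=data, headers={"User-Agent": user_agent})
    return opener, request


opener, request = build_login(
    "http://schoolfinder.com/login/login.asp",
    {"usr": "greenman", "pwd": "greenman"},
)
print(request.get_method(), request.full_url)
# To actually log in, one would call: html = opener.open(request).read()
```

Reusing the same `opener` for later requests is what carries the login cookie forward, just as in the Python 2 version.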
This method does not work on the Princeton Review site, however. Interestingly, I cannot even get mechanize to access the schoolfinder.com site. Here is the code I am using:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    import mechanize

    theurl = 'http://www.princetonreview.com/Login3.aspx?uidbadge='
    mech = mechanize.Browser()
    mech.open(theurl)

    mech.select_form(nr=0)  # select the first form on the page
    mech["ctl00$MasterMainBodyContent$txtUsername"] = "bugmenot2008@yahoo.com"
    mech["ctl00$MasterMainBodyContent$txtPassword"] = "letmeinalready"
    results = mech.submit().read()

    f = open('test.html', 'w')
    f.write(results)  # write to a test file
    f.close()

This code is so short and I just cannot figure out what I am doing wrong. What is incorrect about this? Thank you in advance.
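One way to narrow this down is to check which form `select_form(nr=0)` actually picks and which controls it contains. The `ctl00$...` names look like the ones ASP.NET WebForms generates, and such pages usually carry hidden fields such as `__VIEWSTATE` and may expect the submit button's name in the POST; whether that is the cause here is not known. The sketch below lists forms and their controls using only the standard library (Python 3). The `FormLister` class, the `list_forms` helper, and the sample HTML (including the `btnLogin` name) are invented for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form> and the (name, type) of the controls inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form, in document order
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            # An <input> without a type defaults to "text"; other controls use the tag name.
            kind = attrs.get("type") or ("text" if tag == "input" else tag)
            self._current["fields"].append((attrs.get("name"), kind))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Made-up sample page; the real login page may differ.
SAMPLE = """
<form action="Login3.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc">
  <input type="text" name="ctl00$MasterMainBodyContent$txtUsername">
  <input type="password" name="ctl00$MasterMainBodyContent$txtPassword">
  <input type="submit" name="ctl00$MasterMainBodyContent$btnLogin" value="Log in">
</form>
"""

for index, form in enumerate(list_forms(SAMPLE)):
    print(index, form["action"], form["fields"])
```

Feeding it the HTML saved in test.html would show whether form 0 is really the login form. mechanize's own `Browser.forms()` iterator gives similar information from inside the script.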
Sep 5 '08 #1

