python - HTML processing - need tips

I need to process an HTML form in Python. I'm using urllib2 and
HTMLParser to handle the HTML. There are several steps I need to take
to reach the specific page on the relevant site, the first of which is
to log in with a username/password. The HTML that handles the login
consists of 2 edit boxes (for User ID and Password) and a Submit
button which uses ASP.NET client-side validation, as follows (formatted
for clarity):

<tr>
<td align="right"><b>User ID:</b>
</td>
<td align="left"><input name="txtUserName" id="txtUserName"
type="text" maxlength="63" /></td>
<td><span id="vEmail" controltovalidate="txtUserName"
errormessage="Valid Email format is required" isvalid="False"
evaluationfunction="RegularExpressionValidatorEvaluateIsValid"
validationexpression="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
style="color:Red;font-size:Smaller;font-weight:bold;">Valid Email
format is required</span>
</td>
</tr>
<tr>
<td align="right"><b>Password:</B>
</td>
<td align="left"><input name="txtUserPass" id="txtUserPass"
type="password" maxlength="49" /></td>
<td>&nbsp;</td>
</tr>
<tr >
<td>&nbsp;</td>
<td align="left"><input type="submit" name="loginButton"
value="Submit" onclick="if (typeof(Page_ClientValidate) == 'function')
Page_ClientValidate(); " language="javascript" id="loginButton" />
<td>&nbsp;</td>
</tr>

I've looked at all the relevant posts on this topic and have already
looked at mechanize and ClientForm. It appears I can't use those, for 2
reasons: 1) they can't handle client-side validation, and 2) this
button doesn't actually reside in a form, and I haven't been able to
find any Python code that obtains a handle to a submit control and
simulates clicking on it.

I've tried sending the server a POST request as follows:

import urllib
from urllib2 import Request, urlopen

loginParams = urllib.urlencode({'txtUserName': theUsername,
                                'txtUserPass': thePassword})
txdata = None
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
# url1 points to the secure page seen following login
req = Request(url1, txdata, txheaders)
handle = urlopen(req, loginParams)

But this doesn't work. I don't understand the use of
Page_ClientValidate() and haven't found any documentation on it that
is useful for my purposes. I basically need to be able to submit this
information to the site by simulating the onclick event from Python.
As far as I understand, I need a solution to the 2 points I mentioned
above (getting past client-side validation and simulating a click of a
non-form button). Any help on this (or on other important issues I
might have missed) would be great!

Many thanks,
Pythonner
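
On the client-side validation point: the check is described by the validationexpression attribute in the posted HTML, and Page_ClientValidate() is JavaScript that runs only in a browser, so a scripted request never executes it. A minimal sketch of testing a User ID against the same pattern before sending it, written for Python 3 and assuming the validator requires the whole value to match (the helper name is hypothetical, and Python's \w is not guaranteed to behave exactly like the browser's for non-ASCII input):

```python
import re

# Pattern copied from the validationexpression attribute in the posted HTML.
EMAIL_PATTERN = re.compile(r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")


def passes_email_validator(user_id):
    """Return True if the whole string matches the page's email pattern."""
    # Hypothetical helper: mirrors the page's check, it does not replace
    # whatever validation the server performs on its side.
    return EMAIL_PATTERN.fullmatch(user_id) is not None
```

The server may well repeat the same check, so a value that fails here is likely to be rejected however the request is built.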

Aug 7 '06 #1
At Monday 7/8/2006 20:58, wipit wrote:
>I need to process a HTML form in python. I'm using urllib2 and
>HTMLParser to handle the html. There are several steps I need to take
>to get to the specific page on the relevant site the first of which is
>to log in with a username/password. The html code that processes the
>login consists of 2 edit boxes (for User ID and Password) and a Submit
>button which uses ASP.net client side validation as follows (formatted
>for clarity):

Another approach would be to use HTTPDebugger
<http://www.softx.org/debugger.html> to see exactly what gets
submitted, and then build a compatible Request.
On many sites you don't even need to *get* the login page (nor parse
it); just posting the right Request is enough to log in successfully.

Gabriel Genellina
'@'.join(('gagsl-py','.'.join(('yahoo','com','ar'))))
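
A minimal sketch of "building a compatible Request", written for Python 3 (where urllib2 became urllib.request and HTMLParser became html.parser). It collects every named input on the login page, including the submit button, which a browser sends as a name=value pair when it is clicked, and then fills in the two text boxes. The function and class names are hypothetical, and the sketch assumes the login page URL is also the one to post to. ASP.NET pages commonly carry hidden inputs such as __VIEWSTATE that the server expects back; the posted snippet does not show any, so that part is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class InputCollector(HTMLParser):
    """Collect name/value pairs from every named <input> tag on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            self.fields[name] = attrs.get("value") or ""


def build_login_request(login_url, page_html, username, password):
    """Build (but do not send) a POST carrying the page's input fields."""
    collector = InputCollector()
    collector.feed(page_html)
    fields = collector.fields
    # Field names taken from the HTML in the original post.
    fields["txtUserName"] = username
    fields["txtUserPass"] = password
    body = urlencode(fields).encode("ascii")
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    return Request(login_url, data=body, headers=headers)
```

Passing the returned object to urllib.request.urlopen would send it. Comparing its body with what a debugging proxy shows for a real browser login is a quick way to spot a missing field.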


Aug 8 '06 #2
I figured it out. I just turned the POST request into a GET to see what
was being appended to the URL. Thanks, Gabe!
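
The resulting URL is not shown in the thread, so the following is only a sketch of how a captured GET URL could be turned back into POST fields, written for Python 3. The helper name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def fields_from_get_url(url):
    """Return the query-string fields of a URL as (name, value) pairs."""
    # keep_blank_values=True preserves fields submitted with empty values.
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


# Hypothetical URL, standing in for whatever the browser actually showed.
captured = ("http://example.com/login.aspx"
            "?txtUserName=u%40example.com&txtUserPass=pw&loginButton=Submit")
fields = fields_from_get_url(captured)
post_body = urlencode(fields).encode("ascii")
```

Here post_body holds the same fields re-encoded as bytes, ready to be passed as the data of a POST request.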

Aug 8 '06 #3
