html + javascript automations = [mechanize + ?? ] or something else?


I have to write a spider for a web page that uses HTML + JavaScript. I
had it written using mechanize, but the authors of the page now use a
lot of JavaScript, and mechanize can no longer do the job.
Does anyone know how I could make my spider understand JavaScript?
Is there a way to control a browser like Firefox from Python itself?
How about IE? That way we would not have to go through something
like mechanize.

Thanks in advance for your help/comments,
--j

Jan 16 '07 #1
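
A minimal sketch of why a spider that only reads the delivered HTML stops working on such a page: links that a script writes into the page at load time never appear in the markup the spider parses. The page content and the LinkCollector class are made up for illustration; only the standard-library html.parser module is used, not mechanize itself.

```python
from html.parser import HTMLParser

# Hypothetical page: one link is in the markup, the other only exists
# after a browser has executed the inline script.
PAGE = """
<html><body>
<a href="/static.html">static link</a>
<script>
document.write('<a href="/dynamic.html">dynamic link</a>');
</script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, as a non-JavaScript spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/static.html']
```

The script body is treated as plain text by the parser, so the dynamic link is only reachable by something that actually runs the JavaScript, which is what the rest of the thread is about.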

I am curious about the webbrowser module. I can open Firefox
using webbrowser.open(), but can one control it? Say, enter a
login/password on a web page? Send keystrokes to Firefox?
Mouse clicks?

Thanks,
--j

Jan 16 '07 #2
Hello,

John wrote:
> John wrote:
>> I have to write a spider for a webpage that uses html + javascript. I
>> had it written using mechanize
>> but the authors of the webpage now use a lot of javascript. Mechanize
>> can no longer do the job.
>> Does anyone know how I could automate my spider to understand
>> javascript? Is there a way
>> to control a browser like firefox from python itself? How about IE?
>> That way, we do not have
>> to go thru something like mechanize?
>
> I am curious about the webbrowser module. I can open up firefox
> using webbrowser.open(), but can one control it? Say enter a
> login / passwd on a webpage? Send keystrokes to firefox?
> mouse clicks?

Not with the webbrowser module - it can only launch a browser.

On the website of mechanize you will also find DOMForm
<http://wwwsearch.sourceforge.net/DOMForm/>, which is a webscraper with
basic JS support (using the SpiderMonkey engine from the Mozilla project).
But note that DOMForm is at an early stage and no longer developed
(according to the site; I have never used it myself).

You could try to script IE (perhaps also FF, dunno..) using COM. This can be
done using the pywin32 module <https://sourceforge.net/projects/pywin32/>.
How this is done in detail is a windows issue. You may find help and
documentation in win specific group/mailing list, msdn, ... You can usually
translate the COM calls from VB, C#, ... quite directly to Python.
HTH

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
Jan 16 '07 #3
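
The point about the webbrowser module can be checked from Python itself. The sketch below lists the public methods on webbrowser.BaseBrowser, the standard library's base class for browser controllers; it assumes a current Python 3 and does not launch any browser.

```python
import webbrowser

# Public callables defined on the base controller class. Every controller
# returned by webbrowser.get() derives from this interface.
methods = sorted(
    name
    for name in dir(webbrowser.BaseBrowser)
    if not name.startswith("_")
    and callable(getattr(webbrowser.BaseBrowser, name))
)
print(methods)
```

Everything listed is a variation on opening a URL. There is nothing for reading the page, filling in a form, or sending keystrokes, so the module is a launcher rather than an automation interface.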
John,

"J" == John wrote:

 J> I have to write a spider for a webpage that uses html + javascript. I
 J> had it written using mechanize but the authors of the webpage now use a
 J> lot of javascript. Mechanize can no longer do the job. Does anyone
 J> know how I could automate my spider to understand javascript? Is there
 J> a way to control a browser like firefox from python itself? How about
 J> IE? That way, we do not have to go thru something like mechanize?

To my knowledge, there is no way to test javascript but to fire up a
browser.

So, you might check Selenium (http://www.openqa.org/selenium/) and its
python module.

--
Andrey V Khavryuchenko
Software Development Company http://www.kds.com.ua/
Jan 16 '07 #4
> To my knowledge, there is no way to test javascript but to fire up a
> browser.
>
> So, you might check Selenium (http://www.openqa.org/selenium/) and its
> python module.

That is no use here: to be remote-controlled from Python, Selenium must run
on the server side itself, due to JS security model restrictions.

Diez
Jan 16 '07 #5
"John" <we**********@yahoo.com> wrote:
> Is there a way
> to control a browser like firefox from python itself? How about IE?

IE is easy enough to control and you have full access to the DOM:

>>> import win32com.client
>>> win32com.client.gencache.EnsureModule('{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}', 0, 1, 1)
<module 'win32com.gen_py.EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1' from 'C:\Python25\lib\site-packages\win32com\gen_py\EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1.py'>
>>> IE = win32com.client.DispatchEx('InternetExplorer.Application.1')
>>> dir(IE)
['CLSID', 'ClientToWindow', 'ExecWB', 'GetProperty', 'GoBack', 'GoForward',
'GoHome', 'GoSearch', 'Navigate', 'Navigate2', 'PutProperty',
'QueryStatusWB', 'Quit', 'Refresh', 'Refresh2', 'ShowBrowserBar', 'Stop',
'_ApplyTypes_', '__call__', '__cmp__', '__doc__', '__getattr__',
'__init__', '__int__', '__module__', '__repr__', '__setattr__', '__str__',
'__unicode__', '_get_good_object_', '_get_good_single_object_', '_oleobj_',
'_prop_map_get_', '_prop_map_put_', 'coclass_clsid']
>>> IE.Visible=True
>>> IE.Navigate("http://plone.org")
>>> while IE.Busy: pass
...
>>> print IE.Document.getElementById("portlet-news").innerHTML
<DT class=portletHeader><A class="feedButton link-plain"
href="feed://plone.org/news/newslisting/RSS"><IMG title="RSS subscription
feed for news items" alt=RSS src="http://plone.org/rss.gif"></A><A
href="http://plone.org/news">News</A></DT>

... and so on ...

See
http://msdn.microsoft.com/workshop/b...ce/objects/internetexplorer.asp
for the documentation.
Jan 16 '07 #6
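
The `while IE.Busy: pass` loop in the session above spins the CPU and never gives up if the page hangs. A gentler variant, sketched here under assumptions, polls with a short sleep and a timeout. The wait_until helper and the FakeBrowser stand-in are hypothetical names introduced for illustration; FakeBrowser only imitates the Busy property so the example runs without Windows or IE.

```python
import time


def wait_until(is_ready, timeout=30.0, interval=0.1):
    """Poll is_ready() until it returns true or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeBrowser:
    """Stand-in for the COM object: reports Busy for the first few checks."""

    def __init__(self, busy_checks):
        self._remaining = busy_checks

    @property
    def Busy(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False


browser = FakeBrowser(busy_checks=3)
loaded = wait_until(lambda: not browser.Busy, timeout=2.0, interval=0.01)
print(loaded)  # True
```

With the real object from the session, the call would presumably look like `wait_until(lambda: not IE.Busy)`, followed by a check of the return value before touching `IE.Document`.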
Diez,

"DBR" == Diez B Roggisch wrote:

>> To my knowledge, there is no way to test javascript but to fire up a
>> browser.
>>
>> So, you might check Selenium (http://www.openqa.org/selenium/) and its
>> python module.

 DBR> No use in that, as to be remote-controlled by python, selenium must be run
 DBR> on the server side itself, due to JS security model restrictions.

Sorry, I missed the word 'spider' in the original post.

--
Andrey V Khavryuchenko
Software Development Company http://www.kds.com.ua/
Jan 16 '07 #7
ina

You want pamie, iec or ishybrowser. Pamie is probably the best choice
since it gets patches and updates on a regular basis.

http://pamie.sourceforge.net/

Jan 16 '07 #8

I tried to install PAMIE (but I have mostly used Python under Cygwin on
Windows). The section "What will you need to run PAMIE" says I will need
"Mark Hammond's Win32 All", which I cannot find. Can anyone tell me how
to install PAMIE? Do I need a Python for Windows that is different from
Cygwin's Python?

Thanks,
--j

Jan 22 '07 #9


My python2.5 installation on windows did not come with "win32com".
How do I install/get this module for windows?

Thanks,
--j

Jan 22 '07 #10
"John" <we**********@yahoo.com> wrote in message
news:11*********************@38g2000cwa.googlegroups.com...
> My python2.5 installation on windows did not come with "win32com".
> How do I install/get this module for windows?

Look for the pywin32 package at sourceforge.net

--
Gabriel Genellina
Jan 22 '07 #11
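
Before installing anything, it can help to check whether the interpreter you are actually running can see win32com at all. "Win32 All" is an older name for the same pywin32 package. The has_module helper below is a hypothetical name for illustration; it uses importlib.util.find_spec from the standard library of a current Python 3, which did not exist in Python 2.5.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if has_module("win32com"):
    print("win32com is available to this interpreter")
else:
    print("win32com is missing; it ships with the pywin32 package")
```

Running this with each Python on the machine shows which installation has the module, which matters when more than one interpreter is present.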

I tried it; it didn't work with the Python 2.5 MSI distribution from
python.org, but ActiveState Python worked. Now I can open IE using COM.
What I am trying to figure out is how to click an x,y coordinate on a
page in IE automatically using COM. How about typing something
automatically? Any ideas?

Thanks,
--j

Jan 22 '07 #12
"John" <we**********@yahoo.com> wrote:
> I tried it, didnt work with the python25 distribution msi file that is
> on python.org
> But activestate python worked. Now I can open IE using COM. What I am
> trying
> to figure out is how to click an x,y coordinate on a page in IE
> automatically
> using COM. How about typing something automatically...Any ideas?

Don't think about clicking a coordinate or typing something; think about
the actions on the page. e.g. to fill in a field on a form you'll want
something like:

ie.document.forms[formname][fieldname].value = 'whatever'

to click a button call its click method e.g.

submit = ie.document.forms[0]['submit']
submit.focus()
submit.click()

Check out the documentation at msdn.microsoft.com for the application,
document, form etc. objects. Generally speaking, anything you could have
done through javascript you should be able to do through automation, plus a
few other things that javascript might have blocked for security
reasons.

Jan 22 '07 #13
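
The two snippets above can be folded into one helper that works on page actions rather than coordinates. This is only a sketch: fill_and_submit, FakeElement and FakeDocument are hypothetical names, and the fakes imitate just enough of a document (forms indexed by name, elements with value, focus and click) to run the example without IE. Whether a real IE document accepts exactly this indexing is taken from the post above, not verified here.

```python
def fill_and_submit(document, form_name, values, submit_name="submit"):
    """Set form fields by name, then focus and click the submit button."""
    form = document.forms[form_name]
    for field_name, value in values.items():
        form[field_name].value = value
    button = form[submit_name]
    button.focus()
    button.click()


class FakeElement:
    """Stand-in for a form element; records the calls made on it."""

    def __init__(self):
        self.value = ""
        self.events = []

    def focus(self):
        self.events.append("focus")

    def click(self):
        self.events.append("click")


class FakeDocument:
    """Stand-in for the browser document; forms is a plain dict here."""

    def __init__(self, forms):
        self.forms = forms


login_form = {
    "user": FakeElement(),
    "passwd": FakeElement(),
    "submit": FakeElement(),
}
document = FakeDocument({"login": login_form})

fill_and_submit(document, "login", {"user": "john", "passwd": "secret"})
print(login_form["user"].value, login_form["submit"].events)
# john ['focus', 'click']
```

With the COM object from earlier in the thread, the first argument would be `ie.Document` once the page has finished loading.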
