Bytes IT Community

html + javascript automations = [mechanize + ?? ] or something else?

I have to write a spider for a webpage that uses HTML + JavaScript. I
had it written using mechanize, but the authors of the webpage now use
a lot of JavaScript, and mechanize can no longer do the job.
Does anyone know how I could get my spider to understand JavaScript?
Is there a way to control a browser like Firefox from Python itself?
How about IE? That way, we would not have to go through something
like mechanize.

Thanks in advance for your help/comments,
--j

Jan 16 '07 #1
12 Replies


I am curious about the webbrowser module. I can open up Firefox
using webbrowser.open(), but can one control it? Say, enter a
login / password on a webpage? Send keystrokes to Firefox?
Mouse clicks?

Thanks,
--j

Jan 16 '07 #2

Hello,

John wrote:
> John wrote:
>> I have to write a spider for a webpage that uses html + javascript. I
>> had it written using mechanize
>> but the authors of the webpage now use a lot of javascript. Mechanize
>> can no longer do the job.
>> Does anyone know how I could automate my spider to understand
>> javascript? Is there a way
>> to control a browser like firefox from python itself? How about IE?
>> That way, we do not have
>> to go through something like mechanize?
>
> I am curious about the webbrowser module. I can open up firefox
> using webbrowser.open(), but can one control it? Say enter a
> login / passwd on a webpage? Send keystrokes to firefox?
> mouse clicks?

Not with the webbrowser module - it can only launch a browser.
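
To make that concrete, here is a minimal sketch in modern Python 3 (not
part of the original reply). webbrowser.open() hands back only a success
flag, not an object representing the page, so there is nothing to send
keystrokes or clicks to. The launch() helper and its opener parameter
are hypothetical names used purely for illustration.

```python
import webbrowser


def launch(url, opener=webbrowser.open):
    """Ask the default browser to display url.

    The return value is just the bool from webbrowser.open();
    no handle to the browser window or its DOM comes back.
    """
    # new=2 requests a new tab where the browser supports it.
    return opener(url, new=2)
```

Calling launch("http://plone.org") would open the page if a browser is
available, and that is where the module's involvement ends.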

On the mechanize website you will also find DOMForm
<http://wwwsearch.sourceforge.net/DOMForm/>, which is a web scraper with
basic JS support (using the SpiderMonkey engine from the Mozilla project).
But note that DOMForm is at an early stage and no longer developed
(according to the site; I have never used it myself).

You could try to script IE (perhaps also FF, dunno...) using COM. This can be
done using the pywin32 module <https://sourceforge.net/projects/pywin32/>.
How this is done in detail is a Windows issue. You may find help and
documentation in Windows-specific groups/mailing lists, MSDN, ... You can
usually translate the COM calls from VB, C#, ... quite directly to Python.
HTH

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
Jan 16 '07 #3

John,

"J" == John wrote:

J> I have to write a spider for a webpage that uses html + javascript. I
J> had it written using mechanize but the authors of the webpage now use a
J> lot of javascript. Mechanize can no longer do the job. Does anyone
J> know how I could automate my spider to understand javascript? Is there
J> a way to control a browser like firefox from python itself? How about
J> IE? That way, we do not have to go through something like mechanize?

To my knowledge, there is no way to test JavaScript other than to fire up
a browser.

So, you might check Selenium (http://www.openqa.org/selenium/) and its
Python module.

--
Andrey V Khavryuchenko
Software Development Company http://www.kds.com.ua/
Jan 16 '07 #4

> To my knowledge, there is no way to test JavaScript other than to fire up
> a browser.
>
> So, you might check Selenium (http://www.openqa.org/selenium/) and its
> Python module.

That is no use here: to be remote-controlled by Python, Selenium must be run
on the server site itself, due to JS security model restrictions.

Diez
Jan 16 '07 #5

"John" <we**********@yahoo.com> wrote:
> Is there a way
> to control a browser like firefox from python itself? How about IE?

IE is easy enough to control and you have full access to the DOM:

>>> import win32com.client
>>> win32com.client.gencache.EnsureModule('{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}', 0, 1, 1)
<module 'win32com.gen_py.EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1' from
'C:\Python25\lib\site-packages\win32com\gen_py\EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1.py'>
>>> IE = win32com.client.DispatchEx('InternetExplorer.Application.1')
>>> dir(IE)
['CLSID', 'ClientToWindow', 'ExecWB', 'GetProperty', 'GoBack', 'GoForward',
'GoHome', 'GoSearch', 'Navigate', 'Navigate2', 'PutProperty',
'QueryStatusWB', 'Quit', 'Refresh', 'Refresh2', 'ShowBrowserBar', 'Stop',
'_ApplyTypes_', '__call__', '__cmp__', '__doc__', '__getattr__',
'__init__', '__int__', '__module__', '__repr__', '__setattr__', '__str__',
'__unicode__', '_get_good_object_', '_get_good_single_object_', '_oleobj_',
'_prop_map_get_', '_prop_map_put_', 'coclass_clsid']
>>> IE.Visible=True
>>> IE.Navigate("http://plone.org")
>>> while IE.Busy: pass
...
>>> print IE.Document.getElementById("portlet-news").innerHTML
<DT class=portletHeader><A class="feedButton link-plain"
href="feed://plone.org/news/newslisting/RSS"><IMG title="RSS subscription
feed for news items" alt=RSS src="http://plone.org/rss.gif"></A><A
href="http://plone.org/news">News</A></DT>

... and so on ...

See
http://msdn.microsoft.com/workshop/b...ce/objects/internetexplorer.asp
for the documentation.
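
The "while IE.Busy: pass" loop in the session above spins the CPU and
never gives up if the page hangs. A gentler variant, sketched here in
modern Python 3 rather than the Python 2.5 of the transcript, polls with
a short sleep and a deadline. wait_until_idle() and its timeout and poll
parameters are hypothetical names; the only thing assumed about the
browser object is the Busy property used above.

```python
import time


def wait_until_idle(browser, timeout=30.0, poll=0.1):
    """Block until browser.Busy is false, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while browser.Busy:
        if time.monotonic() >= deadline:
            raise TimeoutError("browser still busy after %.1f s" % timeout)
        # Sleep between polls instead of spinning in a tight loop.
        time.sleep(poll)
```

With the IE object from the session, the call would be
wait_until_idle(IE) right after IE.Navigate(...).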
Jan 16 '07 #6

Diez,

"DBR" == Diez B Roggisch wrote:

 >> To my knowledge, there is no way to test JavaScript other than to fire
 >> up a browser.
 >>
 >> So, you might check Selenium (http://www.openqa.org/selenium/) and its
 >> Python module.

 DBR> No use in that, as to be remote-controlled by python, selenium must be run
 DBR> on the server-site itself, due to JS security model restrictions.

Sorry, I missed the word 'spider' in the original post.

--
Andrey V Khavryuchenko
Software Development Company http://www.kds.com.ua/
Jan 16 '07 #7

ina

You want pamie, iec or ishybrowser. Pamie is probably the best choice
since it gets patches and updates on a regular basis.

http://pamie.sourceforge.net/

Jan 16 '07 #8

I tried to install PAMIE (but I have mostly used Python on Cygwin on
Windows). The section "What will you need to run PAMIE" says I will
need "Mark Hammonds Win32 All", which I cannot find. Can anyone tell me
how to install PAMIE? Do I need a Python for Windows that is different
from Cygwin's Python?

Thanks,
--j

Jan 22 '07 #9

My python2.5 installation on windows did not come with "win32com".
How do I install/get this module for windows?

Thanks,
--j

Jan 22 '07 #10

"John" <we**********@yahoo.com> wrote in message
news:11*********************@38g2000cwa.googlegroups.com...
> My python2.5 installation on windows did not come with "win32com".
> How do I install/get this module for windows?

Look for the pywin32 package at sourceforge.net
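
After installing, a quick way to confirm that the interpreter you are
actually running can see win32com is to ask the import system. This is a
minimal sketch in modern Python 3, not something from the original
reply; module_available() is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called name can be found."""
    return importlib.util.find_spec(name) is not None
```

module_available("win32com") should then return True in the Python
where pywin32 was installed. Each Python installation has its own
site-packages, so a False here usually means the package went into a
different installation.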

--
Gabriel Genellina
Jan 22 '07 #11

I tried it; it didn't work with the Python 2.5 MSI distribution that is
on python.org, but ActiveState Python worked. Now I can open IE using
COM. What I am trying to figure out is how to click an x,y coordinate
on a page in IE automatically using COM. How about typing something
automatically? Any ideas?

Thanks,
--j

Jan 22 '07 #12

"John" <we**********@yahoo.com> wrote:
> I tried it, didn't work with the python25 distribution msi file that is
> on python.org
> But activestate python worked. Now I can open IE using COM. What I am
> trying to figure out is how to click an x,y coordinate on a page in IE
> automatically using COM. How about typing something automatically...
> Any ideas?

Don't think about clicking a coordinate or typing something; think about
the actions on the page. For example, to fill in a field on a form you'll
want something like:

ie.document.forms[formname][fieldname].value = 'whatever'

To click a button, call its click method, e.g.:

submit = ie.document.forms[0]['submit']
submit.focus()
submit.click()

Check out the documentation at msdn.microsoft.com for the application,
document, form etc. objects. Generally speaking, anything you could have
done through JavaScript you should be able to do through automation, plus
a few other things that JavaScript might have blocked for security
reasons.
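
The two snippets above can be folded into one small helper. This is only
a sketch under assumptions: fill_and_submit() and its parameter names
are hypothetical, and it relies on the document object supporting
forms[...][...] indexing exactly as shown in the snippets above, which
has not been verified against a live IE instance here.

```python
def fill_and_submit(document, form, values, button="submit"):
    """Set form fields by name, then focus and click the named button.

    document: the browser's document object (e.g. the IE document).
    form:     form name or index, as used in document.forms[form].
    values:   mapping of field name -> text to assign to .value.
    button:   name of the control to click once the fields are set.
    """
    target = document.forms[form]
    for name, value in values.items():
        target[name].value = value
    control = target[button]
    control.focus()
    control.click()
```

For a login page this might be called as
fill_and_submit(ie.document, 0, {"login": "john", "passwd": "secret"}),
with the field names taken from the page's own HTML.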

Jan 22 '07 #13
