Screenscraping, in python, a web page that requires javascript?


Is there a method, with python, of screenscraping a web page, if that web
page uses javascript?

I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
HTML that doesn't have embedded javascript.

Thanks!

Aug 9 '07 #1
Dan Stromberg - Datallegro <ds********@datallegro.com> writes:
> Is there a method, with python, of screenscraping a web page, if that web
> page uses javascript?
Not pure CPython, no.

> I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
> HTML that doesn't have embedded javascript.
It's not that BeautifulSoup is unhappy with JS, it's just that there's
no support for executing the JS.
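
To illustrate the point, here is a minimal sketch using only the standard library's html.parser rather than BeautifulSoup itself. The page content, the "price" id, and the names DivTextCollector and parse_page are made up for the example. The parser reads the script as plain text and never runs it, so the div that the script would have filled in stays empty:

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty until a browser runs the script.
PAGE = """<html><body>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42";</script>
</body></html>"""


class DivTextCollector(HTMLParser):
    """Collects text found directly inside <div> and <script> elements."""

    def __init__(self):
        super().__init__()
        self._stack = []        # currently open tags, innermost last
        self.div_text = []      # text seen inside <div>
        self.script_text = []   # raw source seen inside <script>

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        if self._stack[-1] == "div":
            self.div_text.append(data)
        elif self._stack[-1] == "script":
            self.script_text.append(data)


def parse_page(html):
    """Return (text inside div, raw script source) without executing anything."""
    collector = DivTextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.div_text), "".join(collector.script_text)


div_text, script_text = parse_page(PAGE)
print(repr(div_text))     # the div is still empty
print(repr(script_text))  # the JS is visible only as unexecuted text
```

The same thing happens with any pure-Python HTML parser: the script source is there to read, but whatever it would have written into the page is not.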

There are some Java libraries that know how to execute JS embedded in
web pages, which could be used from Jython:

http://www.thefrontside.net/crosscheck

http://htmlunit.sourceforge.net/

http://httpunit.sourceforge.net/

You can also automate a browser, but that still seems to be painful in
one way or another.

John
Aug 9 '07 #2
