Screenscraping, in python, a web page that requires javascript?


Is there a method, with python, of screenscraping a web page, if that web
page uses javascript?

I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
HTML that doesn't have embedded javascript.

Thanks!

Aug 9 '07 #1
Dan Stromberg - Datallegro <ds********@datallegro.com> writes:
> Is there a method, with python, of screenscraping a web page, if that web
> page uses javascript?
Not pure CPython, no.

> I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
> HTML that doesn't have embedded javascript.
It's not that BeautifulSoup is unhappy with JS, it's just that there's
no support for executing the JS.
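
To make that concrete, here is a minimal sketch using the standard library's html.parser as a stand-in for BeautifulSoup (an assumption made so the example runs without third-party packages). The page content, the ScriptAwareParser class and the parse helper are all made up for illustration. The parser reads the script body as plain text and never runs it, so the value the script would write into the page is not in the parsed text.

```python
from html.parser import HTMLParser


class ScriptAwareParser(HTMLParser):
    """Collects visible text and <script> bodies separately (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []  # raw JavaScript source, never executed
        self.text = []     # text found outside <script> elements

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts.append(data)
        elif data.strip():
            self.text.append(data.strip())


# Hypothetical page: the real value is only filled in by the script at load time.
PAGE = """<html><body>
<div id="price">loading...</div>
<script>document.getElementById("price").innerHTML = "42";</script>
</body></html>"""


def parse(page):
    """Return (visible_text, script_sources) for an HTML string."""
    parser = ScriptAwareParser()
    parser.feed(page)
    parser.close()
    return parser.text, parser.scripts


text, scripts = parse(PAGE)
print(text)     # the placeholder text, not the value the script would set
print(scripts)  # the JavaScript source as an unexecuted string
```

The parser has no trouble with the script element; it simply hands back the source as a string, which is the behaviour described above.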

There are some Java libraries that know how to execute JS embedded in
web pages, which could be used from Jython:

http://www.thefrontside.net/crosscheck

http://htmlunit.sourceforge.net/

http://httpunit.sourceforge.net/

You can also automate a browser, but that still seems to be painful in
one way or another.
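
As a rough sketch of the browser-automation route, assuming Selenium WebDriver as the tool (the post does not name one): the browser runs the page's JavaScript, and the script reads back the resulting HTML. The helper name fetch_rendered_html and the URL in the comments are hypothetical. The function only relies on the driver's get() method and page_source attribute.

```python
def fetch_rendered_html(driver, url):
    """Load url in an automated browser and return the HTML after scripts have run.

    driver is assumed to be a Selenium WebDriver instance, or any object
    exposing get(url) and a page_source attribute.
    """
    driver.get(url)
    return driver.page_source


# Hypothetical usage; needs the selenium package and a browser driver installed:
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       html = fetch_rendered_html(driver, "http://example.com/")
#   finally:
#       driver.quit()
```

The returned string can then be handed to BeautifulSoup as usual. Pages that keep updating after the initial load may need an explicit wait before page_source is read.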
John
Aug 9 '07 #2
