
Screen-scraping, in Python, a web page that requires JavaScript?


Is there a way, in Python, to screen-scrape a web page if that page uses
JavaScript?

I know about BeautifulSoup, but AFAIK it is currently meant for HTML
that doesn't have embedded JavaScript.

Thanks!

Aug 9 '07 #1


Dan Stromberg - Datallegro <ds********@datallegro.com> writes:
> Is there a method, with python, of screenscraping a web page, if that web
> page uses javascript?
Not pure CPython, no.

> I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
> HTML that doesn't have embedded javascript.
It's not that BeautifulSoup is unhappy with JS; it's just that it has
no support for executing the JS.
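To illustrate that point: an HTML parser sees the contents of a <script> element only as text, so whatever the script would have written into the page is simply absent from the parsed document. This is a minimal Python 3 sketch under stated assumptions: the page (PAGE) and the GreetingParser class are made up for illustration, and it uses the standard library's html.parser module (which BeautifulSoup can also use as its backend, via BeautifulSoup(markup, "html.parser")) so it runs without extra dependencies.

```python
from html.parser import HTMLParser

# Hypothetical page: the text of #greeting is only filled in by JavaScript.
PAGE = """
<html><body>
  <div id="greeting"></div>
  <script>
    document.getElementById("greeting").textContent = "Hello from JS";
  </script>
</body></html>
"""


class GreetingParser(HTMLParser):
    """Hypothetical parser: collects text inside <div id="greeting">
    and the raw source of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_greeting = False
        self.in_script = False
        self.greeting_text = ""
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "div" and ("id", "greeting") in attrs:
            self.in_greeting = True
        elif tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_greeting = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; nothing is executed.
        if self.in_greeting:
            self.greeting_text += data
        elif self.in_script:
            self.scripts[-1] += data


parser = GreetingParser()
parser.feed(PAGE)
parser.close()

print(repr(parser.greeting_text))            # prints '' (the JS never ran)
print("Hello from JS" in parser.scripts[0])  # prints True (JS is just text)
```

Getting the "Hello from JS" text into the div requires something that actually runs the script, which is what the Java libraries and browser automation below provide.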

There are some Java libraries that know how to execute JS embedded in
web pages, which could be used from Jython:

http://www.thefrontside.net/crosscheck

http://htmlunit.sourceforge.net/

http://httpunit.sourceforge.net/

You can also automate a browser, but that still seems to be painful in
one way or another.

John
Aug 9 '07 #2
