Parsing HTML with JavaScript

I am trying to extract some information from a few web pages, and I was
using the HTMLParser module. It worked fine until it got to the
JavaScript, at which point it raised a parse error. Is there a good way to work
around this, or should I just preparse the file to remove the JavaScript
manually? This is my first Python program.
Jul 19 '05 #1
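Here is a minimal sketch of the "preparse to remove the JavaScript" approach, using only the standard `re` module. The names `SCRIPT_RE` and `strip_scripts` are illustrative, not part of any library. The pattern assumes each script is closed by a literal `</script>` tag, so a script whose string literals contain that text would defeat it.

```python
import re

# Matches one <script ...>...</script> element, case-insensitively and across
# newlines. The lazy .*? stops at the first closing </script> it finds.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> element removed."""
    return SCRIPT_RE.sub("", html)


page = """<html><head>
<script type="text/javascript">
  if (a < b && c > d) { document.write("<p>"); }
</script>
</head><body><p>Hello</p></body></html>"""

print(strip_scripts(page))
```

The cleaned string can then be fed to the parser as before. This is a blunt pre-filter rather than a tokenizer; inline handlers such as `onclick` attributes are left in place.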

<mt******@tacobell.land> wrote in message news:slrnd88pns.qsm.mt******@tacobell.land...
> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> javascript, at which it gave a parse error.


It's fairly common for pages with JavaScript to also be invalid HTML.
HTMLParser isn't an "ignore all errors silently and guess what it's
meant to be" parser. Unless you have known-good inputs, it's often
best to use an alternative. Some options are discussed in Uche's article
here: http://www.xml.com/pub/a/2004/09/08/pyxml.html
Jul 19 '05 #2
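For readers on Python 3: the module now lives at `html.parser`, it is far more forgiving of malformed markup than the version discussed here, and it passes the contents of `<script>` and `<style>` to `handle_data` as raw text instead of trying to parse them as tags. The sketch below (`TextExtractor` and `extract_text` are hypothetical names) collects visible text and skips script and style bodies:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-empty text chunks, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = """<html><head><title>Demo</title>
<script>if (a < b && c > d) { document.write("<b>hi</b>"); }</script>
</head><body><p>Hello &amp; welcome</p><p>Bye</body></html>"""

print(extract_text(page))
```

Note that the unclosed final `<p>` does not cause an error; the parser simply reports the data it sees.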
mt******@tacobell.land writes:
> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> javascript, at which it gave a parse error. Is there a good way to work
> around this or should I just preparse the file to remove the javascript
> manually? This is my first python program.


sgmllib is very similar to HTMLParser but doesn't break so easily
(though sgmllib has some problems with XHTML; swings and roundabouts).

Or, try BeautifulSoup.
John
Jul 19 '05 #3
