minidom and pulldom

I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?
Jul 18 '05 #1
4 Replies
pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.
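
To make the difference concrete, here is a minimal sketch (Python 3, standard library only). Both markup strings and the helper name `count_tables` are made up for illustration: the first string is typical tag soup with an unquoted attribute and unclosed tags, the second is the same content written as well-formed XML.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Made-up tag soup: unquoted attribute, unclosed <br>, <td> and <tr>.
SOUP = "<html><body><table border=1><tr><td>a<br><td>b</table></body></html>"

# The same content as well-formed XML.
WELL_FORMED = (
    '<html><body><table border="1"><tr><td>a<br/></td><td>b</td></tr>'
    "</table></body></html>"
)


def count_tables(markup):
    """Return the number of <table> elements, or None if minidom rejects the markup."""
    try:
        doc = minidom.parseString(markup)
    except ExpatError:
        # The underlying expat parser refuses anything that is not well-formed XML.
        return None
    return len(doc.getElementsByTagName("table"))


print(count_tables(SOUP))
print(count_tables(WELL_FORMED))
```

A browser would render both strings the same way; minidom only accepts the second.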

You should have better chances with parsing HTML using htmllib.

Regards,
Martin

Jul 18 '05 #2
ma****@v.loewis.de (Martin v. Löwis) writes:
> pi***@map.com (David Pinto) writes:
> > I'm trying to use either the minidom or pulldom to find table tags in
> > html web pages. I've tried parsing two web pages that show up fine in
> > [...]
> minidom is an XML parser. Most Web pages are not XML, but some form of
> HTML.
>
> You should have better chances with parsing HTML using htmllib.


Or, better, HTMLParser.HTMLParser -- works better with XHTML.
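
For readers on Python 3: that class now lives in `html.parser`, and `htmllib` no longer exists. A minimal sketch of finding table tags with it follows; the `TableFinder` subclass and the sample page are hypothetical, not taken from the original question.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "table":
            self.tables.append(dict(attrs))


# Made-up tag soup: unquoted attribute, unclosed <td> and <br>, uppercase tag.
PAGE = "<html><body><TABLE border=1><tr><td>a<br><td>b</TABLE></body></html>"

finder = TableFinder()
finder.feed(PAGE)
finder.close()
print(finder.tables)
```

The parser reports events rather than building a tree, so it does not care that the cells and rows are never closed.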

If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
John
Jul 18 '05 #3
pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


Didn't this get answered just the other day?

minidom and pulldom are built on XML parsers. HTML is not XML.

If you want a tree, I recommend pushing the HTML through mxTidy
or uTidylib, and feeding the resultant XHTML to the XML API of your
choice.
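
A rough sketch of the second half of that pipeline, using pulldom since the question asked about it. The cleanup step is not shown: the `XHTML` string is a hand-written stand-in for what a Tidy-style tool might produce, and `rows_per_table` is a hypothetical helper name.

```python
from xml.dom import pulldom

# Hand-written stand-in for the well-formed output of a Tidy-style cleanup step.
XHTML = (
    "<html><body>"
    "<p>intro</p>"
    '<table border="1">'
    "<tr><td>a</td><td>b</td></tr>"
    "<tr><td>c</td><td>d</td></tr>"
    "</table>"
    "<table><tr><td>e</td></tr></table>"
    "</body></html>"
)


def rows_per_table(xhtml):
    """Return the number of <tr> elements in each top-level <table>."""
    counts = []
    events = pulldom.parseString(xhtml)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "table":
            # Pull the whole table subtree into a small DOM fragment.
            events.expandNode(node)
            counts.append(len(node.getElementsByTagName("tr")))
    return counts


print(rows_per_table(XHTML))
```

Because `expandNode` consumes the events inside the table, a nested table would be counted as part of its parent rather than reported separately.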
John
Jul 18 '05 #4
jj*@pobox.com (John J. Lee) writes:
> [...]
> Didn't this get answered just the other day?
>
> [...]

Whoops, local news trouble, I guess.
John
Jul 18 '05 #5
