ma****@v.loewis.de (Martin v. Löwis) writes:
pi***@map.com (David Pinto) writes:
I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
[...]
minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.
You should have a better chance of parsing HTML with htmllib.
Or, better, HTMLParser.HTMLParser -- works better with XHTML.
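
For what it's worth, here is a rough sketch of the HTMLParser route for the original question (finding table tags). It is only an illustration: the names TableFinder and find_tables are made up for this example, and the import fallback assumes that newer Pythons expose the same class as html.parser.HTMLParser.

```python
try:
    from html.parser import HTMLParser  # newer Pythons (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # module name used in this thread


class TableFinder(HTMLParser):
    """Hypothetical helper: record the attributes of each <table> start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tables = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case, so <TABLE> matches too.
        if tag == "table":
            self.tables.append(dict(attrs))


def find_tables(html_text):
    """Return one dict of attributes per <table> tag found in html_text."""
    finder = TableFinder()
    finder.feed(html_text)
    finder.close()
    return finder.tables
```

Because the parser is event-based and does not require well-formed input, unclosed tags elsewhere in the page should not stop it from reporting the tables.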
If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
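
A minimal sketch of the second half of that plan, assuming the page has already been converted to well-formed XHTML. The Tidy step itself is left out because it needs one of the external packages mentioned above; the XHTML string below is a hand-written stand-in for its output, and table_ids is a made-up helper name.

```python
from xml.dom import minidom

# Stand-in for what a Tidy wrapper would hand back (assumption, not real output).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>demo</title></head>
<body>
<table id="outer"><tr><td>
<table id="inner"><tr><td>cell</td></tr></table>
</td></tr></table>
</body>
</html>"""


def table_ids(xhtml_text):
    """Return the id attribute of every <table> element, in document order."""
    doc = minidom.parseString(xhtml_text)
    try:
        tables = doc.getElementsByTagName("table")
        return [t.getAttribute("id") for t in tables]
    finally:
        doc.unlink()  # release the DOM tree once we are done with it
```

Calling table_ids(XHTML) walks the whole tree, so nested tables are found as well as top-level ones. The same input without the Tidy pass would typically make minidom raise a parse error, which is the problem described at the top of the thread.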
John