minidom and pulldom

I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?
Jul 18 '05 #1


pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.

You should have a better chance parsing HTML with htmllib.

Regards,
Martin

Jul 18 '05 #2
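Martin's point can be seen with a minimal sketch (Python 3, standard library only; the sample page and the helper try_minidom are made up for illustration). minidom hands the text to the expat XML parser, so an unclosed <br> inside a table cell, which any browser accepts, makes minidom.parseString raise xml.parsers.expat.ExpatError, while the same kind of markup written as well-formed XHTML parses without complaint.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Made-up HTML: <p> and <br> are never closed. HTML allows that, XML does not.
page = "<html><body><p>Prices<table><tr><td>1<br></td></tr></table></body></html>"


def try_minidom(text):
    """Hypothetical helper: return expat's error message, or None if text parses."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        return str(err)
    return None


error = try_minidom(page)
print(error)  # expat reports a mismatched tag when it reaches </td> while <br> is open
print(try_minidom("<html><body><p>ok<br/></p></body></html>"))  # well-formed, so None
```

pulldom fails the same way, just later: it is also driven by an XML parser, so the error surfaces while you iterate over the events rather than at the initial call.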

ma****@v.loewis.de (Martin v. Löwis) writes:
> pi***@map.com (David Pinto) writes:
>> I'm trying to use either the minidom or pulldom to find table tags in
>> html web pages. I've tried parsing two web pages that show up fine in
[...]
> minidom is an XML parser. Most Web pages are not XML, but some form of
> HTML.
>
> You should have a better chance parsing HTML with htmllib.


Or, better, HTMLParser.HTMLParser -- works better with XHTML.

If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
John
Jul 18 '05 #3
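For readers on Python 3: htmllib was removed, and the HTMLParser.HTMLParser class John mentions now lives at html.parser.HTMLParser. The sketch below uses it to do what the original question asked, namely collect every <table> start tag (with its attributes) and the text of its cells, on a made-up page that an XML parser would reject. The class name TableFinder and the sample page are hypothetical, and this is not a tree builder: HTMLParser only reports tags and text as it meets them.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Hypothetical helper: records each <table> start tag and the text of its cells."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one dict of attributes per <table> start tag
        self.cells = []        # raw text of every <td>/<th>, in document order
        self._in_cell = False  # True while incoming text belongs to the current cell

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TABLE> also arrives as "table".
        if tag == "table":
            self.tables.append(dict(attrs))
            self._in_cell = False
        elif tag in ("td", "th"):
            # A new cell implicitly ends any unclosed previous one, as in HTML.
            self.cells.append("")
            self._in_cell = True
        elif tag == "tr":
            self._in_cell = False

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


# Made-up page: unquoted attributes, unclosed <p>, <br>, <th>, <td> and <tr>,
# and an upper-case tag are all fine for a browser but not for an XML parser.
page = """<html><body>
<p>Unclosed paragraph<br>
<table border=1 id=prices>
<tr><th>Item<th>Price
<tr><td>Spam<td>1.50
</table>
<TABLE><tr><td>x</td></tr></TABLE>
</body></html>"""

finder = TableFinder()
finder.feed(page)
finder.close()  # flush any buffered text
print(finder.tables)                      # attributes of each table
print([c.strip() for c in finder.cells])  # cell text without surrounding whitespace
```

Tracking _in_cell by hand is the price of an event-based parser: since no tree is built, implied end tags such as a missing </td> have to be handled in the callbacks.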

pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


Didn't this get answered just the other day?

minidom and pulldom are built on XML parsers. HTML is not XML.

If you want a tree, I recommend pushing the HTML through mxTidy
or uTidylib, and feeding the resultant XHTML to the XML API of your
choice.
John
Jul 18 '05 #4
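The second half of that route (XHTML into an XML API) can be sketched with pulldom, the module the question started with. Running mxTidy or uTidylib is not shown: the xhtml string below is a hypothetical stand-in for their output. It declares the XHTML namespace, as XHTML documents normally do, so the sketch matches on namespace URI and localName rather than bare tag names; the helper names cell_text and find_tables are made up for illustration.

```python
from xml.dom import pulldom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for tidy output: mxTidy/uTidylib are NOT called here.
# It is well-formed XHTML, so any XML API can read it.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Intro<br/></p>
<table id="prices"><tr><td>Spam</td><td>1.50</td></tr></table>
<table id="empty"></table>
</body>
</html>"""


def cell_text(element):
    """Concatenate the direct text children of a DOM element."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def find_tables(text):
    """Return (id attribute, [cell texts]) for each XHTML <table>, via pulldom."""
    found = []
    events = pulldom.parseString(text)
    for event, node in events:
        if (event == pulldom.START_ELEMENT
                and node.namespaceURI == XHTML_NS
                and node.localName == "table"):
            # Build the DOM subtree for this table only; other elements
            # are just passed by as events.
            events.expandNode(node)
            cells = [cell_text(td) for td in node.getElementsByTagNameNS(XHTML_NS, "td")]
            found.append((node.getAttribute("id"), cells))
    return found


for table_id, cells in find_tables(xhtml):
    print(table_id, cells)
```

Expanding only the <table> nodes keeps memory use close to that of a pure event stream, which is the usual reason to pick pulldom over minidom.parse for large pages.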

jj*@pobox.com (John J. Lee) writes:
[...]
> Didn't this get answered just the other day?
[...]

Whoops, local news trouble, I guess.
John
Jul 18 '05 #5
