HTML Structure Extraction

Hi,

I'm going to write a program that extracts the structure of HTML
documents. The structure would take the form of a tree, with the tags
separated out and each start tag grouped with its matching end tag.
I am thinking of using htmllib.HTMLParser. Is it appropriate for this
application? If so, I believe I will need to keep track of the nesting
depth reached.

Any tips for such an application would be much appreciated.

Cheers,
Michael

Jul 18 '05 #1
1 Reply


<da*****@hotmail.com> wrote:
I'm going to write a program that extracts the structure of HTML
documents. The structure would take the form of a tree, with the tags
separated out and each start tag grouped with its matching end tag.
I am thinking of using htmllib.HTMLParser. Is it appropriate for this
application? If so, I believe I will need to keep track of the nesting
depth reached.


you mean like:

http://www.crummy.com/software/BeautifulSoup/
http://effbot.org/zone/element-tidylib.htm
http://utidylib.berlios.de/
http://www.xmlsoft.org/
http://effbot.org/zone/pythondoc-ele...reeBuilder.htm

and a few dozen others?
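
For comparison, the depth-tracking idea from the question can be sketched with the standard library alone. This is a minimal sketch, not how any of the packages above work internally. It assumes Python 3, where htmllib is no longer available, so it swaps in html.parser.HTMLParser instead. The names Node, StructureParser, extract_structure and outline are made up for illustration, and the void-tag set is an assumption taken from the HTML element list.

```python
from html.parser import HTMLParser

# Assumption: elements that never take an end tag, so they are not pushed
# on the stack of open elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the extracted structure tree (hypothetical helper)."""

    def __init__(self, tag, depth):
        self.tag = tag
        self.depth = depth
        self.children = []


class StructureParser(HTMLParser):
    """Builds a tree of tags; the stack length doubles as the depth."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", 0)
        self.stack = [self.root]  # currently open elements, outermost first

    def handle_starttag(self, tag, attrs):
        node = Node(tag, len(self.stack))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close everything back to the nearest matching open tag.
        # End tags with no matching open tag are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def extract_structure(html):
    """Parse an HTML string and return the root Node of its tag tree."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.root


def outline(node, lines=None):
    """Return the tree as a list of lines, indented two spaces per level."""
    if lines is None:
        lines = []
    lines.append("  " * node.depth + node.tag)
    for child in node.children:
        outline(child, lines)
    return lines


if __name__ == "__main__":
    doc = "<html><body><p>one<br>two</p></body></html>"
    print("\n".join(outline(extract_structure(doc))))
```

Note that this sketch makes no attempt to handle implied end tags (an unclosed li or p simply nests inside the previous one), which is one reason the more tolerant parsers listed above are usually the better choice for real-world HTML.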

</F>

Jul 18 '05 #2
