Bytes IT Community

Parse a table in an HTML page

Hi all,
I need to read and parse a table in an HTML page.

I'm using the following script:
http://trac.davidgrant.ca/browser/sr...TableParser.py

It works fine, except that it ignores the link in the href attribute.

Example:

String to parse:
<tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr>

Output:
[[['elog', 'normal text']]]

As you can see, it misses the href information.
How can I get the link target 'vaffa.html'?

Thanks,
Antonella
Oct 28 '08 #1
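One way to keep the href without switching libraries is to record it yourself while parsing. Below is a minimal sketch using Python 3's standard-library html.parser (the 2008-era equivalent was the Python 2 HTMLParser module). It is not the linked TableParser.py; the class name LinkTableParser and the (text, href) output shape are hypothetical choices for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class LinkTableParser(HTMLParser):
    """Hypothetical helper: collect rows as lists of (cell_text, href_or_None)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being built
        self._text = None  # text fragments of the current cell (None = not in a cell)
        self._href = None  # first href seen inside the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._text, self._href = [], None
        elif tag == "a" and self._text is not None and self._href is None:
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            if self._row is not None:
                self._row.append(("".join(self._text).strip(), self._href))
            self._text = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = LinkTableParser()
parser.feed("<tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr>")
parser.close()  # flush any buffered data
print(parser.rows)
# [[('elog', 'vaffa.html'), ('normal text', None)]]
```

Storing a (text, href) pair per cell keeps the link next to its visible text, so cells without a link simply carry None.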


Have you looked at Beautiful Soup?
http://www.crummy.com/software/BeautifulSoup/

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Oct 28 '08 #2
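With Beautiful Soup, the href is just an attribute lookup on the &lt;a&gt; tag inside each cell. A minimal sketch, assuming the current Beautiful Soup 4 package (imported as bs4, newer than the version available when this thread was written) with Python's built-in "html.parser" backend; the (text, href) tuple layout is an illustrative choice, not part of the library.

```python
from bs4 import BeautifulSoup

html = "<tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr>"
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    cells = []
    for td in tr.find_all(["td", "th"]):
        link = td.find("a", href=True)  # first <a> that has an href attribute
        cells.append((td.get_text(strip=True), link["href"] if link else None))
    rows.append(cells)

print(rows)
# [[('elog', 'vaffa.html'), ('normal text', None)]]
```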

You should try lxml.html. It gives you various tools, such as XPath, to look for
specific elements, plus helper functions to find the links in an HTML document.

http://codespeak.net/lxml/

Stefan
Oct 28 '08 #3
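The lxml.html approach can be sketched as follows, assuming a current lxml installation. XPath selects the rows and cells and reads the href attribute directly, and the iterlinks() helper lists every link in the parsed fragment. The row is wrapped in a &lt;table&gt; here (an assumption on my part) so the HTML parser keeps the &lt;tr&gt; in a valid context; the (text, href) tuples are an illustrative layout.

```python
import lxml.html

html = "<table><tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr></table>"
doc = lxml.html.fromstring(html)

rows = []
for tr in doc.xpath("//tr"):
    cells = []
    for td in tr.xpath("./td|./th"):
        hrefs = td.xpath(".//a/@href")  # attribute values come back as strings
        cells.append((td.text_content().strip(), hrefs[0] if hrefs else None))
    rows.append(cells)

print(rows)
# [[('elog', 'vaffa.html'), ('normal text', None)]]

# iterlinks() yields (element, attribute, link, pos) for every link it finds
for element, attribute, link, pos in doc.iterlinks():
    print(element.tag, attribute, link)
# a href vaffa.html
```

XPath is handy when you only want specific cells; iterlinks() is the quicker route when you just need every link on the page.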
