Parse a table in an HTML page

Hi all,
I need to read and parse a table in an HTML page.

I’m using the following script:
http://trac.davidgrant.ca/browser/sr...TableParser.py

It works fine, except that it drops the link in the href attribute.

Example:

String to parse:
<tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr>

Output:
[[['elog', 'normal text']]]

As you can see, the href information is missing.
How can I get the value 'vaffa.html'?

Thanks,
Antonella
Oct 28 '08 #1
Have you looked at Beautiful Soup?
http://www.crummy.com/software/BeautifulSoup/
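
A minimal sketch of how that could look, not a definitive solution. It assumes Python 3 and the current `bs4` package (not whatever release was current when this was posted), and `parse_rows` is a made-up helper name, not part of Beautiful Soup. It keeps each cell's text and also collects the href of every link inside the cell:

```python
from bs4 import BeautifulSoup

# The example row from the question.
html = "<tr><td><a href='vaffa.html'>elog</a></td><td>normal text</td></tr>"


def parse_rows(markup):
    """Return one list per <tr>; each cell is a (text, hrefs) tuple.

    Hypothetical helper for illustration only.
    """
    # "html.parser" is the parser from the standard library, so nothing
    # besides bs4 itself is required.
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = []
        for cell in tr.find_all(["td", "th"]):
            # Collect the href of every <a> in this cell that has one.
            hrefs = [a["href"] for a in cell.find_all("a", href=True)]
            cells.append((cell.get_text(strip=True), hrefs))
        rows.append(cells)
    return rows


rows = parse_rows(html)
print(rows)
```

For the example row this prints `[[('elog', ['vaffa.html']), ('normal text', [])]]`. Nested tables would need extra care, because `find_all` also descends into inner tables.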

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Oct 28 '08 #2
You should try lxml.html. It gives you various tools, such as XPath for finding
specific elements, and helper functions for finding the links in an HTML document.

http://codespeak.net/lxml/
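
A rough sketch of both ideas, assuming Python 3 and a reasonably recent lxml. The example row is wrapped in a `<table>` element here, which is an addition to the string from the question, and the variable names are only illustrative:

```python
import lxml.html

# The example row from the question, wrapped in <table> (an assumption
# made here so the fragment is a complete table).
html = (
    "<table><tr><td><a href='vaffa.html'>elog</a></td>"
    "<td>normal text</td></tr></table>"
)

# document_fromstring always returns the <html> root element, so
# absolute XPath expressions such as //tr work as expected.
doc = lxml.html.document_fromstring(html)

rows = []
for tr in doc.xpath("//tr"):
    cells = []
    for cell in tr.xpath("./td | ./th"):
        # XPath can select attribute values directly.
        hrefs = [str(h) for h in cell.xpath(".//a/@href")]
        cells.append((cell.text_content().strip(), hrefs))
    rows.append(cells)

# Helper function: iterlinks() yields (element, attribute, link, pos)
# for every link found in the document.
links = [link for (_element, _attribute, link, _pos) in doc.iterlinks()]

print(rows)
print(links)
```

The XPath loop keeps the table structure and attaches the hrefs to each cell, while `iterlinks()` just lists every link in the document regardless of where it sits.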

Stefan
Oct 28 '08 #3
