Hands-on HTML Table Parser/Matrix?

I often want to extract the contents of web tables. The formats are
mostly static, with simple text and numbers; other tags should be
stripped off. So a simple and fast approach would be fine.

Which of the available modules is the easiest to use, stable, and
up to date, with iterator access or, better, matrix access (without
needing callback functions or classes for basic tasks)?
Robert
Jul 6 '08 #1
There are a couple of HTML examples using Pyparsing here:

http://pyparsing.wikispaces.com/Examples
--Tim


Jul 6 '08 #2
Tim Cook wrote:
> On Sun, 2008-07-06 at 14:40 +0200, robert wrote:
>> I often want to extract the contents of web tables. The formats are
>> mostly static, with simple text and numbers; other tags should be
>> stripped off. So a simple and fast approach would be fine.
>>
>> Which of the available modules is the easiest to use, stable, and
>> up to date, with iterator access or, better, matrix access (without
>> needing callback functions or classes for basic tasks)?
>
> There are a couple of HTML examples using Pyparsing here:
>
> http://pyparsing.wikispaces.com/Examples

Hm, nothing there that is specific to HTML tables.

Meanwhile:

I dislike "ClientTable" (file-centric, and too many parsing errors on
real-world pages).

"TableParse" works. It is a very simple and fast 70-line module that
turns a table into a matrix with regular expressions and handles
stripping, cleaning, and HTML-entity conversion. I had quick hands-on
success with it. It deliberately does not separate nested tables or
handle similar complexities, but it works for simple hands-on tasks
in the real world.
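
For readers who want to see what a regexp-to-matrix approach of this kind can look like, here is a minimal sketch using only the standard library. It is not TableParse's actual code: the function name `table_to_matrix`, the three patterns, and the sample markup are all assumptions made for illustration. Like the approach described above, it does not handle nested tables.

```python
import re
from html import unescape

# Hypothetical patterns for illustration; this is NOT TableParse's code.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def table_to_matrix(html_text):
    """Return a list of rows, each a list of cleaned cell strings."""
    matrix = []
    for row_html in ROW_RE.findall(html_text):
        cells = []
        for cell_html in CELL_RE.findall(row_html):
            text = TAG_RE.sub("", cell_html)   # strip tags inside the cell
            text = unescape(text)              # convert HTML entities
            text = " ".join(text.split())      # collapse runs of whitespace
            cells.append(text)
        matrix.append(cells)
    return matrix


# Made-up sample table, used only to exercise the sketch.
SAMPLE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td><b>Nuts &amp; bolts</b></td><td> 1.50 </td></tr>
  <tr><td>Washers</td><td>0.25</td></tr>
</table>
"""

matrix = table_to_matrix(SAMPLE)
```

The result is a plain list of lists, so `matrix[1][0]` gives direct matrix-style access to a cell with no callbacks or classes involved. Regular expressions are fragile on malformed or nested markup, which is the trade-off for the small size.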
Robert
Jul 6 '08 #3
robert <no*****@no-spam-no-spam.invalid>:
> I often want to extract the contents of web tables. The formats are
> mostly static, with simple text and numbers; other tags should be
> stripped off. So a simple and fast approach would be fine.
>
> Which of the available modules is the easiest to use, stable, and
> up to date, with iterator access or, better, matrix access (without
> needing callback functions or classes for basic tasks)?

Not more than a handful of lines with lxml.html:

def htmltable2matrix(table):
    """Converts a html table to a matrix.

    :param table: The html table element
    :type table: An lxml element
    """
    matrix = []
    for row in table:
        matrix.append([e.text_content() for e in row])
    return matrix
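
A possible way to call this function, as a sketch: it assumes the third-party `lxml` package is installed, and the sample page and the `//table` XPath lookup are illustrative assumptions rather than part of the original post. The function is repeated so the snippet runs on its own.

```python
import lxml.html  # third-party package; assumed to be installed


def htmltable2matrix(table):
    """Convert an lxml table element to a list of rows of cell text."""
    matrix = []
    for row in table:  # direct children of <table>, here the <tr> elements
        matrix.append([e.text_content() for e in row])
    return matrix


# Made-up sample page, used only to exercise the function.
PAGE = (
    "<html><body><table>"
    "<tr><th>Item</th><th>Price</th></tr>"
    "<tr><td><b>Nuts &amp; bolts</b></td><td>1.50</td></tr>"
    "</table></body></html>"
)

doc = lxml.html.fromstring(PAGE)
table = doc.xpath("//table")[0]  # first table on the page
rows = htmltable2matrix(table)
```

Because the function iterates over the direct children of the table element, it expects the rows to sit directly under `<table>`. If the markup wraps rows in `<thead>` or `<tbody>`, iterating with `table.iter("tr")` instead is one option, at the cost of also picking up rows of nested tables.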

--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)
Jul 6 '08 #4
