How to read information from tables in HTML?

Hi, all

I've run into some trouble handling HTML files.

The HTML files contain JavaScript and some information stored in tables.
They don't seem to be well-formed: when I parse them with minidom, it
reports "mismatched tag".
How can I get the information out of those files? Is there a useful
library for this?

Many thanks ;-)
Aug 3 '07 #1
ZelluX wrote:
> I've run into some trouble handling HTML files.
>
> The HTML files contain JavaScript and some information stored in tables.
> They don't seem to be well-formed: when I parse them with minidom, it
> reports "mismatched tag".

minidom deals with XML. You're trying to read something that's (similar to)
HTML. HTML is much less strict.
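
To make the difference concrete, here is a minimal sketch using only the standard library. It feeds the same sloppy markup to minidom, which rejects it, and to `html.parser.HTMLParser`, which just reports tags as it meets them. The sample markup, the `TableCells` class and the helper names are made up for illustration, and the extractor assumes flat (non-nested) tables.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: unclosed <td>/<tr> tags and a bare "<" inside a script.
BROKEN_HTML = """<html><body>
<script>var x = 1 < 2;</script>
<table>
<tr><td>Name<td>Age</tr>
<tr><td>Ann</td><td>30</td>
<tr><td>Bob<br><td>25
</table>
</body></html>"""


def is_well_formed_xml(text):
    """Return True if minidom (an XML parser) accepts the text."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


class TableCells(HTMLParser):
    """Collect cell text row by row, tolerating missing end tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly closes the previous one.
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            # A new cell implicitly closes the previous one.
            self._end_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Text outside a cell (including script source) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def read_table_rows(text):
    parser = TableCells()
    parser.feed(text)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print(is_well_formed_xml(BROKEN_HTML))
    for row in read_table_rows(BROKEN_HTML):
        print(row)
```

This only shows the strict-versus-lenient behaviour; for real pages a dedicated HTML library is likely to cope with far more kinds of breakage than this hand-rolled handler.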

> How can I get the information out of those files? Is there a useful
> library for this?

BeautifulSoup or lxml.html (which supports the BeautifulSoup parser, btw).

Both can deal with broken HTML, but lxml.html has better support for cleaning
up HTML (e.g. removing JavaScript or embedded content) and for handling forms.
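
As a rough sketch of what reading table data could look like with lxml.html, assuming an lxml installation that ships the `lxml.html` package: the sample markup and the `table_rows` helper below are hypothetical, and the XPath expressions assume plain, non-nested tables.

```python
import lxml.html

# Hypothetical sample with unclosed cells and rows, plus inline JavaScript.
BROKEN_PAGE = """<html><body>
<script>var x = 1 < 2;</script>
<table>
<tr><td>Name<td>Age</tr>
<tr><td>Ann</td><td>30</td>
<tr><td>Bob<td>25
</table>
</body></html>"""


def table_rows(html_text):
    """Return the text of each table cell, one list per row."""
    # The HTML parser repairs the markup instead of raising an error.
    doc = lxml.html.fromstring(html_text)
    rows = []
    for tr in doc.xpath("//table//tr"):
        # text_content() flattens any markup nested inside the cell.
        cells = [cell.text_content().strip() for cell in tr.xpath("./td | ./th")]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in table_rows(BROKEN_PAGE):
        print(row)
```

The script block never shows up in the result because only text inside table cells is collected.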

http://codespeak.net/lxml/

The lxml.html package is not currently in an official lxml release, but you
can install it from SVN sources:

http://codespeak.net/svn/lxml/branch/html/

A release is expected soon.

Stefan
Aug 3 '07 #2
