Bytes | Software Development & Data Engineering Community
How to read information from tables in HTML?

Hi, all

I've run into trouble working with some HTML files.

The HTML files contain JavaScript and some information stored in tables.
They don't seem to be well-formed: when I parse them with minidom, it
reports "mismatched tag".
How can I get the information out of those files? Is there a useful
library for this?

Many thanks ;-)
Aug 3 '07 #1
ZelluX wrote:
> I've run into trouble working with some HTML files.
>
> The HTML files contain JavaScript and some information stored in tables.
> They don't seem to be well-formed: when I parse them with minidom, it
> reports "mismatched tag".
minidom deals with XML. You're trying to read something that's (similar to)
HTML. HTML is much less strict.
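
To see why minidom gives up, here is a minimal sketch that uses only the standard library. The sample markup and the helper name `xml_parse_error` are made up for illustration; the unclosed `<br>` and `<td>` stand in for the kind of markup that browsers accept but an XML parser rejects.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: <br> and <td> are never closed, which HTML tolerates.
SAMPLE = "<table><tr><td>spam<br><td>eggs</tr></table>"


def xml_parse_error(markup):
    """Return the XML parser's error message, or None if markup is well-formed."""
    try:
        minidom.parseString(markup)
    except ExpatError as exc:
        return str(exc)
    return None


# Prints a "mismatched tag" message with the position of the offending tag.
print(xml_parse_error(SAMPLE))
```

Closing every tag by hand would make the error go away, but for files you don't control it is usually easier to switch to a parser written for HTML.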

> How can I get the information out of those files? Is there a useful
> library for this?
BeautifulSoup or lxml.html (which supports the BeautifulSoup parser, btw).

Both can deal with broken HTML, but lxml.html has better support for cleaning
up HTML (e.g. removing JavaScript or embedded content) and for handling forms.
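
If installing a third-party library is not an option, the same idea can be sketched with the standard library's `html.parser`, which is also lenient about unclosed tags. Note that this is a stand-in and not BeautifulSoup or lxml.html; the class `TableExtractor`, the function `extract_rows` and the sample markup are hypothetical names for illustration. The sketch ignores text inside `<script>` and `<style>` and does not try to handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the open cell, or None
        self._skip = 0     # greater than 0 while inside <script>/<style>

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "tr":
            self._close_row()   # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # an unclosed previous cell ends here
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Keep text only when a cell is open and we are not inside a script.
        if self._cell is not None and not self._skip:
            self._cell.append(data)


def extract_rows(markup):
    """Parse possibly broken HTML and return its table rows as lists of strings."""
    parser = TableExtractor()
    parser.feed(markup)
    parser.close()
    return parser.rows


# Hypothetical sample: no closing </th>, </td> or </tr>, plus an inline script.
SAMPLE = """<table>
<tr><th>Name<th>Qty
<tr><td>spam<br><td>3
<tr><td>eggs<script>track();</script><td>12
</table>"""

print(extract_rows(SAMPLE))
# [['Name', 'Qty'], ['spam', '3'], ['eggs', '12']]
```

For anything beyond simple tables, one of the two libraries above is likely the better choice, since they build a full document tree you can search.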

http://codespeak.net/lxml/

The lxml.html package is not currently in an official lxml release, but you
can install it from SVN sources:

http://codespeak.net/svn/lxml/branch/html/

A release is expected soon.

Stefan
Aug 3 '07 #2
