473,503 Members | 5,495 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

HTML Parser in python

Hi,

Is there a HTML parser (not xml) in python?
I need a html parser which has the ability to handle mal-format html
pages.

Thank you.

Apr 6 '07 #1
2 1461
On Apr 6, 1:05 pm, "ying...@gmail.com" <ying...@gmail.comwrote:
Hi,

Is there a HTML parser (not xml) in python?
I need a html parser which has the ability to handle mal-format html
pages.

Thank you.
Yeah...it's called Beautiful Soup.

http://www.crummy.com/software/BeautifulSoup/

Mike

Apr 6 '07 #2
Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/

Works, well...beautifully.

Apr 6 '07 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
2437
by: Leif K-Brooks | last post by:
I'm writing a site with mod_python which will have, among other things, forums. I want to allow users to use some HTML (<em>, <strong>, <p>, etc.) on the forums, but I don't want to allow bad...
1
5268
by: Karalius, Joseph | last post by:
Can anyone explain what is happening here? I haven't found any useful info on Google yet. Thanks in advance. mmagnet:/home/jkaralius/src/zopeplone/Python-2.3.5 # make gcc -pthread -c...
18
14724
by: SwordAngel | last post by:
Hello, I'm looking for a program that converts characters of different encodings (such as EUC-JP, Big5, GB-18030, etc.) into HTML ampersand escape sequences. Anybody knows where I can find one? ...
8
3969
by: Brian | last post by:
I was wondering if anyone here on the group could point me in a direction that would expllaing how to use python to convert a tsv file to html. I have been searching for a resource but have only...
13
4163
by: DH | last post by:
Hi, I'm trying to strip the html and other useless junk from a html page.. Id like to create something like an automated text editor, where it takes the keywords from a txt file and removes them...
4
1911
by: Jackie | last post by:
Hi, all, I want to get the information of the professors (name,title) from the following link: "http://www.economics.utoronto.ca/index.php/index/person/faculty/" Ideally, I'd like to have a...
4
2368
by: Falcolas | last post by:
Does anybody know of a decent HTML parser for Jython? I have to do some screen scraping, and would rather use a tested module instead of rolling my own. Thanks! GP
11
4512
by: Tim Arnold | last post by:
hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to create CHM files. That application really hates xhtml, so I need to convert self-ending tags (e.g. <br />) to plain html...
5
7304
by: Johannes Bauer | last post by:
Hello group, I'm trying to use a htmllib.HTMLParser derivate class to parse a website which I fetched via httplib.HTTPConnection().request().getresponse().read(). Now the problem is: As soon as...
2
3595
by: Felipe De Bene | last post by:
I'm having problems parsing an HTML file with the following syntax : <TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'> <TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH> <TH...
0
7207
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7093
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
7291
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
1
7012
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
7468
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
5023
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
4690
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
3180
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
3171
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.