Manipulate HTML documents via data structure


Python provides HTML parsing through the
HTMLParser and htmllib modules.

For my application, I needed to search through
an HTML document in a nonlinear fashion and
dynamically change parts of the document. The
most logical way to do this is to translate HTML
back and forth to a data structure.

I wrote a module called htmldata, available from:

http://oregonstate.edu/~barnesc/htmldata/

Example:
>>> from htmldata import dumps, loads
>>> o = loads('<img src=hi.gif alt="blah">foo</body>')
>>> o
[('img', {'src':'hi.gif', 'alt':'blah'}), 'foo',
 ('/body', {})]
>>> dumps(o)
'<img alt="blah" src="hi.gif">foo</body>'
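
For readers without htmldata at hand, the same round trip can be approximated with the standard library. This is only a sketch and not how htmldata itself is implemented: to_items and to_html are hypothetical names, it assumes Python 3's html.parser, it silently drops comments and doctypes, and it keeps attributes in source order rather than reordering them as in the session above.

```python
from html import escape
from html.parser import HTMLParser


class _ListBuilder(HTMLParser):
    """Collects tags and text into one flat list, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; value is None for bare attributes.
        self.items.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Mirror the post's convention: closing tags are ('/name', {}).
        self.items.append(('/' + tag, {}))

    def handle_data(self, data):
        self.items.append(data)


def to_items(html_text):
    """Hypothetical stand-in for loads(): HTML text -> list of tags and strings."""
    parser = _ListBuilder()
    parser.feed(html_text)
    parser.close()
    return parser.items


def to_html(items):
    """Hypothetical stand-in for dumps(): list of tags and strings -> HTML text."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.append(escape(item, quote=False))
        else:
            tag, attrs = item
            parts = [tag]
            for name, value in attrs.items():
                parts.append(name if value is None
                             else '%s="%s"' % (name, escape(value)))
            out.append('<' + ' '.join(parts) + '>')
    return ''.join(out)


items = to_items('<img src=hi.gif alt="blah">foo</body>')
print(items)
print(to_html(items))
```

Because unmatched tags such as the stray </body> are stored as ordinary list entries, malformed input survives the round trip instead of being rejected.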

Pros:
* More powerful for HTML editing.
* Easy to reproduce the original document (or at least
a document that is equivalent HTML).
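
To illustrate the editing point, here is a small sketch that works on a hand-written list in the shape shown in the example above. set_attr is a hypothetical helper written for this illustration; it is not part of htmldata.

```python
def set_attr(items, tag, name, value):
    """Return a new list in which every `tag` element has `name` set to `value`."""
    result = []
    for item in items:
        if isinstance(item, tuple) and item[0] == tag:
            attrs = dict(item[1])  # copy, so the input list is left untouched
            attrs[name] = value
            item = (tag, attrs)
        result.append(item)
    return result


# Same structure as the list printed in the example session.
doc = [('img', {'src': 'hi.gif', 'alt': 'blah'}), 'foo', ('/body', {})]

# Elements can be located by index and visited in any order, which is
# awkward with a one-pass callback parser.
img_positions = [i for i, item in enumerate(doc)
                 if isinstance(item, tuple) and item[0] == 'img']

edited = set_attr(doc, 'img', 'alt', 'greeting')
print(img_positions)
print(edited)
```

Returning a new list rather than mutating in place is a design choice for the sketch; it keeps the original document available for comparison.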

Cons:
* Less user-friendly than the HTMLParser module.

I tested it on several popular sites. Feedback, bug
reports, etc. are appreciated.

- Connelly Barnes

Jul 18 '05 #1

