Bytes | Software Development & Data Engineering Community

convert html

Hi:

I want to convert HTML to XML.

I am doing this:

import sys

from xml.dom.ext.reader import HtmlLib
from xml.dom import ext, Node
from xml.dom.NodeFilter import NodeFilter

def main(argv):
    # build a DOM tree from the HTML file named on the command line
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri(argv[1])

    # getTableInfo() is my own helper (not shown here)
    info = getTableInfo(dom_object, 9)

    # free the DOM tree when done
    reader.releaseNode(dom_object)

if __name__ == "__main__":
    main(sys.argv)

This takes almost a minute on a 6,000-line HTML file on a 700 MHz Pentium III with 256 MB of RAM, which is too slow.

Can you suggest another way of doing this in Python?

Jul 18 '05 #1

<je*******@rogers.com> wrote in message news:ma*************************************@python.org...
> I want to convert html to xml.
>
> I am doing this: .... Can you suggest another way of doing this in Python?


I haven't benchmarked it, but I would expect HTML Tidy (or µTidylib, its
Python wrapper) to be as good an approach as any, particularly if your
HTML source is a bit rough.
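HTML Tidy is a separate C library, so as a rough stdlib-only alternative (a different technique from Tidy, not its API), here is a minimal sketch that streams the HTML through Python 3's `html.parser` and re-emits it as well-formed XML without building a DOM. The names `HtmlToXml`, `html_to_xml` and `VOID_ELEMENTS` are hypothetical. The sketch assumes the input is already a decoded `str`, drops comments and the DOCTYPE, and only repairs markup by closing elements left open; it does none of Tidy's deeper cleanup of misnested tags.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag (assumed list, per HTML5)
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-emit HTML as well-formed XML in one pass."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments, joined at the end
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attributes, which XML forbids;
        # valueless attributes such as "checked" become checked="checked"
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in dict(attrs).items()
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.stack:
            return  # void or stray end tag: nothing to close
        # close elements left open inside this one (e.g. an unclosed <p>)
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # entities were already decoded by the parser; re-escape for XML
        self.out.append(escape(data))


def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    # close anything still open at end of input
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For example, `html_to_xml("<p>a<br>b")` gives `<p>a<br/>b</p>`. Because nothing is kept in memory except the output fragments and a stack of open tag names, this should avoid the cost of building a full DOM tree, although I haven't measured it against 4DOM. If you still need a tree afterwards, the output string can be handed to `xml.dom.minidom.parseString` or `xml.etree.ElementTree.fromstring`.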
Jul 18 '05 #2
