High-level, fast XML package for Python?

I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

Thanks.

Sep 15 '06 #1
Gleb Rybkin wrote:
> I searched online, but couldn't really find a standard package for
> working with Python and XML -- everybody seems to suggest different
> ones.
>
> Is there a standard xml package for Python? Preferably high-level, fast
> and that can parse in-file, not in-memory since I have to deal with
> potentially MBs of data.
cElementTree and lxml (which is API-compatible with the former). cElementTree
has an incremental parser, which allows larger-than-memory files to be
processed.
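A minimal sketch of incremental parsing with ElementTree, assuming modern Python 3, where `xml.etree.ElementTree` picks up the C accelerator automatically, and using the `XMLPullParser` class, which was added in Python 3.4 and so postdates this thread. The function name `count_elements` and the `<log>`/`<entry>` document are made up for illustration; the list of chunks stands in for successive `read()` calls on a large file.

```python
import xml.etree.ElementTree as ET


def count_elements(chunks, tag):
    """Count <tag> elements, feeding the document to the parser chunk by chunk.

    count_elements is a hypothetical helper, not part of any library.
    """
    parser = ET.XMLPullParser(events=("end",))
    count = 0

    def drain():
        nonlocal count
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                count += 1
                elem.clear()  # drop the finished element's text and children

    for chunk in chunks:
        parser.feed(chunk)  # only the current chunk has to be held by the caller
        drain()
    parser.close()  # signal end of input; this may emit the final events
    drain()
    return count


# Hypothetical document, split into 64-character pieces to simulate file reads.
doc = "<log>" + "<entry>x</entry>" * 1000 + "</log>"
chunks = [doc[i:i + 64] for i in range(0, len(doc), 64)]
print(count_elements(chunks, "entry"))  # 1000
```

Clearing each element keeps its content from piling up, although the root still holds the emptied children; clearing the root too keeps memory flat on very large files.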

Diez
Sep 15 '06 #2
Diez B. Roggisch wrote:
> cElementTree and lxml (which is API-compatible with the former). cElementTree
> has an incremental parser, which allows larger-than-memory files to be
> processed.
In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and xml.etree.ElementTree.
So learning them now is a great idea.
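For code meant to run on both older and newer interpreters, a common idiom (sketched here under the assumption that you want the C version whenever it exists) is to try the accelerated module first and fall back to the plain name. `xml.etree.cElementTree` was deprecated in Python 3.3 and is absent from recent Python 3 releases, where `xml.etree.ElementTree` uses the C accelerator on its own. The sample document is hypothetical.

```python
try:
    # Accelerated C module: in the stdlib since Python 2.5,
    # deprecated in 3.3 and missing from recent 3.x releases
    import xml.etree.cElementTree as ET
except ImportError:
    # Plain name; since Python 3.3 it uses the C accelerator automatically
    import xml.etree.ElementTree as ET

# Hypothetical sample document
root = ET.fromstring("<config><name>demo</name></config>")
print(root.find("name").text)  # demo
```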

STeVe
Sep 15 '06 #3
Okay, thanks!

Sep 15 '06 #4
Hi Gleb,

Gleb Rybkin wrote:
> I searched online, but couldn't really find a standard package for
> working with Python and XML -- everybody seems to suggest different
> ones.
>
> Is there a standard xml package for Python? Preferably high-level, fast
> and that can parse in-file, not in-memory since I have to deal with
> potentially MBs of data.
>
> Thanks.
Another option is Amara, which is also quite high-level and also allows
incremental parsing. I would say Amara is somewhat higher-level than
ElementTree, since it lets you access your XML nodes as Python
objects (with some extra attributes and a few minor warts) and also
gives you XPath expressions on the object tree.

URL: http://uche.ogbuji.net/tech/4suite/amara/

The best version currently available is 1.1.7.

It does work with py2exe on Windows if the need ever arises, but you
have to fiddle with it a bit (ask on this list for details if you ever
need to do that).

Cheers,

--Tim

Sep 15 '06 #5
Tim N. van der Leeuw wrote:
> Another option is Amara; also quite high-level and also allows for
> incremental parsing. I would say Amara is somewhat higher level than
> ElementTree since it allows you to access your XML nodes as Python
> objects (with some extra attributes and some minor warts), as well as
> giving you XPath expressions on the object tree.
Then you should definitely give lxml.objectify a try. It combines the ET API
with lxml's feature set (XPath, RelaxNG, XSLT, ...) and hides the actual
XML behind a Python object interface, so you get everything at the same time.

http://codespeak.net/lxml/objectify.html

It's part of the lxml distribution:
http://codespeak.net/lxml/

Stefan
Sep 16 '06 #6
Steven Bethard <st************@gmail.com> writes:
[...]
> In Python 2.5, cElementTree and ElementTree will be available in the
> standard library as xml.etree.cElementTree and
> xml.etree.ElementTree. So learning them now is a great idea.
Only some of the original ElementTree software is going into 2.5,
apparently, so you can get more from the effbot.org site than from
just downloading Python 2.5. Future Python releases will probably
add more of Fredrik's XML code.
John
Sep 17 '06 #7
Gleb Rybkin wrote:
> I searched online, but couldn't really find a standard package for
> working with Python and XML -- everybody seems to suggest different
> ones.
>
> Is there a standard xml package for Python? Preferably high-level, fast
> and that can parse in-file, not in-memory since I have to deal with
> potentially MBs of data.
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.
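A minimal sketch of that incremental mode, assuming Python 3: `xml.sax.make_parser()` returns the expat-based reader, which implements the `IncrementalParser` interface (`feed()` and `close()`), so a large file can be pushed through it in chunks. The handler class `ItemCollector`, the helper `parse_in_chunks`, and the sample document are hypothetical.

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the 'id' attribute of every <item>."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        # Called once per start tag; no tree is built
        if name == "item":
            self.ids.append(attrs.get("id"))


def parse_in_chunks(chunks):
    handler = ItemCollector()
    parser = xml.sax.make_parser()  # expat reader, supports feed()/close()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # events fire as soon as enough input has arrived
    parser.close()  # finish parsing and check well-formedness
    return handler.ids


# Hypothetical document, split into 16-byte pieces to simulate file reads
doc = b"<items>" + b"".join(b'<item id="%d"/>' % i for i in range(5)) + b"</items>"
chunks = [doc[i:i + 16] for i in range(0, len(doc), 16)]
print(parse_in_chunks(chunks))  # ['0', '1', '2', '3', '4']
```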

Regards,
Martin
Sep 17 '06 #8
Martin v. Löwis wrote:
> It seems that everybody is proposing libraries that use in-memory
> representations. There is a standard xml package for Python, it's
> called "xml" (and comes with the standard library). It contains a
> SAX interface, xml.sax, which can parse files incrementally.
To use ElementTree and keep your memory consumption down, consider using
the iterparse function:

http://effbot.org/zone/element-iterparse.htm

Then you can get more SAX-like memory consumption while still enjoying
the high-level interface of ElementTree.
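A common iterparse idiom can be sketched as follows, assuming Python 3's `xml.etree.ElementTree`; the `<record>`/`<amount>` structure and the `sum_amounts` name are hypothetical. Grabbing the root from the first `start` event and clearing it after each processed record means only the record currently being read stays in memory.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source):
    """Sum <amount> values of <record> elements (hypothetical schema)."""
    total = 0
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's "start"
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            total += int(elem.findtext("amount"))
            root.clear()  # discard records that have already been processed
    return total


# Hypothetical input; a real program would pass an open file or a filename
data = b"<records>" + b"".join(
    b"<record><amount>%d</amount></record>" % i for i in range(1, 101)
) + b"</records>"
print(sum_amounts(io.BytesIO(data)))  # 5050
```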

STeVe
Sep 17 '06 #9
Martin v. Löwis wrote:
> It seems that everybody is proposing libraries that use in-memory
> representations. There is a standard xml package for Python, it's
> called "xml" (and comes with the standard library). It contains a
> SAX interface, xml.sax, which can parse files incrementally.
What about xml.dom.pulldom? It quite possibly resembles ElementTree's
iterparse, or at least promotes event-style handling of XML information
using some kind of mainloop...

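import xml.dom.pulldom

s = "<doc><item id='1'>text</item></doc>"  # sample input for illustration

for etype, node in xml.dom.pulldom.parseString(s):
    if etype == xml.dom.pulldom.START_ELEMENT:
        # Python 3 print(); node.attributes is a NamedNodeMap
        print(node.nodeName, dict(node.attributes.items()))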

...instead of callbacks (as happens with SAX):

import xml.sax

class CH(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        # Called by the parser for every start tag
        print(name, dict(attrs.items()))

xml.sax.parseString(s.encode(), CH())

Paul

Sep 19 '06 #10
Martin v. Löwis wrote:
> It seems that everybody is proposing libraries that use in-memory
> representations. There is a standard xml package for Python, it's
> called "xml" (and comes with the standard library). It contains a
> SAX interface, xml.sax, which can parse files incrementally.
Note that the requirements included "high-level" and "fast"; SAX is
low-level and error-prone, and once you've finally fixed all the remaining
bugs in your state machine, it's not really that fast.

</F>

Sep 20 '06 #11
Paul Boddie wrote:
>> It seems that everybody is proposing libraries that use in-memory
>> representations. There is a standard xml package for Python, it's
>> called "xml" (and comes with the standard library). It contains a
>> SAX interface, xml.sax, which can parse files incrementally.
>
> What about xml.dom.pulldom? It quite possibly resembles ElementTree's
> iterparse, or at least promotes event-style handling of XML information
> using some kind of mainloop...
Right; that also meets the criteria of being standard and not
in-memory (nobody had mentioned it so far).

Whether it is high-level and fast is in the eyes of the beholder
(as they are relative, rather than absolute properties).
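One feature that arguably makes `xml.dom.pulldom` higher-level than raw SAX is `expandNode()`, which turns the events for a single element into a small DOM subtree while the rest of the document is still streamed. A minimal sketch, assuming Python 3; the `<person>`/`<name>` document and the `names_of_people` function are hypothetical.

```python
import io
from xml.dom import pulldom


def names_of_people(stream):
    """Expand each <person> into a DOM subtree and read its <name> (hypothetical schema)."""
    names = []
    events = pulldom.parse(stream)  # accepts a filename or a file-like object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "person":
            events.expandNode(node)  # build the DOM for this <person> only
            name = node.getElementsByTagName("name")[0]
            names.append(name.firstChild.data)
    return names


# Hypothetical input; a real program would pass an open file
data = b"<people><person><name>Ann</name></person><person><name>Bob</name></person></people>"
print(names_of_people(io.BytesIO(data)))  # ['Ann', 'Bob']
```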

Regards,
Martin
Sep 20 '06 #12
