What XML lib to use?

I'm confused, I want to read/write XML files but I don't really understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library to
use?

Sep 13 '05 #1
You can try xml.dom and xml.sax. Both are built into the Python
standard library, and you can read and write XML files with them very
easily. There are a number of third-party modules for Python that
manipulate XML, but these are the basic ones.
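
As a rough illustration of the DOM route, here is a minimal sketch using xml.dom.minidom from the standard library. The document contents (a "library" of "book" elements) are made up for the example and are not from the original question.

```python
from xml.dom import minidom

# Parse a small document from a string; minidom.parse() would take a
# filename or file object instead. The sample data is hypothetical.
doc = minidom.parseString(
    "<library><book id='1'>Dive Into Python</book></library>"
)

# Read: DOM-style navigation through nodes and attributes.
book = doc.getElementsByTagName("book")[0]
title = book.firstChild.data
book_id = book.getAttribute("id")

# Write: create a new element, give it an attribute and a text child,
# and attach it to the root element.
new_book = doc.createElement("book")
new_book.setAttribute("id", "2")
new_book.appendChild(doc.createTextNode("Text Processing in Python"))
doc.documentElement.appendChild(new_book)

# Serialize the root element back to a string.
xml_text = doc.documentElement.toxml()
print(xml_text)
```

Anyone who has used a DOM in Java or JavaScript should recognize the method names; the verbosity is the usual trade-off.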

Sep 13 '05 #2
On Tue, 13 Sep 2005 19:23:50 +0200
Kalle Anke <sk*****@gmail.com> wrote:
I'm confused, I want to read/write XML files but I don't really
understand what library to use.

I've used DOM-based libraries in other languages, is PyXML the
library to use?


PyXML will do the job. I'm currently using it in one of my projects.
4suite has their cDomlette also, which provides a high-speed
lightweight DOM implementation. Although, if you don't need
canonicalization, XPath, or those kinds of extensions, you can probably
get by with minidom and the other code included in the standard Python
distribution, and avoid the need to install additional libraries.

I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).

-Michael


Sep 13 '05 #4
me******@scl.ameslab.gov wrote:
I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).


ElementTree/cElementTree is really easy to use and Pythonic.
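
To give a feel for the API, here is a small sketch. It assumes a Python version where ElementTree ships in the standard library as xml.etree.ElementTree (it was added in Python 2.5; at the time of this thread it was a separate download). The sample data is invented for the example.

```python
import xml.etree.ElementTree as ET

# Parse from a string; ET.parse("file.xml").getroot() works for files.
root = ET.fromstring(
    "<library><book id='1'>Dive Into Python</book></library>"
)

# Read: elements behave like lists of children, attributes like dicts.
titles = [book.text for book in root.findall("book")]
first_id = root.find("book").get("id")

# Write: SubElement creates a child and attaches it in one step.
new_book = ET.SubElement(root, "book", id="2")
new_book.text = "Text Processing in Python"

# Serialize to a str (encoding="unicode" returns text, not bytes).
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```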
--
Michael Hoffman
Sep 13 '05 #5
"""
I'm confused, I want to read/write XML files but I don't really
understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library
to
use?
"""

There are many options (some say too many):

http://www.xml.com/pub/a/2004/10/13/py-xml.html

Try out Amara Bindery, if you like:

http://uche.ogbuji.net/tech/4suite/amara/

Browsing the manual should let you know whether you like the API:

http://uche.ogbuji.net/tech/4suite/amara/manual

BTW, lots on Python/XML processing covered in my column, including
other options besides Amara:

http://www.xml.com/pub/at/24

--
Uche
http://copia.ogbuji.net

Sep 13 '05 #6
me******@scl.ameslab.gov wrote:
I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).


You can use lxml which is an implementation of the ElementTree API using
libxml2 and libxslt under the covers for greater standards compliance
including c14n. I've been using it extensively recently and highly
recommend it.

http://codespeak.net/lxml
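
To show what canonicalization buys you, here is a minimal sketch. Note the swap: it does not use lxml, but the canonicalize() function that the standard library's xml.etree.ElementTree gained in Python 3.8 (it implements C14N 2.0), so that the example runs without extra installs. The sample documents are made up.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same logical document: different quote
# style, attribute order, spacing, and empty-element syntax.
doc_a = "<root b='2' a='1'><child/></root>"
doc_b = '<root a="1"   b="2"><child></child></root>'

# canonicalize() returns the canonical form as a string when no
# output file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# After canonicalization the two forms can be compared byte for byte,
# which is what signing and diffing tools rely on.
print(canon_a == canon_b)
```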

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Sep 14 '05 #7
Kalle Anke <sk*****@gmail.com> writes:
I'm confused, I want to read/write XML files but I don't really understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library to
use?


It depends. Just as there's no best car ("best" depends heavily on how the
vehicle is used, as well as on personal preference), there's no best XML
module either. Some seem very good in many respects, though :)

I recommend using EffBot's ElementTree. It's very simple to use (you get to do
stuff without thinking about the delicacies of parsing/generating), and it is
_fast_. Now let me repeat the last part - normally speed is of no concern
with the computers we have nowadays, but using e.g. xml.dom.minidom to process
files of size > 10 MB, your system might get very sluggish unless you are
quite careful in traversing the parse tree (and maybe even then).
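
For large files, one common way to keep memory in check is incremental parsing. The sketch below uses iterparse() from the standard library's xml.etree.ElementTree (an assumption about the Python version, as ElementTree was a separate package when this was written). The "record" tag and the count_records helper are hypothetical names chosen for the example.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count elements named `tag` without keeping their content in memory."""
    count = 0
    # "end" events fire once an element has been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's text and children once processed.
            elem.clear()
    return count


# A tiny in-memory stand-in for a large file on disk.
sample = io.BytesIO(
    b"<log><record>a</record><record>b</record><record>c</record></log>"
)
total = count_records(sample)
print(total)
```

The cleared elements still hang off the root as empty shells, so for truly huge inputs you may also want to clear the root periodically.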

Using a SAX / full-compliant DOM parser could be good for learning things,
though. As I said, depends a lot.
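
For comparison, here is roughly what the event-driven SAX style looks like with the standard library's xml.sax. The TitleCollector class and the sample document are invented for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <book> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <book>"

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleCollector()
xml.sax.parseString(
    b"<library><book>Dive Into Python</book>"
    b"<book>Text Processing in Python</book></library>",
    handler,
)
print(handler.titles)
```

You have to track state yourself, which is why it is a good learning exercise and a less pleasant everyday tool.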

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Sep 14 '05 #8
Kalle Anke wrote:
I've used DOM-based libraries in other languages, is PyXML the library to
use?


I would start off with minidom; a tutorial I once wrote can be found
here:

http://www.boddie.org.uk/python/XML_intro.html

That should demonstrate some minor differences between PyXML-style DOMs
and those for languages like Java. Should you need a faster DOM
implementation, you might want to look at libxml2dom:

http://www.boddie.org.uk/python/libxml2dom.html

It's a pure Python module that uses the lower levels of libxml2's own
Python bindings, so if you already have libxml2 plus bindings
installed, it should be very convenient. Although libxml2dom isn't by
any means complete, I do use it myself and would welcome any feedback
which would make it better.

Paul

Sep 14 '05 #9
One more vote for Amara! I think it's unmatched for ease of use, if you
already know Python.

Ron

Sep 14 '05 #10
Edvard Majakari wrote:
Using a SAX / full-compliant DOM parser could be good for learning things,
though. As I said, depends a lot.


since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...

</F>

Sep 14 '05 #11
Fredrik Lundh wrote:
since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...


While I doubt that anyone would really recommend exclusive DOM API
usage for significant XML processing tasks (or for anything other than
educational purposes), I think you're overstating some case or other
here. Interoperability is a pretty sane argument for using DOM-based
technologies, whether that be skills interoperability (possibly related
to job security) or just using many different technologies together.
For example, PyQt and PyKDE expose various DOMs of the purest
"non-Pythonic" kind; Mozilla exposes DOMs for XML and HTML; adding a
layer of PyXML varnish to any of these isn't a huge job. Using
different technologies with the same foundations shouldn't have to
involve breaking open yet another API for the "fun" of it.

Paul

Sep 14 '05 #12
Paul Boddie wrote:
For example, PyQt and PyKDE expose various DOMs of the purest
"non-Pythonic" kind; Mozilla exposes DOMs for XML and HTML


I didn't see anything about manipulating an application's internal
data structures in the original post, but I might have missed
something.

For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.

</F>

Sep 14 '05 #13
Fredrik Lundh wrote:
since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...


I can see two reasons (sane or not):
- You're familiar with those APIs and use them in e.g. C++.
- You don't want to rely on third party libraries unless you must.

In many cases, xml.dom.minidom etc will do fine...

Having said this, I must admit that I much prefer Fredrik's ElementTree.

Although Fredrik (did you know that?) had a part in Carmen's move to
Python around five years ago, ElementTree isn't installed by default
on our machines, and I think our CM people are happier if we use as
few third party libraries as possible...
Sep 14 '05 #14
Fredrik Lundh wrote:
Paul Boddie wrote:
[On interoperability]
For example, PyQt and PyKDE expose various DOMs of the purest
"non-Pythonic" kind; Mozilla exposes DOMs for XML and HTML


I didn't see anything about manipulating an application's internal
data structures in the original post, but I might have missed
something.


Well, manipulating documents in Mozilla and KHTML are just examples, as
I pointed out, and whilst I'd agree that the in-process restrictions
Mozilla appears to place on full participants in its component system
do kind of mean that the Mozilla DOM is an "application's internal
data structure", the opportunities are more open for KHTML in that one
isn't limited to just automating some application. Moreover, the XML
APIs exposed by PyQt are also available for general XML processing,
whether you regard them as performant or not.

For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.


Your point was that "there are no *sane* reasons to use SAX or DOM in
Python", which actually isn't true. Sure, processing a 100GB XML
document using the DOM isn't a sensible strategy (with this generation
of hardware!), and SAX isn't necessarily the most elegant way of
expressing the processing logic, and other tools and APIs exist to
perform such tasks more efficiently and elegantly, but then "to
read/write XML files" leaves the questioner's field of endeavour pretty
much open to interpretation. Somewhere amongst the many fields of
endeavour there are places where the DOM (whilst not as "Pythonic" as
some might like) certainly is a valid choice, possibly because it's the
only choice - all thanks to interoperability, as I said. ;-)

Paul

Sep 14 '05 #15
Paul Boddie wrote:
For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.


Your point was that "there are no *sane* reasons to use SAX or DOM in
Python", which actually isn't true.


I replied in the context of this thread. If you chose to ignore the
context, or if you're not capable of reading and understanding the
posts you're replying to, that's your problem.

</F>

Sep 14 '05 #16
Fredrik Lundh wrote:
Paul Boddie wrote:
For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.


Your point was that "there are no *sane* reasons to use SAX or DOM in
Python", which actually isn't true.


I replied in the context of this thread. If you chose to ignore the
context, or if you're not capable of reading and understanding the
posts you're replying to, that's your problem.


His interpretation of your words is a perfectly valid one even in the
context of this thread. "in Python" explicitly provides a context for
the rest of the sentence. In English, at least, it is perfectly
reasonable to presume that explicit contexts override implicit ones.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Sep 14 '05 #17
Robert Kern wrote:
His interpretation of your words is a perfectly valid one even in the
context of this thread. "in Python" explicitly provides a context for
the rest of the sentence.

Exactly. "in Python", not "in an application with an existing API".

(also, if the OP had been forced to use an existing API by external
constraints, don't you think he would have mentioned it?)

In English, at least, it is perfectly reasonable to presume that explicit
contexts override implicit ones.


Letting a part of a sentence override the context of the discussion is
perhaps popular in certain tabloid journalist circles, and among slashdot
editors and US political bloggers, but most people do in fact have
a context buffer memory that can hold more than a few words. (how
come you're so sure I wasn't talking about, say, the Python Lisp
compiler? or the Monty Python sketch with the sadistic Belgian
instrument-making monk? or a Harry Potter book?)

I know what I meant. You know what I meant. Paul knows what I
meant. If you still want to play the "but there is a way to interpret
this in another way" game, file a bug report against the python.org
"what is python?" summary page.

</F>

Sep 15 '05 #18
Fredrik Lundh wrote:
Robert Kern wrote:
His interpretation of your words is a perfectly valid one even in the
context of this thread. "in Python" explicitly provides a context for
the rest of the sentence.
Exactly. "in Python", not "in an application with an existing API".


Well, if you're still not convinced that DOMs exist outside monolithic
applications... ;-)

In English, at least, it is perfectly reasonable to presume that explicit
contexts override implicit ones.


Letting a part of a sentence override the context of the discussion is
perhaps popular in certain tabloid journalist circles, and among slashdot
editors and US political bloggers, but most people do in fact have
a context buffer memory that can hold more than a few words.


I don't really see how an absolutely-qualified complete sentence...

Q> since there are no *sane* reasons to use SAX or DOM in Python,
Q> that's mainly a job security issue...

...can somehow be qualified by the preceding discussion, when the only
ambiguous context is "that" == "good for learning things". It's an
absolute statement of opinion! (And yes, I think we all agree on what
"Python" is.)

I know what I meant. You know what I meant. Paul knows what I
meant.


I actually do know what you mean, but that doesn't mean that the
statement in question wasn't misleading, especially to people who
aren't familiar with or accustomed to discovering this missing context.
It's like saying "there are no *sane* reasons to drive a Volvo",
possibly in a follow-up to a discussion about how bad Volvos are
compared to Saabs. There may well be a sane reason to drive a Volvo,
but the statement doesn't allow for the possibility, unless in the hunt
for the missing context you're willing to take the term "significant
whitespace" to a whole new level.

Paul

Sep 15 '05 #19
Fredrik Lundh wrote:
Edvard Majakari wrote:
Using a SAX / full-compliant DOM parser could be good for learning
things, though. As I said, depends a lot.


since there are no *sane* reasons to use SAX or DOM in Python, that's
mainly a job security issue...

One sane reason is that ElementTree is not part of the standard library. There
are cases where you write a simple python script of 400 lines and you want it
to stay single-file. While ElementTree is very easy to distribute (for basic
features it's just a single file), it still won't fit some scenarios.

So, why did it not make it to the standard library yet, given that it's so much
better than the alternatives?
--
Giovanni Bajo
Sep 19 '05 #20
