package similar to XML::Simple

Hi,
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?

For those that don't know it, this package
maps the XML file to a dictionary.

Of course I can build such a package myself
but it would be better if it already exists :)

--
Paulo Pinto

Jul 18 '05 #1
I'm using pyRXP, and it's great.
It uses one tuple, not dictionaries.
Very, very fast.
By the way, I'm just starting to use this package. Has anybody run into
any problems with pyRXP?

-- Pierre
On Wed, 2004-01-28 at 09:53, Paulo Pinto wrote:
Hi,
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?

For those that don't know it, this package
maps the XML file to a dictionary.

Of course I can build such a package myself
but it would be better if it already exists :)

--
Paulo Pinto

Jul 18 '05 #2
Paulo Pinto
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?


Good to ask! I know of at least three packages that do something similar:

- Fredrik Lundh's elementtree
- D. Mertz's gnosis xml utilities
- handyxml

Just google for them.
Jul 18 '05 #3
Paulo Pinto wrote:

does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?

For those that don't know it, this package
maps the XML file to a dictionary.

A simple dictionary is insufficient to represent XML in general,
so perhaps you're talking about a subset of XML, maybe with no
attributes, and where the order of the child elements doesn't
matter? Or something else?

Or do you really mean something like a multiply-nested
dictionary, perhaps with lists as well?

Of course I can build such a package myself
but it would be better if it already exists :)


We were able to build something similar by stripping down
Fredrik Lundh's elementtree until we had little more than the
calls to the expat parser (i.e. we used his source as a tutorial
on using expat :-), so if this is something like the XML-subset
I mention above, you could do it in an hour or so from scratch
if you knew Python well.
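
A minimal sketch of what such a from-scratch loader might look like, assuming the XML subset described above (sibling order across different tag names does not matter). It uses the standard library's xml.parsers.expat directly; the function name xml_to_dict and the "#text" key are illustrative choices, not taken from elementtree, handyxml, or XML::Simple.

```python
import xml.parsers.expat


def xml_to_dict(text):
    """Parse an XML string into nested dicts, with lists for repeated child tags.

    Attributes are stored under their own names and character data under
    the hypothetical key "#text". An attribute and a child element that
    share a name would collide; a real package would need to handle that.
    """
    root = {}
    stack = [root]   # dicts for the elements that are currently open
    chunks = [[]]    # character data collected for each open element

    def add_child(parent, name, value):
        # The first occurrence is stored directly; repeats become a list.
        if name not in parent:
            parent[name] = value
        elif isinstance(parent[name], list):
            parent[name].append(value)
        else:
            parent[name] = [parent[name], value]

    def start(name, attrs):
        stack.append(dict(attrs))
        chunks.append([])

    def end(name):
        node = stack.pop()
        text_value = "".join(chunks.pop()).strip()
        if text_value and node:
            node["#text"] = text_value
        # An element without attributes or children collapses to its text.
        add_child(stack[-1], name, node if node else text_value)

    def chars(data):
        chunks[-1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return root


config = xml_to_dict(
    '<config><host name="a">x</host><host name="b"/><port>80</port></config>'
)
print(config["config"]["port"])
```

Repeated tags such as the two host elements end up as a list under one key, which is the "nested dictionaries with lists" shape; anything that depends on document order or mixed content would need a richer structure.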

-Peter
Jul 18 '05 #4
I mean multiple nested dictionaries with lists.

But handyxml seems to solve my problem.

Thanks, guys

Peter Hansen wrote:
Paulo Pinto wrote:
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?

For those that don't know it, this package
maps the XML file to a dictionary.

A simple dictionary is insufficient to represent XML in general,
so perhaps you're talking about a subset of XML, maybe with no
attributes, and where the order of the child elements doesn't
matter? Or something else?

Or do you really mean something like a multiply-nested
dictionary, perhaps with lists as well?

Of course I can build such a package myself
but it would be better if it already exists :)

We were able to build something similar by stripping down
Fredrik Lundh's elementtree until we had little more than the
calls to the expat parser (i.e. we used his source as a tutorial
on using expat :-), so if this is something like the XML-subset
I mention above, you could do it in an hour or so from scratch
if you knew Python well.

-Peter


Jul 18 '05 #5
Pierre N <pi*****@mac.com> wrote in message news:<ma**************************************@python.org>...
I'm using pyRXP, and it's great.
It's using one tuple, not dictionaries.
Very very fast.
By the way I'm just starting using this package, anybody met any
problems with pyRXP?


I did. It's not an XML parser :-(. It does not accept character
entities such as … (the example that bit me), giving meaningless
"error" messages along the lines: "not a valid 8-bit XML character".
If you need an XML parser, use PyRXPU, which comes in ReportLab CVS
only. It is not as fast as PyRXP, but conformant in my testing, and
the point of XML is conformance, not speed at all costs. If you want
speed at all costs, use CSV or some other plain text format.
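
For comparison, here is a small check of how a conformant parser treats a numeric character reference above U+00FF. It is only a sketch: it assumes the reference in question was something like U+2026 (horizontal ellipsis), and it uses the standard library's xml.etree.ElementTree rather than PyRXP or PyRXPU. The helper name element_text is made up for the example.

```python
import xml.etree.ElementTree as ET


def element_text(xml_source):
    """Return the text content of the root element of an XML string."""
    return ET.fromstring(xml_source).text


# The same character written as a decimal and as a hexadecimal reference.
decimal_form = element_text("<note>wait&#8230;</note>")
hex_form = element_text("<note>wait&#x2026;</note>")

# Both references should resolve to the single character U+2026.
print(decimal_form == hex_form == "wait\u2026")
```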

I'm writing at length about this unfortunate PyRXP situation in my
next ORA python/XML column (expected Weds).

--Uche
http://uche.ogbuji.net
Jul 18 '05 #6
Paulo Pinto <pa*********@cern.ch> wrote in message news:<bv**********@sunnews.cern.ch>...
Hi,
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?

For those that don't know it, this package
maps the XML file to a dictionary.

Of course I can build such a package myself
but it would be better if it already exists :)


FWIW: http://www.xml.com/pub/a/2004/01/14/py-xml.html

--Uche
http://uche.ogbuji.net
Jul 18 '05 #7
Uche Ogbuji wrote:

Pierre N <pi*****@mac.com> wrote in message news:<ma**************************************@python.org>...
I'm using pyRXP, and it's great.
It's using one tuple, not dictionaries.
Very very fast.
By the way I'm just starting using this package, anybody met any
problems with pyRXP?


I did. It's not an XML parser :-(. It does not accept character
entities such as … (the example that bit me), giving meaningless
"error" messages along the lines: "not a valid 8-bit XML character".
If you need an XML parser, use PyRXPU, which comes in ReportLab CVS
only. It is not as fast as PyRXP, but conformant in my testing, and
the point of XML is conformance, not speed at all costs. If you want
speed at all costs, use CSV or some other plain text format.


Hmm... so it's your opinion that *all* XML parsers must handle *all*
aspects of XML? If not, I think you should back off on the criticism
of PyRXP as being "not an XML parser" and simply point out that it
doesn't handle all aspects of XML because it is intended to provide
a very fast/heavily optimized approach to parsing only certain kinds
of XML. It's a valid choice to do so, though of course if PyRXP is
promoted as a "full" XML solution that might be inaccurate.

-Peter
Jul 18 '05 #8
Peter Hansen wrote:
Hmm... so it's your opinion that *all* XML parsers must handle *all*
aspects of XML? If not, I think you should back off on the criticism
of PyRXP as being "not an XML parser" and simply point out that it
doesn't handle all aspects of XML because it is intended to provide
a very fast/heavily optimized approach to parsing only certain kinds
of XML.

I am not Uche, but I think that all XML parsers should conform to the
XML recommendation (and treat deviations from the XML recommendation
as bugs).

This is not the same as handling all aspects of XML, since the XML
recommendation makes certain aspects optional. Processing character
references is not one of them (but e.g. validation is).

It's a valid choice to do so, though of course if PyRXP is
promoted as a "full" XML solution that might be inaccurate.


Packages may help with processing only selected XML documents, and they
may also support documents which are not XML. However, in neither
case should they call themselves "XML parsers". "XML-like parsers"
or "XML subset parsers" might be more appropriate.

Regards,
Martin

Jul 18 '05 #9
Peter Hansen <pe***@engcorp.com> wrote in message news:<40***************@engcorp.com>...
Uche Ogbuji wrote:

Pierre N <pi*****@mac.com> wrote in message news:<ma**************************************@python.org>...
I'm using pyRXP, and it's great.
It's using one tuple, not dictionaries.
Very very fast.
By the way I'm just starting using this package, anybody met any
problems with pyRXP?

I did. It's not an XML parser :-(. It does not accept character
entities such as … (the example that bit me), giving meaningless
"error" messages along the lines: "not a valid 8-bit XML character".
If you need an XML parser, use PyRXPU, which comes in ReportLab CVS
only. It is not as fast as PyRXP, but conformant in my testing, and
the point of XML is conformance, not speed at all costs. If you want
speed at all costs, use CSV or some other plain text format.


Hmm... so it's your opinion that *all* XML parsers must handle *all*
aspects of XML?


XML is clear on what a Parser *must* support. The full character
production is one of those things. From XML 1.0, section 2.2:

Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

There is no "option" to not support characters greater than #xFF. XML
parsers *can* leave off handling some aspects of XML, external DTD
subsets, for example, but you can not be as fundamentally
non-conformant as PyRXP and still call yourself an XML parser.
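
The Char production quoted above translates directly into a range check. The following is a sketch only; the function name is_xml_char is made up for illustration and is not part of any parser mentioned in this thread.

```python
def is_xml_char(code_point):
    """Return True if code_point matches production [2] Char of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# U+2026 lies well above U+00FF yet is a legal XML character,
# while NUL and the surrogate range are excluded.
for cp in (0x0, 0x41, 0xFF, 0x2026, 0xD800, 0x10000):
    print(hex(cp), is_xml_char(cp))
```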

This is not just an academic matter. There are a *vast* number of
useful and heavily-used characters of code point higher than U+FF and
if parsers decided on a whim to pick and choose what to support the
result would be complete and utter chaos.

If not, I think you should back off on the criticism
of PyRXP as being "not an XML parser" and simply point out that it
doesn't handle all aspects of XML because it is intended to provide
a very fast/heavily optimized approach to parsing only certain kinds
of XML. It's a valid choice to do so, though of course if PyRXP is
promoted as a "full" XML solution that might be inaccurate.


PyRXP is not an XML parser. It's that simple. I stand by that very
strong statement, and I'd be surprised if any XML expert refused to
corroborate it.

I do want to point out that PyRXPU does seem to be a proper XML
parser, and is what people should use instead if they like the
ReportLab products.

Of course if you don't really need an XML parser, feel free to use
PyRXP. Just don't call it what it isn't.

--Uche
http://uche.ogbuji.net
Jul 18 '05 #10
"Martin v. Löwis" <ma****@v.loewis.de> wrote in message news:<c0*************@news.t-online.com>...
Peter Hansen wrote:
Hmm... so it's your opinion that *all* XML parsers must handle *all*
aspects of XML? If not, I think you should back off on the criticism
of PyRXP as being "not an XML parser" and simply point out that it
doesn't handle all aspects of XML because it is intended to provide
a very fast/heavily optimized approach to parsing only certain kinds
of XML.


I am not Uche, but I think that all XML parsers should conform to the
XML recommendation (and treat deviations from the XML recommendation
as bugs).

This is not the same as handling all aspects of XML, since the XML
recommendation makes certain aspects optional. Processing character
references is not one of them (but e.g. validation is).
It's a valid choice to do so, though of course if PyRXP is
promoted as a "full" XML solution that might be inaccurate.


Packages may help with processing only selected XML documents, and they
may also support documents which are not XML. However, in neither
case should they call themselves "XML parsers". "XML-like parsers"
or "XML subset parsers" might be more appropriate.


I wouldn't argue with calling PyRXP an "XML-like parser".

Because until very recently I thought that PyRXP was an XML parser, I
was extremely taken aback when I ran afoul of PyRXP's brazen character
non-conformance. As an example of the danger in this non-conformance,
PyRXP refused to parse the very first well-formed XML document I gave
it. And I'm (mostly) a native English speaker. True XML parsers
strive for interoperability for a reason. Not doing so pretty much
negates the value of XML.

I was even more taken aback to read that the PyRXP developers refused
to make the simple fix needed for conformance. I think it is
essential to point out that a tool that refuses XML conformance cannot
go about calling itself an XML parser.
--Uche
http://uche.ogbuji.net
Jul 18 '05 #11
Uche Ogbuji wrote:

I was even more taken aback to read that the PyRXP developers refused
to make the simple fix needed for conformance.


This is a very relevant data point that was missing in the discussion
until now.

Given that situation, I'd agree that labelling PyRXP simply an "XML parser"
without qualification is misleading and wrong.

-Peter
Jul 18 '05 #12
Paulo Pinto wrote:
does anyone know of a Python package that
is able to load XML like the XML::Simple
Perl package does?


Despite all of the, uh, _discussion_ in this thread, I'd like to thank you
folks for pointing out pyRXP... I hadn't found that before, and if I can
whip up a pyRXP -> DOM2 translator, it will fit my needs _perfectly_.
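
A rough sketch of what such a translator could look like. It assumes the tuple-tree layout (tagName, attributes, children, spare) with children given as a list of strings and nested tuples, and attributes and children possibly None; that layout and the use of plain str values are assumptions to check against the pyRXP documentation. The sketch builds a tree with the standard library's xml.dom.minidom and does not import pyRXP itself; the function names are invented for the example.

```python
from xml.dom.minidom import getDOMImplementation


def tuple_tree_to_dom(tree):
    """Build a minidom Document from a (name, attrs, children, spare) tuple."""
    name, attrs, children, _spare = tree
    doc = getDOMImplementation().createDocument(None, name, None)
    _fill(doc, doc.documentElement, attrs, children)
    return doc


def _fill(doc, element, attrs, children):
    # Attributes and children may be None in the assumed layout.
    for key, value in (attrs or {}).items():
        element.setAttribute(key, value)
    for child in children or []:
        if isinstance(child, str):
            element.appendChild(doc.createTextNode(child))
        else:
            child_name, child_attrs, child_children, _spare = child
            node = doc.createElement(child_name)
            element.appendChild(node)
            _fill(doc, node, child_attrs, child_children)


# Hand-written tuple tree standing in for parser output.
sample = ("a", {"id": "1"}, ["hi", ("b", None, None, None)], None)
document = tuple_tree_to_dom(sample)
print(document.documentElement.toxml())
```

Every tuple becomes a DOM node object here, so the cost of instantiating those nodes is paid on top of the parse itself.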

Thanks!

--
Chris Herborth ch****@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.
Jul 18 '05 #13
Chris Herborth <ch****@cryptocard.com> wrote in message news:<Wj*******************@news20.bellglobal.com> ...

Despite all of the, uh, _discussion_ in this thread, I'd like to thank you
folks for pointing out pyRXP... I hadn't found that before, and if I can
whip up a pyRXP -> DOM2 translator, it will fit my needs _perfectly_.


Well, if it is true what people claim about dictionaries and tuples
being faster than objects, then you may see any supposed performance
advantage claimed by the PyRXP proponents just dissolve away as you
instantiate all those nodes. But as I noted with respect to "double
wrapping" libxml2, if you can restrict yourself to very few high-level
operations through those layers, and then invoke various "native"
methods directly, then it could still be worth it.

Paul
Jul 18 '05 #14
