XML parser that sorts elements?

Hi everyone,

I am a total newbie to XML parsing. I've written a couple of toy
examples under the instruction of tutorials available on the web.

The problem I want to solve is this. I have an XML snippet (in a
string) that looks like this:

<booga foo="1" bar="2">
<well>hello</well>
<blah>goodbye</blah>
</booga>

and I want to alphabetize not only the attributes of an element, but I
also want to alphabetize the elements in the same scope:

<booga bar="2" foo="1">
<blah>goodbye</blah>
<well>hello</well>
</booga>

I've found a "Canonizer" class that subclasses saxlib.HandlerBase,
played around with it, and vaguely understand what it's doing. But what
I get out of it is

<booga bar="2" foo="1">
<well>hello</well>
<blah>goodbye</blah>
</booga>

in other words it sorts the attributes of each element, but doesn't
touch the order of the elements.

How can I sort the elements? I think I want to subclass the parser, to
present the elements to the content handler in different order, but I
couldn't immediately find any examples of the parser being subclassed.

Thanks for any pointers!
--JMike

Sep 22 '06 #1
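
A rough illustration of why a SAX-style handler can sort attributes but not sibling elements: the parser hands over each start tag with all of its attributes at once, but elements arrive one at a time in document order. This is only a sketch in Python 3 using the standard library's xml.sax.ContentHandler as a stand-in for saxlib.HandlerBase. It is not the Canonizer class mentioned above, and the names AttributeSortingHandler and sort_attributes are made up for the example.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class AttributeSortingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: re-emits XML with attributes sorted by name."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        # All attributes of one element arrive together, so sorting is easy.
        parts = [" %s=%s" % (key, quoteattr(attrs[key]))
                 for key in sorted(attrs.keys())]
        self.out.write("<%s%s>" % (name, "".join(parts)))

    def endElement(self, name):
        # Elements are written as they stream past; siblings are never
        # buffered, so their order cannot change here.
        self.out.write("</%s>" % name)

    def characters(self, content):
        self.out.write(escape(content))


def sort_attributes(xml_text):
    """Return xml_text with each element's attributes alphabetized."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), AttributeSortingHandler(out))
    return out.getvalue()
```

Reordering siblings in this style would mean buffering every child until the parent's end tag arrives, which amounts to building a tree by hand.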
You can sort them by obtaining them as a tree of nodes, e.g. using
ElementTree or minidom.

But you should be aware that this will change the structure of your document,
and it isn't always desirable to do so - e.g. HTML pages would look funny,
to say the least, if sorted in that way.

Diez
Sep 22 '06 #2
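
For the ElementTree route, a minimal sketch in Python 3 might look like the following. The function name sort_tree is hypothetical, and the code assumes sibling order carries no meaning, as in the original question. On Python 3.8 and later ElementTree writes attributes in insertion order, so the attribute dict is rebuilt in sorted order; older versions sorted attributes when serializing anyway.

```python
import xml.etree.ElementTree as ET


def sort_tree(element):
    """Sort attributes and child elements by name, recursively, in place."""
    # Rebuild the attribute dict so its insertion order is alphabetical.
    sorted_attrs = sorted(element.attrib.items())
    element.attrib.clear()
    element.attrib.update(sorted_attrs)
    # Slice assignment replaces the children with the same nodes, reordered.
    element[:] = sorted(element, key=lambda child: child.tag)
    for child in element:
        sort_tree(child)
    return element


def sorted_xml(xml_text):
    """Parse xml_text, sort it, and serialize it back to a string."""
    root = sort_tree(ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")
```

Whitespace between elements is stored on each element's tail, so it travels with the element when the children are reordered.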

In this particular case, I need to sort the elements, and the specific
application I'm testing guarantees that the order of the elements "in
the same scope" (this may not be the right term in XML semantics, but
it's what I know how to say) does not matter. That probably means that
the specific application I'm testing is not using XML in a standard
way, but so be it.

I'm looking at minidom now and I think maybe there's enough
documentation there that I can get a handle on it and do what I need to
do. Thanks. (But if anyone else has a specific example I can crib
from, that'd be great.)

--JMike

Sep 22 '06 #3
I suspect that Canonizer doesn't sort nested elements because some schemas
require elements to be in a particular order, and not necessarily an
alphabetical one.

Here is a snippet from an interactive Python session, working with the
"batteries included" xml.dom.minidom. The solution is not necessarily in
the parser, it may be instead in what you do with the parsed document
object.

This is not a solution to your actual problem, but I hope it gives you
enough to work with to find your own solution.

HTH,
-- Paul

>>> xmlsrc = """<booga foo="1" bar="2">
... <well>hello</well>
... <blah>goodbye</blah>
... </booga>
... """
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(xmlsrc)
>>> doc.childNodes
[<DOM Element: booga at 0x9e8508>]
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">
<well>
hello
</well>
<blah>
goodbye
</blah>
</booga>
>>> [n.nodeName for n in doc.childNodes]
[u'booga']
>>> [n.nodeName for n in doc.childNodes[0].childNodes]
['#text', u'well', '#text', u'blah', '#text']
>>> [n.nodeName for n in doc.childNodes[0].childNodes if n.nodeType == doc.ELEMENT_NODE]
[u'well', u'blah']
>>> doc.childNodes[0].childNodes = sorted(doc.childNodes[0].childNodes,key=lambda n:n.nodeName)
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">


<blah>
goodbye
</blah>
<well>
hello
</well>
</booga>
>>> doc.childNodes[0].childNodes = sorted([n for n in doc.childNodes[0].childNodes if n.nodeType == doc.ELEMENT_NODE],key=lambda n:n.nodeName)
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">
<blah>
goodbye
</blah>
<well>
hello
</well>
</booga>
>>>

Sep 22 '06 #4
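
The '#text' entries in the session above are the newlines between the tags, which minidom keeps as text nodes. Sorting them along with the elements is what produces the blank lines in the first sorted output. A small Python 3 sketch of the same check, reusing the sample document from the thread (the variable names are only illustrative):

```python
import xml.dom.minidom

XML_SRC = """<booga foo="1" bar="2">
<well>hello</well>
<blah>goodbye</blah>
</booga>
"""

doc = xml.dom.minidom.parseString(XML_SRC)
root = doc.documentElement

# Every child of the root, including the whitespace between the tags.
all_names = [n.nodeName for n in root.childNodes]

# Only the real elements; the whitespace text nodes are filtered out.
element_names = [n.nodeName for n in root.childNodes
                 if n.nodeType == n.ELEMENT_NODE]

print(all_names)
print(element_names)
```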

Whoa. Outstanding. Excellent. Thank you!
--JMike

Sep 22 '06 #5
"Paul McGuire" <pt***@austin.rr._bogus_.com> wrote in message
news:_O*****************@tornado.texas.rr.com...
> <jm***@alum.mit.edu> wrote in message
> news:11**********************@m73g2000cwd.googlegroups.com...
> <snip>

This is what I posted, but it's not what I typed. I entered some very long
lines at the console, and the newsgroup software, when wrapping the text,
prefixed it with '>>>', not '...'. So this looks like something that won't
run.
>>>doc.childNodes[0].childNodes = sorted([n for n in
doc.childNodes[0].childNodes if n.nodeType ==
doc.ELEMENT_NODE],key=lambda n:n.nodeName)
print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">
<blah>
goodbye
</blah>
<well>
hello
</well>
</booga>
>>>>
Here's the console session, with '...' continuation lines:
>>> xmlsrc = """<booga foo="1" bar="2">
... <well>hello</well>
... <blah>goodbye</blah>
... </booga>
... """
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(xmlsrc)
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">
<well>
hello
</well>
<blah>
goodbye
</blah>
</booga>
>>> [n.nodeName for n in doc.childNodes]
[u'booga']
>>> [n.nodeName for n in doc.childNodes[0].childNodes]
['#text', u'well', '#text', u'blah', '#text']
>>> [n.nodeName for n in doc.childNodes[0].childNodes
...     if n.nodeType == doc.ELEMENT_NODE]
[u'well', u'blah']
>>> doc.childNodes[0].childNodes = sorted(
...     doc.childNodes[0].childNodes,key=lambda n:n.nodeName)
>>> [n.nodeName for n in doc.childNodes[0].childNodes
...     if n.nodeType == doc.ELEMENT_NODE]
[u'blah', u'well']
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">


<blah>
goodbye
</blah>
<well>
hello
</well>
</booga>
>>> doc.childNodes[0].childNodes = sorted(
...     [n for n in doc.childNodes[0].childNodes
...         if n.nodeType==doc.ELEMENT_NODE],
...     key=lambda n:n.nodeName)
>>> print doc.toprettyxml()
<?xml version="1.0" ?>
<booga bar="2" foo="1">
<blah>
goodbye
</blah>
<well>
hello
</well>
</booga>
>>>


Sep 22 '06 #6
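
The session above assigns to childNodes directly, which works for printing but bypasses minidom's own bookkeeping. A sketch of the same idea in Python 3 using only documented DOM calls (removeChild, appendChild, removeAttribute, setAttribute) could look like this. The function names are hypothetical, and the code assumes a document without namespaced attributes or meaningful mixed content. Since Python 3.8 minidom writes attributes in the order they were set rather than alphabetically, so the attributes are re-inserted in sorted order as well.

```python
import xml.dom.minidom


def sort_element_attributes(element):
    """Re-insert attributes in alphabetical order (needed on Python 3.8+)."""
    items = sorted(element.attributes.items())
    for name, _ in items:
        element.removeAttribute(name)
    for name, value in items:
        element.setAttribute(name, value)


def sort_children(element, key=lambda n: n.nodeName):
    """Recursively sort the attributes and child elements of element."""
    sort_element_attributes(element)
    children = [c for c in element.childNodes
                if c.nodeType == c.ELEMENT_NODE]
    if children:
        # Drop whitespace-only text nodes so they do not pile up in the output.
        for node in list(element.childNodes):
            if node.nodeType == node.TEXT_NODE and not node.data.strip():
                element.removeChild(node)
        # appendChild() moves a node that is already attached, so appending
        # the children in sorted order reorders them in place.
        for child in sorted(children, key=key):
            element.appendChild(child)
    for child in children:
        sort_children(child, key)


def sorted_xml(xml_text):
    """Parse xml_text, sort it, and return the root element as a string."""
    doc = xml.dom.minidom.parseString(xml_text)
    sort_children(doc.documentElement)
    return doc.documentElement.toxml()
```

Any non-whitespace text inside an element that also has child elements stays where it is and ends up ahead of the sorted children, so this is only suitable when sibling order really does not matter.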
Paul McGuire wrote:
> >>> doc.childNodes[0].childNodes = sorted(
> ...     [n for n in doc.childNodes[0].childNodes
> ...         if n.nodeType==doc.ELEMENT_NODE],
> ...     key=lambda n:n.nodeName)
> >>> print doc.toprettyxml()
> <?xml version="1.0" ?>
> <booga bar="2" foo="1">
> <blah>
> goodbye
> </blah>
> <well>
> hello
> </well>
> </booga>

My requirements changed a bit, so now I'm sorting second level elements
by their values of a specific attribute (where the specific attribute
can be chosen). But the solution is still mainly what you posted here.
It was just a matter of supplying a different function for 'key'.
It's up and running live now and all is well. Thanks again!

(A bonus side effect of this is that it let me sneak "sorted()" into
our test infrastructure, which gave me reason to get our IT guys to
upgrade a mishmash of surprisingly old Python versions up to Python 2.5
everywhere.)

--JMike

Sep 28 '06 #7
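
Swapping in a different key function, as described in the last post, might look roughly like this in Python 3. The helper names and the sample "rank" attribute are invented for the example, since the thread does not show the actual code or data.

```python
import xml.dom.minidom


def attribute_key(attr_name):
    """Build a sort key that orders elements by the value of attr_name."""
    def key(node):
        # getAttribute() returns "" when the attribute is missing, so
        # elements without it sort first.
        return node.getAttribute(attr_name)
    return key


def sort_child_elements(parent, key):
    """Reorder the child elements of parent according to key."""
    elements = [c for c in parent.childNodes
                if c.nodeType == c.ELEMENT_NODE]
    # appendChild() moves an attached node, so this reorders in place.
    for child in sorted(elements, key=key):
        parent.appendChild(child)


def ranks_after_sorting(xml_text, attr_name):
    """Return the attr_name values of the root's children after sorting."""
    doc = xml.dom.minidom.parseString(xml_text)
    root = doc.documentElement
    sort_child_elements(root, attribute_key(attr_name))
    return [c.getAttribute(attr_name) for c in root.childNodes
            if c.nodeType == c.ELEMENT_NODE]
```

Attribute values compare as strings here, so "10" sorts before "9"; a numeric attribute would need a key that converts the value first.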
