Bytes | Software Development & Data Engineering Community

python xml dom help please

Apologies if this post appears more than once.

The file -

---------------
<?xml version="1.0" encoding="utf-8"?>
<Game><A/><B/><C/></Game>
---------------

is processed by this program -

---------------
#!/usr/bin/env python

from xml.dom.ext.reader import PyExpat
from xml.dom.ext import PrettyPrint

import sys

def deepen(nodeList):
    for node in nodeList:
        print(node.nodeName)
        if node.previousSibling != None:
            if node.previousSibling.nodeType == node.ELEMENT_NODE:
                if node.previousSibling.hasChildNodes():
                    print("has children")
                    node.previousSibling.lastChild.appendChild(node)
                else:
                    node.previousSibling.appendChild(node)
        deepen(node.childNodes)

# get DOM object
reader = PyExpat.Reader()
doc = reader.fromUri(sys.argv[1])

# call func
deepen(doc.childNodes)

# display altered document
PrettyPrint(doc)
---------------

which outputs the following -

---------------
Game
Game
A
B
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B/>
  </A>
  <C/>
</Game>

---------------

Can anybody explain why the line 'print(node.nodeName)' never prints 'C'?

Also, why is 'has children' never printed?

I am trying to output

---------------
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B>
      <C/>
    </B>
  </A>
</Game>
---------------

I know there are easier ways to do this, but I want to do it using DOM.

Thanks in advance.
Jul 18 '05 #1
Without having taken a thorough look at your (recursive) 'deepen' function, I
can see there's no termination condition for the recursion...
So that's one reason this won't work the way you want it to.

Miklós
Jul 18 '05 #2
Miklós wrote:
Without having any thorough look at your (recursive)'deepen' function, I
can see there's no termination condition for the recursion....
So that's one reason this won't work the way you want it to.


Nope - he has a termination condition. deepen is called on each node's
childNodes, so he makes a traversal of all nodes, and the recursion stops at
leaf nodes, whose childNodes list is empty.
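A minimal sketch of Diez's point, using the standard library's xml.dom.minidom rather than the 4DOM reader from the original post: a recursive walk over childNodes needs no explicit base case, because a leaf's childNodes list is empty and the loop body simply never runs. The function name walk is illustrative only.

```python
from xml.dom import minidom


def walk(node, depth=0, out=None):
    """Pre-order traversal that records each node name, indented by depth."""
    if out is None:
        out = []
    out.append("  " * depth + node.nodeName)
    # No explicit base case: for a leaf, node.childNodes is empty,
    # so this loop does nothing and the recursion unwinds.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString("<Game><A/><B/><C/></Game>")
lines = walk(doc.documentElement)
print("\n".join(lines))
```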

Regards,

Diez
Jul 18 '05 #3
Hi,

Also, why 'has children' is never printed?
The code is somewhat complicated; however, the reason for "has children" not
being printed is simply that, for this example, no node matches the condition:
nodes A, B and C are the only ones with siblings, and none of them has a child
node....
I know there are easier ways to do this, but I want to do it using DOM.


I'm not sure what easier ways _you_ think of - but to me it looks like a
classic job for XSLT, which is much more convenient to deal with. DOM is
usually a PIA; don't mess around with it if you're not forced to.

Diez

Jul 18 '05 #4
sp***********@ntlworld.com (deglog) wrote:
def deepen(nodeList):
    for node in nodeList:
        [...]
                node.previousSibling.appendChild(node)


Bzzt: destructive iteration gotcha.

DOM NodeLists are 'live': when you move a child Element out of the parent,
it no longer exists in the childNodes list. So in the example:

<a/>
<b/>
<c/>

the first element (a) cannot be moved and is skipped; the second element (b)
is moved into its previousSibling (a); the third element... wait, there is no
third element any more because (c) is now the second element. So the loop
stops.
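The skipped element is easy to reproduce with the standard library's xml.dom.minidom, whose childNodes is a Python list that appendChild() mutates in place, so it behaves like the live NodeList described above. A minimal sketch, using a simplified, non-recursive version of the loop (an assumption for brevity, not the original deepen()):

```python
from xml.dom import minidom

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
game = doc.documentElement

visited = []
for node in game.childNodes:          # iterates the live list directly
    visited.append(node.nodeName)
    prev = node.previousSibling
    if prev is not None and prev.nodeType == prev.ELEMENT_NODE:
        # appendChild() first removes node from game.childNodes,
        # shifting C into the slot the loop has already passed.
        prev.appendChild(node)

print(visited)         # ['A', 'B']  (C is never visited)
print(game.toxml())    # <Game><A><B/></A><C/></Game>
```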

A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
    ...
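One caveat for current Python: in the Python 2 of this thread map() returned a new list, but in Python 3 it returns a lazy iterator that reads the underlying sequence as it goes, so it no longer takes a snapshot. A small sketch, with a plain list standing in for a live NodeList:

```python
live = ["A", "B", "C"]            # stands in for a live NodeList

lazy = map(lambda x: x, live)     # Python 3: an iterator, not a copy
snapshot = list(live)             # a real, independent copy

live.remove("B")                  # simulate moving B elsewhere mid-walk

seen_lazy = list(lazy)
print(seen_lazy)   # ['A', 'C']  (the lazy map sees the mutation)
print(snapshot)    # ['A', 'B', 'C']
```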

--
Andrew Clover
mailto:an*@doxdesk.com
http://www.doxdesk.com/
Jul 18 '05 #5
an********@doxdesk.com (Andrew Clover) writes:
sp***********@ntlworld.com (deglog) wrote:
[...]
A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
    ...


Why not just

for node in list(nodeList):
    ...

?
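Applied to the original deepen() function, this is a one-word change. A sketch using xml.dom.minidom in place of 4DOM (so no DOCTYPE node is inserted), with the debugging prints dropped and the result serialized with toxml() instead of PrettyPrint:

```python
from xml.dom import minidom


def deepen(nodeList):
    # list() snapshots the live NodeList, so moving a node out of its
    # parent no longer makes the loop skip the following sibling.
    for node in list(nodeList):
        prev = node.previousSibling
        if prev is not None and prev.nodeType == prev.ELEMENT_NODE:
            if prev.hasChildNodes():
                prev.lastChild.appendChild(node)
            else:
                prev.appendChild(node)
        deepen(node.childNodes)


doc = minidom.parseString("<Game><A/><B/><C/></Game>")
deepen(doc.childNodes)
print(doc.documentElement.toxml())  # <Game><A><B><C/></B></A></Game>
```

Note that lastChild only reaches one level down, so with a fourth sibling D this would put D next to C inside B rather than inside C.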
John
Jul 18 '05 #6
Thanks for the help - this works, and I understand how and why.

jj*@pobox.com (John J. Lee) wrote in message news:<87************@pobox.com>...

Why not just

for node in list(nodeList):
    ...

?
John


The following also works (as I intended):

from xml.dom.NodeFilter import NodeFilter

def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)

walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print walker.currentNode.nodeName
    if walker.currentNode.previousSibling != None:
        print "ps "+walker.currentNode.previousSibling.nodeName
        if walker.currentNode.previousSibling.nodeName != "Game":
            if walker.currentNode.previousSibling.hasChildNodes():
                appendToDescendant(walker.currentNode)
            else:
                walker.currentNode.previousSibling.appendChild(walker.currentNode)
    next = walker.nextNode()
    if next is None: break

Strangely, the line checking "Game" is needed, because this first node
is its own previous sibling - how can this be right?

For example, with the input file:
---
<?xml version="1.0" encoding="utf-8"?>
<Game/>
---
the output is:
---
Game
ps Game
<?xml version='1.0' encoding='UTF-8'?>
<Game/>
---
Jul 18 '05 #7
John J. Lee <jj*@pobox.com> wrote:
Why not just for node in list(nodeList)?
You're right! I never trusted list() to make a copy if it was already a
native list (as it sometimes is in e.g. minidom) but, bothering to check the
docs, it is guaranteed to after all. Hurrah.

sp***********@ntlworld.com (deglog) wrote:
def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)
Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to a node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().

The function as written above appends the argument node to the first sibling
to have no child nodes, starting from the TreeWalker's current node or its
previous sibling if there is one.

I'm not wholly sure I understand the problem you're trying to solve. If you
just want to nest sibling elements as first children, you could do it without
Traversal or recursion, for example:

def nestChildrenIntoFirstElements(parent):
    elements= [c for c in parent.childNodes if c.nodeType==c.ELEMENT_NODE]
    if len(elements)>=2:
        insertionPoint= elements[0]
        for element in elements[1:]:
            insertionPoint.appendChild(element)
            insertionPoint= element

(Untested but no reason it shouldn't work.)
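Andrew's function can be exercised with the standard library's xml.dom.minidom (an assumption; the thread itself uses 4DOM). Because the list comprehension builds a static list of element children up front, moving nodes inside the loop cannot trigger the skipping problem from the live NodeList. The function is repeated so the example is self-contained:

```python
from xml.dom import minidom


def nestChildrenIntoFirstElements(parent):
    # Static snapshot of the element children (text, comments etc. ignored).
    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
    if len(elements) >= 2:
        insertionPoint = elements[0]
        for element in elements[1:]:
            # Move each element into the one before it, then descend.
            insertionPoint.appendChild(element)
            insertionPoint = element


doc = minidom.parseString("<Game><A/><B/><C/></Game>")
nestChildrenIntoFirstElements(doc.documentElement)
print(doc.documentElement.toxml())  # <Game><A><B><C/></B></A></Game>
```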
Strangely, the line checking "Game" is needed, because this firstnode
is its own previous sibling - how can this be right?


4DOM is fooling you. It has inserted a <!DOCTYPE> declaration automatically
for you. (It probably shouldn't do that.) So the previous sibling of the
documentElement is the doctype; of course the doctype has the same nodeName
as the documentElement, so the debugging output is misleading.
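The effect can be reproduced with xml.dom.minidom, which does not invent a doctype, so one is written into the input explicitly here. Testing nodeType rather than nodeName tells the two nodes apart and would make the "Game" check in the earlier loop unnecessary:

```python
from xml.dom import minidom

doc = minidom.parseString("<!DOCTYPE Game><Game/>")
root = doc.documentElement
prev = root.previousSibling

print(prev.nodeName)                             # Game
print(prev.nodeType == prev.DOCUMENT_TYPE_NODE)  # True
print(prev is root)                              # False

# A safer test than comparing names: only treat element siblings as targets.
is_element_sibling = prev is not None and prev.nodeType == prev.ELEMENT_NODE
print(is_element_sibling)                        # False
```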

--
Andrew Clover
mailto:an*@doxdesk.com
http://www.doxdesk.com/
Jul 18 '05 #8
an********@doxdesk.com (Andrew Clover) wrote in message news:<2c**************************@posting.google.com>...
def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)
Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to an node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().


Right.

I'm not wholly sure I understand the problem you're trying to solve.


Actually, I'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document.

My latest idea is to go to the end of the document, then walk it
backwards (for Christmas? :-). Towards this end I wrote:
---
walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print '1 '+walker.currentNode.nodeName
    next = walker.nextNode()
    if next is None: break
print '2 '+walker.currentNode.nodeName
---
which, given
---
<?xml version="1.0" encoding="utf-8"?>
<Game><A/></Game>

---
outputs
---
1 Game
1 A
2 Game
---
Foiled again. How come the current node is back at the start after
the loop has finished?
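For comparison, the DOM Level 2 Traversal specification says that when nextNode() finds no further node it returns null and leaves currentNode unchanged, so the loop above should finish with A, not Game, as the current node. Python's minidom has no TreeWalker, so this sketch uses a hypothetical, minimal ElementWalker class (elements only, no filters, only nextNode()) purely to illustrate that behaviour; it is not the PyXML/4DOM API.

```python
from xml.dom import minidom


class ElementWalker:
    """Hypothetical minimal stand-in for a TreeWalker with SHOW_ELEMENT.

    Only nextNode() is sketched; it is not part of minidom or PyXML.
    """

    def __init__(self, root):
        self.root = root
        self.currentNode = root

    def nextNode(self):
        node = self.currentNode
        while True:
            # Document order: first child if any, otherwise the next sibling
            # of this node or of its nearest ancestor below the root.
            if node.firstChild is not None:
                node = node.firstChild
            else:
                while node is not self.root and node.nextSibling is None:
                    node = node.parentNode
                if node is self.root:
                    return None  # no next node: currentNode stays where it is
                node = node.nextSibling
            if node.nodeType == node.ELEMENT_NODE:
                self.currentNode = node
                return node


doc = minidom.parseString("<Game><A/></Game>")
walker = ElementWalker(doc.documentElement)
seen = [walker.currentNode.nodeName]
while walker.nextNode() is not None:
    seen.append(walker.currentNode.nodeName)

print(seen)                          # ['Game', 'A']
print(walker.currentNode.nodeName)   # A
```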
Jul 18 '05 #9
sp***********@ntlworld.com (deglog) wrote:
actually i'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document
Well, the snippet in the posting above should do that well enough. What
happens to any existing nested children is not defined.
How come the current node is back at the start after the loop has finished?


Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs, that can lead to infinite recursion.)

That said, I'm not sure how using a TreeWalker or walking backwards actually
helps you here! If you are just using it to filter out non-element children,
remember that moving the current node takes the position of the TreeWalker
with it. It's not like NodeIterator.

--
Andrew Clover
mailto:an*@doxdesk.com
http://www.doxdesk.com/
Jul 18 '05 #10
an********@doxdesk.com (Andrew Clover) wrote in message news:<2c**************************@posting.google.com>...

Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs, that can lead to infinite recursion.)


Thanks.

Does the function __regress(self) in the same package need a similar fix?

(I am using PyXML 0.8.3.)
Jul 18 '05 #11
sp***********@ntlworld.com (deglog) wrote:
Does the function def __regress(self) from the same package need a similar
fix?


Nope, looks OK to me. There's no 'in between' state where the current node
ends up pointing somewhere it shouldn't in this one, because of the different
order of the next/previous-sibling step and the move-through-ancestor/descendant
step.

I haven't checked all of the rest of the code, though, so I can't guarantee
there aren't any other problems with 4DOM's Traversal/Range implementation.

--
Andrew Clover
mailto:an*@doxdesk.com
http://www.doxdesk.com/
Jul 18 '05 #12
