xml minidom redundant children??

Great guys:

As a newbie, I'm trying to parse an XML file using minidom, but
I don't know why I get some extra children. I don't know what is
wrong with the XML file; I've tried different XML files and still get
the same problem.

******************************************************************************
xml file (fileTest) looks like:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
<Revision version="2" />
<Application version="02.00.00" />
</afc:Bibliography>
</afc>
******************************************************************************
Python file looks like:
from xml.dom import minidom
doc = minidom.parse(fileTest)  # fileTest: path to the XML file above
a = doc.documentElement.childNodes
print(a)
print('--------------')
for item in a:
    print(item.nodeName)
******************************************************************************
And output is:
[<DOM Text node "\n">, <DOM Element: afc:Bibliography at 12082960>, <DOM Text node "\n">]
--------------
#text
afc:Bibliography
#text
******************************************************************************

My question is why these <DOM Text node "\n"> (or #text) nodes have
been created, and how to get rid of them by changing the Python code.
(I'm not interested in changing the XML file here.)

I have searched the forum without finding any solution :-(

Thank you to all in advance!!
/Ben

Mar 1 '07 #1
In <11**********************@v33g2000cwv.googlegroups.com>, bkamrani
wrote:
My question is why these <DOM Text node "\n"> (or #text) nodes have
been created, and how to get rid of them by changing the Python code.
(I'm not interested in changing the XML file here.)

They have been created because the text is in the XML source. The line
breaks between the tags are valid text, so the parser keeps them as
text nodes.
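
A quick way to see this for yourself is to parse the same structure once with line breaks between the tags and once without. This is only a minimal sketch: the two XML strings and the helper name child_names are made up for illustration and are not taken from the file in the question.

```python
from xml.dom import minidom

# Same structure, once with line breaks between the tags and once without.
with_breaks = minidom.parseString("<root>\n<child/>\n</root>")
without_breaks = minidom.parseString("<root><child/></root>")


def child_names(doc):
    """Return the nodeName of each direct child of the root element."""
    return [node.nodeName for node in doc.documentElement.childNodes]


print(child_names(with_breaks))     # ['#text', 'child', '#text']
print(child_names(without_breaks))  # ['child']
```

The "#text" entries appear only where the source actually contains characters between the tags.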

Ciao,
Marc 'BlackJack' Rintsch
Mar 1 '07 #2
bk******@gmail.com wrote:
My question is why these <DOM Text node "\n"> (or #text) nodes have
been created, and how to get rid of them by changing the Python code.
(I'm not interested in changing the XML file here.)

I have searched the forum without finding any solution :-(

They can't be dropped automatically: xml.dom.minidom can't possibly
know whether whitespace is of any significance to you or not.

There are several ways to deal with this. If you have to stay in
minidom, just loop over the children and discard all whitespace-only
text-nodes, before really processing the document.
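
One possible way to do that discarding, as a rough sketch only: the helper name strip_whitespace_nodes is invented here, and the XML string is a shortened copy of the file from the question. It assumes that whitespace-only text is never significant in your document.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


# Shortened version of the file from the question (an assumption for the demo).
doc = minidom.parseString(
    '<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">\n'
    '<afc:Bibliography>\n'
    '<Revision version="2" />\n'
    '</afc:Bibliography>\n'
    '</afc>'
)

strip_whitespace_nodes(doc.documentElement)
names = [n.nodeName for n in doc.documentElement.childNodes]
print(names)  # ['afc:Bibliography']
```

Text nodes that contain anything other than whitespace are left alone, so real character data survives the cleanup.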

But the better alternative would be to use a better API for processing
XML. Use one of the several ElementTree implementations, such as lxml:

http://codespeak.net/lxml/

This will not rid you of the whitespace itself, but it represents text
differently, so that you can focus on elements without interspersed
text-nodes.
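
To illustrate what "represents text differently" means, here is a small sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without installing anything (my assumption; lxml.etree exposes a largely compatible API), and the XML string is invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root>\n<child>hello</child>\n</root>")

# Iterating over an element yields only child elements, never text nodes.
tags = [child.tag for child in root]
print(tags)  # ['child']

# The whitespace is still there, stored as attributes of the elements.
print(repr(root.text))     # '\n'    (text before the first child)
print(repr(root[0].text))  # 'hello' (text inside <child>)
print(repr(root[0].tail))  # '\n'    (text after </child>)
```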

Diez
Mar 1 '07 #3
On Mar 1, 12:46 pm, bkamr...@gmail.com wrote:
As a newbie, I'm trying to simply parse a xml file using minidom, but
I don't know why I get some extra children(?). I don't know what is
wrong in xml file, but I've tried different xml files, still same
problem.

Most simply, if you need to stick with xml.dom.minidom, just check the
nodeType and make sure it's not 3 (Node.TEXT_NODE):

from xml.dom import minidom
doc = minidom.parse(fileTest)  # fileTest: path to the XML file
for item in doc.documentElement.childNodes:
    if item.nodeType != 3:  # 3 == Node.TEXT_NODE
        print(item.nodeName)
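
A possible variant, sketched under the assumption that you only ever want elements: excluding type 3 still lets comments and other node types through, so you could test for Node.ELEMENT_NODE instead. The helper name child_elements and the sample XML are made up for this example.

```python
from xml.dom import minidom, Node


def child_elements(parent):
    """Return only the direct children of `parent` that are elements."""
    return [n for n in parent.childNodes if n.nodeType == Node.ELEMENT_NODE]


# The comment node would pass a "not a text node" check, but is skipped here.
doc = minidom.parseString("<root>\n<!-- note -->\n<a/>\n<b/>\n</root>")
element_names = [e.nodeName for e in child_elements(doc.documentElement)]
print(element_names)  # ['a', 'b']
```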

Regards,
Jordan

Mar 1 '07 #4
On Thu, 01 Mar 2007 15:46:53 -0300, <bk******@gmail.com> wrote:
As a newbie, I'm trying to simply parse a xml file using minidom, but
I don't know why I get some extra children(?). I don't know what is
wrong in xml file, but I've tried different xml files, still same
problem.

If you don't have to use xml.dom.minidom specifically, try ElementTree
http://www.effbot.org/zone/element-index.htm (it's already included with
Python 2.5; for earlier versions you have to download and install it).
It's a lot easier and cleaner if you are mostly concerned with the
infoset rather than its representation.
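
As a rough sketch of how that might look for the file in the question (inlined here as a bytes literal so the example is self-contained; the prefix "d" for the default namespace and the variable names are my own choices, not part of the original file):

```python
import xml.etree.ElementTree as ET

# The file from the question, inlined as bytes so the declared encoding applies.
XML = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
<Revision version="2" />
<Application version="02.00.00" />
</afc:Bibliography>
</afc>"""

root = ET.fromstring(XML)

# Prefixes used only for lookups in this script; "d" stands for the default
# namespace of the document.
NS = {"d": "http://python.org/:aaa", "afc": "http://python.org/:foo"}

bib = root.find("afc:Bibliography", NS)

# Only elements are yielded; there are no whitespace text nodes to skip.
for child in bib:
    print(child.tag, child.attrib)

revision = bib.find("d:Revision", NS)
print(revision.get("version"))  # 2
```

Note that ElementTree reports tags in the form {namespace-uri}localname, so the File, Revision and Application elements carry the default namespace URI rather than a prefix.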

--
Gabriel Genellina

Mar 2 '07 #5
