xml minidom redundant children??

bkamrani

Great guys:

As a newbie, I'm trying to simply parse an XML file using minidom, but
I don't know why I get some extra children(?). I don't know what is
wrong with the XML file; I've tried different XML files and still have the
same problem.

******************************************************************************
XML file (fileTest) looks like:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
<Revision version="2" />
<Application version="02.00.00" />
</afc:Bibliography>
</afc>
******************************************************************************
Python file looks like:
from xml.dom import minidom
doc = minidom.parse(fileTest)
a = doc.documentElement.childNodes
print(a)
print('--------------')
for item in a:
    print(item.nodeName)
******************************************************************************
And the output is:
[<DOM Text node "\n">, <DOM Element: afc:Bibliography at 12082960>, <DOM Text node "\n">]
--------------
#text
afc:Bibliography
#text
******************************************************************************

My question is: why have these <DOM Text node "\n"> (#text) nodes been
created, and how can I get rid of them by changing the Python code? (Here I'm
not interested in changing the XML file.)

I have searched the forum without finding any solution :-(

Thank you to all in advance!!
/Ben

Mar 1 '07 #1

Marc 'BlackJack' Rintsch

In <11**********************@v33g2000cwv.googlegroups.com>, bkamrani
wrote:

> My question is: why have these <DOM Text node "\n"> (#text) nodes been
> created, and how can I get rid of them by changing the Python code? (Here I'm
> not interested in changing the XML file.)

They have been created because the text is in the XML source. Line breaks
are valid text.
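For illustration, here is a minimal sketch (Python 3, standard library only; the tiny inline document is a made-up stand-in for the poster's file) showing that the line breaks between tags come back from minidom as ordinary Text nodes next to the element:

```python
from xml.dom import minidom

# Hypothetical stand-in for the poster's file: the newline after <afc> and
# the one after <child/> are ordinary character data to the parser.
SOURCE = "<afc>\n<child/>\n</afc>"

doc = minidom.parseString(SOURCE)
kinds = [(node.nodeName, node.nodeType) for node in doc.documentElement.childNodes]
print(kinds)  # [('#text', 3), ('child', 1), ('#text', 3)]
```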

Ciao,
Marc 'BlackJack' Rintsch

Mar 1 '07 #2

Diez B. Roggisch

bk******@gmail.com wrote:

> My question is: why have these <DOM Text node "\n"> (#text) nodes been
> created, and how can I get rid of them by changing the Python code? (Here I'm
> not interested in changing the XML file.)
>
> I have searched the forum without finding any solution :-(

You can't get rid of them automatically: xml.dom.minidom can't possibly know
whether the whitespace is significant to you or not.

There are several ways to deal with this. If you have to stay with
minidom, just loop over the children and discard all whitespace-only
text nodes before actually processing the document.
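A minimal sketch of that clean-up pass, assuming Python 3 and assuming that *every* whitespace-only text node is insignificant for your data (exactly the decision minidom cannot make for you). The function name strip_whitespace_nodes is hypothetical, not part of minidom; it only relies on the standard DOM removeChild() method:

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Hypothetical helper: remove whitespace-only Text children, recursively.

    Text containing anything other than whitespace is left untouched.
    """
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)
    return node


SOURCE = """<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<Revision version="2" />
</afc:Bibliography>
</afc>"""

doc = strip_whitespace_nodes(minidom.parseString(SOURCE))
for item in doc.documentElement.childNodes:
    print(item.nodeName)  # only afc:Bibliography is left
```

Because only whitespace-only nodes are dropped, mixed content such as `<a> x <b/></a>` keeps its " x " text node.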

But the better alternative would be to use a more convenient API for
processing XML. Use one of the several ElementTree implementations, such as lxml:

http://codespeak.net/lxml/

This will not rid you of the whitespace itself, but it represents text
differently, so that you can focus on elements without interspersed
text nodes.
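The sketch below uses the standard library's xml.etree.ElementTree rather than lxml (lxml.etree implements the same ElementTree API, so the idea should carry over). Iterating over an element yields only its child elements; the surrounding whitespace is stored on the .text and .tail attributes instead of as separate nodes. The inline document is a made-up example:

```python
import xml.etree.ElementTree as ET

SOURCE = "<afc>\n<child/>\n</afc>"

root = ET.fromstring(SOURCE)
# Iterating an element yields only its child *elements*, never text nodes.
print([child.tag for child in root])  # ['child']
# The whitespace is still there, stored as attributes instead:
print(repr(root.text))     # '\n'  (text before the first child)
print(repr(root[0].tail))  # '\n'  (text after <child/>)
```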

Diez

Mar 1 '07 #3

MonkeeSage

On Mar 1, 12:46 pm, bkamr...@gmail.com wrote:

> As a newbie, I'm trying to simply parse an XML file using minidom, but
> I don't know why I get some extra children(?). I don't know what is
> wrong with the XML file; I've tried different XML files and still have the
> same problem.

If you need to stick with xml.dom.minidom, the simplest approach is to check
the nodeType and make sure it's not 3 (Node.TEXT_NODE):

from xml.dom import minidom
doc = minidom.parse(fileTest)
for item in doc.documentElement.childNodes:
    # TEXT_NODE == 3; this skips all text nodes, whitespace-only or not
    if item.nodeType != item.TEXT_NODE:
        print(item.nodeName)
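A slightly stricter variant, sketched here under the assumption that only elements are of interest: test for ELEMENT_NODE (1) instead of "not 3". The "not 3" check still lets comments (nodeType 8) and processing instructions through. The inline document with a comment is made up for illustration:

```python
from xml.dom import minidom

# Hypothetical document that also has a comment between the elements.
SOURCE = "<afc>\n<!-- note -->\n<child/>\n</afc>"

doc = minidom.parseString(SOURCE)
# Keep only element children; Text (3) and Comment (8) nodes are both skipped.
elements = [node for node in doc.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]
for item in elements:
    print(item.nodeName)  # child
```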

Regards,
Jordan

Mar 1 '07 #4

Gabriel Genellina

On Thu, 01 Mar 2007 15:46:53 -0300, <bk******@gmail.com> wrote:

> As a newbie, I'm trying to simply parse an XML file using minidom, but
> I don't know why I get some extra children(?). I don't know what is
> wrong with the XML file; I've tried different XML files and still have the
> same problem.

If you don't have to use exactly xml.dom.minidom, try using ElementTree
http://www.effbot.org/zone/element-index.htm (it's already included with
Python 2.5 as xml.etree.ElementTree; for earlier versions you have to download
and install it). It's a lot easier and cleaner if you are mostly concerned
about the infoset rather than its representation.
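As a rough sketch of what that looks like for the poster's document (Python 3; the namespaces argument to find() needs ElementTree 1.3, i.e. Python 2.7 or later; the prefix "aaa" in the NS mapping is an arbitrary name chosen here for the default namespace):

```python
import xml.etree.ElementTree as ET

# The poster's document, with the e-mail line wrapping undone.
SOURCE = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
<Revision version="2" />
<Application version="02.00.00" />
</afc:Bibliography>
</afc>"""

# Prefix -> namespace URI mapping used only for the find() path below.
NS = {"aaa": "http://python.org/:aaa", "afc": "http://python.org/:foo"}

root = ET.fromstring(SOURCE)
bib = root.find("afc:Bibliography", NS)
for child in bib:
    # Tags are expanded to "{namespace-uri}localname"; no #text entries appear.
    print(child.tag, child.get("version"))
```

ElementTree reports namespaced names as {uri}localname, so the loop prints lines such as `{http://python.org/:aaa}File 2.0.0.0`, one per child element.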

--
Gabriel Genellina

Mar 2 '07 #5

Similar topics

minidom questions

by: xtian

Hi - I'm doing some data conversion with minidom (turning a csv file into a specific xml format), and I've hit a couple of small problems. 1: The output format has a header with some xml that...

Python

xml.dom.minidom -> nextElement ?

by: Alexandre

Hello all, Could someone explain to me why there is no nextElement in minidom ? if i execute this : *************************************** import xml.dom.minidom doc = """\ <root>...

Python

xml.dom.minidom - documentElement vs. childNodes

by: Skip Montanaro

I'm getting somewhat painfully acquainted with xml.dom.minidom. What is the relationship between its documentElement attribute and its childNodes list? I thought XML documents consisted of a...

Python

Comparing two minidom objects

by: Skip Montanaro

I'd like to compare two xml.dom.minidom objects, but the naive attempt fails: >>> import xml.dom.minidom >>> d1 = xml.dom.minidom.parse("ES.xml") >>> d2 = xml.dom.minidom.parse("ES.xml") >>> d1...

Python

Problem parsing namespaces with xml.dom.minidom

by: Mike McGavin

Hi everyone. I've been trying for several hours now to get minidom to parse namespaces properly from my stream of XML, so that I can use DOM methods such as getElementsByTagNameNS(). For some...

Python

minidom xml & non ascii / unicode & files

by: webdev

lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3...

Python

minidom appendChild confusion

by: Marco

Hello! Can anyone explain why the following code does not work? (I'm using python2.4.) Cheers, Marco --

Python

xml.dom.minidom: how to preserve CRLF's inside CDATA?

by: sim.sim

Hi all. i'm faced to trouble using minidom: #i have a string (xml) within CDATA section, and the section includes "\r\n": iInStr = '<?xml version="1.0"?>\n<Data><!]></Data>\n' #After i...

Python

Empty string namespace on XP in minidom

by: Gary

Howdy I ran into a difference between Python on Windows XP and Linux Fedora 6. Writing a dom to xml with minidom works on Linux. It gives an error on XP if there is an empty namespace. The...

Python
