ElementTree surprise

I have a doc with a bunch of fields like:

<foo bar="spam">stuff</foo>
<foo bar="penguin">other stuff</foo>

and sometimes

<foo bar="parrot"></foo>

I use ElementTree to parse the doc and I use the .text attribute
to get "stuff" or "other stuff" in the spam and penguin examples.

I'd expect .text to be the empty string in the parrot example, but
instead it is None.

I can fix my script to deal with this, but it's surprising. Is it
intentional? I could understand it being None if the doc had said

<foo bar="parrot"/>

but that is different.

Disclaimer: I'm not even slightly an XML expert, I just find myself
having to deal with a lot of it.
Aug 16 '07 #1
Hi there!

Paul Rubin writes:
> I have a doc with a bunch of fields like:
>
> <foo bar="spam">stuff</foo>
> <foo bar="penguin">other stuff</foo>
>
> and sometimes
>
> <foo bar="parrot"></foo>
>
> I use ElementTree to parse the doc and I use the .text attribute
> to get "stuff" or "other stuff" in the spam and penguin examples.
>
> I'd expect .text to be the empty string in the parrot example, but
> instead it is None.

Technically, text is a node just like any element node. In the
parrot example, there is not an empty text node but no text node at all.

> I can fix my script to deal with this, but it's surprising. Is it
> intentional? I could understand it being None if the doc had said
>
> <foo bar="parrot"/>
>
> but that is different.

<foo bar="parrot"/> and <foo bar="parrot"></foo> are mapped to the
same thing by any XML parser, and I think it wouldn't be standards
conforming for an XML parser to pass this difference on to a caller.
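
A minimal sketch of that point, assuming Python 3's standard-library xml.etree.ElementTree (the variable names are only for illustration). It parses both spellings of the empty element and checks that the resulting trees look the same:

```python
import xml.etree.ElementTree as ET

# Parse the two spellings of an empty element.
empty_pair = ET.fromstring('<foo bar="parrot"></foo>')
self_closed = ET.fromstring('<foo bar="parrot"/>')

# Neither form contains character data, so .text is None for both.
print(empty_pair.text is None, self_closed.text is None)

# Serializing both trees gives identical bytes: the tree does not
# remember which spelling the input used.
same_output = ET.tostring(empty_pair) == ET.tostring(self_closed)
print(same_output)
```

Once parsed, there is nothing left on the element that would let a caller tell the two spellings apart.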

> Disclaimer: I'm not even slightly an XML expert, I just find myself
> having to deal with a lot of it.

I think the question is how XMLish the access via ElementTree should
be. While it is in principle correct that there is no text node in
parrot, it may be sensible to set it to "" for practical reasons.
As far as I can see, there is no empty text node in XML, so no
ambiguity would occur.

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: br*****@jabber.org
(See http://ime.webhop.org for ICQ, MSN, etc.)
Aug 16 '07 #2
Torsten Bronger <br*****@physik.rwth-aachen.de> writes:
>> <foo bar="parrot"></foo>
> Technically, text is a node just like any element node. In the
> parrot example, there is not an empty text node but no text node at all.

Is that required by the XML standard? If so, ElementTree is doing
the right thing, but it surprises me; I would have expected an empty
string. Thanks.
Aug 16 '07 #3
Paul Rubin wrote:
> Torsten Bronger <br*****@physik.rwth-aachen.de> writes:
>>> <foo bar="parrot"></foo>
>> Technically, text is a node just like any element node. In the
>> parrot example, there is not an empty text node but no text node at all.
>
> Is that required by the XML standard? If so, ElementTree is doing
> the right thing, but it surprises me; I would have expected an empty
> string. Thanks.

The XML standard defines both as being equivalent, so any XML parser would
handle them exactly the same. Also, as most XML parsers have a SAX(-like)
interface, which always generates events in the "<foo></foo>" form, there is
not even a way for applications or libraries to distinguish between the two.

So it's not even an ElementTree thing. ET just doesn't know what exactly was
in the original XML byte stream. A very simple way to make sure you always get
a string back is

    >>> text = element.text or ""

BTW, you'd be even more surprised to see that ET can actually /store/ "" as
text if you tell it to, and then returns an empty string when you ask for the
.text property. But any empty text coming from the parser will always be None.
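
A short sketch of that difference, assuming Python 3's xml.etree.ElementTree. The element built by hand keeps the "" it was given, while the parsed one reports None, and the empty string does not survive a trip through the serializer and parser:

```python
import xml.etree.ElementTree as ET

# Text coming from the parser: no character data, so .text is None.
parsed = ET.fromstring('<foo bar="parrot"></foo>')
print(repr(parsed.text))

# Text set by hand: ET stores the empty string as given.
built = ET.Element("foo", {"bar": "parrot"})
built.text = ""
print(repr(built.text))

# Serialize the hand-built element and parse it again: the empty
# string comes back as None, like any other empty parsed element.
round_tripped = ET.fromstring(ET.tostring(built))
print(repr(round_tripped.text))
```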

Oh, and lxml.etree behaves exactly the same as ElementTree here. :)

Stefan
Aug 16 '07 #4
Stefan Behnel <st******************@web.de> writes:
> So it's not even an ElementTree thing. ET just doesn't know what
> exactly was in the original XML byte stream. A very simple way to
> make sure you always get a string back is
> >>> text = element.text or ""

Thanks, I ended up doing something like that. What I wondered about
the standard was whether it specified that parrot had no text node,
as opposed to having an empty text node. I guess it doesn't matter,
it just caught me by surprise.
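
For reference, "something like that" could look roughly like the sketch below. It assumes Python 3's xml.etree.ElementTree; the <doc> wrapper element and the text_of helper are made up for illustration and are not from the original script:

```python
import xml.etree.ElementTree as ET

# Sample input; the <doc> root element is an assumption for this sketch.
SAMPLE = """<doc>
  <foo bar="spam">stuff</foo>
  <foo bar="penguin">other stuff</foo>
  <foo bar="parrot"></foo>
</doc>"""


def text_of(element):
    """Return the element's text, or "" when the parser left it as None."""
    return element.text or ""


root = ET.fromstring(SAMPLE)

# Map each bar attribute to the element's text, normalizing None to "".
texts = {foo.get("bar"): text_of(foo) for foo in root.iter("foo")}
print(texts)

# findtext() also returns "" (not None) when the matched element has no text.
parrot_text = root.findtext("foo[@bar='parrot']")
print(repr(parrot_text))
```

Note that `element.text or ""` treats None and "" alike, which is exactly what is wanted here since the parser never produces "" on its own.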
Aug 16 '07 #5
