ElementTree surprise

I have a doc with a bunch of fields like:

<foo bar="spam">stuff</foo>
<foo bar="penguin">other stuff</foo>

and sometimes

<foo bar="parrot"></foo>

I use ElementTree to parse the doc and I use the .text attribute
to get "stuff" or "other stuff" in the spam and penguin examples.

I'd expect .text to be the empty string in the parrot example, but
instead it is None.
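A minimal reproduction of the behaviour described above, assuming a current Python 3 with the standard-library `xml.etree.ElementTree`. The `<root>` wrapper element is an assumption, since the real document isn't shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document: the thread doesn't show the real file,
# so the three example fields are placed under an assumed <root> element.
doc = """<root>
<foo bar="spam">stuff</foo>
<foo bar="penguin">other stuff</foo>
<foo bar="parrot"></foo>
</root>"""

root = ET.fromstring(doc)

# Map each foo's "bar" attribute to its .text value.
texts = {foo.get("bar"): foo.text for foo in root.iter("foo")}
print(texts)  # {'spam': 'stuff', 'penguin': 'other stuff', 'parrot': None}
```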

I can fix my script to deal with this, but it's surprising. Is it
intentional? I could understand it being None if the doc had said

<foo bar="parrot"/>

but that is different.

Disclaimer: I'm not even slightly an XML expert, I just find myself
having to deal with a lot of it.
Aug 16 '07 #1


Hello!

Paul Rubin writes:
> I have a doc with a bunch of fields like:
>
> <foo bar="spam">stuff</foo>
> <foo bar="penguin">other stuff</foo>
>
> and sometimes
>
> <foo bar="parrot"></foo>
>
> I use ElementTree to parse the doc and I use the .text attribute
> to get "stuff" or "other stuff" in the spam and penguin examples.
>
> I'd expect .text to be the empty string in the parrot example, but
> instead it is None.

Technically, text is represented as nodes, just like elements. In the
parrot example, there isn't an empty text node; there is no text node at all.

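The "no text node at all" point is easiest to see in a DOM view, where text is stored as explicit child nodes. A sketch using the standard-library `xml.dom.minidom` (the helper name `text_children` is hypothetical, chosen for this illustration):

```python
from xml.dom import minidom

def text_children(doc):
    """Hypothetical helper: return the Text child nodes of the document element."""
    elem = minidom.parseString(doc).documentElement
    return [n for n in elem.childNodes if n.nodeType == n.TEXT_NODE]

print(len(text_children('<foo bar="spam">stuff</foo>')))  # 1
print(len(text_children('<foo bar="parrot"></foo>')))     # 0
```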
> I can fix my script to deal with this, but it's surprising. Is it
> intentional? I could understand it being None if the doc had said
>
> <foo bar="parrot"/>
>
> but that is different.

<foo bar="parrot"/> and <foo bar="parrot"></foo> are mapped to the
same thing by any XML parser, and I think it wouldn't be standards-conforming
for an XML parser to pass this difference on to a caller.

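A quick check of that claim with ElementTree, assuming Python 3's standard library. Both spellings produce the same element and serialise identically, so the original spelling cannot be recovered:

```python
import xml.etree.ElementTree as ET

# The two spellings of an empty element from the thread.
empty_pair = ET.fromstring('<foo bar="parrot"></foo>')
self_closing = ET.fromstring('<foo bar="parrot"/>')

for elem in (empty_pair, self_closing):
    print(repr(elem.text), elem.attrib)  # None {'bar': 'parrot'}

# Re-serialising both gives the same bytes.
same = ET.tostring(empty_pair) == ET.tostring(self_closing)
print(same)  # True
```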
> Disclaimer: I'm not even slightly an XML expert, I just find myself
> having to deal with a lot of it.

I think the question is how XMLish the access via ElementTree should
be. While it is in principle correct that there is no text node in
parrot, it may be sensible to set it to "" for practical reasons.
As far as I can see, there is no empty text node in XML, so no
ambiguity would occur.
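If you want the "" behaviour without changing the parser, one option is to normalise the tree once after parsing. `normalize_empty_text` below is a hypothetical helper, not part of ElementTree, and this is only a sketch: it touches `.text` but deliberately leaves `.tail` alone:

```python
import xml.etree.ElementTree as ET

def normalize_empty_text(root):
    """Hypothetical helper: replace a None .text with "" on every element."""
    for elem in root.iter():  # iter() also yields root itself
        if elem.text is None:
            elem.text = ""
    return root

root = normalize_empty_text(ET.fromstring('<root><foo bar="parrot"></foo></root>'))
print(repr(root.find("foo").text))  # ''
```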

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: br*****@jabber.org
(See http://ime.webhop.org for ICQ, MSN, etc.)
Aug 16 '07 #2

Torsten Bronger <br*****@physik.rwth-aachen.de> writes:
>> <foo bar="parrot"></foo>
>
> Technically, text is represented as nodes, just like elements. In the
> parrot example, there isn't an empty text node; there is no text node at all.

Is that required by the XML standard? If so, ElementTree is doing
the right thing, but it surprises me; I would have expected an empty
string. Thanks.
Aug 16 '07 #3

Paul Rubin wrote:
> Torsten Bronger <br*****@physik.rwth-aachen.de> writes:
>>> <foo bar="parrot"></foo>
>> Technically, text is represented as nodes, just like elements. In the
>> parrot example, there isn't an empty text node; there is no text node at all.
>
> Is that required by the XML standard? If so, ElementTree is doing
> the right thing, but it surprises me; I would have expected an empty
> string. Thanks.

The XML standard defines both as being equivalent, so any XML parser would
handle them exactly the same. Also, as most XML parsers have a SAX(-like)
interface, which always generates events in the "<foo></foo>" form, there is
not even a way for applications or libraries to distinguish between the two.
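The SAX side of this can be observed with the standard-library `xml.sax` module. The sketch below, assuming Python 3, records the element and character events for both spellings. `EventRecorder` and `sax_events` are hypothetical names used only for illustration:

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Record the element and character events reported by a SAX parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Only called when the parser actually reports character data.
        self.events.append(("text", content))

def sax_events(doc):
    handler = EventRecorder()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.events

print(sax_events('<foo bar="parrot"></foo>'))
print(sax_events('<foo bar="parrot"/>'))
# Both print: [('start', 'foo', {'bar': 'parrot'}), ('end', 'foo')]
```

Neither document produces a `characters` event, so nothing downstream of the parser could build an empty text node from it.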

So it's not even an ElementTree thing. ET just doesn't know what exactly was
in the original XML byte stream. A very simple way to make sure you always get
a string back is
>>> text = element.text or ""

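Applied to the fields from the original question, the idiom looks like the sketch below (the `<root>` wrapper is again an assumption). ElementTree's `findtext()` method already behaves this way. As documented, it returns an empty string when the matching element has no text, and it returns `default` only when nothing matches:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><foo bar="spam">stuff</foo><foo bar="parrot"></foo></root>'
)

# Fall back to "" when .text is None.
texts = [foo.text or "" for foo in root.iter("foo")]
print(texts)  # ['stuff', '']

# findtext() gives "" for an element with no text, and the default
# (None here) only when the path matches nothing.
print(repr(root.findtext("foo[@bar='parrot']")))   # ''
print(repr(root.findtext("foo[@bar='missing']")))  # None
```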
BTW, you'd be even more surprised to see that ET can actually /store/ "" as
text if you tell it to, and then returns an empty string when you ask for the
.text property. But any empty text coming from the parser will always be None.
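A sketch of the "store an empty string" case, assuming Python 3's ElementTree. Setting `.text = ""` by hand is kept as-is. However, the serialiser writes an element with empty text as a self-closing tag, so a round trip through the parser turns it back into None:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("foo", bar="parrot")
elem.text = ""               # explicitly store an empty string
print(repr(elem.text))       # ''

# Serialising and re-parsing loses the distinction again.
xml_out = ET.tostring(elem, encoding="unicode")
print(xml_out)               # <foo bar="parrot" />
reparsed = ET.fromstring(xml_out)
print(repr(reparsed.text))   # None
```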

Oh, and lxml.etree behaves exactly the same as ElementTree here. :)

Stefan
Aug 16 '07 #4

Stefan Behnel <st******************@web.de> writes:
> So it's not even an ElementTree thing. ET just doesn't know what
> exactly was in the original XML byte stream. A very simple way to
> make sure you always get a string back is
> >>> text = element.text or ""

Thanks, I ended up doing something like that. What I wondered about
the standard was whether it specified that parrot had no text node,
as opposed to having an empty text node. I guess it doesn't matter;
it just caught me by surprise.
Aug 16 '07 #5
