small inconsistency in ElementTree (1.2.6)

Attached is the smallest test case that shows that ElementTree returns a
string object if the text in the tree is only ASCII, but returns a unicode
object otherwise.

This would make sense if the string object and unicode object were
interchangeable... but they are not. One example: the translate method
is completely different.

I've tested with cElementTree (1.0.2) too, it has the same behaviour.

Any suggestions?
Do I need to check the output of ElementTree every time, or is there some
hidden switch to change this behaviour?

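# Python 2, using the standalone elementtree package
from elementtree import ElementTree

# Byte string holding a UTF-8 encoded document; <p2> contains Cyrillic text
xml = """\
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p1> ascii </p1>
<p2> \xd0\xba\xd0\xb8\xd1\x80\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0 </p2>
</root>
"""

tree = ElementTree.fromstring(xml)
p1, p2 = tree.getchildren()
print "type(p1.text):", type(p1.text)  # reported as str (ASCII-only text)
print "type(p2.text):", type(p2.text)  # reported as unicode (non-ASCII text)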
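
For reference, the translate difference mentioned above can be sketched as follows. This is a minimal illustration and not part of the original test case. It is written so that it also runs on Python 3, where `bytes` plays the role that the plain string object (Python 2 `str`) plays in this thread; the sample word and both tables are made up.

```python
# Byte strings: translate() expects a 256-entry lookup table.
lookup = bytearray(range(256))      # identity table: every byte maps to itself
lookup[ord("a")] = ord("x")         # then replace "a" with "x"
byte_result = b"banana".translate(bytes(lookup))

# Text (unicode) strings: translate() expects a {code point: replacement} mapping.
text_table = {ord(u"a"): u"x"}
text_result = u"banana".translate(text_table)

# The two table formats are not interchangeable: a mapping is rejected
# by the byte-string version of translate().
try:
    b"banana".translate(text_table)
    mapping_accepted = True
except TypeError:
    mapping_accepted = False
```

Both calls produce the same letters, but code that builds one kind of table cannot be handed the other kind of string without a conversion.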

Dec 9 '05 #1
Damjan wrote:
> Attached is the smallest test case that shows that ElementTree returns
> a string object if the text in the tree is only ASCII, but returns a unicode
> object otherwise.
>
> This would make sense if the string object and unicode object were
> interchangeable... but they are not. One example: the translate method
> is completely different.
>
> I've tested with cElementTree (1.0.2) too, it has the same behaviour.
>
> Any suggestions?

this is documented behaviour.

> Do I need to check the output of ElementTree every time, or is there some
> hidden switch to change this behaviour?

no.

ascii strings and unicode strings are perfectly interchangeable, with some
minor exceptions. if you find yourself using translate all the time (why?),
add an explicit conversion to the translate code.

(fwiw, I'd say this is a bug in translate rather than in elementtree)

</F>
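
One way to read "add an explicit conversion to the translate code" is a small wrapper like the one below. This is only a sketch under assumptions: the names `to_text` and `translate_text` are made up for illustration, and byte strings are assumed to be pure ASCII, which is the only case in which ElementTree is described as returning them in this thread.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, encoding="ascii"):
    """Return value as a text string, decoding byte strings if needed."""
    if isinstance(value, TEXT_TYPE):
        return value
    return value.decode(encoding)


def translate_text(value, table):
    """Apply a {code point: replacement} table regardless of the input type."""
    return to_text(value).translate(table)


# Hypothetical table: upper-case every "a" and delete "!" (None deletes).
table = {ord(u"a"): u"A", ord(u"!"): None}
from_bytes = translate_text(b"banana!", table)
from_text = translate_text(u"banana!", table)
```

With the conversion kept inside the wrapper, callers do not need to check what kind of string they were given.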

Dec 9 '05 #2
>> Do I need to check the output of ElementTree every time, or is there some
>> hidden switch to change this behaviour?
>
> no.
>
> ascii strings and unicode strings are perfectly interchangeable, with some
> minor exceptions.

It's not only translate, it's decode too... probably other methods and
behaviour differ too.
And the bigger picture: string objects are really only byte sequences,
while text consists of characters, and that's what unicode strings are
for: strings made of characters.

It seems more logical to me for et.text to always be a unicode object.
It's text, right!

> if you find yourself using translate all the time
> (why?), add an explicit conversion to the translate code.

I'm using translate because I need it :)

I'm currently just wrapping anything from ElementTree in unicode(), but
this seems like an ugly step.

> (fwiw, I'd say this is a bug in translate rather than in elementtree)

I wonder what the python devels will say? ;)
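
The "wrap everything in unicode()" workaround could be kept in one place with a helper along these lines. A rough sketch only: `text_of` is a made-up name, and it uses the standard-library `xml.etree.ElementTree` module rather than the standalone `elementtree` package from the original post, which is an assumption. On Python 3 `.text` is always a text string already, so the helper only matters for the Python 2 behaviour discussed here.

```python
import xml.etree.ElementTree as ET

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3

# Same document as the original test case, encoded as UTF-8 bytes.
XML_BYTES = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u"<root><p1> ascii </p1>"
    u"<p2> \u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430 </p2></root>"
).encode("utf-8")


def text_of(element):
    """Return element.text as a text string, never a byte string or None."""
    text = element.text
    if text is None:
        return u""
    if isinstance(text, TEXT_TYPE):
        return text
    return text.decode("ascii")  # byte strings from the parser are ASCII-only


root = ET.fromstring(XML_BYTES)
texts = [text_of(child) for child in root]
```

Every entry in `texts` then has the same type, so later calls such as translate behave the same for both elements.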

Dec 9 '05 #3
Damjan wrote:
>> ascii strings and unicode strings are perfectly interchangeable, with some
>> minor exceptions.
>
> It's not only translate, it's decode too...

why would you use decode on the strings you get back from ET?

> probably other methods and behaviour differ too.
>
> And the bigger picture: string objects are really only byte sequences

not if they contain ASCII characters.

> while text consists of characters, and that's what unicode strings
> are for: strings made of characters.
>
> It seems more logical to me for et.text to always be a unicode object.
> It's text, right!
>
>> if you find yourself using translate all the time
>> (why?), add an explicit conversion to the translate code.
>
> I'm using translate because I need it :)
>
> I'm currently just wrapping anything from ElementTree in unicode(), but
> this seems like an ugly step.
>
>> (fwiw, I'd say this is a bug in translate rather than in elementtree)
>
> I wonder what the python devels will say? ;)

well, you're talking to the developer who wrote the original Unicode
implementation...

</F>

Dec 9 '05 #4
>>> ascii strings and unicode strings are perfectly interchangeable, with
>>> some minor exceptions.
>>
>> It's not only translate, it's decode too...
>
> why would you use decode on the strings you get back from ET?

Long story... some time ago, when computers didn't support charsets, people
invented so-called "cyrillic fonts", i.e. a font that has Cyrillic glyphs
mapped onto the Latin positions. Since our Cyrillic alphabet has 31
characters, some characters in those fonts were mapped to { or ~ etc.
Of course this "solution" is awful, but it was the only one at the time.

So I'm making a Python script that takes an OpenDocument file and translates
it to UTF-8...

PS. I use translate now, but I was making a general note that unicode and
string objects are not 100% interchangeable. translate, encode and decode are
especially problematic.

Anyway, I wrap the output of ET in unicode() now... I don't see
another, better solution.
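
The font remapping described above might look roughly like this with a unicode translate table. The mapping below is purely hypothetical: the real table depends on the particular "cyrillic font", which the post does not give, and `fix_legacy_text` is a made-up name.

```python
# HYPOTHETICAL mapping from Latin-position characters to Cyrillic letters.
# A real legacy font would need its own complete table.
FONT_MAP = {
    ord(u"a"): u"\u0430",  # a -> Cyrillic a
    ord(u"b"): u"\u0431",  # b -> Cyrillic be
    ord(u"{"): u"\u0448",  # { -> Cyrillic sha (assumed)
    ord(u"~"): u"\u0447",  # ~ -> Cyrillic che (assumed)
}


def fix_legacy_text(text):
    """Replace Latin-position characters with the Cyrillic letters they display as."""
    return text.translate(FONT_MAP)


converted = fix_legacy_text(u"ab{~ 1")
```

Characters that are not in the table, such as the space and the digit here, pass through unchanged. The input has to be a unicode (text) string for the mapping-style table to work, which is why the output of ElementTree gets converted first.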

Dec 10 '05 #5
