473,399 Members | 3,888 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,399 software developers and data experts.

ElementTree, XML and Unicode -- C0 Controls

Hi all,

The unicode code points in the 0000-001F range --
except newline, tab, carriage return -- are not legal
XML 1.0 characters.

Attempts to serialize and deserialize such strings
with ElementTree will fail:
>>elt = Element("root", char=u"\u0000")
xml = tostring(elt)
xml
'<root char="\x00" />'
>>fromstring(xml)
[...]
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1,
column 12

Good ! But I was expecting a failure *earlier*, in
the "tostring" function -- I basically assumed that
ElementTree would refuse to generate a XML
fragment that is not well-formed.

Could anyone comment on the rationale behind
the current behavior ? Is it a performance issue,
the search for non-valid unicode code points being
too expensive ?

Cheers,

SB

Dec 11 '06 #1
2 1656
Sébastien Boisgérault wrote:
Could anyone comment on the rationale behind
the current behavior ? Is it a performance issue,
the search for non-valid unicode code points being
too expensive ?
the default serializer doesn't do any validation or well-formedness checks at all; it assumes
that you know what you're doing.

</F>

Dec 11 '06 #2


On Dec 11, 4:51 pm, "Fredrik Lundh" <fred...@pythonware.comwrote:
Sébastien Boisgérault wrote:
Could anyone comment on the rationale behind
the current behavior ? Is it a performance issue,
the search for non-valid unicode code points being
too expensive ?
the default serializer doesn't do any validation or well-formedness checks at all; it assumes
that you know what you're doing.

</F>
Fair enough !

Thanks Fredrik.

SB

Dec 11 '06 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: Stewart Midwinter | last post by:
I want to parse a file with ElementTree. My file has the following format: <!-- file population.xml --> <?xml version='1.0' encoding='utf-8'?> <population> <person><name="joe" sex="male"...
14
by: Erik Bethke | last post by:
Hello All, I am getting an error of not well-formed at the beginning of the Korean text in the second example. I am doing something wrong with how I am encoding my Korean? Do I need more of a...
1
by: mirandacascade | last post by:
O/S: Windows 2K Vsn of Python: 2.4 Currently: 1) Folder structure: \workarea\ <- ElementTree files reside here \xml\ \dom\
4
by: Damjan | last post by:
Attached is the smallest test case, that shows that ElementTree returns a string object if the text in the tree is only ascii, but returns a unicode object otherwise. This would make sense if...
15
by: Steven Bethard | last post by:
I'm having trouble using elementtree with an XML file that has some gbk-encoded text. (I can't read Chinese, so I'm taking their word for it that it's gbk-encoded.) I always have trouble with...
7
by: mirandacascade | last post by:
O/S: Windows XP Home Vsn of Python: 2.4 Copy/paste of interactive window is immediately below; the text/questions toward the bottom of this post will refer to the content of the copy/paste ...
6
by: Sébastien Boisgérault | last post by:
I guess I am doing something wrong ... Any clue ? Traceback (most recent call last): File "<stdin>", line 1, in ? File "/usr/lib/python2.4/site-packages/elementtree/ElementTree.py", line 960,...
6
by: Tim Arnold | last post by:
Hi, I'm getting the by-now-familiar error: return codecs.charmap_decode(input,errors,decoding_map) UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in...
2
by: globophobe | last post by:
This is likely an easy problem; however, I couldn't think of appropriate keywords for google: Basically, I have some raw data that needs to be preprocessed before it is saved to the database...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.