ElementTree and Unicode


I guess I am doing something wrong... Any clue?

>>> from elementtree.ElementTree import *
>>> element = Element("string", value=u"\x00")
>>> xml = tostring(element)
>>> XML(xml)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/elementtree/ElementTree.py", line 960, in XML
    parser.feed(text)
  File "/usr/lib/python2.4/site-packages/elementtree/ElementTree.py", line 1242, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 15

Cheers,

SB

Aug 2 '06 #1

"Sébastien Boisgérault" <Se*******************@gmail.com> wrote in message
news:11*********************@h48g2000cwc.googlegroups.com...
> >>> element = Element("string", value=u"\x00")
I'm not as familiar with elementtree.ElementTree as I perhaps
should be. However, you appear to be trying to insert a null
character into an XML document. Should you succeed in this
quest, the resulting document will be ill-formed, and any
conforming parser will choke on it.
Aug 2 '06 #2
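
The point above can be checked with a minimal sketch. It assumes a current Python 3 and the standard-library `xml.etree.ElementTree` module rather than the standalone `elementtree` package used in the original post. The serializer writes the null character without complaint, and it is the parser that rejects the result:

```python
import xml.etree.ElementTree as ET

# Building and serializing the element succeeds: the serializer
# does not check that attribute values contain only legal XML characters.
element = ET.Element("string", value="\x00")
xml_bytes = ET.tostring(element)

# Parsing the serialized document back is where the failure shows up.
try:
    ET.fromstring(xml_bytes)
    parsed_ok = True
except ET.ParseError as exc:
    parsed_ok = False
    print("parser rejected the document:", exc)
```

So the failure comes from the document itself being ill-formed, not from the way `XML()` is called.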

Richard Brodie wrote:
> "Sébastien Boisgérault" <Se*******************@gmail.com> wrote in message
> news:11*********************@h48g2000cwc.googlegroups.com...
> > >>> element = Element("string", value=u"\x00")
>
> I'm not as familiar with elementtree.ElementTree as I perhaps
> should be. However, you appear to be trying to insert a null
> character into an XML document. Should you succeed in this
> quest, the resulting document will be ill-formed, and any
> conforming parser will choke on it.

I am trying to embed *arbitrary* (unicode) strings inside
an XML document. Of course I'd like to be able to reconstruct
them later from the XML document... If the naive way to do it does
not work, can anyone suggest a way to do it?

SB

Aug 2 '06 #3
In <11**********************@b28g2000cwb.googlegroups.com>, Sébastien
Boisgérault wrote:
> I am trying to embed *arbitrary* (unicode) strings inside
> an XML document. Of course I'd like to be able to reconstruct
> them later from the XML document... If the naive way to do it does
> not work, can anyone suggest a way to do it?

Encode it in UTF-8 and then Base64. AFAIK that is the only reliable way to put an
arbitrary string into XML and get exactly the same string back again.

Ciao,
Marc 'BlackJack' Rintsch
Aug 2 '06 #4
Sébastien Boisgérault wrote:
> I am trying to embed *arbitrary* (unicode) strings inside
> an XML document. Of course I'd like to be able to reconstruct
> them later from the XML document... If the naive way to do it does
> not work, can anyone suggest a way to do it?

XML 1.0 does not support arbitrary Unicode characters: most control
characters below #x20 (including #x0), the surrogate range, and
#xFFFE/#xFFFF are excluded. See the definition of Char in

http://www.w3.org/TR/2004/REC-xml-20040204

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
[#xE000-#xFFFD] | [#x10000-#x10FFFF]
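
As a rough illustration of the Char production above, the following Python 3 sketch tests code points against the listed ranges. The function names `is_xml_char` and `find_illegal_chars` are made up for this example and are not part of ElementTree:

```python
def is_xml_char(ch):
    """Return True if ch matches the XML 1.0 Char production [2]."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(text):
    """Return (index, character) pairs for characters XML 1.0 cannot carry."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml_char(ch)]


# The string from the original post contains exactly one illegal character.
print(find_illegal_chars("\x00"))
# Tab, newline and ordinary text are fine.
print(find_illegal_chars("plain text\twith a tab\n"))
```

A check like this could be run before serializing, to decide whether a string can be stored directly or needs to be encoded first.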

Now, one might think you could use a character reference
(e.g. &#0;) to refer to the "missing" characters, but this is not so:

[66] CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';'

Well-formedness constraint: Legal Character
Characters referred to using character references must match the
production for Char.
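
The constraint can be observed with a small sketch, again assuming Python 3 and the standard-library `xml.etree.ElementTree` (which parses with expat). The helper name `parses` is made up for this example:

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A reference to a character allowed by Char is accepted ...
print(parses('<string value="&#9;" />'))
# ... but references to excluded characters are rejected, in decimal or hex form.
print(parses('<string value="&#0;" />'))
print(parses('<string value="&#x1;" />'))
```

Escaping the character as a reference therefore does not help: the document is still not well-formed.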

As others have explained, if you want to transmit arbitrary characters,
you need to encode them as text in some way. One obvious solution
would be to encode the Unicode data as UTF-8 first, and then encode
the UTF-8 bytes using base64. The receiver of the XML document then
must do the reverse.
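
A minimal sketch of that round trip, assuming Python 3 and the standard-library `xml.etree.ElementTree` and `base64` modules. The helper names `encode_value` and `decode_value` are made up for this example:

```python
import base64
import xml.etree.ElementTree as ET


def encode_value(text):
    """Sender side: str -> UTF-8 bytes -> base64 -> ASCII str safe for XML."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_value(encoded):
    """Receiver side: the reverse of encode_value."""
    return base64.b64decode(encoded).decode("utf-8")


# A string that cannot be stored directly because of the null character.
original = "\x00 arbitrary \u20ac text"

element = ET.Element("string", value=encode_value(original))
xml_bytes = ET.tostring(element)

# The document now parses, and the original string is recovered exactly.
restored = decode_value(ET.fromstring(xml_bytes).get("value"))
print(restored == original)
```

The base64 alphabet only uses letters, digits, `+`, `/` and `=`, all of which are legal XML characters, so the stored value never needs escaping. One caveat for Python 3: a string containing lone surrogates cannot be encoded as strict UTF-8, so such input would need an error handler such as `surrogatepass` on both the encode and decode calls.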

Regards,
Martin
Aug 2 '06 #5

OK! Thanks a lot for this helpful information.

Cheers,

SB

Aug 2 '06 #7
