
error when parsing xml

I use xml.dom.minidom to parse some XML, but when the input
contains certain characters (æ, ø and å), I get a
UnicodeEncodeError like this:

UnicodeEncodeError: 'ascii' codec can't encode character
u'\xe6' in position 604: ordinal not in range(128).

How can I avoid this error?
All help much appreciated!
--
If you have a fridge, if you have a TV,
then you have everything you need to live

-Jokke & Valentinerne
Sep 5 '05 #1
5 Replies


Odd-R. wrote:
> I use xml.dom.minidom to parse some xml, but when input
> contains some specific characters (æ, ø and å), I get an
> UnicodeEncodeError, like this:
>
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe6' in position 604: ordinal not in range(128).
>
> How can I avoid this error?


if you're getting this on the way in, something is broken (posting a short
self-contained test program will help us figure out what's wrong).

if you're getting this on the way out, the problem is that you're trying to
print Unicode strings to an ASCII device. use the "encode" method to
convert the string to the encoding you want to use, or use codecs.open
to open an encoded stream and print via that one instead.
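
A minimal sketch of both suggestions, written so that it runs on Python 2.6+ and Python 3. The sample string and the in-memory stream are assumptions made for illustration: io.BytesIO stands in for a byte-oriented output, and codecs.getwriter wraps an already open byte stream in much the same way that codecs.open wraps a file.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Hypothetical sample text containing the characters å, æ and ø.
text = u"bl\u00e5b\u00e6rsyltet\u00f8y"

# Option 1: encode explicitly before handing the data to a byte-oriented sink.
encoded = text.encode("utf-8")

# Option 2: wrap the byte stream once, then write Unicode strings through it.
raw = io.BytesIO()                        # stand-in for a file or socket
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)

# Encoding to ASCII is the step that fails in the reported error.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Both options produce the same bytes; the wrapper is mainly convenient when many strings are written to the same stream.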

more reading (google for "python unicode" if you want more):

http://www.jorendorff.com/articles/unicode/python.html
http://effbot.org/zone/unicode-objects.htm
http://www.amk.ca/python/howto/unicode

</F>

Sep 5 '05 #2

> if you're getting this on the way in, something is broken (posting a short
> self-contained test program will help us figure out what's wrong).


Or he tries to pass a unicode object to parseString.
Regards,

Diez

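# -*- coding: utf-8 -*-
import xml.dom.minidom

# Passing a unicode object (not a byte string) reproduces the error on Python 2:
# the parser implicitly encodes it with the default ASCII codec.
dom3 = xml.dom.minidom.parseString(u'<myxml>wir hoffen ihr habt den sommer schön verbracht<empty/> some more data</myxml>')
print dom3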

Sep 5 '05 #3

On 2005-09-05, Fredrik Lundh <fr*****@pythonware.com> wrote:
> Odd-R. wrote:
>> I use xml.dom.minidom to parse some xml, but when input
>> contains some specific characters (æ, ø and å), I get an
>> UnicodeEncodeError, like this:
>>
>> UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\xe6' in position 604: ordinal not in range(128).
>>
>> How can I avoid this error?
>
> if you're getting this on the way in, something is broken (posting a short
> self-contained test program will help us figure out what's wrong).


This is retrieved through a webservice and stored in a variable test

<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for xmltest-->
<!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
<!ELEMENT test (#PCDATA)>]>
<testtest><test></test></testtest>

printing this out yields no problems, so the trouble seems to be when executing
the following:

doc = minidom.parseString(test)

Then I get this error:

  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\minidom.py", line 1918, in parseString
    return expatbuilder.parseString(string)
  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 157-159: ordinal not in range(128)

At the top of the file, I have put this statement: # -*- coding: utf-8 -*-

> if you're getting this on the way out, the problem is that you're trying to
> print Unicode strings to an ASCII device. use the "encode" method to
> convert the string to the encoding you want to use, or use codecs.open
> to open an encoded stream and print via that one instead.


Can you give an example of how this is done?

Thanks again for all help!

Sep 5 '05 #4

Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
>
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test></test></testtest>
>
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
>
> doc = minidom.parseString(test)


You need to do

doc = minidom.parseString(test.encode("utf-8"))

The reason is simple: test is not a string, but a unicode object.
XML parsers work with byte strings, so passing a unicode object to them
will convert it with the default encoding, which is ascii. BTW, I used
encode("utf-8") because the header of your document says so. If it
were latin1, you'd need to encode with that instead. There is plenty of
unicode-related material out there - use google to search this NG or the web.
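
A small self-contained sketch of that fix, written to run on both Python 2.6+ and Python 3 (on Python 3 the str type is already Unicode, so the u prefix is redundant there). The payload is a hypothetical stand-in for the web service response, with the DTD left out for brevity.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# Hypothetical payload: a Unicode string whose XML declaration says UTF-8.
test = (u"<?xml version='1.0' encoding='utf-8'?>"
        u"<testtest><test>\u00e6\u00f8\u00e5</test></testtest>")

# Encode to bytes using the same encoding that the XML declaration names,
# so the parser never has to fall back on an implicit ASCII conversion.
doc = minidom.parseString(test.encode("utf-8"))

# The DOM hands the text back as Unicode.
text = doc.getElementsByTagName("test")[0].firstChild.data
```

The key point is that the argument to encode() and the encoding named in the XML declaration agree.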

Diez
Sep 5 '05 #5

Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
>
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test></test></testtest>
>
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
>
> doc = minidom.parseString(test)


unless we have a cut-and-paste problem here, that looks like invalid XML;
the header says UTF-8, but the test element contains ISO-8859-1 text.

try changing "utf-8" to "iso-8859-1" to see if that helps.

and you really need to fix the originating system, to make sure that the
encoding header matches the encoding used for the content.
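
A hedged illustration of that mismatch, assuming the element text was æøå encoded as ISO-8859-1 (the real content is not visible in the post). The byte strings below are made up for the example; only the declared encoding differs between them.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical body: the characters æøå encoded as ISO-8859-1 bytes.
body = u"<testtest><test>\u00e6\u00f8\u00e5</test></testtest>".encode("iso-8859-1")

mismatched = b"<?xml version='1.0' encoding='utf-8'?>" + body
matching = b"<?xml version='1.0' encoding='iso-8859-1'?>" + body

# The header claims UTF-8, but the bytes are not valid UTF-8,
# so the parser rejects the document.
try:
    minidom.parseString(mismatched)
    mismatch_rejected = False
except ExpatError:
    mismatch_rejected = True

# With a header that matches the actual bytes, parsing succeeds.
doc = minidom.parseString(matching)
text = doc.getElementsByTagName("test")[0].firstChild.data
```

Changing the declaration only works around the problem on the receiving side; the sending system still needs to emit a header that matches its content.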

</F>

Sep 5 '05 #6
