error when parsing xml

I use xml.dom.minidom to parse some xml, but when the input
contains some specific characters (æ, ø and å), I get a
UnicodeEncodeError, like this:

UnicodeEncodeError: 'ascii' codec can't encode character
u'\xe6' in position 604: ordinal not in range(128).

How can I avoid this error?
All help much appreciated!
--
If you have a fridge, if you have a TV
then you have everything you need to live

-Jokke & Valentinerne
Sep 5 '05 #1
Odd-R. wrote:
> I use xml.dom.minidom to parse some xml, but when input
> contains some specific characters (æ, ø and å), I get an
> UnicodeEncodeError, like this:
>
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe6' in position 604: ordinal not in range(128).
>
> How can I avoid this error?

if you're getting this on the way in, something is broken (posting a short
self-contained test program will help us figure out what's wrong).

if you're getting this on the way out, the problem is that you're trying to
print Unicode strings to an ASCII device. use the "encode" method to
convert the string to the encoding you want to use, or use codecs.open
to open an encoded stream and print via that one instead.
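
A minimal sketch of both options, not taken from the thread. It assumes the text is already a Unicode string, and it swaps in codecs.getwriter around an in-memory byte stream so that nothing touches the filesystem; codecs.open applies the same kind of encoding wrapper to a real file. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# The three characters from the question, as a Unicode string.
text = u"\xe6\xf8\xe5"

# Option 1: encode explicitly before handing the text to a byte-oriented sink.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# Option 2: wrap a byte stream in an encoding writer and write Unicode to it.
raw = io.BytesIO()  # hypothetical stand-in for a file or socket
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)
written = raw.getvalue()  # the same bytes as utf8_bytes
```

Which encoding to pick depends on what the receiving end expects; the two byte strings above differ in length (six bytes for UTF-8, three for ISO-8859-1).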

more reading (google for "python unicode" if you want more):

http://www.jorendorff.com/articles/unicode/python.html
http://effbot.org/zone/unicode-objects.htm
http://www.amk.ca/python/howto/unicode

</F>

Sep 5 '05 #2
> if you're getting this on the way in, something is broken (posting a short
> self-contained test program will help us figure out what's wrong).

Or he tries to pass a unicode object to parseString.
Regards,

Diez

# -*- coding: utf-8 -*-
import xml.dom.minidom
dom3 = xml.dom.minidom.parseString(u'<myxml>wir hoffen ihr habt den sommer schön verbracht<empty/> some more data</myxml>')
print dom3

Sep 5 '05 #3
On 2005-09-05, Fredrik Lundh <fr*****@pythonware.com> wrote:
> Odd-R. wrote:
>> I use xml.dom.minidom to parse some xml, but when input
>> contains some specific characters (æ, ø and å), I get an
>> UnicodeEncodeError, like this:
>>
>> UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\xe6' in position 604: ordinal not in range(128).
>>
>> How can I avoid this error?
>
> if you're getting this on the way in, something is broken (posting a short
> self-contained test program will help us figure out what's wrong).

This is retrieved through a webservice and stored in a variable test

<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for xmltest-->
<!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
<!ELEMENT test (#PCDATA)>]>
<testtest><test>æøå</test></testtest>

printing this out yields no problems, so the trouble seems to be when executing
the following:

doc = minidom.parseString(test)

Then I get this error:

  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\minidom.py", line 1918, in parseString
    return expatbuilder.parseString(string)
  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Plone\Python\lib\site-packages\_xmlplus\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 157-159: ordinal not in range(128)

At the top of the file, I have put this statement: # -*- coding: utf-8 -*-

> if you're getting this on the way out, the problem is that you're trying to
> print Unicode strings to an ASCII device. use the "encode" method to
> convert the string to the encoding you want to use, or use codecs.open
> to open an encoded stream and print via that one instead.

Can you give an example of how this is done?

Thanks again for all help!

Sep 5 '05 #4
Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
>
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test>æøå</test></testtest>
>
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
>
> doc = minidom.parseString(test)

You need to do

doc = minidom.parseString(test.encode("utf-8"))

The reason is simple: test is not a byte string, but a unicode object.
XML parsers work with byte strings - thus passing a unicode object to them
will convert it implicitly, using the default encoding, which is ascii. BTW, I used
encode("utf-8") because the header of your document says so. If it
said latin1, you'd need to encode to latin1 instead. There is plenty of
unicode-related material out there - use google to search this NG or the web.
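
A short sketch of the difference, under a few assumptions: the document is a trimmed copy of the one posted above (the DTD is left out), and the implicit conversion is reproduced with an explicit encode("ascii") so the failing step is visible without depending on a particular Python version. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# A unicode object holding the XML, as it might arrive from the web service.
test = (u"<?xml version='1.0' encoding='utf-8'?>"
        u"<testtest><test>\xe6\xf8\xe5</test></testtest>")

# What the implicit conversion amounts to: encoding with the ascii codec,
# which cannot represent the three non-ASCII characters.
try:
    test.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly, with the encoding named in the XML header, gives the
# parser bytes that agree with the declaration.
doc = minidom.parseString(test.encode("utf-8"))
value = doc.getElementsByTagName("test")[0].firstChild.data
```

After parsing, value is a Unicode string again, so the same care applies when it is printed or written out.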

Diez
Sep 5 '05 #5
Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
>
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test>æøå</test></testtest>
>
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
>
> doc = minidom.parseString(test)

unless we have a cut-and-paste problem here, that looks like invalid XML;
the header says UTF-8, but the test element contains ISO-8859-1 text.

try changing "utf-8" to "iso-8859-1" to see if that helps.

and you really need to fix the originating system, to make sure that the
encoding header matches the encoding used for the content.
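
To illustrate the mismatch, here is a rough sketch that assumes the service really does send ISO-8859-1 bytes under a UTF-8 header (the thread does not confirm this). The byte strings are built by hand for the example, and the exception class shown is the one raised by the standard library's expat bindings; an older PyXML install may report the failure differently.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Element content encoded as ISO-8859-1: one byte per character.
body = u"<testtest><test>\xe6\xf8\xe5</test></testtest>".encode("iso-8859-1")

# Header claims UTF-8, content is ISO-8859-1: the bytes are not valid UTF-8,
# so the parser rejects the document.
mismatched = b"<?xml version='1.0' encoding='utf-8'?>" + body
try:
    minidom.parseString(mismatched)
    mismatch_parsed = True
except ExpatError:
    mismatch_parsed = False

# Header and content agree: the document parses and the text round-trips.
matched = b"<?xml version='1.0' encoding='iso-8859-1'?>" + body
doc = minidom.parseString(matched)
text = doc.getElementsByTagName("test")[0].firstChild.data
```

Patching the header on the receiving side is only a workaround; the sender still has to declare the encoding it actually uses.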

</F>

Sep 5 '05 #6
