Problems with unicode

I'm trying to write out an XML document using a StringIO object, but I
always run into the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 4: ordinal not in range(128)

Apparently there is one string in the batch I'm encoding that contains
non-ASCII characters. Is there any way to have it encode everything as
Unicode rather than ASCII? Or should I just strip out the non-ASCII
characters (a last resort I'd rather avoid)?
Thanks.
Jul 18 '05 #1
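A minimal sketch of the usual fix, written for Python 3 and assuming the offending bytes are Windows text (cp1252): decode the raw bytes once with the codec they were actually written in, build the XML as Unicode text in io.StringIO, and encode to UTF-8 only at the output boundary. The sample bytes b"Don\x92t panic" and the <note> element are hypothetical stand-ins for the poster's data.

```python
import io
from xml.sax.saxutils import escape

# Hypothetical sample input: raw bytes containing 0x92, which is not ASCII.
raw = b"Don\x92t panic"

# Reproduce the reported error: decoding these bytes as ASCII fails.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii failed:", exc)

# Assumption: the bytes came from Windows, so decode them as cp1252 instead.
text = raw.decode("cp1252")

# Build the XML as str (Unicode) in memory, escaping &, < and > in the text.
buf = io.StringIO()
buf.write('<?xml version="1.0" encoding="utf-8"?>\n')
buf.write("<note>%s</note>\n" % escape(text))

# Encode once, on output, with a codec that can represent every character.
xml_bytes = buf.getvalue().encode("utf-8")
print(xml_bytes)
```

The design choice is to keep everything as Unicode internally and pick the byte encoding only when writing out; stripping non-ASCII characters is then never necessary.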
In article <a0**************************@posting.google.com >,
ja****@appliedminds.com (James Laamnna) wrote:
> Apparently in the batch that I'm encoding there is one string with
> non-ascii characters in it. Is there any way to just have it encode
> everything as unicode and not ascii?


A better question to ask is this: where did the supposed ASCII data come
from in the first place? If, for instance, it came from a Windows
machine, then there's a chance it's actually ISO-8859-1 encoded, in
which case you can preserve the 0x92 by decoding with that codec
instead of the 'ascii' one. Similarly, if the original text came from a
Mac, then it's likely in Mac Roman, so if you use the 'mac-roman' codec
you'll be able to preserve the correct character in the resulting
Unicode.

Dave
Jul 18 '05 #2
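To see why the choice of source codec matters, the sketch below decodes the single byte 0x92 with three Python codecs and names the resulting character. The helper name describe is hypothetical; the codec names are Python's own ('mac-roman' also resolves to mac_roman). Under latin-1 the byte becomes the C1 control U+0092, which is the point the next reply picks up.

```python
import unicodedata


def describe(data, codec):
    """Decode data with codec and return 'U+XXXX NAME' for each character."""
    out = []
    for ch in data.decode(codec):
        # C1 controls such as U+0092 have no Unicode name, so supply a fallback.
        out.append("U+%04X %s" % (ord(ch), unicodedata.name(ch, "<control>")))
    return out


for codec in ("latin-1", "cp1252", "mac_roman"):
    print(codec, describe(b"\x92", codec))
```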
David Opstad <op****@batnet.com> wrote:
> If, for instance, it came from a Windows
> machine, then there's a chance it's actually ISO-8859-1 encoding


If it came from Windows, it's actually CP-1252, not Latin-1.

http://www.effbot.org/zone/unicode-gremlins.htm

--
Jarek Zgoda
http://jpa.berlios.de/
Jul 18 '05 #3
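If text has already been decoded as Latin-1 when it was really CP1252, the 0x80-0x9F bytes show up as C1 control characters ("gremlins"). A minimal repair sketch, assuming that mis-decoding is what happened: map each C1 character back to its byte and re-decode it as cp1252, leaving alone the five bytes cp1252 does not define. The function name fix_gremlins is hypothetical and not taken from the linked article.

```python
import re

# Matches C1 control characters, the leftovers of decoding cp1252 as latin-1.
_C1 = re.compile("[\x80-\x9f]")


def fix_gremlins(text):
    """Reinterpret C1 controls as the cp1252 characters they most likely were."""
    def repl(match):
        raw = match.group(0).encode("latin-1")  # recover the original byte
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252; keep them.
            return match.group(0)
    return _C1.sub(repl, text)


mangled = b"Don\x92t say \x93hi\x94".decode("latin-1")
print(fix_gremlins(mangled))
```

Decoding with cp1252 in the first place is still preferable; this only helps when the wrong codec was applied upstream.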
