Unicode problem with exec

I'm using code.InteractiveConsole, but it doesn't work correctly
with non-ascii characters. I think it boils down to this problem:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u"ä"
ä
>>> exec 'print u"ä"'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1, in ?
  File "c:\python24\lib\encodings\cp850.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in position 0: character maps to <undefined>
>>> ^Z


Why does the exec call fail, and is there a workaround?

Thanks,
Thomas

Jun 23 '06 #1
Thomas Heller wrote:
> I'm using code.InteractiveConsole, but it doesn't work correctly
> with non-ascii characters. I think it boils down to this problem:
>
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print u"ä"
> ä
> >>> exec 'print u"ä"'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "<string>", line 1, in ?
>   File "c:\python24\lib\encodings\cp850.py", line 18, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
> position 0: character maps to <undefined>
> >>> ^Z
>
> Why does the exec call fail, and is there a workaround?


Most probably because you failed to encode the snippet as a whole, so the
embedded unicode literal isn't encoded properly.

As your exec-encoding seems to be cp850, maybe

exec u"print u'ä'".encode("cp850")

works.

Diez
Jun 23 '06 #2
On 23/06/2006 9:06 PM, Thomas Heller wrote:
> I'm using code.InteractiveConsole, but it doesn't work correctly
> with non-ascii characters. I think it boils down to this problem:
>
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print u"ä"

This is utterly useless for diagnostic purposes. What you see is NOT
what you've got. Use repr().

What you've got, as the error message says, is u'\x84' which is not
u"\N{LATIN SMALL LETTER A WITH DIAERESIS}", it is a control character.

See below.
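
The advice to use repr() can be sketched as follows. This is a minimal illustration written for Python 3 (an assumption; the thread itself uses Python 2.4), where ascii() plays the role that repr() played in Python 2. The helper name describe is hypothetical and not part of the original posts.

```python
import unicodedata


def describe(text):
    """Return (escaped form, code point, Unicode name) for each character."""
    return [
        (ascii(ch), "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed control>"))
        for ch in text
    ]


good = "\N{LATIN SMALL LETTER A WITH DIAERESIS}"  # the character that was intended
bad = "\x84"                                      # the character named in the traceback

# Both may look alike (or like nothing) on a console; the escaped form does not lie.
print(describe(good))
print(describe(bad))
```

Printing only the escaped form keeps the diagnostic independent of whatever encoding the console happens to use.
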
> >>> exec 'print u"ä"'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "<string>", line 1, in ?
>   File "c:\python24\lib\encodings\cp850.py", line 18, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
> position 0: character maps to <undefined>
> >>> ^Z
>
> Why does the exec call fail, and is there a workaround?


Executive summary:

The exec statement didn't fail; it was the print statement trying to
print, to your CP850 console, a unicode char that doesn't exist in CP850.
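
A rough Python 3 sketch of that distinction (an assumption, since the thread uses Python 2.4, where exec and print are statements): executing the snippet works, and only the attempt to encode the result for a CP850 console fails. The variable names are illustrative only.

```python
namespace = {}

# The snippet is pure ASCII; the \x84 escape is resolved when it is compiled.
exec(r"s = '\x84'", namespace)
s = namespace["s"]  # exec itself succeeded and produced a one-character string

# Writing to a CP850 console means encoding to cp850 first; that step fails.
try:
    s.encode("cp850")
    outcome = "encoded"
except UnicodeEncodeError as exc:
    outcome = "failed: %s" % exc.reason

# A lossy but safe alternative for diagnostics: escape what cannot be encoded.
safe = s.encode("cp850", errors="backslashreplace")

print(outcome)
print(safe)
```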

This happened because you copied a character whose repr() is '\x84' from
your MS-DOS console and pasted it into 'u"<insert any old rubbish
here>"' :-)

Details:

Windows XP, in a console screen:

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
|>> uc = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
|>> uc
u'\xe4' <<== agrees with Unicode book
|>> encoded = uc.encode('cp850')
|>> encoded
'\x84' <<== agrees with
http://www.unicode.org/Public/MAPPIN...T/PC/CP850.TXT
|>> print uc
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print encoded
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print u"\x84"
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
position 0: character maps to <undefined>
<<== as expected

Looks like Python is working fine to me ...
So, what's happening? Look at this:

|>> char1 = u"ä" <<= corresponds to your "print"
|>> char2 = "ä" <<= corresponds to your exec -- which was given a STRING
constant, like this, not a Unicode constant.

Character in char1 was copied from DOS console.
Second line was obtained by DOS console editing of copy of first line.

|>> char1
u'\xe4'
|>> char2
'\x84' <<= Aha!

What you have done is effectively: exec 'print u"\x84"'
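
Where the u'\x84' comes from can be sketched in Python 3 (an assumption; the thread uses Python 2.4). The sketch also assumes that, without a coding declaration, the byte inside the unicode literal is taken at face value as a code point, which is what decoding as latin-1 does.

```python
a_umlaut = "\N{LATIN SMALL LETTER A WITH DIAERESIS}"  # U+00E4

# What a cp850 console hands over when the key is typed into a byte string.
console_byte = a_umlaut.encode("cp850")

# Taking the byte value as a code point (latin-1 does exactly that) yields
# U+0084, a control character, instead of the intended letter.
misread = console_byte.decode("latin-1")

# Decoding with the console's real encoding recovers the intended character.
correct = console_byte.decode("cp850")

print(ascii(console_byte), ascii(misread), ascii(correct))
```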

Workaround/kludge/bypass:

exec u'print u"ä"'
......^

Much better: embed non-ASCII characters in source code *ONLY* when you
have a proper coding header: http://www.python.org/dev/peps/pep-0263/
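
A small Python 3 sketch of what a PEP 263 coding header buys you (an assumption; the thread uses Python 2.4). The source is built as bytes to stand in for a file saved in cp850; the file name "<cp850 demo>" is made up for the example.

```python
# Source bytes as a cp850 editor would save them: 0x84 is a-diaeresis in cp850.
source = b"# -*- coding: cp850 -*-\ns = '\x84'\n"

namespace = {}
# With the coding line, the compiler decodes the bytes as cp850 before parsing.
exec(compile(source, "<cp850 demo>", "exec"), namespace)

print(ascii(namespace["s"]))
```

With the header in place, the literal ends up as U+00E4 rather than U+0084, so no guessing about the console's encoding is involved.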

HTH,
John

Jun 23 '06 #3
