Gerhard Häring <gh@ghaering.de> writes:
>>> u"äöü"
u'\x84\x94\x81'
(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
Why does this work?
Does Python guess which encoding I mean? I thought Python should
refuse to guess :-)
I stumbled over this yesterday, and it seems to be at least partially
answered by PEP 263:
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.
I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into a source file
whose encoding is unknown.
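
Here is a minimal sketch of what I think is happening. It assumes the
literal was typed at a console using a DOS code page such as cp437 (my
guess, since the values match; the original post does not say). It is
written in Python 3 syntax so it runs today, and the variable names are
made up for illustration. The bytes the console delivers are mapped one
to one to code points, which is what a Latin-1 based decoding does, so
nothing is guessed; the result just is not the text that was meant.

```python
# Text the user meant to type: a-umlaut, o-umlaut, u-umlaut.
intended = "\u00e4\u00f6\u00fc"

# Assumption: the console encodes keystrokes as cp437.
raw = intended.encode("cp437")

# Latin-1 maps each byte 0x00-0xFF to the code point with the same number,
# so decoding never fails, whatever the real encoding was.
as_latin1 = raw.decode("latin-1")

# Decoding with the encoding actually used recovers the intended text.
recovered = raw.decode("cp437")

print(repr(raw))
print(repr(as_latin1))
print(as_latin1 == intended, recovered == intended)
```

Writing the literal with escapes (for example "\xe4\xf6\xfc") avoids the
problem, because the escapes do not depend on the encoding of the source
or the console.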
Thomas