Jiba wrote:
Is the following behaviour normal?
>>> d = {"é" : 1}
>>> d["é"]
1
>>> d[u"é"]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: u'\xe9'
It seems that "é" and u"é" are not considered the same key (in Python
2.3.3), though they have the same hash code (as returned by hash()).
Yet "e" and u"e" (unaccented characters) are considered the same!
Well, "e" and u"e" _are_ the same character, while the unicode that comes
from decoding the "é" byte string depends entirely on which codec
you use for the decoding. It is the same as u"é" only under certain
codecs. To compare a byte string with a unicode key, Python has to decode
it with the default encoding, which is most likely ASCII. ASCII is 7-bit
only, so the "é" byte is not legal in it, and the two keys cannot match.
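
Here is a rough sketch of that point. It spells the bytes out with escapes so the result does not depend on your terminal's encoding. Two assumptions of mine: the b"" prefix needs Python 2.6 or later (on 2.3.3 you would write the plain "\xe9" instead), and try_decode is just a helper name made up for this illustration.

```python
E_ACUTE = u"\xe9"  # the unicode character e-acute (U+00E9)

def try_decode(raw, codec):
    """Return the decoded text, or None if the bytes are not valid in that codec."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None

latin1_bytes = b"\xe9"      # e-acute as an ISO-8859-1 terminal would send it
utf8_bytes = b"\xc3\xa9"    # e-acute as a UTF-8 terminal would send it

results = {
    # ASCII is 7-bit, so byte 0xE9 cannot be decoded at all
    "ascii": try_decode(latin1_bytes, "ascii"),
    # the matching codec gives back the expected character
    "iso-8859-1": try_decode(latin1_bytes, "iso-8859-1"),
    # the wrong codec decodes without error, but to two other characters
    "utf-8 bytes as iso-8859-1": try_decode(utf8_bytes, "iso-8859-1"),
    "utf-8 bytes as utf-8": try_decode(utf8_bytes, "utf-8"),
}
```

Only the entries where the codec matches the bytes come out equal to E_ACUTE; the ASCII attempt fails outright, which is the situation the dictionary lookup runs into.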
For example, try "é".decode('iso-8859-1') and you will probably get the
unicode value you were expecting.
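
Applied to the dictionary from the question, one way out (a sketch, not the only fix) is to decode the byte string before using it as a key, so that both sides of the lookup are unicode. The ISO-8859-1 codec is my assumption about how the "é" was typed; substitute whatever encoding your terminal or source file really uses.

```python
# Assumption: the key was typed in an ISO-8859-1 environment, so it is the
# single byte 0xE9. The b"" prefix needs Python 2.6+; on 2.3.3 use "\xe9".
raw_key = b"\xe9"

# Decode once, up front, and store the key as unicode.
unicode_key = raw_key.decode("iso-8859-1")
d = {unicode_key: 1}

# The lookup with a unicode key now finds the entry instead of raising KeyError.
value = d[u"\xe9"]
```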
I'm not the best person to answer this, but I would at least say that the
above behaviour is considered "normal", though it can be surprising to
those of us who are not experts in Unicode issues...
-Peter