Hi Nicolas :)),
please see below for my answers.
ni************@genevoise.ch wrote:
Hello, Gerald ;-)
Well, ok, but I don't understand why I should first convert a pure
unicode string into a byte string.
The encoding ( here, latin-1) seems an arbitrary choice.
Well, "latin-1" is simply the only encoding that I know works on
my xterm and that I can type without spelling errors :)
Your solution works, but is it a workaround or the real way to use
strxfrm ?
It seems a little artificial to me, but perhaps I haven't understood
something ...
In Python 2.3.4 I had some strange encounters with the locale module.
In the end I considered it broken, at least when it came to currency
formatting.
Does this mean that you cannot pass a unicode string to strxfrm ?
This works here for my home-grown Python 2.4 on Jurassic Debian Woody:
import locale
s=u'\u00e9'
print s
print locale.setlocale(locale.LC_ALL, '')
print repr( locale.strxfrm( s.encode( "latin-1" ) ) )
print repr( locale.strxfrm( s.encode( "utf-8" ) ) )
The output is rather strange:
é
de_DE
"\x10\x01\x05\x01\x02\x01'@/locale"
"\x0c\x01\x0c\x01\x04\x01'@/locale"
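The encode-then-strxfrm workaround above could be wrapped in a small helper, roughly like the sketch below. The name collation_key is made up for illustration and is not part of the locale module. It assumes that locale.getpreferredencoding() is a more sensible default than a hard-coded "latin-1", and that on Python 3 locale.strxfrm takes text strings directly, so no encoding step is needed there.

```python
import locale
import sys

def collation_key(text, encoding=None):
    """Hypothetical helper: locale-aware sort key for a text string."""
    if sys.version_info[0] >= 3:
        # Python 3: strxfrm accepts str directly.
        return locale.strxfrm(text)
    if encoding is None:
        # Assumption: the locale's own encoding is the right one to use.
        encoding = locale.getpreferredencoding()
    # Python 2: strxfrm wants a byte string, so encode first.
    return locale.strxfrm(text.encode(encoding))

# Plain ASCII sample data; call locale.setlocale(locale.LC_ALL, '')
# beforehand to sort by the user's locale instead of the "C" locale.
words = ["b", "a", "c"]
sorted_words = sorted(words, key=collation_key)
```

Whether the preferred encoding actually matches the collation tables of the active locale depends on the platform, so treat this as a starting point only.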
Another (not so) weird thing happens when I unset LANG.
bear@special:~ > unset LANG
bear@special:~ > python2.4 ttt.py
Traceback (most recent call last):
  File "ttt.py", line 3, in ?
    print s
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
Actually, it's weirder that printing works with LANG=de_DE.
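One way to avoid depending on LANG when printing is to encode explicitly before writing. The following is only a sketch: encode_for_output is a hypothetical helper, and falling back to "ascii" with the "backslashreplace" error handler is an assumption about what is acceptable on the terminal.

```python
import sys

def encode_for_output(text, encoding=None):
    """Hypothetical helper: encode text for output without raising."""
    if encoding is None:
        # sys.stdout.encoding can be None (e.g. when output is piped),
        # so fall back to plain ASCII in that case.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # Unencodable characters become escapes such as \xe9 instead of
    # raising UnicodeEncodeError.
    return text.encode(encoding, "backslashreplace")
```

With LANG unset, "print encode_for_output(s)" in the script above would then show an escaped \xe9 rather than a traceback.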
Back to your question. A quick glance at the C-sources of the
_localemodule.c reveals:
if (!PyArg_ParseTuple(args, "s:strxfrm", &s))
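So yes, strxfrm does not accept unicode! (The "s" format wants a byte
string; a unicode argument is first converted with the default encoding,
which fails for non-ASCII characters.)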
I am inclined to consider this a bug.
At least it is not consistent with strcoll.
strcoll accepts either two byte strings or two unicode strings,
at least if HAVE_WCSCOLL was defined when Python
was compiled on your platform.
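Since strcoll takes the strings directly, it can serve as the comparison function for sorting. A minimal sketch follows; sort_with_strcoll is a made-up name, and functools.cmp_to_key is only available in newer Pythons (2.7 and 3.2 onwards). On 2.4 the equivalent would be words.sort(locale.strcoll).

```python
import functools
import locale

def sort_with_strcoll(words):
    """Hypothetical helper: sort strings with locale.strcoll."""
    # cmp_to_key adapts the two-argument comparison into a sort key.
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))
```

Without a prior locale.setlocale(locale.LC_ALL, '') this sorts by the "C" locale, i.e. in plain code point order.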
BTW: Which platform do you use?
HTH,
Gerald
PS: If you have access to irc, you can also ask at
irc://irc.freenode.net#python.de.
--
GPG-Key:
http://keyserver.veridis.com:11371/search?q=0xA140D634