Does strxfrm work with unicode strings?

I am trying to use strxfrm with unicode strings, but it does not work.
This is what I did:
import locale
s=u'\u00e9'
print s
é
locale.setlocale(locale.LC_ALL, '')
'French_Switzerland.1252'
locale.strxfrm(s)
Traceback (most recent call last):
File "<pyshell#20>", line 1, in -toplevel-
locale.strxfrm(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 0: ordinal not in range(128)


Does anyone see what I did wrong?

Jul 19 '05 #1
How about:

import locale
s=u'\u00e9'
print s
locale.setlocale(locale.LC_ALL, '')
locale.strxfrm( s.encode( "latin-1" ) )

---
HTH,
Gerald
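
For what it's worth, here is a minimal sketch of the same workaround that avoids hard-coding the encoding by asking the locale module for its preferred encoding. The helper name `collation_key` is made up for illustration, and the Python 3 branch relies on `locale.strxfrm` accepting `str` there. It assumes `locale.setlocale` has already been called as in the snippet above.

```python
import locale
import sys

def collation_key(text):
    """Return a locale-aware sort key for a text (Unicode) string."""
    if sys.version_info[0] >= 3:
        # Python 3: str is Unicode and strxfrm accepts it directly.
        return locale.strxfrm(text)
    # Python 2: strxfrm only takes byte strings, so encode first,
    # using the encoding that the active locale reports.
    return locale.strxfrm(text.encode(locale.getpreferredencoding()))

words = [u"pear", u"apple", u"fig"]
print(sorted(words, key=collation_key))
```

The key is only meaningful under the locale that was active when it was computed, so keys should not be cached across a `setlocale` call.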


--
GPG-Key: http://keyserver.veridis.com:11371/search?q=0xA140D634

Jul 19 '05 #2
Gruëzi, Gerald ;-)

Well, OK, but I don't understand why I should first convert a pure
unicode string into a byte string.
The encoding (here, latin-1) seems an arbitrary choice.

Your solution works, but is it a workaround or the real way to use
strxfrm?
It seems a little artificial to me, but perhaps I haven't understood
something ...

Does this mean that you cannot pass a unicode string to strxfrm?

Bonne journée !

Jul 19 '05 #3
Sali Nicolas :)),
please see below for my answers.

ni************@genevoise.ch wrote:
> Gruëzi, Gerald ;-)
>
> Well, ok, but I don't understand why I should first convert a pure
> unicode string into a byte string.
> The encoding (here, latin-1) seems an arbitrary choice.

Well, "latin-1" is simply the only encoding that I know works on
my xterm and that I can type without spelling errors :)

> Your solution works, but is it a workaround or the real way to use
> strxfrm?
> It seems a little artificial to me, but perhaps I haven't understood
> something ...

In Python 2.3.4 I had some strange encounters with the locale module.
In the end I considered it broken, at least when it came to currency
formatting.

> Does this mean that you cannot pass a unicode string to strxfrm?

This works here for my home-grown Python 2.4 on Jurassic Debian Woody:

import locale
s=u'\u00e9'
print s

print locale.setlocale(locale.LC_ALL, '')
print repr( locale.strxfrm( s.encode( "latin-1" ) ) )
print repr( locale.strxfrm( s.encode( "utf-8" ) ) )

The output is rather strange:

é
de_DE
"\x10\x01\x05\x01\x02\x01'@/locale"
"\x0c\x01\x0c\x01\x04\x01'@/locale"
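
The transformed strings are not meant to be readable; they are only useful when compared with each other. A small sketch (Python 3 syntax, with hypothetical helper names `sign` and `same_order`) that checks whether comparing strxfrm keys gives the same ordering as strcoll:

```python
import locale

def sign(n):
    """Reduce a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)

def same_order(a, b):
    """True if comparing strxfrm keys orders a and b the way strcoll does."""
    key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
    by_key = (key_a > key_b) - (key_a < key_b)
    return by_key == sign(locale.strcoll(a, b))

print(same_order("apple", "pear"))
```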

Another (not so) weird thing happens when I unset LANG.

bear@special:~ > unset LANG
bear@special:~ > python2.4 ttt.py
Traceback (most recent call last):
File "ttt.py", line 3, in ?
print s
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 0: ordinal not in range(128)

Actually, it is weirder that printing works with LANG=de_DE.
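
Judging by the traceback, print falls back to the ASCII codec when no locale is set in the environment. A hedged sketch of a defensive helper (the name `printable` is hypothetical) that replaces characters the target encoding cannot represent instead of raising:

```python
import sys

def printable(text, encoding=None):
    """Return text with characters unsupported by `encoding` replaced by '?'."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace").decode(encoding)

print(printable(u"\u00e9", "ascii"))  # prints "?"
```

This trades a crash for lossy output, so it suits log or debug messages better than data that must round-trip.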

Back to your question. A quick glance at the C source of
_localemodule.c reveals:

if (!PyArg_ParseTuple(args, "s:strxfrm", &s))

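So yes, strxfrm does not accept unicode! The "s" format converts a
unicode argument with the default encoding (ASCII), which is where
the UnicodeEncodeError comes from.

I am inclined to consider this a bug.
At least it is not consistent with strcoll.
strcoll accepts either two byte strings or two unicode strings,
at least if HAVE_WCSCOLL was defined when Python
was compiled on your platform.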
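
Since strcoll takes two unicode strings, it can serve as a comparison function for sorting. A sketch in current Python 3 syntax, where `functools.cmp_to_key` adapts a comparison function to a sort key (on the Python 2.4 discussed here one would pass `cmp=locale.strcoll` to `sort` instead). The helper name `collate_sorted` is made up for illustration:

```python
import functools
import locale

def collate_sorted(words):
    """Sort words according to the collation rules of the active locale."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(collate_sorted(["pear", "apple", "fig"]))
```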

BTW: Which platform do you use?

HTH,
Gerald

PS: If you have access to irc, you can also ask at
irc://irc.freenode.net#python.de.

--
GPG-Key: http://keyserver.veridis.com:11371/search?q=0xA140D634

Jul 19 '05 #4
ni************@genevoise.ch wrote:
> Gruëzi, Gerald ;-)
>
> Well, ok, but I don't understand why I should first convert a pure
> unicode string into a byte string.
> The encoding (here, latin-1) seems an arbitrary choice.

Yes. The correct choice would be 'cp1252', not 'latin-1',
since that's what your locale setting indicates.
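
The two encodings agree on é, which is why latin-1 happened to work, but they differ in the byte range 0x80 to 0x9F. A small sketch showing this with the euro sign, which cp1252 has and latin-1 lacks:

```python
# é encodes to the same byte in both encodings.
print(u"\u00e9".encode("cp1252") == u"\u00e9".encode("latin-1"))  # True

# The euro sign exists in cp1252 (byte 0x80) but not in latin-1.
euro = u"\u20ac"
print(repr(euro.encode("cp1252")))
try:
    euro.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```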

It seems to me that Python is on a journey from the ASCII
world to the Unicode world, and it will take a few more
versions before it gets there. Going from 2.2 to 2.3 was
a bumpy part of the ride, and it's still not smooth.

Just try to use raw_input with national characters. As far
as I remember, it hasn't worked (on Windows at least) since
2.2.

The clear improvement from 2.3 is that if you print unicode
strings to stdout, they will look correct both in the GUI
and in text mode (cmd.exe). That never worked before, since
Windows uses different code pages in the GUI and in text
mode (which is supposed to be DOS compatible).
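
To illustrate the code page mismatch: a Western European Windows setup typically uses cp1252 in the GUI and cp850 in the console (these particular code pages are an assumption; the post does not name them). The same character maps to a different byte in each:

```python
char = u"\u00e9"  # é

gui_bytes = char.encode("cp1252")      # ANSI code page used by GUI programs
console_bytes = char.encode("cp850")   # OEM code page used by cmd.exe

print(repr(gui_bytes), repr(console_bytes))

# Reading the GUI bytes with the console code page yields a different character.
print(repr(gui_bytes.decode("cp850")))
```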
Jul 19 '05 #5
