Does strxfrm work with unicode strings?

I am trying to use strxfrm with unicode strings, but it does not work.
This is what I did:

import locale
s=u'\u00e9'
print s
é
locale.setlocale(locale.LC_ALL, '')
'French_Switzerland.1252'
locale.strxfrm(s)

Traceback (most recent call last):
File "<pyshell#20>", line 1, in -toplevel-
locale.strxfrm(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 0: ordinal not in range(128)

Can anyone see what I did wrong?

Jul 19 '05 #1

Gerald Klix

How about:

import locale
s=u'\u00e9'
print s
locale.setlocale(locale.LC_ALL, '')
locale.strxfrm( s.encode( "latin-1" ) )

---
HTH,
Gerald

ni************@genevoise.ch wrote:

I am trying to use strxfrm with unicode strings, but it does not work.
This is what I did:

import locale
s=u'\u00e9'
print s
é
locale.setlocale(locale.LC_ALL, '')
'French_Switzerland.1252'
locale.strxfrm(s)

Traceback (most recent call last):
File "<pyshell#20>", line 1, in -toplevel-
locale.strxfrm(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 0: ordinal not in range(128)

Can anyone see what I did wrong?

--
GPG-Key: http://keyserver.veridis.com:11371/search?q=0xA140D634

Jul 19 '05 #2

nicolas.riesch

Gruëzi, Gerald ;-)

Well, ok, but I don't understand why I should first convert a pure
unicode string into a byte string.
The encoding ( here, latin-1) seems an arbitrary choice.

Your solution works, but is it a workaround or the real way to use
strxfrm ?
It seems a little artificial to me, but perhaps I haven't understood
something ...

Does this mean that you cannot pass a unicode string to strxfrm ?

Bonne journée !

Jul 19 '05 #3

Gerald Klix

Sali Nicolas :)),
please see below for my answers.

ni************@genevoise.ch wrote:

Gruëzi, Gerald ;-)

Well, ok, but I don't understand why I should first convert a pure
unicode string into a byte string.
The encoding ( here, latin-1) seems an arbitrary choice.

Well, "latin-1" is simply the only encoding that I know works on
my xterm and that I can type without spelling errors :)

Your solution works, but is it a workaround or the real way to use
strxfrm ?
It seems a little artificial to me, but perhaps I haven't understood
something ...

In Python 2.3.4 I had some strange encounters with the locale module.
In the end I considered it broken, at least when it came to currency
formatting.

Does this mean that you cannot pass a unicode string to strxfrm ?

This works here for my home-grown Python 2.4 on Jurassic Debian Woody:

import locale
s=u'\u00e9'
print s

print locale.setlocale(locale.LC_ALL, '')
print repr( locale.strxfrm( s.encode( "latin-1" ) ) )
print repr( locale.strxfrm( s.encode( "utf-8" ) ) )

The output is rather strange:

é
de_DE
"\x10\x01\x05\x01\x02\x01'@/locale"
"\x0c\x01\x0c\x01\x04\x01'@/locale"

Another (not so) weird thing happens when I unset LANG.

bear@special:~ > unset LANG
bear@special:~ > python2.4 ttt.py
Traceback (most recent call last):
File "ttt.py", line 3, in ?
print s
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 0: ordinal not in range(128)

Actually, it is even weirder that printing works with LANG=de_DE.
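
Both tracebacks in this thread have the same shape: a unicode string is implicitly encoded with the ASCII codec. The following is a minimal sketch of that failure and of the explicit alternative. It assumes, which the thread does not confirm, that the interpreter falls back to ASCII when no locale is configured. The variable names are only for illustration.

```python
s = u'\u00e9'  # LATIN SMALL LETTER E WITH ACUTE

# Encoding with the ASCII codec fails: the code point is outside range(128).
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Choosing an encoding explicitly avoids the implicit ASCII step.
as_latin1 = s.encode('latin-1')  # a single byte, 0xE9
as_utf8 = s.encode('utf-8')      # two bytes, 0xC3 0xA9
```

Whether the resulting bytes display correctly still depends on what the terminal expects, which is why the choice of encoding is not arbitrary.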

Back to your question. A quick glance at the C-sources of the
_localemodule.c reveals:

if (!PyArg_ParseTuple(args, "s:strxfrm", &s))

So yes, strxfrm does not accept unicode!
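
As a rough sketch of the workaround discussed so far, the encoding step can be wrapped in a small helper. The name `strxfrm_text` is hypothetical, and using `locale.getpreferredencoding()` as the target encoding is an assumption, not something the thread settles. On Python 3, `locale.strxfrm` takes text directly, so the helper only encodes on Python 2.

```python
import locale
import sys


def strxfrm_text(text, encoding=None):
    """Hypothetical helper: return a locale-aware sort key for a text string."""
    if sys.version_info[0] >= 3:
        # Python 3: locale.strxfrm accepts str, no encoding step is needed.
        return locale.strxfrm(text)
    # Python 2: strxfrm only accepts byte strings, so encode first.
    if encoding is None:
        # Assumption: the locale's preferred encoding is the right target.
        encoding = locale.getpreferredencoding()
    return locale.strxfrm(text.encode(encoding))


words = [u'pear', u'apple', u'fig']
ordered = sorted(words, key=strxfrm_text)
```

The keys are only meaningful when compared with each other under the same locale setting, so call `locale.setlocale` before building them.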

I am inclined to consider this a bug.
At least it is not consistent with strcoll.
strcoll accepts either 2 strings or 2 unicode strings,
at least when HAVE_WCSCOLL was defined when Python
was compiled on your platform.
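
Since strcoll is a comparison function rather than a key function, one way to sort with it is through `functools.cmp_to_key`. This is a sketch that assumes Python 2.7 or later (where `cmp_to_key` exists), and the word list is made up for the example.

```python
import functools
import locale

# Pick up the collation rules from the environment; keep the default
# "C" locale if the environment names a locale that is not installed.
try:
    locale.setlocale(locale.LC_COLLATE, '')
except locale.Error:
    pass

words = [u'pear', u'apple', u'fig']

# strcoll(a, b) returns a negative, zero, or positive number, like cmp().
ordered = sorted(words, key=functools.cmp_to_key(locale.strcoll))
```

For long lists, precomputed strxfrm keys are usually the cheaper option, because each string is transformed once instead of on every comparison.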

BTW: Which platform do you use?

HTH,
Gerald

PS: If you have access to irc, you can also ask at
irc://irc.freenode.net#python.de.

Jul 19 '05 #4

Magnus Lycka

ni************@genevoise.ch wrote:

Gruëzi, Gerald ;-)

Well, ok, but I don't understand why I should first convert a pure
unicode string into a byte string.
The encoding ( here, latin-1) seems an arbitrary choice.

Yes. The correct choice would be 'cp1252', not 'latin-1',
since that's what your locale setting indicates.
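
To illustrate the point, here is a sketch that derives the codec name from a Windows-style locale string such as the one shown in the original post. The helper `codec_from_locale_name` is hypothetical; in practice `locale.getpreferredencoding()` is the standard-library way to ask for the encoding. The second half shows where 'cp1252' and 'latin-1' actually differ.

```python
def codec_from_locale_name(name):
    """Hypothetical helper: 'French_Switzerland.1252' -> 'cp1252'."""
    head, sep, suffix = name.partition('.')
    if not sep:
        return None           # e.g. 'de_DE' carries no encoding part
    if suffix.isdigit():
        return 'cp' + suffix  # a Windows code page number
    return suffix


codec = codec_from_locale_name('French_Switzerland.1252')

# The two codecs agree on e-acute ...
same_for_e_acute = u'\u00e9'.encode(codec) == u'\u00e9'.encode('latin-1')

# ... but only cp1252 can encode the euro sign.
euro_cp1252 = u'\u20ac'.encode(codec)
try:
    u'\u20ac'.encode('latin-1')
    euro_latin1_ok = True
except UnicodeEncodeError:
    euro_latin1_ok = False
```

So 'latin-1' happens to work for é, but it would fail for characters that exist only in the Windows code page.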

It seems to me that Python is on a journey from the ASCII
world to the Unicode world, and it will take a few more
versions before it gets there. Going from 2.2 to 2.3 was
a bumpy part of the ride, and it's still not smooth.

Just try to use raw_input with national characters. As far
as I remember, it hasn't worked (on Windows at least) since
2.2.

The clear improvement from 2.3 is that if you print unicode
strings to stdout, they will look correct both in the GUI
and in text mode (cmd.exe). That never worked before, since
Windows uses different code pages for the GUI and for text
mode (which is supposed to be DOS compatible).

Jul 19 '05 #5
