
locale.CODESET / different in python shell and scripts

Nuff Said
#1: Jul 18 '05
When I type the following code in the interactive python shell,
I get 'UTF-8'; but if I put the code into a Python script and
run the script - in the same terminal on my Linux box in which
I opened the python shell before -, I get 'ANSI_X3.4-1968'.

How does that come?

Thanks in advance for your answers! Nuff.


The Code:

import locale
print locale.nl_langinfo(locale.CODESET)


Martin v. Löwis
#2: Jul 18 '05


Nuff Said wrote:
> When I type the following code in the interactive python shell,
> I get 'UTF-8'; but if I put the code into a Python script and
> run the script - in the same terminal on my Linux box in which
> I opened the python shell before -, I get 'ANSI_X3.4-1968'.
>
> How does that come?

Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is uncertain why this happens - setlocale is not normally
called automatically; not even in interactive mode. Perhaps
you have created your own startup file?

Regards,
Martin

Michael Hudson
#3: Jul 18 '05


"Martin v. Löwis" <martin@v.loewis.de> writes:

> Nuff Said wrote:
> > When I type the following code in the interactive python shell,
> > I get 'UTF-8'; but if I put the code into a Python script and
> > run the script - in the same terminal on my Linux box in which
> > I opened the python shell before -, I get 'ANSI_X3.4-1968'.
> > How does that come?
>
> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is uncertain why this happens - setlocale is not normally
> called automatically; not even in interactive mode. Perhaps
> you have created your own startup file?

readline calls setlocale() iirc.

Cheers,
mwh

--
Not only does the English Language borrow words from other
languages, it sometimes chases them down dark alleys, hits
them over the head, and goes through their pockets. -- Eddy Peters

Martin v. Löwis
#4: Jul 18 '05


Michael Hudson wrote:
>>It is uncertain why this happens - setlocale is not normally
>>called automatically; not even in interactive mode. Perhaps
>>you have created your own startup file?
>
>
> readline calls setlocale() iirc.

Sure. However, we restore the locale to what it was before
readline initialization messes with the locale.

Regards,
Martin

Nuff Said
#5: Jul 18 '05


On Tue, 27 Apr 2004 22:29:59 +0200, Martin v. Löwis wrote:
> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is uncertain why this happens - setlocale is not normally
> called automatically; not even in interactive mode. Perhaps
> you have created your own startup file?

I use two Python versions on my Linux box (Fedora Core 1):
the Python 2.2 which came with Fedora and a Python 2.3 which
I compiled myself. (I didn't tinker with the latter;
Fedora's Python is a (well known) mess.)

Both Python versions give me 'ANSI_X3.4-1968' when I run a script
with 'print locale.nl_langinfo(locale.CODESET)'.
When I execute the same command in an interactive Python shell,
I get the (correct) 'UTF-8'.

(By 'correct', I mean that the bash command 'locale' gives me
'LANG=en_US.UTF-8, LC_CTYPE="en_US.UTF-8", ...'. This seems to
be correct, because e.g. the 'less ...' command shows files which
are UTF-8 encoded in the correct way; files which are e.g.
'ISO-8859-1' encoded are not shown in the correct way.)


Things are getting even worse:

I write a Python script which uses Unicode strings; now I want
to 'print ...' one of those strings (containing non-ASCII characters;
e.g. German umlauts).
With Fedora's Python 2.2 I have to use 'print s.encode('ISO-8859-1')'
or something similar.
With my self-compiled Python 2.3, I have to use (the expected)
'print s.encode('UTF-8')' (though it shows me 'ANSI_X3.4-1968' when
using 'print locale.nl_langinfo(locale.CODESET)' in the same file).

???

Any ideas what's going wrong here?

(I tried 'python -S ...'; doesn't make a difference.)

Martin v. Löwis
#6: Jul 18 '05


Nuff Said wrote:
> Both Python versions give me 'ANSI_X3.4-1968' when I run a script
> with 'print locale.nl_langinfo(locale.CODESET)'.
> When I execute the same command in an interactive Python shell,
> I get the (correct) 'UTF-8'.

PLEASE invoke

locale.setlocale(locale.LC_ALL, "")

before invoking nl_langinfo. Different C libraries behave differently
in their nl_langinfo responses if setlocale hasn't been called.

Regards,
Martin
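
As a rough illustration of the advice in the post above, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name codeset_for is hypothetical and not part of the locale module. It temporarily switches LC_ALL, reads nl_langinfo(CODESET), and restores the previous setting. The values it returns depend on the C library and on the environment, so none are shown here.

```python
import locale


def codeset_for(locale_name):
    """Return nl_langinfo(CODESET) with LC_ALL temporarily set to locale_name.

    Returns None if the C library rejects the requested locale.
    Hypothetical helper; nl_langinfo is only available on Unix-like systems.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the earlier state


if __name__ == "__main__":
    # "C" is the locale a process is in before any setlocale() call;
    # "" asks the C library to use LANG / LC_* from the environment.
    print("C locale:        ", codeset_for("C"))
    print("user environment:", codeset_for(""))
```

Comparing the two printed values on the machine in question should show whether the difference seen in the thread comes from setlocale() having been called or not.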

Nuff Said
#7: Jul 18 '05


On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
> PLEASE invoke
>
> locale.setlocale(locale.LC_ALL, "")
>
> before invoking nl_langinfo. Different C libraries behave differently
> in their nl_langinfo responses if setlocale hasn't been called.

Thanks a lot for your help!

That solved (part of) the problem; now I get 'UTF-8' (which is correct)
when running the following script (with either my self-compiled Python
2.3 or Fedora's Python 2.2):

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import locale

locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)
print encoding


Still, one problem remains:

When I add the following line to the above script

print u"schönes Mädchen".encode(encoding)

the result is:

schönes Mädchen (with my self-compiled Python 2.3)
schönes Mädchen (with Fedora's Python 2.2)

I observed that my Python gives me (the correct value) 15 for
len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the len of the UTF-8 representation of
the string; observe that the file uses the coding cookie for UTF-8).
Maybe Fedora's Python was compiled without Unicode support?

(Is that even possible? I recall something about a UCS2 vs.
UCS4 switch when compiling Python; but without Unicode support?
And if it were possible, shouldn't a Python without Unicode
support disallow strings of the form u"..." or at least show a warning?)


This really drives me nuts because I thought the above approach
should be the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal (which is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...).

Is there something I do utterly wrong here?
Python can't be that complicated?

Nuff.
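
The approach described in the post above (ask the locale for its codeset, then encode before printing) can be sketched as follows. This is Python 3 syntax, not the Python 2 of the thread, and the names terminal_encoding and encode_for_terminal are hypothetical. Replacing unencodable characters with "?" is an assumption made for the sketch; the thread does not discuss error handling.

```python
import locale


def terminal_encoding():
    """Return the codeset of the user's locale, as the script in the post does."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale settings: stay in the current locale
    return locale.nl_langinfo(locale.CODESET)


def encode_for_terminal(text, encoding):
    """Encode text for output; characters the encoding lacks become '?'."""
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    encoding = terminal_encoding()
    # The umlauts are written as escapes so the source file stays pure ASCII.
    print(encoding, repr(encode_for_terminal("sch\xf6nes M\xe4dchen", encoding)))
```

With errors="replace" a terminal that only supports ASCII gets a readable, if lossy, string instead of an exception.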

Martin v. Löwis
#8: Jul 18 '05


Nuff Said wrote:
> When I add the following line to the above script
>
> print u"schönes Mädchen".encode(encoding)
>
> the result is:
>
> schönes Mädchen (with my self-compiled Python 2.3)
> schönes Mädchen (with Fedora's Python 2.2)
>
> I observed that my Python gives me (the correct value) 15 for
> len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
> for each German umlaut, i.e. the len of the UTF-8 representation of
> the string; observe that the file uses the coding cookie for UTF-8).
> Maybe Fedora's Python was compiled without Unicode support?

Certainly not: It would not support u"" literals without Unicode.

Please understand that you cannot use non-ASCII characters in source
code unless you also use the facilities described in

http://www.python.org/peps/pep-0263.html

So instead of "ö", you should write "\xf6".

> Is there something I do utterly wrong here?

Yes, you are.

> Python can't be that complicated?

Python is not. Encodings are.

Regards,
Martin
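
To illustrate the escape-based spelling suggested in the post above, here is a small sketch in Python 3 syntax, where every string literal is Unicode. The helper name to_ascii_source is hypothetical. The point is that a literal written with \xf6 and \xe4 contains only ASCII bytes in the source file, so its meaning does not depend on how the interpreter decodes the file.

```python
def to_ascii_source(text):
    """Spell text using only ASCII characters and backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


# Pure-ASCII spelling of the phrase from the thread.
escaped = "sch\xf6nes M\xe4dchen"

# The same phrase obtained by decoding its UTF-8 byte sequence.
from_utf8 = b"sch\xc3\xb6nes M\xc3\xa4dchen".decode("utf-8")

print(escaped == from_utf8)      # True
print(len(escaped))              # 15
print(to_ascii_source(escaped))  # sch\xf6nes M\xe4dchen
```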

Nuff Said
#9: Jul 18 '05


On Fri, 30 Apr 2004 04:30:34 +0200, Martin v. Löwis wrote:

> Nuff Said wrote:
>> When I add the following line to the above script
>>
>> print u"schönes Mädchen".encode(encoding)
>>
>> the result is:
>>
>> schönes Mädchen (with my self-compiled Python 2.3)
>> schönes Mädchen (with Fedora's Python 2.2)
>>
>> I observed that my Python gives me (the correct value) 15 for
>> len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
>> for each German umlaut, i.e. the len of the UTF-8 representation of
>> the string; observe that the file uses the coding cookie for UTF-8).
>> Maybe Fedora's Python was compiled without Unicode support?
>
> Certainly not: It would not support u"" literals without Unicode.

That's what I thought.

> Please understand that you cannot use non-ASCII characters in source
> code unless you also use the facilities described in
>
> http://www.python.org/peps/pep-0263.html
>
> So instead of "ö", you should write "\xf6".

But *I do use* the line

# -*- coding: UTF-8 -*-

from your PEP (directly after the shebang line; see the full source
code in my earlier posting). I thought that allows me to write u"ö"
(which - as described above - works in one of my two Pythons).

??? Nuff.


Nuff Said
#10: Jul 18 '05


On Fri, 30 Apr 2004 11:56:19 +0200, Nuff Said wrote:
> But *I do use* the line
>
> # -*- coding: UTF-8 -*-
>
> from your PEP (directly after the shebang line; see the full source
> code in my earlier posting). I thought that allows me to write u"ö"
> (which - as described above - works in one of my two Pythons).

Follow up to myself:

Arrgh!!! Think I got it now. Your PEP 263: 'Source Code Encodings' was
incorporated into Python 2.3 (i.e. my self-compiled Python) but not
into Python 2.2 (Fedora's Python).

Thanks for your help!
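
The length of 17 and the garbled output reported for Python 2.2 earlier in the thread are consistent with the UTF-8 bytes of the literal being read one byte per character, as Latin-1, when the coding line is not honoured. The thread does not spell this out, so treat the following Python 3 sketch as an illustration under that assumption. The function name misread_utf8_as_latin1 is hypothetical.

```python
def misread_utf8_as_latin1(text):
    """Simulate reading a UTF-8 encoded literal as if it were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "sch\xf6nes M\xe4dchen"           # the intended 15 characters
misread = misread_utf8_as_latin1(original)   # each umlaut becomes two characters

print(len(original))  # 15
print(len(misread))   # 17
# Encoding the misread string as UTF-8 and showing it on a UTF-8 terminal
# displays two wrong characters in place of each umlaut.
print(misread)
```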
