
locale.CODESET / different in python shell and scripts

When I type the following code in the interactive Python shell,
I get 'UTF-8'; but if I put the code into a Python script and
run the script (in the same terminal on my Linux box in which
I had opened the Python shell), I get 'ANSI_X3.4-1968'.

How does that come about?

Thanks in advance for your answers! Nuff.
The Code:

import locale
print locale.nl_langinfo(locale.CODESET)

Jul 18 '05 #1


Nuff Said wrote:
> When I type the following code in the interactive Python shell,
> I get 'UTF-8'; but if I put the code into a Python script and
> run the script (in the same terminal on my Linux box in which
> I had opened the Python shell), I get 'ANSI_X3.4-1968'.
>
> How does that come about?


Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is unclear why this happens: setlocale is not normally
called automatically, not even in interactive mode. Perhaps
you have created your own startup file?

Regards,
Martin

Jul 18 '05 #2

"Martin v. Löwis" <ma****@v.loewis.de> writes:
> Nuff Said wrote:
>> When I type the following code in the interactive Python shell,
>> I get 'UTF-8'; but if I put the code into a Python script and
>> run the script (in the same terminal on my Linux box in which
>> I had opened the Python shell), I get 'ANSI_X3.4-1968'.
>> How does that come about?


> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is unclear why this happens: setlocale is not normally
> called automatically, not even in interactive mode. Perhaps
> you have created your own startup file?


readline calls setlocale() iirc.

Cheers,
mwh

--
Not only does the English Language borrow words from other
languages, it sometimes chases them down dark alleys, hits
them over the head, and goes through their pockets. -- Eddy Peters
Jul 18 '05 #3

Michael Hudson wrote:
>> It is unclear why this happens: setlocale is not normally
>> called automatically, not even in interactive mode. Perhaps
>> you have created your own startup file?
>
> readline calls setlocale() iirc.


Sure. However, we restore the locale to what it was before
readline initialization messes with the locale.
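Martin's point is that the interpreter saves the locale before readline's initialization changes it and puts it back afterwards. A minimal Python 3 sketch of that save/restore pattern follows. It is not CPython's actual readline code, and `preserved_locale` is a hypothetical helper. It relies on the documented behaviour that `locale.setlocale(category)` without a second argument only queries the current setting:

```python
import contextlib
import locale


@contextlib.contextmanager
def preserved_locale(category=locale.LC_CTYPE):
    """Hypothetical helper: remember the setting for `category` and restore it
    on exit, whatever the body does to the locale in between."""
    saved = locale.setlocale(category)  # no second argument: query only
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    before = locale.setlocale(locale.LC_CTYPE)
    with preserved_locale():
        # Stand-in for library initialization (e.g. readline) that changes the locale.
        locale.setlocale(locale.LC_CTYPE, "C")
    print("restored:", locale.setlocale(locale.LC_CTYPE) == before)
```

Restoring LC_CTYPE alone is enough for the question in this thread, because `nl_langinfo(CODESET)` depends on that category.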

Regards,
Martin

Jul 18 '05 #4

On Tue, 27 Apr 2004 22:29:59 +0200, Martin v. Löwis wrote:
> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is unclear why this happens: setlocale is not normally
> called automatically, not even in interactive mode. Perhaps
> you have created your own startup file?


I use two Python versions on my Linux box (Fedora Core 1):
the Python 2.2 that came with Fedora, and a Python 2.3 that
I compiled myself. (I didn't tinker with the latter;
Fedora's Python is a well-known mess.)

Both Python versions give me 'ANSI_X3.4-1968' when I run a script
with 'print locale.nl_langinfo(locale.CODESET)'.
When I execute the same command in an interactive Python shell,
I get the (correct) 'UTF-8'.

(By 'correct', I mean that the bash command 'locale' gives me
'LANG=en_US.UTF-8, LC_CTYPE="en_US.UTF-8", ...'. This seems to
be correct, because e.g. 'less ...' displays UTF-8-encoded files
correctly, whereas files in another encoding such as 'ISO-8859-1'
are not displayed correctly.)
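For reference, the locale that `setlocale(LC_CTYPE, "")` picks up follows the POSIX precedence rules for environment variables: a non-empty LC_ALL wins, then LC_CTYPE, then LANG, and otherwise the implementation default "C". A simplified Python 3 sketch of that lookup (`effective_ctype_locale` is a hypothetical name; it does not check whether the named locale is actually installed):

```python
import os


def effective_ctype_locale(environ=None):
    """Hypothetical helper: return the locale name that setlocale(LC_CTYPE, "")
    would try to use, following the POSIX order LC_ALL > LC_CTYPE > LANG > "C".
    Empty values are treated as unset."""
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


if __name__ == "__main__":
    print(effective_ctype_locale())
    print(effective_ctype_locale({"LANG": "en_US.UTF-8"}))  # en_US.UTF-8
```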

It gets even worse:

I write a Python script which uses Unicode strings; now I want
to 'print ...' one of those strings (containing non-ASCII characters,
e.g. German umlauts).
With Fedora's Python 2.2 I have to use 'print s.encode('ISO-8859-1')'
or something similar.
With my self-compiled Python 2.3, I have to use (the expected)
'print s.encode('UTF-8')' (even though it shows me 'ANSI_X3.4-1968' for
'print locale.nl_langinfo(locale.CODESET)' in the same file).

???

Any ideas what's going wrong here?

(I tried 'python -S ...'; it doesn't make a difference.)

Jul 18 '05 #5

Nuff Said wrote:
> Both Python versions give me 'ANSI_X3.4-1968' when I run a script
> with 'print locale.nl_langinfo(locale.CODESET)'.
> When I execute the same command in an interactive Python shell,
> I get the (correct) 'UTF-8'.


PLEASE invoke

locale.setlocale(locale.LC_ALL, "")

before invoking nl_langinfo. Different C libraries behave differently
in their nl_langinfo responses if setlocale hasn't been called.
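Martin's advice, carried over to Python 3 as a minimal sketch. The helper names are hypothetical, the fallback for an uninstalled locale (catching `locale.Error`) is an added assumption, and `nl_langinfo` is only available on Unix-like systems. Python 3 already sets LC_CTYPE from the environment at interpreter startup, so there the two values usually agree. Without that step, as in the Python 2 scripts above, the first call reports the codeset of the plain "C" locale, which glibc names 'ANSI_X3.4-1968':

```python
import locale


def current_codeset():
    """Codeset of the C library's current LC_CTYPE setting, without changing it."""
    return locale.nl_langinfo(locale.CODESET)


def user_codeset():
    """Adopt the user's locale from the environment, then ask for its codeset."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed:
        # keep whatever locale is currently active.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print("before setlocale:", current_codeset())
    print("after setlocale: ", user_codeset())
```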

Regards,
Martin

Jul 18 '05 #6

On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
> PLEASE invoke
>
> locale.setlocale(locale.LC_ALL, "")
>
> before invoking nl_langinfo. Different C libraries behave differently
> in their nl_langinfo responses if setlocale hasn't been called.


Thanks a lot for your help!

That solved (part of) the problem; now I get 'UTF-8' (which is correct)
when running the following script (with either my self-compiled Python
2.3 or Fedora's Python 2.2):

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import locale

locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)
print encoding

Still, one problem remains:

When I add the following line to the above script

print u"schönes Mädchen".encode(encoding)

the result is:

schönes Mädchen (with my self-compiled Python 2.3)
schönes Mädchen (with Fedora's Python 2.2)

I observed that my Python gives me the correct value, 15, for
len(u"schönes Mädchen"), whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the length of the UTF-8 representation of
the string; note that the file uses the coding cookie for UTF-8).
Maybe Fedora's Python was compiled without Unicode support?

(Is that even possible? I recall something about a UCS2/UCS4
switch when compiling Python, but without Unicode support?
And if it were possible, shouldn't a Python without Unicode
support reject strings of the form u"..." or at least show a warning???)

This really drives me nuts, because I thought the above approach
was the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal that is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...
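In Python 3 much of this is built in: `print()` encodes `str` with `sys.stdout.encoding`, which normally follows the locale. Where bytes must be produced explicitly, a sketch of the approach described above might look like this. `encode_for_terminal` is a hypothetical helper, and the default `errors="replace"` (unrepresentable characters become `?` instead of raising `UnicodeEncodeError`) is a design choice, not something from the thread:

```python
import locale
import sys


def encode_for_terminal(text, errors="replace"):
    """Hypothetical helper: encode `text` with the user's preferred locale
    encoding; with errors="replace", unrepresentable characters become '?'."""
    encoding = locale.getpreferredencoding(False)  # False: don't call setlocale
    return text.encode(encoding, errors)


if __name__ == "__main__":
    data = encode_for_terminal("schönes Mädchen\n")
    # Write raw bytes; fall back to printing the repr if stdout has no byte buffer.
    out = getattr(sys.stdout, "buffer", None)
    if out is not None:
        out.write(data)
        out.flush()
    else:
        print(repr(data))
```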

Is there something I'm doing utterly wrong here?
Python can't be that complicated, can it?

Nuff.

Jul 18 '05 #7

Nuff Said wrote:
> When I add the following line to the above script
>
> print u"schönes Mädchen".encode(encoding)
>
> the result is:
>
> schönes Mädchen (with my self-compiled Python 2.3)
> schönes Mädchen (with Fedora's Python 2.2)
>
> I observed that my Python gives me the correct value, 15, for
> len(u"schönes Mädchen"), whereas Fedora's Python says 17 (one more
> for each German umlaut, i.e. the length of the UTF-8 representation of
> the string; note that the file uses the coding cookie for UTF-8).
> Maybe Fedora's Python was compiled without Unicode support?

Certainly not: it would not support u"" literals without Unicode.

Please understand that you cannot use non-ASCII characters in source
code unless you also use the facilities described in

http://www.python.org/peps/pep-0263.html

So instead of "ö", you should write "\xf6".
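Martin's suggestion keeps the source file pure ASCII and spells non-ASCII characters as escapes, so the result no longer depends on how the interpreter decodes the source bytes (in the thread's Python 2 this would be written u"sch\xf6nes M\xe4dchen"). A minimal check in Python 3 syntax, where every string literal is Unicode:

```python
# ASCII-only source: \xf6 is U+00F6 (o with diaeresis), \xe4 is U+00E4 (a with diaeresis).
phrase = "sch\xf6nes M\xe4dchen"

print(phrase == "sch\u00f6nes M\u00e4dchen")  # True: both escapes name the same code points
print(len(phrase))                              # 15
```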

> Is there something I'm doing utterly wrong here?

Yes, you are.

> Python can't be that complicated, can it?

Python is not. Encodings are.

Regards,
Martin

Jul 18 '05 #8

On Fri, 30 Apr 2004 04:30:34 +0200, Martin v. Löwis wrote:
> Nuff Said wrote:
>> When I add the following line to the above script
>>
>> print u"schönes Mädchen".encode(encoding)
>>
>> the result is:
>>
>> schönes Mädchen (with my self-compiled Python 2.3)
>> schönes Mädchen (with Fedora's Python 2.2)
>>
>> I observed that my Python gives me the correct value, 15, for
>> len(u"schönes Mädchen"), whereas Fedora's Python says 17 (one more
>> for each German umlaut, i.e. the length of the UTF-8 representation of
>> the string; note that the file uses the coding cookie for UTF-8).
>> Maybe Fedora's Python was compiled without Unicode support?
> Certainly not: it would not support u"" literals without Unicode.


That's what I thought.

> Please understand that you cannot use non-ASCII characters in source
> code unless you also use the facilities described in
>
> http://www.python.org/peps/pep-0263.html
>
> So instead of "ö", you should write "\xf6".


But *I do use* the line

# -*- coding: UTF-8 -*-

from your PEP (directly after the shebang line; see the full source
code in my earlier posting). I thought that allows me to write u"ö"
(which, as described above, works in one of my two Pythons).

??? Nuff.
Jul 18 '05 #9

On Fri, 30 Apr 2004 11:56:19 +0200, Nuff Said wrote:
> But *I do use* the line
>
> # -*- coding: UTF-8 -*-
>
> from your PEP (directly after the shebang line; see the full source
> code in my earlier posting). I thought that allows me to write u"ö"
> (which, as described above, works in one of my two Pythons).


Follow-up to myself:

Arrgh!!! I think I've got it now. Your PEP 263, 'Source Code Encodings', was
incorporated into Python 2.3 (i.e. my self-compiled Python) but not
into Python 2.2 (Fedora's Python).
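This also accounts for the length of 17 reported earlier. Before PEP 263, Python 2 presumably took the raw bytes of a u"..." literal as Latin-1 code points, so each two-byte UTF-8 umlaut became two characters (on a UTF-8 terminal, re-encoded, that shows up as "schÃ¶nes MÃ¤dchen"). A Python 3 sketch that reproduces the misreading by hand; the explicit Latin-1 decode is a simulation of that behaviour, not how any interpreter is implemented:

```python
text = "schönes Mädchen"
print(len(text))  # 15: one character per letter, as with PEP 263 and a UTF-8 cookie

# Simulate an interpreter that ignores the coding cookie and reads the
# UTF-8 source bytes of the literal as Latin-1 instead.
misread = text.encode("utf-8").decode("latin-1")
print(len(misread))    # 17: each umlaut turned into two characters
print(ascii(misread))  # 'sch\xc3\xb6nes M\xc3\xa4dchen'
```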

Thanks for your help!

Jul 18 '05 #10
