
Locale confusion

[Long posting due to the examples, but pretty simple question.]

I'm sitting here with a Debian Linux 'Woody' system with the default Python
2.2 installation, and I want the re module to understand that
re.compile(r'\W+', re.LOCALE) shouldn't match my national, accented
characters.
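
For readers on Python 3 rather than the Python 2.2 used in this thread, a minimal sketch of the same goal without re.LOCALE: str patterns there use Unicode character classes by default, so no locale setup is needed for \w and \W. The sample text is made up for illustration, and re.ASCII is used only to imitate the unwanted behaviour.

```python
import re

# Hypothetical sample text with Swedish national characters.
text = "räksmörgås och öl"

# Python 3 str patterns are Unicode-aware by default, so \W+ matches
# only the spaces and the accented letters stay inside the words.
unicode_words = re.split(r"\W+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], which imitates the unwanted
# behaviour: every accented letter now acts as a separator.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```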

I don't quite understand how the locale module reasons about these things,
and Python doesn't seem to act as other programs on my system. Bug or my
mistake? Here's my environment:

frailea> env |grep -e LC -e LANG
LC_MESSAGES=C
LC_TIME=C
LANG=sv_SE
LC_NUMERIC=C
LC_MONETARY=C
frailea> locale
LANG=sv_SE
LC_CTYPE="sv_SE"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="sv_SE"
LC_MONETARY=C
LC_MESSAGES=C
LC_PAPER="sv_SE"
LC_NAME="sv_SE"
LC_ADDRESS="sv_SE"
LC_TELEPHONE="sv_SE"
LC_MEASUREMENT="sv_SE"
LC_IDENTIFICATION="sv_SE"
LC_ALL=

This seems to indicate that $LANG acts as a fallback when other things (e.g.
LC_CTYPE) aren't defined, and that's also what the glibc setlocale(3) man page
says. Works well for me in general, too. However, consider this tiny Python
program:

frailea> cat foo
import locale
print locale.getlocale()
locale.setlocale(locale.LC_CTYPE)
print locale.getlocale()

When I paste it into an interactive Python session, the locale is already
set up correctly (which is what I suppose interactive mode /should/ do):
>>> import locale
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']
>>> locale.setlocale(locale.LC_CTYPE)
'sv_SE'
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']


When I run it as a script it isn't though, and the setlocale() call does not
appear to fall back to looking at $LANG as it's supposed to(?), so my
LC_CTYPE remains in the POSIX locale:

frailea> python foo
(None, None)
(None, None)

The corresponding program written in C works as expected:

frailea> cat foot.c
#include <stdio.h>
#include <locale.h>
int main(void) {
    printf("%s\n", setlocale(LC_CTYPE, 0));
    printf("%s\n", setlocale(LC_CTYPE, ""));
    printf("%s\n", setlocale(LC_CTYPE, 0));
    return 0;
}
frailea> ./foot
C
sv_SE
sv_SE

So, is this my fault or Python's? I realize I could just adapt and set
$LC_CTYPE explicitly in my environment, but I don't want to capitulate for a
Python bug, if that's what this is.

BR,
Jorgen

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!
Jul 18 '05 #1


Jorgen Grahn wrote:
> [snip]
>
> frailea> cat foo
> import locale
> print locale.getlocale()
> locale.setlocale(locale.LC_CTYPE)
> print locale.getlocale()
>
> When I paste it into an interactive Python session, the locale is already
> set up correctly (which is what I suppose interactive mode /should/ do):
> >>> import locale
> >>> print locale.getlocale()
> ['sv_SE', 'ISO8859-1']
> >>> locale.setlocale(locale.LC_CTYPE)
> 'sv_SE'
> >>> print locale.getlocale()
> ['sv_SE', 'ISO8859-1']
>
> When I run it as a script it isn't though, and the setlocale() call does not
> appear to fall back to looking at $LANG as it's supposed to(?), so my
> LC_CTYPE remains in the POSIX locale:
>
> frailea> python foo
> (None, None)
> (None, None)
>
> The corresponding program written in C works as expected:
>
> frailea> cat foot.c
> #include <stdio.h>
> #include <locale.h>
> int main(void) {
>     printf("%s\n", setlocale(LC_CTYPE, 0));
>     printf("%s\n", setlocale(LC_CTYPE, ""));
>     printf("%s\n", setlocale(LC_CTYPE, 0));
>     return 0;
> }
> frailea> ./foot
> C
> sv_SE
> sv_SE
>
> So, is this my fault or Python's? I realize I could just adapt and set
> $LC_CTYPE explicitly in my environment, but I don't want to capitulate for a
> Python bug, if that's what this is.


Try locale.setlocale(locale.LC_CTYPE, "") as in your C program. It would
be great if calling locale.setlocale with one parameter were deprecated,
because in that form it suddenly acts like getlocale. It's unpythonic.

By the way, since you took the time to set up the various LC_* variables,
there is no need to play with the LC_CTYPE category. Just use the standard idiom:
import locale
locale.setlocale(locale.LC_ALL, "")
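
A minimal sketch of that idiom as a start-up helper, written for Python 3. The function name init_locale is hypothetical, and the fallback on locale.Error is an assumption about what a script might want when the environment names a locale that is not installed.

```python
import locale

def init_locale():
    """Adopt the user's locale for all categories.

    Returns the locale string reported by the C library, or None if the
    environment names a locale that is not installed.
    """
    try:
        # "" means: take the settings from LC_ALL, the LC_* variables and LANG.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Keep whatever locale was already in effect.
        return None

if __name__ == "__main__":
    print(init_locale())
    # One-argument form: only reports the current LC_CTYPE setting.
    print(locale.setlocale(locale.LC_CTYPE))
```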

Serge.

Jul 18 '05 #2

On 11 Jan 2005 05:49:32 -0800, Se*********@gmail.com <Se*********@gmail.com> wrote:
> Jorgen Grahn wrote:
> [snip]
>> frailea> cat foo
>> import locale
>> print locale.getlocale()
>> locale.setlocale(locale.LC_CTYPE)
>> print locale.getlocale()
>> ....
>> When I run it as a script it isn't though, and the setlocale() call does not
>> appear to fall back to looking at $LANG as it's supposed to(?), so my
>> LC_CTYPE remains in the POSIX locale:
>> ....
>> So, is this my fault or Python's? I realize I could just adapt and set
>> $LC_CTYPE explicitly in my environment, but I don't want to capitulate for a
>> Python bug, if that's what this is.
>
> Try locale.setlocale(locale.LC_CTYPE,"") as in your C program.


Oops, you are right. locale.setlocale(locale.LC_CTYPE,"") sets the locale
from my environment (and gets it right!) while
locale.setlocale(locale.LC_CTYPE) /returns/ the current locale. I don't know
how I could have missed that, since it's clearly documented and also maps
directly to C usage.
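
The difference between the two call forms can be sketched as follows for Python 3. The helper name query_then_set is hypothetical, and the explicit reset to "C" is an assumption added so the demonstration starts from a known state.

```python
import locale

def query_then_set(category=locale.LC_CTYPE):
    """Return (queried, adopted) to contrast the two call forms."""
    original = locale.setlocale(category)       # query only; remember it
    try:
        locale.setlocale(category, "C")         # known starting point
        # One argument: nothing changes, the current setting is returned.
        queried = locale.setlocale(category)
        try:
            # Empty string: the C library reads LC_ALL, LC_* and LANG.
            adopted = locale.setlocale(category, "")
        except locale.Error:
            adopted = None                      # environment names an uninstalled locale
        return queried, adopted
    finally:
        locale.setlocale(category, original)    # restore the global state

if __name__ == "__main__":
    print(query_then_set())
```

On a system configured like the one in this thread, the second element would be expected to name the sv_SE locale, but that depends on which locales are installed.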

> It would
> be great if locale.setlocale with one parameter would be deprecated,
> because it suddenly acts like getlocale. It's unpythonic.

I dislike the term "unpythonic", but I tend to agree with you in practice
here. Even better, but maybe not feasible, would be an approach to locales
which doesn't involve changing a global state in this fashion.

> By the way, since you took time to setup various LC_* variables there
> is no need to play with LC_CTYPE category. Just use the standard idiom.
> import locale
> locale.setlocale(locale.LC_ALL,"")


Thanks for pointing that out. I picked out LC_CTYPE for my small program
because I was in a hurry and didn't want to risk non-standard sorting
elsewhere in the program. I hate what LC_COLLATE=C does to Swedish
national characters, but I hate what LC_COLLATE=sv_SE does to non-alphabetic
characters even more.
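
A minimal sketch of that per-category approach for Python 3: LC_CTYPE is taken from the environment while LC_COLLATE is pinned to "C", so locale-aware sorting elsewhere in the program stays unchanged. The function name ctype_only and the sample words are hypothetical.

```python
import locale

def ctype_only():
    """Adopt the environment's LC_CTYPE while pinning LC_COLLATE to "C".

    Returns the LC_COLLATE setting in effect afterwards.
    """
    locale.setlocale(locale.LC_COLLATE, "C")
    try:
        # Only the character-classification category follows the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # environment names an uninstalled locale; LC_CTYPE is unchanged
    # Query form: report the collation locale without changing it.
    return locale.setlocale(locale.LC_COLLATE)

if __name__ == "__main__":
    print(ctype_only())
    # locale.strxfrm follows LC_COLLATE, so this sort uses "C" ordering.
    print(sorted(["b", "a", "B", "_x"], key=locale.strxfrm))
```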

To paraphrase Barbie: "i18n is hard". ;-)

/Jorgen

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!
Jul 18 '05 #3
