encoding of sys.argv ?

Hi all,

I am desperately searching for the encoding of sys.argv.

I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?

Jiba
Oct 23 '06 #1
On 2006-10-23, Jiba <ji******@free.fr> wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8
> filesystem. sys.getdefaultencoding() is "ascii" and
> sys.getfilesystemencoding() is "utf-8". However, sys.argv is
> neither in ASCII (since I can pass French accented
> characters), nor in UTF-8. It seems to be encoded in "latin-1",
> but why?

It will most likely be in the encoding of the terminal from which
you call Python, in other words, sys.stdin.encoding. Your only
hope of accepting non-US-ASCII command line arguments in this
manner is that sys.stdin.encoding is divined correctly by Python.

--
Neil Cerutti
Facts are stupid things. --Ronald Reagan
Oct 23 '06 #2
In <20061023130504.26823717@autremonde>, Jiba wrote:
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
> accented characters), nor in UTF-8. It seems to be encoded in
> "latin-1", but why?

There is no way to determine the encoding. The application that starts
another and sets the arguments can use any encoding it likes and there's
no standard way to find out which it was.

The `sys.stdin.encoding` approach isn't very robust because this will only
be set if the interpreter can find out what encoding is used on `stdin`.
That's impossible if the `stdin` is the input from another file.

Make it explicit: Add a command line option to choose the encoding.

Ciao,
Marc 'BlackJack' Rintsch
Oct 23 '06 #3
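
The suggestion in #3 (make the encoding explicit with a command line option) could look roughly like this. It is only a minimal sketch, and it assumes Python 3, where sys.argv already holds decoded str objects instead of the byte strings discussed in this thread; os.fsencode() is used to get the original bytes back before decoding them again. The option name --arg-encoding and the helper decode_args() are hypothetical names chosen for illustration.

```python
import argparse
import os


def decode_args(argv):
    """Decode positional arguments with an encoding named on the command line.

    The --arg-encoding option and this helper are illustrative names only.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--arg-encoding", default="utf-8",
                        help="encoding the caller used for the arguments")
    parser.add_argument("names", nargs="*")
    options = parser.parse_args(argv)

    decoded = []
    for name in options.names:
        # os.fsencode() reverses the decoding Python 3 applied at startup,
        # giving back the bytes the calling process passed (on POSIX).
        raw = os.fsencode(name)
        decoded.append(raw.decode(options.arg_encoding))
    return decoded
```

A script would call decode_args(sys.argv[1:]). A caller that passes Latin-1 bytes would then add --arg-encoding latin-1, and nothing has to be guessed from the terminal or the locale.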
Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?
>
> Jiba

Here's what I see in a Windows command prompt interactive session:

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'cp437'
>>> sys.getdefaultencoding()
'ascii'
>>>
But in a Cygwin command window on the same machine I see

Python 2.5b2 (trunk:50713, Jul 19 2006, 16:04:09)
[GCC 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'US-ASCII'
>>> sys.getdefaultencoding()
'ascii'
>>>
The strings in sys.argv are encoded the same as the standard input, I
believe.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Oct 23 '06 #4

Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?

Your system is misconfigured; complain to your distribution. On UNIX,
sys.getfilesystemencoding(), sys.stdin.encoding, sys.stdout.encoding,
locale.getpreferredencoding() and the encoding of the characters you type
should all be the same.

Oct 23 '06 #5
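
The check described in #5 can be sketched as a small script that collects the four values and compares them. This is a sketch under assumptions, not something from the thread: report_encodings() and encodings_agree() are hypothetical helper names, and codecs.lookup() is used only so that spellings such as "UTF-8" and "utf8" compare equal.

```python
import codecs
import locale
import sys


def report_encodings():
    """Collect the encodings that post #5 says should agree on a UNIX system."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        # stdin/stdout can be None or replaced by objects without .encoding
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(),
    }


def encodings_agree(report):
    """Return True when every known value names the same codec."""
    # codecs.lookup() normalises aliases, e.g. "UTF-8" and "utf8" -> "utf-8"
    names = {codecs.lookup(value).name for value in report.values() if value}
    return len(names) <= 1
```

Printing report_encodings() from the same terminal that starts the failing script shows at a glance which of the values is the odd one out.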

Marc 'BlackJack' Rintsch wrote:
> In <20061023130504.26823717@autremonde>, Jiba wrote:
>> I am desperately searching for the encoding of sys.argv.
>>
>> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
>> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
>> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
>> accented characters), nor in UTF-8. It seems to be encoded in
>> "latin-1", but why?
>
> There is no way to determine the encoding. The application that starts
> another and sets the arguments can use any encoding it likes and there's
> no standard way to find out which it was.

There is a standard way: the nl_langinfo function
<http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html>
The code in pythonrun.c properly uses it to find out the encoding. The
other question is whether Linux or *BSD distributions conform to the standard.

-- Leo.

Oct 23 '06 #6
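
The nl_langinfo lookup mentioned in #6 is also reachable from Python through the locale module on platforms that provide it. The following is a minimal sketch; locale_codeset() is a hypothetical helper name, and the call to setlocale() changes the process-wide LC_CTYPE setting as a side effect.

```python
import locale


def locale_codeset():
    """Return the codeset reported by nl_langinfo(CODESET), or None."""
    # nl_langinfo is only exposed on platforms that have it (not Windows).
    if not hasattr(locale, "nl_langinfo"):
        return None
    try:
        # Adopt the user's locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed.
        return None
    return locale.nl_langinfo(locale.CODESET)
```

With a correctly configured UTF-8 locale this is expected to report a UTF-8 codeset; any other answer points at the locale settings rather than at Python.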
Jiba wrote:
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding()
> is "utf-8". However, sys.argv is neither in ASCII (since I can pass
> French accented characters), nor in UTF-8. It seems to be encoded
> in "latin-1", but why?

Let me second Leo Kislov's analysis. They should be encoded in
locale.getpreferredencoding(), which should be UTF-8. Are you
*sure* they aren't encoded in this way?

On my Debian system, I get this:

martin@mira:~/tmp$ echo $LANG
de_DE.UTF-8
martin@mira:~/tmp$ cat a.py
import sys
print sys.argv

martin@mira:~/tmp$ python a.py Martin v. Löwis
['a.py', 'Martin', 'v.', 'L\xc3\xb6wis']

So clearly, my terminal application + shell passes them as UTF-8,
as it should. The terminal application is KDE konsole; the shell
is bash. The shell *pretty likely* passes the arguments "through"
as-read from the terminal, so if you are not seeing UTF-8, you
have managed to misconfigure your terminal.

Regards,
Martin
Oct 23 '06 #7
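
To answer the "are you sure" question in #7, it helps to compare the two candidate byte sequences directly. The sketch below assumes Python 3 byte strings; looks_like_utf8() is a hypothetical helper, and a successful decode only shows that the bytes are valid UTF-8, not that UTF-8 was intended.

```python
def looks_like_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same name in both encodings under discussion.
utf8_bytes = "L\u00f6wis".encode("utf-8")      # b'L\xc3\xb6wis', as in #7
latin1_bytes = "L\u00f6wis".encode("latin-1")  # b'L\xf6wis'

print(looks_like_utf8(utf8_bytes))    # True
print(looks_like_utf8(latin1_bytes))  # False: a lone 0xf6 is not valid UTF-8
```

If the repr of sys.argv shows '\xf6' where '\xc3\xb6' was expected, the arguments arrived as Latin-1, which matches the symptom in the original question.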
