encoding of sys.argv ?

Hi all,

I am desperately searching for the encoding of sys.argv.

I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?

Jiba
Oct 23 '06 #1
On 2006-10-23, Jiba <ji******@free.fr> wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8
> filesystem. sys.getdefaultencoding() is "ascii" and
> sys.getfilesystemencoding() is "utf-8". However, sys.argv is
> neither in ASCII (since I can pass French accented
> characters), nor in UTF-8. It seems to be encoded in "latin-1",
> but why?

It will most likely be in the encoding of the terminal from which
you call Python, in other words, sys.stdin.encoding. Your only
hope of accepting non-US-ASCII command line arguments in this
manner is that sys.stdin.encoding is divined correctly by Python.

--
Neil Cerutti
Facts are stupid things. --Ronald Reagan
Oct 23 '06 #2
In <20061023130504.26823717@autremonde>, Jiba wrote:
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
> accented characters), nor in UTF-8. It seems to be encoded in
> "latin-1", but why?

There is no way to determine the encoding. The application that starts
another and sets the arguments can use any encoding it likes and there's
no standard way to find out which it was.

The `sys.stdin.encoding` approach isn't very robust because this will only
be set if the interpreter can find out what encoding is used on `stdin`.
That's impossible if the `stdin` is the input from another file.

Make it explicit: Add a command line option to choose the encoding.
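
One way this suggestion might look in practice is sketched below. It assumes Python 3, where sys.argv has already been decoded to str, so os.fsencode() is used to recover the original bytes before decoding them again. The --encoding option, the names argument, and both function names are hypothetical, chosen only for illustration.

```python
import argparse
import os


def decode_args(raw_args, encoding):
    """Decode raw byte-string arguments with an explicitly chosen encoding."""
    return [raw.decode(encoding) for raw in raw_args]


def parse_command_line(argv):
    """Parse argv (without the program name) and return the decoded names."""
    parser = argparse.ArgumentParser()
    # Hypothetical option: the caller states which encoding the arguments use.
    parser.add_argument("--encoding", default="utf-8")
    parser.add_argument("names", nargs="*")
    options = parser.parse_args(argv)
    # os.fsencode() undoes the decoding Python 3 applied when it built sys.argv.
    raw_args = [os.fsencode(name) for name in options.names]
    return decode_args(raw_args, options.encoding)
```

A script would typically call parse_command_line(sys.argv[1:]). The point of the design is that the caller, who knows how the arguments were produced, states the encoding instead of the program guessing it.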

Ciao,
Marc 'BlackJack' Rintsch
Oct 23 '06 #3
Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?
>
> Jiba

Here's what I see in a Windows command prompt interactive session:

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'cp437'
>>> sys.getdefaultencoding()
'ascii'
>>>
But in a Cygwin command window on the same machine I see

Python 2.5b2 (trunk:50713, Jul 19 2006, 16:04:09)
[GCC 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'US-ASCII'
>>> sys.getdefaultencoding()
'ascii'
>>>
The strings in sys.argv are encoded the same as the standard input, I
believe.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Oct 23 '06 #4

Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in "latin-1", but why?

Your system is misconfigured; complain to your distribution. On UNIX,
sys.getfilesystemencoding(), sys.stdin.encoding, sys.stdout.encoding,
locale.getpreferredencoding(), and the encoding of the characters you type
should all be the same.
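
A quick way to check whether those values agree is sketched below, assuming Python 3. The function name report_encodings is hypothetical; the calls inside it are the ones named above.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that should agree on a well-configured system."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        # The stream encodings may be missing when a stream has been replaced.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(),
    }


for name, value in sorted(report_encodings().items()):
    print("%-10s %s" % (name, value))
```

If one of the four values differs from the others, that is a hint about where the configuration goes wrong. It cannot show what the terminal itself actually sends.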

Oct 23 '06 #5

Marc 'BlackJack' Rintsch wrote:
> In <20061023130504.26823717@autremonde>, Jiba wrote:
>> I am desperately searching for the encoding of sys.argv.
>>
>> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
>> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
>> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
>> accented characters), nor in UTF-8. It seems to be encoded in
>> "latin-1", but why?
>
> There is no way to determine the encoding. The application that starts
> another and sets the arguments can use any encoding it likes and there's
> no standard way to find out which it was.

There is a standard way: the nl_langinfo function
<http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html>.
The code in pythonrun.c properly uses it to find out the encoding. The
other question is whether Linux or *BSD distributions conform to the standard.
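
From Python, the same query can be made through the locale module. The sketch below assumes Python 3; the function name locale_codeset is hypothetical. It reports what the locale claims, which is the encoding the arguments should be in, not proof of what the calling program actually sent.

```python
import locale


def locale_codeset():
    """Return the character encoding named by the current locale."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale settings: keep whatever locale is already active.
        pass
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # nl_langinfo is not available on every platform (for example Windows).
    return locale.getpreferredencoding(False)


print(locale_codeset())
```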

-- Leo.

Oct 23 '06 #6
Jiba wrote:
> I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding()
> is "utf-8". However, sys.argv is neither in ASCII (since I can pass
> French accented characters), nor in UTF-8. It seems to be encoded
> in "latin-1", but why?

Let me second Leo Kislov's analysis. The strings in sys.argv should be
encoded in locale.getpreferredencoding(), which should be UTF-8. Are you
*sure* they aren't encoded in this way?

On my Debian system, I get this:

martin@mira:~/tmp$ echo $LANG
de_DE.UTF-8
martin@mira:~/tmp$ cat a.py
import sys
print sys.argv

martin@mira:~/tmp$ python a.py Martin v. Löwis
['a.py', 'Martin', 'v.', 'L\xc3\xb6wis']

So clearly, my terminal application + shell passes them as UTF-8,
as it should. The terminal application is KDE konsole; the shell
is bash. The shell *pretty likely* passes the arguments "through"
as-read from the terminal, so if you are not seeing UTF-8, you
have managed to misconfigure your terminal.
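
To see why UTF-8 arguments can look like "latin-1" at first glance, the bytes from the session above can be decoded both ways. This is a small illustration in Python 3; the variable names are arbitrary.

```python
# The bytes Python 2 printed for the last argument in the session above.
raw = b"L\xc3\xb6wis"

as_utf8 = raw.decode("utf-8")      # two bytes become one character
as_latin1 = raw.decode("latin-1")  # every byte becomes its own character

# ascii() keeps the output printable even on a terminal that is not UTF-8.
print(ascii(as_utf8), len(as_utf8))
print(ascii(as_latin1), len(as_latin1))
```

Decoding as latin-1 never fails, because every byte value maps to some character, so a successful latin-1 decode says nothing about the real encoding. The result is simply one character longer, with the two UTF-8 bytes showing up as two separate characters.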

Regards,
Martin
Oct 23 '06 #7
