Hi all,
I am desperately searching for the encoding of sys.argv.
I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?
Jiba
On 2006-10-23, Jiba <ji******@free.fr> wrote:
> Hi all,
> I am desperately searching for the encoding of sys.argv.
> I use a Linux box with French UTF-8 locales and a UTF-8
> filesystem. sys.getdefaultencoding() is "ascii" and
> sys.getfilesystemencoding() is "utf-8". However, sys.argv is
> neither in ASCII (since I can pass French accented
> characters) nor in UTF-8. It seems to be encoded in "latin-1",
> but why?
It will most likely be in the encoding of the terminal from which
you call Python, in other words, sys.stdin.encoding. Your only
hope of accepting non-US-ASCII command line arguments in this
manner is that sys.stdin.encoding is divined correctly by Python.
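To see what Python actually reports on a given machine, something like the following may help. It is only a sketch and assumes Python 3 (the sessions in this thread are Python 2, where the print syntax differs); the function name report_encodings is made up for this example.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for the current process."""
    return {
        # Under Python 2 this may be None when stdin is not a terminal.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "default": sys.getdefaultencoding(),
        # False: query the locale without changing it.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(report_encodings().items()):
    print("%-10s %s" % (name, value))
```

If the "stdin", "filesystem" and "locale" entries disagree, that is a hint that the terminal and the locale settings do not match.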
--
Neil Cerutti
Facts are stupid things. --Ronald Reagan
In <20061023130504.26823717@autremonde>, Jiba wrote:
> I am desperately searching for the encoding of sys.argv.
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
> accented characters) nor in UTF-8. It seems to be encoded in
> "latin-1", but why?
There is no way to determine the encoding. The application that starts
another and sets the arguments can use any encoding it likes and there's
no standard way to find out which it was.
The `sys.stdin.encoding` approach isn't very robust because this will only
be set if the interpreter can find out what encoding is used on `stdin`.
That's impossible if the `stdin` is the input from another file.
Make it explicit: Add a command line option to choose the encoding.
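A rough sketch of such an option, assuming Python 3 on a POSIX system, where sys.argv has already been decoded and os.fsencode() gives back the original bytes. The option name --encoding and the helpers parse_args and decode_words are hypothetical, chosen only for this illustration.

```python
import argparse
import os


def parse_args(argv):
    """Parse an explicit --encoding option plus the words to decode."""
    parser = argparse.ArgumentParser(
        description="Decode command line arguments with a stated encoding.")
    # --encoding is a hypothetical option name used for this sketch.
    parser.add_argument("--encoding", default="utf-8",
                        help="encoding of the remaining arguments")
    parser.add_argument("words", nargs="*")
    return parser.parse_args(argv)


def decode_words(words, encoding):
    """Recover the raw bytes of each argument and decode them explicitly."""
    # os.fsencode undoes the decoding Python 3 applied to sys.argv on POSIX.
    return [os.fsencode(word).decode(encoding) for word in words]
```

A script would call args = parse_args(sys.argv[1:]) and then decode_words(args.words, args.encoding), so the caller states the encoding instead of the script guessing it.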
Ciao,
Marc 'BlackJack' Rintsch
Jiba wrote:
> Hi all,
> I am desperately searching for the encoding of sys.argv.
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?
> Jiba
Here's what I see in a Windows command prompt interactive session:
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'cp437'
>>> sys.getdefaultencoding()
'ascii'
>>>
But in a Cygwin command window on the same machine I see
Python 2.5b2 (trunk:50713, Jul 19 2006, 16:04:09)
[GCC 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'US-ASCII'
>>> sys.getdefaultencoding()
'ascii'
>>>
The strings in sys.argv are encoded the same as the standard input, I
believe.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Jiba wrote:
> Hi all,
> I am desperately searching for the encoding of sys.argv.
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?
Your system is misconfigured; complain to your distribution. On UNIX,
sys.getfilesystemencoding(), sys.stdin.encoding, sys.stdout.encoding,
locale.getpreferredencoding() and the encoding of the characters you type
should all be the same.
Marc 'BlackJack' Rintsch wrote:
> In <20061023130504.26823717@autremonde>, Jiba wrote:
>> I am desperately searching for the encoding of sys.argv.
>> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
>> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
>> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
>> accented characters) nor in UTF-8. It seems to be encoded in
>> "latin-1", but why?
> There is no way to determine the encoding. The application that starts
> another and sets the arguments can use any encoding it likes and there's
> no standard way to find out which it was.
There is a standard way: the nl_langinfo function
<http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html>
The code in pythonrun.c properly uses it to find out the encoding. The
other question is whether Linux or *BSD distributions conform to the standard.
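From Python, the same query can be made through the locale module. This is a sketch only, assuming Python 3; the function name locale_codeset is invented for the example, and the fallback branch is my own addition for platforms without nl_langinfo.

```python
import locale


def locale_codeset():
    """Return the character encoding named by the current locale settings."""
    try:
        # Adopt the user's locale (LANG, LC_ALL, LC_CTYPE) for character handling.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale this system does not provide.
        pass
    if hasattr(locale, "nl_langinfo"):
        # POSIX: ask the C library, like nl_langinfo(CODESET) in C.
        return locale.nl_langinfo(locale.CODESET)
    # Elsewhere (for example Windows) fall back to Python's own answer.
    return locale.getpreferredencoding(False)


print(locale_codeset())
```

On a correctly configured UTF-8 system this would be expected to print UTF-8; anything else suggests the locale environment variables are not what the user thinks they are.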
-- Leo.
Jiba wrote:
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding()
> is "utf-8". However, sys.argv is neither in ASCII (since I can pass
> French accented characters) nor in UTF-8. It seems to be encoded
> in "latin-1", but why?
Let me second Leo Kislov's analysis. They should be encoded in
locale.getpreferredencoding(), which should be UTF-8. Are you
*sure* they aren't encoded in this way?
On my Debian system, I get this:
martin@mira:~/tmp$ echo $LANG
de_DE.UTF-8
martin@mira:~/tmp$ cat a.py
import sys
print sys.argv
martin@mira:~/tmp$ python a.py Martin v. Löwis
['a.py', 'Martin', 'v.', 'L\xc3\xb6wis']
So clearly, my terminal application + shell passes them as UTF-8,
as it should. The terminal application is KDE konsole; the shell
is bash. The shell *pretty likely* passes the arguments "through"
as-read from the terminal, so if you are not seeing UTF-8, you
have managed to misconfigure your terminal.
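One way to answer the "are you sure" question is to test the raw bytes directly. The sketch below assumes Python 3 and reuses the byte string from the output above; looks_like_utf8 is a made-up helper name. Note that decoding as latin-1 never fails, because every byte value maps to a character, so "it decodes as latin-1" proves nothing on its own.

```python
# The bytes shown in the sys.argv output above for the last argument.
raw = b"L\xc3\xb6wis"

as_utf8 = raw.decode("utf-8")      # bytes C3 B6 together form one character
as_latin1 = raw.decode("latin-1")  # every byte becomes its own character

print(len(as_utf8), len(as_latin1))  # prints: 5 6


def looks_like_utf8(data):
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(looks_like_utf8(raw))           # prints: True
print(looks_like_utf8(b"L\xf6wis"))   # prints: False (latin-1 bytes)
```

If an argument containing accented characters fails this check, the terminal really is sending something other than UTF-8.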
Regards,
Martin