encoding of sys.argv ?

Hi all,

I am desperately searching for the encoding of sys.argv.

I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?

Jiba
Oct 23 '06 #1
On 2006-10-23, Jiba <ji******@free.fr> wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box with French UTF-8 locales and a UTF-8
> filesystem. sys.getdefaultencoding() is "ascii" and
> sys.getfilesystemencoding() is "utf-8". However, sys.argv is
> neither in ASCII (since I can pass French accented
> characters) nor in UTF-8. It seems to be encoded in "latin-1",
> but why?

It will most likely be in the encoding of the terminal from which
you call Python, in other words, sys.stdin.encoding. Your only
hope of accepting non-US-ASCII command line arguments in this
manner is that sys.stdin.encoding is divined correctly by Python.

--
Neil Cerutti
Facts are stupid things. --Ronald Reagan
Oct 23 '06 #2
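
For anyone who wants to try the check suggested above, here is a minimal sketch, assuming Python 3. The helper name `stream_encoding` and its `fallback` parameter are made up for illustration. It reads the stream's `encoding` attribute and falls back to a caller-supplied value when the attribute is missing or None, which can happen for streams that are not backed by a terminal (an in-memory `io.StringIO`, for example).

```python
import io
import sys


def stream_encoding(stream, fallback="ascii"):
    """Return the encoding reported by a text stream, or `fallback`."""
    # `encoding` may be absent (stream is None) or None (e.g. io.StringIO).
    encoding = getattr(stream, "encoding", None)
    return encoding if encoding else fallback


if __name__ == "__main__":
    print("stdin encoding:", stream_encoding(sys.stdin))
    print("in-memory stream:", stream_encoding(io.StringIO()))
```

Whatever this reports only describes the stream itself; it is at best a hint about how the command line arguments were encoded.
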
In <20061023130504.26823717@autremonde>, Jiba wrote:
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
> accented characters) nor in UTF-8. It seems to be encoded in
> "latin-1", but why?

There is no way to determine the encoding. The application that starts
another and sets the arguments can use any encoding it likes and there's
no standard way to find out which it was.

The `sys.stdin.encoding` approach isn't very robust because this will only
be set if the interpreter can find out what encoding is used on `stdin`.
That's impossible if the `stdin` is the input from another file.

Make it explicit: Add a command line option to choose the encoding.

Ciao,
Marc 'BlackJack' Rintsch
Oct 23 '06 #3
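
The "make it explicit" advice could look roughly like this. It is only a sketch, assuming Python 3 on a POSIX system, where `sys.argv` holds `str` values and `os.fsencode` recovers the original bytes. The `--encoding` option name, its UTF-8 default, and the helper names are my own choices, not an established convention.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(description="Decode arguments explicitly.")
    # Hypothetical option: the caller states which encoding the arguments use.
    parser.add_argument("--encoding", default="utf-8",
                        help="encoding of the positional arguments")
    parser.add_argument("words", nargs="*")
    return parser


def redecode(arg, encoding):
    """Recover the raw bytes of one argument and decode them as `encoding`."""
    raw = os.fsencode(arg)  # undoes Python's own decoding of argv (POSIX)
    return raw.decode(encoding)


def main(argv):
    """Parse `argv` (without the program name) and return the decoded words."""
    args = build_parser().parse_args(argv)
    return [redecode(word, args.encoding) for word in args.words]
```

A script would call `main(sys.argv[1:])`. A caller whose terminal sends Latin-1 would then run it with `--encoding latin-1` instead of leaving the program to guess.
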
Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?
>
> Jiba

Here's what I see in a Windows command prompt interactive session:

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'cp437'
>>> sys.getdefaultencoding()
'ascii'
>>>

But in a Cygwin command window on the same machine I see

Python 2.5b2 (trunk:50713, Jul 19 2006, 16:04:09)
[GCC 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
Started with C:/Steve/.pythonrc
>>> import sys
>>> sys.stdin.encoding
'US-ASCII'
>>> sys.getdefaultencoding()
'ascii'
>>>

The strings in sys.argv are encoded the same as the standard input, I
believe.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Oct 23 '06 #4

Jiba wrote:
> Hi all,
>
> I am desperately searching for the encoding of sys.argv.
>
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8". However, sys.argv is neither in ASCII (since I can pass French accented characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?

Your system is misconfigured; complain to your distribution. On UNIX,
sys.getfilesystemencoding(), sys.stdin.encoding, sys.stdout.encoding,
locale.getpreferredencoding() and the encoding of the characters you type
should be the same.

Oct 23 '06 #5
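
A quick way to see whether those values really agree on a given machine is sketched below, assuming Python 3. The function names are hypothetical. Encoding names are normalized through `codecs.lookup` so that aliases such as "UTF-8" and "utf8" count as equal.

```python
import codecs
import locale
import sys


def encoding_report():
    """Collect the encodings Python reports for the current process."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


def canonical(name):
    """Map an encoding name to its canonical codec name where possible."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()  # unknown to Python: compare the raw name


def all_agree(report):
    """True if every known entry names the same codec (aliases allowed)."""
    names = {canonical(value) for value in report.values() if value}
    return len(names) <= 1


if __name__ == "__main__":
    report = encoding_report()
    for source, value in sorted(report.items()):
        print(source, "->", value)
    print("consistent:", all_agree(report))
```

If the last line prints False, the mismatch is the first thing to look at.
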

Marc 'BlackJack' Rintsch wrote:
> In <20061023130504.26823717@autremonde>, Jiba wrote:
>> I am desperately searching for the encoding of sys.argv.
>>
>> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
>> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is
>> "utf-8". However, sys.argv is neither in ASCII (since I can pass French
>> accented characters) nor in UTF-8. It seems to be encoded in
>> "latin-1", but why?
>
> There is no way to determine the encoding. The application that starts
> another and sets the arguments can use any encoding it likes and there's
> no standard way to find out which it was.

There is a standard way: the nl_langinfo function
<http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html>.
The code in pythonrun.c uses it properly to find out the encoding. The
other question is whether Linux or *BSD distributions conform to the standard.

-- Leo.

Oct 23 '06 #6
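
From Python, the same query is available through the standard `locale` module on platforms that provide `nl_langinfo`. A minimal sketch, assuming Python 3; the helper name `locale_codeset` and the fallback to `locale.getpreferredencoding` are my own additions.

```python
import locale


def locale_codeset():
    """Return the character encoding named by the user's locale."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale settings: keep whatever is active
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # Assumption: platforms without nl_langinfo (e.g. Windows) fall back here.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(locale_codeset())
```

Note that `setlocale` changes state for the whole process, so a library would probably not want to call it on the caller's behalf.
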
Jiba wrote:
> I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
> sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding()
> is "utf-8". However, sys.argv is neither in ASCII (since I can pass
> French accented characters) nor in UTF-8. It seems to be encoded
> in "latin-1", but why?

Let me second Leo Kislov's analysis. They should be encoded in
locale.getpreferredencoding(), which should be UTF-8. Are you
*sure* they aren't encoded in this way?

On my Debian system, I get this:

martin@mira:~/tmp$ echo $LANG
de_DE.UTF-8
martin@mira:~/tmp$ cat a.py
import sys
print sys.argv

martin@mira:~/tmp$ python a.py Martin v. Löwis
['a.py', 'Martin', 'v.', 'L\xc3\xb6wis']

So clearly, my terminal application + shell passes them as UTF-8,
as it should. The terminal application is KDE konsole; the shell
is bash. The shell *pretty likely* passes the arguments "through"
as-read from the terminal, so if you are not seeing UTF-8, you
have managed to misconfigure your terminal.

Regards,
Martin
Oct 23 '06 #7
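
To check which encoding an argument actually arrived in, one can look at its raw bytes, as in the `'L\xc3\xb6wis'` output above. A rough sketch, assuming Python 3 and byte strings as input; `decode_argument` and its candidate list are hypothetical. Since Latin-1 accepts any byte sequence, it only makes sense as the last candidate.

```python
def decode_argument(raw, candidates=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding)."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit: %r" % (raw,))


if __name__ == "__main__":
    # The same name as UTF-8 bytes and as Latin-1 bytes.
    for raw in (b"L\xc3\xb6wis", b"L\xf6wis"):
        text, encoding = decode_argument(raw)
        print(ascii(text), "decoded as", encoding)
```

This is a heuristic, not a detection method: a successful UTF-8 decode is strong evidence for non-ASCII input, while the Latin-1 result only says that UTF-8 did not fit.
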
