What encoding is used when initializing sys.argv?

Hi,

While trying to pass a Unicode directory name
to a script through the command line
(MS Windows environment), I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.

I know about the rejected attempt to implement
sys.argvu. Still, how is sys.argv filled? What
encoding is used when the command line is parsed
internally? To what encoding is it converted when
non-ASCII characters appear?

Thanks for your time and experience,
pepr
--
Petr Prikryl (prikrylp at skil dot cz)
Sep 30 '05 #1
Petr Prikryl wrote:
I know about the rejected attempt to implement
sys.argvu. Still, how is sys.argv filled? What
encoding is used when the command line is parsed
internally? To what encoding is it converted when
non-ASCII characters appear?


Python does not perform any conversion whatsoever.
It has a traditional main() function, with the
char *argv[] argument.

So if you think that the arguments are inherently
Unicode on your system, your question should be
"how does my operating system convert the arguments?"

That, of course, depends on your operating system.
"MS Windows environment" is not precise enough, since
it also depends on the specific incarnation of that
environment. On Windows 9x, I believe the command
line arguments are "inherently" *not* in Unicode,
but in a char array. On Windows NT+, they are Unicode,
and Windows (or is it the MS VC runtime?) converts them
to char strings using CP_ACP, the system's ANSI code page.
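The effect of such a Unicode-to-char conversion can be sketched in Python. This is only an illustration under stated assumptions: cp1252 stands in for CP_ACP (the real code page depends on the system locale), to_ansi_bytes is a hypothetical helper, and Python's "replace" error handler stands in for whatever substitution Windows actually performs on characters the code page cannot represent.

```python
# Assumption: cp1252 stands in for CP_ACP; the real ANSI code page
# depends on the Windows system locale.
ASSUMED_ANSI_CODEPAGE = "cp1252"


def to_ansi_bytes(arg, codepage=ASSUMED_ANSI_CODEPAGE):
    """Mimic a Unicode -> char conversion (hypothetical helper).

    Characters missing from the code page become '?' here; Windows
    itself may substitute differently.
    """
    return arg.encode(codepage, "replace")


# Characters that exist in the code page survive as single bytes.
inside = to_ansi_bytes(u"abc\xdf\u2022")

# A character outside the code page (here U+010D) is lost.
outside = to_ansi_bytes(u"\u010desky")

print(repr(inside))
print(repr(outside))
```

The second call shows why a directory name containing characters outside the ANSI code page cannot be recovered from a char-based argv, whatever codec is used afterwards.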

Kind regards,
Martin
Sep 30 '05 #2
Petr Prikryl:
... I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.


Martin mentioned CP_ACP. In Python on Windows, this can be accessed
as the "mbcs" codec.

import sys
print repr(sys.argv[1])
print repr(unicode(sys.argv[1], "mbcs"))

C:\bin>python glurp.py abcß•
'abc\xdf\x95'
u'abc\xdf\u2022'
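The same idea can be wrapped in a small helper. This is a sketch, not part of the original post: argv_encoding and decode_arg are hypothetical names, and falling back to locale.getpreferredencoding() outside Windows is an assumption. On Python 3, sys.argv already holds text, so the helper passes it through unchanged.

```python
import locale
import sys


def argv_encoding():
    """Guess the encoding of byte-string arguments (hypothetical helper)."""
    if sys.platform == "win32":
        return "mbcs"  # Python's codec name for the CP_ACP code page
    # Assumption: elsewhere, the locale's preferred encoding applies.
    return locale.getpreferredencoding() or "ascii"


def decode_arg(raw, encoding=None):
    """Return raw as text; input that is already text is returned as-is."""
    if not isinstance(raw, bytes):
        return raw
    return raw.decode(encoding or argv_encoding())


# The bytes from the example above, decoded with an explicit code page
# (cp1252 is assumed here so the result does not depend on the machine).
decoded = decode_arg(b"abc\xdf\x95", "cp1252")
print(repr(decoded))
```

Passing the encoding explicitly keeps the result reproducible; relying on the default makes it depend on the machine's code page, just as "mbcs" does.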

Neil
Sep 30 '05 #3
Neil Hodgson <ny*****************@gmail.com> wrote:

Petr Prikryl:
... I have discovered that
I do not understand what encoding should be used
to convert the sys.argv into unicode.


Martin mentioned CP_ACP. In Python on Windows, this can be accessed
as the "mbcs" codec.

import sys
print repr(sys.argv[1])
print repr(unicode(sys.argv[1], "mbcs"))

C:\bin>python glurp.py abcß•
'abc\xdf\x95'
u'abc\xdf\u2022'


There's another entry in my "keep this post forever" file.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Oct 2 '05 #4