Windows XP - Environment variable - Unicode

Hi

I would like to retrieve the path of the application data directory of
the logged-in user on Windows XP. To do this, I use the environment
variable APPDATA.

The logged-in user is named sébastien. The second character is not an
ASCII character, and when I try to encode the path containing this name
in UTF-8, I get this error:

Ascii error: index not in range (128)

I would like to first decode this string and then re-encode it in UTF-8,
but I cannot find out which encoding is used when I do:

appdata = os.environ['APPDATA']

Any ideas?

Thanks in advance
Sebastien
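
The error message points at an implicit step. In Python 2, calling .encode('utf-8') on a byte string first decodes it with the default ASCII codec, and that decode fails on the byte for "é". Below is a minimal sketch of the decode-then-encode approach in Python 3 syntax. It assumes (hypothetically) that the raw value is a byte string in cp1252, the ANSI code page of Western European Windows installations.

```python
# Hypothetical raw value: the bytes Python 2's os.environ would hold for
# APPDATA on a Western European Windows XP system (ANSI code page cp1252).
raw = b"C:\\Documents and Settings\\s\xe9bastien\\Application Data"

# Python 2's str.encode('utf-8') decoded with ASCII first; doing that
# step explicitly reproduces the failure on the byte 0xE9 ("é").
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)  # message ends in "ordinal not in range(128)"

# Decoding with the right code page first, then encoding, succeeds.
text = raw.decode("cp1252")   # a Unicode string containing "sébastien"
utf8 = text.encode("utf-8")   # "é" becomes the two bytes 0xC3 0xA9
print(ascii_error)
print(utf8)
```

The catch is that the correct code page has to be known in advance, which is exactly what the rest of the thread argues about.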

Jul 18 '05 #1
sebastien.hugues wrote in news:3f******@epflnews.epfl.ch:
> appdata = os.environ['APPDATA']
>
> Any ideas?


I don't know if it will help, but:

>>> import win32com.client
>>> shell = win32com.client.Dispatch("WScript.Shell")
>>> env = shell.GetEnvironment("VOLATILE")
>>> j = []
>>> for i in env:
...     j.append(i)
...
>>> j
[u'LOGONSERVER=\\\\COMPUTERNAME', u'APPDATA=C:\\Documents and Settings\\username\\Application Data']

Note the leading u, which I don't get with:

>>> import os
>>> os.environ["APPDATA"]
'C:\\Documents and Settings\\username\\Application Data'

Also note that APPDATA should be in env = shell.GetEnvironment("PROCESS") as well.
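
The WScript.Shell environment collection yields each entry as a single "NAME=value" string, as the list in the session above shows. Here is a minimal sketch for turning such a list into a dictionary. The helper name entries_to_dict is hypothetical. The sketch splits at the first "=" after the first character, on the assumption that hidden Windows entries such as "=C:", whose names start with "=", may appear.

```python
def entries_to_dict(entries):
    """Convert "NAME=value" strings into a {NAME: value} dictionary."""
    result = {}
    for entry in entries:
        # Search from index 1 so that a name beginning with "=" (e.g. "=C:")
        # is not split at its own leading "=".
        pos = entry.find("=", 1)
        if pos == -1:
            continue  # malformed entry without a separator: skip it
        result[entry[:pos]] = entry[pos + 1:]
    return result


# Values shaped like the session output above.
entries = [
    "LOGONSERVER=\\\\COMPUTERNAME",
    "APPDATA=C:\\Documents and Settings\\username\\Application Data",
]
env = entries_to_dict(entries)
print(env["APPDATA"])
```

The sketch splits at the first "=" after the name rather than the last because values, unlike names, may legitimately contain "=".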


HTH

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 18 '05 #2

"sebastien.hugues" <se**************@swissinfo.org> wrote in message
news:3f******@epflnews.epfl.ch...
> appdata = os.environ['APPDATA']
>
> Any ideas?

I don't think encoding is an issue. Windows XP stores all character
data as Unicode internally, so whatever you get back from os.environ()
is either going to be Unicode, or it's going to be translated back to
some single-byte code by Python. In the latter case, you may not be
able to recover non-ASCII values, so Rob Willscroft's workaround to get
the Unicode version may be your only hope.

If you're getting a standard string, though, I'd try Latin-1 or the
Windows equivalent first (it has an additional 32 characters that
aren't in Latin-1). Sorry, I don't remember the actual names.
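
The "Windows equivalent" of Latin-1 is presumably cp1252 (windows-1252). The two agree on every byte except the range 0x80 to 0x9F. Latin-1 reserves that range for control codes, while cp1252 fills most of it with printable characters (27 of the 32 positions; five remain undefined). Below is a short Python 3 sketch comparing the two codecs. The byte 0xE9 ("é") decodes identically in both.

```python
sample = bytes([0xE9, 0x80, 0x93])  # "é", then two bytes from the 0x80-0x9F range

as_latin1 = sample.decode("latin-1")  # 0x80 and 0x93 map to C1 control characters
as_cp1252 = sample.decode("cp1252")   # 0x80 is the euro sign, 0x93 a left double quote

print(ascii(as_latin1))
print(ascii(as_cp1252))

# Five positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in Python's cp1252 codec.
try:
    bytes([0x81]).decode("cp1252")
    undefined_ok = True
except UnicodeDecodeError:
    undefined_ok = False
```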

Note that Release 2.3 fixes the Unicode problems for files under XP.
It's currently in late beta, though. I don't know whether it also fixes
the os.environ() interface, and it's rather late to get anything into 2.3.

John Roth



Jul 18 '05 #3
John Roth wrote:
> I don't think encoding is an issue. Windows XP stores all character data as
> unicode internally, so whatever you get back from os.environ() is either
> going to be unicode, or it's going to be translated back to some single byte
> code by Python.

Read the source, Luke. Python uses environ, which is a C library
variable pointing to byte strings, so no Unicode here.

> In the latter case, you may not be able to recover non-ascii
> values, so Rob Willscroft's workaround to get the unicode version may be
> your only hope.

You are certainly able to recover non-ASCII values, as long as they
only contain characters from the ANSI code page (CP_ACP).

> If you're getting a standard string though, I'd try using Latin-1, or the
> Windows equivalent first (it's got an additional 32 characters that aren't in
> Latin-1.)

That, in general, is wrong. It is only true for the Western European and
American editions of Windows. In all other installations, CP_ACP differs
significantly from Latin-1.
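
In Python, the code page Martin calls CP_ACP (the Windows "ANSI" code page) is exposed through the Windows-only "mbcs" codec. In the Python 2 of this thread, the equivalent call should be os.environ['APPDATA'].decode('mbcs'). Below is a minimal Python 3 sketch. The helper name decode_ansi is hypothetical, and so is its fallback to the locale's preferred encoding on non-Windows platforms; the thread proposes neither.

```python
import locale


def decode_ansi(raw):
    """Decode bytes from the Windows ANSI code page (CP_ACP).

    Hypothetical helper: uses the Windows-only "mbcs" codec when it exists
    and otherwise falls back to the locale's preferred encoding.
    """
    try:
        return raw.decode("mbcs")
    except LookupError:
        # "mbcs" is not registered on non-Windows platforms.
        return raw.decode(locale.getpreferredencoding(False))


print(decode_ansi(b"C:\\Documents and Settings"))
```

Bytes that are invalid in the selected code page still raise UnicodeDecodeError, so a caller can tell a decoding failure apart from a missing codec.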

> Note that Release 2.3 fixes the unicode problems for files under XP.
> It's currently in late beta, though. I don't know if it fixes the
> os.environ()

It doesn't. "Fixing" something here is less urgent and more difficult,
as environment variables rarely contain characters outside CP_ACP.

If people get support for Unicode environment variables, they will want
Unicode command line arguments next.

Regards,
Martin

Jul 18 '05 #4

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message
news:3F**************@v.loewis.de...
> John Roth wrote:
>> I don't think encoding is an issue. Windows XP stores all character data as
>> unicode internally, so whatever you get back from os.environ() is either
>> going to be unicode, or it's going to be translated back to some single byte
>> code by Python.
> Read the source, Luke.

I haven't gotten into the Python source, and my name is not Luke.
Also, don't reply to my e-mail address. Unfortunately, I had a problem
where I had to reload my system, and the address has gotten out to
Usenet. It used to go to an ISP I no longer have an account with.

> Python uses environ, which is a C library
> variable pointing to byte strings, so no Unicode here.

The OP's question revolved around *which* code page was
being used internally. Windows uses Unicode. That's not the same
question as which code set Python uses when it attempts to translate
Unicode into a single-byte character set.

>> In the latter case, you may not be able to recover non-ascii
>> values, so Rob Willscroft's workaround to get the unicode version may be
>> your only hope.

> You are certainly able to recover non-ascii values, as long as they
> only use CP_ACP.

I said "may not," not "cannot in any and all circumstances."

>> If you're getting a standard string though, I'd try using Latin-1, or the
>> Windows equivalent first (it's got an additional 32 characters that aren't in
>> Latin-1.)

> That, in general, is wrong. It is only true for the Western European and
> American editions of Windows. In all other installations, CP_ACP differs
> significantly from Latin-1.

The OP's problem was a character in the Western European range.

>> Note that Release 2.3 fixes the unicode problems for files under XP.
>> It's currently in late beta, though. I don't know if it fixes the
>> os.environ()

> It doesn't. "Fixing" something here is less urgent and more difficult,
> as environment variables rarely exceed CP_ACP.

Less urgent I can see, unless you're concerned about whether Python
survives against systems that do it right. Now that the Windows 9x
series is dying off, the vast majority of desktop systems are going
to have Unicode support internally. Granted, Python is not targeted
at "the vast majority of systems," but if you can't easily get Unicode
from the environment and the registry, then it's not very useful for
system administration or automation tasks on Windows.

Many, if not most, environment variables hold file names or paths. If
file names need Unicode support, then so do environment variables.

As to more difficult: as I said above, I haven't perused the source,
so I can't comment on that. If I had to do it myself, I'd probably
start out by always using the Unicode variant of the Windows API
call, and then check the type of the argument to environ() to determine
which type to pass back. I'm not sure whether or not I'd throw an
exception if the actual value couldn't be translated to the current
single-byte (SBCS) code page.

> If people get support for Unicode environment variables, they want
> Unicode command line arguments next.

Why not? I can enter a command with Unicode at the Windows
command prompt, and that command is likely to contain file names.
Same problem raising its head in a different spot.

John Roth

On reading this over, it does sound a bit more strident than my
responses usually do, but I will admit to being irritated at the
assumption that you need to read the source to find out the
answer to various questions.

> Regards,
> Martin

Jul 18 '05 #5
"John Roth" <ne********@jhrothjr.com> writes:
> The OP's question revolved around *which* code page was
> being used internally. Windows uses Unicode. That's not the same
> question as what code set Python uses to attempt to translate Unicode
> into a single byte character set.

Yes and no. What Windows uses is largely irrelevant, as Python does
not use Windows here. Instead, it uses the Microsoft C library, in
which environment variables are *not* stored in some Unicode encoding
when accessed through the _environ pointer.

> As to more difficult, as I said above, I haven't perused the source,
> so I can't comment on that. If I had to do it myself, I'd probably
> start out by always using the Unicode variant of the Windows API
> call, and then check the type of the argument to environ() to determine
> which to pass back. I'm not sure whether or not I'd throw an exception
> if the actual value couldn't be translated to the current SBCS code.

Notice that os.environ is not a function, but a dictionary. So there
is no system call involved when retrieving an environment
variable. Instead, they are all precomputed.
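
Martin's point that os.environ is a precomputed mapping, not a function that queries the system, still holds in Python 3. The mapping is captured when the os module is first imported, and the documentation notes that calling os.putenv() directly does not update it. Below is a small sketch. The variable name DEMO_PRECOMPUTED_VAR is made up.

```python
import os
from collections.abc import MutableMapping

# os.environ is a mapping object, not a callable.
print(isinstance(os.environ, MutableMapping), callable(os.environ))

name = "DEMO_PRECOMPUTED_VAR"  # hypothetical variable name
os.environ.pop(name, None)     # make sure it is absent (this also unsets it)

# Changing the process environment behind os.environ's back...
os.putenv(name, "value")
# ...is not reflected in the precomputed mapping.
seen_after_putenv = name in os.environ

# Assigning through os.environ updates both the mapping and the process.
os.environ[name] = "value"
seen_after_assignment = name in os.environ
del os.environ[name]  # clean up

print(seen_after_putenv, seen_after_assignment)
```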

> On reading this over, it does sound a bit more strident than my
> responses usually do, but I will admit to being irritated at the
> assumption that you need to read the source to find out the
> answer to various questions.

If the question is "how does software Foo do something?", the *only*
reliable way to find out is to read the source. You may have a mental
model that allows you to make an educated guess about how Foo *might*
do something. In this case, your educated guess was wrong; that's why I
referred you to the source.

Regards,
Martin

Jul 18 '05 #6

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message
news:m3************@mira.informatik.hu-berlin.de...
> "John Roth" <ne********@jhrothjr.com> writes:
>> The OP's question revolved around *which* code page was
>> being used internally. Windows uses Unicode. That's not the same
>> question as what code set Python uses to attempt to translate Unicode
>> into a single byte character set.
> Yes and no. What Windows uses is largely irrelevant, as Python does
> not use Windows here. Instead, it uses the Microsoft C library, in
> which environment variables are *not* stored in some Unicode encoding,
> when accessed through the _environ pointer.

I've found at various times that using the C library causes lots of
problems with Microsoft.

>> As to more difficult, as I said above, I haven't perused the source,
>> so I can't comment on that. If I had to do it myself, I'd probably
>> start out by always using the Unicode variant of the Windows API
>> call, and then check the type of the argument to environ() to determine
>> which to pass back. I'm not sure whether or not I'd throw an exception
>> if the actual value couldn't be translated to the current SBCS code.

> Notice that os.environ is not a function, but a dictionary. So there
> is no system call involved when retrieving an environment
> variable. Instead, they are all precomputed.

Good point. That does make it somewhat harder; the routine
would have to precompute both versions and store them with
both standard strings and Unicode strings as keys. Whether the
overhead would be worth it is debatable. It's not, however,
all that difficult for the user of the facility to understand:
it would work exactly the same way the file functions work. If
you use a Unicode key, you get a Unicode result.
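
John's proposal (text keys give text values, byte keys give byte values) could be prototyped in Python 3 roughly as follows. This is a hypothetical sketch, not how os.environ works. The class name TypedEnviron and the fixed cp1252 code page are assumptions. So is the choice to leave out values the code page cannot represent, which is one possible answer to John's earlier question about whether to raise. The sketch keeps the two views in separate tables, so it does not rely on how text and byte keys compare with each other.

```python
from collections.abc import Mapping


class TypedEnviron(Mapping):
    """Hypothetical environment mapping: str keys -> str values,
    bytes keys -> bytes values in one single-byte code page."""

    def __init__(self, text_env, encoding="cp1252"):
        self._text = dict(text_env)
        # Precompute the byte view once, up front.
        self._bytes = {}
        for name, value in self._text.items():
            try:
                self._bytes[name.encode(encoding)] = value.encode(encoding)
            except UnicodeEncodeError:
                # Not representable in the code page: leave it out of the
                # byte view (raising here would be the other option).
                pass

    def __getitem__(self, key):
        if isinstance(key, bytes):
            return self._bytes[key]
        return self._text[key]

    def __iter__(self):
        return iter(self._text)  # iteration yields the text keys only

    def __len__(self):
        return len(self._text)


env = TypedEnviron({
    "APPDATA": "C:\\Documents and Settings\\s\u00e9bastien\\Application Data",
    "CJK_NAME": "\u65e5\u672c",  # not representable in cp1252
})
print(env[b"APPDATA"])
print(b"CJK_NAME" in env, "CJK_NAME" in env)
```

Iterating over only the text keys is another decision a real design would have to settle.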

John Roth

> Regards,
> Martin

Jul 18 '05 #7

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message
news:ma**********************************@python.org...
> John Roth wrote:
>>> Read the source, Luke.
>> I haven't gotten into the Python source, and my name
>> is not Luke.
>
> And life's too short to waste on movies...

Depends on what your goals in life are.

>> On reading this over, it does sound a bit more strident than my
>> responses usually do, but I will admit to being irritated at the
>> assumption that you need to read the source to find out the
>> answer to various questions.
>
> Well, you obviously didn't bother to read the documentation for
> os.environ, so pointing you to the source sounds like a reasonable
> idea.

Not particularly. I might be one of the not inconsiderable number
of people who don't know C. I'm not, but the number of people
who use Python without knowing C is not zero.

I like Python because, for the most part, it's much more
understandable than many languages I know, and that
makes it much more productive. What I've learned in this
conversation is that os.environ fails to handle one of the
major corner cases in a Windows NT/2000/XP environment.
So if I need that corner case, I'm going to have to use
the Windows API call. Not a big deal, but also not something
that I regard as one of the language's strengths.
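
Using the Windows API call from Python does not require writing a C extension. Below is a minimal sketch that uses the standard ctypes module to call GetEnvironmentVariableW, the Unicode variant of the Win32 function. It only works on Windows, and the helper name get_env_w is hypothetical. In Python 3, os.environ on Windows already holds str values, so this is mainly relevant to the Python 2 situation discussed here.

```python
import ctypes
import sys


def get_env_w(name):
    """Return environment variable `name` via GetEnvironmentVariableW, or None.

    Hypothetical helper; Windows only. Empty values are not distinguished
    from unset variables in this sketch.
    """
    if sys.platform != "win32":
        raise OSError("GetEnvironmentVariableW is only available on Windows")
    func = ctypes.WinDLL("kernel32", use_last_error=True).GetEnvironmentVariableW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    func.restype = ctypes.c_uint32

    # With a zero-sized buffer the call returns the required size in
    # characters (including the terminating NUL), or 0 if it is not set.
    size = func(name, None, 0)
    while size:
        buf = ctypes.create_unicode_buffer(size)
        copied = func(name, buf, size)
        if copied == 0:
            return None          # variable vanished (or the call failed)
        if copied < size:
            return buf.value     # success: the count excludes the NUL
        size = copied            # value grew in between; retry with new size
    return None


if sys.platform == "win32":
    print(get_env_w("APPDATA"))
```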

John Roth
> </F>

Jul 18 '05 #8

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message
news:be************@news.t-online.com...
> John Roth wrote:
>> Good point. That does make it somewhat harder; the routine
>> would have to precompute both versions, and store them with
>> both standard strings and unicode strings as keys.
> That doesn't work. You cannot have separate dictionary entries
> for unicode and byte string keys if the keys compare and hash
> equal, which is the case for all-ASCII keys (which environment
> variable names typically are).


Ah, so.

John Roth
> Regards,
> Martin
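
Martin's objection describes Python 2, where u'APPDATA' == 'APPDATA' is true and the two keys hash alike, so a dictionary keeps a single entry for both. In Python 3, text and byte strings never compare equal, so the two keys in the sketch below coexist. Python 3 nevertheless settled the question differently. On Windows, os.environ holds only str keys and values (os.supports_bytes_environ is False there), and os.environb is offered only on platforms whose native environment is bytes.

```python
import os

table = {}
table["APPDATA"] = "text value"
table[b"APPDATA"] = b"byte value"

# In Python 3, "APPDATA" != b"APPDATA", so these are two distinct entries.
print(len(table))

# Whether a bytes view of the environment exists depends on the platform.
if os.supports_bytes_environ:
    print("os.environb available:", type(os.environb).__name__)
else:
    print("no os.environb on this platform (e.g. Windows)")
```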

Jul 18 '05 #9
