Unicode entries on sys.path

I was trying to track down a bug in py2exe where the executable does
not work when it is in a directory containing Japanese characters.

Then, I discovered that part of the problem is in the zipimporter that
py2exe uses, and finally I found that it didn't even work in Python
itself.

If the entry in sys.path contains normal western characters, umlauts for
example, it works fine. But when I copied some Japanese characters from
a random web page and named a directory after them, it no longer
worked.

The Windows command prompt is not able to print these characters,
although Windows Explorer has no problems showing them.

Here's the script. The subdirectory contains the file 'somemodule.py',
but importing it fails:

import sys
sys.path = [u'\u5b66\u6821\u30c7xx']
print sys.path

import somemodule

It seems that Python itself converts Unicode entries in sys.path to
normal strings using Windows default conversion rules - is this a
problem that I can fix by changing some regional setting on my machine?

Hm, maybe more a Windows question than a Python question...
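
For anyone who wants to reproduce this without creating the directory by hand, here is a self-contained sketch. It assumes Python 3 and a file system that accepts the directory name; the helper name import_from_unicode_dir and the module name somemodule_demo are made up for illustration and are not part of the script above.

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_from_unicode_dir(dirname=u'\u5b66\u6821\u30c7xx'):
    """Create a directory with a non-ASCII name, put a module in it,
    add it to sys.path and try to import the module from there."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, dirname)
    try:
        os.mkdir(target)
        # Hypothetical module used only for this demonstration.
        with open(os.path.join(target, 'somemodule_demo.py'), 'w') as f:
            f.write('VALUE = 42\n')
        sys.path.insert(0, target)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module('somemodule_demo')
            return module.VALUE
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(target)
            sys.modules.pop('somemodule_demo', None)
    finally:
        shutil.rmtree(base, ignore_errors=True)


if __name__ == '__main__':
    print(import_from_unicode_dir())
```

If the import machinery cannot cope with the directory name, the call raises ImportError (or an encoding error) instead of returning the value.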

Thanks,
Thomas
Jul 18 '05 #1
Thomas Heller wrote:
> It seems that Python itself converts Unicode entries in sys.path to
> normal strings using Windows default conversion rules - is this a
> problem that I can fix by changing some regional setting on my machine?

You can set the system code page on the third tab of the XP
regional settings (character set for non-Unicode applications).
This, of course, assumes that there is a character set that supports
all directories in sys.path. If you have Japanese characters on
sys.path, you certainly need to set the system locale to Japanese
(is that CP932?).

Changing this setting requires a reboot.

> Hm, maybe more a Windows question than a Python question...


The real question here is: why does Python not support arbitrary
Unicode strings on sys.path? It could, in principle, at least on
Windows NT+ (and also on OSX). Patches are welcome.

Regards,
Martin
Jul 18 '05 #2
In article <41**************@v.loewis.de>,
"Martin v. Lowis" <ma****@v.loewis.de> wrote:
>> Hm, maybe more a Windows question than a Python question...
>
> The real question here is: why does Python not support arbitrary
> Unicode strings on sys.path? It could, in principle, at least on
> Windows NT+ (and also on OSX). Patches are welcome.


Works for me on OSX 10.3.6, as it should: prior to using the sys.path
entry, a Unicode string is encoded with Py_FileSystemDefaultEncoding.
I'm not sure how well it works together with zipimport, though.
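
The encoding in question can be inspected from Python through sys.getfilesystemencoding(). A minimal sketch, written for Python 3; the helper name encode_for_filesystem is hypothetical, and what it returns depends entirely on the platform and locale it runs on:

```python
import sys


def encode_for_filesystem(entry):
    """Encode a Unicode path entry with the file system default encoding.

    Returns the encoded bytes, or None if the entry has no representation
    in that encoding.
    """
    try:
        return entry.encode(sys.getfilesystemencoding())
    except UnicodeEncodeError:
        return None


if __name__ == '__main__':
    print(sys.getfilesystemencoding())
    print(repr(encode_for_filesystem(u'\u5b66\u6821\u30c7xx')))
```

A result of None would mean the entry cannot survive the conversion on that machine.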

Just
Jul 18 '05 #3
Just wrote:
> In article <41**************@v.loewis.de>,
> "Martin v. Lowis" <ma****@v.loewis.de> wrote:
>
>>> Hm, maybe more a Windows question than a Python question...
>>
>> The real question here is: why does Python not support arbitrary
>> Unicode strings on sys.path? It could, in principle, at least on
>> Windows NT+ (and also on OSX). Patches are welcome.
>
> Works for me on OSX 10.3.6, as it should: prior to using the sys.path
> entry, a Unicode string is encoded with Py_FileSystemDefaultEncoding.


For this conversion "mbcs" will be used on Windows machines, implying
that such conversions are made using the current system ANSI code page.
(As a matter of interest: What is this on OSX?) This conversion is
likely to be useless for Unicode directory names containing characters
that do not have a mapping to a character in this particular code page.
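
To make the mapping problem concrete, here is a small sketch (Python 3). The explicit code page names stand in for whatever "mbcs" resolves to on a given machine; the helper name representable and the Hangul sample are my own additions for contrast and are not from the original report.

```python
def representable(name, codepage):
    """Return True if every character of `name` maps into `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


japanese = u'\u5b66\u6821\u30c7xx'  # directory name from the original post
hangul = u'\ud55c\uae00'            # illustrative contrast, not from the thread

if __name__ == '__main__':
    for codepage in ('cp1252', 'cp932', 'utf-8'):
        print(codepage,
              representable(japanese, codepage),
              representable(hangul, codepage))
```

The Japanese name fits CP932 but not the western CP1252, while the Hangul name does not fit CP932 either; only a Unicode encoding covers both.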

The technique described by Martin may solve the problem for what in this
case are Japanese characters, but what if I have directory names from
another language group, such as simplified Chinese, as well?

The only way to get around this is to allow - as Martin suggests -
arbitrary Unicode strings in sys.path on those platforms that may have
Unicode file names.

--
Vincent Wehren

> I'm not sure how well it works together with zipimport, though.
>
> Just

Jul 18 '05 #4
In article <cq**********@news6.zwoll1.ov.home.nl>,
vincent wehren <vi*****@visualtrans.de> wrote:
> Just wrote:
>> In article <41**************@v.loewis.de>,
>> "Martin v. Lowis" <ma****@v.loewis.de> wrote:
>>
>>>> Hm, maybe more a Windows question than a Python question...
>>>
>>> The real question here is: why does Python not support arbitrary
>>> Unicode strings on sys.path? It could, in principle, at least on
>>> Windows NT+ (and also on OSX). Patches are welcome.
>>
>> Works for me on OSX 10.3.6, as it should: prior to using the sys.path
>> entry, a Unicode string is encoded with Py_FileSystemDefaultEncoding.
>
> For this conversion "mbcs" will be used on Windows machines, implying
> that such conversions are made using the current system ANSI code page.
> (As a matter of interest: What is this on OSX?)


UTF-8.

Just
Jul 18 '05 #5
Just wrote:
>> The real question here is: why does Python not support arbitrary
>> Unicode strings on sys.path? It could, in principle, at least on
>> Windows NT+ (and also on OSX). Patches are welcome.
>
> Works for me on OSX 10.3.6, as it should: prior to using the sys.path
> entry, a Unicode string is encoded with Py_FileSystemDefaultEncoding.
> I'm not sure how well it works together with zipimport, though.


As Vincent's message already implies, I'm asking for Windows patches.
In a Windows system, there are path names which just *don't have*
a representation in the file system default encoding. So you just
can't use the standard file system API (open, read, write) to access
those files - instead, you have to use specific Unicode variants
of the file system API.
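
As an illustration of what "Unicode variants" means in practice, here is a sketch that calls one wide entry point, GetFileAttributesW, through ctypes. This is not how a patch to the import machinery would look (that would be C code); it only shows the idea of handing the Unicode string to the OS without narrowing it to the ANSI code page first. The function name path_exists_wide is hypothetical, and on non-Windows platforms the sketch simply defers to os.path.exists.

```python
import ctypes
import os
import sys

# Value GetFileAttributesW returns when the path cannot be found.
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF


def path_exists_wide(path):
    """Check for `path` without converting it to the ANSI code page."""
    if sys.platform == 'win32':
        get_attrs = ctypes.windll.kernel32.GetFileAttributesW
        get_attrs.argtypes = [ctypes.c_wchar_p]
        get_attrs.restype = ctypes.c_uint32
        return get_attrs(path) != INVALID_FILE_ATTRIBUTES
    # Elsewhere there is no separate wide API; use the standard call.
    return os.path.exists(path)


if __name__ == '__main__':
    print(path_exists_wide(os.getcwd()))
```

The platform check happens at run time, so the same script runs unchanged on systems where the wide entry point is not available.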

The only operating system in active use that can reliably represent
all file names in the standard API is OS X. Unix can do that as
long as the locale is UTF-8; for all other systems, there are
restrictions when you try to use the file system API to access
files with "funny" characters.

Regards,
Martin
Jul 18 '05 #6
On Thu, 23 Dec 2004 19:24:58 +0100, "Martin v. Löwis" <ma****@v.loewis.de> wrote:
> Thomas Heller wrote:
>> It seems that Python itself converts Unicode entries in sys.path to
>> normal strings using Windows default conversion rules - is this a
>> problem that I can fix by changing some regional setting on my machine?
>
> You can set the system code page on the third tab of the XP
> regional settings (character set for non-Unicode applications).
> This, of course, assumes that there is a character set that supports
> all directories in sys.path. If you have Japanese characters on
> sys.path, you certainly need to set the system locale to Japanese
> (is that CP932?).
>
> Changing this setting requires a reboot.
>
>> Hm, maybe more a Windows question than a Python question...
>
> The real question here is: why does Python not support arbitrary
> Unicode strings on sys.path? It could, in principle, at least on
> Windows NT+ (and also on OSX). Patches are welcome.

What about removable drives? And mountable multiple file system types?
Maybe some collections of potentially homogeneous file system references
such as sys.path need to be virtualized to carry relevant file system
encoding and protocol info etc. That could cover synthetic or compressed
info sources too, IWT. Homogeneous package representation could be a similar
problem, I guess.

Regards,
Bengt Richter
Jul 18 '05 #7
"Martin v. Löwis" <ma****@v.loewis.de> writes:
> Thomas Heller wrote:
>> It seems that Python itself converts Unicode entries in sys.path to
>> normal strings using Windows default conversion rules - is this a
>> problem that I can fix by changing some regional setting on my machine?
>
> You can set the system code page on the third tab of the XP
> regional settings (character set for non-Unicode applications).
> This, of course, assumes that there is a character set that supports
> all directories in sys.path. If you have Japanese characters on
> sys.path, you certainly need to set the system locale to Japanese
> (is that CP932?).
>
> Changing this setting requires a reboot.
>
>> Hm, maybe more a Windows question than a Python question...
>
> The real question here is: why does Python not support arbitrary
> Unicode strings on sys.path? It could, in principle, at least on
> Windows NT+ (and also on OSX). Patches are welcome.


How should these patches be approached? On Windows, it would probably
be easiest to use the MS generic text routines: _tcslen instead of
strlen, for example, and to rely on the _UNICODE preprocessor symbol to
map this function to strlen or wcslen. Is there a similar thing in the
non-Windows world?

Thomas
Jul 18 '05 #8
Bengt Richter wrote:
>> The real question here is: why does Python not support arbitrary
>> Unicode strings on sys.path? It could, in principle, at least on
>> Windows NT+ (and also on OSX). Patches are welcome.
>
> What about removable drives? And mountable multiple file system types?


I'm not sure I understand the question. What about them?

On Windows, a removable drive will typically have its file names encoded
in UCS-2LE (i.e. "Unicode proper"), through the vfat, ntfs, or joliet
file systems. So if a Unicode file name in sys.path refers to them, and
a proper patch to use wide APIs is incorporated in Python, Python will
transparently find the files on these media.

> Maybe some collections of potentially homogeneous file system references
> such as sys.path need to be virtualized to carry relevant file system
> encoding and protocol info etc.


No no no. sys.path contains path names on the local system, nothing
virtualized (unless one of the existing hook mechanisms is used, which
would be OT for this thread).

Regards,
Martin
Jul 18 '05 #9
Thomas Heller wrote:
> How should these patches be approached?

Please have a look at how posixmodule.c and fileobject.c deal with
this issue.

> On Windows, it would probably
> be easiest to use the MS generic text routines: _tcslen instead of
> strlen, for example, and to rely on the _UNICODE preprocessor symbol to
> map this function to strlen or wcslen.


No. This fails for two reasons:
1. We don't compile Python with _UNICODE, and never will do so. This
macro is only a mechanism to simplify porting code from ANSI APIs
to Unicode APIs, so you don't have to reformulate all the API calls.
For new code, it is better to use the Unicode APIs directly if you
plan to use them.
2. On Win9x, the Unicode APIs don't work (*). So you need to choose at
run-time whether you want to use wide or narrow API. Unless
a) we ship two binaries in the future, one for W9x, one for NT+
(I hope this won't happen), or
b) we drop support for W9x. I'm in favour of doing so sooner or
later, but perhaps not for Python 2.5.

Regards,
Martin

(*) Can somebody please report whether the *W file APIs fail on W9x
because the entry points are not there (so you can't even run the
binary), or because they fail with an error when called?
Jul 18 '05 #10
