pep 277, Unicode filenames & mbcs encoding &c.

Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
be converted to Unicode using the mbcs encoding? For example,

myFile = unicode(__file__, "mbcs", "strict")

This seems to work, and I'm wondering whether there are any other details to
consider.

My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
work as I expect when one of the args is a Unicode string. Everything
before the Unicode string gets thrown away. But this is probably moot: pep
277 implies Python 2.3...

Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
done before passing arguments to os.path.join, os.path.split,
os.path.normpath, etc. ? Presumably os.path functions use the default
system encoding to convert strings to Unicode, which isn't likely to be
"mbcs" or anything else useful :-)

Are there any situations where some other encoding should be used instead on
Windows? What about other platforms? For instance, does Linux allow
non-ascii file names? If so, what encoding should be specified when
converting to Unicode? Thanks.

Edward
--------------------------------------------------------------------
Edward K. Ream email: ed*******@charter.net
Leo: Literate Editor with Outlines
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------
Jul 18 '05 #1
"Edward K. Ream" <ed*******@charter.net> wrote in message
news:vp************@corp.supernews.com...
| Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
| be converted to Unicode using the mbcs encoding? For example,
|
| myFile = unicode(__file__, "mbcs", "strict")

No and no. You can *still* use regular byte strings. Python will do the
conversion to Unicode for you using "mbcs" as encoding.
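
As a rough illustration of the point that byte strings remain usable as file names, here is a minimal sketch. It assumes a current Python 3 interpreter (where `str` takes the place of the thread's `unicode` type) rather than the Python 2.3 discussed here, and the scratch directory and the name `example.txt` are made up for the demo.

```python
import os
import tempfile

# Create a scratch directory with one file in it (names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")

    # A text path yields text names ...
    text_names = os.listdir(tmp)
    # ... while a bytes path yields bytes names.
    byte_names = os.listdir(os.fsencode(tmp))

    print(text_names)
    print(byte_names)
```

The type of the argument decides the type of the result, so an application can stay entirely in one world or the other.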

|
| This seems to work, and I'm wondering whether there are any other details to
| consider.
|
| My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
| work as I expect when one of the args is a Unicode string. Everything
| before the Unicode string gets thrown away. But this is probably moot: pep
| 277 implies Python 2.3...

Exactly. Python Unicode file name support has arrived with 2.3.

|
....
|
| Are there any situations where some other encoding should be used instead on
| Windows? What about other platforms? For instance, does Linux allow
| non-ascii file names?

You can use "os.path.supports_unicode_filenames" to check...
HTH

Vincent Wehren

| If so, what encoding should be specified when
| converting to Unicode? Thanks.

Probably the default encoding on Linux.

Jul 18 '05 #2
In article <bn*********@news4.tilbu1.nb.home.nl>,
"vincent wehren" <vi*****@visualtrans.de> wrote:
| | Are there any situations where some other encoding should be used instead on
| | Windows? What about other platforms? For instance, does Linux allow
| | non-ascii file names?
|
| You can use "os.path.supports_unicode_filenames" to check...


Actually, you can't, see:

http://python.org/sf/767645

The only two platforms that currently support unicode filenames properly
are Windows NT/XP and MacOSX, and for one of them
os.path.supports_unicode_filenames returns False :(

Just
Jul 18 '05 #3
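
For readers who want to see what their own interpreter reports, the following is a small sketch, assuming a current Python 3. The helper name `describe_filename_support` is hypothetical. Given the caveat in the post above, the flag is probably best treated as a hint rather than a guarantee.

```python
import os.path
import sys


def describe_filename_support():
    """Return the interpreter's own claims about file name handling."""
    return {
        "platform": sys.platform,
        "supports_unicode_filenames": os.path.supports_unicode_filenames,
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


info = describe_filename_support()
for key, value in info.items():
    print(key, "=", value)
```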
"Edward K. Ream" <ed*******@charter.net> writes:
| Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
| be converted to Unicode using the mbcs encoding?

What do you mean by "should"? "Should Python always..." or "Should
the application always"?

PEP 277 actually answers neither question. As Vincent explains,
nothing changes with respect to using byte strings on the API. The
changes only affect Unicode strings passed to functions expecting file names.

| For example,
|
| myFile = unicode(__file__, "mbcs", "strict")
|
| This seems to work

And it has nothing to do with PEP 277: you are not passing myFile to
any API function.

If you mean to use myFile as a file name, then yes: this is intended
to work. However, using plain __file__ directly should also work.

| Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
| done before passing arguments to os.path.join, os.path.split,
| os.path.normpath, etc.?

You should either use only Unicode strings, or only byte strings. The
functions of os.path are not all affected by the PEP 277
implementation (although they probably should be).
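
The "only Unicode or only bytes" rule can be sketched as follows. This assumes a current Python 3, which rejects mixed arguments outright, whereas the Python 2 interpreters discussed in this thread fell back on the default encoding instead. The path components are invented for the example.

```python
import os.path

# All text: fine, and the result is text.
print(os.path.join("dir", "file.txt"))

# All bytes: also fine, and the result stays bytes.
print(os.path.join(b"dir", b"file.txt"))

# Mixed: Python 3 refuses instead of guessing an encoding.
mixed_error = None
try:
    os.path.join("dir", b"file.txt")
except TypeError as exc:
    mixed_error = exc
    print("mixing failed:", exc)
```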

| Presumably os.path functions use the default
| system encoding to convert strings to Unicode, which isn't likely to be
| "mbcs" or anything else useful :-)

Right. This is actually unfortunate.

| Are there any situations where some other encoding should be used instead on
| Windows?

If you get data from a cmd.exe window.

| What about other platforms? For instance, does Linux allow non-ascii
| file names?

Yes, it does.

| If so, what encoding should be specified when converting to Unicode?

Nobody knows, but the convention is to use the locale's encoding, as
returned by locale.getpreferredencoding().

Regards,
Martin
Jul 18 '05 #4
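
A short sketch of the locale convention mentioned above, assuming a current Python 3. The file name is invented, and the `"replace"` error handler is only there so the demo also runs under an ASCII-only locale.

```python
import locale
import sys

# Encoding suggested by the user's locale settings.
locale_encoding = locale.getpreferredencoding()

# Encoding Python itself uses when talking to the file system.
fs_encoding = sys.getfilesystemencoding()

print("locale encoding:    ", locale_encoding)
print("filesystem encoding:", fs_encoding)

# Hypothetical raw name, as it might come back from a bytes-based API.
raw_name = "r\u00e9sum\u00e9.txt".encode(locale_encoding, "replace")

# Decode it with the same encoding to get text again.
text_name = raw_name.decode(locale_encoding, "replace")
print(text_name)
```

On many systems the two encodings agree, but nothing forces them to, which is one reading of "nobody knows".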
Many thanks, Martin, for these comments. They are so helpful...

| You should either use only Unicode strings, or only byte strings. The
| functions of os.path are not all affected by the PEP 277
| implementation (although they probably should).


My working assumption is that all strings in my app must be Unicode strings.
For example, the crashes I am seeing while adding support for Unicode
filenames occur when a string is implicitly converted to Unicode in
comparisons like:

if fileName1 == fileName2:

where one fileName is a unicode string and the other isn't yet. That's why
I wanted to do:

myFile = unicode(__file__, "mbcs", "strict")

The challenge in my app is to make sure the proper encoding is used in the
more than 30 situations where a filename gets created somehow. Naturally,
that's not your problem, nor PEP 277's problem either :-)

| | If so, what encoding should be specified when converting to Unicode?
|
| Nobody knows, but the convention is to use the locale's encoding, as
| returned by locale.getpreferredencoding().


Thanks for this.

Edward
Jul 18 '05 #5
"Edward K. Ream" <ed*******@charter.net> writes:
| if fileName1 == fileName2:
|
| where one fileName is a unicode string and the other isn't yet. That's why
| I wanted to do:
|
| myFile = unicode(__file__, "mbcs", "strict")


Ah, I see. Instead of "mbcs", you should use
sys.getfilesystemencoding(). This is what Python will use when
converting the Unicode strings back to byte strings before passing
them to the system (in case it converts back at all, which it doesn't
on Windows thanks to PEP 277).

Regards,
Martin
Jul 18 '05 #6
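
Putting the last two posts together, one way to normalise file names at the point where they enter an application might look like the sketch below. It assumes a current Python 3; `to_text_filename` and `same_filename` are hypothetical helpers, not standard library functions, and the sample names are invented.

```python
import sys


def to_text_filename(name, encoding=None):
    """Return `name` as text, decoding bytes with the file system encoding.

    Hypothetical helper: text input is passed through unchanged.
    """
    if isinstance(name, bytes):
        return name.decode(encoding or sys.getfilesystemencoding(), "strict")
    return name


def same_filename(a, b):
    """Compare two file names after converting both to text."""
    return to_text_filename(a) == to_text_filename(b)


print(same_filename(b"notes.txt", "notes.txt"))
print(same_filename(b"notes.txt", "other.txt"))
```

Calling the helper once, wherever a name is created, keeps later comparisons between two text values. In Python 3 the standard library's `os.fsdecode` covers similar ground.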
