Bytes | Software Development & Data Engineering Community

pep 277, Unicode filenames & mbcs encoding &c.

Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
be converted to Unicode using the mbcs encoding? For example,

myFile = unicode(__file__, "mbcs", "strict")

This seems to work, and I'm wondering whether there are any other details to
consider.

My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
work as I expect when one of the args is a Unicode string. Everything
before the Unicode string gets thrown away. But this is probably moot: pep
277 implies Python 2.3...

Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
done before passing arguments to os.path.join, os.path.split,
os.path.normpath, etc.? Presumably os.path functions use the default
system encoding to convert strings to Unicode, which isn't likely to be
"mbcs" or anything else useful :-)

Are there any situations where some other encoding should be used instead on
Windows? What about other platforms? For instance, does Linux allow
non-ascii file names? If so, what encoding should be specified when
converting to Unicode? Thanks.

Edward
--------------------------------------------------------------------
Edward K. Ream email: ed*******@charter.net
Leo: Literate Editor with Outlines
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------
Jul 18 '05 #1
"Edward K. Ream" <ed*******@charter.net> wrote in message
news:vp************@corp.supernews.com...
| Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
| be converted to Unicode using the mbcs encoding? For example,
|
| myFile = unicode(__file__, "mbcs", "strict")

No and no. You can *still* use regular byte strings. Python will do the
conversion to Unicode for you using "mbcs" as encoding.

|
| This seems to work, and I'm wondering whether there are any other details
| to consider.
| consider.
|
| My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
| work as I expect when one of the args is a Unicode string. Everything
| before the Unicode string gets thrown away. But this is probably moot: pep
| 277 implies Python 2.3...

Exactly. Python's Unicode file name support arrived with 2.3.

|
....
|
| Are there any situations where some other encoding should be used instead on
| Windows? What about other platforms? For instance, does Linux allow
| non-ascii file names?

You can use "os.path.supports_unicode_filenames" to check...
HTH

Vincent Wehren

| If so, what encoding should be specified when
| converting to Unicode? Thanks.

Probably the default encoding, on Linux.

Jul 18 '05 #2
In article <bn*********@news4.tilbu1.nb.home.nl>,
"vincent wehren" <vi*****@visualtrans.de> wrote:
> | Are there any situations where some other encoding should be used instead on
> | Windows? What about other platforms? For instance, does Linux allow
> | non-ascii file names?

> You can use "os.path.supports_unicode_filenames" to check...


Actually, you can't, see:

http://python.org/sf/767645

The only two platforms that currently support unicode filenames properly
are Windows NT/XP and MacOSX, and for one of them
os.path.supports_unicode_filenames returns False :(
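
Here is a minimal sketch of querying the flag Vincent mentions, together with the filesystem encoding. It assumes modern Python 3, where both `os.path.supports_unicode_filenames` and `sys.getfilesystemencoding()` still exist, not the Python 2.3 discussed here. Given the bug report above, the sketch only reports what the interpreter claims, and that claim is a hint rather than proof. The function name `describe_filename_support` is hypothetical.

```python
import os.path
import sys


def describe_filename_support():
    """Report what this interpreter claims about Unicode file names.

    Hypothetical helper. The flag has been unreliable on some platforms,
    so treat the result as a hint, not a guarantee.
    """
    return {
        "supports_unicode_filenames": os.path.supports_unicode_filenames,
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_filename_support().items():
        print(f"{key}: {value}")
```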

Just
Jul 18 '05 #3
"Edward K. Ream" <ed*******@charter.net> writes:
> Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
> be converted to Unicode using the mbcs encoding?

What do you mean by "should"? "Should Python always..." or "Should
the application always"?

PEP 277 actually answers neither question. As Vincent explains,
nothing changes with respect to using byte strings in the API. The
changes only affect Unicode strings passed to functions expecting file names.

> For example,
>
> myFile = unicode(__file__, "mbcs", "strict")
>
> This seems to work

And it has nothing to do with PEP 277: you are not passing myFile to
any API function.

If you mean to use myFile as a file name, then yes: this is intended
to work. However, using plain __file__ directly should also work.

> Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
> done before passing arguments to os.path.join, os.path.split,
> os.path.normpath, etc.?

You should either use only Unicode strings, or only byte strings. The
functions of os.path are not all affected by the PEP 277
implementation (although they probably should).
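
Modern Python 3 enforces the rule Martin states. `os.path.join()` raises `TypeError` when `str` and `bytes` components are mixed. The sketch below keeps everything as text by passing each component through `os.fsdecode()`, which returns `str` unchanged and decodes `bytes` with the filesystem encoding. It assumes Python 3, and the helper name `join_as_text` is hypothetical.

```python
import os
import os.path


def join_as_text(*parts):
    """Join path components after converting any bytes components to str.

    Hypothetical helper. os.fsdecode() leaves str untouched and decodes
    bytes with the filesystem encoding, so every component is the same type.
    """
    return os.path.join(*(os.fsdecode(part) for part in parts))


if __name__ == "__main__":
    try:
        os.path.join("docs", b"readme.txt")  # mixing str and bytes
    except TypeError as exc:
        print("mixed types rejected:", exc)
    print(join_as_text("docs", b"readme.txt"))
```

The opposite choice is to keep all paths as `bytes` with `os.fsencode()`. Either way works, as long as one application sticks to one type.
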

> Presumably os.path functions use the default
> system encoding to convert strings to Unicode, which isn't likely to be
> "mbcs" or anything else useful :-)

Right. This is actually unfortunate.

> Are there any situations where some other encoding should be used instead on
> Windows?

If you get data from a cmd.exe window (the console uses the OEM code
page, not the ANSI code page that "mbcs" maps to).

> What about other platforms? For instance, does Linux allow non-ascii
> file names?

Yes, it does.

> If so, what encoding should be specified when converting to Unicode?

Nobody knows, but the convention is to use the locale's encoding, as
returned by locale.getpreferredencoding().
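
Here is a minimal sketch of that convention in modern Python 3: decode a raw byte-string file name with the locale's preferred encoding. The helper name `decode_filename` is hypothetical. Because the encoding is only a convention, the sketch uses the `"surrogateescape"` error handler, which is an assumption of this sketch and not part of the advice above. With it, undecodable bytes become lone surrogates instead of raising, and encoding with the same codec and handler restores the original bytes. `getpreferredencoding(False)` asks for the encoding without calling `setlocale()` again.

```python
import locale


def decode_filename(raw):
    """Decode a byte-string file name using the locale's preferred encoding.

    Hypothetical helper. "surrogateescape" keeps undecodable bytes
    recoverable instead of raising UnicodeDecodeError.
    """
    encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, "surrogateescape")


if __name__ == "__main__":
    print(repr(decode_filename(b"notes.txt")))
```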

Regards,
Martin
Jul 18 '05 #4
Many thanks, Martin, for these comments. They are so helpful...

> You should either use only Unicode strings, or only byte strings. The
> functions of os.path are not all affected by the PEP 277
> implementation (although they probably should).


My working assumption is that all strings in my app must be Unicode strings.
For example, the crashes happening right now trying to support Unicode
filenames occur when a string is converted to Unicode in situations like:

if fileName1 == fileName2:

where one fileName is a unicode string and the other isn't yet. That's why
I wanted to do:

myFile = unicode(__file__, "mbcs", "strict")

The challenge in my app is to make sure the proper encoding is used in the
more than 30 situations where a filename gets created somehow. Naturally,
that's not your problem, nor PEP 277's problem either :-)

> > If so, what encoding should be specified when converting to Unicode?
>
> Nobody knows, but the convention is to use the locale's encoding, as
> returned by locale.getpreferredencoding().


Thanks for this.

Edward
Jul 18 '05 #5
"Edward K. Ream" <ed*******@charter.net> writes:
> if fileName1 == fileName2:
>
> where one fileName is a unicode string and the other isn't yet. That's why
> I wanted to do:
>
> myFile = unicode(__file__, "mbcs", "strict")


Ah, I see. Instead of "mbcs", you should use
sys.getfilesystemencoding(). This is what Python will use when
converting the Unicode strings back to byte strings before passing
them to the system (in case it converts back at all, which it doesn't
on Windows thanks to PEP 277).
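
Martin's advice can be sketched in modern Python 3. There, a `bytes`/`str` comparison such as `fileName1 == fileName2` does not crash; it silently evaluates to `False`. The helper names `to_text`, `normalized` and `same_filename` are hypothetical. `to_text` decodes with `sys.getfilesystemencoding()` plus `sys.getfilesystemencodeerrors()`, which is what `os.fsdecode()` does. Normalizing with `normpath()`/`normcase()` is an extra assumption beyond the thread; drop it if exact string equality is what the application needs.

```python
import os.path
import sys


def to_text(name):
    """Return name as str, decoding bytes the way os.fsdecode() does."""
    if isinstance(name, bytes):
        return name.decode(sys.getfilesystemencoding(),
                           sys.getfilesystemencodeerrors())
    return name


def normalized(name):
    """Normalize a file name for comparison.

    normpath() collapses redundant separators and "." components;
    normcase() lower-cases and converts slashes on Windows and is a
    no-op on POSIX.
    """
    return os.path.normcase(os.path.normpath(to_text(name)))


def same_filename(a, b):
    """Compare two file names that may be any mix of bytes and str."""
    return normalized(a) == normalized(b)


if __name__ == "__main__":
    print(b"docs/readme.txt" == "docs/readme.txt")  # False: bytes vs str
    print(same_filename(b"docs/readme.txt", "docs//readme.txt"))  # True
```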

Regards,
Martin
Jul 18 '05 #6
