Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
be converted to Unicode using the mbcs encoding? For example,
myFile = unicode(__file__, "mbcs", "strict")
This seems to work, and I'm wondering whether there are any other details to
consider.
My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
work as I expect when one of the args is a Unicode string. Everything
before the Unicode string gets thrown away. But this is probably moot: pep
277 implies Python 2.3...
Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
done before passing arguments to os.path.join, os.path.split,
os.path.normpath, etc.? Presumably os.path functions use the default
system encoding to convert strings to Unicode, which isn't likely to be
"mbcs" or anything else useful :-)
Are there any situations where some other encoding should be used instead on
Windows? What about other platforms? For instance, does Linux allow
non-ascii file names? If so, what encoding should be specified when
converting to Unicode? Thanks.
Edward
--------------------------------------------------------------------
Edward K. Ream email: ed*******@charter.net
Leo: Literate Editor with Outlines
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------
"Edward K. Ream" <ed*******@charter.net> wrote in message
news:vp************@corp.supernews.com...
| Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
| be converted to Unicode using the mbcs encoding? For example,
|
| myFile = unicode(__file__, "mbcs", "strict")
No and no. You can *still* use regular byte strings. Python will do the
conversion to Unicode for you using "mbcs" as encoding.
|
| This seems to work, and I'm wondering whether there are any other details to
| consider.
|
| My experiments with Idle for Python 2.2 indicate that os.path.join doesn't
| work as I expect when one of the args is a Unicode string. Everything
| before the Unicode string gets thrown away. But this is probably moot: pep
| 277 implies Python 2.3...
Exactly. Python Unicode file name support has arrived with 2.3.
|
....
|
| Are there any situations where some other encoding should be used instead on
| Windows? What about other platforms? For instance, does Linux allow
| non-ascii file names?
You can use "os.path.supports_unicode_filenames" to check...
| If so, what encoding should be specified when
| converting to Unicode? Thanks.
Probably the default encoding, on Linux.
HTH
Vincent Wehren
In article <bn*********@news4.tilbu1.nb.home.nl>,
"vincent wehren" <vi*****@visualtrans.de> wrote:
> | Are there any situations where some other encoding should be used instead on
> | Windows? What about other platforms? For instance, does Linux allow
> | non-ascii file names?
> You can use "os.path.supports_unicode_filenames" to check...
Actually, you can't, see: http://python.org/sf/767645
The only two platforms that currently support unicode filenames properly
are Windows NT/XP and MacOSX, and for one of them
os.path.supports_unicode_filenames returns False :(
Just
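Given that bug report, one hedged alternative to trusting the flag is to probe the file system directly. The sketch below is written for Python 3 (which postdates this thread; there `str` plays the role of 2.x `unicode`). The helper name `can_create_non_ascii_name` and its default test name are illustrative assumptions, not part of any library.

```python
import os
import tempfile
import unicodedata

def can_create_non_ascii_name(name="caf\u00e9.txt"):
    """Hypothetical helper: check empirically whether `name` survives a
    create/list round trip, instead of relying on
    os.path.supports_unicode_filenames."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            # Creating the file forces Python to encode the name for the OS.
            with open(os.path.join(scratch, name), "w"):
                pass
        except (UnicodeEncodeError, OSError):
            return False
        # Some file systems (e.g. HFS+) store names in decomposed form,
        # so compare NFC-normalized forms rather than the raw strings.
        wanted = unicodedata.normalize("NFC", name)
        return any(unicodedata.normalize("NFC", entry) == wanted
                   for entry in os.listdir(scratch))

print("flag says:", os.path.supports_unicode_filenames)
print("probe says:", can_create_non_ascii_name())
```

The probe only answers for the directory's file system and the current process settings; a different mount or locale can give a different answer.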
"Edward K. Ream" <ed*******@charter.net> writes:
> Am I reading pep 277 correctly? On Windows NT/XP, should filenames always
> be converted to Unicode using the mbcs encoding?
What do you mean by "should"? "Should Python always..." or "Should
the application always"?
PEP 277 actually answers neither question. As Vincent explains,
nothing changes with respect to using byte strings on the API. The
changes only affect Unicode strings passed to functions expecting file names.
> For example,
> myFile = unicode(__file__, "mbcs", "strict")
> This seems to work
And it has nothing to do with PEP 277: You are not passing myFile to
any API function.
If you mean to use myFile as a file name, then yes: this is intended
to work. However, using plain __file__ directly should also work.
> Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
> done before passing arguments to os.path.join, os.path.split,
> os.path.normpath, etc.?
You should either use only Unicode strings, or only byte strings. The
functions of os.path are not all affected by the PEP 277
implementation (although they probably should).
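In Python 3 (a later development than this thread, where `str` is the Unicode type and `bytes` the byte-string type), mixing the two in `os.path.join` is refused outright instead of producing a wrong result. A minimal sketch of "convert at the boundary, then stay with one type"; `join_as_text` is a hypothetical helper name.

```python
import os
import os.path

def join_as_text(*parts):
    """Hypothetical helper: decode any bytes components with os.fsdecode
    (str components pass through unchanged), then join them as text."""
    return os.path.join(*(os.fsdecode(p) for p in parts))

try:
    # Mixing str and bytes components raises TypeError in Python 3.
    os.path.join("logs", b"today.txt")
except TypeError as exc:
    print("refused:", exc)

print(join_as_text("logs", b"today.txt"))
```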
> Presumably os.path functions use the default system encoding to convert
> strings to Unicode, which isn't likely to be "mbcs" or anything else useful :-)
Right. This is actually unfortunate.
> Are there any situations where some other encoding should be used instead on Windows?
If you get data from a cmd.exe window.
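To make the cmd.exe point concrete: the console normally uses an OEM code page, while "mbcs" maps to the ANSI code page, so the same byte decodes differently. This Python 3 sketch assumes a US-English setup (cp437 in the console, cp1252 as the ANSI page); other locales use other pages, and running `chcp` in the window shows the one actually in effect. `decode_console_bytes` is a hypothetical helper.

```python
def decode_console_bytes(raw, oem_codepage="cp437"):
    """Hypothetical helper: decode bytes typed or piped in a cmd.exe window
    using the console's OEM code page (cp437 is an assumption)."""
    return raw.decode(oem_codepage)

raw = b"caf\x82"
as_console = decode_console_bytes(raw)  # "café": 0x82 is e-acute in cp437
as_ansi = raw.decode("cp1252")          # 0x82 is a low-9 quotation mark in cp1252
print(ascii(as_console), ascii(as_ansi))  # prints: 'caf\xe9' 'caf\u201a'
```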
> What about other platforms? For instance, does Linux allow non-ascii file names?
Yes, it does.
> If so, what encoding should be specified when converting to Unicode?
Nobody knows, but the convention is to use the locale's encoding, as
returned by locale.getpreferredencoding().
Regards,
Martin
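A sketch of that Linux convention in Python 3 terms. `decode_with_locale` is a hypothetical helper, and the `errors="replace"` choice is an assumption made here so that a name written under a different locale does not raise; a name decoded this way may contain U+FFFD and can no longer be used to reopen the file.

```python
import locale

def decode_with_locale(raw):
    """Hypothetical helper: decode a byte-string file name with the locale's
    preferred encoding, replacing undecodable bytes with U+FFFD."""
    # do_setlocale=False: query the current locale without changing it.
    encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")

print(locale.getpreferredencoding(False))
print(ascii(decode_with_locale(b"notes.txt")))
```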
Many thanks, Martin, for these comments. They are so helpful...
> You should either use only Unicode strings, or only byte strings. The
> functions of os.path are not all affected by the PEP 277
> implementation (although they probably should).
My working assumption is that all strings in my app must be Unicode strings.
For example, the crashes happening right now trying to support Unicode
filenames occur when a string is converted to Unicode in situations like:
if fileName1 == fileName2:
where one fileName is a unicode string and the other isn't yet. That's why
I wanted to do:
myFile = unicode(__file__, "mbcs", "strict")
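One way to avoid that mixed comparison is to normalize every name to a single type as soon as it enters the program. A Python 3 sketch, assuming names arrive as either `str` or `bytes`; `same_file_name` is a hypothetical helper. In Python 3, `"a" == b"a"` is simply `False`, so the mismatch fails silently rather than crashing, which makes converting up front even more important.

```python
import os
import os.path

def same_file_name(a, b):
    """Hypothetical helper: compare two names that may be str or bytes.

    os.fsdecode turns bytes into str (str passes through unchanged);
    normpath collapses things like 'dir/./x'; normcase folds case and
    slashes on Windows and is a no-op on POSIX."""
    def norm(name):
        return os.path.normcase(os.path.normpath(os.fsdecode(name)))
    return norm(a) == norm(b)

print(same_file_name(b"dir/file.txt", "dir/./file.txt"))  # prints: True
```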
The challenge in my app is to make sure the proper encoding is used in the
more than 30 situations where a filename gets created somehow. Naturally,
that's not your problem, nor PEP 277's problem either :-)
>> If so, what encoding should be specified when converting to Unicode?
> Nobody knows, but the convention is to use the locale's encoding, as
> returned by locale.getpreferredencoding().
Thanks for this.
Edward
"Edward K. Ream" <ed*******@charter.net> writes:
> if fileName1 == fileName2:
> where one fileName is a unicode string and the other isn't yet. That's why
> I wanted to do:
> myFile = unicode(__file__, "mbcs", "strict")
Ah, I see. Instead of "mbcs", you should use
sys.getfilesystemencoding(). This is what Python will use when
converting the Unicode strings back to byte strings before passing
them to the system (in case it converts back at all, which it doesn't
on Windows thanks to PEP 277).
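In Python 3.6 and later, the same advice survives as `os.fsdecode()` / `os.fsencode()`, which combine `sys.getfilesystemencoding()` with the matching `sys.getfilesystemencodeerrors()` handler. The sketch below spells that out explicitly (`decode_filename` is a hypothetical name). On POSIX the handler is `surrogateescape`, so even bytes that are invalid in the file system encoding round-trip back to the original name; on Windows the handler differs, hence the guard.

```python
import os
import sys

def decode_filename(raw):
    """Hypothetical helper: the explicit form of os.fsdecode(raw)."""
    return raw.decode(sys.getfilesystemencoding(),
                      sys.getfilesystemencodeerrors())

name = decode_filename(b"report.txt")
print(sys.getfilesystemencoding(), ascii(name))

if os.name == "posix":
    # surrogateescape keeps undecodable bytes recoverable.
    print(os.fsencode(decode_filename(b"bad\xffname")))  # prints: b'bad\xffname'
```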
Regards,
Martin