Simpler transition to PEP 3000 "Unicode only strings"?

Hi all,

My question is: how do you deal with mixing
Unicode and non-Unicode parts of your application?

Context:
========

PEP 3000 says
"Make all strings be Unicode, and have a separate bytes() type."
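For comparison, here is a minimal sketch of the split PEP 3000 describes, as it ended up in Python 3 (it will not run under the Python 2 interpreters discussed in this thread). `str` holds text, `bytes` holds encoded data, and you cross between them only with an explicit `encode()`/`decode()`. The sample name and the cp1250 code page are illustrative choices.

```python
# Python 3 only: text (str) and binary data (bytes) are separate types.
name = "P\u0159\u00edkryl"         # str: 7 Unicode code points

as_utf8 = name.encode("utf-8")     # bytes: 9 octets, the two accented letters take two each
as_cp1250 = name.encode("cp1250")  # bytes: 7 octets in a legacy Central European code page

# Decoding with the matching codec recovers the same text.
assert as_utf8.decode("utf-8") == name
assert as_cp1250.decode("cp1250") == name

print(type(name).__name__, len(name), type(as_utf8).__name__, len(as_utf8), len(as_cp1250))
# prints: str 7 bytes 9 7
```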

Until then, I am forced to write an encoding declaration
such as

# -*- coding: cp123456 -*-

(see 2.1.4, "Encoding declarations", in the Reference
Manual) and to use u'' literals such as

myString = u'text with funny letters'

This pollutes the source with u prefixes that will be
difficult to remove later.

The idea:
=========

What do you think about the following proposal,
which goes halfway:

If the Python source file is stored in UTF-8 (or
other recognised Unicode file format), then the
encoding declaration must reflect the format or
can be omitted entirely. In such a case, all
simple string literals will be treated as
Unicode string literals.
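Python 3 eventually went further than this proposal: PEP 3120 made UTF-8 the default source encoding, and every unprefixed literal is a Unicode `str`. The sketch below (Python 3 only; `literal_from_source` is a hypothetical helper) compiles the same literal from raw source bytes, once with no declaration and once with a cp1250 declaration, and gets identical text both times.

```python
def literal_from_source(src: bytes, name: str = "s"):
    """Compile and run raw source bytes; return the value bound to `name`."""
    namespace = {}
    # When compile() is given bytes, it honours a PEP 263 coding
    # declaration and falls back to UTF-8 when there is none.
    exec(compile(src, "<sketch>", "exec"), namespace)
    return namespace[name]

text = "P\u0159\u00edkryl"
# No declaration: the file is simply stored as UTF-8.
utf8_src = f"s = '{text}'\n".encode("utf-8")
# Explicit declaration: the file is stored in a legacy code page.
cp1250_src = f"# -*- coding: cp1250 -*-\ns = '{text}'\n".encode("cp1250")

for src in (utf8_src, cp1250_src):
    value = literal_from_source(src)
    print(type(value).__name__, value == text)  # prints: str True
```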

Would this break any existing code?

Thanks for your time and experience,
pepr

--
Petr Prikryl (prikrylp at skil dot cz)
Sep 20 '05 #1


Petr Prikryl wrote:
> Would this break any existing code?

Yes, it would break code which currently contains

# -*- coding: utf-8 -*-

and also contains byte string literals.
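Restated in Python 3 terms, where unprefixed literals are always text: a module that builds binary data has to mark its literals as bytes explicitly, and text does not silently combine with bytes. The record layout, `MAGIC`, and `pack_record` below are hypothetical.

```python
import struct

# Hypothetical binary record: 4-byte tag, little-endian 32-bit length, payload.
MAGIC = b"REC1"  # the b prefix keeps this a byte string in Python 3

def pack_record(payload: bytes) -> bytes:
    """Build one record; every piece must be bytes."""
    return MAGIC + struct.pack("<I", len(payload)) + payload

def text_plus_bytes_fails() -> bool:
    """Return True if concatenating str and bytes raises TypeError."""
    try:
        "REC1" + struct.pack("<I", 0)  # a text literal where bytes were meant
    except TypeError:
        return True
    return False

print(pack_record(b"abc"))       # prints b'REC1\x03\x00\x00\x00abc'
print(text_plus_bytes_fails())   # prints True
```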

Notice that there is an alternative form of the UTF-8
declaration: if the Python file starts with a UTF-8
signature (BOM), then it is automatically treated
as UTF-8, with no explicit coding: declaration
required. Set IDLE's Options/General/Default Source
Encoding to UTF-8 to have IDLE automatically use the
UTF-8 signature when saving files with non-ASCII
characters.
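A minimal Python 3 sketch of how the signature and a coding: declaration are detected, using the standard library's `tokenize.detect_encoding()`. Python 3 reports a BOM-marked file as `utf-8-sig`, falls back to `utf-8` when there is neither a BOM nor a declaration, and rejects a BOM combined with a conflicting declaration with `SyntaxError`. `source_encoding` is a hypothetical wrapper name.

```python
import codecs
import io
import tokenize

def source_encoding(raw: bytes) -> str:
    """Return the encoding Python 3 would use to read source bytes `raw`.

    detect_encoding() checks for a UTF-8 signature (BOM) and for a
    PEP 263 coding declaration in the first two lines.
    """
    encoding, _consumed = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding

samples = {
    "bom": codecs.BOM_UTF8 + b"s = 'abc'\n",
    "cookie": b"# -*- coding: utf-8 -*-\ns = 'abc'\n",
    "neither": b"s = 'abc'\n",
}
for label, raw in samples.items():
    print(label, source_encoding(raw))
# prints: bom utf-8-sig / cookie utf-8 / neither utf-8 (one per line)
```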

As for dropping the u prefix on string literals:
Just try the -U option of the interpreter some time,
which makes all string literals Unicode. If you manage
to get the standard library working this way, you
won't need a per-file decision anymore: just start
your program with 'python -U'.

Regards,
Martin
Sep 20 '05 #2

"Petr Prikryl" <Pr******@skil.cz> writes on Tue, 20 Sep 2005 11:21:59 +0200:
> ...
> The idea:
> =========
>
> What do you think about the following proposal,
> which goes halfway:
>
> If the Python source file is stored in UTF-8 (or
> other recognised Unicode file format), then the
> encoding declaration must reflect the format or
> can be omitted entirely. In such a case, all
> simple string literals will be treated as
> Unicode string literals.
>
> Would this break any existing code?


Yes: modules that construct byte strings (i.e. strings
that should *not* be Unicode strings).

Such a module may nevertheless be stored in UTF-8, so
the proposed rule would turn its byte strings into
Unicode strings.
Sep 21 '05 #4
