Revised PEP 349: Allow str() to return unicode strings

[Please mail followups to py********@python.org.]

The PEP has been rewritten based on a suggestion by Guido to change
str() rather than adding a new built-in function. Based on my
testing, I believe the idea is feasible. It would be helpful if
people could test the patched Python with their own applications and
report any incompatibilities.

PEP: 349
Title: Allow str() to return unicode strings
Version: $Revision: 1.3 $
Last-Modified: $Date: 2005/08/22 21:12:08 $
Author: Neil Schemenauer <na*@arctrix.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 02-Aug-2005
Post-History: 06-Aug-2005
Python-Version: 2.5

Abstract

This PEP proposes to change the str() built-in function so that it
can return unicode strings. This change would make it easier to
write code that works with either string type and would also make
some existing code handle unicode strings. The C function
PyObject_Str() would remain unchanged and the function
PyString_New() would be added instead.

Rationale

Python has had a Unicode string type for some time now but use of
it is not yet widespread. There is a large amount of Python code
that assumes that string data is represented as str instances.
The long term plan for Python is to phase out the str type and use
unicode for all string data. Clearly, a smooth migration path
must be provided.

Existing libraries, written for str instances, need to be upgraded
so that they can operate in an all-unicode string world. We can't
change to an all-unicode world until all essential libraries
support it. Upgrading the libraries in one shot does not seem
feasible. A more realistic strategy is to make the libraries
capable of operating on unicode strings one at a time, while
preserving their current behaviour in an all-str environment.

First, we need to be able to write code that can accept unicode
instances without attempting to coerce them to str instances. Let
us label such code as Unicode-safe. Unicode-safe libraries can be
used in an all-unicode world.

Second, we need to be able to write code that, when provided only
str instances, will not create unicode results. Let us label such
code as str-stable. Libraries that are str-stable can be used by
libraries and applications that are not yet Unicode-safe.

Sometimes it is simple to write code that is both str-stable and
Unicode-safe. For example, the following function just works:

    def appendx(s):
        return s + 'x'

That's not too surprising since the unicode type is designed to
make the task easier. The principle is that when str and unicode
instances meet, the result is a unicode instance. One notable
difficulty arises when code requires a string representation of an
object, an operation traditionally accomplished by using the str()
built-in function.
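
As a quick check of both properties, the sketch below calls appendx() with each string type and confirms that the result has the same type as the input. The u'' literals are used so that the snippet also runs on Python 3, where str is already the Unicode type and the two cases coincide; that portability is an assumption of this example, not something the PEP discusses.

```python
def appendx(s):
    # Concatenation with a str literal: a str input yields a str, and a
    # unicode input yields unicode (the str operand is promoted).
    return s + 'x'


# str-stable: a str input produces a str result.
plain = appendx('abc')
assert type(plain) is type('')

# Unicode-safe: a unicode input is accepted and the result stays unicode.
wide = appendx(u'abc')
assert type(wide) is type(u'')
assert wide == u'abcx'
```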

Using the current str() function makes the code not Unicode-safe.
Replacing a str() call with a unicode() call makes the code not
str-stable. Changing str() so that it could return unicode
instances would solve this problem. As a further benefit, some code
that is currently not Unicode-safe because it uses str() would
become Unicode-safe.

Specification

A Python implementation of the str() built-in follows:

    def str(s):
        """Return a nice string representation of the object.  The
        return value is a str or unicode instance.
        """
        if type(s) is str or type(s) is unicode:
            return s
        r = s.__str__()
        if not isinstance(r, (str, unicode)):
            raise TypeError('__str__ returned non-string')
        return r
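
To see how the specification behaves from the caller's side, here is a small sketch that reuses the same logic under the made-up name pep349_str (so the real built-in is not shadowed). The classes Celsius and Broken are hypothetical, and the string_types shim is only there so the snippet also runs on Python 3, where the unicode name does not exist.

```python
try:
    string_types = (str, unicode)   # Python 2: two string types
except NameError:
    string_types = (str,)           # Python 3: str is the only text type


def pep349_str(s):
    """Hypothetical stand-in for the proposed str(); same logic as above."""
    if type(s) in string_types:
        return s                    # exact str/unicode: returned unchanged
    r = s.__str__()
    if not isinstance(r, string_types):
        raise TypeError('__str__ returned non-string')
    return r


class Celsius(object):
    """Illustrative class whose __str__ returns a unicode string."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return u'%d\u00b0C' % self.degrees


class Broken(object):
    """Illustrative class whose __str__ wrongly returns a non-string."""
    def __str__(self):
        return 42


text = u'caf\u00e9'
assert pep349_str(text) is text                  # no coercion, no copy
assert pep349_str(Celsius(21)) == u'21\u00b0C'   # unicode result passes through
assert pep349_str(3) == '3'

try:
    pep349_str(Broken())
except TypeError as exc:
    print('rejected: %s' % exc)
```

The point of interest is the second assertion: the unicode value returned by __str__() reaches the caller without being encoded to a byte string along the way.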

The following function would be added to the C API and would be the
equivalent of the str() built-in (ideally it would be called
PyObject_Str, but changing that function could cause a massive
number of compatibility problems):

    PyObject *PyString_New(PyObject *);

A reference implementation is available on Sourceforge [1] as a
patch.

Backwards Compatibility

Some code may require that str() returns a str instance. In the
standard library, only one such case has been found so far. The
function email.header_decode() requires a str instance and the
email.Header.decode_header() function tries to ensure this by
calling str() on its argument. The code was fixed by changing
the line "header = str(header)" to:

    if isinstance(header, unicode):
        header = header.encode('ascii')

Whether this is truly a bug is questionable since decode_header()
really operates on byte strings, not character strings. Code that
passes it a unicode instance could itself be considered buggy.

Alternative Solutions

A new built-in function could be added instead of changing str().
Doing so would introduce virtually no backwards compatibility
problems. However, since the compatibility problems are expected to
be rare, changing str() seems preferable to adding a new built-in.

The basestring type could be changed to have the proposed behaviour,
rather than changing str(). However, that would be confusing
behaviour for an abstract base type.

References

[1] http://www.python.org/sf/1266570

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

Aug 22 '05 #1
neil,

i just intended to worry that returning a unicode object from ``str()``
would break assumptions about the way that 'type definers' like
``str()``, ``int()``, ``float()`` and so on work, but i quickly
realized that e.g. ``int()`` does return a long where appropriate!
since the principle works there one may surmise it will also work for
``str()`` in the long run.

one point i don't seem to understand right now is why it says in the
function definition::

    if type(s) is str or type(s) is unicode:
        ...

instead of using ``isinstance()``.

Testing for ``type()`` means that instances of derived classes (which
may change nothing, or almost nothing, about the underlying class)
will behave differently when passed to a function that uses ``str()``!

isn't it more realistic and commonplace to assume that derivatives of a
class do fulfill the requirements of the underlying class? -- which may
turn out to be wrong! but still...

the code as it stands means i have to remember that *in this special
case only* (when deriving from ``unicode``), i have to add a
``__str__()`` method myself that simply returns ``self``.

then of course, one could change ``unicode.__str__()`` to return
``self``, itself, which should work. but then, why so complicated?

i suggest to change said line to::

    if isinstance( s, ( str, unicode ) ):
        ...
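
For comparison, here is a minimal sketch of one practical difference between the two checks. It is limited to str so that it runs on Python 2 and 3 alike, and the names str_exact, str_isinstance and Shout are made up for illustration. With the exact-type check, a subclass can still customise its string form through ``__str__()``; with ``isinstance()``, the instance is returned untouched and the override is bypassed. Whether that is the reason for the check in the PEP is not stated in this thread.

```python
def str_exact(s):
    # Check used in the PEP: only exact str instances are returned as-is.
    if type(s) is str:
        return s
    return s.__str__()


def str_isinstance(s):
    # Suggested alternative: subclass instances are returned as-is too.
    if isinstance(s, str):
        return s
    return s.__str__()


class Shout(str):
    """Illustrative subclass that customises its string form."""
    def __str__(self):
        return self.upper()


word = Shout('hi')
assert str_exact(word) == 'HI'        # the __str__ override is honoured
assert str_isinstance(word) is word   # the override is never called
```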

any objections?

_wolf

Aug 23 '05 #2
Neil Schemenauer <na*@arctrix.com> writes on Mon, 22 Aug 2005 15:31:42 -0600:
...
Some code may require that str() returns a str instance. In the
standard library, only one such case has been found so far. The
function email.header_decode() requires a str instance and the
email.Header.decode_header() function tries to ensure this by
calling str() on its argument. The code was fixed by changing
the line "header = str(header)" to:

    if isinstance(header, unicode):
        header = header.encode('ascii')

Note that this is not equivalent to the old "str(header)":

"str(header)" used Python's "default encoding" while the
new code uses 'ascii'.

The new code might be more correct than the old code was.
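
The difference only shows up for non-ASCII text. Python 2's default encoding is normally 'ascii', but a site configuration can change it, and in that case "str(header)" and "header.encode('ascii')" no longer agree. A small sketch, using 'latin-1' as a stand-in for a changed default encoding (an assumption made purely for illustration):

```python
header = u'caf\u00e9'   # illustrative header text with one non-ASCII character

# Encoding with a codec that covers the character succeeds...
assert header.encode('latin-1') == b'caf\xe9'

# ...while the explicit 'ascii' codec rejects it.
try:
    header.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

assert ascii_error is not None

# Pure-ASCII text encodes identically under both codecs.
assert u'Subject'.encode('ascii') == u'Subject'.encode('latin-1')
```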

...
Alternative Solutions

A new built-in function could be added instead of changing str().
Doing so would introduce virtually no backwards compatibility
problems. However, since the compatibility problems are expected to
be rare, changing str() seems preferable to adding a new built-in.


Can we get a new builtin with exactly the same behaviour as
the current "str", to be used when we do require a "str"
(and cannot use a "unicode")?

Dieter
Aug 24 '05 #3
