PEP: Generalised String Coercion

Neil Schemenauer

The title is perhaps a little too grandiose but it's the best I
could think of. The change is really not large. Personally, I
would be happy enough if only %s was changed and the built-in was
not added. Please comment.

Neil

PEP: 349
Title: Generalised String Coercion
Version: $Revision: 1.2 $
Last-Modified: $Date: 2005/08/06 04:05:48 $
Author: Neil Schemenauer <na*@arctrix.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 02-Aug-2005
Post-History: 06-Aug-2005
Python-Version: 2.5

Abstract

This PEP proposes the introduction of a new built-in function,
text(), that provides a way of generating a string representation
of an object without forcing the result to be a particular string
type. In addition, the behavior of the %s format specifier would be
changed to call text() on the argument. These two changes would
make it easier to write library code that can be used by
applications that use only the str type and by others that also
use the unicode type.

Rationale

Python has had a Unicode string type for some time now but use of
it is not yet widespread. There is a large amount of Python code
that assumes that string data is represented as str instances.
The long term plan for Python is to phase out the str type and use
unicode for all string data. Clearly, a smooth migration path
must be provided.

We need to upgrade existing libraries, written for str instances,
so that they can operate in an all-unicode string world. We can't
change to an all-unicode world until all essential libraries are
capable of it. Upgrading the libraries in one shot does not seem
feasible. A more realistic strategy is to make the libraries
capable of operating on unicode strings one at a time, while
preserving their current behaviour in an all-str environment.

First, we need to be able to write code that can accept unicode
instances without attempting to coerce them to str instances. Let
us label such code as Unicode-safe. Unicode-safe libraries can be
used in an all-unicode world.

Second, we need to be able to write code that, when provided only
str instances, will not create unicode results. Let us label such
code as str-stable. Libraries that are str-stable can be used by
libraries and applications that are not yet Unicode-safe.

Sometimes it is simple to write code that is both str-stable and
Unicode-safe. For example, the following function just works:

    def appendx(s):
        return s + 'x'

That's not too surprising since the unicode type is designed to
make the task easier. The principle is that when str and unicode
instances meet, the result is a unicode instance. One notable
difficulty arises when code requires a string representation of an
object, an operation traditionally accomplished by using the str()
built-in function.

Using str() makes the code not Unicode-safe. Replacing a str()
call with a unicode() call makes the code not str-stable. Using a
string format almost accomplishes the goal but not quite.
Consider the following code:

    def text(obj):
        return '%s' % obj

It behaves as desired except if 'obj' is not a basestring instance
and needs to return a Unicode representation of itself. In that
case, the string format will attempt to coerce the result of
__str__ to a str instance. Defining a __unicode__ method does not
help since it will only be called if the format string (the
left-hand operand) is a unicode instance. Using a unicode instance
for the format string does not work because the function is no
longer str-stable
(i.e. it will coerce everything to unicode).

Specification

A Python implementation of the text() built-in follows:

    def text(s):
        """Return a nice string representation of the object. The
        return value is a basestring instance.
        """
        if isinstance(s, basestring):
            return s
        r = s.__str__()
        if not isinstance(r, basestring):
            raise TypeError('__str__ returned non-string')
        return r

Note that it is currently possible, although not very useful, to
write __str__ methods that return unicode instances.
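As a rough illustration of how the text() function above would behave, here is a self-contained sketch. The string_types shim and the City and Broken classes are assumptions made for this example and are not part of the PEP; the shim exists only so that the snippet also runs on interpreters that have no basestring.

```python
# Sketch only: string_types, City and Broken are hypothetical names
# introduced for this illustration, not part of the PEP.
try:
    string_types = basestring      # Python 2: covers str and unicode
except NameError:
    string_types = str             # assumption: fallback where basestring is gone


def text(s):
    """Return a string representation without forcing a string type."""
    if isinstance(s, string_types):
        return s                   # strings pass through unchanged
    r = s.__str__()
    if not isinstance(r, string_types):
        raise TypeError('__str__ returned non-string')
    return r


class City(object):
    def __str__(self):
        return u'D\xfcsseldorf'    # a non-ASCII representation


class Broken(object):
    def __str__(self):
        return 42                  # not a string at all


name = u'D\xfcsseldorf'
same = text(name)                  # the very same object comes back
city = text(City())                # the non-ASCII result is returned as is
number = text(42)                  # ordinary objects still work

try:
    text(Broken())
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)        # '__str__ returned non-string'
```

The point of the sketch is that text() never picks a string type itself: whatever string the object hands back is returned unchanged, and only a non-string result is rejected.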

The %s format specifier for str objects would be changed to call
text() on the argument. Currently it calls str() unless the
argument is a unicode instance (in which case the object is
substituted as is and the % operation returns a unicode instance).

The following function would be added to the C API and would be the
equivalent of the text() function:

PyObject *PyObject_Text(PyObject *o);

A reference implementation is available on Sourceforge [1] as a
patch.

Backwards Compatibility

The change to the %s format specifier would result in some %
operations returning a unicode instance rather than raising a
UnicodeDecodeError exception. It seems unlikely that the change
would break currently working code.

Alternative Solutions

Rather than adding the text() built-in, if PEP 246 were
implemented then adapt(s, basestring) could be equivalent to
text(s). The advantage would be one less built-in function. The
problem is that PEP 246 is not implemented.

Fredrik Lundh has suggested [2] that perhaps a new slot should be
added (e.g. __text__), that could return any kind of string that's
compatible with Python's text model. That seems like an
attractive idea but many details would still need to be worked
out.

Instead of providing the text() built-in, the %s format specifier
could be changed and a string format could be used instead of
calling text(). However, it seems like the operation is important
enough to justify a built-in.

Instead of providing the text() built-in, the basestring type
could be changed to provide the same functionality. That would
possibly be confusing behaviour for an abstract base type.

Some people have suggested [3] that an easier migration path would
be to change the default encoding to be UTF-8. Code that is not
Unicode-safe would then encode Unicode strings as UTF-8 and
operate on them as str instances, rather than raising a
UnicodeDecodeError exception. Other code would assume that str
instances were encoded using UTF-8 and decode them if necessary.
While that solution may work for some applications, it seems
unsuitable as a general solution. For example, some applications
get string data from many different sources and assuming that all
str instances were encoded using UTF-8 could easily introduce
subtle bugs.
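A small sketch of the kind of bug meant here, written with explicit byte strings so that the encodings are visible. The sample strings and variable names are assumptions chosen for illustration.

```python
# Sketch: what happens when Latin-1 data is assumed to be UTF-8.
name = u'D\xfcsseldorf'
as_latin1 = name.encode('latin-1')      # bytes from a Latin-1 source

# Often the wrong assumption fails loudly ...
try:
    as_latin1.decode('utf-8')
    latin1_as_utf8_failed = False
except UnicodeDecodeError:
    latin1_as_utf8_failed = True

# ... but some Latin-1 byte sequences are also valid UTF-8, and then
# the wrong assumption silently produces different text.
latin1_text = u'\xc3\xbc'               # two characters in Latin-1
latin1_bytes = latin1_text.encode('latin-1')
misread = latin1_bytes.decode('utf-8')  # one character, u'\xfc'
```

The second case is the subtle one: no exception is raised, and the data is simply wrong from that point on.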

References

[1] http://www.python.org/sf/1159501
[2] http://mail.python.org/pipermail/pyt...er/048755.html
[3] http://blog.ianbicking.org/illusive-...tencoding.html

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

Aug 6 '05 #1


wolf

hi,

i guess that anyone reading this pep will agree that
*something* must be done to the state of unicode affairs
in python. there are zillions of modules out there that
have str() scattered all over the place, and they all
*break* on the first mention of düsseldorf...

i'm not quite sure myself how to evolve python to make
it grow from unicode-enabled to unicode-perfect, so for
me some discussion would be a good thing. only two
micro-remarks to the pep as it stands:

1) i dislike the naming of the function ``text()`` --
i've been using the word 'text' for a long time to mean
'some appropriate representation of character data',
i.e. mostly something that would pass ::

    assert isinstance(x,basestring)

i feel this is a fairly common way of defining the term,
so to me a function ``text(x)`` should really

* return its argument unaltered if it passes
  ``isinstance(x,basestring)``,

* try to return specifically a unicode object (by using
  the ``x.__unicode__()`` method, where available)

* or return an 8bit-string (from ``x.__repr__()`` or
  ``x.__str__()``)

as discussed later on in the pep, it is conceivable to
assign the functionality of the ``text()`` function of
the pep to ``basestring`` -- that would make perfect
sense to me (not sure whether that stands scrutiny in
the big picture, tho).

2) really minor: somewhere near the beginning it says ::

    def text(obj):
        return '%s' % obj

and the claim is that this "behaves as desired" except
for unicode-issues, which is incorrect. the second line
must read ::

    return '%s' % ( obj, )

or else it will fail if ``obj`` is a tuple that is not
of length one.
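The tuple problem is easy to reproduce. A minimal sketch, in which text_naive, text_wrapped and fails are names made up for this illustration:

```python
def text_naive(obj):
    return '%s' % obj        # a tuple is unpacked as the argument list


def text_wrapped(obj):
    return '%s' % (obj,)     # one-element tuple: obj is always the sole argument


def fails(func, obj):
    """Return True if func(obj) raises TypeError."""
    try:
        func(obj)
    except TypeError:
        return True
    return False


pair_fails = fails(text_naive, (1, 2))   # too many arguments for one %s
empty_fails = fails(text_naive, ())      # not enough arguments
single = text_naive((1,))                # no error, but the tuple is unwrapped: '1'
wrapped_pair = text_wrapped((1, 2))      # '(1, 2)'
wrapped_single = text_wrapped((1,))      # '(1,)'
```

Note that a tuple of length one does not fail with the naive version, but it is formatted as its single element rather than as a tuple, which is arguably just as wrong.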

cheers,

_wolf

Aug 22 '05 #2
