treating str as unicode in legacy code?

Ben

I'm left with some legacy code using plain old str, and I need to make
sure it works with unicode input/output. I have a simple plan to do
this:

- Run the code with "python -U" so all the string literals become
  unicode literals.
- Add this statement

      str = unicode

  to all .py files so the type comparison (e.g., type('123') == str)
  would work.

Did I miss anything? Does this sound like a workable plan?

Thanks!

Apr 12 '07 #1

Steve Holden

Ben wrote:

> I'm left with some legacy code using plain old str, and I need to make
> sure it works with unicode input/output. I have a simple plan to do
> this:
>
> - Run the code with "python -U" so all the string literals become
>   unicode literals.
> - Add this statement
>
>       str = unicode
>
>   to all .py files so the type comparison (e.g., type('123') == str)
>   would work.
>
> Did I miss anything? Does this sound like a workable plan?
>
> Thanks!

Well, don't forget that the assignment to str *shadows* the built-in
rather than replacing it, so there may be places (imported modules being
the example that most readily springs to mind) where that replacement
won't be effective.

In addition, in CPython the C parts of the code may well create and
expect objects of type str, and they don't go through Python's name
lookup at all, so you will have no way to change that behavior.

These two limitations will probably account for about 95% of any
strangeness you see, but your plan is probably a good first step in the
conversion process.
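
A minimal sketch of the shadowing point, written for Python 3 so it can be run today. Python 3 has no unicode type, so the class TextType, the helper make_module and the module names "patched" and "untouched" are all hypothetical stand-ins, not anything from the original code. It only shows that rebinding str in one module's globals leaves every other module looking at the real built-in.

```python
import types


class TextType(str):
    """Hypothetical stand-in for Python 2's unicode type."""


def make_module(name, source, **names):
    """Build a throwaway module from source text (illustration only)."""
    module = types.ModuleType(name)
    module.__dict__.update(names)
    exec(source, module.__dict__)
    return module


# The same type comparison the original plan wants to keep working.
SOURCE = "def is_text(value):\n    return type(value) == str\n"

# "patched" rebinds str at module level, as the plan proposes.
patched = make_module("patched", "str = TextType\n" + SOURCE, TextType=TextType)

# "untouched" plays the role of an imported module that was not edited.
untouched = make_module("untouched", SOURCE)

value = TextType("123")

# Inside the patched module the comparison matches the new type...
patched_sees_text = patched.is_text(value)
# ...but the untouched module still compares against the built-in str.
untouched_sees_text = untouched.is_text(value)
# A plain string created elsewhere no longer matches in the patched module.
patched_sees_plain = patched.is_text("123")
```

The untouched module fails the comparison for the new type, and the patched module fails it for strings that were built anywhere the rebinding did not reach.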

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Apr 13 '07 #2

John Machin

On Apr 13, 5:57 am, "Ben" <benjamin....@gmail.com> wrote:

> I'm left with some legacy code using plain old str, and I need to make
> sure it works with unicode input/output. I have a simple plan to do
> this:
>
> - Run the code with "python -U" so all the string literals become
>   unicode literals.

Requiring that the code is always run with a non-default argument
doesn't seem very robust/portable to me.

> - Add this statement
>
>       str = unicode
>
>   to all .py files so the type comparison (e.g., type('123') == str)
>   would work.

IMVHO (1) doing that merely changes "legacy code" to "kludged legacy
code" (2) there is no substitute for reading the code and trying to
nut out what it is doing.

Do you mean that those two things are the ONLY changes you plan to
make?

> Did I miss anything? Does this sound like a workable plan?

Do you need to make sure it still works with ASCII input? With input
in some other encoding e.g. cp1252?

What do you mean by "unicode input"? Bear in mind that if you want to
work with Python unicode objects internally, input from a file /
socket / whatever will need to be decoded i.e. you will have to read
the code and make appropriate changes. Data stored in (say) utf_16_le
encoding is not "unicode" in the sense that you need; it still has to
be decoded.
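
A small sketch of what "has to be decoded" means in practice. The helper name decode_input and the sample text are made up for illustration; the codec names are the standard ones. It is written so that it runs on Python 3, where the bytes/text split is explicit, but the same decode call is what the legacy code would need at every input boundary.

```python
def decode_input(raw, encoding):
    """Turn bytes read from a file or socket into a text object."""
    return raw.decode(encoding)


# The same four characters as they might arrive in three encodings.
text = u"caf\xe9"
as_utf16 = text.encode("utf_16_le")
as_cp1252 = text.encode("cp1252")
as_utf8 = text.encode("utf-8")

# Each byte string is different on the wire...
all_different = len({as_utf16, as_cp1252, as_utf8}) == 3

# ...and each one only becomes the intended text with the right codec.
from_utf16 = decode_input(as_utf16, "utf_16_le")
from_cp1252 = decode_input(as_cp1252, "cp1252")
from_utf8 = decode_input(as_utf8, "utf-8")

# Decoding with the wrong codec succeeds silently but yields garbage.
mojibake = decode_input(as_utf8, "cp1252")
```

The last line is the case to worry about: nothing raises, the text is just wrong, which is why the encoding of each input source has to be known rather than guessed.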

What do you mean by "unicode output"? You are going to need to encode
your output.

This doesn't work; the output is not "unicode" in any meaningful
sense:

>>> f = open(u'uout', u'w')

### Warning: you need to hope that all builtins etc that you are
calling cope with unicode arguments as well as the above one does.

>>> f.write(u'abcde\n')
>>> f.close()
>>> open(u'uout', u'rb').read()
'abcde\r\n'

This doesn't work; it crashes.

>>> f = open('uout2', u'w')
>>> f.write(u'abcde\xff\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 5: ordinal not in range(128)
>>>
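
One way to avoid both problems is to name the output encoding explicitly when opening the file. The sketch below uses io.open, which also exists in later Python 2 releases (codecs.open is the older alternative). The helper names, the temporary file and the choice of UTF-8 are assumptions for illustration, and it is written to run on Python 3.

```python
import io
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write text to path, encoding it explicitly on the way out."""
    # newline="" turns off newline translation so the bytes are predictable.
    with io.open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


def ascii_can_encode(text):
    """Report whether the implicit ASCII encoding step would succeed."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def written_bytes(text, encoding="utf-8"):
    """Write text to a temporary file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_text(path, text, encoding)
        with io.open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


sample = u"abcde\xff\n"
fits_in_ascii = ascii_can_encode(sample)
utf8_bytes = written_bytes(sample, "utf-8")
cp1252_bytes = written_bytes(sample, "cp1252")
```

The same text produces different bytes depending on the encoding chosen, so the choice has to be made deliberately for each output file or stream.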

Some object methods work differently with unicode; e.g. (1)
str.translate and unicode.translate.

(2)

>>> 'abc\xA0def'.split()
['abc\xa0def']
>>> u'abc\xA0def'.split()
[u'abc', u'def']
>>> '\xA0'.isspace()
False
>>> u'\xA0'.isspace()
True
>>>
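
For point (1), a short sketch of how the two translate methods differ. It is written for Python 3, where bytes and str behave like Python 2's str and unicode for this purpose; that mapping is an assumption made so the example can be run, and in Python 2 the byte table would come from string.maketrans instead.

```python
# Byte strings: translate() takes a 256-entry lookup table.
byte_table = bytes.maketrans(b"abc", b"xyz")
byte_result = b"aabbcc".translate(byte_table)

# Text strings: translate() takes a mapping keyed by code point.
# Mapping a code point to None deletes that character.
text_table = {ord(u"a"): u"x", ord(u"b"): u"y", ord(u"c"): None}
text_result = u"aabbcc".translate(text_table)

# The whitespace difference shown above, in the same bytes/text terms.
byte_split = b"abc\xa0def".split()
text_split = u"abc\xa0def".split()
```

Any legacy call that builds a translation table for str.translate therefore needs rewriting, not just a type swap, once the data becomes unicode.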

HTH,
John

Apr 14 '07 #3
