[unicode] inconvenient unicode conversion of non-string arguments

Holger Joukl

Hi there,

I consider the behaviour of unicode() inconvenient with respect to the
conversion of non-string arguments.
While you can do:

>>> unicode(17.3)
u'17.3'

you cannot do:

>>> unicode(17.3, 'ISO-8859-1', 'replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, float found

This is somewhat annoying when you want to convert a mixed-type argument
list to unicode strings, e.g. for a logging system (that's where it bit me),
and want to make sure that any raw string arguments are also converted to
unicode without errors (albeit by force).
This is all the more so because it is a performance-critical part of my
application, so I would really rather not wrap unicode() in some custom
tounicode() function that handles such cases by distinguishing argument
types.

Any reason why unicode() with a non-string argument should not allow the
encoding and errors arguments?
Or some good solution to work around my problem?

(Currently running on python 2.4.3)

Regards,
Holger

Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene
Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde,
verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail
sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht
gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht
garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte
den Inhalt der E-Mail als Hardcopy an.

The contents of this e-mail are confidential. If you are not the named
addressee or if this transmission has been addressed to you in error,
please notify the sender immediately and then delete this e-mail. Any
unauthorized copying and transmission is forbidden. E-Mail transmission
cannot be guaranteed to be secure. If verification is required, please
request a hard copy version.

Dec 13 '06 #1


Leo Kislov

Holger Joukl wrote:

> Hi there,
>
> I consider the behaviour of unicode() inconvenient with respect to the
> conversion of non-string arguments.
> While you can do:
>
> >>> unicode(17.3)
> u'17.3'
>
> you cannot do:
>
> >>> unicode(17.3, 'ISO-8859-1', 'replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, float found
>
> This is somewhat annoying when you want to convert a mixed-type argument
> list to unicode strings, e.g. for a logging system (that's where it bit
> me), and want to make sure that any raw string arguments are also
> converted to unicode without errors (albeit by force).
> This is all the more so because it is a performance-critical part of my
> application, so I would really rather not wrap unicode() in some custom
> tounicode() function that handles such cases by distinguishing argument
> types.
>
> Any reason why unicode() with a non-string argument should not allow the
> encoding and errors arguments?

There is a reason: encoding is a property of bytes; it is not applicable
to other objects.

> Or some good solution to work around my problem?

Do not put undecoded bytes in a mixed-type argument list. A rule of
thumb when working with unicode: decode as soon as possible, encode as
late as possible.

-- Leo

Dec 13 '06 #2
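
For readers who do need the per-type distinction the original post hoped to avoid, a minimal sketch follows. It is written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of the raw string; the names `to_text` and `format_log_args` are hypothetical and not from the thread. It decodes only byte strings, since encoding and errors apply to nothing else, and passes every other object through plain `str()`.

```python
# Python 3 sketch: `str` stands in for Python 2's `unicode`,
# and `bytes` for the raw (undecoded) Python 2 `str`.

def to_text(value, encoding="iso-8859-1", errors="replace"):
    """Hypothetical helper: return `value` as text.

    Only byte strings are decoded; encoding/errors do not apply to
    anything else, so other objects go through plain str().
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding, errors)
    return str(value)


def format_log_args(args, encoding="iso-8859-1", errors="replace"):
    """Convert a mixed-type argument list to a list of text strings."""
    return [to_text(arg, encoding, errors) for arg in args]


# Mixed arguments as a logging call might receive them.
mixed = [17.3, b"caf\xe9", "already text", None]
print(format_log_args(mixed))
```

Whether the `isinstance` checks matter in a hot logging path is something to measure rather than assume either way.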

Fredrik Lundh

Holger Joukl wrote:

> Ok, but I still don't see why these arguments shouldn't simply be silently
> ignored

>>> import this

</F>

Dec 13 '06 #3
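
`import this` prints the Zen of Python, which includes the line "Errors should never pass silently." Assuming that is the point being made, the sketch below shows what would be lost if the extra arguments were ignored: for byte strings, a misspelled codec or error-handler name is reported rather than swallowed. It uses Python 3's `bytes.decode` as a stand-in for `unicode(s, encoding, errors)`; `try_decode` and the bogus names are made up for illustration.

```python
# Python 3 sketch; bytes.decode stands in for Python 2's
# unicode(s, encoding, errors).

def try_decode(raw, encoding, errors="strict"):
    """Hypothetical helper: return the decoded text, or the name of
    the exception class if decoding fails."""
    try:
        return raw.decode(encoding, errors)
    except (LookupError, UnicodeDecodeError) as exc:
        return type(exc).__name__


raw = b"caf\xe9"
print(try_decode(raw, "iso-8859-1"))                # known codec: decodes
print(try_decode(raw, "no-such-codec"))             # unknown codec name
print(try_decode(raw, "ascii", "no-such-handler"))  # unknown error handler
print(try_decode(raw, "ascii"))                     # strict decoding fails
```

Silently ignoring the arguments for non-string inputs would mean the same typo is caught for some values and hidden for others.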

Leo Kislov

Holger Joukl wrote:

> py**************************************@python.org wrote on 13.12.2006
> 11:02:30:
>
> > Holger Joukl wrote:
> > > Hi there,
> > >
> > > I consider the behaviour of unicode() inconvenient with respect to the
> > > conversion of non-string arguments.
> > > While you can do:
> > >
> > > >>> unicode(17.3)
> > > u'17.3'
> > >
> > > you cannot do:
> > >
> > > >>> unicode(17.3, 'ISO-8859-1', 'replace')
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in ?
> > > TypeError: coercing to Unicode: need string or buffer, float found
> > > [...]
> > > Any reason why unicode() with a non-string argument should not allow
> > > the encoding and errors arguments?
> >
> > There is a reason: encoding is a property of bytes; it is not applicable
> > to other objects.
>
> Ok, but I still don't see why these arguments shouldn't simply be silently
> ignored for non-string arguments.

That's a rather bizarre and sloppy approach. Should

unicode(17.3, 'just-having-fun', 'I-do-not-like-errors')
unicode(17.3, 'sdlfkj', 'ewrlkj', 'eoirj', 'sdflkj')

work?

> > > Or some good solution to work around my problem?
> >
> > Do not put undecoded bytes in a mixed-type argument list. A rule of
> > thumb working with unicode: decode as soon as possible, encode as late
> > as possible.
>
> It's not always that easy when you deal with a tree data structure with the
> tree elements containing different data types and your user may decide to
> output root.element.subelement.whateverData.
> I have the problems in a logging mechanism, and it would vanish if
> unicode(<non-string>, encoding, errors) would work and just ignore the
> obsolete arguments.

I don't really see from your example what stops you from putting
unicode instead of bytes into your tree, but I can believe some
libraries can cause some extra work. That's a problem with the libraries,
not with the builtin function unicode(). Would you be happy if the
floating-point value 17.3 were stored as 8 raw bytes in your tree? After
all, that is how 17.3 is actually represented in computer memory. It's the
same story with unicode: if some library gives you raw bytes, *you* have to
do extra work later.

-- Leo

Dec 13 '06 #4
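
One way to read the advice about putting unicode rather than bytes into the tree is sketched below, in Python 3 terms (`bytes` for raw strings, `str` for unicode). The `Node` class, its `add` method, and the ISO-8859-1 default are assumptions for illustration; the thread does not show the actual tree. Raw bytes are decoded once, when a value enters the tree, so output code never needs encoding arguments.

```python
# Python 3 sketch of "decode as soon as possible": bytes are decoded once,
# when data enters the tree, so stored values are text or non-string objects.

class Node:
    """Hypothetical tree element holding one value and named children."""

    def __init__(self, value=None, encoding="iso-8859-1"):
        # Decode raw bytes at the boundary; leave other types untouched.
        if isinstance(value, (bytes, bytearray)):
            value = value.decode(encoding, "replace")
        self.value = value
        self.children = {}

    def add(self, name, value=None, encoding="iso-8859-1"):
        """Create a child node, register it under `name`, and return it."""
        child = Node(value, encoding)
        self.children[name] = child
        return child


root = Node()
element = root.add("element")
element.add("price", 17.3)
element.add("label", b"caf\xe9")  # raw bytes, e.g. handed over by a library

# Logging code can now call str() on any value without encoding arguments.
for name, child in element.children.items():
    print(name, "=", str(child.value))
```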

Marc 'BlackJack' Rintsch

In <ma***************************************@python.org>, Holger Joukl
wrote:

> Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene
> Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde,
> verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail
> sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht
> gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht
> garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte
> den Inhalt der E-Mail als Hardcopy an.
>
> The contents of this e-mail are confidential. If you are not the named
> addressee or if this transmission has been addressed to you in error,
> please notify the sender immediately and then delete this e-mail. Any
> unauthorized copying and transmission is forbidden. E-Mail transmission
> cannot be guaranteed to be secure. If verification is required, please
> request a hard copy version.

Maybe you should rethink if it really makes sense to add this huge block
of "nonsense" to a post to a newsgroup or public mailing list. If it's
confidential, just keep it secret. ;-)

Ciao,
Marc 'BlackJack' Rintsch

Dec 13 '06 #5

Ben Finney

"Marc 'BlackJack' Rintsch" <bj****@gmx.net> writes:

> In <ma***************************************@python.org>, Holger Joukl
> wrote:
> > [a meaningless disclaimer text at the bottom of every message]
>
> Maybe you should rethink if it really makes sense to add this huge
> block of "nonsense" to a post to a newsgroup or public mailing list.
> If it's confidential, just keep it secret. ;-)

In all likelihood, the OP isn't choosing specifically to attach it;
these things are often done to *every* outgoing message at an
organisational level by people who don't think the issue through very
well.

<URL:http://goldmark.org/jeff/stupid-disclaimers/>

Please, those with such badly-configured systems, discuss the issue of
public discussion forums with the boneheads who think these disclaimer
texts are a good idea and at least try to change that behaviour.

Alternatively, post from some other mail system that doesn't slap
these obnoxious blocks onto your messages.

--
 \     "I wish there was a knob on the TV to turn up the intelligence. |
  `\    There's a knob called 'brightness' but it doesn't work."    -- |
_o__)                                              Eugene P. Gallagher |
Ben Finney

Dec 14 '06 #6
