Unicode problems, yet again

Ivan Voras

I have a string fetched from database, in iso8859-2, with 8bit
characters, and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

The other end knows it should expect this encoding, so how to send it?

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)

Jul 19 '05 #1

Kent Johnson

Ivan Voras wrote:

I have a string fetched from database, in iso8859-2, with 8bit
characters, and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

The other end knows it should expect this encoding, so how to send it?

I think maybe the string from the database is a unicode string, not 8-bit. What happens if you write
data.encode('iso8859-2') ?
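
A minimal sketch of that suggestion, written for Python 3 (where `str` is the Unicode type and `bytes` is the 8-bit type) rather than the Python 2.4 shown in the traceback. The sample text and the `socket.socketpair()` connection are assumptions standing in for the real database value and network peer.

```python
import socket

ENCODING = "iso8859-2"  # encoding the receiving end expects, per the question

# Hypothetical stand-in for the value fetched from the database: text, not bytes.
data = "abc \u0161 def"

# A connected pair of sockets stands in for the real network connection.
sender, receiver = socket.socketpair()
with sender, receiver:
    payload = data.encode(ENCODING)  # explicit encode at the socket boundary
    sender.sendall(payload)          # sockets carry bytes, never text
    received = receiver.recv(1024)

# The other end reverses the step with the same encoding.
decoded = received.decode(ENCODING)
print(repr(payload))
print(decoded == data)
```

The point is only that the encoding is named explicitly, so no implicit ASCII conversion ever happens.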

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)

It can be confusing and surprising, yes. Suboptimal...well, I wouldn't want to say that I could do
better...

Kent

Jul 19 '05 #2

John Machin

On Sun, 24 Apr 2005 03:15:02 +0200, Ivan Voras
<iv****@something.ortheother> wrote:

I have a string fetched from database, in iso8859-2, with 8bit
characters,

"8bit characters"?? Maybe you did once, or you thought you did, but
what you have now is a Unicode string, and socket.write() is expecting
an ordinary string.

and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

Like it says, you have passed it a *UNICODE* string that has u'\u0161'
(the small s with caron) at position 123.
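
The details in that message can also be read off the exception object itself. The following is a sketch in Python 3 syntax, not the original poster's code; the `text` value is a hypothetical string built so that U+0161 lands at index 123.

```python
import unicodedata

# Hypothetical text with U+0161 at index 123, mirroring the traceback.
text = "x" * 123 + "\u0161"

try:
    text.encode("ascii")  # the implicit conversion from the traceback, made explicit
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the block ends

# The exception records which codec failed, where, and on which characters.
offending = error.object[error.start:error.end]
print(error.encoding, error.start, error.reason)
print(unicodedata.name(offending))
```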

The other end knows it should expect this encoding, so how to send it?

If the other end wants an encoding, then you should *encode* it, like
this:

>>> us = u'\u0161'
>>> s = us.encode('iso8859_2')
>>> s
'\xb9'
>>> str(us)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 0: ordinal not in range(128)
>>> str(s)
'\xb9'
>>> # looks like socket.write() might be happier with this.

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)

Your posting gives no evidence for such a conclusion.

Jul 19 '05 #3

Ivan Voras

John Machin wrote:

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)

Your posting gives no evidence for such a conclusion.

Sorry, that was just steam venting from my ears - I often get bitten by
the "ordinal not in range(128)" error. :)

Jul 19 '05 #4

Martin v. Löwis

Ivan Voras wrote:

Sorry, that was just steam venting from my ears - I often get bitten by
the "ordinal not in range(128)" error. :)

I think I'm glad to hear that. Errors should never pass silently, unless
explicitly silenced. When you get that error, it means there is a bug in
your code (just like a ValueError, a TypeError, or an IndexError). The
best way to deal with them is to fix them.

Now, the troubling part is clearly that you are getting *bitten* by
this specific error, and often so. I presume you get other kinds of
errors also often, but they don't bite :-) This suggests that you should
really try to understand what the error message is trying to tell you,
and what precisely the underlying error is.

For other errors, you have already come to an understanding of what they
mean: NameError, ah, there must be a typo. AttributeError on None, ah,
forgot to check for a None result somewhere. ordinal not in range(128),
hmm, let's try different variations of the code and see which ones
work. This is going to continue biting you until you really understand
what it means.

The most "sane" mental model (and architecture) is one where you always
have Unicode strings in your code, and decode/encode only at system
interfaces (sockets, databases, ...). It turns out that the database
you use already follows this strategy (i.e. it decodes for you), so
you now only need to design the other interfaces so it is clear when
you have Unicode characters and when you have bytes.
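
One way to picture that model, as a rough sketch in Python 3 syntax: the function names, the shared `WIRE_ENCODING`, and the sample bytes are all assumptions made up for illustration, not part of any library.

```python
WIRE_ENCODING = "iso8859-2"  # assumed encoding at both interfaces


def read_from_interface(raw: bytes) -> str:
    """Decode once, on the way in."""
    return raw.decode(WIRE_ENCODING)


def write_to_interface(text: str) -> bytes:
    """Encode once, on the way out."""
    return text.encode(WIRE_ENCODING)


def shout(text: str) -> str:
    """Application logic sees only text, never bytes."""
    return text.upper()


incoming = b"\xb9um"  # hypothetical bytes as they might arrive from a peer
outgoing = write_to_interface(shout(read_from_interface(incoming)))
print(outgoing)
```

Keeping the two conversions in named functions at the edges makes it obvious, at any point in between, that the value in hand is text.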

Regards,
Martin

Jul 19 '05 #5
