
Unicode problems, yet again

I have a string fetched from database, in iso8859-2, with 8bit
characters, and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

The other end knows it should expect this encoding, so how to send it?

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)
Jul 19 '05 #1
Ivan Voras wrote:
> I have a string fetched from database, in iso8859-2, with 8bit
> characters, and I'm trying to send it over the network, via a socket:
>
> File "E:\Python24\lib\socket.py", line 249, in write
> data = str(data) # XXX Should really reject non-string non-buffers
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
> position 123: ordinal not in range(128)
>
> The other end knows it should expect this encoding, so how to send it?

I think maybe the string from the database is a unicode string, not 8-bit. What happens if you write
data.encode('iso8859-2') ?

> (Does anyone else feel that python's unicode handling is, well...
> suboptimal at least?)

It can be confusing and surprising, yes. Suboptimal...well, I wouldn't want to say that I could do
better...

Kent
Jul 19 '05 #2
On Sun, 24 Apr 2005 03:15:02 +0200, Ivan Voras
<iv****@something.ortheother> wrote:
> I have a string fetched from database, in iso8859-2, with 8bit
> characters,

"8bit characters"?? Maybe you did once, or you thought you did, but
what you have now is a Unicode string, and socket.write() is expecting
an ordinary string.

> and I'm trying to send it over the network, via a socket:
>
> File "E:\Python24\lib\socket.py", line 249, in write
> data = str(data) # XXX Should really reject non-string non-buffers
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
> position 123: ordinal not in range(128)

Like it says, you have passed it a *UNICODE* string that has u'\u0161'
(the small s with caron) at position 123.

> The other end knows it should expect this encoding, so how to send it?

If the other end wants an encoding, then you should *encode* it, like
this:

>>> us = u'\u0161'
>>> s = us.encode('iso8859_2')
>>> s
'\xb9'
>>> str(us)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 0: ordinal not in range(128)
>>> str(s)
'\xb9'
>>> # looks like socket.write() might be happier with this.
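
The session above is Python 2. As a rough sketch of the same failure in Python 3 (an assumption, since the thread predates it), the snippet below triggers the ASCII encoding error explicitly and reads the details off the exception object. The sample text and variable names are made up for illustration; the character is placed at index 123 only to mirror the traceback.

```python
# Python 3 sketch; the sample text and all variable names are illustrative assumptions.
text = "x" * 123 + "\u0161"  # small s with caron at index 123, as in the traceback

try:
    # Roughly what Python 2's str(data) attempted implicitly with the default codec.
    text.encode("ascii")
except UnicodeEncodeError as err:
    failed_codec = err.encoding                   # codec that rejected the character
    failed_index = err.start                      # index of the first unencodable character
    failed_char = err.object[err.start:err.end]   # the offending character itself

# The fix: encode explicitly with the encoding the receiver expects.
payload = text.encode("iso8859_2")
```

The attributes on the exception carry the same information as the message text: which codec was used, and where in the string it gave up.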

> (Does anyone else feel that python's unicode handling is, well...
> suboptimal at least?)

Your posting gives no evidence for such a conclusion.
Jul 19 '05 #3
John Machin wrote:
>> (Does anyone else feel that python's unicode handling is, well...
>> suboptimal at least?)
>
> Your posting gives no evidence for such a conclusion.

Sorry, that was just steam venting from my ears - I often get bitten by
the "ordinal not in range(128)" error. :)
Jul 19 '05 #4
Ivan Voras wrote:
> Sorry, that was just steam venting from my ears - I often get bitten by
> the "ordinal not in range(128)" error. :)

I think I'm glad to hear that. Errors should never pass silently, unless
explicitly silenced. When you get that error, it means there is a bug in
your code (just like a ValueError, a TypeError, or an IndexError). The
best way to deal with them is to fix them.

Now, the troubling part is clearly that you are getting *bitten* by
this specific error, and often so. I presume you get other kinds of
errors also often, but they don't bite :-) This suggests that you should
really try to understand what the error message is trying to tell you,
and what precisely the underlying error is.

For other errors, you have already come to an understanding of what they
mean: NameError, ah, there must be a typo. AttributeError on None, ah,
forgot to check for a None result somewhere. ordinal not in range(128),
hmm, let's try different variations of the code and see which ones
work. This is going to continue biting you until you really understand
what it means.

The most "sane" mental model (and architecture) is one where you always
have Unicode strings in your code, and decode/encode only at system
interfaces (sockets, databases, ...). It turns out that the database
you use already follows this strategy (i.e. it decodes for you), so
you now only need to design the other interfaces so it is clear when
you have Unicode characters and when you have bytes.
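
A minimal sketch of that boundary model, written for Python 3 (an assumption; the thread itself concerns Python 2.4). The names `WIRE_ENCODING`, `send_text`, and `recv_text` are hypothetical, and `socket.socketpair()` merely stands in for the real connection:

```python
import socket

WIRE_ENCODING = "iso8859_2"  # assumed: the encoding both ends have agreed on


def send_text(sock, text):
    """Encode at the boundary, so the socket only ever sees bytes."""
    sock.sendall(text.encode(WIRE_ENCODING))


def recv_text(sock, bufsize=4096):
    """Decode at the boundary, so callers only ever see text.

    A single recv() is enough for this short demo message; a real
    protocol would need framing to know where a message ends.
    """
    return sock.recv(bufsize).decode(WIRE_ENCODING)


# Two connected sockets standing in for the sender and the "other end".
left, right = socket.socketpair()
try:
    send_text(left, "sample \u0161")
    received = recv_text(right)
finally:
    left.close()
    right.close()
```

Everything between the two helper functions works with text only; the choice of encoding lives in one place and is applied only where data enters or leaves the program.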

Regards,
Martin
Jul 19 '05 #5
