
Unicode problems, yet again

I have a string fetched from database, in iso8859-2, with 8bit
characters, and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

The other end knows it should expect this encoding, so how to send it?

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)
Jul 19 '05 #1
4 Replies


Ivan Voras wrote:
I have a string fetched from database, in iso8859-2, with 8bit
characters, and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)

The other end knows it should expect this encoding, so how to send it?
I think the string from the database may be a Unicode string, not an 8-bit one. What happens if you write
data.encode('iso8859-2')?

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)


It can be confusing and surprising, yes. Suboptimal...well, I wouldn't want to say that I could do
better...

Kent
Jul 19 '05 #2

On Sun, 24 Apr 2005 03:15:02 +0200, Ivan Voras
<iv****@something.ortheother> wrote:
I have a string fetched from database, in iso8859-2, with 8bit
characters,
"8bit characters"?? Maybe you did once, or you thought you did, but
what you have now is a Unicode string, and socket.write() is expecting
an ordinary string.
and I'm trying to send it over the network, via a socket:

File "E:\Python24\lib\socket.py", line 249, in write
data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 123: ordinal not in range(128)
Like it says, you have passed it a *UNICODE* string that has u'\u0161'
(the small s with caron) at position 123.
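
A minimal sketch of what the traceback is reporting, written for Python 3 (where `str` is the Unicode type and `bytes` is the 8-bit type, unlike the Python 2.4 used in this thread). The sample text is made up for illustration; only the character U+0161 comes from the error message.

```python
import unicodedata

# Hypothetical sample text; U+0161 is the character named in the traceback.
text = "ko\u0161"

# Look up which character U+0161 actually is.
char_name = unicodedata.name("\u0161")

# Python 2's str(data) on a Unicode string amounts to an implicit ASCII
# encode. Doing the same encode explicitly fails in the same way.
ascii_failed = False
bad_position = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_failed = True
    bad_position = exc.start  # index of the first character ASCII cannot hold

print(char_name, ascii_failed, bad_position)
```

The position reported in the exception is an index into the Unicode string, which is why the original traceback points at position 123 of the data rather than at a byte offset.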

The other end knows it should expect this encoding, so how to send it?

If the other end wants an encoding, then you should *encode* it, like
this:

>>> us = u'\u0161'
>>> s = us.encode('iso8859_2')
>>> s
'\xb9'
>>> str(us)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 0: ordinal not in range(128)
>>> str(s)
'\xb9'
# looks like socket.write() might be happier with this.
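
The same idea applied to an actual socket might look like the sketch below. It is written for Python 3, so the encoded result is a `bytes` object rather than a Python 2 `str`. The helper name `send_text` is hypothetical, and `socket.socketpair()` only stands in for the real peer, which the thread does not describe.

```python
import socket


def send_text(sock, text, encoding="iso8859_2"):
    """Encode text to bytes explicitly, then send all of it (hypothetical helper)."""
    payload = text.encode(encoding)
    sock.sendall(payload)
    return payload


# socketpair() returns two connected sockets; one plays the remote end here.
sender, receiver = socket.socketpair()
try:
    sent = send_text(sender, "\u0161")
    received = receiver.recv(16)
finally:
    sender.close()
    receiver.close()

# The receiving side decodes with the encoding both ends agreed on.
decoded = received.decode("iso8859_2")
print(sent, received, decoded == "\u0161")
```

Encoding inside one small helper keeps the choice of codec in a single place, so the socket layer never has to guess and never falls back to ASCII.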

(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)


Your posting gives no evidence for such a conclusion.
Jul 19 '05 #3

John Machin wrote:
(Does anyone else feel that python's unicode handling is, well...
suboptimal at least?)

Your posting gives no evidence for such a conclusion.


Sorry, that was just steam venting from my ears - I often get bitten by
the "ordinal not in range(128)" error. :)
Jul 19 '05 #4

Ivan Voras wrote:
Sorry, that was just steam venting from my ears - I often get bitten by
the "ordinal not in range(128)" error. :)


I think I'm glad to hear that. Errors should never pass silently, unless
explicitly silenced. When you get that error, it means there is a bug in
your code (just like a ValueError, a TypeError, or an IndexError). The
best way to deal with them is to fix them.

Now, the troubling part is clearly that you are getting *bitten* by
this specific error, and often so. I presume you also get other kinds of
errors often, but they don't bite :-) This suggests that you should
really try to understand what the error message is trying to tell you,
and what precisely the underlying error is.

For other errors, you have already come to an understanding of what they
mean: NameError, ah, there must be a typo. AttributeError on None, ah,
forgot to check for a None result somewhere. ordinal not in range(128),
hmm, let's try different variations of the code and see which ones
work. This is going to continue biting you until you really understand
what it means.

The most "sane" mental model (and architecture) is one where you always
have Unicode strings in your code, and decode/encode only at system
interfaces (sockets, databases, ...). It turns out that the database
you use already follows this strategy (i.e. it decodes for you), so
you now only need to design the other interfaces so it is clear when
you have Unicode characters and when you have bytes.
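
As a rough sketch of that architecture in Python 3 terms (`str` for Unicode text, `bytes` at the interfaces): the names `WIRE_ENCODING`, `from_wire`, `to_wire`, and `shout` are all made up for illustration, and the encoding is assumed to be the iso8859-2 agreed on earlier in the thread.

```python
WIRE_ENCODING = "iso8859_2"  # assumption: the encoding both ends agreed on


def from_wire(raw: bytes) -> str:
    """Decode once, at the point where bytes enter the program."""
    return raw.decode(WIRE_ENCODING)


def to_wire(text: str) -> bytes:
    """Encode once, at the point where text leaves the program."""
    return text.encode(WIRE_ENCODING)


def shout(text: str) -> str:
    """Application logic sees Unicode text only, never raw bytes."""
    return text.upper()


incoming = b"\xb9koda"  # bytes as they might arrive from a socket
outgoing = to_wire(shout(from_wire(incoming)))
print(from_wire(incoming), outgoing)
```

Because `shout` works on characters, uppercasing the small s with caron gives the correct capital letter, which a byte-level operation on the encoded data would not.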

Regards,
Martin
Jul 19 '05 #5
