
Bug in python (Weird UnicodeDecodeError)

Hello

I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example I am reading in some book barcodes. These are 7 digit
strings. It processes a few thousand of these with no problem. But then
partway through the database results I get something like this:

for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.

I thought at first it was the eval() function being loopy, so I wrote
my own 'evalfix' function that handled the limited set of data I was
using. Then the thing barfed in a completely different spot.

And then I added more data to the database, and it barfed in yet a new
spot.

Any help is appreciated, thanks.

Dec 13 '05 #1
2 Replies


db******@gmail.com wrote:
... partway through the database results I get something like this:
for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

It is quite likely that the position is not what you think it is.
For one of the bad strings, print:
repr(thestring), [ord(ch) for ch in thestring]
This may give you a clue (and will definitely help us help you).
So far you have explained to us why you are confused, but have
not explained (with enough precision) what is going wrong in a
way that anyone can help you. I suspect that "position" is more
like a Unicode data point than the position within the string you
are feeding.

Show us the code doing the translation and the data it is being fed,
and we can help.

--Scott David Daniels
sc***********@acm.org
Dec 13 '05 #2
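
A note on the "position" in the error above: it is an offset into the byte string being decoded, and that string need not be the barcode itself. Below is a minimal sketch of the suggested inspection in present-day Python 3 (the thread predates it). The helper names `describe_bytes` and `locate_decode_error` and the sample row are made up for illustration; they are not from the original program.

```python
def describe_bytes(raw):
    """Return the repr of raw and the integer value of each byte in it."""
    return repr(raw), list(raw)


def locate_decode_error(raw, encoding="utf-8"):
    """Return (start, end, offending_bytes), or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end index into exc.object, the bytes being decoded.
        return exc.start, exc.end, exc.object[exc.start:exc.end]
    return None


# Hypothetical row: the barcode is clean ASCII, but a Latin-1 byte (0xE9)
# appears later in the same byte string, at offset 26.
sample = b"barcode 2349350 title: caf\xe9 society"

print(describe_bytes(sample))
print(locate_decode_error(sample))
```

If the reported offset is larger than the value you think is being decoded, that suggests the decode is happening on some other, longer piece of data.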

db******@gmail.com wrote:
I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example I am reading in some book barcodes. These are 7 digit
strings. It processes a few thousand of these with no problem. But then
partway through the database results I get something like this:

for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.


it's probably a bug in pysqlite (or some other C extension), which does
some conversion somewhere, but forgets to check the return status.

(if you raise an exception at the C level, but forget to flag it back to
the interpreter when you return to Python, the error may occur in a
seemingly random location.)

you can usually use

hasattr(None, "none")

to reset the error state (at least this worked in older versions; I think
it should work in 2.3 and 2.4 as well). try adding such calls after the
database calls, and see if the problem goes away... (if it does, complain
to the pysqlite developers).

</F>

Dec 13 '05 #3
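
The experiment suggested above could be arranged roughly as follows. This is only a sketch: it uses the standard-library `sqlite3` module as a stand-in for pysqlite, and the `books` table, the `barcode` column, and the helper names `reset_error_state` and `fetch_barcodes` are assumptions made for illustration. Whether the `hasattr` call has any effect on a stale error depends on the interpreter version; the reply only vouches for older 2.x releases.

```python
import sqlite3


def reset_error_state():
    """The workaround from the reply: a failing attribute lookup that
    hasattr() swallows. Returns hasattr's result, which is False here."""
    return hasattr(None, "none")


def fetch_barcodes(conn):
    """Hypothetical query helper; table and column names are assumptions."""
    rows = conn.execute("SELECT barcode FROM books ORDER BY barcode").fetchall()
    reset_error_state()  # the experiment: call right after the database call
    return [row[0] for row in rows]


# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (barcode TEXT)")
conn.executemany("INSERT INTO books VALUES (?)", [("2349350",), ("1234567",)])

barcodes = fetch_barcodes(conn)
for item in barcodes:
    print(item)
```

If the error stops appearing in the `for` loop once the call is in place, that points at the extension leaving an exception set without reporting it, as described in the reply.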
