Bug in python (Weird UnicodeDecodeError)

Hello

I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example, I am reading in some book barcodes. These are 7-digit
strings. It processes a few thousand of these with no problem. But then
partway through the database results I get something like this:

for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.

I thought at first it was the eval() function being loopy, so I wrote
my own 'evalfix' function that handled the limited set of data I was
using. Then the thing barfed in a completely different spot.

And then I added more data to the database, and it barfed in yet a new
spot.

Any help is appreciated, thanks.

Dec 13 '05 #1
2 Replies
db******@gmail.com wrote:
... partway through the database results I get something like this:
for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

It is quite likely that the position is not what you think it is.
For one of the bad strings, print:
repr(thestring), [ord(ch) for ch in thestring]
This may give you a clue (and will definitely help us help you).
So far you have explained to us why you are confused, but have
not explained (with enough precision) what is going wrong in a
way that anyone can help you. I suspect that "position" is more
like a Unicode data point than the position within the string you
are feeding.

Show us the code doing the translation and the data it is being fed,
and we can help.
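
The check suggested above can be wrapped in a small helper. This is only a sketch: `describe_bytes` is a hypothetical name, not something from the original program, and it assumes the suspect value is available as a raw byte string. It reports the repr, the numeric byte values, and, if decoding fails, the offsets the codec complains about.

```python
def describe_bytes(raw, encoding="utf-8"):
    """Collect diagnostics for a byte string that may fail to decode.

    Hypothetical helper for illustration; `raw` is assumed to be bytes.
    """
    info = {
        "repr": repr(raw),                    # exact contents, escapes included
        "length": len(raw),                   # number of bytes, not characters
        "byte_values": list(bytearray(raw)),  # numeric value of every byte
        "error": None,
    }
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # start/end are offsets into the bytes the codec was actually given
        info["error"] = (exc.start, exc.end, exc.reason)
    return info


good = describe_bytes(b"2349350")
bad = describe_bytes(b"abc\xff")
```

If the reported offsets are larger than the length of the string you believe is being decoded, the codec was given some other, longer piece of data.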

--Scott David Daniels
sc***********@acm.org
Dec 13 '05 #2
db******@gmail.com wrote:
I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example, I am reading in some book barcodes. These are 7-digit
strings. It processes a few thousand of these with no problem. But then
partway through the database results I get something like this:

for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.


it's probably a bug in pysqlite (or some other C extension), which does
some conversion somewhere, but forgets to check the return status.

(if you raise an exception at the C level, but forget to flag it back to
the interpreter when you return to Python, the error may occur in a
seemingly random location.)

you can usually use

hasattr(None, "none")

to reset the error state (at least this worked in older versions; I think
it should work in 2.3 and 2.4 as well). try adding such calls after the
database calls, and see if the problem goes away... (if it does,
complain to the pysqlite developers).
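
a rough sketch of where such calls could go is shown below. It uses the standard library `sqlite3` module as a stand-in for pysqlite, and the helper name `fetch_all_with_reset`, the `books` table and the sample barcodes are all made up for illustration. On an extension that reports its errors correctly the extra calls change nothing; they are a diagnostic aid only, and whether they have any effect depends on the interpreter version.

```python
import sqlite3


def fetch_all_with_reset(cursor, sql, params=()):
    """Run a query, calling hasattr(None, "none") after each database call.

    Hypothetical helper: the hasattr calls follow the workaround described
    above and are harmless no-ops when no stale error is pending.
    """
    cursor.execute(sql, params)
    hasattr(None, "none")  # discard any error left behind by execute()
    rows = cursor.fetchall()
    hasattr(None, "none")  # discard any error left behind by fetchall()
    return rows


# Made-up in-memory data standing in for the real barcode database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE books (barcode TEXT)")
cur.executemany("INSERT INTO books VALUES (?)", [("2349350",), ("2349351",)])
rows = fetch_all_with_reset(cur, "SELECT barcode FROM books ORDER BY barcode")
conn.close()
```

if the UnicodeDecodeError stops showing up in unrelated places once the calls are in, that points at the extension rather than at your data.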

</F>

Dec 13 '05 #3
