Bug in python (Weird UnicodeDecodeError)

dbri.tcc

Hello

I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example, I am reading in some book barcodes. These are 7-digit
strings. It processes a few thousand of these with no problem. But then
partway through the database results I get something like this:

for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.

I thought at first it was the eval() function being loopy, so I wrote
my own 'evalfix' function that handled the limited set of data I was
using. Then the thing barfed in a completely different spot.

And then I added more data to the database, and it barfed in yet another
spot.

Any help is appreciated, thanks.

Dec 13 '05 #1

Scott David Daniels

db******@gmail.com wrote:

... partway through the database results I get something like this:
for item in list:
UnicodeDecodeError : 'utf8' codec can't decode bytes in position 26-28:
invalid data

It is quite likely that the position is not what you think it is.
For one of the bad strings, print:
repr(thestring), [ord(ch) for ch in thestring]
This may give you a clue (and will definitely help us help you).
So far you have explained to us why you are confused, but have
not explained (with enough precision) what is going wrong in a
way that anyone can help you. I suspect that "position" is more
like a Unicode data point than the position within the string you
are feeding.

Show us the code doing the translation and the data it is being fed,
and we can help.

--Scott David Daniels
sc***********@acm.org

Dec 13 '05 #2

Fredrik Lundh

db******@gmail.com wrote:

I am getting somewhat random UnicodeDecodeError messages in my program.

...

Furthermore when I rearrange the order the database data is retrieved
(with Order By), it barfs on a different piece of data than it did
before. I cannot figure it out.

it's probably a bug in pysqlite (or some other C extension), which does
some conversion somewhere, but forgets to check the return status.

(if you raise an exception at the C level, but forget to flag it back to
the interpreter when you return to Python, the error may occur in a
seemingly random location.)

you can usually

hasattr(None, "none")

to reset the error state (at least this worked in older versions; I think
it should work in 2.3 and 2.4 as well). try adding such calls after the
database calls, and see if the problem goes away... (if it does, complain
to the pysqlite developers).
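A minimal sketch of where such calls would go, using the standard-library sqlite3 module (which grew out of pysqlite) under Python 3 as a stand-in for the pysqlite API of the time. The function rows_with_error_reset and the books table are hypothetical; only the hasattr(None, "none") call comes from the advice above.

```python
import sqlite3


def rows_with_error_reset(cursor):
    """Yield rows from cursor, doing a throwaway attribute lookup after each fetch.

    hasattr(None, "none") is the workaround suggested in this thread for
    Python 2.3/2.4, where it was said to clear an error left pending by a
    C extension that forgot to report it. It always returns False and has
    no other visible effect in normal operation.
    """
    while True:
        row = cursor.fetchone()
        hasattr(None, "none")  # workaround call placed right after the database call
        if row is None:
            return
        yield row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (barcode TEXT)")
    conn.executemany("INSERT INTO books VALUES (?)", [("2349350",), ("1234567",)])
    cur = conn.execute("SELECT barcode FROM books ORDER BY barcode")
    for (barcode,) in rows_with_error_reset(cur):
        print(barcode)
    conn.close()
```

If the errors stop appearing once the extra calls are in place, that points at the extension rather than at your own data handling.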

</F>

Dec 13 '05 #3

by: Edward K. Ream | last post by:

From the documentation for the string module at: C:\Python23\Doc\Python-Docs-2.3.1\lib\module-string.html letters: The concatenation of the strings lowercase and uppercase described below....

Python

Prothon should not borrow Python strings!

by: Paul Prescod | last post by:

I skimmed the tutorial and something alarmed me. "Strings are a powerful data type in Prothon. Unlike many languages, they can be of unlimited size (constrained only by memory size) and can hold...

Python

help wanted regarding displaying Japanese characters in a GUI using QT and python

by: prats | last post by:

I want to write a GUI application in PYTHON using QT. This application is supposed to take in Japanese characters. I am using PyQt as the wrapper for using QT from python. I am able to take input...

Python

"Subscribing" to topics?

by: Mizipzor | last post by:

Is there a way to "subscribe" to individual topics? I'm currently getting bombarded with daily digests and I wish to only receive a mail when there is activity in a topic that interests me. Can this...

Python

sgmllib bug in Python 2.5, works in 2.4.

by: John Nagle | last post by:

(Was previously posted as a followup to something else by accident.) I'm running a website page through BeautifulSoup. It parses OK with Python 2.4, but Python 2.5 fails with an exception: ...

Python

Re-raising exceptions with modified message

by: Christoph Zwerschke | last post by:

What is the best way to re-raise any exception with a message supplemented with additional information (e.g. line number in a template)? Let's say for simplicity I just want to add "sorry" to every...

Python

How to get Python to default to UTF8

by: weheh | last post by:

I'm developing a cgi-bin application that must be unicode sensitive. I'm striving for a UTF8 implementation. I'm running python 2.3 on a development machine (windows xp) and a server (windows xp...

Python

Similar topics