sqlite utf8 encoding error

I have an application that uses sqlite3 to store job/error data. When
I log in as a German user the error codes generated are translated into
German. The error code text is then stored in the db. When I use the
fetchall() to retrieve the data to generate a report I get the
following error:

Traceback (most recent call last):
File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in
OnGenerateButtonNow
self.OnGenerateButton(event)
File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in
OnGenerateButton
warningresult = messagecursor1.fetchall()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
unsupported Unicode code range

does anyone have any idea on what could be going wrong? The string
that I store in the database table is:

'Keinen Text für Übereinstimmungsfehler gefunden'

I thought that all strings were stored in unicode in sqlite.

Greg Miller

Nov 22 '05 #1
Greg Miller wrote:
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> unsupported Unicode code range
>
> does anyone have any idea on what could be going wrong? The string
> that I store in the database table is:
>
> 'Keinen Text für Übereinstimmungsfehler gefunden'

$ more test.py
# -*- coding: iso-8859-1 -*-
u = u'Keinen Text für Übereinstimmungsfehler gefunden'
s = u.encode("iso-8859-1")
u = s.decode("utf-8") # <-- this gives an error

$ python test.py
Traceback (most recent call last):
  File "test.py", line 4, in ?
    u = s.decode("utf-8") # <-- this gives an error
  File "lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
unsupported Unicode code range

> I thought that all strings were stored in unicode in sqlite.


did you pass in a Unicode string or an 8-bit string when you stored the text ?

</F>

Nov 22 '05 #2
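
For readers on Python 3, here is a minimal sketch of the same mismatch. The test script above is Python 2; this version assumes Python 3, where text is `str` and encoded data is `bytes`. The variable names are illustrative only.

```python
# Python 3 sketch of the mismatch: text encoded as ISO-8859-1 (Latin-1)
# stops being valid UTF-8 as soon as it contains non-ASCII characters.
text = "Keinen Text für Übereinstimmungsfehler gefunden"

latin1_bytes = text.encode("iso-8859-1")  # 'ü' becomes the single byte 0xFC
utf8_bytes = text.encode("utf-8")         # 'ü' becomes the two bytes 0xC3 0xBC

try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed at byte offset", exc.start)

# Decoding with the codec that was actually used round-trips cleanly.
roundtrip = latin1_bytes.decode("iso-8859-1")
```

The point of the sketch is that the bytes are not damaged; they are only being read with the wrong codec.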
Greg Miller enlightened us with:
> 'Keinen Text für Übereinstimmungsfehler gefunden'

You posted it as "Keinen Text f<FC>r ...", which is Latin-1, not
UTF-8.

> I thought that all strings were stored in unicode in sqlite.

Only if you put them into the DB as such. Make sure you're inserting
UTF-8 text, since the DB won't do character conversion for you.

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
Nov 22 '05 #4
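
The "no character conversion" point can be checked with a small sketch, assuming Python 3 and the standard library `sqlite3` module (not the pysqlite 2 build discussed in the thread). The table name `messages` and the use of `CAST` to simulate a legacy 8-bit insert are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")

text = "Keinen Text für"

# Normal path: pass a str and the driver hands SQLite UTF-8 bytes.
con.execute("INSERT INTO messages VALUES (?)", (text,))

# Simulated legacy path: Latin-1 bytes labelled as TEXT. CAST does not
# transcode; it only changes the storage class of the same bytes.
con.execute(
    "INSERT INTO messages VALUES (CAST(? AS TEXT))",
    (text.encode("iso-8859-1"),),
)

# hex() exposes the stored bytes without trying to decode them.
stored = [
    row[0]
    for row in con.execute("SELECT hex(msg) FROM messages ORDER BY rowid")
]
print(stored)
```

The second row keeps the lone `FC` byte exactly as supplied, which is what later makes a UTF-8 decode of that row fail.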
Fredrik Lundh wrote:
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
>> unsupported Unicode code range
>>
>> does anyone have any idea on what could be going wrong? The string
>> that I store in the database table is:
>>
>> 'Keinen Text für Übereinstimmungsfehler gefunden'
>
> $ more test.py
> # -*- coding: iso-8859-1 -*-
> u = u'Keinen Text für Übereinstimmungsfehler gefunden'
> s = u.encode("iso-8859-1")
> u = s.decode("utf-8") # <-- this gives an error
>
> $ python test.py
> Traceback (most recent call last):
>   File "test.py", line 4, in ?
>     u = s.decode("utf-8") # <-- this gives an error
>   File "lib/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> unsupported Unicode code range

I can't wait for the moment when encoded strings go away from Python.
The more I program in this language, the more confusion this difference
causes. Most functions and various objects' methods now accept both
strings and unicode, making it hard to find the sources of Unicode*Errors.

--
Jarek Zgoda
http://jpa.berlios.de/
Nov 22 '05 #6
Jarek Zgoda wrote:
> Fredrik Lundh wrote:
>>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
>>> unsupported Unicode code range
>>>
>>> does anyone have any idea on what could be going wrong? The string
>>> that I store in the database table is:
>>>
>>> 'Keinen Text für Übereinstimmungsfehler gefunden'
>>
>> $ more test.py
>> # -*- coding: iso-8859-1 -*-
>> u = u'Keinen Text für Übereinstimmungsfehler gefunden'
>> s = u.encode("iso-8859-1")
>> u = s.decode("utf-8") # <-- this gives an error
>>
>> $ python test.py
>> Traceback (most recent call last):
>>   File "test.py", line 4, in ?
>>     u = s.decode("utf-8") # <-- this gives an error
>>   File "lib/encodings/utf_8.py", line 16, in decode
>>     return codecs.utf_8_decode(input, errors, True)
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
>> unsupported Unicode code range
>
> I can't wait for the moment when encoded strings go away from Python.
> The more I program in this language, the more confusion this difference
> causes. Most functions and various objects' methods now accept both
> strings and unicode, making it hard to find the sources of Unicode*Errors.

Library writers can speed up the transition by hiding the 8-bit
interface, for example:

    import sqlite
    sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

If you don't call this function, 8-bit strings will not be accepted :)
IMHO if libraries keep on accepting both str and unicode until Python
3.0, it will just prolong the confusion of unicode newbies instead of
guiding them in the right direction _right now_.

Nov 22 '05 #8
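
The "refuse 8-bit strings" idea can be approximated in application code without changing the library. The sketch below assumes Python 3 and the standard library `sqlite3` module; `insert_message` and the `messages` table are hypothetical names, not part of any real sqlite API.

```python
import sqlite3


def insert_message(con, text):
    """Hypothetical helper: accept only str, never encoded bytes."""
    if not isinstance(text, str):
        raise TypeError(
            "expected str, got %s; decode it first" % type(text).__name__
        )
    con.execute("INSERT INTO messages (msg) VALUES (?)", (text,))


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")

insert_message(con, "Keinen Text für Übereinstimmungsfehler gefunden")

try:
    # Encoded bytes are rejected at the boundary instead of at read time.
    insert_message(con, "für".encode("iso-8859-1"))
    rejected = False
except TypeError:
    rejected = True
```

Failing at insert time puts the traceback next to the code that produced the badly typed value, rather than in a later `fetchall()`.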
On 17 Nov 2005 03:47:00 -0800, "Greg Miller" <et**********@gmail.com>
wrote:
> I have an application that uses sqlite3 to store job/error data. When
> I log in as a German user the error codes generated are translated into
> German. The error code text is then stored in the db. When I use the
> fetchall() to retrieve the data to generate a report I get the
> following error:
>
> Traceback (most recent call last):
>   File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in
> OnGenerateButtonNow
>     self.OnGenerateButton(event)
>   File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in
> OnGenerateButton
>     warningresult = messagecursor1.fetchall()
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> unsupported Unicode code range
>
> does anyone have any idea on what could be going wrong? The string
> that I store in the database table is:
>
> 'Keinen Text für Übereinstimmungsfehler gefunden'
>
> I thought that all strings were stored in unicode in sqlite.

No, they are stored as UTF-8 in sqlite, and pysqlite has no way to make
sure the string you insert into the database is really encoded in
UTF-8 (the only safe way is to use Unicode strings).

How did you insert that string?

As a partial solution, try disabling the automatic conversion of text
fields to Unicode strings:

    def convert_text(s):
        # XXX do not use Unicode
        return s

    # Register the converter with SQLite
    sqlite.register_converter("TEXT", convert_text)

    ....connect("...",
        detect_types=sqlite.PARSE_DECLTYPES|sqlite.PARSE_COLNAMES
    )


Regards Manlio Perillo
Nov 22 '05 #10
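
A rough Python 3 counterpart of the converter idea, assuming the standard library `sqlite3` module, where converters receive the raw column value as `bytes`. Unlike the pass-through above, this variant tries UTF-8 first and falls back to ISO-8859-1. The fallback codec is an assumption about how the bad rows were written, and Latin-1 data that happens to be valid UTF-8 would be misread.

```python
import sqlite3


def convert_text(raw):
    # Converters are called with the stored bytes, before any decoding.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was stored as Latin-1.
        return raw.decode("iso-8859-1")


# Applies to columns whose declared type is TEXT (needs PARSE_DECLTYPES).
sqlite3.register_converter("TEXT", convert_text)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE messages (msg TEXT)")

text = "Keinen Text für Übereinstimmungsfehler gefunden"
con.execute("INSERT INTO messages VALUES (?)", (text,))
# Simulate a legacy row holding Latin-1 bytes labelled as TEXT.
con.execute(
    "INSERT INTO messages VALUES (CAST(? AS TEXT))",
    (text.encode("iso-8859-1"),),
)

rows = con.execute("SELECT msg FROM messages").fetchall()
```

This only papers over rows that are already wrong; new rows are better fixed at insert time.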
Thank you for all your suggestions. I ended up casting the string to
unicode prior to inserting into the database.

Greg Miller

Nov 22 '05 #12
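
A minimal sketch of "convert to unicode before inserting", written for Python 3 and the standard library `sqlite3` module. It assumes the translated message arrives as ISO-8859-1 bytes; the `warnings` table and the variable names are made up for illustration.

```python
import sqlite3

# Assumption: the translation layer hands back ISO-8859-1 encoded bytes.
raw_message = "Keinen Text für Übereinstimmungsfehler gefunden".encode(
    "iso-8859-1"
)

# Decode once, at the boundary, using the codec the bytes were written in.
message = raw_message.decode("iso-8859-1")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE warnings (msg TEXT)")
con.execute("INSERT INTO warnings VALUES (?)", (message,))

# Reading back no longer raises, because the stored bytes are real UTF-8.
warningresult = con.execute("SELECT msg FROM warnings").fetchall()
```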
On 18 Nov 2005 09:09:24 -0800, "Greg Miller" <et**********@gmail.com>
wrote:
> Thank you for all your suggestions. I ended up casting the string to
> unicode prior to inserting into the database.

Don't do it by hand if it can be done by an automated system.

Try with:

    from pysqlite2 import dbapi2 as sqlite

    def adapt_str(s):
        # if you have declared this encoding at the beginning of the module
        return s.decode("iso-8859-1")

    sqlite.register_adapter(str, adapt_str)

Read the pysqlite documentation for more information:
http://initd.org/pub/software/pysqli...age-guide.html

Regards Manlio Perillo
Nov 22 '05 #14
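
The adapter approach carries over to Python 3 with one change: plain `str` is already Unicode there, so the sketch below registers the adapter on a marker type for legacy bytes instead. It assumes the standard library `sqlite3` module. `Latin1Text` and `adapt_latin1` are hypothetical names, and ISO-8859-1 is an assumption about the source encoding.

```python
import sqlite3


class Latin1Text(bytes):
    """Hypothetical marker type for bytes known to hold ISO-8859-1 text."""


def adapt_latin1(value):
    # Called by sqlite3 whenever a Latin1Text is bound as a parameter.
    return value.decode("iso-8859-1")


sqlite3.register_adapter(Latin1Text, adapt_latin1)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")

legacy = Latin1Text("für Übereinstimmungsfehler".encode("iso-8859-1"))
con.execute("INSERT INTO messages VALUES (?)", (legacy,))

row = con.execute("SELECT msg, typeof(msg) FROM messages").fetchone()
```

Wrapping values in a marker type keeps the conversion automatic while leaving ordinary `bytes` parameters (real BLOBs) untouched.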
Thanks again, I'll look into this method.

Greg Miller

Nov 22 '05 #16