By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,907 Members | 1,932 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,907 IT Pros & Developers. It's quick & easy.

SQlite none english char

P: n/a
I works with python 2.5 on windows, And I use sqlite3

Now, I have problem searching string in Hebrew in my database

I have table called "test" with field num and test
firs row i insert "1" and "עברית" (that is "Hebrew" in Hebrew)
second row i insert "2" and "English"

now this code will print me
>>English
as it should:

i="Englisht"
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

but this one print me nothing
>>>
instead of עברית

i="עברית"
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

does any one have an idea how can i solve it?
Jun 27 '08 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Gandalf wrote:
I works with python 2.5 on windows, And I use sqlite3

Now, I have problem searching string in Hebrew in my database

I have table called "test" with field num and test
firs row i insert "1" and "עברית" (that is "Hebrew" in Hebrew)
second row i insert "2" and "English" [...]
I recommend you use Unicode strings for all your Python strings in the
application. You can then be that things will just work with the sqlite3
module.

The problem is that the sqlite3 module doesn't currently check if what
you insert into the database is in UTF-8 encoding. But if it isn't,
you're likely to run into problems later on.

So, let's assume that you've written your Python using a UTF-8 editor,
you can then use something like:

# coding: utf-8

as the first line in the script. Among others, this will allow you to
write Unicode literals in UTF-8, i. e. do something like:

data = u"עברית".

then you can insert this into the database:

cur.execute("insert into test(test) values (?)", (data,))

Note that I use the parametrized form of execute() with ? as
placeholders. *Always* use this when the SQL statement is not constant,
but has parameters of some sort.
[...]
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

but this one print me nothing
instead of עברית
This could be two problems. Either (likely) it simply isn't found
because you inserted garbage (non-UTF-8 data) or you got a decoding
error to UTF-8 executing the select, which the sqlite3 module will
currently unfortunately ignore and return a None value instead.

Again, use the parametrized form of the execute() statement with Unicode
Python strings to avoid any problems.

cur.execute("select * from test where text like '%' || ? || '%'",
(searchstr,))

!!! Ok, I just found out that you used the + operator in SQL to
concatenate strings. Don't, as it will not work (except on maybe MySQL).
Use the || operator instead!

Note that when you use cur.execute(sql, params), then params is actually
a tuple of parameters. In the above examples there was only one
parameter, so it was a one-tuple, which is written as (element,).
Dont'write it as (element), because Python will not recognize it as a
tuple then.

Don't hesitate to ask if you need further help.

-- Gerhard
Jun 27 '08 #2

P: n/a
On Jun 8, 11:00*am, Gerhard Häring <g...@ghaering.dewrote:
Gandalf wrote:
I works with python 2.5 on windows, *And I use sqlite3
Now, I have problem searching string in Hebrew in my database
I have table called "test" with field num and test
firs row i insert *"1" and "עברית" *(that is "Hebrew" in Hebrew)
second row i insert "2" and "English" [...]

I recommend you use Unicode strings for all your Python strings in the
application. You can then be that things will just work with the sqlite3
module.

The problem is that the sqlite3 module doesn't currently check if what
you insert into the database is in UTF-8 encoding. But if it isn't,
you're likely to run into problems later on.

So, let's assume that you've written your Python using a UTF-8 editor,
you can then use something like:

# coding: utf-8

as the first line in the script. Among others, this will allow you to
write Unicode literals in UTF-8, i. e. do something like:

data = u"עברית".

then you can insert this into the database:

cur.execute("insert into test(test) values (?)", (data,))

Note that I use the parametrized form of execute() with ? as
placeholders. *Always* use this when the SQL statement is not constant,
but has parameters of some sort.
[...]
cur.execute("select * *from `test` where text like '%"+i+"%' *")
for row in cur:
* * print row[1]
but this one print me nothing
instead of עברית

This could be two problems. Either (likely) it simply isn't found
because you inserted garbage (non-UTF-8 data) or you got a decoding
error to UTF-8 executing the select, which the sqlite3 module will
currently unfortunately ignore and return a None value instead.

Again, use the parametrized form of the execute() statement with Unicode
Python strings to avoid any problems.

cur.execute("select * from test where text like '%' || ? || '%'",
(searchstr,))

!!! Ok, I just found out that you used the + operator in SQL to
concatenate strings. Don't, as it will not work (except on maybe MySQL).
Use the || operator instead!

Note that when you use cur.execute(sql, params), then params is actually
a tuple of parameters. In the above examples there was only one
parameter, so it was a one-tuple, which is written as (element,).
Dont'write it as (element), because Python will not recognize it as a
tuple then.

Don't hesitate to ask if you need further help.

-- Gerhard
I solved the problem by entering data manually but now the problem is
that i had to delete this function:
con.text_factory = str
and now I can't see the data that I entered thought my sqlite manager

Jun 27 '08 #3

P: n/a
Gandalf wrote:
[...]
I solved the problem by entering data manually but now the problem is
that i had to delete this function:
con.text_factory = str
and now I can't see the data that I entered thought my sqlite manager
Then apparently there is still non-UTF-8 data in the database. Perhaps
SQLite Manager allows you to insert invalid data as well?

Try this:

con.text_factory = lambda s: unicode(s, "utf-8", "replace")

This will decode to UTF-8 as good as possible, and for non-decodable
characters you will get a REPLACEMENT CHARACTER instead.

To demonstrate:
>>result = con.execute("select ? || ? || ?", ('text ok ', chr(230), '
ok, too')).fetchone()[0]

The chr(230) is just random garbage that isn't valid UTF-8:
>>print result
text ok �k, too

As you can see, there is now a strange character, but the second 'ok,
too' got messed up :-/ But at least you can then find out which rows in
the database have messed up data. You can then iterate over all rows
with something like:

Then, assuming the text to check for validity is in result, you can do
something like:
>>import unicodedata
unicodedata.lookup("REPLACEMENT CHARACTER") in result
True

Does this help?

-- Gerhard

PS: This thread reinforces my believe that I have to make it harder for
users of pysqlite to make themselves shoot in the foot with non-UTF-8 data.

Jun 27 '08 #4

This discussion thread is closed

Replies have been disabled for this discussion.