473,583 Members | 2,878 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Convertion of Unicode to ASCII NIGHTMARE

Hi

I am reading from an oracle database using cx_Oracle. I am writing to a
SQLite database using apsw.

The oracle database is returning utf-8 characters for euopean item
names, ie special charcaters from an ASCII perspective.

I get the following error:
SQLiteCur.execu te(sql, row)
UnicodeDecodeE rror: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in >range(128)


I have googled for serval days now and still cant get it to encode to
ascii.

I encode the SQL as follows:

sql = "insert into %s values %s" % (SQLiteTable, paramstr)
sql.encode('asc ii', 'ignore')

I then code each of the row values returned from Oracle like this:

row = map(encodestr, row)
SQLiteCur.execu te(sql, row)

where encodestr is as follows:

def encodestr(item) :
if isinstance(item , types.StringTyp es):
return unicodedata.nor malize('NFKD', unicode(item, 'utf-8',
'ignore')).enco de('ASCII', 'ignore')
else:
return item

I have tried a thousand of similiar functions to the above,
permitations of the above from various google searches. But I still get
the above exception on the line:

SQLiteCur.execu te(sql, row)

and the exception is reslated to the data in one field.

Int the end I resorted to using oracles convert function in the SQL
statement but would like to understand why this is happening and why
its so hard to convert the string in python. I have read many
complaints about this from other people some of whom have written
custom stripping routines. I havent tried a custom routine yet, cause I
think it should be possilble in python.

Thanks,

Apr 3 '06 #1
24 9035
ChaosKCW wrote:
Hi

I am reading from an oracle database using cx_Oracle. I am writing to a
SQLite database using apsw.

The oracle database is returning utf-8 characters for euopean item
names, ie special charcaters from an ASCII perspective.
And does cx_Oracle return those as Unicode objects or as plain strings
containing UTF-8 byte sequences? It's very important to distinguish
between these two cases, and I don't have any experience with cx_Oracle
to be able to give advice here.
I get the following error:
SQLiteCur.execu te(sql, row)
UnicodeDecodeE rror: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not
in range(128)

It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter, although
later versions should work with Unicode objects without special
configuration. See here for a thread (in which I seem to have
participated, coincidentally) :

http://mail.python.org/pipermail/pyt...ne/107526.html
I have googled for serval days now and still cant get it to encode to
ascii.


This is a tough thing to find out - whilst previous searches did
uncover some discussions about it, I just tried and failed to find the
enlightening documents - and I certainly didn't see many references to
it on the official pysqlite site.

Paul

Apr 3 '06 #2
ChaosKCW wrote:
Hi

I am reading from an oracle database using cx_Oracle. I am writing to a
SQLite database using apsw.

The oracle database is returning utf-8 characters for euopean item
names, ie special charcaters from an ASCII perspective.
I'm not sure that you are using those terms correctly. From your description
below, it seems that your data is being returned from the Oracle database as
unicode strings rather than regular strings containing UTF-8 encoded data. These
European characters are not "special characters from an ASCII perspective;" they
simply aren't characters in the ASCII character set at all.
I get the following error:
SQLiteCur.execu te(sql, row)
UnicodeDecode Error: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in >range(128)
I have googled for serval days now and still cant get it to encode to
ascii.


Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.

http://www.sqlite.org/version3.html
I encode the SQL as follows:

sql = "insert into %s values %s" % (SQLiteTable, paramstr)
sql.encode('asc ii', 'ignore')


The .encode() method returns a new value; it does not change an object inplace.

sql = sql.encode('utf-8')

--
Robert Kern
ro*********@gma il.com

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Apr 3 '06 #3
> Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.


That is not entirely correct - one can, if losing information is ok. The OPs
code that normalized UTF-8 to NFKD, an umlaut like ä is transformed to a
two-character-sequence basically saying "a with two dots on top". With
'ignore' specified as parameter to the encoder, this should be result in
the letter a.
Regards,

Diez
Apr 3 '06 #4
Oh, and it occurs to me, as I seem to have mentioned a document about
PgSQL rather than pysqlite (although they both have the same principal
developer), that you might need to investigate the client_encoding
parameter when setting up your connection. The following message gives
some information (but not much):

http://groups.google.com/group/comp....7fa9866c9b7b5f

Sadly, I can't find the information about getting result values as
Unicode objects, but I believe it involves some kind of SQL comment
that you send to the database system which actually tells pysqlite to
change its behaviour.

Paul

Apr 3 '06 #5
"Paul Boddie" <pa**@boddie.or g.uk> wrote in message news:11******** **************@ i39g2000cwa.goo glegroups.com.. .
It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter,


He is using apsw. apsw correctly handles unicode. In fact it won't
accept a str with bytes >127 as they will be an unknown encoding and
SQLite only uses Unicode internally. It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.

Roger
Apr 4 '06 #6
Hi

Thanks for all the posts. I am still digesting it all but here are my
initial comments.
Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.
http://www.sqlite.org/version3.html
As mentioned by the next poster, there is, its supposed to be encode
with the 'ignore' option. Thus you lose data, but thats just dandy with
me. As for SQLite supporting unicode, it probably does, but something
on the python side (probabyl in apsw) converts it to ascii at some
point before its handed to SQLite.
The .encode() method returns a new value; it does not change an object inplace.
sql = sql.encode('utf-8')
Ah yes, big bistake on my part :-/
He is using apsw. apsw correctly handles unicode. In fact it won't
accept a str with bytes >127 as they will be an unknown encoding and
SQLite only uses Unicode internally. It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.


Ok if SQLite uses unicode internally why do you need to ignore
everything greater than 127, the ascii table (256 bit one) fits into
unicode just fine as far as I recall? Or did I miss the boat here ?

Thanks,

Apr 4 '06 #7
"Thus you lose data, but thats just dandy with
me.": Please reconsider this attitude, before you perpetrate a nonsense
or even a disaster.

Wrt your last para:
1. Roger didn't say "ignore" -- he said "won't accept" (a major
difference).
2. The ASCII code comprises 128 characters, *NOT* 256.
3. What Roger means is: given a Python 8-bit string and no other
information, you don't have a clue what the encoding is. Most codes of
interest these days have the ASCII code (or a mild perversion thereof)
in the first 128 positions, but it's anyones guess what the writer of
the string had in mind with the next 128.

Apr 4 '06 #8
Roger Binns wrote:
"Paul Boddie" <pa**@boddie.or g.uk> wrote in message news:11******** **************@ i39g2000cwa.goo glegroups.com.. .
It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter,


He is using apsw. apsw correctly handles unicode. In fact it won't
accept a str with bytes >127 as they will be an unknown encoding and
SQLite only uses Unicode internally. It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.


Ah, I misread the OP's traceback.

Okay, the OP is getting regular strings, which are probably encoded in
ISO-8859-1 if I had to guess, from the Oracle DB. He is trying to pass them in
to SQLiteCur.execu te() which tries to make a unicode string from the input:

In [1]: unicode('\xdc')
---------------------------------------------------------------------------
exceptions.Unic odeDecodeError Traceback (most recent call
last)

/Users/kern/<ipython console>

UnicodeDecodeEr ror: 'ascii' codec can't decode byte 0xdc in position 0: ordinal
not in range(128)

*Now*, my advice to the OP is to figure out the encoding of the strings that are
being returned from Oracle. As I said, ISO-8859-1 is probably a good guess.
Then, he would *decode* the string to a unicode string using the encoding. E.g.:

row = row.decode('iso-8859-1')

Then everything should be peachy. I hope.

--
Robert Kern
ro*********@gma il.com

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Apr 4 '06 #9
Robert Kern wrote:
Roger Binns wrote:
"Paul Boddie" <pa**@boddie.or g.uk> wrote
It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter,

Note that I've since mentioned client_encoding which seems to matter
for pysqlite 1.x.
He is using apsw. apsw correctly handles unicode. In fact it won't
accept a str with bytes >127 as they will be an unknown encoding and
SQLite only uses Unicode internally. It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.

For pysqlite 2.x, it appears that Unicode objects can be handed
straight to the API methods, and I'd be interested to hear about your
problems with pysqlite, Unicode and what actually made you write apsw
instead.
Ah, I misread the OP's traceback.

Okay, the OP is getting regular strings, which are probably encoded in
ISO-8859-1 if I had to guess, from the Oracle DB. He is trying to pass them in
to SQLiteCur.execu te() which tries to make a unicode string from the input:
[...]

There's an Oracle environment variable that appears to make a
difference: NLS_CHARSET, perhaps - it's been a while since I've had to
deal with Oracle, and I'm not looking for another adventure into
Oracle's hideous documentation to find out.
*Now*, my advice to the OP is to figure out the encoding of the strings that are
being returned from Oracle. As I said, ISO-8859-1 is probably a good guess.
Then, he would *decode* the string to a unicode string using the encoding. E.g.:

row = row.decode('iso-8859-1')

Then everything should be peachy. I hope.


Yes, just find out what Oracle wants first, then set it all up, noting
that without looking into the Oracle wrapper being used, I can't
suggest an easier way.

Paul

Apr 4 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

14
2834
by: wolfgang haefelinger | last post by:
Hi, I wonder whether someone could explain me a bit what's going on here: import sys # I'm running Mandrake 1o and Windows XP. print sys.version ## 2.3.3 (#2, Feb 17 2004, 11:45:40)
17
542
by: DraguVaso | last post by:
Hi, For my SMS-application I need to be able to send characters with accents (like é and à). But this doesn't seem to work in Text Mode, so i will need to do it in PDU Mode. Does anybody has soem converting routines for that in VB.NET? I need my text to be converted to a PDU-message, and back (for received messages). Thansk in advance,
18
34100
by: Ger | last post by:
I have not been able to find a simple, straight forward Unicode to ASCII string conversion function in VB.Net. Is that because such a function does not exists or do I overlook it? I found Encoding.Convert, but that needs byte arrays. Thanks, /Ger
19
3319
by: Thomas W | last post by:
I'm getting really annoyed with python in regards to unicode/ascii-encoding problems. The string below is the encoding of the norwegian word "fødselsdag". I stored the string as "fødselsdag" but somewhere in my code it got translated into the mess above and I cannot get the original string back. It cannot be printed in the console or...
0
8159
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
8314
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7922
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
1
5689
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5366
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3811
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3836
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
2317
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1416
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.