
Unicode / cx_Oracle problem

Sorry to be back at the goodly well so soon, but...

...when I execute the following -- the variable mean_eng_txt containing
UTF-16LE text and its Oracle datatype being NVARCHAR2(79):

cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
VALUES (:id,:mean)""",id=id,mean=mean)

I not surprisingly get this error message:

"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.

Recommendations, please?

TIA,
Richard Schulman
(Remove xx for email reply)
Sep 8 '06 #1
Richard Schulman wrote:
> Sorry to be back at the goodly well so soon, but...
>
> ...when I execute the following -- the variable mean_eng_txt containing
> UTF-16LE text and its Oracle datatype being NVARCHAR2(79):
>
> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
> VALUES (:id,:mean)""",id=id,mean=mean)
>
> I not surprisingly get this error message:
>
> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
> type unicode"
>
> But when I try putting a codecs.BOM_UTF16_LE in various plausible
> places, I just end up generating different errors.
Show us the alleged plausible places, and the different errors.
Otherwise it's crystal ball time again.

Diez
Sep 9 '06 #2
> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
> VALUES (:id,:mean)""",id=id,mean=mean)
> ...
> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
> type unicode"
>
> But when I try putting a codecs.BOM_UTF16_LE in various plausible
> places, I just end up generating different errors.

Diez:
> Show us the alleged plausible places, and the different errors.
> Otherwise it's crystal ball time again.
More usefully, let's just try to fix the code above. Here's the error
message I get:

NotSupportedError: Variable_TypeByValue(): unhandled data type unicode

Traceback (innermost last):

File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
# LoadMeanToOra reads a UTF-16LE input file one record at a time
File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)

What I can't figure out is whether cx_Oracle is saying it can't handle
Unicode for an Oracle nvarchar2 data type or whether it can handle the
input but that it needs to be in a specific format that I'm not
supplying.

- Richard Schulman
Sep 10 '06 #3
Richard Schulman wrote:
>> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>> VALUES (:id,:mean)""",id=id,mean=mean)
>> ...
>> "cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
>> type unicode"
>>
>> But when I try putting a codecs.BOM_UTF16_LE in various plausible
>> places, I just end up generating different errors.
>
> Diez:
>> Show us the alleged plausible places, and the different errors.
>> Otherwise it's crystal ball time again.
>
> More usefully, let's just try to fix the code above. Here's the error
> message I get:
>
> NotSupportedError: Variable_TypeByValue(): unhandled data type unicode
>
> Traceback (innermost last):
>
> File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
> # LoadMeanToOra reads a UTF-16LE input file one record at a time
> File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
> cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>
> What I can't figure out is whether cx_Oracle is saying it can't handle
> Unicode for an Oracle nvarchar2 data type or whether it can handle the
> input but that it needs to be in a specific format that I'm not
> supplying.
What does

print repr(mean)

give you?

It _looks_ to me (I don't have an Oracle available right now) as if it is
a unicode object, which you have to consider an abstract string
representation. That means it has to be encoded in some way before being
sent over the wire. There might exist DB-API bindings that can deal with
unicode objects, by applying a default encoding or somehow figuring out
what encoding the DB expects. But I don't see any references to unicode
in PEP 249, so I presume you can't rely on that -- which seems to be the
case here.
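To make the distinction concrete, here is a minimal illustration (Python 2
syntax; the value is made up):

# A unicode object holds abstract code points; an encoded str holds bytes.
mean = u'sadness, lament; sympathize with, pity'
print repr(mean)                   # u'sadness, ...'  -- abstract text
print repr(mean.encode('utf-8'))   # 'sadness, ...'   -- bytes, ready for the wire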

The Oracle NLS is a sometimes tricky beast: when it sets the encoding, it
tries to be clever and assigns an existing connection some encoding based
on the user's/machine's locale. That can yield unexpected results, such
as "Dusseldorf" instead of "Düsseldorf" when querying a German city list
with an English locale.

So you have to figure out what encoding your DB connection expects. You
can do so by issuing some queries against the session tables, I believe -
I don't have my Oracle resources at home, but googling will get you
there; the important Oracle term is NLS.
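For example, something along these lines should show the relevant settings
(an untested sketch; NLS_SESSION_PARAMETERS and NLS_DATABASE_PARAMETERS are
the standard Oracle data dictionary views):

import cx_Oracle

connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()

# What the current session is using:
cursor.execute("SELECT parameter, value FROM nls_session_parameters")
for parameter, value in cursor.fetchall():
    print parameter, value

# What the database stores character data in (NVARCHAR2 columns use
# the NLS_NCHAR_CHARACTERSET):
cursor.execute("""SELECT parameter, value FROM nls_database_parameters
                  WHERE parameter LIKE '%CHARACTERSET'""")
for parameter, value in cursor.fetchall():
    print parameter, value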

Then you need to encode the unicode string before passing it - something
like this:

mean = mean.encode("latin1")

That should help.

Diez
Sep 10 '06 #4
On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
<de***@nospam.web.de> wrote:
> What does print repr(mean) give you?
That is a useful suggestion.

For context, I reproduce the source code:

import codecs
import cx_Oracle

in_file = codecs.open("c:\\pythonapps\\mean.my", encoding="utf_16_LE")
connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()
for row in in_file:
    id = row[0]
    mean = row[1]
    print "Value of row is ", repr(row)                   #debug line
    print "Value of the variable 'id' is ", repr(id)      #debug line
    print "Value of the variable 'mean' is ", repr(mean)  #debug line
    cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
                      VALUES (:id,:mean)""", id=id, mean=mean)

Here is the result from the print repr() statements:

Value of row is u"\ufeff(3,'sadness, lament; sympathize with, pity')\r\n"
Value of the variable 'id' is u'\ufeff'
Value of the variable 'mean' is u'('

Clearly, the values loaded into the 'id' and 'mean' variables are not
satisfactory but are picking up the BOM.
> ...
> The Oracle NLS is a sometimes tricky beast: when it sets the encoding, it
> tries to be clever and assigns an existing connection some encoding based
> on the user's/machine's locale. That can yield unexpected results, such
> as "Dusseldorf" instead of "Düsseldorf" when querying a German city list
> with an English locale.
Agreed.
> So you have to figure out what encoding your DB connection expects. You
> can do so by issuing some queries against the session tables, I believe -
> I don't have my Oracle resources at home, but googling will get you
> there; the important Oracle term is NLS.
It's very hard to figure out what to do on the basis of complexities
on the order of

http://download-east.oracle.com/docs...htm#sthref1042

(tiny equivalent: http://tinyurl.com/fnc54)

But I'm not even sure I got that far. My problems so far seem to lie
earlier, in Python or in Python's cx_Oracle driver. To be candid, I'm very tempted
at this point to abandon the Python effort and revert to an all-ucs2
environment, much as I dislike Java and C#'s complexities and the poor
support available for all-Java databases.
> Then you need to encode the unicode string before passing it - something
> like this:
>
> mean = mean.encode("latin1")
I don't see how the Chinese characters embedded in the English text
will carry over if I do that.

In any case, thanks for your patient and generous help.

Richard Schulman
Delete the antispamming 'xx' characters for email reply
Sep 10 '06 #5
> Value of the variable 'id' is u'\ufeff'
> Value of the variable 'mean' is u'('
So they both are unicode objects - as I presumed.

> It's very hard to figure out what to do on the basis of complexities
> on the order of
>
> http://download-east.oracle.com/docs...htm#sthref1042
>
> (tiny equivalent: http://tinyurl.com/fnc54)
Yes, that is somewhat intimidating.
> But I'm not even sure I got that far. My problems so far seem to lie
> earlier, in Python or in Python's cx_Oracle driver. To be candid, I'm very
> tempted at this point to abandon the Python effort and revert to an
> all-ucs2 environment, much as I dislike Java and C#'s complexities and
> the poor support available for all-Java databases.
That actually doesn't help you much, I guess - even though JDBC will
convert Java's unicode strings to byte strings behind the curtain, you
can still lose encoding information - especially if the DB itself isn't
running an encoding that allows all possible unicode characters to be
represented.
>> Then you need to encode the unicode string before passing it - something
>> like this:
>>
>> mean = mean.encode("latin1")
>
> I don't see how the Chinese characters embedded in the English text
> will carry over if I do that.
Me neither, but how could I have foreseen that? So use something else
instead - utf-8 for example, or whatever the oracle connection will grok.
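For instance, a purely illustrative sketch (it assumes the connection's
character set can actually represent the Chinese text -- e.g. an AL32UTF8
client setting -- and the values are placeholders):

import cx_Oracle

connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()

mean = u'sadness, lament; sympathize with, pity'    # placeholder unicode value
cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                  VALUES (:id, :mean)""",
               id=3, mean=mean.encode("utf-8"))     # pass bytes, not a unicode object
connection.commit()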

I think you should read up on what unicode and encodings are, how they
work in Python, and unfortunately how they work in Oracle. Because even
if you use Java, not understanding how these things are connected will
come back to bite you at some point.

Diez
Sep 10 '06 #6
Richard Schulman wrote:
> On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
> <de***@nospam.web.de> wrote:
>> What does print repr(mean) give you?
>
> That is a useful suggestion.
>
> For context, I reproduce the source code:
>
> import codecs
> import cx_Oracle
>
> in_file = codecs.open("c:\\pythonapps\\mean.my", encoding="utf_16_LE")
> connection = cx_Oracle.connect("username", "password")
> cursor = connection.cursor()
> for row in in_file:
>     id = row[0]
>     mean = row[1]
>     print "Value of row is ", repr(row)                   #debug line
>     print "Value of the variable 'id' is ", repr(id)      #debug line
>     print "Value of the variable 'mean' is ", repr(mean)  #debug line
>     cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>                       VALUES (:id,:mean)""", id=id, mean=mean)
>
> Here is the result from the print repr() statements:
>
> Value of row is u"\ufeff(3,'sadness, lament; sympathize with, pity')\r\n"
> Value of the variable 'id' is u'\ufeff'
> Value of the variable 'mean' is u'('
>
> Clearly, the values loaded into the 'id' and 'mean' variables are not
> satisfactory but are picking up the BOM.
Well, of course they're "unsatisfactory", and this has absolutely nothing
to do with Oracle and cx_Oracle.

row is a string of characters. row[0] is the BOM. Read my lips (from a
previous thread):

"""
Use utf_16 -- it will strip off the BOM for you.
"""
and again:
"""
>>> codecs.open('guff.utf16le', 'r', encoding='utf_16').read()
u'abc\n\rdef\n\rghi' ######### Look, Mom, no BOM!
"""

row[1] is the first ***character*** of what looks suspiciously like the
Python representation of a tuple:

"""(3,'sadness, lament; sympathize with, pity')"""

Who wrote that like that??? If it is at all under your control, do it
like this:
Encode each Unicode text field in UTF-8. Write the file as a CSV file
using Python's csv module. Read the CSV file using the same module.
Decode the text fields from UTF-8.
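A rough sketch of that round trip (Python 2; the file name and sample row
are invented for illustration):

import csv

rows = [(3, u'sadness, lament; sympathize with, pity')]   # sample data

# Write: encode each text field to UTF-8 bytes before handing it to csv.
out_file = open('mean.csv', 'wb')
writer = csv.writer(out_file)
for mean_id, mean_txt in rows:
    writer.writerow([mean_id, mean_txt.encode('utf-8')])
out_file.close()

# Read: decode the text fields back from UTF-8 into unicode objects.
in_file = open('mean.csv', 'rb')
for mean_id, mean_txt in csv.reader(in_file):
    print int(mean_id), repr(mean_txt.decode('utf-8'))
in_file.close()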

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column.

My guess (not having used cx_Oracle) is that the error is happening
because the column "id" has a numeric type and you are trying to jam a
Unicode string into it. IOW, nothing to do with the "mean" column
(yet!).
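Continuing the read loop sketched above, the conversion before the bind
might look like this (illustrative only; it assumes mean_id is a NUMBER
column and that the connection will take UTF-8-encoded bytes for the
NVARCHAR2 column):

# Illustrative only: convert each parsed CSV column before binding.
mean_id = int(mean_id)                     # NUMBER column wants a Python int
if isinstance(mean_txt, unicode):
    mean_txt = mean_txt.encode('utf-8')    # assumed connection charset: UTF-8
cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                  VALUES (:id, :mean)""", id=mean_id, mean=mean_txt)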

BTW, I've managed to decode that "eng" means English not engineering
and "mean" means meaning i.e. not average and not stingy. Holy
obfuscation, Batman!

HTH,
John

Sep 10 '06 #7
On 10 Sep 2006 15:27:17 -0700, "John Machin" <sj******@lexicon.net>
wrote:
> ...
> Encode each Unicode text field in UTF-8. Write the file as a CSV file
> using Python's csv module. Read the CSV file using the same module.
> Decode the text fields from UTF-8.
>
> You need to parse the incoming line into column values (the csv module
> does this for you) and then convert each column value from
> string/Unicode to a Python type that is compatible with the Oracle type
> for that column.
> ...
John, how am I to reconcile your suggestions above with my
ActivePython 2.4 documentation, which states:

<<12.20 csv -- CSV File Reading and Writing
<<New in version 2.3.
....
<<Note: This version of the csv module doesn't support Unicode input.
Also, there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should generally be printable ASCII to be safe.
These restrictions will be removed in the future.>>

Regards,
Richard Schulman
Sep 16 '06 #8
Richard Schulman wrote:
> On 10 Sep 2006 15:27:17 -0700, "John Machin" <sj******@lexicon.net>
> wrote:
>> ...
>> Encode each Unicode text field in UTF-8. Write the file as a CSV file
>> using Python's csv module. Read the CSV file using the same module.
>> Decode the text fields from UTF-8.
>>
>> You need to parse the incoming line into column values (the csv module
>> does this for you) and then convert each column value from
>> string/Unicode to a Python type that is compatible with the Oracle type
>> for that column.
>> ...
>
> John, how am I to reconcile your suggestions above with my
> ActivePython 2.4 documentation, which states:
>
> <<12.20 csv -- CSV File Reading and Writing
> <<New in version 2.3.
> ...
> <<Note: This version of the csv module doesn't support Unicode input.
> Also, there are currently some issues regarding ASCII NUL characters.
> Accordingly, all input should generally be printable ASCII to be safe.
> These restrictions will be removed in the future.>>
1. For "Unicode" read "UTF-16".

2. Unless you have \u0000 in your Unicode data, encoding it into UTF-8
won't cause any ASCII NUL bytes to appear. Ensuring that you don't have
NULs in your data is a good idea in general.

3. There are also evidently some issues regarding ASCII LF characters
embedded in fields (like when Excel users do Alt-Enter (Windows
version) to put a hard line break in their headings); see
http://docs.python.org/dev/whatsnew/modules.html of which the following
is an extract:
"""
The CSV parser is now stricter about multi-line quoted fields.
Previously, if a line ended within a quoted field without a terminating
newline character, a newline would be inserted into the returned field.
This behavior caused problems when reading files that contained
carriage return characters within fields, so the code was changed to
return the field without inserting newlines. As a consequence, if
newlines embedded within fields are important, the input should be
split into lines in a manner that preserves the newline characters.
"""

4. Provided your fields don't contain any of CR, LF, ctrl-Z (maybe),
and NUL, you should be OK. I can't understand the sentence
"Accordingly, all input should generally be printable ASCII to be
safe." -- especially the "accordingly". If it was running amok with
8-bit characters with ord(c) >= 128, there would have been loud shrieks
from several corners of the globe.

5. However, to be safe, you could go the next step and convert the
UTF-8 to base64 -- see
http://docs.python.org/dev/lib/module-base64.html -- BUT once you've
done that your encoded data doesn't even have commas and quotes in it,
so you can avoid the maybe unsafe csv module and just write your data
as ",".join(base64_encoded_fields).

HTH,
John


Sep 16 '06 #9
