Bytes | Software Development & Data Engineering Community

Unicode / cx_Oracle problem

Sorry to be back at the goodly well so soon, but...

...when I execute the following -- variable mean_eng_txt being
utf-16LE and its datatype nvarchar2(79) in Oracle:

cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
VALUES (:id, :mean)""", id=id, mean=mean)

I not surprisingly get this error message:

"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.

Recommendations, please?

TIA,
Richard Schulman
(Remove xx for email reply)
Sep 8 '06 #1
Richard Schulman schrieb:
Sorry to be back at the goodly well so soon, but...

...when I execute the following -- variable mean_eng_txt being
utf-16LE and its datatype nvarchar2(79) in Oracle:

cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
VALUES (:id, :mean)""", id=id, mean=mean)

I not surprisingly get this error message:

"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.
Show us the alleged plausible places, and the different errors.
Otherwise it's crystal ball time again.

Diez
Sep 9 '06 #2
> cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
> VALUES (:id, :mean)""", id=id, mean=mean)
...
"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.
Diez:
>Show us the alleged plausible places, and the different errors.
Otherwise it's crystal ball time again.
More usefully, let's just try to fix the code above. Here's the error
message I get:

NotSupportedError: Variable_TypeByValue(): unhandled data type unicode

Traceback (innermost last):

File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
# LoadMeanToOra reads a UTF-16LE input file one record at a time
File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)

What I can't figure out is whether cx_Oracle is saying it can't handle
Unicode for an Oracle nvarchar2 data type or whether it can handle the
input but that it needs to be in a specific format that I'm not
supplying.

- Richard Schulman
Sep 10 '06 #3
Richard Schulman schrieb:
>> cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
VALUES (:id, :mean)""", id=id, mean=mean)
...
"cx_Oracle.NotSupportedError: Variable_TypeByValue(): unhandled data
type unicode"

But when I try putting a codecs.BOM_UTF16_LE in various plausible
places, I just end up generating different errors.

Diez:
>Show us the alleged plausible places, and the different errors.
Otherwise it's crystal ball time again.

More usefully, let's just try to fix the code above. Here's the error
message I get:

NotSupportedError: Variable_TypeByValue(): unhandled data type unicode

Traceback (innermost last):

File "c:\pythonapps\LoadMeanToOra.py", line 1, in ?
# LoadMeanToOra reads a UTF-16LE input file one record at a time
File "c:\pythonapps\LoadMeanToOra.py", line 23, in ?
cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)

What I can't figure out is whether cx_Oracle is saying it can't handle
Unicode for an Oracle nvarchar2 data type or whether it can handle the
input but that it needs to be in a specific format that I'm not
supplying.
What does

print repr(mean)

give you?

It _looks_ to me (I don't have an Oracle available right now) as if it
is a unicode object - which you have to consider an abstract string
representation. That means it has to be encoded in some way before
being sent over the wire. There might exist DB-API bindings that can
deal with unicode objects, by applying a default encoding or somehow
figuring out what encoding the DB expects. But I don't see any
references to unicode in PEP 249, so I presume you can't rely on that -
which seems to be the case here.

The Oracle NLS is a sometimes tricky beast: when it sets the encoding
it tries to be clever and assigns an existing connection some encoding
based on the user's/machine's locale. This can yield unexpected
results, such as "Dusseldorf" instead of "Düsseldorf" when querying a
German city list with an English locale.

So - you have to figure out what encoding your db-connection expects.
You can do so by issuing some queries against the session tables, I
believe - I don't have my Oracle resources at home, but googling will
bring you there; the important Oracle term is NLS.

Then you need to encode the unicode string before passing it - something
like this:

mean = mean.encode("latin1")

That should help.
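For illustration, something along these lines (an untested sketch -
"utf-8" here is an assumption, substitute whatever encoding your NLS
settings actually call for; the execute() call is commented out
because it needs a live connection):

```python
# Sketch: encode the unicode object into the byte encoding the DB
# connection expects before binding it. "utf-8" is an assumption here.
mean = u"D\xfcsseldorf"            # a unicode object, like repr() would show
mean_bytes = mean.encode("utf-8")  # substitute the encoding your NLS setup uses

# cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
#                   VALUES (:id, :mean)""", id=id, mean=mean_bytes)
```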

Diez
Sep 10 '06 #4
On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
<de***@nospam.web.de> wrote:
>What does print repr(mean) give you?
That is a useful suggestion.

For context, I reproduce the source code:

in_file = codecs.open("c:\\pythonapps\\mean.my", encoding="utf_16_LE")
connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()
for row in in_file:
    id = row[0]
    mean = row[1]
    print "Value of row is ", repr(row)  # debug line
    print "Value of the variable 'id' is ", repr(id)  # debug line
    print "Value of the variable 'mean' is ", repr(mean)  # debug line
    cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                   VALUES (:id, :mean)""", id=id, mean=mean)

Here is the result from the print repr() statements:

Value of row is u"\ufeff(3,'sadness, lament; sympathize with,
pity')\r\n"
Value of the variable 'id' is u'\ufeff'
Value of the variable 'mean' is u'('

Clearly, the values loaded into the 'id' and 'mean' variables are not
satisfactory but are picking up the BOM.
>...
The oracle NLS is a sometimes tricky beast, as it sets the encoding it
tries to be clever and assigns an existing connection some encoding,
based on the users/machines locale. Which can yield unexpected results,
such as "Dusseldorf " instead of "Dsseldorf " when querying a german city
list with an english locale.
Agreed.
>So - you have to figure out what encoding your db-connection expects.
You can do so by issuing some queries against the session tables, I
believe - I don't have my Oracle resources at home, but googling will
bring you there; the important Oracle term is NLS.
It's very hard to figure out what to do on the basis of complexities
on the order of

http://download-east.oracle.com/docs...htm#sthref1042

(tiny equivalent: http://tinyurl.com/fnc54)

But I'm not even sure I got that far. My problems so far seem prior:
in Python or Python's cx_Oracle driver. To be candid, I'm very tempted
at this point to abandon the Python effort and revert to an all-ucs2
environment, much as I dislike Java and C#'s complexities and the poor
support available for all-Java databases.
>Then you need to encode the unicode string before passing it - something
like this:

mean = mean.encode("latin1")
I don't see how the Chinese characters embedded in the English text
will carry over if I do that.

In any case, thanks for your patient and generous help.

Richard Schulman
Delete the antispamming 'xx' characters for email reply
Sep 10 '06 #5
Value of the variable 'id' is u'\ufeff'
Value of the variable 'mean' is u'('
So they both are unicode objects - as I presumed.

It's very hard to figure out what to do on the basis of complexities
on the order of

http://download-east.oracle.com/docs...htm#sthref1042

(tiny equivalent: http://tinyurl.com/fnc54)
Yes, that is somewhat intimidating.
But I'm not even sure I got that far. My problems so far seem prior:
in Python or Python's cx_Oracle driver. To be candid, I'm very tempted
at this point to abandon the Python effort and revert to an all-ucs2
environment, much as I dislike Java and C#'s complexities and the poor
support available for all-Java databases.
That actually doesn't help you much I guess - just because JDBC will
convert java's unicode strings to byte strings behind the curtains, you
will lose all encoding information nonetheless - especially if the DB
itself isn't running an encoding that will allow for all possible
unicode characters to be represented.
>Then you need to encode the unicode string before passing it - something
like this:

mean = mean.encode("latin1")

I don't see how the Chinese characters embedded in the English text
will carry over if I do that.
Me neither, but how could I have foreseen that? So use something else
instead - utf-8 for example, or whatever the oracle connection will grok.

I think you should read up on what unicode and encodings are, and how
they work in python, and unfortunately how they do work in oracle.
Because even if you use java - not understanding how things are
connected will hit you in the neck at some point.

Diez
Sep 10 '06 #6
Richard Schulman wrote:
On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
<de***@nospam.web.de> wrote:
What does print repr(mean) give you?

That is a useful suggestion.

For context, I reproduce the source code:

in_file = codecs.open("c:\\pythonapps\\mean.my", encoding="utf_16_LE")
connection = cx_Oracle.connect("username", "password")
cursor = connection.cursor()
for row in in_file:
    id = row[0]
    mean = row[1]
    print "Value of row is ", repr(row)  # debug line
    print "Value of the variable 'id' is ", repr(id)  # debug line
    print "Value of the variable 'mean' is ", repr(mean)  # debug line
    cursor.execute("""INSERT INTO mean (mean_id, mean_eng_txt)
                   VALUES (:id, :mean)""", id=id, mean=mean)

Here is the result from the print repr() statements:

Value of row is u"\ufeff(3,'sadness, lament; sympathize with,
pity')\r\n"
Value of the variable 'id' is u'\ufeff'
Value of the variable 'mean' is u'('

Clearly, the values loaded into the 'id' and 'mean' variables are not
satisfactory but are picking up the BOM.
Well of course they're "unsatisfactory", and this is absolutely nothing
to do with Oracle and cx_Oracle.

row is a string of characters. row[0] is the BOM. Read my lips (from a
previous thread):

"""
Use utf_16 -- it will strip off the BOM for you.
"""
and again:
"""
| >>> codecs.open('guff.utf16le', 'r', encoding='utf_16').read()
| u'abc\n\rdef\n\rghi' ######### Look, Mom, no BOM!
"""
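A quick demonstration of that difference, as a sketch on raw bytes
(same principle as reading the file):

```python
# The same UTF-16LE payload, with a BOM in front:
data = b"\xff\xfe" + u"ab".encode("utf-16-le")

# 'utf-16' auto-detects the byte order and strips the BOM...
assert data.decode("utf-16") == u"ab"
# ...'utf-16-le' does not: the BOM survives as U+FEFF, exactly the
# stray character that ended up in the 'id' variable above.
assert data.decode("utf-16-le") == u"\ufeffab"
```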

row[1] is the first ***character*** of what looks suspiciously like the
Python representation of a tuple:

"""(3,'sadness, lament; sympathize with, pity')"""

Who wrote that like that??? If it is at all under your control, do it
like this:
Encode each Unicode text field in UTF-8. Write the file as a CSV file
using Python's csv module. Read the CSV file using the same module.
Decode the text fields from UTF-8.
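In outline, that round trip looks like this (a sketch with an
in-memory buffer standing in for the file; note that in Python 3 the
csv module handles unicode text natively, so the explicit per-field
UTF-8 encode/decode step matters mainly on Python 2):

```python
import csv
import io

rows = [(3, u"sadness, lament; sympathize with, pity")]

# write: the csv module quotes the embedded comma for us
buf = io.StringIO()
writer = csv.writer(buf)
for mean_id, mean_txt in rows:
    writer.writerow([mean_id, mean_txt])

# read back, then convert each column to a type compatible with the
# Oracle column (int for mean_id, text for mean_eng_txt)
buf.seek(0)
parsed = [(int(r[0]), r[1]) for r in csv.reader(buf)]
```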

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column.

My guess (not having used cx_Oracle) is that the error is happening
because the column "id" has a numeric type and you are trying to jam a
Unicode string into it. IOW, nothing to do with the "mean" column
(yet!).

BTW, I've managed to decode that "eng" means English not engineering
and "mean" means meaning i.e. not average and not stingy. Holy
obfuscation, Batman!

HTH,
John

Sep 10 '06 #7
On 10 Sep 2006 15:27:17 -0700, "John Machin" <sj******@lexicon.net>
wrote:
>...
Encode each Unicode text field in UTF-8. Write the file as a CSV file
using Python's csv module. Read the CSV file using the same module.
Decode the text fields from UTF-8.

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column.
...
John, how am I to reconcile your suggestions above with my
ActivePython 2.4 documentation, which states:

<<12.20 csv -- CSV File Reading and Writing
<<New in version 2.3.
...
<<Note: This version of the csv module doesn't support Unicode input.
Also, there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should generally be printable ASCII to be safe.
These restrictions will be removed in the future.>>

Regards,
Richard Schulman
Sep 16 '06 #8
Richard Schulman wrote:
On 10 Sep 2006 15:27:17 -0700, "John Machin" <sj******@lexicon.net>
wrote:
...
Encode each Unicode text field in UTF-8. Write the file as a CSV file
using Python's csv module. Read the CSV file using the same module.
Decode the text fields from UTF-8.

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column.
...

John, how am I to reconcile your suggestions above with my
ActivePython 2.4 documentation, which states:

<<12.20 csv -- CSV File Reading and Writing
<<New in version 2.3.
...
<<Note: This version of the csv module doesn't support Unicode input.
Also, there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should generally be printable ASCII to be safe.
These restrictions will be removed in the future.>>
1. For "Unicode" read "UTF-16".

2. Unless you have \u0000 in your Unicode data, encoding it into UTF-8
won't cause any ASCII NUL bytes to appear. Ensuring that you don't have
NULs in your data is a good idea in general.

3. There are also evidently some issues regarding ASCII LF characters
embedded in fields (like when Excel users do Alt-Enter (Windows
version) to put a hard line break in their headings); see
http://docs.python.org/dev/whatsnew/modules.html of which the following
is an extract:
"""
The CSV parser is now stricter about multi-line quoted fields.
Previously, if a line ended within a quoted field without a terminating
newline character, a newline would be inserted into the returned field.
This behavior caused problems when reading files that contained
carriage return characters within fields, so the code was changed to
return the field without inserting newlines. As a consequence, if
newlines embedded within fields are important, the input should be
split into lines in a manner that preserves the newline characters.
"""

4. Provided your fields don't contain any of CR, LF, ctrl-Z (maybe),
and NUL, you should be OK. I can't understand the sentence
"Accordingly, all input should generally be printable ASCII to be
safe." -- especially the "accordingly". If it was running amok with
8-bit characters with ord(c) >= 128, there would have been loud shrieks
from several corners of the globe.

5. However, to be safe, you could go the next step and convert the
UTF-8 to base64 -- see
http://docs.python.org/dev/lib/module-base64.html -- BUT once you've
done that your encoded data doesn't even have commas and quotes in it,
so you can avoid the maybe unsafe csv module and just write your data
as ",".join(base64_encoded_fields).
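That fallback might look like this (a sketch; the field values are
made up for illustration):

```python
import base64

fields = [u"3", u"sadness, lament; sympathize with, pity"]

# encode: after base64, each field is pure [A-Za-z0-9+/=] ASCII,
# so a bare ",".join() is safe - no csv quoting needed
line = ",".join(
    base64.b64encode(f.encode("utf-8")).decode("ascii") for f in fields
)

# decode on the way back in
decoded = [base64.b64decode(e).decode("utf-8") for e in line.split(",")]
```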

HTH,
John

Sep 16 '06 #9
