Psycopg and queries with UTF-8 data

Alban Hertroys

Another python/psycopg question, for which the solution is probably
quite simple; I just don't know where to look.

I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?

I can't do query.encode('ascii'), that would be similar to:

>>> x = u'\xc8'
>>> print x.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
position 0: ordinal not in range(128)

I also tried setting PostgreSQL's client-encoding by executing "SET
client_encoding TO 'utf-8'", but psycopg still only accepts str-type
strings (which is not really surprising).
I assume that the solution will result in an ascii encoded query string,
and that I then can use the QuotedString type to escape my strings
(which is in my current situation not possible because that also only
accepts str type strings and it contains utf-8 characters).

Regards,
Alban.

Jul 18 '05 #1

Diez B. Roggisch

Alban Hertroys wrote:

> I have a query that inserts data originating from an utf-8 encoded XML
> file. And guess what, it contains utf-8 encoded characters...
> Now my problem is that psycopg will only accept queries of type str, so
> how do I get my utf-8 encoded data into the DB?

This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
specification of characters, while utf-8, latin1 and ascii are encodings of
that specification, each of which can represent a certain range of
characters. ascii covers only the well-known first 128 (code points 0 to
127), latin1 covers the first 256, enough for some major European languages,
and utf-8 can represent every character defined in unicode by using
multi-byte sequences, with the result that some characters no longer take
one byte each.

So unicode objects encapsulate an abstract sequence of unicode characters;
how they accomplish that is not your concern. Strings, on the other hand, are
pure byte sequences, and most common libs work with them, the exception
being the xml libs, which are usually unicode aware. So to get a string from
a unicode object, you have to specify an encoding, like utf-8 or latin1. If
the unicode object contains a character that can't be represented in the
specified encoding, encoding it will produce an error.

Please do read a tutorial on unicode and python - there are several good
ones out there, use google to your advantage.
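
To make the point about encodings concrete, here is a minimal sketch that encodes the same one-character unicode object with three codecs. It is written in Python 3 syntax, which is an assumption on my part (the thread uses Python 2): there, `str` is what the thread calls a unicode object and `bytes` is what it calls a string.

```python
# Python 3 syntax: `str` plays the role of the thread's unicode object,
# `bytes` plays the role of the thread's (byte) string.
text = u'\xc8'  # one character: LATIN CAPITAL LETTER E WITH GRAVE

results = {}
for codec in ('ascii', 'latin-1', 'utf-8'):
    try:
        results[codec] = text.encode(codec)
    except UnicodeEncodeError as exc:
        # ascii has no representation for code points above 127
        results[codec] = None
        print(codec, 'failed:', exc.reason)

print(results)
```

Only the ascii attempt fails; latin1 yields one byte and utf-8 yields two for the same character.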

> I can't do query.encode('ascii'), that would be similar to:
>
> >>> x = u'\xc8'
> >>> print x.encode('ascii')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
> position 0: ordinal not in range(128)

Sure - 0xC8 > 127, so it can't be encoded. Do this:

>>> x = u'\xc8'
>>> x
u'\xc8'
>>> x.encode('utf-8')
'\xc3\x88'

As you can see, what was a single character becomes two bytes. The reason
is that utf-8 translates this one unicode character into that 2-byte
sequence.
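
Where the two bytes come from can be worked out by hand. The sketch below (Python 3 syntax, an assumption; the thread itself uses Python 2) applies the two-byte utf-8 layout `110xxxxx 10xxxxxx` to the code point 0xC8 and compares the result with what the codec produces.

```python
code_point = 0xC8  # the character u'\xc8' from the example above

# Two-byte utf-8 form, used for code points 0x80 to 0x7FF: 110xxxxx 10xxxxxx
assert 0x80 <= code_point <= 0x7FF
first = 0b11000000 | (code_point >> 6)           # upper bits of the code point
second = 0b10000000 | (code_point & 0b00111111)  # lower 6 bits of the code point

manual = bytes([first, second])
print(manual)
print(manual == u'\xc8'.encode('utf-8'))
```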

> I also tried setting PostgreSQL's client-encoding by executing "SET
> client_encoding TO 'utf-8'", but psycopg still only accepts str-type
> strings (which is not really surprising).

Confusion again - please repeat:

unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!

Do encode the unicode object in utf-8, and pass the result to psycopg. If you
set client_encoding to latin1 instead, you have to encode the unicode object
to latin1.
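
A minimal sketch of that rule, assuming Python 3 syntax (where `bytes` is the thread's str-type string). The helper `encode_for_client` and the mapping `PG_TO_PYTHON_CODEC` are hypothetical names made up for illustration, not part of psycopg; the sketch only shows picking the codec that matches the session's client_encoding and does not talk to a database.

```python
# Hypothetical mapping from PostgreSQL client_encoding names to Python codec
# names; extend it for whatever encodings you actually use.
PG_TO_PYTHON_CODEC = {
    'UTF8': 'utf-8',
    'LATIN1': 'latin-1',
}

def encode_for_client(text, client_encoding='UTF8'):
    """Encode text with the codec matching the session's client_encoding."""
    codec = PG_TO_PYTHON_CODEC[client_encoding.upper()]
    return text.encode(codec)

name = u'\xc8'
as_utf8 = encode_for_client(name, 'UTF8')
as_latin1 = encode_for_client(name, 'LATIN1')
print(as_utf8, as_latin1)
```

The same character ends up as different bytes depending on the encoding the server is told to expect, which is why the two settings have to agree.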

--
Regards,

Diez B. Roggisch

Jul 18 '05 #2

Jarek Zgoda

Alban Hertroys <al***@magproductions.nl> wrote:

> I have a query that inserts data originating from an utf-8 encoded XML
> file. And guess what, it contains utf-8 encoded characters...
> Now my problem is that psycopg will only accept queries of type str, so
> how do I get my utf-8 encoded data into the DB?
>
> I can't do query.encode('ascii'), that would be similar to:
>
> >>> x = u'\xc8'
> >>> print x.encode('ascii')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
> position 0: ordinal not in range(128)

Did you try x.encode('utf-8')?

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/

Jul 18 '05 #3

Alban Hertroys

Diez B. Roggisch wrote:

> Alban Hertroys wrote:
>> I have a query that inserts data originating from an utf-8 encoded XML
>> file. And guess what, it contains utf-8 encoded characters...
>> Now my problem is that psycopg will only accept queries of type str, so
>> how do I get my utf-8 encoded data into the DB?
>
> This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
> specification of characters, utf-8 as well as latin1 and ascii are
> encodings of that specification that allow for certain characters to be
> used - namely, ascii for only well-known first 127, latin1 for some major
> european languages, and utf-8 defines escapes for all possible characters
> defined in unicode - with the result that some of the characters aren't one
> byte per character anymore.

Ah, I see now. I _thought_ it was odd that unicode('string') resulted in
a unicode object and 'string'.encode('utf-8') did not. I understand now
that 'unicode' is data that is actual unicode data, while 'utf-8'
_encoded_ data is really a string, but with special characters rewritten
to specify utf-8 escape sequences instead of the actual unicode bytes.
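
The distinction can be seen on a round trip. This is a small sketch in Python 3 syntax (an assumption; the thread uses Python 2), with a made-up sample standing in for bytes read from a utf-8 encoded XML file.

```python
# Bytes as they might be read from a utf-8 encoded XML file (made-up sample).
raw = b'<name>\xc3\x88ve</name>'

# Decoding turns the encoded bytes into a sequence of unicode characters.
text = raw.decode('utf-8')

# Encoding goes the other way and gives back the original bytes.
round_trip = text.encode('utf-8')

print(type(raw).__name__, len(raw))    # length counted in bytes
print(type(text).__name__, len(text))  # length counted in characters
```

The byte sequence is one element longer than the character sequence, because the one non-ascii character takes two bytes in utf-8.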

Thanks for clearing out my confusion.

> Please do read a tutorial on unicode and python - there are several good
> ones out there, use google to your advantage.

I did, though some time ago. Apparently I missed the point being made
(or forgot about it).

> Confusion again - please repeat:
>
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!

while confused():
    print "unicode is not utf-8!!!"

> Do encode the unicode object in utf-8, and pass that to the psycopg. If you
> set client_encoding to latin1, you have to encode unicod to that.

I suppose I won't notice much of that until I read from the DB (which is
done in PHP mostly), as the data inserted is already an ascii string by
itself (with escaped utf-8 characters, though). I'll worry about that
later ;)

Many thanks,
Alban.

Jul 18 '05 #4

Diez B. Roggisch

> Ah, I see now. I _thought_ it was odd that unicode('string') resulted in
> a unicode object and 'string'.encode('utf-8') did not. I understand now
> that 'unicode' is data that is actual unicode data, while 'utf-8'
> _encoded_ data is really a string, but with special characters rewritten
> to specify utf-8 escape sequences instead of the actual unicode bytes.

Exactly.

> Thanks for clearing out my confusion.

You're welcome.

> while confused():
>     print "unicode is not utf-8!!!"

Let's hope confused() is True only for a short time, otherwise you'll end up
with quite a lot of output...

>> Do encode the unicode object in utf-8, and pass that to the psycopg. If
>> you set client_encoding to latin1, you have to encode unicod to that.
>
> I suppose I won't notice much of that until I read from the DB (which is
> done in PHP mostly), as the data inserted is already an ascii string by
> itself (with escaped utf-8 characters, though). I'll worry about that
> later ;)

Well, AFAIK php doesn't care about unicode - all it knows are strings as
byte sequences, plain old C-style. So if you read from the DB, things should
work if you set your HTTP header variables correctly _and_ the other parts of
your html page aren't in a different encoding - so make sure that what you
type in your editor of choice is saved as utf-8 data.
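
What goes wrong when the declared encoding and the stored bytes disagree can be sketched in a few lines. This is Python 3 syntax (an assumption; the thread uses Python 2) and only imitates a reader interpreting the bytes; it involves neither PHP nor a browser.

```python
stored = u'\xc8'.encode('utf-8')  # the utf-8 bytes that ended up in the DB

# A reader that assumes the right encoding gets the one character back.
right = stored.decode('utf-8')

# A reader that assumes latin1 sees two characters instead of one.
wrong = stored.decode('latin-1')

print(repr(right), repr(wrong))
```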

--
Regards,

Diez B. Roggisch

Jul 18 '05 #5
