Psycopg and queries with UTF-8 data

Another python/psycopg question, for which the solution is probably
quite simple; I just don't know where to look.

I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?

I can't do query.encode('ascii'), that would be similar to:

>>> x = u'\xc8'
>>> print x.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
position 0: ordinal not in range(128)

I also tried setting PostgreSQL's client-encoding by executing "SET
client_encoding TO 'utf-8'", but psycopg still only accepts str-type
strings (which is not really surprising).
I assume that the solution will result in an ascii encoded query string,
and that I then can use the QuotedString type to escape my strings
(which is in my current situation not possible because that also only
accepts str type strings and it contains utf-8 characters).

Regards,
Alban.
Jul 18 '05 #1
Alban Hertroys wrote:
> I have a query that inserts data originating from an utf-8 encoded XML
> file. And guess what, it contains utf-8 encoded characters...
> Now my problem is that psycopg will only accept queries of type str, so
> how do I get my utf-8 encoded data into the DB?

This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
specification of characters, while utf-8, latin1 and ascii are encodings of
that specification, each able to represent a certain set of characters.
ascii covers only the first 128 code points (0-127), latin1 covers the first
256, which is enough for some major western european languages, and utf-8 can
encode every character defined in unicode, with the result that some
characters take more than one byte.

So unicode objects encapsulate an abstract sequence of unicode characters; how
they accomplish that is not your concern. Strings, on the other hand, are
pure byte sequences, and common libs work with them, with the exception of
the usually unicode-aware xml libs. So to get a string from a unicode
object, one has to specify an encoding, like utf-8 or latin1. If the unicode
object contains a character that can't be represented in the specified
encoding, encoding it will produce an error.
Please do read a tutorial on unicode and python - there are several good
ones out there, use google to your advantage.
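
For anyone reading this on Python 3, where str is the unicode type and bytes
is the old byte string, the distinction can be sketched roughly like this.
The helper name try_encode is made up for illustration and is not part of
any library:

```python
# Python 3 sketch: str holds abstract characters, bytes holds encoded data.
text = "\u00c8"  # the character U+00C8, written u'\xc8' in the thread


def try_encode(s, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent s.

    Hypothetical helper, only here to make the comparison readable.
    """
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_result = try_encode(text, "ascii")     # not representable in ascii
latin1_result = try_encode(text, "latin-1")  # one byte in latin1
utf8_result = try_encode(text, "utf-8")      # two bytes in utf-8

print(ascii_result, latin1_result, utf8_result)
```

The same character gives a different byte sequence, or none at all,
depending on which encoding is asked for.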

> I can't do query.encode('ascii'), that would be similar to:
> >>> x = u'\xc8'
> >>> print x.encode('ascii')
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
> position 0: ordinal not in range(128)

Sure, 0xC8 (200) > 127, so it can't be encoded in ascii. Do this:

>>> x = u'\xc8'
>>> x
u'\xc8'
>>> x.encode('utf-8')
'\xc3\x88'

As you can see, the single character becomes two bytes. The reason is that
this one unicode character is translated to that 2-byte sequence by the
utf-8 encoding.
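
A small Python 3 sketch of the same effect, assuming nothing beyond the
built-in str.encode and bytes.decode methods. The sample characters are just
examples picked to show one, two and three byte results:

```python
# How many bytes utf-8 needs depends on the character.
samples = ["A", "\u00c8", "\u20ac"]  # ascii letter, U+00C8, euro sign
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Decoding with the same encoding gives the original character back.
round_trip = "\u00c8".encode("utf-8").decode("utf-8")

# Decoding with a different encoding does not fail here, but it yields
# two wrong characters instead of one: the classic mojibake.
wrong_decode = "\u00c8".encode("utf-8").decode("latin-1")

print(utf8_lengths, repr(round_trip), repr(wrong_decode))
```

This is also why both sides of a connection have to agree on the encoding:
the bytes alone don't say how they should be read.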
> I also tried setting PostgreSQL's client-encoding by executing "SET
> client_encoding TO 'utf-8'", but psycopg still only accepts str-type
> strings (which is not really surprising).


Confusion again - please repeat:

unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!

Do encode the unicode object in utf-8, and pass that to psycopg. If you
set client_encoding to latin1, you have to encode the unicode object to
latin1 instead.
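
As a rough sketch of that rule in Python 3: the helper encode_for_client and
the mapping table below are hypothetical, written only for illustration, and
are not part of psycopg. Newer driver versions may well do this conversion
themselves, so check the driver's documentation before encoding by hand:

```python
# Hypothetical mapping from PostgreSQL client_encoding names to Python codecs.
# Only the two encodings discussed in this thread are listed.
PG_TO_PYTHON_CODEC = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
}


def encode_for_client(text, client_encoding):
    """Encode text so the bytes match the connection's client_encoding.

    Hypothetical helper; raises ValueError for encodings not in the table.
    """
    key = client_encoding.upper().replace("-", "")
    if key not in PG_TO_PYTHON_CODEC:
        raise ValueError("unknown client_encoding: %r" % client_encoding)
    return text.encode(PG_TO_PYTHON_CODEC[key])


value = "\u00c8"
as_utf8 = encode_for_client(value, "utf-8")     # matches SET client_encoding TO 'utf-8'
as_latin1 = encode_for_client(value, "latin1")  # matches a latin1 client_encoding
print(as_utf8, as_latin1)
```

The point is only that the encoding used on the Python side and the
client_encoding set on the connection have to be the same one.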

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Alban Hertroys <al***@magproductions.nl> wrote:
> I have a query that inserts data originating from an utf-8 encoded XML
> file. And guess what, it contains utf-8 encoded characters...
> Now my problem is that psycopg will only accept queries of type str, so
> how do I get my utf-8 encoded data into the DB?
>
> I can't do query.encode('ascii'), that would be similar to:
> >>> x = u'\xc8'
> >>> print x.encode('ascii')
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in
> position 0: ordinal not in range(128)


Did you try x.encode('utf-8')?

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/
Jul 18 '05 #3
Diez B. Roggisch wrote:
> Alban Hertroys wrote:
>> I have a query that inserts data originating from an utf-8 encoded XML
>> file. And guess what, it contains utf-8 encoded characters...
>> Now my problem is that psycopg will only accept queries of type str, so
>> how do I get my utf-8 encoded data into the DB?
>
> This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
> specification of characters, utf-8 as well as latin1 and ascii are
> encodings of that specification that allow for certain characters to be
> used - namely, ascii for only well-known first 127, latin1 for some major
> european languages, and utf-8 defines escapes for all possible characters
> defined in unicode - with the result that some of the characters aren't one
> byte per character anymore.


Ah, I see now. I _thought_ it was odd that unicode('string') resulted in
a unicode object and 'string'.encode('utf-8') did not. I understand now
that 'unicode' is data that is actual unicode data, while 'utf-8'
_encoded_ data is really a string, but with special characters rewritten
to specify utf-8 escape sequences instead of the actual unicode bytes.

Thanks for clearing up my confusion.

> Please do read a tutorial on unicode and python - there are several good
> ones out there, use google to your advantage.

I did, though some time ago. Apparently I missed the point being made
(or forgot about it).

> Confusion again - please repeat:
>
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!

while confused():
    print "unicode is not utf-8!!!"

> Do encode the unicode object in utf-8, and pass that to the psycopg. If you
> set client_encoding to latin1, you have to encode unicode to that.


I suppose I won't notice much of that until I read from the DB (which is
done in PHP mostly), as the data inserted is already an ascii string by
itself (with escaped utf-8 characters, though). I'll worry about that
later ;)

Many thanks,
Alban.
Jul 18 '05 #4
> Ah, I see now. I _thought_ it was odd that unicode('string') resulted in
> a unicode object and 'string'.encode('utf-8') did not. I understand now
> that 'unicode' is data that is actual unicode data, while 'utf-8'
> _encoded_ data is really a string, but with special characters rewritten
> to specify utf-8 escape sequences instead of the actual unicode bytes.

Exactly.

> Thanks for clearing out my confusion.

You're welcome.

> while confused():
>     print "unicode is not utf-8!!!"


Let's hope confused() is True only for a short time, otherwise you'll end up
with quite a lot of output...

>> Do encode the unicode object in utf-8, and pass that to the psycopg. If
>> you set client_encoding to latin1, you have to encode unicode to that.
>
> I suppose I won't notice much of that until I read from the DB (which is
> done in PHP mostly), as the data inserted is already an ascii string by
> itself (with escaped utf-8 characters, though). I'll worry about that
> later ;)


Well, AFAIK php doesn't care about unicode: all it knows are strings as
byte sequences, plain old C-style. So if you read from the DB, things should
work if you set your HTTP headers correctly _and_ other parts of your
html page aren't in a different encoding, so make sure your editor of choice
saves them as utf-8 data.
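
The idea of keeping the header and the page bytes in agreement can be
sketched like this. It is written in Python 3 rather than PHP purely for
illustration, and build_html_response is a made-up helper, not an existing
API:

```python
def build_html_response(body_text, charset="utf-8"):
    """Return a Content-Type value and body bytes that use the same charset.

    Hypothetical helper: the declared charset and the actual encoding of
    the body come from one argument, so they cannot drift apart.
    """
    body_bytes = body_text.encode(charset)
    content_type = "text/html; charset=" + charset
    return content_type, body_bytes


content_type, body = build_html_response("<p>\u00c8</p>")
print(content_type)
print(body)
```

If the header said utf-8 while the page bytes were latin1, or the other way
round, the browser would decode the page with the wrong encoding.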
--
Regards,

Diez B. Roggisch
Jul 18 '05 #5
