473,732 Members | 2,205 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Psycopg and queries with UTF-8 data

Another python/psycopg question, for which the solution is probably
quite simple; I just don't know where to look.

I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?

I can't do query.encode('a scii'), that would be similar to:
x = u'\xc8'
print x.encode('ascii ')

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeEr ror: 'ascii' codec can't encode character u'\xc8' in
position 0: ordinal not in range(128)
I also tried setting PostgreSQL's client-encoding by executing "SET
client_encoding TO 'utf-8'", but psycopg still only accepts str-type
strings (which is not really surprising).
I assume that the solution will result in an ascii encoded query string,
and that I then can use the QuotedString type to escape my strings
(which is in my current situation not possible because that also only
accepts str type strings and it contains utf-8 characters).

Regards,
Alban.
Jul 18 '05 #1
4 6397
Alban Hertroys wrote:
I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?
This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
specification of characters, utf-8 as well as latin1 and ascii are
encodings of that specification that allow for certain characters to be
used - namely, ascii for only well-known first 127, latin1 for some major
european languages, and utf-8 defines escapes for all possible characters
defined in unicode - with the result that some of the characters aren't one
byte per character anymore.

So unicode objects encapsulate abstract unicode character sequence - however
they accomplish that is not of your concern. strings on the opposite, are
pure byte sequences - and common libs work with them, with the exception of
the usually unicode aware xml libs. So to yield a string from an unicode
object, one has to specify an encoding - like utf-8 or latin1. Now having a
character in that unicode object that can't be encoded using the specified
encoding, that will produce an error.
Please do read a tutorial on unicode and python - there are several good
ones out there, use google to your advantage.

I can't do query.encode('a scii'), that would be similar to:
>>> x = u'\xc8'
>>> print x.encode('ascii ') Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeEr ror: 'ascii' codec can't encode character u'\xc8' in
position 0: ordinal not in range(128)
Sure- xC8 > 127, so it can't be encoded. Do this:
x = u'\xc8'
x u'\xc8' x.encode('utf-8')

'\xc3\x88'

As you can see, the formerly one byte long character becomes two bytes. The
reason is that on unicode character is translated to that 2-byte sequence
using utf-8.
I also tried setting PostgreSQL's client-encoding by executing "SET
client_encoding TO 'utf-8'", but psycopg still only accepts str-type
strings (which is not really surprising).


Confusion again - please repeat:

unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!

Do encode the unicode object in utf-8, and pass that to the psycopg. If you
set client_encoding to latin1, you have to encode unicod to that.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Alban Hertroys <al***@magprodu ctions.nl> pisze:
I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?

I can't do query.encode('a scii'), that would be similar to:
x = u'\xc8'
print x.encode('ascii ')

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeEr ror: 'ascii' codec can't encode character u'\xc8' in
position 0: ordinal not in range(128)


Did you try x.encode('utf-8')?

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/
Jul 18 '05 #3
Diez B. Roggisch wrote:
Alban Hertroys wrote:
I have a query that inserts data originating from an utf-8 encoded XML
file. And guess what, it contains utf-8 encoded characters...
Now my problem is that psycopg will only accept queries of type str, so
how do I get my utf-8 encoded data into the DB?

This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
specification of characters, utf-8 as well as latin1 and ascii are
encodings of that specification that allow for certain characters to be
used - namely, ascii for only well-known first 127, latin1 for some major
european languages, and utf-8 defines escapes for all possible characters
defined in unicode - with the result that some of the characters aren't one
byte per character anymore.


Ah, I see now. I _thought_ it was odd that unicode('string ') resulted in
a unicode object and 'string'.encode ('utf-8') did not. I understand now
that 'unicode' is data that is actual unicode data, while 'utf-8'
_encoded_ data is really a string, but with special characters rewritten
to specify utf-8 escape sequences instead of the actual unicode bytes.

Thanks for clearing out my confusion.
Please do read a tutorial on unicode and python - there are several good
ones out there, use google to your advantage.
I did, though some time ago. Apparently I missed the point being made
(or forgot about it).
Confusion again - please repeat:

unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!
unicode is not utf-8!!!
while confused():
print "unicode is not utf-8!!!"
Do encode the unicode object in utf-8, and pass that to the psycopg. If you
set client_encoding to latin1, you have to encode unicod to that.


I suppose I won't notice much of that until I read from the DB (which is
done in PHP mostly), as the data inserted is already an ascii string by
itself (with escaped utf-8 characters, though). I'll worry about that
later ;)

Many thanks,
Alban.
Jul 18 '05 #4
> Ah, I see now. I _thought_ it was odd that unicode('string ') resulted in
a unicode object and 'string'.encode ('utf-8') did not. I understand now
that 'unicode' is data that is actual unicode data, while 'utf-8'
_encoded_ data is really a string, but with special characters rewritten
to specify utf-8 escape sequences instead of the actual unicode bytes.
Exactly.

Thanks for clearing out my confusion.
Your welcome.
while confused():
print "unicode is not utf-8!!!"


Lets hope confused() is True only for a short time, otherwise you'll end up
with pretty much output...
Do encode the unicode object in utf-8, and pass that to the psycopg. If
you set client_encoding to latin1, you have to encode unicod to that.


I suppose I won't notice much of that until I read from the DB (which is
done in PHP mostly), as the data inserted is already an ascii string by
itself (with escaped utf-8 characters, though). I'll worry about that
later ;)


Well, AFAIK php doesn't care about unicode - all it knows are strings as
byte sequences, plain old C-style. So if you read from it, things should
work if you set your HTTP header variables correct _and_ other parts of you
html-page aren't made in a different encoding - so make sure typing them in
your editor of choice will yield utf-8 data beeing saved.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
1473
by: Jim Hefferon | last post by:
Hello, I want to try psycopg, the module for Postgres access. I'm having trouble reaching the site for four or five days now. For example, the link on the Python web site's DB Modules page http://initd.org/Software/psycopg times out, as do some other links that Google gives me. Am I looking in the wrong place, or is it just that the site is experiencing some difficulties? Thanks, Jim
0
1124
by: Gandalf | last post by:
Hi All! Every time I get an error psycopg refuses to execute further commands in the same transaction: psycopg.ProgrammingError:ERROR: current transaction is aborted, commands ignored until end of transaction block Why is that? I would like to decide if I want to continue my transaction or not. Isn't it possible?
2
2142
by: Ed Leafe | last post by:
I've been trying to build psycopg, a python adapter for PostgreSQL. In order to build it, you need to have (among other things) the PostgreSQL source code, headers and libraries. Since my Fedora install doesn't include these things, I downloaded the source and ran configure and make. At this point everything should be built and ready to install. I don't want to run make install, since I already have a perfectly good PostgreSQL installation...
12
2620
by: Alban Hertroys | last post by:
Good day, I have a number of threads doing inserts in a table, after which I want to do a select. This means that it will occur that the row that I want to select is locked (by the DB). In these cases, I believe I receive an OperationalError (or is it an InterfaceError?). Is it possible (and if so - how?) to verify that the exception occured because of row locking, so that I can wait and try again?
11
9021
by: Alban Hertroys | last post by:
Oh no! It's me and transactions again :) I'm not really sure whether this is a limitation of psycopg or postgresql. When I use multiple cursors in a transaction, the records inserted at the start of the transaction aren't visible to those later on in that transaction (using a different cursor). Attached is a simplified example (the except's are a bit blunt, I know) of what I'm trying to do. In reality, the different cursors are...
1
1549
by: roderik | last post by:
How do I supress the output generated from each psycopg command: >>> import psycopg initpsycopg: initializing psycopg 1.99.10 typecast_init: initializing NUMBER .. .. microprotocols_add: cast 0x46dd20 for (bool, ?) initpsycopg: module initialization complete
1
1899
by: Eino Mäkitalo | last post by:
I had Visual C++ 6.0, so I compiled those libpq.dll and psycopg.pyd. if there are anyone to play with Windows, Python 2.3 and Postgre-8.0.0-beta4 for windows like me. You cat get those from: http://eino.net/html/python.html Original psycopg source code is available in: http://initd.org/projects/psycopg1
7
8011
by: jslowery | last post by:
Hello, I'm new to both PostgreSQL and psycopg and I'm trying to connect to my database running on localhost. I have postgres setup to do md5 authentication and this works when using a db admin tool on my local network. For some reason, psycopg fails with IDENT authentication. >>> import psycopg >>> psycopg.connect("dbname=jlowery user=jlowery host=localhost password=XXX") Traceback (most recent call last): File "<stdin>", line 1, in ?
4
3231
by: Michele Simionato | last post by:
Look at this example: >>> import psycopg >>> psycopg.__version__ '1.1.19' >>> import datetime >>> today = datetime.datetime.today() >>> co = psycopg.connect('') >>> cu = co.cursor()
2
4367
by: Martin P. Hellwig | last post by:
Hi all, I'm playing a bit with PostgreSQL, in which I've set me the target to create a python script which with user input creates a new user role and a database with that owner (connecting to template1 since I know that at least that db exists). Ok so I installed PostGreSQL and pygresql since it looked like that this is endorsed by PG, I had some trouble with the DB-API2 (complains about there is already a connection to template1,...
0
8946
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9447
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
9235
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9181
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
6031
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4550
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4809
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3261
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2180
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.