
Confused with db client encoding

Hi,

Here is the output of a psql session. Please note that the indentation
inconsistencies in the records containing non-ASCII chars are exactly as
output by psql.

The db was created with LATIN9 and the console was run (on the same
machine) using UTF-8 (my system's default).
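For reference, the same word is a different byte sequence in the two encodings involved here. A minimal Python sketch, assuming Python's `iso8859_15` codec is a fair stand-in for PostgreSQL's LATIN9:

```python
# The same accented word as raw bytes in the two encodings involved:
# LATIN9 (ISO-8859-15, the database encoding) and UTF-8 (the console's).
word = "Transacções"

latin9_bytes = word.encode("iso8859_15")  # one byte per character
utf8_bytes = word.encode("utf-8")         # ç and õ take two bytes each

print(len(word), len(latin9_bytes), len(utf8_bytes))  # 11 11 13
print(latin9_bytes.hex())
print(utf8_bytes.hex())
```

A UTF-8 console therefore has to declare UTF-8 (UNICODE in 7.4) as its client encoding for the backend to know that C3 A7 is one character, "ç", rather than two.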

I was surprised to notice that setting the client encoding to unicode
(which is what that console is using) garbled the localized chars, since I
was expecting exactly the opposite.

On the other hand, when querying from a Java app running on the same
machine, the accented chars also appeared garbled.

Have I misunderstood the manual? How can I get consistent behaviour?

It was tested on a Debian/unstable box, running PostgreSQL 7.4.5-3 and
Sun's JVM 1.4.2.

Thanks,

Carlos

psql session:
-----
mpb2-m16e=# \l
List of databases
Name | Owner | Encoding
-----------+----------+----------
mpb2-test | carlos | LATIN9
template0 | postgres | LATIN9
template1 | postgres | LATIN9
(3 rows)

mpb2-m16e=# select tipo_doc_id, nome, descricao from tab_tipo_doc where tipo_doc_id < 100;
tipo_doc_id | nome | descricao
-------------+----------------------+---------------------------------------
0 | | (documento desconhecido)
1 | Encomenda | Encomendas
2 | Factura | Facturas
3 | Tx. Dinheiro | Transacções a Dinheiro
11 | Nota de Crédito | Notas de Crédito
12 | Nota de Débito | Notas de Débito
21 | G. Remessa | Guia de Remessa
91 | Saída Armazém | Saídas de Armazém
92 | Ent. Armazém | Entradas em Armazém
5 | Devolução | Devoluções de Facturas/Tx. Dinheiro
99 | Acerto Inv. | Acerto de Inventário
51 | O.T. | Ordens de Trabalho
(12 rows)

mpb2-m16e=# set client_encoding to unicode;
SET
mpb2-m16e=# select tipo_doc_id, nome, descricao from tab_tipo_doc where tipo_doc_id < 100;
tipo_doc_id | nome | descricao
-------------+----------------------+---------------------------------------
0 | | (documento desconhecido)
1 | Encomenda | Encomendas
2 | Factura | Facturas
3 | Tx. Dinheiro | Transacções a Dinheiro
11 | Nota de Crédito | Notas de Crédito
12 | Nota de Débito | Notas de Débito
21 | G. Remessa | Guia de Remessa
91 | SaÃ*da Armazém | SaÃ*das de Armazém
92 | Ent. Armazém | Entradas em Armazém
5 | Devolução | Devoluções de Facturas/Tx. Dinheiro
99 | Acerto Inv. | Acerto de Inventário
51 | O.T. | Ordens de Trabalho
(12 rows)



Nov 23 '05 #1
On Mon, 06 Sep 2004 00:02:24 +0100, Carlos Correia <ca****@m16e.com> wrote:
> (...)
> 3 | Tx. Dinheiro | Transacções a Dinheiro
> 11 | Nota de Crédito | Notas de Crédito
> 12 | Nota de Débito | Notas de Débito
> 21 | G. Remessa | Guia de Remessa


It looks like this data was entered as UTF-8 while the client encoding
was set to LATIN9 (or another LATINx encoding), so the two bytes that make
up each accented character in UTF-8 were interpreted by the backend as two
separate LATINx characters.
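This can be checked with a small Python model. It is a sketch under simplifying assumptions, not PostgreSQL's conversion code: the helper names are hypothetical, Python's `iso8859_15` codec stands in for LATIN9, and `client_encoding` UNICODE is modelled as plain UTF-8 encoding on output.

```python
# Simplified model (hypothetical helpers, not PostgreSQL code) of a UTF-8
# console talking to a LATIN9 database while client_encoding says LATIN9.

def store_with_latin9_client(typed: str) -> str:
    """Text as the LATIN9 database ends up holding it."""
    from_console = typed.encode("utf-8")      # what the console really sends
    return from_console.decode("iso8859_15")  # each byte taken as one LATIN9 char

def select_with_client_encoding(stored: str, client_encoding: str) -> bytes:
    """Bytes the backend sends back for the given client encoding."""
    return stored.encode(client_encoding)

for typed in ["Transacções", "Saída"]:
    stored = store_with_latin9_client(typed)
    # LATIN9 client: the original UTF-8 bytes come back, so the console looks right.
    print(select_with_client_encoding(stored, "iso8859_15").decode("utf-8"))
    # UNICODE client: each stored char is converted again, giving double-encoded UTF-8.
    print(repr(select_with_client_encoding(stored, "utf-8").decode("utf-8")))
```

For "Saída" the UTF-8 bytes of "í" are C3 AD, and AD is the invisible soft hyphen in LATIN9, which is probably what shows up as `*` in the `SaÃ*da` rows of the second query.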

Test case (session in a UTF-8 environment):

test=# CREATE DATABASE ctest encoding 'LATIN1';
CREATE DATABASE
test=# \c ctest;
You are now connected to database "ctest".
ctest=# CREATE TABLE coding (data TEXT);
CREATE TABLE
ctest=# SET client_encoding TO LATIN1;
SET
ctest=# INSERT INTO coding VALUES('müller');
INSERT 349960 1
ctest=# SELECT * FROM coding;
data
---------
müller
(1 row)

ctest=# SET client_encoding TO UNICODE;
SET
ctest=# SELECT * FROM coding;
data
---------
mÃ¼ller
(1 row)
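The session above can be replayed with a toy model. The two functions are hypothetical stand-ins, not PostgreSQL APIs: they assume the backend decodes incoming bytes using the declared `client_encoding` and encodes stored text with it on the way out, with Python's `latin-1` and `utf-8` codecs standing in for LATIN1 and UNICODE.

```python
# Toy replay of the test case (hypothetical helpers, not PostgreSQL APIs):
# a LATIN1 database used from a UTF-8 terminal.

def store(sent: bytes, client_encoding: str) -> str:
    """Backend decodes the incoming bytes using the declared client encoding."""
    return sent.decode(client_encoding)

def select(stored: str, client_encoding: str) -> bytes:
    """Backend encodes the stored text using the declared client encoding."""
    return stored.encode(client_encoding)

sent = "müller".encode("utf-8")  # the terminal sends C3 BC for "ü"

# As in the session: INSERT with client_encoding LATIN1.
wrong = store(sent, "latin-1")                    # 7 chars: 'mÃ¼ller'
print(select(wrong, "latin-1").decode("utf-8"))   # müller (bytes passed through)
print(select(wrong, "utf-8").decode("utf-8"))     # mÃ¼ller (converted twice)

# Consistent setup: declare UNICODE before the INSERT as well.
right = store(sent, "utf-8")                      # 6 chars: 'müller'
print(select(right, "utf-8").decode("utf-8"))     # müller

# Rows already stored the wrong way: undo the extra decoding step.
print(wrong.encode("latin-1").decode("utf-8"))    # müller
```

The last line assumes every affected value was double-encoded; on text that was stored correctly, `decode("utf-8")` would typically raise `UnicodeDecodeError`, so such a repair would have to be applied selectively. For a LATIN9 database the same idea would use the `iso8859_15` codec instead of `latin-1`.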

Ian Barwick


Nov 23 '05 #2
