multibyte support

Running postgresql-7.3.2-3 which came with Red Hat 9.0.

Created a database with unicode encoding (in psql) as below:

create database leatherlink with encoding='unicode' template=leatherlinkdb;

leatherlinkdb is an existing database with the default encoding SQL_ASCII.

When I insert Chinese strings into the database, they are accepted and
displayed back properly. But there is an issue:

In a varchar(100) field, about 15 characters fill up the whole space. Looking
at the database entry using psql shows the characters as hexadecimal values.

The documentation mentions that version 7.3 and greater have multibyte (mb)
support by default. How do I configure the database to accept and store the
multibyte characters?

--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #1
5 Replies


This is something I've been wondering about for quite a while: does
pgsql measure bytes or chars when using UTF for varchars? It looks like
bytes, which is counterintuitive. What are the byte codes for those 15
chars? I think the maximum byte length of a UTF char is either 5 or 6
bytes. Since there are SO many Chinese people in the world, and Chinese
should either be popular or getting popular in the computer world, I
would have thought that the Unicode Consortium would have placed Chinese
at a point in the tables where it only required 2, 3, or 4 bytes max, and
put obscure languages up in the 5-to-6-byte part of the table.
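
On the byte-length question, a minimal Python sketch (not part of the original exchange) can show how many bytes UTF-8 uses per character. The sample characters are chosen only for illustration. Common Chinese characters sit in the Basic Multilingual Plane and take 3 bytes each, and UTF-8 as currently defined (RFC 3629) never uses more than 4 bytes per character.

```python
# Byte length of individual characters when encoded as UTF-8.
# The sample characters below are arbitrary illustrations.
samples = {
    "A": "ASCII letter",
    "é": "accented Latin letter",
    "在": "common Chinese character",
    "𠀀": "rare CJK character outside the Basic Multilingual Plane",
}


def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode ch as UTF-8."""
    return len(ch.encode("utf-8"))


for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} ({desc}): {utf8_len(ch)} byte(s)")

# A whole string: character count versus byte count.
text = "在您的系统"
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count, "characters,", byte_count, "bytes")
```

So even if a length limit were counted in bytes, a varchar(100) would hold roughly 33 such characters, not 15.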

--
"You are behaving like a man",
is compliment from an good woman.

Nov 12 '05 #2

On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
> This is something I've been wondering about for quite a while: does
> pgsql measure bytes or chars when using UTF for varchars? It looks like
> bytes, which is counterintuitive. What are the byte codes for those 15
> chars? I think the maximum byte length of a UTF char is either 5 or 6
> bytes. Since there are SO many Chinese people in the world, and Chinese
> should either be popular or getting popular in the computer world, I
> would have thought that the Unicode Consortium would have placed Chinese
> at a point in the tables where it only required 2, 3, or 4 bytes max, and
> put obscure languages up in the 5-to-6-byte part of the table.


在您的系统*直接获 (entered through an HTML form processed by a PHP script) shows as
在您的系统 when viewed in psql. Anything longer
than this is rejected for lack of space (the column is varchar(100)).

If someone can throw more light on this, I will be grateful.
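
One possible explanation for the symptom, offered only as an assumption since the thread never confirms it: if some layer between the form and the database turned each Chinese character into an HTML numeric character reference, every character would occupy 8 characters of the varchar(100) column. The Python sketch below uses the standard `xmlcharrefreplace` error handler, which produces decimal references, purely to illustrate the arithmetic. It is not the PHP code path involved.

```python
# Assumption for illustration: each non-ASCII character is replaced by
# an HTML numeric character reference of the form "&#NNNNN;".
text = "在您的系统"

# Python's built-in "xmlcharrefreplace" handler performs this replacement.
as_refs = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_refs)

chars_per_ref = len(as_refs) // len(text)  # stored characters per original character
column_limit = 100                         # varchar(100), as in the post
max_chars = column_limit // chars_per_ref  # whole references that fit in the column

print(chars_per_ref, "stored characters per Chinese character")
print(max_chars, "Chinese characters fit in the column")
```

Under that assumption only about a dozen characters fit, which is in the same range as the "about 15" reported.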

Best regards
--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #3

Ma Siva Kumar <si**@leatherlink.net> writes:
> On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
>> This is something I've been wondering about for quite a while: does
>> pgsql measure bytes or chars when using UTF for varchars? It looks like
>> bytes, which is counterintuitive.

The measurement is certainly in characters, in 7.3 and later. In 7.2 it
was in characters if you'd enabled multibyte. Once upon a time it was
in bytes, but I don't believe that applies to Ma Siva Kumar's problem.

> 在您的系统*直接获 (entered through an HTML form processed by a PHP script) shows as
> 在您的系统 when viewed in psql. Anything longer
> than this is rejected for lack of space (the column is varchar(100)).


I think there is some confusion between you and the database about
character set encoding. Double check what the database encoding is
(psql \l will tell you). And double check what the system thinks the
client-side encoding is ("show client_encoding" and/or \encoding).
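
To illustrate why a disagreement about encoding matters for length limits, here is a small Python sketch. It assumes, hypothetically, that UTF-8 bytes sent by the client are interpreted as a single-byte encoding somewhere along the way (Latin-1 is used here only as a stand-in). Each 3-byte Chinese character is then counted as 3 separate characters.

```python
# What the user typed: 5 Chinese characters.
text = "在您的系统"

# What travels over the wire when the client encodes as UTF-8.
raw = text.encode("utf-8")

# Hypothetical mismatch: the receiving side treats the same bytes as a
# single-byte encoding. Latin-1 maps every byte to exactly one character.
misread = raw.decode("latin-1")

print(len(text), "characters as typed")
print(len(raw), "bytes on the wire")
print(len(misread), "characters as counted by the mismatched side")

# The data is not lost, only mislabeled: reversing the steps recovers it.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)
```

When both sides agree on the encoding, the same string counts as 5 characters, which is the behavior described above for 7.3 and later.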

regards, tom lane

Nov 12 '05 #4

On Wednesday 12 Nov 2003 8:40 pm, Tom Lane wrote:
> I think there is some confusion between you and the database about
> character set encoding. Double check what the database encoding is
> (psql \l will tell you). And double check what the system thinks the
> client-side encoding is ("show client_encoding" and/or \encoding).


Thanks for the suggestions. In a psql session, \l shows the encoding of the
database as unicode (in Name, Owner, Encoding form), and both \encoding and
show client_encoding; return unicode.

But it turned out that the problem is not with the database but with the
client application (PHP). When I entered Chinese characters into the database
through the psql client, they WERE stored as Chinese characters and worked as
expected.

I found this out when Mark Rappoport suggested configuring PHP to handle
multibyte strings. The version of PHP I run does not handle multibyte
strings entered in forms properly. I need to recompile PHP with
--enable-mbstring (http://www.php.net/manual/en/ref.mbstring.php) to solve
the problem.

Thanks everyone for the help.
--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #5

[copying to the list, in case someone else faces a similar situation and is
looking for an answer]

On Thursday 13 Nov 2003 10:46 am, you wrote:
> Thank you for this tip. Will you show us the phpinfo() output,
> appropriately edited for security, even the single line in the php
> section, that shows it is using mb strings when you get done, please?


Hello Dennis,

Here is what I did.

1. Recompiled PHP from the source RPM provided with Red Hat (rpmbuild), using
the following:

./configure --with-pgsql --without-mysql --enable-mbstring --with-apxs2
make
make install

2. Snippets of phpinfo():

Configure Command './configure' '--with-pgsql' '--without-mysql'
'--with-apxs2' '--enable-mbstring'

mbstring
Multibyte (Japanese) Support    enabled

Directive                       Local Value   Master Value
mbstring.detect_order           no value      no value
mbstring.func_overload          0             0
mbstring.http_input             UTF-8         UTF-8
mbstring.http_output            UTF-8         UTF-8
mbstring.internal_encoding      UTF-8         UTF-8
mbstring.substitute_character   no value      no value

3. Since we will be using many languages, I set the encoding to UTF-8 here as
well as in the header files of all my scripts. In addition, I set the
default_charset directive in php.ini to UTF-8.

I guess not all of the above settings are necessary. But it works for me :-)

Best regards,
Ma SivaKumar
--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #6