multibyte support

Running postgresql-7.3.2-3 which came with Red Hat 9.0.

Created a database with unicode encoding (in psql) as below:

create database leatherlink with encoding='unicode' template=leatherlinkdb;

leatherlinkdb is an existing database with the default encoding SQL_ASCII.

When I insert Chinese strings into the database, they are accepted and displayed
back properly. But there is an issue:

In a varchar(100) field, about 15 characters fill up the whole space. Looking
at the database entry using psql shows the characters as hexadecimal values.

The documentation mentions that versions 7.3 and later have multibyte support by
default. How do I configure the database to accept and store multibyte
characters?

--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #1
Ma Siva Kumar wrote:
> Running postgresql-7.3.2-3 which came with Red Hat 9.0.
>
> Created a database with unicode encoding (in psql) as below:
>
> create database leatherlink with encoding='unicode' template=leatherlinkdb;
>
> leatherlinkdb is an existing database with the default encoding SQL_ASCII.
>
> When I insert Chinese strings into the database, they are accepted and displayed
> back properly. But there is an issue:
>
> In a varchar(100) field, about 15 characters fill up the whole space. Looking
> at the database entry using psql shows the characters as hexadecimal values.
>
> The documentation mentions that versions 7.3 and later have multibyte support by
> default. How do I configure the database to accept and store multibyte
> characters?


This is something I've been wondering about for quite a while: does
pgsql measure bytes or chars when using UTF for varchars? It looks like
bytes, which is counterintuitive. What are the byte codes for those 15
chars? I think the maximum byte length of a UTF char is either 5 or 6
bytes. Since there are SO many Chinese people in the world, and Chinese
should be either popular or getting popular in the computer world, I
would have thought that the Unicode Consortium would have placed Chinese at a
point in the tables where it required only 2, 3, or 4 bytes max, and put
more obscure languages up in the 5 to 6 byte part of the table.
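
For reference, the byte length of a character in UTF-8 can be checked directly. The sketch below uses Python (not a tool used in this thread), and the sample characters are chosen purely for illustration. It shows that common Chinese characters occupy 3 bytes each, and that UTF-8 as currently defined in RFC 3629 uses at most 4 bytes per character, although the original design allowed sequences of up to 6 bytes.

```python
# Byte length of individual characters when encoded as UTF-8.
# The sample characters are illustrative, not taken from the thread.
samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter with acute accent",
    "在": "common Chinese character",
    "\U00020000": "CJK Extension B character (outside the BMP)",
}


def utf8_len(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {utf8_len(ch)} byte(s)")
```

So a 15-character Chinese string needs about 45 bytes in UTF-8, which by itself does not explain a varchar(100) filling up.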

--
"You are behaving like a man"
is a compliment from a good woman.

Nov 12 '05 #2
On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
> This is something I've been wondering about for quite a while: does
> pgsql measure bytes or chars when using UTF for varchars? It looks like
> bytes, which is counterintuitive. What are the byte codes for those 15
> chars? I think the maximum byte length of a UTF char is either 5 or 6
> bytes. Since there are SO many Chinese people in the world, and Chinese
> should be either popular or getting popular in the computer world, I
> would have thought that the Unicode Consortium would have placed Chinese at a
> point in the tables where it required only 2, 3, or 4 bytes max, and put
> more obscure languages up in the 5 to 6 byte part of the table.


在您的系统中直接获 (entered through an HTML form processed by a PHP script) shows as
在您的系统 when seen with psql. Anything more
than this is rejected for lack of space (the size is varchar(100)).

If someone can throw more light on this, I will be grateful.

Best regards
Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #3
Ma Siva Kumar <si**@leatherlink.net> writes:
> On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
>> This is something I've been wondering about for quite a while: does
>> pgsql measure bytes or chars when using UTF for varchars? It looks like
>> bytes, which is counterintuitive.

The measurement is certainly in characters, in 7.3 and later. In 7.2 it
was in characters if you'd enabled multibyte. Once upon a time it was
in bytes, but I don't believe that applies to Ma Siva Kumar's problem.

> 在您的系统中直接获 (entered through an HTML form processed by a PHP script) shows as
> 在您的系统 when seen with psql. Anything more
> than this is rejected for lack of space (the size is varchar(100)).


I think there is some confusion between you and the database about
character set encoding. Double check what the database encoding is
(psql \l will tell you). And double check what the system thinks the
client-side encoding is ("show client_encoding" and/or \encoding).
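
To illustrate the kind of confusion described here: if the UTF-8 bytes of a string are interpreted as a single-byte encoding somewhere between the client and the database, every Chinese character is counted as three characters, and a varchar limit measured in characters is reached much sooner than expected. A minimal Python sketch, assuming Latin-1 as the mistaken encoding (the thread does not say what actually happened to the data); the helper `fits` is hypothetical and only imitates a varchar(100) length check:

```python
LIMIT = 100  # varchar(100), measured in characters


def fits(value: str, limit: int = LIMIT) -> bool:
    """Imitate a varchar(n) length check that counts characters."""
    return len(value) <= limit


text = "在您的系统" * 7  # 35 Chinese characters

# Assumption for illustration: the UTF-8 bytes are misread as Latin-1,
# so each byte becomes one character.
misread = text.encode("utf-8").decode("latin-1")

print(len(text), fits(text))        # character count when the encoding is understood
print(len(misread), fits(misread))  # character count when each byte is one character
```

The same 35 characters fit comfortably when both sides agree on the encoding and overflow the column when they do not, which is why checking the database encoding and client_encoding is a sensible first step.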

regards, tom lane

Nov 12 '05 #4
On Wednesday 12 Nov 2003 8:40 pm, Tom Lane wrote:
> I think there is some confusion between you and the database about
> character set encoding. Double check what the database encoding is
> (psql \l will tell you). And double check what the system thinks the
> client-side encoding is ("show client_encoding" and/or \encoding).


Thanks for the suggestions. In a psql session, \l shows the encoding of the
database as unicode (in Name, Owner, Encoding form), and both \encoding and
show client_encoding; return unicode.

But it turned out that the problem is not with the database but with the
client application (PHP). When I entered Chinese characters into the database
through the psql client, they WERE stored as Chinese characters and worked as
expected.

I found this out when Mark Rappoport suggested configuring PHP to handle
multibyte strings. The version of PHP I run does not handle the multibyte
strings entered in the forms properly. I need to recompile PHP with
--enable-mbstring (http://www.php.net/manual/en/ref.mbstring.php) to solve
the problem.
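
The difference between byte-oriented and character-aware string handling, which is what mbstring adds to PHP, can be sketched in Python, where `bytes` and `str` play the two roles. This is an analogy rather than PHP code, and the helper name `truncate_chars` is hypothetical:

```python
text = "在您的系统"          # five Chinese characters
raw = text.encode("utf-8")   # the same text as a byte string


def truncate_chars(s: str, limit: int) -> str:
    """Truncate by characters, so a multibyte sequence is never split."""
    return s[:limit]


# Byte-oriented view: a cut at an arbitrary byte offset can land in the
# middle of a character and leave an invalid sequence behind.
cut_bytes = raw[:4]
try:
    cut_bytes.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(len(raw), len(text))      # 15 bytes versus 5 characters
print(split_ok)                 # False: the 4-byte cut split a character
print(truncate_chars(text, 4))  # the character-aware cut stays valid
```

Code that measures or slices form input by bytes will therefore miscount and can corrupt multibyte text, while character-aware functions do not.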

Thanks everyone for the help.
Ma Siva Kumar,
BSG LeatherLink (P) Ltd,

Nov 12 '05 #5
[copying to the list, in case someone else faces a similar situation and is
looking for an answer]

On Thursday 13 Nov 2003 10:46 am, you wrote:
> Thank you for this tip. Will you show us the phpinfo() output,
> appropriately edited for security, even the single line in the php
> section, that shows it is using mb strings when you get done, please?


Hello Dennis,

Here is what I did.

1. Recompiled PHP from the source RPM provided with Red Hat (rpmbuild), using the
following:

./configure --with-pgsql --without-mysql --enable-mbstring --with-apxs2
make
make install

2. Snippets of phpinfo():

Configure Command    './configure' '--with-pgsql' '--without-mysql'
                     '--with-apxs2' '--enable-mbstring'

mbstring
Multibyte (Japanese) Support     enabled

Directive                        Local Value    Master Value
mbstring.detect_order            no value       no value
mbstring.func_overload           0              0
mbstring.http_input              UTF-8          UTF-8
mbstring.http_output             UTF-8          UTF-8
mbstring.internal_encoding       UTF-8          UTF-8
mbstring.substitute_character    no value       no value

3. Since we will be using many languages, I set the encoding to UTF-8 here as
well as in the header files of all my scripts. In addition, I set the
default_charset directive in php.ini to UTF-8.

I guess not all of the above settings are necessary. But it works for me :-)
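
Why the charset declared for the page and the encoding used to read the submitted form need to agree can be sketched with Python's standard library. Here `urllib.parse` only stands in for the form handling that PHP performs, and the field name `comment` and the sample value are hypothetical:

```python
from urllib.parse import parse_qs, urlencode

# A browser that renders the page as UTF-8 submits form fields as
# percent-encoded UTF-8 bytes.
body = urlencode({"comment": "在您的系统"}, encoding="utf-8")

# Server side: decoding with the matching charset restores the text.
good = parse_qs(body, encoding="utf-8")["comment"][0]

# Decoding with a mismatched single-byte charset (assumed here to be
# Latin-1) yields one character per byte instead.
bad = parse_qs(body, encoding="latin-1")["comment"][0]

print(body)
print(len(good), len(bad))
```

Setting UTF-8 consistently in the page headers, in the script's string handling, and in the database client encoding keeps all three stages in agreement.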

Best regards,
Ma SivaKumar

Nov 12 '05 #6
