Storing some Japanese data.

We are converting a data warehouse to a Unicode database to get ready
for multilingual support. If 95% of our data stays in English, as it is
now, and less than 5% is in other languages, including Japanese, it
appears we would be best off using code page 1208 (UTF-8). We are
thinking we would need to expand our 'char' and 'varchar' datatypes by
four times to accommodate the Japanese data. Using 'varchar' should
minimize the database size required to store our data, right? We would
be minimizing the storage required for single-byte English characters,
which are the majority of our data. Can anyone validate or shed some
further light on this?
Thanks, Tony

May 10 '06 #1
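
One way to sanity-check the "four times" sizing question is to measure how many bytes UTF-8 actually needs per character. The following is only a sketch using Python's built-in `str.encode`; the sample characters and the helper name `utf8_len` are illustrative assumptions, not data from the warehouse in question.

```python
# Count the UTF-8 bytes needed for a few illustrative characters.
# The samples are hypothetical and chosen to cover each UTF-8 length.
samples = {
    "ASCII letter": "A",
    "Accented Latin letter": "\u00e9",
    "Hiragana": "あ",
    "Kanji": "漢",
    "Supplementary-plane kanji": "\U00020B9F",
}


def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")
```

Common kana and kanji take three bytes each in UTF-8, and only characters outside the Basic Multilingual Plane take four. Assuming column lengths are counted in bytes, a fourfold expansion covers the worst case, while a threefold expansion would cover most everyday Japanese text.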

I've never really had anything to do with storing foreign character sets but
I've always understood that Japanese and other ideographic languages are
supposed to use the "graphic" datatypes, namely GRAPHIC, VARGRAPHIC, and
LONG VARGRAPHIC. These are designed for DBCS (Double Byte Character Set)
data and only use twice as much space as English characters, not four times
as much.
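
The "twice versus four times" trade-off can be estimated by encoding sample text both ways. This is a rough sketch, assuming a two-byte-per-character encoding (modeled here with Python's `utf-16-be` codec) stands in for double-byte graphic storage; the sample strings and the `encoded_sizes` helper are made up for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


# Hypothetical sample values, not taken from any real table.
english = "Customer order shipped"
japanese = "顧客の注文を出荷しました"

for label, text in (("English", english), ("Japanese", japanese)):
    sizes = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, "
          f"{sizes['utf-8']} bytes UTF-8, {sizes['utf-16']} bytes UTF-16")
```

With these samples, the English text doubles in size under the two-byte encoding, while the Japanese text shrinks by a third compared with UTF-8. For data that is mostly English, that suggests UTF-8 in variable-length columns is likely the smaller choice overall, though the real ratio depends on the actual data.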

As I said, I've never really worked with Japanese, Korean, Thai or other
non-Latin languages and I'm not very current on the preferred ways of
handling them. It's quite possible that Unicode is the better way to handle
that sort of data today.

I think you should be able to find more information on the best ways of
handling foreign character sets, like Japanese, in the Information Center
for your version of DB2. The information center for DB2 Version 8 for
Unix/Linux/Windows can be found at
http://publib.boulder.ibm.com/infoce...w/v8/index.jsp. I searched on
DBCS and came up with lots of hits that discussed DBCS, UTF-8, and other
approaches, including this chart:

Table 50. Japan, territory identifier: JP

Code page  Group  Code set   Territory code  Locale       Operating system
932        D-1    IBM-932    81              Ja_JP        AIX
943        D-1    IBM-943    81              Ja_JP        AIX (see note 2)
954        D-1    IBM-eucJP  81              ja_JP        AIX
1208       N-1    UTF-8      81              JA_JP        AIX
930        D-1    IBM-930    81              -            Host
939        D-1    IBM-939    81              -            Host
5026       D-1    IBM-5026   81              -            Host
5035       D-1    IBM-5035   81              -            Host
1390       D-1               81              -            Host
1399       D-1               81              -            Host
954        D-1    eucJP      81              ja_JP.eucJP  HP-UX
5039       D-1    SJIS       81              ja_JP.SJIS   HP-UX
954        D-1    EUC-JP     81              ja_JP        Linux
932        D-1    IBM-932    81              -            OS/2
942        D-1    IBM-942    81              -            OS/2
943        D-1    IBM-943    81              -            OS/2
954        D-1    eucJP      81              ja           SCO
954        D-1    eucJP      81              ja_JP        SCO
954        D-1    eucJP      81              ja_JP.EUC    SCO
954        D-1    eucJP      81              ja_JP.eucJP  SCO
943        D-1    IBM-943    81              ja_JP.PCK    Solaris
954        D-1    eucJP      81              ja           Solaris
954        D-1    eucJP      81              japanese     Solaris
1208       N-1    UTF-8      81              ja_JP.UTF-8  Solaris
943        D-1    IBM-943    81              -            Windows
1394       D-1               81              -            (see note 3)

These are the relevant notes for the table:
2. On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
earlier, the code page is 932.
3. Code page 1394 (Shift JIS X0213) can only be used with the load or
import utilities to move data from code page 1394 to a DB2 UDB Unicode
database, or to export from a DB2 UDB Unicode database to code page 1394.
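
To get a feel for how the same Japanese text sizes up under the double-byte code pages in the table versus UTF-8, here is a small sketch using Python's standard codecs. The mapping is an assumption: Python's `cp932` codec is used as a stand-in for the Shift JIS family (such as IBM-943) and `euc_jp` for the eucJP code sets, and they are not guaranteed to match the IBM code pages byte for byte. The sample string and the `size_in` helper are hypothetical.

```python
# Hypothetical sample text; every character is in the Basic Multilingual Plane.
text = "日本語データ"

# Assumed stand-ins for the code sets in the table (not exact IBM equivalents).
codecs_to_try = {
    "Shift JIS family (cp932)": "cp932",
    "EUC-JP (euc_jp)": "euc_jp",
    "UTF-8 (code page 1208)": "utf-8",
}


def size_in(codec: str, value: str) -> int:
    """Return the encoded byte length of value in the given Python codec."""
    return len(value.encode(codec))


for label, codec in codecs_to_try.items():
    encoded = text.encode(codec)
    # Round-trip to confirm the codec can represent the sample text.
    assert encoded.decode(codec) == text
    print(f"{label}: {size_in(codec, text)} bytes for {len(text)} characters")
```

For this sample, the legacy double-byte encodings use two bytes per character and UTF-8 uses three, which is the extra cost to budget for when moving Japanese data into a UTF-8 database.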

If you search on terms like "DBCS", "Japanese", "Unicode", "UCS-2", and so
forth, you should find the best ways to store Japanese data.

--
Rhino
May 11 '06 #2
