
Storing some Japanese data.

We are converting a data warehouse to a Unicode database to prepare
for multilingual support. If about 95% of our data stays in English,
as it is today, and less than 5% is in other languages, including
Japanese, it looks as though we would be best off using code page 1208
(UTF-8). We think we would need to expand our 'char' and 'varchar'
columns to four times their current length to accommodate the Japanese
data. Using 'varchar' should minimize the database size required to
store our data, right? That way we would minimize the storage used by
single-byte English characters, which are the majority of our data.
Can anyone validate this or shed some further light on it?
Thanks, Tony
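
As a rough check on the "four times" figure, here is a minimal Python sketch (not DB2 code) that measures how many bytes UTF-8 needs per character. The sample strings and the helper names `utf8_len` and `max_bytes_per_char` are made up for illustration. Common Japanese characters take 3 bytes in UTF-8, and only supplementary characters (some rare kanji, for example) take 4, so four times looks like a worst-case bound rather than the typical cost.

```python
# Sketch only: measures UTF-8 byte widths for a few illustrative strings.
# The strings and helper names are hypothetical, not taken from any DB2 tool.

def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_bytes_per_char(text: str) -> int:
    """Widest single character in the text, in UTF-8 bytes."""
    return max(len(ch.encode("utf-8")) for ch in text)


samples = {
    "english": "warehouse",      # ASCII only: 1 byte per character
    "japanese": "データ倉庫",      # katakana and common kanji: 3 bytes each
    "rare_kanji": "𠮷野家",       # first character is outside the BMP: 4 bytes
}

for label, text in samples.items():
    print(label, len(text), "chars,", utf8_len(text), "bytes,",
          "widest char:", max_bytes_per_char(text), "bytes")
```

Because a varying-length column stores only the bytes actually used, the English rows would still cost about one byte per character even if the declared maximum length is multiplied to leave room for Japanese.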

May 10 '06 #1
1 Reply




I've never really had anything to do with storing foreign character sets, but
I've always understood that Japanese and other ideographic languages are
supposed to use the "graphic" datatypes, namely GRAPHIC, VARGRAPHIC, and
LONG VARGRAPHIC. These are designed for DBCS (Double Byte Character Set)
data and use two bytes per character, so Japanese text takes only twice as
much space as single-byte English characters, not four times as much.
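
To put rough numbers on the two-bytes-per-character idea, the Python sketch below compares UTF-8 with UTF-16, used here only as a stand-in for a fixed two-byte encoding. Whether that matches how the graphic datatypes are actually stored in a given DB2 Unicode database is an assumption to verify in the Information Center. The function names and the 95/5 character mix are illustrative.

```python
# Sketch only: compares UTF-8 against a two-byte-per-character encoding
# (UTF-16 big-endian, no BOM) as an approximation of double-byte storage.
# Function names and the sample mix are hypothetical.

def encoded_sizes(text: str) -> dict:
    """Character count and byte counts for the two encodings."""
    return {
        "chars": len(text),
        "utf8": len(text.encode("utf-8")),
        "two_byte": len(text.encode("utf-16-be")),
    }


def mixed_total(english_chars: int, japanese_chars: int) -> dict:
    """Byte totals for a synthetic mix of ASCII and hiragana characters."""
    text = "a" * english_chars + "あ" * japanese_chars
    return encoded_sizes(text)


print(encoded_sizes("Tokyo"))    # English: UTF-8 is half the size
print(encoded_sizes("東京都"))    # Japanese: the two-byte form is smaller
print(mixed_total(95, 5))        # mostly English: UTF-8 is smaller overall
```

For text that is 95% English, the one-byte English characters dominate, so the two-byte form ends up noticeably larger overall even though it is more compact for the Japanese portion.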

As I said, I've never really worked with Japanese, Korean, Thai or other
non-Latin languages and I'm not very current on the preferred ways of
handling them. It's quite possible that Unicode is the better way to handle
that sort of data today.

I think you should be able to find more information on the best ways of
handling foreign character sets, like Japanese, in the Information Center
for your version of DB2. The information center for DB2 Version 8 for
Unix/Linux/Windows can be found at
http://publib.boulder.ibm.com/infoce...w/v8/index.jsp. I searched on
DBCS and came up with lots of hits that discussed DBCS, UTF-8, and other
approaches, including this chart:

Table 50. Japan, territory identifier: JP

Code page  Group  Code set   Territory code  Locale       Operating system  Note
932        D-1    IBM-932    81              Ja_JP        AIX
943        D-1    IBM-943    81              Ja_JP        AIX               See note 2.
954        D-1    IBM-eucJP  81              ja_JP        AIX
1208       N-1    UTF-8      81              JA_JP        AIX
930        D-1    IBM-930    81              -            Host
939        D-1    IBM-939    81              -            Host
5026       D-1    IBM-5026   81              -            Host
5035       D-1    IBM-5035   81              -            Host
1390       D-1               81              -            Host
1399       D-1               81              -            Host
954        D-1    eucJP      81              ja_JP.eucJP  HP-UX
5039       D-1    SJIS       81              ja_JP.SJIS   HP-UX
954        D-1    EUC-JP     81              ja_JP        Linux
932        D-1    IBM-932    81              -            OS/2
942        D-1    IBM-942    81              -            OS/2
943        D-1    IBM-943    81              -            OS/2
954        D-1    eucJP      81              ja           SCO
954        D-1    eucJP      81              ja_JP        SCO
954        D-1    eucJP      81              ja_JP.EUC    SCO
954        D-1    eucJP      81              ja_JP.eucJP  SCO
943        D-1    IBM-943    81              ja_JP.PCK    Solaris
954        D-1    eucJP      81              ja           Solaris
954        D-1    eucJP      81              japanese     Solaris
1208       N-1    UTF-8      81              ja_JP.UTF-8  Solaris
943        D-1    IBM-943    81              -            Windows
1394       D-1               81              -                              See note 3.

These are the relevant notes for the table:
1.
2. On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
earlier, the code page is 932.
3. Code page 1394 (Shift JIS X0213) can only be used with the load or
import utilities to move data from code page 1394 to a DB2 UDB Unicode
database, or to export from a DB2 UDB Unicode database to code page 1394.
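
To get a feel for how the legacy Japanese code sets in the chart differ from UTF-8 in size, here is a small Python sketch. It uses Python's own `shift_jis` and `euc_jp` codecs, which are only approximations of the IBM code pages listed above (that mapping is an assumption, not something the chart states), and the helper name `transcode` is made up for illustration.

```python
# Sketch only: Python's "shift_jis" and "euc_jp" codecs stand in for the
# SJIS and eucJP code sets in the chart; they are not guaranteed to match
# the IBM code pages byte for byte.

TEXT = "日本語"  # three common kanji

def transcode(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from a source encoding and re-encode them in the target."""
    return raw.decode(source).encode(target)


for codec in ("shift_jis", "euc_jp", "utf-8"):
    print(codec, len(TEXT.encode(codec)), "bytes")

# Moving legacy data into a UTF-8 store grows these kanji from 2 to 3 bytes each.
legacy = TEXT.encode("shift_jis")
converted = transcode(legacy, "shift_jis")
print(len(legacy), "->", len(converted), "bytes")
```

This is the kind of growth to allow for when sizing columns for data that arrives in a double-byte code page and lands in a UTF-8 database.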

If you search on terms like "DBCS", "Japanese", "Unicode", "UCS-2", and so
forth, you should find the best ways to store Japanese data.

--
Rhino
May 11 '06 #2
