Storing some Japanese data.

We are converting a data warehouse to a Unicode database to get ready
for multilingual support. If 95% of our data stays in English, as it is
today, and less than 5% is in other languages, including Japanese, it
appears we would be best off using code page 1208 (UTF-8). We are thinking
we would need to expand our 'char' and 'varchar' datatypes by four times
to accommodate the Japanese data. Using 'varchar' should minimize the
database size
required to store our data, right? We would be minimizing storage
required for single byte English characters which are the majority of
our data. Can anyone validate or shed some further light on this?
Thanks, Tony
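
One way to check the "four times" sizing assumption is to measure UTF-8 byte lengths directly. The sketch below is plain Python rather than DB2 code; the sample strings and the helper name `utf8_len` are made up for illustration, and it assumes the column length is declared in bytes.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative sample values, not taken from any real warehouse.
samples = {
    "english": "warehouse",      # ASCII letters
    "japanese": "日本語データ",    # common kanji and katakana
    "rare_kanji": "𠮷",           # a character outside the Basic Multilingual Plane
}

for name, text in samples.items():
    chars = len(text)
    size = utf8_len(text)
    print(f"{name}: {chars} chars, {size} bytes, {size / chars:.1f} bytes/char")
```

Common Japanese characters take 3 bytes each in UTF-8 and only supplementary characters take 4, so a fourfold expansion is a safe upper bound. English text still takes 1 byte per character, which is why a variable-length column keeps the mostly English data compact.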

May 10 '06 #1

I've never really had anything to do with storing foreign character sets but
I've always understood that Japanese and other ideographic languages are
supposed to use the "graphic" datatypes, namely GRAPHIC, VARGRAPHIC, and
LONG VARGRAPHIC. These are designed for DBCS (Double Byte Character Set)
data and only use twice as much space as English characters, not four times
as much.
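
To put rough numbers on the "twice versus four times" comparison, the sketch below measures the same text in UTF-8 and in a fixed 2-byte encoding. UTF-16 is used here as a stand-in for double-byte graphic storage; whether GRAPHIC columns in a DB2 Unicode database use exactly this encoding is an assumption to verify in the Information Center. The 95/5 sample value and the helper name `encoded_sizes` are hypothetical.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Hypothetical 100-character value: 95 English letters plus 5 kana.
mixed = "a" * 95 + "あ" * 5
japanese = "日本語データ"

for label, value in (("mixed 95/5", mixed), ("all Japanese", japanese)):
    utf8_bytes, utf16_bytes = encoded_sizes(value)
    print(f"{label}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

Under these assumptions the 2-byte encoding is smaller only when the text is mostly Japanese. For data that is 95% English it roughly doubles the storage, while UTF-8 stays close to one byte per character.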

As I said, I've never really worked with Japanese, Korean, Thai or other
non-Latin languages and I'm not very current on the preferred ways of
handling them. It's quite possible that Unicode is the better way to handle
that sort of data today.

I think you should be able to find more information on the best ways of
handling foreign character sets, like Japanese, in the Information Center
for your version of DB2. The information center for DB2 Version 8 for
Unix/Linux/Windows can be found at
http://publib.boulder.ibm.com/infoce...w/v8/index.jsp. I searched on
DBCS and came up with lots of hits that discussed DBCS, UTF-8, and other
approaches, including this chart:

Table 50. Japan, territory identifier: JP

Code page  Group  Code set   Territory code  Locale       Operating system  Note
---------  -----  ---------  --------------  -----------  ----------------  ----------
932        D-1    IBM-932    81              Ja_JP        AIX
943        D-1    IBM-943    81              Ja_JP        AIX               See note 2
954        D-1    IBM-eucJP  81              ja_JP        AIX
1208       N-1    UTF-8      81              JA_JP        AIX
930        D-1    IBM-930    81              -            Host
939        D-1    IBM-939    81              -            Host
5026       D-1    IBM-5026   81              -            Host
5035       D-1    IBM-5035   81              -            Host
1390       D-1               81              -            Host
1399       D-1               81              -            Host
954        D-1    eucJP      81              ja_JP.eucJP  HP-UX
5039       D-1    SJIS       81              ja_JP.SJIS   HP-UX
954        D-1    EUC-JP     81              ja_JP        Linux
932        D-1    IBM-932    81              -            OS/2
942        D-1    IBM-942    81              -            OS/2
943        D-1    IBM-943    81              -            OS/2
954        D-1    eucJP      81              ja           SCO
954        D-1    eucJP      81              ja_JP        SCO
954        D-1    eucJP      81              ja_JP.EUC    SCO
954        D-1    eucJP      81              ja_JP.eucJP  SCO
943        D-1    IBM-943    81              ja_JP.PCK    Solaris
954        D-1    eucJP      81              ja           Solaris
954        D-1    eucJP      81              japanese     Solaris
1208       N-1    UTF-8      81              ja_JP.UTF-8  Solaris
943        D-1    IBM-943    81              -            Windows
1394       D-1               81              -                              See note 3

These are the relevant notes for the table:
2. On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
   earlier, the code page is 932.
3. Code page 1394 (Shift JIS X0213) can only be used with the load or
   import utilities to move data from code page 1394 to a DB2 UDB Unicode
   database, or to export from a DB2 UDB Unicode database to code page 1394.
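
As a rough illustration of how the choice of code page affects storage, the sketch below encodes one short Japanese string with three Python codecs. Pairing Python's `cp932`, `euc_jp`, and `utf-8` codecs with code pages 943, 954, and 1208 is an approximation made for this example, not an exact IBM mapping, and the sample string is arbitrary.

```python
# Python codec names used as rough stand-ins for the table's code pages.
# These pairings are assumptions for illustration, not exact IBM mappings.
CODECS = {
    "943 (Shift JIS family)": "cp932",
    "954 (EUC-JP)": "euc_jp",
    "1208 (UTF-8)": "utf-8",
}

text = "日本語"

# Byte length of the same three characters under each encoding.
sizes = {label: len(text.encode(codec)) for label, codec in CODECS.items()}
for label, size in sizes.items():
    print(f"code page {label}: {size} bytes for {len(text)} characters")
```

For common kanji like these, the double-byte code pages use 2 bytes per character and UTF-8 uses 3, which matches the D-1 (double-byte) versus N-1 grouping in the table only loosely, so treat the numbers as a sanity check rather than a sizing rule.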

If you search on terms like "DBCS", "Japanese", "Unicode", "UCS-2", and so
forth, you should find the best ways to store Japanese data.

--
Rhino
May 11 '06 #2
