Storing some Japanese data.

tony.pahl

We are converting a data warehouse to a Unicode database to get ready
for multilingual support. About 95% of our data is in English, as it is
today, and less than 5% will be in other languages, including Japanese,
so it appears we would be best off using code page 1208 (UTF-8). We are
thinking we would need to expand our 'char' and 'varchar' datatypes by
four times to accommodate the Japanese data. Using 'varchar' should
minimize the database size required to store our data, right? We would
be minimizing the storage required for the single-byte English
characters that make up the majority of our data. Can anyone validate
this or shed some further light on it?
Thanks, Tony
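
To make the sizing question concrete, here is a minimal Python sketch of how many bytes UTF-8 needs per character. The helper name `utf8_bytes_per_char` and the sample strings are illustrative assumptions, not anything taken from DB2. It only shows that ASCII letters encode to 1 byte each, common Japanese kana and kanji to 3 bytes each, and characters outside the Basic Multilingual Plane to 4 bytes.

```python
def utf8_bytes_per_char(text):
    """Return a list with the UTF-8 byte count of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hypothetical sample values, chosen only for illustration.
samples = {
    "english": "warehouse",
    # Katakana and kanji from the Basic Multilingual Plane.
    "japanese": "\u30c7\u30fc\u30bf\u5009\u5eab",
    # U+20BB7, a kanji variant from a supplementary plane.
    "supplementary": "\U00020BB7",
}

for label, sample in samples.items():
    sizes = utf8_bytes_per_char(sample)
    print(label, "chars:", len(sample), "bytes:", sum(sizes),
          "max bytes per char:", max(sizes))
```

If the column length is counted in bytes, multiplying by four covers the worst case for any single character, while multiplying by three would cover the common Japanese characters. How a given database counts column length is something to confirm in its own documentation.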

May 10 '06 #1


Rhino


I've never really had anything to do with storing foreign character sets but
I've always understood that Japanese and other ideographic languages are
supposed to use the "graphic" datatypes, namely GRAPHIC, VARGRAPHIC, and
LONG VARGRAPHIC. These are designed for DBCS (Double Byte Character Set)
data and only use twice as much space as English characters, not four times
as much.
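
The trade-off between the two approaches can be sketched in Python by encoding the same text both ways. This is only a rough estimate under stated assumptions: UTF-16 stands in for a two-bytes-per-character graphic column, the helper `encoded_sizes` and the sample strings are hypothetical, and real tables add row and length overhead that is ignored here.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


# Five Japanese characters from the Basic Multilingual Plane (illustrative).
japanese_word = "\u30c7\u30fc\u30bf\u5009\u5eab"

# 100 characters, 95% ASCII and 5% Japanese, mirroring the mix in the question.
mostly_english = "a" * 95 + japanese_word
# 100 characters, all Japanese.
all_japanese = japanese_word * 20

print("mostly English:", encoded_sizes(mostly_english))
print("all Japanese:  ", encoded_sizes(all_japanese))
```

Under these assumptions the mostly English text is much smaller in UTF-8, while the all-Japanese text is smaller at two bytes per character, so the better choice depends on the actual mix of data in each column.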

As I said, I've never really worked with Japanese, Korean, Thai or other
non-Latin languages and I'm not very current on the preferred ways of
handling them. It's quite possible that Unicode is the better way to handle
that sort of data today.

I think you should be able to find more information on the best ways of
handling foreign character sets, like Japanese, in the Information Center
for your version of DB2. The Information Center for DB2 Version 8 for
Unix/Linux/Windows can be found at
http://publib.boulder.ibm.com/infoce...w/v8/index.jsp. I searched on
"DBCS" and came up with lots of hits that discussed DBCS, UTF-8, and other
approaches, including this chart:

Table 50. Japan, territory identifier: JP

Code page  Group  Code set   Territory code  Locale        Operating system
932        D-1    IBM-932    81              Ja_JP         AIX
943        D-1    IBM-943    81              Ja_JP         AIX (see note 2)
954        D-1    IBM-eucJP  81              ja_JP         AIX
1208       N-1    UTF-8      81              JA_JP         AIX
930        D-1    IBM-930    81              -             Host
939        D-1    IBM-939    81              -             Host
5026       D-1    IBM-5026   81              -             Host
5035       D-1    IBM-5035   81              -             Host
1390       D-1               81              -             Host
1399       D-1               81              -             Host
954        D-1    eucJP      81              ja_JP.eucJP   HP-UX
5039       D-1    SJIS       81              ja_JP.SJIS    HP-UX
954        D-1    EUC-JP     81              ja_JP         Linux
932        D-1    IBM-932    81              -             OS/2
942        D-1    IBM-942    81              -             OS/2
943        D-1    IBM-943    81              -             OS/2
954        D-1    eucJP      81              ja            SCO
954        D-1    eucJP      81              ja_JP         SCO
954        D-1    eucJP      81              ja_JP.EUC     SCO
954        D-1    eucJP      81              ja_JP.eucJP   SCO
943        D-1    IBM-943    81              ja_JP.PCK     Solaris
954        D-1    eucJP      81              ja            Solaris
954        D-1    eucJP      81              japanese      Solaris
1208       N-1    UTF-8      81              ja_JP.UTF-8   Solaris
943        D-1    IBM-943    81              -             Windows
1394       D-1               81              -             (see note 3)

These are the relevant notes for the table:
2. On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
   earlier, the code page is 932.
3. Code page 1394 (Shift JIS X0213) can only be used with the load or
   import utilities to move data from code page 1394 to a DB2 UDB Unicode
   database, or to export from a DB2 UDB Unicode database to code page 1394.

If you search on terms like "DBCS", "Japanese", "Unicode", "UCS-2", and so
forth, you should find the best ways to store Japanese data.

--
Rhino

May 11 '06 #2
