Chinese and Japanese characters in same collation

SQL 2000, latest SP. We currently have the need to store data from a
UTF-8 application in multiple languages in a single database.

Our findings thus far support the fact that single-byte and
double-byte characters can be held in the same DB without issue.
However, when holding two sets of DIFFERING double-byte characters
(i.e. Chinese and Japanese) there are issues.

Since Japanese has a superset of both Kanji and Katakana characters
it's our theory that the Japanese collations will hold Chinese as well
(Mandarin).

1) Has anybody tried to store multiple languages in the same db? What
collation was used?

2) Is it possible to change collation by table?

3) Which collation of Japanese should be used for best multibyte,
UTF-8 character sets? Currently we're testing with Japanese_CI_AS
(encoding MS932).

Any and all responses appreciated,

ga**@shimanoweb.com
Jul 20 '05 #1
GPenn (gb****@yahoo.com) writes:
> SQL 2000, latest SP. We currently have the need to store data from a
> UTF-8 application in multiple languages in a single database.

You cannot store UTF-8 data in an SQL Server database. But UTF-8 is
just an encoding form of Unicode, and in SQL Server you store Unicode
data as UTF-16.
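
To make the encoding point concrete, here is a minimal Python sketch of the conversion a client layer would perform between the application and the database. The sample string and the function name `utf8_to_utf16le` are assumptions made for illustration; in practice the database driver normally handles this step when the data is bound to an nvarchar parameter.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no BOM)."""
    # Decoding yields Unicode code points; only the byte layout changes.
    return data.decode("utf-8").encode("utf-16-le")


# Hypothetical sample: two kanji, a space, four katakana.
sample_utf8 = "漢字 カタカナ".encode("utf-8")
sample_utf16 = utf8_to_utf16le(sample_utf8)

print(len(sample_utf8))   # 19 bytes: 6 characters x 3 bytes + 1 for the space
print(len(sample_utf16))  # 14 bytes: 7 characters x 2 bytes

# The text itself survives the round trip unchanged.
assert sample_utf16.decode("utf-16-le") == sample_utf8.decode("utf-8")
```

The characters are the same in both forms; only the storage size and byte layout differ.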
> Since Japanese has a superset of both Kanji and Katakana characters
> it's our theory that the Japanese collations will hold Chinese as well
> (Mandarin).

Yes, Unicode unifies the Japanese and Chinese ideographs. The idea is
that if they look different, that is a font and presentation issue.
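
A small Python sketch of what unification means in practice, using the standard `unicodedata` module. The helper name `describe` and the sample characters are assumptions chosen for illustration. An ideograph such as 漢 has one code point whether it appears in Japanese or in traditional Chinese text, while kana sit in their own block. Note that simplified Chinese forms (for example 汉) are separate code points, not font variants.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and the Unicode name of one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch))


# 漢 is shared by Japanese and traditional Chinese; 汉 is the simplified form;
# カ is a katakana letter.
for ch in "漢汉カ":
    print(ch, *describe(ch))
```

Because the stored code points are identical across languages, the collation affects how the column sorts and compares, not which characters an nvarchar column can hold.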
> 1) Has anybody tried to store multiple languages in the same db? What
> collation was used?
>
> 2) Is it possible to change collation by table?

In SQL Server you can have different collations on different columns,
so you could have

chinese_text nvarchar(23) COLLATE <some Chinese collation>
japanese_text nvarchar(23) COLLATE Japanese_xx_xx

Whether this is a good idea depends on your application.

> 3) Which collation of Japanese should be used for best multibyte,
> UTF-8 character sets? Currently we're testing with Japanese_CI_AS
> (encoding MS932).

That is definitely not my field of expertise, but beware that there
are also width-sensitive and kana-sensitive variations.
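
To illustrate the distinctions those variations are about, here is a hedged Python sketch. It is not SQL Server's comparison algorithm: `fold_width` and `fold_kana` are hypothetical helpers that only show which characters a width-insensitive or kana-insensitive comparison would treat as equal.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Map half-width and full-width variants to one form via NFKC."""
    return unicodedata.normalize("NFKC", text)


def fold_kana(text: str) -> str:
    """Map hiragana (U+3041..U+3096) to the matching katakana (offset 0x60)."""
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


half_ka = "\uff76"  # ｶ  half-width katakana KA
full_ka = "\u30ab"  # カ  full-width katakana KA
hira_ka = "\u304b"  # か  hiragana KA

# Width: distinct code points that become equal once width is folded.
assert half_ka != full_ka and fold_width(half_ka) == full_ka

# Kana type: hiragana and katakana become equal once kana type is folded.
assert hira_ka != full_ka and fold_kana(hira_ka) == full_ka
```

With a width-sensitive or kana-sensitive collation these pairs compare as different, which matters for unique constraints and searches on Japanese text.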

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2