Char in US Code page vs Char in UTF-8

I have a question about moving to UTF-8. We will have to include
double-byte characters in our database, so I created a new database that
is UTF-8. I moved the table structure over and am using VARGRAPHIC in
some places. Other tables do not need to change, so I moved them over
as is. I have data that is being rejected because it says it is too
long.

Specifically the string 'HÄMÄLOUT' is being thrown out of a CHAR(9)
column that was pulled out of a US code page database with the exact
same structure.

Why would a UTF-8 database see these Ä characters as more than one
character?

Nov 12 '05 #1


The "standard" ASCII character set (code points 127, hex 7F, and below) is
represented by single-byte characters in UTF-8. Your string contains two
characters that lie above hex 7F, and each of them takes two bytes (16 bits)
in UTF-8. Its length is therefore 6+(2*2)=10 bytes, one byte too large for
CHAR(9). If you shorten the string by 1 byte and try the insert, it should
work. The actual character displayed by the high ASCII codes is dependent on
the code page in use and can also be modified by altering the character
generator table in your video card. (I recall doing this many years ago.) UDB
will convert two-byte characters to an "appropriate" single-byte character
using your code page to figure out what the translation should be.
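
The byte arithmetic above can be checked outside the database. The following is only a minimal sketch in Python, not anything DB2 itself runs: the helper name byte_length is hypothetical, and the cp437 codec is an assumption standing in for the "US code page", since the thread does not say which code page the source database used.

```python
# Assumption: cp437 stands in for the unnamed single-byte "US code page".
# "H\u00c4M\u00c4LOUT" is the string 'HÄMÄLOUT' from the question.
TEXT = "H\u00c4M\u00c4LOUT"


def byte_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies when stored in `encoding`."""
    return len(text.encode(encoding))


char_count = len(TEXT)                        # 8 characters
single_byte_len = byte_length(TEXT, "cp437")  # 8 bytes: one byte per character
utf8_len = byte_length(TEXT, "utf-8")         # 10 bytes: each Ä takes two bytes

print(char_count, single_byte_len, utf8_len)
```

The string has 8 characters in both cases, but only the UTF-8 form exceeds a 9-byte limit.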

As an aside, you may see strange things happen if you use db2look to
extract statistics and try to use the generated SQL to move the
statistics to another table. I ran into this on 8.1 when a column was
defined as VARGRAPHIC and the HIGH2KEY and LOW2KEY columns were defined as
GRAPHIC.

Phil Sherman

Nov 12 '05 #2

Steven,

I have seen this a few times, also in other languages.
The problem is that the number in CHAR(9) does not say how many characters you
can have but how many bytes.
The weird thing is that this is - afaik - SQL standard, and works as designed.

Anyway, we did open a DCR, but I am not sure any change will take place.
So for now I am afraid we just have to live with it. Moving to UTF-8 can cause
trouble when the data comes from a local code page whose characters need
16 bits in UTF-8.
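
Since the limit is counted in bytes, it may help to screen the exported data before loading it. This is only a rough sketch in Python, run outside the database: the function name oversized_values and the sample values are made up for illustration, and trailing blank padding of CHAR columns is ignored.

```python
def oversized_values(values, byte_limit, encoding="utf-8"):
    """Return (value, size) pairs whose encoded size exceeds byte_limit bytes."""
    result = []
    for value in values:
        size = len(value.encode(encoding))
        if size > byte_limit:
            result.append((value, size))
    return result


# Hypothetical sample data; the first entry is the string from the question.
sample = ["H\u00c4M\u00c4LOUT", "HELSINKI", "\u00c5BO"]

# Values that would not fit a CHAR(9) column in a UTF-8 database.
too_long = oversized_values(sample, 9)
print(too_long)
```

Any value reported this way needs either a shorter string or a wider column in the UTF-8 database.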

Juliane
--
Message posted via http://www.dbmonster.com
Nov 12 '05 #3
