Char in US Code page vs Char in UTF-8

I have a question about moving to UTF-8. We will have to include
double-byte characters in our database, so I created a new database that
was UTF-8. I moved the table structure over and am using VARGRAPHIC in
some places. Other tables do not need to change, so I moved them over
as is. I have data that is being tossed because it says it is too
long.

Specifically the string 'HÄMÄLOUT' is being thrown out of a CHAR(9)
column that was pulled out of a US code page database with the exact
same structure.

Why would a UTF-8 database see these Ä characters as more than one
character?

Nov 12 '05 #1
The "standard" ASCII character set (code points 127, hex 7F, and below) is
represented by single-byte characters in UTF-8. Your string contains two
characters that lie above hex 7F, and each of them is represented by a
two-byte sequence in UTF-8. Its length is therefore 6+(2*2)=10 bytes, one
byte too long for a CHAR(9) column. If you shorten the string by one byte
and try the insert, it should work. The actual character displayed by the
high ASCII codes depends on the code page in use and can also be modified
by altering the character generator table in your video card. (I recall
doing this many years ago.) UDB will convert these two-byte characters to
an "appropriate" single-byte character using your code page to figure out
what the translation should be.
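
The arithmetic above can be checked outside the database. The following is only a sketch in Python, not anything DB2 runs itself: it encodes the same string once in a single-byte code page and once in UTF-8 and counts the bytes. The choice of cp437 as the "US code page" is an assumption; the original post does not say which code page the source database used.

```python
# "HÄMÄLOUT", written with escapes so the source file encoding does not matter.
text = "H\u00c4M\u00c4LOUT"

chars = len(text)                        # number of characters
single_byte = len(text.encode("cp437"))  # assumed US code page: one byte per character
utf8 = len(text.encode("utf-8"))         # each Ä becomes the two bytes C3 84

print(chars, single_byte, utf8)
print(text.encode("utf-8").hex(" "))     # byte-by-byte view of the UTF-8 form
```

The string has 8 characters and occupies 8 bytes in the single-byte code page, but 10 bytes in UTF-8, which is why it no longer fits a column limited to 9 bytes.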

As an aside, you may see strange things happen if you use db2look to
extract statistics and try to use the generated SQL to move the
statistics to another table. I ran into this on 8.1 when a column was
defined as vargraphic and the HIGH2KEY, LOW2KEY columns are defined as
graphic.

Phil Sherman

Nov 12 '05 #2

Steven,

I have seen this a few times in other languages as well.
The problem is that the number in CHAR(9) does not say how many characters
you can have, but how many bytes.
The weird thing is that this is, afaik, SQL standard and works as designed.

Anyway, we did open a DCR, but I am not sure any change will take place.
So for now I am afraid we just have to live with it. Moving to UTF-8 can cause
trouble when moving from a local code page whose characters need two bytes in UTF-8.

Juliane
--
Message posted via http://www.dbmonster.com
Nov 12 '05 #3
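
Since the declared length counts bytes rather than characters, one way to prepare for such a migration is to measure the UTF-8 byte length of the existing values before loading them. The sketch below is a hypothetical pre-check in Python, not a DB2 utility. The helper names (`utf8_length`, `required_width`, `too_long`) and the sample values other than 'HÄMÄLOUT' are made up for illustration.

```python
def utf8_length(value: str) -> int:
    """Bytes the value occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def required_width(values) -> int:
    """Smallest byte width that holds every value (0 for an empty input)."""
    return max((utf8_length(v) for v in values), default=0)


def too_long(values, width: int):
    """Values whose UTF-8 form exceeds the given byte width."""
    return [v for v in values if utf8_length(v) > width]


# Illustrative data only; "H\u00c4M\u00c4LOUT" is 'HÄMÄLOUT' from the question.
sample = ["H\u00c4M\u00c4LOUT", "HELSINKI", "TURKU"]

print(required_width(sample))  # width the column would need in bytes
print(too_long(sample, 9))     # rows a 9-byte column would reject
```

Run against the real column data, a check like this shows which columns need a wider definition in the UTF-8 database and which rows would be rejected at the current width.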
