Char in US Code page vs Char in UTF-8

stevenkblack

I have a question about moving to UTF-8. We will have to include
double-byte characters in our database, so I created a new UTF-8
database. I moved the table structure over and am using VARGRAPHIC in
some places. Other tables do not need to change, so I moved them over
as is. I have data that is being rejected because it says it is too
long.

Specifically the string 'HÄMÄLOUT' is being thrown out of a CHAR(9)
column that was pulled out of a US code page database with the exact
same structure.

Why would a UTF-8 database see these Ä characters as more than one
character?

Nov 12 '05 #1

Phil Sherman

The "standard" ASCII character set (values of 127 decimal, hex 7F, and
below) is represented by single-byte characters. Your string contains two
characters that lie above that range, and each one is encoded as a
two-byte sequence in UTF-8. Its length is therefore 6+(2*2)=10 bytes, one
byte too large for a CHAR(9) column. If you shorten the string by 1 byte
and try the insert, it should work. The actual character displayed by the
high ASCII codes is dependent on the codepage in use and can also be
modified by altering the character generator table in your video card.
(I recall doing this many years ago.) UDB will convert these two-byte
characters to an "appropriate" single-byte character using your codepage
to figure out what the translation should be.
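
The arithmetic above can be checked outside the database. This is only a sketch in Python, not anything DB2 runs; cp1252 is an assumed stand-in for the "US code page", since the thread does not say which one the source database used.

```python
# Byte lengths of the string from the question under two encodings.
# cp1252 is an ASSUMED stand-in for the "US code page"; the thread
# does not name the actual source code page.
text = "HÄMÄLOUT"

char_count = len(text)                        # number of characters
single_byte_len = len(text.encode("cp1252"))  # one byte per character
utf8_len = len(text.encode("utf-8"))          # Ä takes two bytes here
a_umlaut_utf8 = "Ä".encode("utf-8")           # the two bytes for Ä

print(char_count, single_byte_len, utf8_len, a_umlaut_utf8.hex())
# prints: 8 8 10 c384
```

The string has 8 characters in both cases, but it needs 10 bytes in UTF-8, which is why it no longer fits a column limited to 9 bytes.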

As an aside, you may see strange things happen if you use db2look to
extract statistics and try to use the generated SQL to move the
statistics to another table. I ran into this on 8.1 when a column was
defined as vargraphic and the HIGH2KEY, LOW2KEY columns are defined as
graphic.

Phil Sherman

Nov 12 '05 #2

Juliane via DBMonster.com

Steven,

I have seen this a few times also in other languages.
The problem is that the number in CHAR(9) does not say how many characters
you can have but how many bytes.
The weird thing is that this is - afaik - SQL standard, and works as designed.
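
Since the limit counts bytes, one way to find problem rows before a migration is to measure each value's UTF-8 length against the column size. A rough sketch in Python follows; the helper name `utf8_overflow` and the sample values other than 'HÄMÄLOUT' are made up for illustration, and this is not a DB2 utility.

```python
def utf8_overflow(values, byte_limit):
    """Return (value, utf8_byte_length) pairs whose UTF-8 form exceeds byte_limit."""
    too_long = []
    for value in values:
        size = len(value.encode("utf-8"))
        if size > byte_limit:
            too_long.append((value, size))
    return too_long


# Hypothetical sample of values exported from a CHAR(9) column.
sample = ["HÄMÄLOUT", "HELSINKI", "HÄME"]
flagged = utf8_overflow(sample, 9)
print(flagged)
# prints: [('HÄMÄLOUT', 10)]
```

Any value flagged this way needs either a wider column or a shorter string before it will load into the UTF-8 database.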

Anyway, we did open a DCR, but I am not sure any change will take place.
So for now I am afraid we just have to live with it. Moving to UTF-8 can cause
trouble when moving from a local code page whose non-ASCII characters need
16 bits (two bytes) in UTF-8.

Juliane
--
Message posted via http://www.dbmonster.com

Nov 12 '05 #3
