>>On 1/16/2008 at 3:40 PM, in message <fm**********@news.tiscali.fr>,
Colin Booth <co*********@gmail.com> wrote:
> Frank Swarbrick wrote:
>> Are there advantages to choosing, say, IBM-1252 over UTF-8? If my PC
>> application uses code page 1252 will it perform better because no code
>> page translation is required? I assume so. What type of performance
>> hit might I expect when connecting to a UTF-8 database? What
>> advantages would I get by using a UTF-8 database? Obviously it can
>> store the entire Unicode 'plane' (or whatever that's called), but if
>> my PC can't display it anyway what do I really care? And I guess that
>> storing XML data requires UTF-8? But I don't think we plan on
>> utilizing this.
>> What else should we know to make our decision?
>> Thanks,
>> Frank
> Hi
>
> Some characters that are single byte in 1252 are multi-byte in UTF-8.
> With a standard UK keyboard I think that there are 3 or 4 characters
> that are multi-byte in UTF-8.
>
> I like and prefer UTF-8, but the applications must be coded for UTF-8.
> E.g. if you have an 8 byte character column and an 8 byte (1252) entry
> field, and fill the entry field using at least 1 of the UTF-8
> multi-byte characters, you will get a data truncation error. Also you
> need to be careful about the number of characters in a column, as the
> byte count is not necessarily the character count.
>
> Things are becoming much more global. I have moved to France but still
> have some accounts and investments in the UK. I also purchase some
> things from the UK, and my address contains accents.
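(Your truncation scenario is easy to reproduce outside DB2. Here's a short Python sketch I put together; the string is purely illustrative, but it shows an 8-character value that fits an 8-byte 1252 column yet overflows the same byte width in UTF-8:)

```python
# Characters that fit in one byte under code page 1252 can take two or
# three bytes under UTF-8, so an 8-byte column can overflow even though
# the string is only 8 characters long.
s = "ABCDEF\u00a3\u20ac"   # 8 characters, ending in pound and euro signs

print(len(s))                      # character count: 8
print(len(s.encode("cp1252")))     # bytes in 1252: 8 -- fits an 8-byte column
print(len(s.encode("utf-8")))      # bytes in UTF-8: 11 -- would be truncated
```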
I question your comment that "the applications must be coded for UTF-8".
I just wrote an OpenCobol application with embedded DB2 SQL. No special
"UTF-8" coding, whatever that might mean. All it does is connect to the
database, retrieve the "string" and "hex" values of a set of VARCHAR(25)
columns, and display those values.
I run this against two databases:
TEST1 is a database defined as codeset IBM-1252.
UTFDB is a database defined as codeset UTF-8.
Here are the results:
CONNECT TO test1
5B544553545D
+0006: [TEST]
7C544553547C
+0006: |TEST|
A654455354A6
+0006: ¦TEST¦
80
+0001: €
CONNECT TO utfdb
5B544553545D
+0006: [TEST]
7C544553547C
+0006: |TEST|
C2A654455354C2A6
+0006: ¦TEST¦
E282AC
+0001: €
(+0001: € <== that actually shows as the euro symbol in Notepad.)
As you can see, for the UTF-8 database the euro symbol was stored as
x'E282AC'. But since my application uses code page 1252, DB2 was smart
enough to translate it to x'80', which is the value for the euro in code
page 1252.

Of course, when a symbol exists in UTF-8 but not in 1252, there will be
a problem.
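(The translation, and its limit, can be checked with a couple of lines of Python. This only sketches the codec behaviour, not DB2 itself, and the CJK character is just an arbitrary example of something outside 1252:)

```python
# The bytes the UTF-8 database stored for the euro sign:
stored = bytes.fromhex("E282AC")
ch = stored.decode("utf-8")
print(ch)                          # the euro sign
print(ch.encode("cp1252").hex())   # '80' -- the 1252 value my application saw

# A character with no 1252 equivalent cannot be converted:
try:
    "\u4e2d".encode("cp1252")
except UnicodeEncodeError as err:
    print("no 1252 code point:", err)
```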
I guess your point is, and it's a good one, that if a CHAR or VARCHAR
column is defined in a UTF-8 database then you, in a sense, have to
"over-define" the length to take into account the possibility of
multi-byte characters. For instance, a 1-character field that could
possibly contain a multi-byte UTF-8 character (such as the euro symbol)
would have to be defined in the database as, say, CHAR(3).
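(My working assumption on the sizing arithmetic, worth checking against the DB2 manuals: UTF-8 needs up to 3 bytes per character in the Basic Multilingual Plane and 4 for anything beyond it, so a column meant to hold n characters would be declared with 3n or 4n bytes. A trivial sketch:)

```python
# Worst-case byte budget for a UTF-8 column that must hold n characters.
def utf8_bytes_needed(n_chars, allow_supplementary=False):
    # 3 bytes covers any BMP character; 4 covers all of Unicode.
    return n_chars * (4 if allow_supplementary else 3)

print(utf8_bytes_needed(1))            # 3 -- enough for one euro sign
print(utf8_bytes_needed(25))           # 75 -- a VARCHAR(25)'s worth of BMP text
print(len("\u20ac".encode("utf-8")))   # 3, the euro case from my test above
```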
This does bring to mind a question I have been pondering. Is there any
harm in defining 'string' columns to be much larger than the largest
value you would ever expect? Take an address line: it might be 50 or so
characters. Is there harm in defining it as VARCHAR(250) or even
VARCHAR(32000)? Does it waste space or any other resource?
Thanks for your help.
Frank