
choosing a server codeset

Are there advantages to choosing, say, IBM-1252 over UTF-8? If my PC
application uses code page 1252 will it perform better because no code page
translation is required? I assume so. What type of performance hit might I
expect when connecting to a UTF-8 database? What advantages would I get by
using a UTF-8 database? Obviously it can store the entire Unicode 'plane'
(or whatever that's called), but if my PC can't display it anyway what do I
really care? And I guess that storing XML data requires UTF-8? But I don't
think we plan on utilizing this.

What else should we know to make our decision?

Thanks,
Frank

Jan 16 '08 #1
Hi Frank!

If the database contains national characters other than A-Z and a-z, then
under UTF-8 a table column declared as CHAR(8) will have room for only 4-8
characters, since characters like ÅÄÖÉÜ take 2 bytes each in UTF-8. If you
don't work with multiple national languages, go for a character set that
suits your situation. If you need to work with XML data, put it in a
separate database.
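
The byte arithmetic behind the CHAR(8) remark can be checked with a short Python sketch. It assumes Python's "cp1252" codec is a fair stand-in for the IBM-1252 codeset; the helper names are made up for illustration and nothing here talks to DB2.

```python
# Byte cost of each character under the two codesets discussed in the thread.
# Assumption: Python's "cp1252" codec stands in for the IBM-1252 codeset.

def bytes_per_char(text, encoding):
    """Return a list of (character, encoded byte count) pairs."""
    return [(ch, len(ch.encode(encoding))) for ch in text]

def chars_that_fit(text, byte_limit, encoding):
    """Count how many leading characters of text fit in byte_limit bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode(encoding))
        if used + size > byte_limit:
            break
        used += size
        count += 1
    return count

sample = "ÅÄÖÉÜ"
utf8_sizes = bytes_per_char(sample, "utf-8")      # each character costs 2 bytes
cp1252_sizes = bytes_per_char(sample, "cp1252")   # each character costs 1 byte

# An 8-byte column holds 8 ASCII letters but only 4 of these accented ones in UTF-8.
ascii_fit = chars_that_fit("ABCDEFGHIJ", 8, "utf-8")
accented_fit = chars_that_fit("ÅÄÖÉÜÅÄÖÉÜ", 8, "utf-8")

print(utf8_sizes, cp1252_sizes, ascii_fit, accented_fit)
```

Characters outside the Latin-1 range (the euro sign, for example) take 3 bytes in UTF-8, so fewer than 4 of them would fit in the same 8 bytes.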
/dg

Jan 16 '08 #2
Hi

Some characters that are single-byte in 1252 are multi-byte in UTF-8. With
a standard UK keyboard I think that there are 3 or 4 characters that are
multi-byte in UTF-8.

I like and prefer UTF-8, but the applications must be coded for UTF-8. E.g. if
you have an 8-byte character column and an 8-byte (1252) entry field, and you
fill the entry field using at least one of the UTF-8 multi-byte characters, you
will get a data truncation error. You also need to be careful about the
number of characters in a column, as the byte count is not necessarily the
character count.
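
To make the truncation scenario concrete, here is a minimal Python sketch of the check an application could run before an insert. The `DataTruncation` exception and the `check_fits` helper are hypothetical names for illustration, not DB2 or driver APIs, and "cp1252" is assumed to approximate the 1252 code page.

```python
class DataTruncation(Exception):
    """Hypothetical stand-in for the truncation error a database would report."""

def check_fits(value, column_bytes, encoding="utf-8"):
    """Return the encoded byte length, or raise if it exceeds the column size."""
    encoded = value.encode(encoding)
    if len(encoded) > column_bytes:
        raise DataTruncation(
            f"{len(value)} characters need {len(encoded)} bytes, "
            f"column holds {column_bytes}"
        )
    return len(encoded)

entry = "£8.50 ea"                       # 8 characters typed into an 8-byte field
size_1252 = check_fits(entry, 8, "cp1252")   # fits: one byte per character

try:
    check_fits(entry, 8, "utf-8")        # the pound sign takes 2 bytes in UTF-8
    overflowed = False
except DataTruncation as exc:
    overflowed = True
    print("would truncate:", exc)
```

The same 8-character value is 8 bytes in 1252 and 9 bytes in UTF-8, which is the gap between character count and byte count described above.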

Things are becoming much more global. I have moved to France but still have
some accounts and investments in the UK. I also purchase some things from
the UK, and my address contains accents.
Colin
Jan 16 '08 #3
I question your comment "the applications must be coded for UTF-8". I just
wrote an OpenCobol application with embedded DB2. No special "UTF-8"
coding, whatever that might mean. All it does is connect to the database,
retrieve the "string" and "hex" values of a set of VARCHAR(25) columns, and
display those values.

I run this against two databases:
TEST1 is a database defined as codeset IBM-1252.
UTFDB is a database defined as codeset UTF-8.

Here are the results:

CONNECT TO test1
5B544553545D
+0006: [TEST]
7C544553547C
+0006: |TEST|
A654455354A6
+0006: ¦TEST¦
80
+0001: €

CONNECT TO utfdb
5B544553545D
+0006: [TEST]
7C544553547C
+0006: |TEST|
C2A654455354C2A6
+0006: ¦TEST¦
E282AC
+0001: €

(+0001: € <== that actually shows as the euro symbol in Notepad.)

As you can see, for the UTF-8 database the euro symbol was stored as
x'E282AC'. But since my application used code page 1252, DB2 was smart
enough to translate it to x'80', which is the value for euro in code page
1252.
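
The hex values in the results above can be reproduced outside the database. This Python sketch only encodes the same four strings in both codesets; it assumes Python's "cp1252" codec matches IBM-1252 for these characters, and the last two lines merely imitate the kind of conversion described here rather than showing how DB2 performs it.

```python
def hex_of(text, encoding):
    """Encode text in the given codeset and return uppercase hex digits."""
    return text.encode(encoding).hex().upper()

samples = ["[TEST]", "|TEST|", "¦TEST¦", "€"]

for label, encoding in (("IBM-1252", "cp1252"), ("UTF-8", "utf-8")):
    print(label)
    for text in samples:
        print(f"  {hex_of(text, encoding)}  {text}")

# Imitate the conversion: bytes stored as UTF-8 are handed back as 1252.
stored = bytes.fromhex("E282AC")
as_seen_by_client = stored.decode("utf-8").encode("cp1252")
print(as_seen_by_client.hex().upper())
```

The broken bar grows from one byte (A6) to two (C2A6) and the euro sign from one (80) to three (E282AC), while the plain ASCII strings are byte-for-byte identical in both codesets.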

Now of course when there is a symbol that exists in UTF-8 and not in 1252
then there will be a problem.
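
A rough Python illustration of that problem follows. It shows what Python's own codec does with a character that has no 1252 equivalent; how DB2 handles the same case (an error or a substitution character) is not shown in this thread, so treat the behaviour below as an analogy only. The helper names are invented for the example.

```python
def representable_in_1252(text):
    """Return True if every character of text has a code point in cp1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

def to_1252_lossy(text):
    """Encode to cp1252, replacing unmappable characters with '?'."""
    return text.encode("cp1252", errors="replace")

euro_ok = representable_in_1252("€")      # the euro sign exists in 1252
omega_ok = representable_in_1252("Ω")     # Greek omega does not
lossy = to_1252_lossy("5 Ω")              # the omega is lost in conversion

print(euro_ok, omega_ok, lossy)
```

Once the character has been replaced, the original cannot be recovered on the client side, which is the practical risk of reading a UTF-8 database from a 1252 application.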

I guess your point is, and it's a good one, that if a CHAR or VARCHAR column
is defined in a UTF-8 database then you, in a sense, have to "over define"
the length to take into account the possibility of multi-byte characters?
For instance, a 1-character field that could possibly contain a multi-byte
UTF-8 character (such as the euro symbol) would have to be defined in the
database as, say, CHAR(3).
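
The CHAR(3) figure can be sanity-checked by measuring the widest UTF-8 encoding of any character that 1252 can represent. This is a sketch using Python's "cp1252" codec as an assumed equivalent of IBM-1252; the function names are illustrative only.

```python
def widest_utf8_for_cp1252():
    """Return the largest UTF-8 byte count among all cp1252 characters."""
    widest = 0
    for byte in range(256):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # a few byte values have no character in Python's cp1252
        widest = max(widest, len(ch.encode("utf-8")))
    return widest

def worst_case_bytes(char_count, bytes_per_char):
    """Column size in bytes needed to hold char_count worst-case characters."""
    return char_count * bytes_per_char

per_char = widest_utf8_for_cp1252()
one_char_field = worst_case_bytes(1, per_char)
address_line = worst_case_bytes(50, per_char)

print(per_char, one_char_field, address_line)
```

Under this assumption, data that originates only from 1252 clients never needs more than 3 bytes per character, so a 50-character address line would need at most 150 bytes. Characters from outside 1252 can take up to 4 bytes each in UTF-8.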

This does bring to mind a question I have been pondering. Is there any harm
in defining 'string' fields to be much larger than the largest string length
that you would ever expect? Like an address line. It might be 50 or so
characters. Is there harm in defining it as VARCHAR(250) or even
VARCHAR(32000)? Does it waste space or any other resource?

Thanks for your help.

Frank
Jan 18 '08 #4
