DB2 Universal Language support

Greetings,

We are building a Windows application in C++ that uses OLE DB to
connect to DB2 8.2 on AIX. The app holds all string data as wchar_t
and generates dynamic SQL, typically with parameters bound as
DBTYPE_WSTR, so it is a Unicode app.

We don't know whether to store strings as VARGRAPHIC or as VARCHAR,
or which database character sets to support.

Option a - support only a UTF-8 database and use VARGRAPHIC columns.
Performance should be better, since the app stores Unicode strings in
Unicode (UCS-2) columns. From a product perspective, however, this is
limiting: some customers may want another code page, say IBM-1252,
simply because they prefer it. They could be told that UTF-8 will
store their data just fine. Yes, we could also use VARGRAPHIC in any
MBCS database, but there are conversion issues (control characters
convert to SUB characters in IBM-943).

Option b - support any database character set and use VARCHAR string
columns. This requires us to apply a multiplier (e.g. 3 or 4) to the
column length, to support Asian languages when UTF-8 is chosen, which
eats heavily into DB2's limited row size (the declared column lengths
are charged against the page size at column creation time). Going
with a large page size such as 32K may also hurt performance. Yes, we
could make our code decide whether to multiply depending on the
language, but for simplicity we don't. UTF-8 is nice because, when
mostly ASCII is stored, it doesn't carry the extra bytes that
VARGRAPHIC does (it saves disk space).
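
For illustration only, the sizing trade-off in option b can be sketched as simple arithmetic. The multipliers below are assumptions (up to 3 bytes per BMP character in UTF-8, 4 if supplementary characters must fit, 2 bytes per UCS-2 character), and the helper name, the 40 columns, and the 100-character length are hypothetical. The actual row-size limit for a given page size should be taken from the DB2 documentation.

```python
# Worst-case bytes to declare for a column holding `chars` characters.
# Assumed multipliers: UTF-8 needs up to 3 bytes per BMP character
# (4 for supplementary characters); UCS-2 uses 2 bytes per character.
UTF8_BMP_MAX = 3
UTF8_FULL_MAX = 4
UCS2_BYTES = 2


def declared_bytes(chars, bytes_per_char):
    """Return the byte length needed for `chars` characters (hypothetical helper)."""
    return chars * bytes_per_char


columns = 40            # hypothetical table with 40 string columns
chars_per_column = 100  # hypothetical logical length of each column

for label, factor in [("VARCHAR x3", UTF8_BMP_MAX),
                      ("VARCHAR x4", UTF8_FULL_MAX),
                      ("VARGRAPHIC (UCS-2)", UCS2_BYTES)]:
    total = columns * declared_bytes(chars_per_column, factor)
    print(f"{label}: {total} bytes of declared string data per row")
```

Under these assumptions the multiplied VARCHAR columns declare 1.5 to 2 times as many bytes per row as VARGRAPHIC, even though the bytes actually stored for ASCII-heavy data would be fewer.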

Our concern is that we support the character sets that real customers
most often use. Which is most prevalent? Specifically: if we choose
option a, will customers be perfectly happy, or will some balk and
demand another MBCS code page (such as IBM-943), or would they prefer
to have VARCHAR columns?

Thanks,
Kieran
Nov 12 '05 #1
VARGRAPHIC is for DBCS, not Unicode. Use a Unicode database with
(VAR)CHAR columns. (VAR)CHARs in a Unicode database are actually
UCS-2, although that appears counterintuitive!

"Kieran Green" <ki***********@yahoo.com> wrote in message
news:ed*************************@posting.google.com...

Nov 12 '05 #2
MS SQL Server uses nvarchar for Unicode. Isn't the analogy that
MSSQL's nvarchar = DB2's VARGRAPHIC? Assuming basic "Unicode" is two
bytes per character, and that schemes such as UTF-8 exist to encode
it, wouldn't MSSQL's nvarchar and DB2's VARGRAPHIC both store
double-byte Unicode in the basic form of UCS-2?

The DB2 docs state: "When a Unicode database is created, CHAR,
VARCHAR, [etc] data are stored in UTF-8, and GRAPHIC, VARGRAPHIC,
[etc] data are stored in UCS-2." It would seem that when our Unicode
OLEDB app inserts into a varchar column (DB2 database is created as
UTF-8), the Unicode data gets encoded and stored as UTF-8. Is that
right?
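
As a rough check of what the quoted storage rule means in bytes, the sketch below encodes the same text both ways. It assumes Python's `utf-16-be` codec as a stand-in for UCS-2, which is valid only for characters in the Basic Multilingual Plane (all the samples here); the sample strings and the helper name are illustrative, and nothing in it talks to DB2.

```python
# Compare the size of the same text in UTF-8 and in 2-byte code units.
# "utf-16-be" stands in for UCS-2; the two agree for BMP characters.
samples = {
    "ascii": "Hello, DB2",
    "latin": "Gr\u00fc\u00dfe",           # "Gruesse" written with u-umlaut and sharp s
    "japanese": "\u65e5\u672c\u8a9e",     # three kanji characters
}


def encoded_sizes(text):
    """Return (utf8_bytes, ucs2_bytes) for `text`."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


for name, text in samples.items():
    utf8_len, ucs2_len = encoded_sizes(text)
    print(f"{name}: {len(text)} chars, UTF-8 {utf8_len} bytes, UCS-2 {ucs2_len} bytes")
```

ASCII text takes half the space in UTF-8, while the kanji sample takes 9 bytes in UTF-8 against 6 in UCS-2, which is the trade-off between the two column families in a Unicode database.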

I've heard of "pure DBCS" in reference to IBM Asian character sets.
By "DBCS", do you mean one of the CCSIDs for specific IBM double-byte
language encodings? If so, then VARGRAPHIC would be great for "DBCS";
but if VARGRAPHIC is UCS-2, isn't it as good a receptacle for Unicode
as MSSQL's nvarchar?

I'm also concerned about glossing over subtleties with language
encodings if we employ Option a or b, such as loss of support of
characters. So if anyone has some real expertise in use-cases in
Options a or b, that would be useful.
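
One way to probe the character-loss concern before committing to a code page is a round-trip test on representative strings. The sketch below is only an approximation: it assumes Python's `cp1252` codec behaves like the IBM-1252 code page, the helper name is hypothetical, and it does not model DB2's own conversion tables (for example the SUB substitution mentioned for IBM-943).

```python
# Sketch: which strings survive a round trip through a given code page?
# Assumption: Python's "cp1252" codec approximates the IBM-1252 code page.
def survives(text, codec):
    """True if `text` can be encoded in `codec` and decoded back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


japanese = "\u65e5\u672c\u8a9e"   # three kanji characters
western = "Gr\u00fc\u00dfe"        # Latin text with u-umlaut and sharp s

for codec in ("utf-8", "cp1252"):
    print(codec, survives(western, codec), survives(japanese, codec))
```

Both samples survive UTF-8, but the kanji cannot be represented in the single-byte code page, so a Unicode app writing to a non-Unicode database would have to reject or substitute such input.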

Much Thanks!
Kieran

"Mark Yudkin" <my***********************@boing.org> wrote in message news:<co**********@ngspool-d02.news.aol.com>...
VARGRAPHIC is for DBCS, not Unicode. Use a Unicode database with
(VAR)CHAR columns. (VAR)CHARs in a Unicode database are actually
UCS-2, although that appears counterintuitive!

Nov 12 '05 #3
DB2 does not have an equivalent to MS SQL's nvarchar: DB2 does not have a
Unicode data type. I too would like to see such a solution, but that's not
the way IBM decided to do things. VARGRAPHIC is not Unicode; it is DBCS, an
earlier standard for handling CJK languages.

Provided your database has a Unicode code page, the data will be Unicode.
The internal encoding is not really important.

As I implied, I don't recommend either of your options a or b.

Nov 12 '05 #4
