de*******@yahoo.com wrote:
Serge Rielau wrote:
To the best of my knowledge NVARCHAR in SQL Server is UCS-2 (double-byte
Unicode). In DB2 that would match GRAPHIC in a Unicode database.
If you have a lot of NVARCHAR flying around you may want to consider
just using a Unicode database. Your VARCHAR columns will then be UTF-8 and
GRAPHIC UCS-2.
That is interesting. So, if the database's default character set is
unicode or UTF-8, then the SQL Server NVARCHAR would just map to a
VARCHAR in DB2. (I take it the same is true for other Nxyz data types
too.) That makes sense and simplifies things a lot.
Thanks a lot!
Yes and no. It is correct that UTF-8 and UCS-2 have the same expressive
power w.r.t. codepoints.
Things get interesting when you use SUBSTR() or LENGTH().
In UCS-2 things are easy (I simplify a tiny bit here by not considering
"combining characters") since 2 bytes match 1 character - always. DB2
knows that and SUBSTR(graphiccol, 3, 5) will truly give you the 5
characters starting with the third.
In UTF-8 things get messy. Both SUBSTR() and LENGTH() (as well as other
string operations) use bytes as their unit for CHAR. So SUBSTR(utf8col,
3, 5) returns 5 bytes, which can be anywhere from 2-5 characters.
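
To make the difference concrete, here is a small Python sketch (not DB2
code) that imitates the two behaviours. The helper names substr_bytes and
substr_ucs2 and the sample string are made up for illustration; the only
assumption is that the byte-based SUBSTR slices the UTF-8 encoding and the
GRAPHIC one slices 2-byte units, as described above.

```python
def substr_bytes(s: str, start: int, length: int) -> bytes:
    """Imitate a byte-based SUBSTR (1-based start) on the UTF-8 form of s."""
    data = s.encode("utf-8")
    return data[start - 1 : start - 1 + length]


def substr_ucs2(s: str, start: int, length: int) -> str:
    """Imitate a SUBSTR that counts 2-byte units (BMP characters only)."""
    data = s.encode("utf-16-be")  # 2 bytes per character inside the BMP
    chunk = data[2 * (start - 1) : 2 * (start - 1 + length)]
    return chunk.decode("utf-16-be")


text = "naïve café"  # 10 characters, 12 bytes in UTF-8

# Five bytes starting at byte 3 cover only four characters,
# because "ï" occupies two bytes.
utf8_piece = substr_bytes(text, 3, 5).decode("utf-8")

# Five 2-byte units starting at unit 3 are exactly five characters.
ucs2_piece = substr_ucs2(text, 3, 5)

# Starting at byte 4 lands in the middle of "ï"; the result is no longer
# valid UTF-8 on its own.
broken_piece = substr_bytes(text, 4, 5)
```

Here utf8_piece is "ïve " (4 characters) while ucs2_piece is "ïve c"
(5 characters), and broken_piece begins with a stray continuation byte.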
So if you don't do much in the way of string manipulation (other than
concat, which is harmless) then UTF-8 will be good (space efficient). If
you do string manipulation I recommend GRAPHIC (at the cost of space).
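
A rough way to gauge the space side of that trade-off is to compare
encoded sizes for data that looks like yours. The sketch below is plain
Python, not DB2; the function name encoded_sizes and the sample strings
are hypothetical, and it only counts encoded bytes, ignoring any column or
row overhead.

```python
def encoded_sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, 2-byte-unit byte count) for s."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-be"))


ascii_sizes = encoded_sizes("hello")   # plain ASCII text
accent_sizes = encoded_sizes("café")   # one accented Latin letter
cjk_sizes = encoded_sizes("日本語")     # three Japanese characters
```

For mostly ASCII or Western European text the UTF-8 form is the smaller
one; for text dominated by characters that need three UTF-8 bytes the
2-byte form can come out ahead, so it is worth checking with real data.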
Hope that helps.
Cheers
Serge
PS: In a future version of DB2 character based string manipulation will
be provided. But this is the way of the land as it is right now.
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab