Encoding: how to convert ISO-8859 to Unicode

Hi

This question is for anyone who is an expert in C# text encoding.

My situation is this: I have a Sybase database that returns ISO-8859 encoded strings. For non-technical reasons I am unable to get the database to translate them to UTF-8.

So I have a string coming back with the character œ (ISO value 156). In .NET this character appears as a box, because code point 156 (U+009C) is a non-printing control character in Unicode, not œ.
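To illustrate why the byte shows up as a box, here is a minimal C# sketch (the class and method names are made up for illustration) that decodes the single byte 0x9C (156) under two code pages. It assumes .NET Core / .NET 5 or later, where code page 1252 has to be registered through `CodePagesEncodingProvider` before use; on the classic .NET Framework that registration line can be dropped. Under ISO-8859-1 the byte becomes U+009C, a C1 control character with no glyph, while under Windows-1252 it becomes U+0153 (œ), which matches the `53-01` pair in the expected UTF-16 dump further down.

```csharp
using System;
using System.Text;

// Hypothetical demo class: decodes one byte under a named code page.
public static class Byte156Demo
{
    static Byte156Demo()
    {
        // Assumption: .NET Core / .NET 5+, where code pages such as 1252
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes a single byte using the encoding with the given name.
    public static string DecodeAs(string encodingName, byte value)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return enc.GetString(new[] { value });
    }

    // Formats the first char of a string as "U+XXXX".
    public static string CodePoint(string s) => $"U+{(int)s[0]:X4}";
}
```

For example, `Byte156Demo.CodePoint(Byte156Demo.DecodeAs("iso-8859-1", 0x9C))` gives `"U+009C"`, and the same call with `"windows-1252"` gives `"U+0153"`.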

I have been scratching my head over this one and have produced a series of tests to try to get the conversion correct.

My code is below followed by the output:

    using System;
    using System.Diagnostics;
    using System.Text;

    // DataAccessLayer.GetXXX() stands for the call that reads the value from Sybase.
    string sybaseRawString = DataAccessLayer.GetXXX();

    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    Encoding sbcs = Encoding.Default;     // SBCSCodePageEncoding (system ANSI code page)
    Encoding unicode = Encoding.Unicode;  // UTF-16 little-endian
    Encoding utf8 = Encoding.UTF8;

    // Turn the .NET string back into bytes with ISO-8859-1 and with the default code page.
    byte[] isoBytes = iso.GetBytes(sybaseRawString);
    byte[] sbcsBytes = sbcs.GetBytes(sybaseRawString);
    // Re-encode the ISO bytes as UTF-8, then the UTF-8 bytes as UTF-16.
    byte[] utf8Bytes = Encoding.Convert(iso, utf8, isoBytes);
    byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

    Debug.WriteLine("SYBASE ISO-8859 STRING");
    Debug.WriteLine(sybaseRawString);
    Debug.WriteLine(BitConverter.ToString(isoBytes));  // hex dump such as 46-54-53-...

    Debug.WriteLine("ISO -> SBCS ENCODED STRING");
    Debug.WriteLine(sbcs.GetString(sbcsBytes));
    Debug.WriteLine(BitConverter.ToString(sbcsBytes));

    string expected = "FTSE TECHMARK 100 (\u0153)";  // \u0153 is œ
    Debug.WriteLine("EXPECTED .NET STRING");
    Debug.WriteLine(expected);
    Debug.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(expected)));

    Debug.WriteLine("ISO -> UNICODE");
    Debug.WriteLine(unicode.GetString(unicodeBytes));
    Debug.WriteLine(BitConverter.ToString(unicodeBytes));

    Debug.WriteLine("ISO -> UTF8");
    Debug.WriteLine(utf8.GetString(utf8Bytes));
    Debug.WriteLine(BitConverter.ToString(utf8Bytes));

NB: In the output below I have replaced the box characters with question marks, except in the SBCS section, which really did produce a question mark. This is because HTML interprets the box characters and translates them into something else!
  36.  

The output in the DEBUG window is as follows:

SYBASE ISO-8859 STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-9C-29

ISO -> SBCS ENCODED STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-3F-29

EXPECTED .NET STRING
FTSE TECHMARK 100 (œ)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-53-01-29-00

ISO -> UNICODE
FTSE TECHMARK 100 (?)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-9C-00-29-00

ISO -> UTF8
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-C2-9C-29
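The UTF-8 and UTF-16 dumps above are what you would expect if `Encoding.Convert` only changes how a code point is stored, not which code point it is. A minimal sketch (the `ConvertDemo` class is hypothetical) repeats the same two conversions on a short string assumed to contain U+009C and reproduces the `C2-9C` and `9C-00` bytes.

```csharp
using System;
using System.Text;

// Hypothetical demo class: repeats the ISO -> UTF-8 -> UTF-16 conversions.
public static class ConvertDemo
{
    // Same "46-54-53-..." format as the dumps in the output.
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    public static (string utf8, string utf16) Dump(string s)
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");
        byte[] isoBytes = iso.GetBytes(s);
        // Each conversion keeps the same code points; only the byte layout changes.
        byte[] utf8Bytes = Encoding.Convert(iso, Encoding.UTF8, isoBytes);
        byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
        return (Hex(utf8Bytes), Hex(utf16Bytes));
    }
}
```

Because every step preserves U+009C, no chain of `Encoding.Convert` calls that starts from ISO-8859-1 will turn it into U+0153.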



However, when I view this in NUnit, all of the ?s appear correctly as œ, albeit ever so slightly different from the expected .NET version (ISO vs Unicode?). Is NUnit detecting the encoding of the character and printing it correctly?

My question is: how do I get from my original Sybase ISO-8859 string to the expected .NET (Unicode) bytes, so that I can be sure all of my .NET apps will display the characters correctly?
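One possible approach, sketched under an assumption that the `53-01` bytes in the expected output suggest but do not confirm: the data is really Windows-1252, and the driver decoded it as ISO-8859-1. ISO-8859-1 maps U+0000 to U+00FF back to bytes 0x00 to 0xFF one-to-one, so encoding the driver's string with it recovers the original bytes, which can then be decoded as Windows-1252. `SybaseText.FromLatin1AsCp1252` is a hypothetical helper name, and the provider registration assumes .NET Core / .NET 5 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper. Assumes the driver decoded Windows-1252 bytes as ISO-8859-1.
public static class SybaseText
{
    static SybaseText()
    {
        // Needed on .NET Core / .NET 5+; can be dropped on the classic .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string FromLatin1AsCp1252(string driverString)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding cp1252 = Encoding.GetEncoding(1252);
        // Latin-1 maps each char U+0000..U+00FF to the byte with the same value,
        // so this recovers the bytes the driver originally received.
        byte[] raw = latin1.GetBytes(driverString);
        // Reinterpret those bytes using the code page they were (assumed to be) written in.
        return cp1252.GetString(raw);
    }
}
```

With this sketch, `"FTSE TECHMARK 100 (\u009C)"` becomes `"FTSE TECHMARK 100 (œ)"`, whose UTF-16 bytes end in `28-00-53-01-29-00` like the expected string. If the input can contain characters above U+00FF, ISO-8859-1 cannot encode them and `GetBytes` substitutes `?`, so this only makes sense for strings known to have come through that decoding.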


Many thanks for any help received!
Dec 13 '07 #1