Encoding: how to convert ISO-8859 to Unicode

Hi

This is going to be a question for anyone who is an expert in C# Text Encoding.

My situation is this: I have a Sybase database which is firing back ISO-8859-encoded strings. I am unable to get the db to translate to UTF-8 for non-technical reasons.

So I have a string coming back with the character œ (ISO value 156). This character appears in .NET as a box character because 156 (U+009C) is a non-printing control character in Unicode, not a printable glyph.
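As a quick check of what .NET makes of code point 156, here is a minimal sketch. The class and method names (`CodePointCheck`, `Describe`) are made up for illustration; only the `char` methods are real framework calls.

```csharp
using System;

static class CodePointCheck
{
    // Formats a char as "U+XXXX (UnicodeCategory)". Hypothetical helper name.
    public static string Describe(char c) =>
        $"U+{(int)c:X4} ({char.GetUnicodeCategory(c)})";

    static void Main()
    {
        // Code point 156 is a C1 control character, so it has no glyph to draw.
        Console.WriteLine(Describe((char)156));        // U+009C (Control)
        Console.WriteLine(char.IsControl((char)156));  // True

        // The character that was expected instead lives at U+0153.
        Console.WriteLine(Describe('\u0153'));         // U+0153 (LowercaseLetter)
    }
}
```

So the value is a legal `char`, just not a printable one, which would explain the box.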

I have been scratching my head over this one and have produced a series of tests to try to get the conversion correct.

My code is below followed by the output:

    // String as returned by the data access layer (already a .NET string).
    string sybaseRawString = DataAccessLayer.GetXXX();

    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    Encoding sbcs = Encoding.Default;     // SBCSCodePageEncoding
    Encoding unicode = Encoding.Unicode;  // UTF-16, little-endian
    Encoding utf8 = Encoding.UTF8;

    // Encode the string with each candidate encoding,
    // then convert ISO -> UTF-8 -> UTF-16.
    byte[] isoBytes = iso.GetBytes(sybaseRawString);
    byte[] sbcsBytes = sbcs.GetBytes(sybaseRawString);
    byte[] utf8Bytes = Encoding.Convert(iso, utf8, isoBytes);
    byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

    // WriteLine and ToString(byte[]) are helper methods not shown here.
    WriteLine("SYBASE ISO-8559 STRING");
    WriteLine(sybaseRawString);
    WriteLine(ToString(isoBytes));

    WriteLine("ISO -> SBCS ENCODED STRING");
    WriteLine(new String(sbcs.GetChars(sbcsBytes)));
    WriteLine(ToString(sbcsBytes));

    // The string I want to end up with, dumped as UTF-16 bytes.
    string expected = "FTSE TECHMARK 100 (œ)";
    WriteLine("EXPECTED .NET STRING");
    WriteLine(expected);
    WriteLine(ToString(Encoding.Unicode.GetBytes(expected)));

    WriteLine("ISO -> UNICODE");
    WriteLine(new String(unicode.GetChars(unicodeBytes)));
    WriteLine(ToString(unicodeBytes));

    WriteLine("ISO -> UTF8");
    WriteLine(new String(utf8.GetChars(utf8Bytes)));
    WriteLine(ToString(utf8Bytes));

N.B. In the output below I have replaced the box characters with question marks, except for SBCS, which really did produce a question mark. This is because HTML understands them and translates them to œ!
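The `ToString(byte[])` helper is not shown in the post. Judging only by the dash-separated hex in the output, it may be a thin wrapper around `BitConverter.ToString`. A minimal sketch under that assumption, with the hypothetical name `ToHex`:

```csharp
using System;
using System.Text;

static class DumpHelpers
{
    // Hypothetical stand-in for the post's ToString(byte[]) helper.
    public static string ToHex(byte[] bytes) => BitConverter.ToString(bytes);

    static void Main()
    {
        // UTF-16 (little-endian) bytes of "(œ)".
        Console.WriteLine(ToHex(Encoding.Unicode.GetBytes("(\u0153)"))); // 28-00-53-01-29-00

        // UTF-8 bytes of the control character U+009C.
        Console.WriteLine(ToHex(Encoding.UTF8.GetBytes("\u009C")));      // C2-9C
    }
}
```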

The output in the DEBUG window is as follows:

SYBASE ISO-8559 STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-9C-29

ISO -> SBCS ENCODED STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-3F-29

EXPECTED .NET STRING
FTSE TECHMARK 100 (œ)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-53-01-29-00

ISO -> UNICODE
FTSE TECHMARK 100 (?)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-9C-00-29-00

ISO -> UTF8
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-C2-9C-29



However, when I view this in NUnit, all the ? characters appear correctly as œ, albeit ever so slightly different from the expected .NET version (ISO vs Unicode?). Is NUnit detecting the encoding of the character and printing it correctly?

My question is: how do I get from my original Sybase ISO-8859 string to the expected .NET bytes (Unicode), so that I can be sure that all of my .NET apps will display the characters correctly?
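One possible approach, sketched under a loud assumption: the database bytes are really Windows-1252 rather than ISO-8859-1. Byte 0x9C is œ (U+0153) in Windows-1252, which matches the expected 53-01 in the dump, but the post does not confirm the actual code page. The names `SybaseTextFix` and `ReinterpretAsWindows1252` are hypothetical. The sketch also assumes modern .NET (Core 3.0 or later), where code page 1252 must be registered first; on .NET Framework the `RegisterProvider` line should not be needed.

```csharp
using System;
using System.Text;

static class SybaseTextFix
{
    // Assumption: the text was Windows-1252 but got decoded as ISO-8859-1.
    public static string ReinterpretAsWindows1252(string misdecoded)
    {
        // Makes code page 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes,
        // so this recovers the original bytes from the database.
        byte[] rawBytes = latin1.GetBytes(misdecoded);

        // Decode those same bytes again, this time as Windows-1252.
        return windows1252.GetString(rawBytes);
    }

    static void Main()
    {
        string fromDb = "FTSE TECHMARK 100 (\u009C)"; // what .NET currently holds
        string fixedText = ReinterpretAsWindows1252(fromDb);

        // Dump the UTF-16 bytes; the tail should read 28-00-53-01-29-00.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(fixedText)));
    }
}
```

This only round-trips cleanly while every character in the incoming string is at or below U+00FF; anything higher would be replaced with a question mark by the ISO-8859-1 encoder. If the data layer can hand back raw bytes instead of a string, decoding them directly with the 1252 encoding would avoid the round trip.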


Many thanks for any help received!
Dec 13 '07 #1