Encoding: how to convert ISO-8859-1 to Unicode

Hi

This is going to be a question for anyone who is an expert in C# Text Encoding.

My situation is this: I have a Sybase database which is firing back ISO-8859-1 encoded strings. I am unable to get the db to translate to UTF-8 for non-technical reasons.

So I have a string coming back with the character œ (ISO value 156, i.e. 0x9C). This character appears in .NET as a box character, because code point 156 (U+009C) is a non-printing control character in Unicode rather than a visible glyph.

I have been scratching my head over this one and have produced a series of tests to try to get the conversion correct.

My code is below followed by the output:

    string sybaseRawString = DataAccessLayer.GetXXX();

    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    Encoding sbcs = Encoding.Default; // SBCSCodePageEncoding
    Encoding unicode = Encoding.Unicode;
    Encoding utf8 = Encoding.UTF8;

    // Encode the raw string, then convert the ISO bytes to UTF-8 and UTF-16.
    byte[] isoBytes = iso.GetBytes(sybaseRawString);
    byte[] sbcsBytes = sbcs.GetBytes(sybaseRawString);
    byte[] utf8Bytes = Encoding.Convert(iso, utf8, isoBytes);
    byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

    WriteLine("SYBASE ISO-8559 STRING");
    WriteLine(sybaseRawString);
    WriteLine(ToString(isoBytes));

    WriteLine("ISO -> SBCS ENCODED STRING");
    WriteLine(new String(sbcs.GetChars(sbcsBytes)));
    WriteLine(ToString(sbcsBytes));

    // U+0153 (œ), matching the 53-01 bytes in the expected output.
    string expected = "FTSE TECHMARK 100 (\u0153)";
    WriteLine("EXPECTED .NET STRING");
    WriteLine(expected);
    WriteLine(ToString(Encoding.Unicode.GetBytes(expected)));

    WriteLine("ISO -> UNICODE");
    WriteLine(new String(unicode.GetChars(unicodeBytes)));
    WriteLine(ToString(unicodeBytes));

    WriteLine("ISO -> UTF8");
    WriteLine(new String(utf8.GetChars(utf8Bytes)));
    WriteLine(ToString(utf8Bytes));

NB: In the output I have replaced the box characters with question marks, apart from SBCS, which really did produce a question mark. This is because HTML understands the box characters and translates them to œ!

The output in the DEBUG window is as follows:

SYBASE ISO-8559 STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-9C-29

ISO -> SBCS ENCODED STRING
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-3F-29

EXPECTED .NET STRING
FTSE TECHMARK 100 (œ)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-53-01-29-00

ISO -> UNICODE
FTSE TECHMARK 100 (?)
46-00-54-00-53-00-45-00-20-00-54-00-45-00-43-00-48-00-4D-00-41-00-52-00-4B-00-20-00-31-00-30-00-30-00-20-00-28-00-9C-00-29-00

ISO -> UTF8
FTSE TECHMARK 100 (?)
46-54-53-45-20-54-45-43-48-4D-41-52-4B-20-31-30-30-20-28-C2-9C-29
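
For reference, the 9C-00 and C2-9C sequences above look like what you would get if the conversions keep the same code point (U+009C) and only change how it is stored, while the expected 53-01 corresponds to a different code point (U+0153). A minimal sketch to check this; the `CodePointDemo` class and `Hex` helper are hypothetical names for illustration, not part of the code above:

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Returns the bytes of 'text' in the given encoding as hyphen-separated hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

With this helper, `Hex(Encoding.Unicode, "\u009C")` gives `9C-00` and `Hex(Encoding.UTF8, "\u009C")` gives `C2-9C`, matching the two conversions above, whereas `Hex(Encoding.Unicode, "\u0153")` gives `53-01`, the bytes in the expected string.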



However, when I view this in NUnit, all the ? characters appear correctly as œ, albeit ever so slightly different from the Expected .NET version (ISO vs Unicode?). Is NUnit detecting the encoding of the character and printing it correctly?

My question is: how do I get from my original Sybase ISO-8859-1 string to the Expected .NET bytes (Unicode), so that I can be sure that all of my .NET apps will display the characters correctly?
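
One direction that might be worth trying, sketched under a big assumption: that the database is really sending Windows-1252 bytes (where 0x9C is œ) and the string has merely been decoded as ISO-8859-1 on the way in. If so, re-encoding to ISO-8859-1 should recover the original bytes, which can then be decoded as code page 1252. The `Cp1252Repair` class and `Latin1ToCp1252` method are hypothetical names, not an existing API:

```csharp
using System;
using System.Text;

public static class Cp1252Repair
{
    static Cp1252Repair()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1252
        // must be registered first. On the classic .NET Framework,
        // Encoding.GetEncoding(1252) works without this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-interprets a string that was decoded as ISO-8859-1 as Windows-1252.
    public static string Latin1ToCp1252(string misdecoded)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        // so this recovers the raw bytes the database sent. Any character
        // above U+00FF cannot be encoded and would be replaced by '?'.
        byte[] rawBytes = latin1.GetBytes(misdecoded);

        // Decode the same bytes as Windows-1252, where 0x9C is U+0153.
        return cp1252.GetString(rawBytes);
    }
}
```

Under that assumption, `Cp1252Repair.Latin1ToCp1252("FTSE TECHMARK 100 (\u009C)")` returns the string with U+0153 in place of U+009C, so `Encoding.Unicode.GetBytes` on the result ends in 53-01-29-00 like the expected bytes above. If the data is genuinely ISO-8859-1, this would change nothing useful, so it is only a sketch to test the theory.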


Many thanks for any help received!
Dec 13 '07 #1
