Bytes | Software Development & Data Engineering Community

Difference in encoding? - Confused

I'm working on a .NET application that requests an XML document, in
string form, from a legacy COM component, then deserializes it. In
order to deserialize the document, the string needs to be placed into a
stream. AFAIK, .NET strings are UTF-16 encoded, but the COM component
returns a UTF-8 encoded document, so my first attempt at creating and
filling a stream used this code:

string response = <some UTF-8 encoded xml from a COM component>;
MemoryStream result = new MemoryStream(response.Length);
UTF8Encoding utf8Encoding = new UTF8Encoding();
result.Write(utf8Encoding.GetBytes(response), 0, response.Length);

The deserialization then worked fine until one of the XML documents
contained a UK pound character - £ - at which point an exception was
thrown indicating an invalid document.

After searching Google, I came across the following alternative code to
create and fill the stream for deserialization. After testing with
pound and a few other problem characters, this seems to work properly.

string response = <some UTF-8 encoded xml from a COM component>;
MemoryStream result = new MemoryStream(response.Length);
StreamWriter writer = new StreamWriter( result, new UTF8Encoding());
writer.Write( response );
writer.Flush();

I have a couple of questions: First, is my understanding of the
encoding issues correct? If I have a UTF-8 encoded document, is it up
to me to decode it into the stream explicitly? Secondly, my reading of
the two code snippets is that they should produce an identical result,
but in reality the first one doesn't seem to be decoding the document
correctly for all characters - can anyone explain what is causing the
different behaviour?

Thanks
Ian
Nov 12 '05 #1
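
On the second question above: the two snippets are not equivalent. response.Length counts UTF-16 chars, while GetBytes returns one byte for each ASCII character but two for £, so passing response.Length as the count to Write appears to drop the final byte(s) of the document. A minimal sketch of that mismatch follows; the sample XML and the helper name ToUtf8Stream are made up for illustration and are not from the original application.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: copies the string into a stream as UTF-8,
    // using the byte array's length rather than the string's length.
    public static MemoryStream ToUtf8Stream(string text)
    {
        byte[] bytes = new UTF8Encoding().GetBytes(text);
        MemoryStream stream = new MemoryStream(bytes.Length);
        stream.Write(bytes, 0, bytes.Length); // count is in bytes, not chars
        stream.Position = 0;                  // rewind before deserializing
        return stream;
    }

    public static void Main()
    {
        // Made-up sample document containing a pound sign (U+00A3).
        string response = "<price>\u00A35</price>";
        byte[] bytes = new UTF8Encoding().GetBytes(response);

        Console.WriteLine(response.Length); // 17 UTF-16 chars
        Console.WriteLine(bytes.Length);    // 18 bytes: the pound sign takes two

        // Writing only response.Length bytes would cut off the final '>'.
        Console.WriteLine(ToUtf8Stream(response).Length); // 18
    }
}
```

The StreamWriter version avoids the problem because the writer does its own byte counting; the only change needed in the first snippet would be to pass the byte array's length to Write.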
Ian Harding wrote:
> I'm working on a .NET application that requests an XML document, in
> string form, from a legacy COM component, then deserializes it. In
> order to deserialize the document, the string needs to be placed into a
> stream. AFAIK, .NET strings are UTF-16 encoded, but the COM component
> returns a UTF-8 encoded document, so my first attempt at creating and
> filling a stream used this code:
>
> string response = <some UTF-8 encoded xml from a COM component>;


I think you should do nothing here. Just parse the string. Why do you
need a stream? This should work:
XmlDocument doc = new XmlDocument();
doc.LoadXml(response);

--
Oleg Tkachenko [XML MVP, MCP]
http://blog.tkachenko.com
Nov 12 '05 #2
Oleg Tkachenko [MVP] wrote:
> Ian Harding wrote:
>> I'm working on a .NET application that requests an XML document, in
>> string form, from a legacy COM component, then deserializes it. In
>> order to deserialize the document, the string needs to be placed into
>> a stream. AFAIK, .NET strings are UTF-16 encoded, but the COM
>> component returns a UTF-8 encoded document, so my first attempt at
>> creating and filling a stream used this code:
>>
>> string response = <some UTF-8 encoded xml from a COM component>;
>
> I think you should do nothing here. Just parse the string. Why do you
> need a stream? This should work:
> XmlDocument doc = new XmlDocument();
> doc.LoadXml(response);


I probably should have explained what I was doing with the data more
clearly.

We have a class library, containing serializable classes that represent
each type of document that can be returned by the COM component. For a
given request, we always know the type of the returned document, so we
just use XmlSerializer to populate a class instance from the XML. Saves
messing about with DOM and XPath on the client-side. As I understand
it, it isn't possible to pass a string directly to the serializer for
de-serialization. A MemoryStream seemed like the lowest-overhead way of
getting it into a stream.

Thanks
Ian
Nov 12 '05 #3
Ian Harding wrote:
> We have a class library, containing serializable classes that represent
> each type of document that can be returned by the COM component. For a
> given request, we always know the type of the returned document, so we
> just use XmlSerializer to populate a class instance from the XML. Saves
> messing about with DOM and XPath on the client-side. As I understand
> it, it isn't possible to pass a string directly to the serializer for
> de-serialization. A MemoryStream seemed like the lowest-overhead way of
> getting it into a stream.


XmlSerializer accepts a TextReader, which means you can pass it new
StringReader(response). Fiddling with encodings via MemoryStream is
usually very error-prone. Basically, if your XML is in a .NET string,
it's already UTF-16 encoded, even though its XML declaration says UTF-8.
.NET supports this case just fine by switching to UTF-16.
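
A minimal sketch of the StringReader approach, for illustration only. The Order class, the Deserialize helper and the sample XML are hypothetical stand-ins, since the real class library is not shown in this thread; the only calls relied on are XmlSerializer.Deserialize(TextReader) and StringReader.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical document type standing in for one of the library classes.
public class Order
{
    public string Item { get; set; }
    public decimal Price { get; set; }
}

public static class StringDeserializer
{
    // Hypothetical helper: deserializes directly from the string,
    // so no byte encoding step is involved at all.
    public static T Deserialize<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Made-up response; the declaration still says utf-8 even though
        // the text already lives in a .NET (UTF-16) string.
        string response =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<Order><Item>Tea \u00A3</Item><Price>5</Price></Order>";

        Order order = Deserialize<Order>(response);
        Console.WriteLine(order.Item);
        Console.WriteLine(order.Price);
    }
}
```

Because the serializer reads characters rather than bytes here, the pound sign needs no special handling.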

--
Oleg Tkachenko [XML MVP, MCP]
http://blog.tkachenko.com
Nov 12 '05 #4
Oleg Tkachenko [MVP] wrote:
> Ian Harding wrote:
>> We have a class library, containing serializable classes that
>> represent each type of document that can be returned by the COM
>> component. For a given request, we always know the type of the
>> returned document, so we just use XmlSerializer to populate a class
>> instance from the XML. Saves messing about with DOM and XPath on the
>> client-side. As I understand it, it isn't possible to pass a string
>> directly to the serializer for de-serialization. A MemoryStream
>> seemed like the lowest-overhead way of getting it into a stream.
>
> XmlSerializer accepts a TextReader, which means you can pass it new
> StringReader(response). Fiddling with encodings via MemoryStream is
> usually very error-prone. Basically, if your XML is in a .NET string,
> it's already UTF-16 encoded, even though its XML declaration says UTF-8.
> .NET supports this case just fine by switching to UTF-16.


Thank you, Oleg. I hoped there was an easier way than getting involved
in encoding issues but just couldn't see it.
Nov 12 '05 #5
