Bytes | Software Development & Data Engineering Community

Difference in encoding? - Confused

I'm working on a .NET application that requests an XML document, in
string form, from a legacy COM component, then deserializes it. In
order to deserialize the document, the string needs to be placed into a
stream. AFAIK, .NET strings are UTF-16 encoded, but the COM component
returns a UTF-8 encoded document, so my first attempt at creating and
filling a stream used this code:

string response = <some UTF-8 encoded xml from a COM component>;
MemoryStream result = new MemoryStream(response.Length);
UTF8Encoding utf8Encoding = new UTF8Encoding();
result.Write(utf8Encoding.GetBytes(response), 0, response.Length);

The deserialization then worked fine until one of the XML documents
contained a UK pound character - £ - at which point an exception was
thrown indicating an invalid document.
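
The failure is consistent with a length mismatch in the first snippet. `response.Length` counts UTF-16 chars, but UTF-8 encodes £ (U+00A3) as two bytes, so `Write` copies fewer bytes than `GetBytes` produced and the tail of the document is cut off. The minimal C# sketch below uses a made-up sample string `<a>£</a>` and hypothetical helper names (`WriteWithCharCount`, `WriteWithByteCount`) to show the effect and the fix, which is to pass the byte array's own length.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthMismatchDemo
{
    // Mirrors the original snippet: the count passed to Write is the number
    // of UTF-16 chars in the string, not the number of UTF-8 bytes.
    public static byte[] WriteWithCharCount(string response)
    {
        var result = new MemoryStream(response.Length);
        byte[] bytes = new UTF8Encoding().GetBytes(response);
        result.Write(bytes, 0, response.Length); // drops trailing bytes if any char needs 2+ bytes
        return result.ToArray();
    }

    // Corrected version: write every byte that GetBytes produced.
    public static byte[] WriteWithByteCount(string response)
    {
        var result = new MemoryStream();
        byte[] bytes = new UTF8Encoding().GetBytes(response);
        result.Write(bytes, 0, bytes.Length);
        return result.ToArray();
    }

    static void Main()
    {
        // 8 chars, but 9 bytes in UTF-8 because U+00A3 takes two bytes.
        string response = "<a>\u00A3</a>";
        Console.WriteLine(response.Length);
        Console.WriteLine(WriteWithCharCount(response).Length);
        Console.WriteLine(WriteWithByteCount(response).Length);
    }
}
```

With the char count, the decoded result is `<a>£</a` (the final `>` is lost), which an XML parser rejects; with the byte count, the full document survives.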

After searching Google, I came across the following alternative code to
create and fill the stream for deserialization. After testing with the
pound sign and a few other problem characters, this seems to work properly.

string response = <some UTF-8 encoded xml from a COM component>;
MemoryStream result = new MemoryStream(response.Length);
StreamWriter writer = new StreamWriter(result, new UTF8Encoding());
writer.Write(response);
writer.Flush();
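
Once the byte count is fixed, the two approaches should produce the same bytes. The following is a sketch under that assumption, with hypothetical helper names `ViaGetBytes` and `ViaStreamWriter`. Both rewind the stream to `Position = 0`, which any reader (such as a deserializer) needs before it can start from the first byte; the original snippets do not show that step.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class StreamBuildDemo
{
    // First approach, with the byte count fixed: encode once, write all bytes.
    public static MemoryStream ViaGetBytes(string response)
    {
        byte[] bytes = new UTF8Encoding().GetBytes(response);
        var result = new MemoryStream();
        result.Write(bytes, 0, bytes.Length);
        result.Position = 0; // rewind so a reader starts at the first byte
        return result;
    }

    // Second approach: let StreamWriter do the encoding.
    public static MemoryStream ViaStreamWriter(string response)
    {
        var result = new MemoryStream();
        // new UTF8Encoding() has no byte order mark, so no preamble is written.
        var writer = new StreamWriter(result, new UTF8Encoding());
        writer.Write(response);
        writer.Flush(); // push buffered chars through the encoder into the stream
        // The writer is deliberately not disposed: disposing it would close result too.
        result.Position = 0;
        return result;
    }

    static void Main()
    {
        string response = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a>\u00A3</a>";
        bool same = ViaGetBytes(response).ToArray()
            .SequenceEqual(ViaStreamWriter(response).ToArray());
        Console.WriteLine(same);
    }
}
```

One caveat on this design: passing `Encoding.UTF8` instead of `new UTF8Encoding()` would make `StreamWriter` emit a three-byte byte order mark first, so the two outputs would no longer match byte for byte.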

I have a couple of questions: First, is my understanding of the
encoding issues correct? If I have a UTF-8 encoded document, is it up
to me to decode it into the stream explicitly? Secondly, my reading of
the two code snippets is that they should produce an identical result,
but in reality the first one doesn't seem to be decoding the document
correctly for all characters - can anyone explain what is causing the
different behaviour?

Thanks
Ian
Nov 12 '05 #1
Ian Harding wrote:
> I'm working on a .NET application that requests an XML document, in
> string form, from a legacy COM component, then deserializes it. In
> order to deserialize the document, the string needs to be placed into a
> stream. AFAIK, .NET strings are UTF-16 encoded, but the COM component
> returns a UTF-8 encoded document, so my first attempt at creating and
> filling a stream used this code:
>
> string response = <some UTF-8 encoded xml from a COM component>;


I think you should do nothing here. Just parse the string. Why do you
need a stream? This should work:
XmlDocument doc = new XmlDocument();
doc.LoadXml(response);
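
A minimal sketch of this suggestion, assuming a made-up sample document: the string carries a UTF-8 declaration but is loaded with `LoadXml` directly, so no stream or manual encoding step is involved. The helper name `Parse` is hypothetical.

```csharp
using System;
using System.Xml;

static class LoadXmlDemo
{
    // Parses the string directly; the characters are already decoded,
    // so there is no byte-level encoding step to get wrong.
    public static XmlDocument Parse(string response)
    {
        var doc = new XmlDocument();
        doc.LoadXml(response);
        return doc;
    }

    static void Main()
    {
        // The declaration says utf-8, but the text is already a .NET (UTF-16) string.
        string response = "<?xml version=\"1.0\" encoding=\"utf-8\"?><price>\u00A35</price>";
        XmlDocument doc = Parse(response);
        Console.WriteLine(doc.DocumentElement.InnerText.Length);
    }
}
```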

--
Oleg Tkachenko [XML MVP, MCP]
http://blog.tkachenko.com
Nov 12 '05 #2
Oleg Tkachenko [MVP] wrote:
> Ian Harding wrote:
>> I'm working on a .NET application that requests an XML document, in
>> string form, from a legacy COM component, then deserializes it. In
>> order to deserialize the document, the string needs to be placed into
>> a stream. AFAIK, .NET strings are UTF-16 encoded, but the COM
>> component returns a UTF-8 encoded document, so my first attempt at
>> creating and filling a stream used this code:
>>
>> string response = <some UTF-8 encoded xml from a COM component>;
>
> I think you should do nothing here. Just parse the string. Why do you
> need a stream? This should work:
> XmlDocument doc = new XmlDocument();
> doc.LoadXml(response);


I probably should have explained what I was doing with the data more
clearly.

We have a class library containing serializable classes that represent
each type of document that can be returned by the COM component. For a
given request, we always know the type of the returned document, so we
just use XmlSerializer to populate a class instance from the XML. That
saves messing about with DOM and XPath on the client side. As I understand
it, it isn't possible to pass a string directly to the serializer for
deserialization. A MemoryStream seemed like the lowest-overhead way of
getting it into a stream.

Thanks
Ian
Nov 12 '05 #3
Ian Harding wrote:
> We have a class library containing serializable classes that represent
> each type of document that can be returned by the COM component. For a
> given request, we always know the type of the returned document, so we
> just use XmlSerializer to populate a class instance from the XML. That
> saves messing about with DOM and XPath on the client side. As I understand
> it, it isn't possible to pass a string directly to the serializer for
> deserialization. A MemoryStream seemed like the lowest-overhead way of
> getting it into a stream.


XmlSerializer accepts a TextReader, which means you can pass it new
StringReader(response). Fiddling with encodings through a MemoryStream is
usually very error-prone. Basically, if your XML is in a .NET string, it
is already UTF-16 encoded, even though its XML declaration says UTF-8.
.NET handles that case just fine by switching to UTF-16.
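
A minimal sketch of the StringReader route, assuming a hypothetical `Order` class standing in for one of the serializable classes in the poster's library (the real class names are not given in the thread). The string is handed to `XmlSerializer.Deserialize(TextReader)`, so no bytes are ever encoded or decoded.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical document class; the root element name "order" is an assumption.
[XmlRoot("order")]
public class Order
{
    public string Item;
}

public static class StringReaderDemo
{
    public static Order Deserialize(string response)
    {
        var serializer = new XmlSerializer(typeof(Order));
        // StringReader is a TextReader: the parser receives characters, not bytes,
        // so the encoding named in the XML declaration plays no part in decoding.
        using (var reader = new StringReader(response))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }

    static void Main()
    {
        string response =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><order><Item>\u00A35 widget</Item></order>";
        Console.WriteLine(Deserialize(response).Item == "\u00A35 widget");
    }
}
```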

Nov 12 '05 #4
Oleg Tkachenko [MVP] wrote:
> Ian Harding wrote:
>> We have a class library containing serializable classes that
>> represent each type of document that can be returned by the COM
>> component. For a given request, we always know the type of the
>> returned document, so we just use XmlSerializer to populate a class
>> instance from the XML. That saves messing about with DOM and XPath on the
>> client side. As I understand it, it isn't possible to pass a string
>> directly to the serializer for deserialization. A MemoryStream
>> seemed like the lowest-overhead way of getting it into a stream.
>
> XmlSerializer accepts TextReader, that means you can pass it new
> StringReader(response). Fiddling with encoding with MemoryStream is
> usually very error-prone. Basically if your XML is in .NET string that
> means it's already UTF-16 encoded, but its XML declaration says UTF-8.
> .NET supports such case just fine by switching to UTF-16.


Thank you, Oleg. I hoped there was an easier way than getting involved
in encoding issues, but I just couldn't see it.
Nov 12 '05 #5
