Find the encoding of an XML char array (or stream) stored in memory

Hi,

I have an application that sends a lot of XML data to various places.
Once in a while, though, I just want the XML document as a string (for
example, to pass to a typed DataSet TableAdapter). That raises a
problem: what encoding do I use to convert the byte[] (or MemoryStream)
into a string? The encoding is declared in the document itself, of
course, as in any XML file, but how do I find out which one it is? I
tried using an XmlTextReader, which is supposed to have an Encoding
property, but all I did, basically, was

myEncoding = new XmlTextReader(myMemoryStream).Encoding;

and in that case it's null. Is there something I have to do with the
XmlTextReader to get it to populate its own "Encoding" property from
the given text? Getting null suggests I'm doing something really weird,
since the documentation for XmlTextReader.Encoding says:

Property Value
The encoding value. If no encoding attribute exists, and there is no
byte-order mark, this defaults to UTF-8.

Mar 2 '07 #1
Martin Z wrote:
[...] I tried using an XmlTextReader, which is supposed to have an
Encoding property, but all I did, basically, was

myEncoding = new XmlTextReader(myMemoryStream).Encoding;

and in that case it's null. Is there something I have to do with the
XmlTextReader to get it to populate its own "Encoding" property from
the given text?

Yes, the reader has to read the beginning of the stream to find a byte
order mark or look at the XML declaration before it can report an
encoding.
If you do, e.g.,
XmlTextReader reader = new XmlTextReader(myMemoryStream);
reader.MoveToContent();
then you should be able to get the encoding:
Encoding myEncoding = reader.Encoding;
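
Putting those steps together, a minimal sketch of decoding the bytes with the detected encoding might look like the following. The class and method names (XmlBytes, ToXmlString) are made up for illustration, and the fallback to UTF-8 when Encoding is still null is my own assumption rather than documented behaviour.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlBytes
{
    // Hypothetical helper: decode an in-memory XML document to a string
    // using the encoding that XmlTextReader detects.
    public static string ToXmlString(byte[] xmlBytes)
    {
        Encoding encoding;
        using (var probe = new MemoryStream(xmlBytes))
        using (var reader = new XmlTextReader(probe))
        {
            // Encoding stays null until the reader has read something.
            reader.MoveToContent();
            // Assumption: fall back to UTF-8, the XML default, if still null.
            encoding = reader.Encoding ?? Encoding.UTF8;
        }

        // Decode from a fresh stream. StreamReader skips a leading byte
        // order mark, which Encoding.GetString would keep as U+FEFF.
        using (var stream = new MemoryStream(xmlBytes))
        using (var text = new StreamReader(stream, encoding))
        {
            return text.ReadToEnd();
        }
    }
}
```

The bytes are read twice on purpose: once to let the reader sniff the encoding, and once to decode the whole document including the XML declaration.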

Note that in .NET 2.0 there is also a ReadOuterXml method, so instead
of working out the encoding and decoding the bytes in the memory stream
to a string, it might suffice to do, e.g.,
XmlTextReader reader = new XmlTextReader(myMemoryStream);
reader.MoveToContent();
string xml = reader.ReadOuterXml();
That will, however, strip out anything such as comments or processing
instructions that comes before the root element. If you want the
complete XML, you might need to call Read and ReadOuterXml for all
top-level nodes instead of using MoveToContent.

Also, most APIs that take XML input have overloads that read from a
stream directly, so double-check that your API does not accept a stream
before you go to the effort of decoding your stream into a string of
XML.
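
As one illustration of a stream-taking API, XmlDocument.Load accepts a Stream and works out the encoding itself, and OuterXml then returns the whole document, including comments before the root element. This is a different technique from the Read/ReadOuterXml loop described above, and the names XmlWhole and LoadWhole are hypothetical. It is only a sketch and assumes trusted input.

```csharp
using System.IO;
using System.Xml;

public static class XmlWhole
{
    // Hypothetical helper: let XmlDocument detect the encoding from the
    // stream, then serialize the complete document back to a string.
    public static string LoadWhole(Stream xmlStream)
    {
        var doc = new XmlDocument();
        // Keep whitespace between nodes instead of dropping it.
        doc.PreserveWhitespace = true;
        // Assumes trusted input; Load parses any DTD it encounters.
        doc.Load(xmlStream);
        return doc.OuterXml;
    }
}
```

This builds a full DOM in memory, so for very large documents the reader-based approach is likely the lighter option.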
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Mar 2 '07 #2
For the theory of how to do this, see Appendix F of the XML spec:

http://www.w3.org/TR/REC-xml/#sec-guessing

Code to implement this isn't particularly difficult to write.
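
For what it's worth, a rough sketch of that detection might look like the following. It covers only the common cases from Appendix F (the unusual UCS-4 byte orders are left out), it labels the 4-byte encodings "UTF-32" where the spec says UCS-4, and the names XmlEncodingGuess and Guess are invented for this example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlEncodingGuess
{
    // Hypothetical helper: guess an encoding name from the first bytes
    // of an XML document, loosely following Appendix F of the XML spec.
    public static string Guess(byte[] b)
    {
        // Byte order marks. UTF-32LE is tested before UTF-16LE because
        // both begin with FF FE.
        if (StartsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (StartsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (StartsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (StartsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        if (StartsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";

        // No byte order mark: look at how "<" or "<?xm" is laid out.
        if (StartsWith(b, 0x00, 0x00, 0x00, 0x3C)) return "UTF-32BE";
        if (StartsWith(b, 0x3C, 0x00, 0x00, 0x00)) return "UTF-32LE";
        if (StartsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (StartsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        if (StartsWith(b, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC";

        // "<?xm" in an ASCII-compatible encoding: the declaration can be
        // read as ASCII far enough to find the encoding name.
        if (StartsWith(b, 0x3C, 0x3F, 0x78, 0x6D))
        {
            string head = Encoding.ASCII.GetString(b, 0, Math.Min(b.Length, 256));
            int end = head.IndexOf("?>", StringComparison.Ordinal);
            if (end >= 0)
            {
                Match m = Regex.Match(
                    head.Substring(0, end),
                    "encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");
                if (m.Success) return m.Groups[1].Value;
            }
        }

        // No BOM and no encoding declaration: XML defaults to UTF-8.
        return "UTF-8";
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The returned name could then be passed to Encoding.GetEncoding, bearing in mind that "EBCDIC" is only a family label here and not a name that call will accept.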
Mar 2 '07 #3
Thank you, you solved my problem perfectly. And as I said, I was using
a typed DataSet TableAdapter on SQL 2000, which uses strings for any
text field, so I needed the XML as a string.

Mar 2 '07 #4
