Bytes IT Community

XmlTextWriter Encoding problem

P: n/a
The following code sample should write a valid XML document to the
console. However, when I run it in C# (Visual Studio 2003, .NET
Framework 1.1), an extra question mark precedes the rest of the
content.

MemoryStream ms = new MemoryStream();
XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
xtw.Namespaces = false;
xtw.Indentation = 5;
xtw.Formatting = Formatting.Indented;
xtw.WriteStartDocument();
xtw.WriteStartElement("root");
xtw.WriteStartElement("People");
xtw.WriteStartElement("Person");
xtw.WriteAttributeString("FirstName", "John");
xtw.WriteAttributeString("LastName", "Smith");
xtw.WriteEndElement();
xtw.WriteStartElement("Person");
xtw.WriteAttributeString("FirstName", "Jane");
xtw.WriteAttributeString("LastName", "Smith");
xtw.WriteEndElement();
xtw.WriteEndElement();
xtw.WriteEndElement();
xtw.WriteEndDocument();
xtw.Flush();
xtw.Close();

Console.WriteLine( Encoding.UTF8.GetString( ms.ToArray()));

Everything works as expected when I use ASCII encoding. Any encoding
works when I write to a file instead of a memory stream.

Anyone have the answer to this problem?

Thanks in advance.
Nov 12 '05 #1
5 Replies


P: n/a
Hi Adam,

The first three bytes in the byte array are the BOM (byte order mark,
0xEF 0xBB 0xBF), which lets applications easily detect UTF-8 encoded text.
If you want to create UTF-8 encoded content without a BOM, don't use the
default instance available through Encoding.UTF8, but create a new instance:

Encoding utf8 = new UTF8Encoding(false);

In your program, you can do the following:

XmlTextWriter xtw = new XmlTextWriter(ms, new UTF8Encoding(false));
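
To see the difference for yourself, here is a minimal sketch that writes the same small document with both encodings and checks the first three bytes. The class and method names (BomDemo, WriteXml, StartsWithUtf8Bom) are made up for this illustration and are not part of the framework. It also checks the first decoded character: Encoding.UTF8.GetString does not strip the BOM, so it decodes to U+FEFF, which the console presumably cannot display and shows as a question mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Writes a tiny document through XmlTextWriter and returns the raw bytes.
    public static byte[] WriteXml(Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter xtw = new XmlTextWriter(ms, encoding);
        xtw.WriteStartDocument();
        xtw.WriteStartElement("root");
        xtw.WriteEndElement();
        xtw.WriteEndDocument();
        xtw.Flush();
        byte[] bytes = ms.ToArray();
        xtw.Close();
        return bytes;
    }

    // True when the buffer starts with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        byte[] withBom = WriteXml(Encoding.UTF8);
        byte[] withoutBom = WriteXml(new UTF8Encoding(false));

        Console.WriteLine("Encoding.UTF8 starts with BOM: " + StartsWithUtf8Bom(withBom));
        Console.WriteLine("UTF8Encoding(false) starts with BOM: " + StartsWithUtf8Bom(withoutBom));

        // GetString keeps the BOM, so the first decoded char is U+FEFF.
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine("First decoded char is U+FEFF: " + (decoded[0] == '\uFEFF'));
    }
}
```

If the behaviour matches what is described above, the first check reports a BOM and the second does not.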

If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #2

P: n/a
Thank you. That worked great.

Nov 12 '05 #3

P: n/a
Kevin,

could you explain what the problem is with using the default instance?

Thanks,
Christoph Schittko [MVP, XmlInsider]
Software Architect, .NET Mentor

Nov 12 '05 #4

P: n/a
Hi Christoph,

Encoding.UTF8 is a static instance of the UTF8Encoding class. An encoder
using this instance emits a UTF-8 identifier, which is the three bytes I
mentioned in my last post. The identifier helps a program recognize the
UTF-8 encoding. If we create a new instance with the first parameter of the
constructor set to false, the identifier is not emitted.
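
As a quick check, the identifier can be inspected directly through GetPreamble(), without involving a writer at all. This is only a small sketch; the class name PreambleDemo is invented for the example.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    public static void Main()
    {
        // The shared Encoding.UTF8 instance supplies the 3-byte identifier.
        byte[] withBom = Encoding.UTF8.GetPreamble();

        // Passing false as the first constructor argument
        // (encoderShouldEmitUTF8Identifier) yields an empty preamble.
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();

        Console.WriteLine(BitConverter.ToString(withBom)); // prints "EF-BB-BF"
        Console.WriteLine(withoutBom.Length);              // prints "0"
    }
}
```

Writers that honor the preamble copy these bytes to the start of the stream, which is where the three extra bytes come from.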

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #5

P: n/a
Thanks :)

--
HTH
Christoph Schittko [MVP]
Software Architect, .NET Mentor

Nov 12 '05 #6
