XMLTextWriter Encoding problem

The following code sample should write a valid XML document to the
console. However, when I run it in C# (Visual Studio 2003, .NET
Framework 1.1), an extra question mark precedes the rest of the
content.

MemoryStream ms = new MemoryStream();
XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
xtw.Namespaces = false;
xtw.Indentation = 5;
xtw.Formatting = Formatting.Indented;
xtw.WriteStartDocument();
xtw.WriteStartElement("root");
xtw.WriteStartElement("People");
xtw.WriteStartElement("Person");
xtw.WriteAttributeString("FirstName", "John");
xtw.WriteAttributeString("LastName", "Smith");
xtw.WriteEndElement();
xtw.WriteStartElement("Person");
xtw.WriteAttributeString("FirstName", "Jane");
xtw.WriteAttributeString("LastName", "Smith");
xtw.WriteEndElement();
xtw.WriteEndElement();
xtw.WriteEndElement();
xtw.WriteEndDocument();
xtw.Flush();
xtw.Close();

Console.WriteLine( Encoding.UTF8.GetString( ms.ToArray()));

Everything works as expected when I use ASCII encoding. Any encoding
works when I write to a file instead of a memory stream.

Anyone have the answer to this problem?

Thanks in advance.
Nov 12 '05 #1
Hi Adam,

The first three bytes in the byte array are the BOM (byte order mark,
0xEF 0xBB 0xBF), which allows applications to easily detect UTF-8 encoded
text. If you want to create UTF-8 encoded content without a BOM, don't use
the default instance available through Encoding.UTF8; create a new instance
instead:

Encoding utf8 = new UTF8Encoding(false);

In your program, you can do the following:

XmlTextWriter xtw = new XmlTextWriter(ms, new UTF8Encoding(false));
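
To see the difference directly, here is a minimal sketch that writes the same small document with both encodings and prints the first three bytes of each result. The class name BomFreeXml and the helper Write are hypothetical names chosen for illustration; they are not part of the original sample. The extra character on the console appears because Encoding.UTF8.GetString does not strip the BOM, so the decoded string starts with U+FEFF, which the console typically cannot display.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomFreeXml
{
    // Hypothetical helper: writes a tiny document with the given encoding
    // and returns the raw bytes that ended up in the stream.
    public static byte[] Write(Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter xtw = new XmlTextWriter(ms, encoding);
        xtw.Formatting = Formatting.Indented;
        xtw.WriteStartDocument();
        xtw.WriteStartElement("root");
        xtw.WriteEndElement();
        xtw.WriteEndDocument();
        xtw.Close();          // flushes the writer and closes the stream
        return ms.ToArray();  // ToArray still works on a closed MemoryStream
    }

    public static void Main()
    {
        byte[] withBom = Write(Encoding.UTF8);
        byte[] withoutBom = Write(new UTF8Encoding(false));

        // First three bytes of each result, as hex.
        Console.WriteLine(BitConverter.ToString(withBom, 0, 3));    // EF-BB-BF
        Console.WriteLine(BitConverter.ToString(withoutBom, 0, 3)); // 3C-3F-78 ("<?x")

        // Decoding the BOM-free bytes gives a string that starts with "<?xml".
        Console.WriteLine(Encoding.UTF8.GetString(withoutBom));
    }
}
```

Closing the writer before calling ToArray means there is no need for a separate Flush call.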

If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #2
Thank you. That worked great.
Nov 12 '05 #3
Kevin,

could you explain what the problem is with using the default instance?

Thanks,
Christoph Schittko [MVP, XmlInsider]
Software Architect, .NET Mentor


Nov 12 '05 #4
Hi Christoph,

Encoding.UTF8 is a static instance of the UTF8Encoding class. A writer using
this instance emits a UTF-8 identifier, which is the three bytes I mentioned
in my last post. The identifier helps a program recognize the UTF-8
encoding. If we create a new instance with the first parameter of the
constructor (encoderShouldEmitUTF8Identifier) set to false, the identifier
is not emitted.
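
One way to check which instances carry the identifier is to ask each encoding for its preamble with GetPreamble. The sketch below is only an illustration; the class name PreambleDemo and the helper PreambleLength are hypothetical. It also suggests why the ASCII case in the original question showed no extra character: ASCIIEncoding has no preamble at all.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: number of identifier bytes the encoding
    // asks a writer to emit at the start of a stream.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(Encoding.UTF8));            // 3
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));   // 3
        Console.WriteLine(PreambleLength(new UTF8Encoding(false)));  // 0
        Console.WriteLine(PreambleLength(Encoding.ASCII));           // 0
    }
}
```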

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #5
Thanks :)

--
HTH
Christoph Schittko [MVP]
Software Architect, .NET Mentor

Nov 12 '05 #6
