I seem to have emailed you instead of posting before, but here it is:
This is the byte order mark (BOM), and it confused me too at first.
You might think that you've already specified the encoding as "UTF-8", but if
you think about it, the reader needs to know the encoding before it can read
the string "UTF-8". Hence the BOM, a magic 16-bit Unicode value (U+FEFF)
usually put at the start of the file. Written out as UTF-8 it becomes the
three bytes EF BB BF.
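To see what the BOM actually looks like on disk, here is a minimal sketch that prints the preamble bytes .NET associates with a few encodings. The class and method names (BomBytes, Hex) are just made up for illustration.

```csharp
using System;
using System.Text;

static class BomBytes
{
    // Returns the encoding's preamble (its BOM) as hex, e.g. "EF-BB-BF".
    public static string Hex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    static void Main()
    {
        Console.WriteLine("UTF-8:     " + Hex(Encoding.UTF8));             // EF-BB-BF
        Console.WriteLine("UTF-16 LE: " + Hex(Encoding.Unicode));          // FF-FE
        Console.WriteLine("UTF-16 BE: " + Hex(Encoding.BigEndianUnicode)); // FE-FF
    }
}
```

An editor that treats the file as Latin-1 will show the UTF-8 bytes EF BB BF as the three characters ï»¿ in front of the XML declaration.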
Off the top of my head, you have an interaction between XmlTextWriter, the
stream you are writing to, and the encoding for that stream.
It IS all documented (just not very clearly), and you definitely can suppress
the BOM (at least for UTF-8).
I think you will find that there is a parameter to the encoding constructor
that specifies whether to use the BOM.
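Something along these lines should do it. This is only a sketch, assuming you are happy to go through XmlTextWriter and a MemoryStream; the helper name SaveWithoutBom and the class name are mine, and the document is built the same way as in your post.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class BomFreeXml
{
    // Serializes the document as UTF-8 bytes with no leading BOM.
    public static byte[] SaveWithoutBom(XmlDocument doc)
    {
        // false = do not emit the UTF-8 identifier (the BOM).
        Encoding utf8NoBom = new UTF8Encoding(false);
        using (MemoryStream stream = new MemoryStream())
        {
            XmlTextWriter writer = new XmlTextWriter(stream, utf8NoBom);
            doc.Save(writer);
            writer.Flush(); // push buffered text into the stream before copying
            return stream.ToArray();
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateElement("XXXX"));
        XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
        doc.InsertBefore(header, doc.DocumentElement);

        byte[] bytes = SaveWithoutBom(doc);
        // With the BOM suppressed, the first byte is the '<' of "<?xml".
        Console.WriteLine(bytes[0] == (byte)'<');
    }
}
```

To write a file rather than a byte array, the same encoding object can be passed to an XmlTextWriter opened over a FileStream.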
Just to confuse things, Encoding.UTF8 and new UTF8Encoding() are different:
the former writes a BOM, whereas the parameterless constructor does not.
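A quick way to check the difference for yourself is to compare the preamble lengths. This is a small sketch; PreambleCheck and PreambleLength are names I made up.

```csharp
using System;
using System.Text;

static class PreambleCheck
{
    // Number of BOM bytes the encoding would write at the start of a stream.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    static void Main()
    {
        Console.WriteLine(PreambleLength(Encoding.UTF8));          // 3
        Console.WriteLine(PreambleLength(new UTF8Encoding()));     // 0
        Console.WriteLine(PreambleLength(new UTF8Encoding(true))); // 3
    }
}
```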
I seem to remember that when I had this problem it was because I was writing
to a MemoryStream which defaulted to Unicode whereas I think file streams
default to UTF-8 for compatibility reasons.
Be careful - it is entirely possible to have the XML say "UTF-8" and the BOM
say something else. This will cause a self-explanatory error when you try
to load the document.
P.S. Notepad can read and write UTF-8 and Unicode, big or little endian.
See also:
http://www.unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_Order_Mark
"Nadav" <na****@gmail.com> wrote in message
news:%2****************@tk2msftngp13.phx.gbl...
Sure...
XmlDocument doc = new XmlDocument();
XmlNode root = doc.CreateElement("XXXX");
doc.AppendChild (root);
and so on....
and then at last I run the code I posted before, to add the declaration.
The reason I thought they were invalid chars is that I have the same
software which creates the XML in Java as well (I rewrote it in C#), and when
I checked, the output XML files were identical (text and structure). But
still, the Java-created XML worked with the Perl code and the C# one didn't.
The reason is probably these chars, which don't exist in the Java XML.
Thanks, Nadav.
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Nadav <na****@gmail.com> wrote:
When I create an XML header using this code:
XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement rootElement = doc.DocumentElement;
doc.InsertBefore(header, rootElement);
It adds some invalid characters before the header itself, only viewable
with a text editor (IE opens the XML OK). This causes some Perl code I got,
which reads from the XML, to fail.
This is the header:
ï»¿<?xml version="1.0" encoding="UTF-8"?>
They're not invalid characters. That's the byte order mark. It's
perfectly valid for it to be there - it sounds like the Perl code is
broken. You may well not be able to fix that though, so I guess we need
to sort out how to suppress the BOM from the written file.
You haven't shown how you're writing out the document - could you do
so?
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too