Get 3 chars before <?xml version...
Question posted by: David Thielen (Guest) on November 12th, 2005 05:08 AM
Hi;
My code is:
XmlDocument doc = new XmlDocument();
doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", ""));
....
doc.Save(outStream);
And my saved document has:
0xef 0xbb 0xbf before the <?xml...
What do I have to do to eliminate this? (.net 2.0)
--
thanks - dave
5 Answers Posted
Hello!
> XmlDocument doc = new XmlDocument();
> doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", ""));
> ...
> doc.Save(outStream);
>
> And my saved document has:
> 0xef 0xbb 0xbf before the <?xml...
>
> What do I have to do to eliminate this? (.net 2.0)
Why do you want to do it?
AFAIK that's the Unicode byte order mark (BOM), which every XML parser
should be able to understand.
Maybe a solution would be to switch to another encoding (US-ASCII,...)
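If the three bytes really have to go, one option that keeps the output in UTF-8 is to save through an XmlWriter whose encoding is a UTF8Encoding constructed with the "emit identifier" flag set to false. This is only a minimal sketch: the class name SaveWithoutBom, the <root> element and the MemoryStream standing in for outStream are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class SaveWithoutBom
{
    // Saves doc to outStream as UTF-8 without the EF BB BF signature.
    public static void Save(XmlDocument doc, Stream outStream)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        // false = do not emit the UTF-8 identifier (the 3-byte preamble)
        settings.Encoding = new UTF8Encoding(false);

        // Disposing the writer flushes it; by default it leaves outStream open.
        using (XmlWriter writer = XmlWriter.Create(outStream, settings))
        {
            doc.Save(writer);
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", null));
        doc.AppendChild(doc.CreateElement("root")); // placeholder element

        MemoryStream outStream = new MemoryStream(); // stand-in for the real stream
        Save(doc, outStream);

        // The first byte should now be '<' (0x3C) rather than 0xEF.
        Console.WriteLine("0x{0:X2}", outStream.ToArray()[0]);
    }
}
```

Switching the declaration to US-ASCII also avoids the bytes, but then every non-ASCII character is written as a character reference, so the approach above is usually the less intrusive one.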
--
Pascal Schmitt
I learn something new every day - I was not aware of this. How long has this
been part of the standard?
--
thanks - dave
"Pascal Schmitt" wrote:
> Hello!
>
> > XmlDocument doc = new XmlDocument();
> > doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", ""));
> > ...
> > doc.Save(outStream);
> >
> > And my saved document has:
> > 0xef 0xbb 0xbf before the <?xml...
> >
> > What do I have to do to eliminate this? (.net 2.0)
>
> Why do you want to do it?
> AFAIK that's the Unicode byte order mark (BOM), which every XML parser
> should be able to understand.
>
> Maybe a solution would be to switch to another encoding (US-ASCII,...)
>
> --
> Pascal Schmitt
David Thielen wrote:
> I learn something new every day - I was not aware of this. How long has
> this been part of the standard?
Since the very beginning. The WD-xml-961114 draft says (4.2.3):
"Entities encoded in UCS-2 must begin with the Byte Order Mark
described by ISO 10646 Annex E and Unicode Appendix B (the ZERO
WIDTH NO-BREAK SPACE character, U+FEFF). This is an encoding
signature, not part of either the markup or character data of
the XML document. XML processors must be able to use this
character to differentiate between UTF-8 and UCS-2 encoded
documents." [p.20]
///Peter
--
XML FAQ: http://xml.silmaril.ie/
David Thielen wrote:
> I learn something new every day - I was not aware of this. How long has this
> been part of the standard?
I don't know...
But it's not directly part of XML - it's part of the Unicode Standard
(and since XML 1.0 is based on Unicode 2.0, it must be older than this...)
The 3 bytes in your document are the byte order mark for UTF-8, which is
optional.
--
Pascal Schmitt
> The 3 bytes in your document are the byte order mark for UTF-8, which is
> optional.
"Byte Order Mark" makes sense for Unicode where the characters are read and
written as 16-bit quantities (UCS-2, UTF-16) and where the byte order depends
on the endianness, not for UTF-8 where the data is read and written as a byte
stream.
For UTF-8, this is rather a "signature".
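For what it's worth, .NET calls this sequence the encoding's "preamble". A small sketch for looking at the bytes or testing a buffer for them; HasUtf8Signature is a hypothetical helper written for this example, not a framework method.

```csharp
using System;
using System.Text;

class Utf8Signature
{
    // Hypothetical helper: true if data starts with the UTF-8 signature EF BB BF.
    public static bool HasUtf8Signature(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (data.Length < preamble.Length) return false;
        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        // Encoding.UTF8 is configured to emit the signature.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble())); // EF-BB-BF

        // A UTF8Encoding constructed with false has an empty preamble.
        Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length); // 0
    }
}
```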
Bruno