incorrect encoding after serialisation to XML
Using the code below I am trying, in VB .Net 2003, to serialise classes
defined in a couple of XSD documents. The encoding for both is
Unicode(UTF-8). However the resulting XML is encoded as UTF-16. This is
causing me problems when I try to load it into an XPath document. I would
imagine I should be able to use System.Text.Encoding to define the encoding
as UTF-8 but I haven't been able to figure out how so far.
Dim ser As New XmlSerializer(GetType(GTPAPP))
Dim sw As New StringWriter
ser.Serialize(sw, domainObj.getGTPApp)
Return sw.ToString()
Any help would be appreciated.
re: incorrect encoding after serialisation to XML
one additional point on this is that when I try to load the XML when encoded
as UTF-16 I get the error
"There is no Unicode byte order mark. Cannot switch to Unicode."
re: incorrect encoding after serialisation to XML
"Stephen" <Stephen@discussions.microsoft.com> wrote in message news:8CB75244-3CB4-4CFD-992F-0CC87BC40753@microsoft.com...
> "Stephen" wrote:
> > Using the code below I am trying, in VB .Net 2003, to serialise classes
> > defined in a couple of XSD documents. The encoding for both is
> > Unicode(UTF-8).
The encoding of the schema documents is irrelevant, and the
encoding of the classes -- huh?
Once the Framework loads a piece of XML, that XML becomes
a node set. Metaphysically, think of it as astral projection (an
out-of-body experience) for the XML ... it goes to a higher plane
of existence where encodings no longer matter (whether attributes
are delimited by single- or double- quotes no longer matters,
whether content came from a CDATA section no longer matters,
etc.)
The challenges you're facing seem to center on entering and
leaving 'the Body' (think of the XML as being corporeal
whenever you see angle brackets).
> > However the resulting XML is encoded as UTF-16.
: :
> > Dim sw As New StringWriter
> > ser.Serialize(sw, domainObj.getGTPApp)
> > Return sw.ToString()
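A String is always UTF-16 encoded. There is no such thing as a UTF-8
string. It's a myth. It's fiction. UTF-8 strings went out with the dragon,
leprechauns, three-headed dogs guarding the Underworld, and Java.
Because a StringWriter is backed by a String, it reports UTF-16 as its
encoding, and that is what the serializer writes into the XML declaration.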
> one additional point on this is that when I try to load the XML when encoded
> as UTF-16 I get the error
> "There is no Unicode byte order mark. Cannot switch to Unicode."
This depends on how the XML was saved. Was it saved with a Stream
that used System.Text.UnicodeEncoding?
You'll get this XmlException when the XML isn't encoded in the encoding
it says it is.
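To see where that exception comes from, here is a minimal sketch. The Sample class is a hypothetical stand-in for GTPAPP, and converting the string with Encoding.UTF8.GetBytes is only an assumed way of ending up with UTF-8 bytes; the thread does not say how the XML was actually saved. The sketch serializes to a StringWriter, prints the declaration the serializer wrote (it should name utf-16), and then tries to load the same text as UTF-8 bytes, which on the .NET Framework should fail with an XmlException like the one quoted above.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical stand-in for the GTPAPP class from the question.
Public Class Sample
    Public Name As String = "demo"
End Class

Module DeclarationMismatchDemo
    Sub Main()
        Dim ser As New XmlSerializer(GetType(Sample))
        Dim sw As New StringWriter()
        ser.Serialize(sw, New Sample())
        Dim xmlText As String = sw.ToString()

        ' The declaration is derived from StringWriter.Encoding, which is UTF-16.
        Console.WriteLine(xmlText.Substring(0, xmlText.IndexOf("?>") + 2))

        ' Assumption: the text is later written out as UTF-8 bytes without
        ' changing the declaration, so the bytes and the declaration disagree.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(xmlText)
        Try
            Dim doc As New XmlDocument()
            doc.Load(New MemoryStream(bytes))
            Console.WriteLine("Loaded without error")
        Catch ex As XmlException
            ' The parser was told utf-16 but found no UTF-16 byte order mark.
            Console.WriteLine(ex.Message)
        End Try
    End Sub
End Module
```

The point of the sketch is only that the declared encoding and the actual bytes have to agree; it is not meant as a way to produce the XML.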
My advice is not to write the XML to a String. If you want to put it into
a file (and have control over its encoding), use an XmlTextWriter and
wrap it around a Stream. Then read it back in with a StreamReader that
is wrapped around a Stream, with a matching encoding (and any
encoding declaration that appears in the XML declaration should
match the encoding you used both to serialize out and to deserialize
in.)
- - - utf8xml.vb (excerpt)
' . . .
Dim tw As New System.Xml.XmlTextWriter( _
    New System.IO.FileStream( "file.xml", System.IO.FileMode.Create), _
    New System.Text.UTF8Encoding( True) )
' XML will be serialized to file.xml, in UTF-8, with BOM.
ser.Serialize( tw, domainObj.getGTPApp)
' Finish writing the file and close it.
tw.Flush( )
tw.Close( )
' TextReader is MustInherit (abstract), so use its StreamReader subclass.
Dim tr As New System.IO.StreamReader( _
    New System.IO.FileStream( "file.xml", System.IO.FileMode.Open), _
    New System.Text.UTF8Encoding( True) )
' XML will be deserialized from file.xml, in UTF-8, with BOM.
domainObj.setGTPApp( CType( _
    ser.Deserialize( tr), GTPApp) )
tr.Close( )
' . . .
- - -
Derek Harmon
re: incorrect encoding after serialisation to XML
Derek,
I have a further question on this query. What I am actually trying to do is
generate XML that can be used as the input XML for a call to a second
application. So rather than generating a file, I need to generate the XML to
be used in code. Would you know how to do this?
re: incorrect encoding after serialisation to XML
Stephen,
Certainly, no problem. In my earlier example, simply replace references to
FileStream with references to MemoryStream. This will yield a Byte( )
array that you can pass along in binary form to this other application.
They're Bytes and not Chars, so the Byte( ) can be UTF-8 encoded.
The XmlTextWriter declaration from that example,
> > Dim tw As New System.Xml.XmlTextWriter( _
> > New System.IO.FileStream( "file.xml", System.IO.FileMode.Create), _
> > New System.Text.UTF8Encoding( True) )
becomes
Dim stream As New System.IO.MemoryStream( )
Dim tw As New System.Xml.XmlTextWriter( _
    stream, New System.Text.UTF8Encoding( True) )
Again, this produces the Byte Order Mark because I'm creating the
UTF8Encoding with True. When you're sending a binary stream, you
might not want a BOM (or at least the consumer might not want the
BOM). You may want to try it both ways, with- and w/o BOM.
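The difference between the two settings can be checked directly. This is a small sketch with module and variable names of my own choosing: GetPreamble returns the bytes the encoding supplies as a byte order mark, which a writer built on that encoding would normally emit at the start of the stream.

```vbnet
Imports System
Imports System.Text

Module BomDemo
    Sub Main()
        ' True asks the encoding to supply a byte order mark.
        Dim withBom As New UTF8Encoding(True)
        ' False suppresses it.
        Dim withoutBom As New UTF8Encoding(False)

        ' The UTF-8 byte order mark is the three bytes EF BB BF.
        Console.WriteLine(BitConverter.ToString(withBom.GetPreamble()))
        ' With False the preamble is empty, so its length is 0.
        Console.WriteLine(withoutBom.GetPreamble().Length)
    End Sub
End Module
```

If the consumer treats the first bytes of the buffer as the start of the XML declaration, the version without the BOM is probably the safer one to try first.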
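After you are done serializing to tw, just as in the earlier example,
flush the writer and extract the Byte( ) from the MemoryStream. Use
ToArray( ) rather than GetBuffer( ): GetBuffer( ) returns the stream's whole
internal buffer, which is usually padded with unused bytes past the end of
the XML.
tw.Flush( )
Dim buffer As Byte( ) = stream.ToArray( )
and you can pass this forward to another application (e.g., through
a Socket) or call a method,
Me.OtherMethod( buffer)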
' . . .
Public Sub OtherMethod( ByVal incomingXml As Byte( ) )
' OtherMethod proceeds to wrap incomingXml in a
' MemoryStream and deserialize it using an XmlReader
' . . .
End Sub
The deserialization works much the same way, with the same substitution of
MemoryStream for FileStream that I've shown above for the serialization.
Polymorphism is what makes this possible. To the XmlReader and XmlWriter
classes it doesn't matter what variety of System.IO.Stream you use
(MemoryStream, FileStream, NetworkStream), only that it is a Stream.
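Putting the pieces together, a round trip through a Byte( ) might look like the sketch below. It is an illustration under assumptions, not the code from the thread: Payload is a hypothetical stand-in for GTPApp, ToUtf8Bytes and FromUtf8Bytes are names I made up, and the BOM is switched off on the guess that the consumer does not want one.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical stand-in for the GTPApp class from the question.
Public Class Payload
    Public Name As String
End Class

Module RoundTripDemo
    ' Serializes to UTF-8 bytes in memory instead of to a file.
    Function ToUtf8Bytes(ByVal value As Payload) As Byte()
        Dim ser As New XmlSerializer(GetType(Payload))
        Dim stream As New MemoryStream()
        ' False = no byte order mark (an assumption about the consumer).
        Dim tw As New XmlTextWriter(stream, New UTF8Encoding(False))
        ser.Serialize(tw, value)
        tw.Flush()
        ' ToArray copies only the bytes actually written.
        Dim result As Byte() = stream.ToArray()
        tw.Close()
        Return result
    End Function

    ' Wraps the incoming bytes in a MemoryStream and deserializes them.
    Function FromUtf8Bytes(ByVal incomingXml As Byte()) As Payload
        Dim ser As New XmlSerializer(GetType(Payload))
        Dim reader As New XmlTextReader(New MemoryStream(incomingXml))
        Try
            Return CType(ser.Deserialize(reader), Payload)
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim original As New Payload()
        original.Name = "demo"
        Dim bytes As Byte() = ToUtf8Bytes(original)
        Dim restored As Payload = FromUtf8Bytes(bytes)
        ' Should print the Name that was serialized.
        Console.WriteLine(restored.Name)
    End Sub
End Module
```

Because the XmlTextWriter is given a UTF8Encoding, the declaration it writes should name utf-8, so the bytes and the declaration agree when the reader parses them.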
Derek Harmon