HowTo: Serialize to a String without extra characters

I have a class that I want to serialize to an XML string, and I want the XML to use UTF-8 encoding. When I serialize to an XML file, the data looks great. When I serialize to a String (à la StringBuilder), I get UTF-16, and instead of a quotation mark (") I get a backslash followed by a quotation mark (\"), which makes sense when looking at a character in memory, but not in a string.

Here is my code

XmlSerializer serializer = new XmlSerializer(myObject.GetType());
StringBuilder builder = new StringBuilder();
StringWriter stringWriter = new StringWriter(builder);
XmlTextWriter xmlWriter = new XmlTextWriter(stringWriter);
xmlWriter.Formatting = Formatting.Indented;

// Serialize the object to the XML writer (the same object whose type was
// used to build the serializer)
serializer.Serialize(xmlWriter, myObject);

return builder.ToString();

If I write to a memory stream and then convert the byte array to a string with a BinaryReader, I get exceptions because, for some reason, garbage characters that are not valid ASCII/Unicode characters are written to the front of the byte array.

Any help would be great! Thanks

Brian
Nov 12 '05 #1
Let me restate my problem.

I am trying to serialize a class into an in-memory string that is encoded as UTF-8. Using StringBuilder, StringWriter and XmlSerializer, I can only ever serialize to UTF-16. If that output is persisted to a file, it cannot be parsed by XML viewers such as IE, and the UTF-16 string cannot be processed by SQL Server either. Is there a way to force these classes to serialize to UTF-8 encoding?

I have tried serializing to a MemoryStream, but I get 3 strange characters at the front of the stream. I create an empty MemoryStream and pass it to an XmlTextWriter as the stream behind it. When I serialize, I get the complete document, but with three mysterious characters on the front that prevent me from reading the stream as a string.

Any serialization help would be great. Please forgive the rambling in my previous posting.

Brian
Nov 12 '05 #2
"Brian Reed" <an*******@discussions.microsoft.com> wrote in message news:3B**********************************@microsof t.com...
I am trying to serialize a class into a string in memory that is encoded with utf-8 encoding.
Using the StringBuilder, StringWriter and XmlSerializer, I can only ever serialize to utf-16, : : Is there a way to force these classes to serialize to utf-8 encoding?
No. UTF-16 is the internal representation of System.String. If you require a different
encoding, you must store it in something other than System.String (MemoryStream or
Byte[] are what commonly come to mind).
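
A minimal sketch of that point, under some assumptions: the Message class is a hypothetical stand-in for the poster's type, and Utf8StringWriter is a helper name invented here, not something from this thread or from the framework. Overriding StringWriter.Encoding does not change how the characters are held in memory (the string is still UTF-16); it only changes the encoding name that ends up in the XML declaration. Actual UTF-8 bytes exist only once the string is encoded into a Byte[].

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical sample type; stands in for the poster's class.
public class Message
{
    public string Text;
}

// Assumed helper (not a framework class): a StringWriter that reports UTF-8,
// so the XML declaration reads encoding="utf-8" instead of "utf-16".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class Utf8StringDemo
{
    // Serializes any XmlSerializer-compatible object to a string whose
    // declaration names UTF-8.
    public static string SerializeToString(object value)
    {
        XmlSerializer serializer = new XmlSerializer(value.GetType());
        using (StringWriter writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Message message = new Message();
        message.Text = "héllo";

        string xml = SerializeToString(message);
        Console.WriteLine(xml);

        // The string itself is still UTF-16 in memory. Real UTF-8 bytes are
        // produced only here, and UTF8Encoding(false) adds no BOM.
        byte[] utf8 = new UTF8Encoding(false).GetBytes(xml);
        Console.WriteLine("chars: " + xml.Length + ", UTF-8 bytes: " + utf8.Length);
    }
}
```

This keeps the StringBuilder/StringWriter style of the original code; whether a utf-8 declaration on an in-memory string suits a given consumer (SQL Server, for instance) depends on how the string is later written out.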
I have tried serializing to a MemoryStream, but I am getting 3 strange characters at the
front of the stream.


That's probably the UTF-8 Byte Order Mark (BOM), although AFAIK the BOM
only introduces a pair of "mysterious" characters, not a trio. On the bright side,
it suggests that you really have UTF-8 encoding there. ;-)

In UTF-8 the BOM is not really a byte order mark at all: UTF-8 is a
byte-oriented encoding with no little-endian or big-endian variants, so the
mark serves only as a signature identifying the data as UTF-8. It's possible
to suppress the BOM, and an XML parser will still read the result correctly,
because XML that has neither a BOM nor an encoding declaration is assumed to
be UTF-8. Only consumers that rely on the signature to detect the encoding
(some text editors, for example) might then misread non-ASCII characters.

Look for the place in your code where the encoding is chosen. If you pass
Encoding.UTF8, or construct System.Text.UTF8Encoding with true as the first
argument (encoderShouldEmitUTF8Identifier), pass new UTF8Encoding(false)
instead. This should remove the BOM from the output.
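
A hedged sketch of that suggestion, assuming the MemoryStream/XmlTextWriter setup described in the question. Greeting is a hypothetical sample type, and the emitBom parameter is added only so both behaviours can be compared side by side.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type; stands in for the poster's class.
public class Greeting
{
    public string Text;
}

public static class NoBomDemo
{
    // Serializes to UTF-8 bytes. emitBom is handed straight to the
    // UTF8Encoding constructor, which decides whether EF BB BF is written.
    public static byte[] SerializeToUtf8(object value, bool emitBom)
    {
        XmlSerializer serializer = new XmlSerializer(value.GetType());
        using (MemoryStream stream = new MemoryStream())
        {
            XmlTextWriter xmlWriter =
                new XmlTextWriter(stream, new UTF8Encoding(emitBom));
            xmlWriter.Formatting = Formatting.Indented;

            serializer.Serialize(xmlWriter, value);
            xmlWriter.Flush(); // push buffered characters into the stream

            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Greeting greeting = new Greeting();
        greeting.Text = "hello";

        byte[] withBom = SerializeToUtf8(greeting, true);
        byte[] withoutBom = SerializeToUtf8(greeting, false);

        // The first three bytes differ: a BOM in one case, "<?x" in the other.
        Console.WriteLine(BitConverter.ToString(withBom, 0, 3));
        Console.WriteLine(BitConverter.ToString(withoutBom, 0, 3));
    }
}
```

With emitBom set to false the array begins directly with the `<` of the XML declaration, so it can be handed to Encoding.UTF8.GetString without a stray leading character.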
Derek Harmon
Nov 12 '05 #3
Derek Harmon wrote:
That's probably the UTF-8 Byte Order Mark (BOM), although AFAIK the BOM
only introduces a pair of "mysterious" characters, not a trio. On the bright side,
it suggests that you really have UTF-8 encoding there. ;-)


BOM length depends on encoding - in UTF-8 it's 3 bytes, while in UTF-16
it's 2 bytes. See
http://www.w3.org/TR/2000/REC-xml-20...ng-no-ext-info
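
The lengths are easy to check from .NET itself, since every Encoding exposes its BOM through GetPreamble(). A small sketch (the class and method names are made up for illustration):

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the BOM of an encoding as hex, e.g. "EF-BB-BF".
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine(PreambleHex(Encoding.UTF8));             // EF-BB-BF (3 bytes)
        Console.WriteLine(PreambleHex(Encoding.Unicode));          // FF-FE    (UTF-16 LE, 2 bytes)
        Console.WriteLine(PreambleHex(Encoding.BigEndianUnicode)); // FE-FF    (UTF-16 BE, 2 bytes)

        // An encoding constructed without the identifier has an empty preamble.
        Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length); // 0
    }
}
```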

--
Oleg Tkachenko [XML MVP, XmlInsider]
http://blog.tkachenko.com
Nov 12 '05 #4
Hi Derek,
That's probably the UTF-8 Byte Order Mark (BOM), although AFAIK the BOM
only introduces a pair of "mysterious" characters, not a trio. On the bright side,
it suggests that you really have UTF-8 encoding there. ;-)

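In UTF-8 the BOM is not really a byte order mark at all: UTF-8 is a
byte-oriented encoding with no little-endian or big-endian variants, so the
mark serves only as a signature identifying the data as UTF-8. It's possible
to suppress the BOM,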


<-- snip -->

can you tell me how to suppress the BOM?

Thanx,
Timo
Nov 12 '05 #5
Timo,
Have you tried the constructors for System.Text.UTF8Encoding that accept a
boolean parameter which specifies whether to prefix or not prefix an
encoding with a Unicode byte order mark?
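
As an illustration of what that boolean changes, and of one way to read the bytes back as a string either way, here is a short sketch. The helper names are invented for this example. It relies on StreamReader detecting and skipping a leading BOM by default, whereas Encoding.GetString leaves it in the result as U+FEFF.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomRoundTrip
{
    // Encodes text as UTF-8, with or without the three-byte BOM.
    public static byte[] Encode(string text, bool emitBom)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (StreamWriter writer =
                new StreamWriter(stream, new UTF8Encoding(emitBom)))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    // Decodes UTF-8 bytes; StreamReader drops a leading BOM if one is present.
    public static string Decode(byte[] bytes)
    {
        using (MemoryStream stream = new MemoryStream(bytes))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] withBom = Encode("<a/>", true);
        byte[] withoutBom = Encode("<a/>", false);

        Console.WriteLine(withBom.Length);    // 7: 3 BOM bytes + 4 characters
        Console.WriteLine(withoutBom.Length); // 4

        // Both decode to the same text through StreamReader.
        Console.WriteLine(Decode(withBom) == Decode(withoutBom)); // True

        // GetString keeps the BOM as an invisible first character.
        Console.WriteLine(Encoding.UTF8.GetString(withBom)[0] == '\uFEFF'); // True
    }
}
```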

Hope this helps
Jay
Nov 12 '05 #6
