
How to obtain a utf-8 string from an XmlReader?

#1
October 20th, 2008, 04:55 PM
SammyBar

Hi all,

I'm trying to convert the XML obtained from an XmlReader object into a UTF-8
byte array. My general idea is to read from the XmlReader and write into a
MemoryStream, then take the MemoryStream bytes as the UTF-8 data.

MemoryStream ms = new MemoryStream();
XmlTextWriter xmlWriter = new XmlTextWriter(ms, new UTF8Encoding(false));

xmlWriter.Formatting = Formatting.Indented;
xmlWriter.Namespaces = false;
xmlWriter.Indentation = 4;

while (xmlReader.Read())
{
    xmlWriter.Write(?); // what should be written here?
}

xmlWriter.Flush();
xmlWriter.Close();

string xml_as_utf8 = Encoding.UTF8.GetString(ms.ToArray());

But I feel that XmlReader and XmlWriter are not made for this purpose.
xmlReader.Read() parses the XML stream node by node, and xmlWriter is designed
to create XML element by element.
What is the correct strategy here?

Thanks in advance
Sammy



#2
October 20th, 2008, 05:15 PM
Martin Honnen


SammyBar wrote:
Quote:
> Which is the correct strategy here?

What exactly is it that you want to achieve?
Strings in the .NET framework are always UTF-16 encoded.
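
To illustrate the point about strings, here is a minimal sketch (the class name and the sample text are made up for illustration). A .NET string has no UTF-8 form of its own; UTF-8 only exists as a byte array produced by encoding the string, and decoding those bytes gives back an ordinary UTF-16 string.

```csharp
using System;
using System.Text;

static class Utf16VersusUtf8
{
    static void Main()
    {
        // Hypothetical sample text; \u00E9 is a non-ASCII character (e with acute accent).
        string text = "<name>Jos\u00E9</name>";         // 17 chars, held as UTF-16 in memory
        byte[] utf8 = Encoding.UTF8.GetBytes(text);      // encode: string -> UTF-8 bytes
        string back = Encoding.UTF8.GetString(utf8);     // decode: UTF-8 bytes -> string

        Console.WriteLine(text.Length);   // 17
        Console.WriteLine(utf8.Length);   // 18 (the accented character takes two bytes)
        Console.WriteLine(back == text);  // True
    }
}
```

So "a UTF-8 string" in .NET terms usually means a byte[] (or a stream) holding the UTF-8 encoded text.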

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

#3
October 20th, 2008, 08:05 PM
SammyBar


I found the following solution: load the XmlReader into an XmlDocument
object, and then save (serialize) the document into the MemoryStream.


"SammyBar" <sammybar@gmail.com> wrote in message
news:Ogux9tsMJHA.1304@TK2MSFTNGP02.phx.gbl...
Quote:
> string xml_as_utf8 = Encoding.UTF8.GetString(ms.ToArray());

This is an error: the memory stream is already UTF-8 encoded. The line above
decodes it from UTF-8 into the standard .NET string encoding (UTF-16), so the
UTF-8 encoding is lost in the conversion.
The solution I found looks like a hack: deceive .NET by reading the
UTF-8 stream as if it were in the default encoding:
string xml_data = Encoding.Default.GetString(ms.ToArray());

The code fragment looks like this (note that my input is a SqlXml variable
named sqlXml):

// load
XmlReader xmlReader = sqlXml.CreateReader();
XmlDocument xDoc = new XmlDocument();
xDoc.Load(xmlReader);
xmlReader.Close();

// save
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false));
writer.Formatting = Formatting.Indented;
writer.Indentation = 4;
xDoc.Save(writer);
writer.Close(); // also closes ms; ToArray() still works on a closed MemoryStream
ms.Close();

// convert to utf-8
// note: ms.ToArray() is itself the UTF-8 byte array; Encoding.Default decodes
// those bytes with the system code page instead of UTF-8
//string xml_data = Encoding.UTF8.GetString(ms.ToArray());
string xml_data = Encoding.Default.GetString(ms.ToArray());
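
For comparison, here is a hedged sketch of an alternative that skips the XmlDocument and copies the reader straight into the writer with XmlWriter.WriteNode, which is one way to fill in the xmlWriter.Write(?) gap from the first post. The class name, the ToUtf8Bytes helper and the sample input are assumptions for illustration; a StringReader stands in for sqlXml.CreateReader(), and ISO-8859-1 stands in for a single-byte default code page. It also shows why decoding UTF-8 bytes with a non-UTF-8 encoding only appears to work for pure ASCII content.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlReaderToUtf8
{
    // Copies every node from the reader into a UTF-8 byte array (no BOM).
    public static byte[] ToUtf8Bytes(XmlReader reader)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // false = no byte order mark
        settings.Indent = true;
        settings.IndentChars = "    ";

        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                // From its initial state, WriteNode copies the whole document.
                writer.WriteNode(reader, true);
            }
            return ms.ToArray(); // these bytes already are the UTF-8 encoding
        }
    }

    static void Main()
    {
        // Hypothetical input standing in for sqlXml.CreateReader().
        using (XmlReader reader = XmlReader.Create(new StringReader("<name>Jos\u00E9</name>")))
        {
            byte[] utf8 = ToUtf8Bytes(reader);

            // Decoding with the matching encoding restores the text.
            string ok = Encoding.UTF8.GetString(utf8);
            // Decoding with a single-byte encoding splits the two-byte character.
            string garbled = Encoding.GetEncoding("iso-8859-1").GetString(utf8);

            Console.WriteLine(ok.Contains("Jos\u00E9"));      // True
            Console.WriteLine(garbled.Contains("Jos\u00E9")); // False
        }
    }
}
```

If the goal is a UTF-8 byte array, the result of ToUtf8Bytes (or ms.ToArray() in the fragment above) can be used as it is, with no further GetString call.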



