XmlDocument.ReadNode() breaking - why?

Hi...

A colleague just referred this question to me. He's getting an xml file
from another party, which he's trying to process into another dom using an
XmlTextReader and XmlDocument.ReadNode(). The problem is that it's breaking
and he doesn't understand why. I didn't exactly either, which is why I'm
posting a question here.

First, his program just creates a new dom using new document like this:
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);

Then it starts sucking in various xml files on disk like this:
StreamReader streamreader = File.OpenText(fPath);
XmlTextReader reader = new XmlTextReader(streamreader);
reader.MoveToContent();
XmlNode node = xml.ReadNode(reader);
root.AppendChild(node);

What's happening is that for this weird xml file he gets, the
xml.ReadNode(reader); line throws an encoding error.

The file he got has a bunch of high-bit characters (looks like garbage) that
are valid iso-8859-1 (the document's declared encoding) in a CDATA section.
The error that ReadNode() throws appears to be that the XmlTextReader is
trying to read through this CDATA blob as utf-8, trying to mash these
individual high-bit characters back together according to utf-8 rules to
make unicode chars out of them. Specifically, it's trying to mash ED B3 A8
into U+DCE8 (an unpaired surrogate), and ReadNode() throws an error that
that is an invalid character.

It's as though the XmlTextReader is applying the encoding rules of the
parent dom calling ReadNode() rather than paying attention to the encoding
declaration it saw go by.

The xml file in question does parse successfully on its own.

Is there anything to do so that XmlTextReader/ReadNode pay attention to the
information going by them as it parses?

I know I could recommend that he use xml.ImportNode() to suck the result of
the parsing into his main dom, but I'd like to better understand the rules
ReadNode/XmlTextReader are going to be using; if I can get it to handle the
encoding issues better it seems like it would be more efficient than

XmlDocument xFile = new XmlDocument();
xFile.Load(filePath);
XmlNode n = xml.ImportNode(xFile.DocumentElement, true);
xml.DocumentElement.AppendChild(n);

Thanks
-mark

Nov 12 '05 #1
"Mark" <mm******@nospam.nospam> wrote in message news:7E**********************************@microsof t.com...
XmlTextReader reader = new XmlTextReader(streamreader);
reader.MoveToContent();
XmlNode node = xml.ReadNode(reader);

: :
Is there anything to do so that XmlTextReader/ReadNode pay attention to the
information going by them as it parses?


Is reader.Encoding equal to null? (If unspecified, it will be UTF-8.)

If reader.Encoding is not null, then what is reader.Encoding.EncodingName?
Derek Harmon
Nov 12 '05 #2
Hi. The reader.Encoding property is System.Text.UnicodeEncoding. The
reader.Encoding.EncodingName is Unicode. Thanks, David.

Nov 12 '05 #3
Hi Derek...

As David noted, the reader encoding appears to be defaulting to unicode, so
that seems to be the problem. But why doesn't the reader pay attention to
the processing directive that passed under its nose? It is a step you can do
manually (check to see if the first node you have is a processing directive
and grab the encoding yourself), but it seems like one of those things that
would have been good to build in under the covers too.

Thanks
-mark

Nov 12 '05 #5
Okay... Just tried it, and XmlTextReader.Encoding is a read-only property.
The only way I can see to change that is with the constructor that takes an
XmlParserContext - but this leads to the counter-intuitive fact that you have
to guess the document encoding before you create the reader that will read
your xml file.

It doesn't appear that XmlTextReader will pay attention to the encoding in
the processing directive, leaving you kinda high and dry. Is this really the
case? Seems like a bad way to make the tools.

Thanks
-mark

Nov 12 '05 #6
Just closing the loop here a bit - another person here pointed out that it
seemed to depend on how you create the stream you feed to XmlTextReader in
the first place.

If you use File.OpenText() to get a StreamReader and then construct
XmlTextReader with a StreamReader, it appears to lock the encoding in place
and XmlTextReader will not respect the processing directive.

If you use File.OpenRead() to get a simple FileStream and use *that* to
construct XmlTextReader, the XmlTextReader is more responsive to what's in
the stream it's reading.
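
The difference described above can be sketched as follows. This is a minimal illustration, not the colleague's actual code: the class and method names (FragmentLoader, AppendFile) are hypothetical, and it assumes the parent node is an element that already belongs to the target XmlDocument.

```csharp
using System.IO;
using System.Xml;

public static class FragmentLoader
{
    // Hypothetical helper: parses the file at fPath and appends its
    // document element under parent. Assumes parent is an element that
    // already lives inside an XmlDocument (so OwnerDocument is not null).
    public static XmlNode AppendFile(XmlNode parent, string fPath)
    {
        XmlDocument owner = parent.OwnerDocument;

        // File.OpenRead() hands the reader raw bytes, so the reader itself
        // decides how to decode them (byte order mark or XML declaration).
        using (FileStream fs = File.OpenRead(fPath))
        {
            XmlTextReader reader = new XmlTextReader(fs);
            reader.MoveToContent();                 // skip declaration, comments
            XmlNode node = owner.ReadNode(reader);  // read the root element subtree
            return parent.AppendChild(node);
        }
    }
}
```

With the setup from the first post, the call would be FragmentLoader.AppendFile(root, fPath) in place of the File.OpenText() version.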

Thanks
-mark

Nov 12 '05 #7
"Mark" <mm******@nospam.nospam> wrote in message news:A3**********************************@microsof t.com...
As David noted, the reader encoding appears to be defaulting to unicode, so
that seems to be the problem.

When the encoding of the XMLDecl and the encoding of the content
presented to the reader are different, then you will have problems.

If you use File.OpenText() to get a StreamReader and then construct
XmlTextReader with a StreamReader, it appears to lock the encoding
in place and XmlTextReader will not respect the processing directive.

The documentation for File.OpenText( ) is clear about interpreting
the file as UTF-8,

http://msdn.microsoft.com/library/en...ntexttopic.asp

Even though the file MAY be encoded as iso-8859-1, doing this will
"present" the file's contents as UTF-8.

The encoding of the I/O StreamReader is paramount because, remember,
XmlTextReader depends upon the StreamReader's Read( ) method(s).
The StreamReader is responsible for decoding from whatever bytes
are in the file to characters using its encoding (it knows nothing about
XMLDecl).
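
To see why the decoder matters, the three bytes from the original post can be decoded both ways outside of any XML machinery. This is a small sketch with hypothetical names (DecodeDemo, AsLatin1, AsUtf8). What the UTF-8 decoder produces for this invalid sequence varies by runtime version, so the sketch only relies on the two results being different.

```csharp
using System.Text;

public static class DecodeDemo
{
    // The three bytes from the CDATA section discussed in the thread.
    public static readonly byte[] Sample = { 0xED, 0xB3, 0xA8 };

    // iso-8859-1 maps every byte to the character with the same code point,
    // so this yields three characters: U+00ED, U+00B3, U+00A8.
    public static string AsLatin1()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(Sample);
    }

    // UTF-8 treats ED B3 A8 as one three-byte sequence. That sequence encodes
    // a surrogate code point, which is not valid UTF-8, so the result is
    // not the three characters the file's author intended.
    public static string AsUtf8()
    {
        return Encoding.UTF8.GetString(Sample);
    }
}
```

A StreamReader from File.OpenText() behaves like AsUtf8() here, and the XmlTextReader only ever sees the characters that come out of it.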

If you use File.OpenRead() to get a simple FileStream and use *that*
to construct XmlTextReader, the XmlTextReader is more responsive
to what's in the stream it's reading.

FileStreams can be binary, therefore choosing a FileStream gives the
XmlTextReader the option to read *bytes* instead of characters. It
can then choose the encoding for that translation itself, from the
byte order mark or the XMLDecl.

It is a step you can do manually (check to see if the first node
you have is a processing directive and grab the encoding yourself)


That's the XmlDeclaration's Encoding property. It won't appear as
an XmlProcessingInstruction. You could use this code to inject an
XMLDecl if one isn't already present,

if ( xml.FirstChild.NodeType != XmlNodeType.XmlDeclaration )
{
    XmlDeclaration decl = xml.CreateXmlDeclaration( "1.0", "iso-8859-1", null);
    xml.InsertBefore( decl, xml.FirstChild);
}

To read the encoding off of an XmlDocument's XMLDecl,

string encodingStr = null;
if ( xml.FirstChild.NodeType == XmlNodeType.XmlDeclaration )
    encodingStr = ( (XmlDeclaration) xml.FirstChild ).Encoding;
// Treat a missing or empty encoding as the default.
if ( encodingStr == null || encodingStr.Length == 0 )
    encodingStr = "UTF-8";

If the XmlDocument's FirstChild isn't of XmlNodeType.XmlDeclaration
then it doesn't have an XMLDecl. If there is no XMLDecl, or there is
one without an Encoding, then the encoding is UTF-8 by default.
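
The same check can also be made while streaming, before any XmlDocument exists, which is the manual step mentioned earlier in the thread. A minimal sketch, assuming the input is a byte stream positioned at the start of a non-empty XML document; the names DeclaredEncoding and Sniff are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class DeclaredEncoding
{
    // Returns the encoding name written in the XML declaration, or "UTF-8"
    // when there is no declaration or it carries no encoding attribute.
    public static string Sniff(Stream input)
    {
        XmlTextReader reader = new XmlTextReader(input);

        // The XML declaration, when present, is the first node the reader
        // reports; its pseudo-attributes are exposed through GetAttribute().
        if (reader.Read() && reader.NodeType == XmlNodeType.XmlDeclaration)
        {
            string name = reader.GetAttribute("encoding");
            if (name != null && name.Length > 0)
                return name;
        }
        return "UTF-8";
    }
}
```

The reader consumes part of the stream while doing this, so a caller that wants to parse the document afterwards would need to reopen or rewind the stream.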

In my experience, when the encoding on the XMLDecl matches the
encoding of the content, there are no problems.

I've tried producing a file to match your example like this,

- - - WriteOut.cs
using System;
using System.IO;
using System.Text;
using System.Xml;

public class WriteOutIso8859_1
{
    public static void Main( )
    {
        // CreateNew fails if iso8859_1.xml already exists.
        FileStream fs = new FileStream( "iso8859_1.xml", FileMode.CreateNew);
        StreamWriter writer = new StreamWriter( fs, Encoding.GetEncoding( "iso-8859-1"));
        writer.WriteLine( "<?xml version='1.0' encoding='iso-8859-1'?>");
        writer.WriteLine( "<root>");
        writer.WriteLine( "\t<first>Hello World</first>");
        writer.Write( "\t<second><![CDATA[");
        // Written as iso-8859-1, these chars become the bytes ED B3 A8.
        writer.Write( new char[] { (char)0xED, (char)0xB3, (char)0xA8} );
        writer.WriteLine( "]]></second>");
        writer.WriteLine( "</root>");
        writer.Flush( );
        writer.Close( );
    }
}
- - -

When I read this file in with the following code I have no problems.

XmlDocument xmlDoc = new XmlDocument( );
FileStream fs = new FileStream( "iso8859_1.xml", FileMode.Open);
StreamReader sr = new StreamReader( fs, Encoding.GetEncoding( "iso-8859-1"));
XmlTextReader reader = new XmlTextReader( sr);
reader.MoveToContent( );
XmlNode node = xmlDoc.ReadNode( reader);
Derek Harmon
Nov 12 '05 #8
