Hi...
A colleague just referred this question to me. He's getting an XML file
from another party, which he's trying to read into another DOM using an
XmlTextReader and XmlDocument.ReadNode(). The problem is that it's breaking
and he doesn't understand why. I don't fully understand it either, which is
why I'm posting the question here.
First, his program creates a new DOM with a root element, like this:

    XmlDocument xml = new XmlDocument();
    XmlElement root = xml.CreateElement("root");
    xml.AppendChild(root);

Then it reads in various XML files from disk, like this:

    StreamReader streamreader = File.OpenText(fPath);
    XmlTextReader reader = new XmlTextReader(streamreader);
    reader.MoveToContent();
    XmlNode node = xml.ReadNode(reader);
    root.AppendChild(node);

What's happening is that for one odd XML file he receives, the
xml.ReadNode(reader) call throws an encoding error.
The file has a run of high-bit characters (it looks like garbage) in a CDATA
section; they are valid ISO-8859-1, which is the document's declared encoding.
The error from ReadNode() suggests that the XmlTextReader is reading through
this CDATA blob as UTF-8, combining the individual high-bit bytes according
to UTF-8 rules to make Unicode characters out of them. Specifically, it
combines ED B3 A8 into U+DCE8 (&#xDCE8;), and ReadNode() throws an error
saying that is an invalid character.
It's as though the XmlTextReader is applying the encoding rules of the
parent dom calling ReadNode() rather than paying attention to the encoding
declaration it saw go by.
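
For reference, the byte arithmetic behind ED B3 A8 turning into U+DCE8 can
be sketched as below. This only mirrors the three-byte UTF-8 bit layout; it
is not what XmlTextReader does internally, and the names Utf8Arithmetic and
DecodeThreeBytes are made up for illustration.

```csharp
using System;

static class Utf8Arithmetic
{
    // Combines a three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
    // into a single code point. No validation is done here.
    public static int DecodeThreeBytes(byte b0, byte b1, byte b2)
    {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    static void Main()
    {
        int codePoint = DecodeThreeBytes(0xED, 0xB3, 0xA8);
        Console.WriteLine("U+{0:X4}", codePoint);   // prints U+DCE8

        // U+DC00..U+DFFF is the low-surrogate range, which the XML Char
        // production excludes, hence the "invalid character" complaint.
        bool isLowSurrogate = codePoint >= 0xDC00 && codePoint <= 0xDFFF;
        Console.WriteLine(isLowSurrogate);          // prints True
    }
}
```

Read as ISO-8859-1, the same three bytes are simply three separate
characters (0xED, 0xB3, 0xA8), all of which are legal in XML.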
The XML file in question does parse successfully on its own.
Is there a way to make XmlTextReader/ReadNode honor the encoding
declaration in the file as they parse it?
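
One variation that may be worth testing, offered as a sketch rather than a
confirmed fix: File.OpenText is documented to open the file as UTF-8 text,
so the bytes may already have been decoded into characters before the
XmlTextReader ever sees the encoding declaration. Handing the reader a raw
Stream instead leaves the encoding decision to the reader. ReadNodeSketch
and AppendFile are hypothetical names; xml, root and fPath play the same
roles as in the snippets above.

```csharp
using System.IO;
using System.Xml;

static class ReadNodeSketch
{
    // Hypothetical helper: reads the document element of the file at fPath
    // and appends it under root, which must belong to the xml document.
    public static void AppendFile(XmlDocument xml, XmlElement root, string fPath)
    {
        // File.OpenRead yields raw bytes; no text decoding has happened yet,
        // so the reader can pick the encoding itself (assumption: this is
        // what differs from the StreamReader-based version).
        using (FileStream stream = File.OpenRead(fPath))
        {
            XmlTextReader reader = new XmlTextReader(stream);
            reader.MoveToContent();
            XmlNode node = xml.ReadNode(reader);
            root.AppendChild(node);
        }
    }
}
```

If the error goes away with this change, that would point at the
StreamReader rather than at ReadNode() or the parent DOM.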
I know I could recommend that he use xml.ImportNode() to pull the result of
the parsing into his main DOM, but I'd like to better understand the rules
ReadNode/XmlTextReader are going to be using; if I can get it to handle the
encoding issues better, it seems like it would be more efficient than this:

    XmlDocument xFile = new XmlDocument();
    xFile.Load(filePath);
    XmlNode n = xml.ImportNode(xFile.DocumentElement, true);
    xml.DocumentElement.AppendChild(n);

Thanks
-mark