XmlTextReader bug?

Meir S.

I think the following is a bug in XmlTextReader:

I need to process large XML documents that typically
consist of many small elements nested in the root element.

Each inner element represents a command, so I have to parse and
execute them in the order they appear in the ROOT element.

I don't want to load the entire huge XML into an XmlDocument,
but I can use an XmlDocument for each small element, one at a time.

My XML looks like:
<ROOT>
<Add src="...">
<Targets>
<Target name="..." />
<Target name="..." />
</Targets>
</Add>

<Delete ID="...">
</Delete>

<Move ID="...">
<Dest name="..."/>
</Move>

...
... (many more commands)
...
</ROOT>

(Note that the inner XML for each action is entirely different:
Add uses XML totally different from that of Delete or Move.)

So the natural thing to do was to use an XmlTextReader,
iterate through the top-level elements within ROOT,
peek at the required action, and send the OuterXml of the
command to the proper method, as follows:

string actionsXml=....; // supplied by user
XmlTextReader xmlRdr=null;
try
{
    xmlRdr = new XmlTextReader(new StringReader(actionsXml));

    // Important line! Without it, the bug is much worse!
    xmlRdr.WhitespaceHandling = WhitespaceHandling.All;

    xmlRdr.MoveToContent();
    while(xmlRdr.Read())
    {
        if(xmlRdr.Depth == 1)
        {
            switch (xmlRdr.NodeType)
            {
                case XmlNodeType.Element:
                    string tmpAction = xmlRdr.LocalName;
                    string tmpLowerAction = tmpAction.ToLower();
                    if(tmpLowerAction.Equals("add"))
                        DoAdd(xmlRdr.ReadOuterXml());
                    else if(tmpLowerAction.Equals("delete"))
                        tmpRet = DoDel(xmlRdr.ReadOuterXml());
                    break;
            }
        }
    }
}
finally
{
    if (xmlRdr!=null)
        xmlRdr.Close();
}
It seems simple, but the above code works ONLY if there are
newlines between the elements!

If the XML is supplied as one contiguous line,
then every OTHER element is SKIPPED!

Also:
if you omit the line "xmlRdr.WhitespaceHandling =
WhitespaceHandling.All;",
the bug is much worse, and even newlines don't help.

I assume that when I call ReadOuterXml(), an extra char
is read (probably the "<" of the next element), causing
the next element to be skipped entirely.
For now, I'm using a workaround in the code that submits
the XML, making sure newlines are inserted, but
I'd love to have this fixed...

Thanks
Meir

Nov 11 '05 #1

Kirk Allen Evans

The problem is not the XmlReader or XmlTextReader. The problem is with your
algorithm and/or understanding of what XmlReader is doing. Each call to
XmlReader.Read() advances the node pointer one node forward in the stream.
ReadOuterXml reads the current node and its children and leaves the reader
on the node that follows the end tag. When nothing separates the elements,
that node is the start element of the next sibling. Assume you are pointing
to the "Add" element in your XML:

After Read():
reader.Name == "Add"
reader.NodeType == XmlNodeType.Element
After ReadOuterXml():
reader.Name == "Delete"
reader.NodeType == XmlNodeType.Element

Notice that you call Read, so the reader.Name is "Add". You then call
ReadOuterXml. The pointer is then advanced to the start of the next
sibling, which is "Delete". Here is where your algorithm / understanding is
flawed. You then go into the loop again, calling Read(). The pointer is
then advanced to the end element for "Delete". The first "if" test passes,
because the Depth is indeed "1" (although it is the end element). The
switch test fails, because the NodeType is EndElement, not Element, so
control falls through to the top of the while loop again. Effectively, you
have skipped the entire "Delete" element due to your algorithm. With
WhitespaceHandling.All and newlines between the elements, ReadOuterXml
stops on the whitespace node instead, and the next Read() lands on
"Delete", which is why the formatted XML appeared to work.
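
A minimal sketch of the positions described above, assuming a hypothetical helper named NodeAfterFirstChild and made-up sample strings (neither comes from the original post). It reports where the reader sits immediately after ReadOuterXml(), once for contiguous XML and once with newlines between the elements:

```csharp
using System;
using System.IO;
using System.Xml;

static class ReadOuterXmlPosition
{
    // Hypothetical helper: calls ReadOuterXml() on the first child element
    // of the root and describes the node the reader is left on afterwards.
    public static string NodeAfterFirstChild(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.WhitespaceHandling = WhitespaceHandling.All;
            reader.MoveToContent();                 // on the root element

            // Advance to the first element inside the root, stepping over
            // any whitespace nodes that precede it.
            while (reader.Read() && reader.NodeType != XmlNodeType.Element) { }

            reader.ReadOuterXml();                  // consumes the whole child
            return reader.NodeType + ":" + reader.Name;
        }
    }

    static void Main()
    {
        // Contiguous input: the reader is left on the next sibling element.
        Console.WriteLine(NodeAfterFirstChild(
            "<ROOT><Add src=\"a\"></Add><Delete ID=\"1\"></Delete></ROOT>"));

        // Newline-separated input: the reader is left on a whitespace node.
        Console.WriteLine(NodeAfterFirstChild(
            "<ROOT>\n<Add src=\"a\"></Add>\n<Delete ID=\"1\"></Delete>\n</ROOT>"));
    }
}
```

In the second case the extra Read() in a while(reader.Read()) loop only consumes the whitespace node, so the next element is still seen; in the first case the same Read() steps past the start tag of the next command.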

Here is a different algorithm that works. Instead of using Reader.Read() as
a control for the loop, test for Reader.EOF.

private void Page_Load(object sender, System.EventArgs e)
{
    XmlTextReader reader = new XmlTextReader(Server.MapPath("data/xmlfile5.xml"));
    reader.WhitespaceHandling = System.Xml.WhitespaceHandling.None;
    while(!reader.EOF)
    {
        if(reader.IsStartElement() && reader.Depth == 1)
        {
            //This is an element representing a command, like
            // Add, Delete, or Move
            switch (reader.Name.ToLower())
            {
                case "add":
                    DoAdd(reader.ReadOuterXml());
                    break;
                case "delete":
                    DoDel(reader.ReadOuterXml());
                    break;
                default:
                    //Catches any types that you forgot to process yet.
                    reader.Skip();
                    break;
            }
        }
        else
        {
            reader.Read();
        }
    }
    reader.Close();
}

--
Kirk Allen Evans
www.xmlandasp.net
Read my web log at http://weblogs.asp.net/kaevans

Nov 11 '05 #2

Sean Gephardt [MS]

I've run into this problem also,
so depending on what you are trying to do, the sample below may help.

In the code below, I was trying to read from a text file that had UNC paths,
and I needed to convert the paths to URLs.

When I read each line with "string tmpFile = r.ReadLine();" inside a loop
driven by "while (r.Read()) { ... ", the StreamReader object would skip the
first char (and sometimes the first two chars). Read() consumes a character
each time it is called, so the sample below tests r.Peek() instead.

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.Windows.Forms;
using System.Web;

namespace CheckPages
{
    class Class1
    {
        private const string FileList = "badexe.txt";
        public const string ReportFile = "badexe.htm";
        // public static string tmpFile = "";

        /// <summary>
        /// A simple file parsing application
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            StreamWriter report = File.CreateText(ReportFile);
            report.WriteLine("<html>");
            report.WriteLine("<head>");
            report.WriteLine("<title>Bad EXEs pages</title>");
            report.WriteLine("</head>");
            report.WriteLine("<body>");

            StreamReader r = new StreamReader((System.IO.Stream)File.OpenRead(FileList), System.Text.Encoding.ASCII);
            r.BaseStream.Seek(0, SeekOrigin.Begin);
            while (r.Peek() > -1)
            {
                string tmpFile = r.ReadLine();
                tmpFile = tmpFile.Replace("\\\\myTestServer\\shared_folder\\", "http://myWebSiteDomain/");
                tmpFile = tmpFile.Replace("\\", "/");
                if (tmpFile.IndexOf("_vti_cnf") == -1)
                {
                    report.WriteLine("<a href=\"" + tmpFile + "\">" + tmpFile + "</a><br/>");
                }
            }

            report.WriteLine("</body>");
            report.WriteLine("</html>");
            report.AutoFlush = true;
            report.Close();
            r.Close();
        }
    }
}
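
The difference between guarding the loop with Peek() and with Read() can be seen in isolation. This is only a sketch: it swaps in a StringReader for the StreamReader used above (both inherit Peek, Read and ReadLine from TextReader), and the class name, method names and sample text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PeekVersusRead
{
    // Loop guarded by Peek(): the test looks at the next character
    // without consuming it, so ReadLine() sees the whole line.
    public static List<string> LinesWithPeek(string text)
    {
        var lines = new List<string>();
        using (var r = new StringReader(text))
        {
            while (r.Peek() > -1)
                lines.Add(r.ReadLine());
        }
        return lines;
    }

    // Loop guarded by Read(): every test consumes one character,
    // so ReadLine() starts from the second character of the line.
    public static List<string> LinesWithRead(string text)
    {
        var lines = new List<string>();
        using (var r = new StringReader(text))
        {
            while (r.Read() > -1)
                lines.Add(r.ReadLine());
        }
        return lines;
    }

    static void Main()
    {
        string text = "abc\ndef\n";   // made-up sample input
        Console.WriteLine(string.Join(",", LinesWithPeek(text)));
        Console.WriteLine(string.Join(",", LinesWithRead(text)));
    }
}
```

The same idea carries over to XmlTextReader: a call that is used as a loop test but also advances the reader will consume input that the loop body never gets to see.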

--
Sean Gephardt
MSDN Online SDE

This posting is provided "AS IS"
with no warranties, and confers no rights.

Nov 11 '05 #3

Meir S.

Thanks! I'll fix my code BUT:
I still think it's a BUG:
Why does ReadOuterXml() also read the next start element?
It is supposed to read up to the end element, and that's it!
The next start element is not part of the OuterXml.

I can understand that maybe the low-level implementation
needs to read it to identify the end of the outer XML,
but I think that it should "rewind" so that it doesn't
consume the next start element.

Thanks again,
Meir

Nov 11 '05 #4

Kirk Allen Evans

Again, it is a bug in your algorithm. Read the documentation for
XmlReader.ReadOuterXml; this behavior is very clearly stated.

http://msdn.microsoft.com/library/en...asp?frame=true

I will even go so far as to quote the relevant part for you.

Node Type: Element
Position Before the Call: On the item1 start tag
XML Fragment: <item1>text1</item1><item2>text2</item2>
Return Value: <item1>text1</item1>
Position After the Call: On the item2 start tag.

The behavior is documented clearly, and it is stated in three different ways
on the same page to avoid ambiguity. You cannot still claim (with a straight
face, anyway) that you believe this to be a bug in XmlReader or its
implementations.
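
For anyone who would still prefer a loop driven by Read(), one option in later versions of the framework (.NET 2.0 and up) is XmlReader.ReadSubtree(), which is not part of the code discussed in this thread. The sketch below assumes a hypothetical ReadCommands helper that just collects each command's outer XML instead of dispatching to DoAdd or DoDel:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SubtreeDispatch
{
    // Hypothetical helper: returns the outer XML of every depth-1 element.
    public static List<string> ReadCommands(string xml)
    {
        var commands = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                  // position on the root element
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                {
                    // The sub-reader is confined to this one command element.
                    using (XmlReader command = reader.ReadSubtree())
                    {
                        command.MoveToContent();     // move onto the command element
                        commands.Add(command.ReadOuterXml());
                    }
                    // Once the sub-reader is closed, the outer reader sits on the
                    // command's end tag (or on the element itself if it is empty),
                    // so the next Read() moves on to whatever follows it.
                }
            }
        }
        return commands;
    }

    static void Main()
    {
        // Made-up contiguous input in the shape of the original example.
        string xml = "<ROOT><Add src=\"a\"><Targets/></Add>"
                   + "<Delete ID=\"1\"></Delete>"
                   + "<Move ID=\"2\"><Dest name=\"x\"/></Move></ROOT>";
        foreach (string command in ReadCommands(xml))
            Console.WriteLine(command);
    }
}
```

Because the outer reader never moves past the end of the command on its own, this loop should see every command whether or not whitespace separates the elements.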

--
Kirk Allen Evans
www.xmlandasp.net
Read my web log at http://weblogs.asp.net/kaevans

Nov 11 '05 #5
