"Scott" <Sc***@discussions.microsoft.com> wrote in message news:AF**********************************@microsof t.com...
The RemoveChild method is not performant in 1.0. I am trying to find out
what is the absolute fastest way to remove a set of nodes (i.e.
System.Xml.XmlNodeList) in v2 of System.Xml.
It's not by using System.Xml.XmlNodeList. If you want "absolute" speed,
then you need to use an XmlReader, like this one (which removes all
<bar> elements and their descendants):
- - - RemoveTagXmlReader.cs
using System;
using System.Xml;

internal class RemoveTagXmlReader : XmlTextReader
{
    // Atomized copy of the element name, used for reference comparisons.
    private object atomRemoveThis;

    public RemoveTagXmlReader( string tagNameToRemove, string filename)
        : base( filename )
    {
        this.atomRemoveThis = this.NameTable.Add( tagNameToRemove);
    }

    public override bool Read( )
    {
        bool result = base.Read( );
        // Skip( ) leaves the reader on the node that follows the skipped
        // element, so test that node too rather than reading past it.
        while (result && this.NodeType == XmlNodeType.Element &&
            this.atomRemoveThis == (object)this.LocalName)
        {
            this.Skip( );
            result = !this.EOF;
        }
        return result;
    }
}
public class TestApp
{
    static void Main( string[] args)
    {
        XmlReaderSettings settings = new XmlReaderSettings( );
        // If you want to produce an output file preserving this information,
        // then set these to false.
        //
        settings.IgnoreComments = true;
        settings.IgnoreProcessingInstructions = true;
        settings.IgnoreWhitespace = true;
        // . . .
        // Try turning off CheckCharacters and ConformanceLevel if you
        // don't need them for extra boost.
        using( XmlReader reader = XmlReader.Create(
            new RemoveTagXmlReader( "bar", "test.xml"), settings))
        {
            while (reader.Read( ))
            {
                Console.WriteLine( string.Format( "{0} {1} {2} {3}",
                    "\t".PadLeft(reader.Depth), reader.NodeType,
                    reader.LocalName,
                    ((reader.Value == null) ? "(null)" : reader.Value )));
            }
        }
    }
}
- - -
Your key takeaways here are:
1. Not using XmlNodeList. (If you need the luxury of XmlNodeList, learn
to like your XML slow.)
2. When I create the XmlTextReader subclass, I "atomize" the string I'm
going to be doing comparisons against by adding it to the XmlNameTable.
This permits faster object reference identity comparisons to be performed
on the string when it's an element, prefix, namespace URI or attribute name,
rather than char-by-char comparisons (i.e., you get the MSIL ceq opcode,
not a call to the String::op_Equality( ) function, which is more expensive).
3. You could argue that using the XmlTextReader subclass directly may
be more efficient than using .NET 2.0's Factory Pattern to instantiate
an XmlReader because the latter puts my custom derived XmlTextReader
into a wrapper.
It's a trade-off you need to evaluate based on what your intended use
of the XmlReader is going to be. You'll gain performance by turning
off the unnecessary features. In the example above, to get at the heart
of the Infoset, comments, PIs, and insignificant whitespace were discarded.
In this case, where only five instance members of the XmlReader are
getting accessed, not making the XmlReader process extraneous
content saves me much more time than the cost of the one additional
level of indirection introduced by the wrapper that Create( ) puts
around me.
When you're reading "pretty-printed" XML in general, freeing yourself
from the processing of extra whitespace nodes between content
is usually enough to justify accepting the wrapper (that is, prefer the
wrapper Create( ) gives you over using your XmlTextReader subclass
independently).
4. The Read( ) override makes use of the Skip( ) method to bypass
the node you wish to remove and all its descendants. Any application
code that consumes this XmlTextReader subclass will never know that
those nodes existed.
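To make takeaway 2 concrete, here is a minimal sketch of what atomizing
buys you. The class and method names (AtomizeDemo, SharesAtom) are
made up for illustration; only NameTable and its Add( ) method come from
System.Xml.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the names here are illustrative only.
public static class AtomizeDemo
{
    // Adds the name twice (the second time as a distinct string object)
    // and reports whether the name table handed back the same reference.
    public static bool SharesAtom(XmlNameTable table, string name)
    {
        object atom = table.Add(name);
        string copy = new string(name.ToCharArray()); // equal text, new object
        return atom == (object)table.Add(copy);       // reference comparison
    }

    public static void Main()
    {
        NameTable table = new NameTable();
        string atom = table.Add("bar");

        // Built at run time, so it is not the same object as the literal.
        string runtimeBar = new string(new char[] { 'b', 'a', 'r' });

        // Equal text, but two different string objects.
        Console.WriteLine(object.ReferenceEquals(atom, runtimeBar));

        // Adding the equal string returns the existing atom.
        Console.WriteLine(object.ReferenceEquals(atom, table.Add(runtimeBar)));

        Console.WriteLine(SharesAtom(table, "foo"));
    }
}
```

The reader atomizes every name it parses through its own NameTable, which
is why the single cast-to-object comparison in the Read( ) override is
enough, provided the name you compare against was added to that same table.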
You never said what you intended to do with the XML after you
had removed its <bar> nodes. That'll dictate where you go from
here. If you really wanted to get this XML into an XmlDocument
or XPathDocument, then you can do something like this (instead
of my while loop):
XmlDocument doc = new XmlDocument( );
doc.Load( reader);
or,
XPathDocument xpathDoc = new XPathDocument( reader);
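If it helps to see the XmlDocument route end to end, here is a rough,
self-contained sketch. SkipElementReader and LoadFilteredDemo are
hypothetical names; the reader is a variant of the one above that takes a
TextReader so the sample needs no test.xml on disk, and its Read( )
re-tests whatever node Skip( ) lands on so that adjacent siblings are not
lost. Treat it as an illustration under those assumptions, not a tuned
implementation.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical variant of the reader that reads from a TextReader.
internal class SkipElementReader : XmlTextReader
{
    private readonly object atom;

    public SkipElementReader(string tagNameToRemove, TextReader input)
        : base(input)
    {
        // Atomize the name so Read() can compare by reference.
        this.atom = this.NameTable.Add(tagNameToRemove);
    }

    public override bool Read()
    {
        bool result = base.Read();
        // Skip() positions the reader on the node after the skipped
        // element, so keep testing until the current node is not a match.
        while (result && this.NodeType == XmlNodeType.Element &&
               this.atom == (object)this.LocalName)
        {
            this.Skip();
            result = !this.EOF;
        }
        return result;
    }
}

public static class LoadFilteredDemo
{
    // Loads the filtered stream into an XmlDocument and returns its markup.
    public static string Filter(string xml, string tagNameToRemove)
    {
        using (XmlReader reader =
            new SkipElementReader(tagNameToRemove, new StringReader(xml)))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc.OuterXml;
        }
    }

    public static void Main()
    {
        // The document that comes back has no <bar> elements in it.
        Console.WriteLine(
            Filter("<foo><bar><baz/></bar><bar/><qux/></foo>", "bar"));
    }
}
```

The XmlDocument never sees the skipped elements, so there is nothing to
remove afterwards with RemoveChild( ).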
Derek Harmon