OK, here is what I wish to do. I have an XML file that I want to read
into an XmlDocument. I then want to be able to interrogate the
XmlNodes to find both their start AND end character positions within
the original file.
So e.g.
<tagA><tagB>sometext</tagB></tagA>
^     ^     ^      ^      ^      ^
0     6     12     19     26     33
tagA: start=0, end=33
tagB: start=6, end=26
sometext: start=12, end=19
I have seen the LineInfo example within the .net docs, see:
"Extending the DOM"
ms-help://MS.VSCC/MS.MSDNVS/cpguide/html/cpconextendingdom.htm
and
www.gotdotnet.com/userfiles/XMLDom/extendDOM.zip
This goes some way to doing what I want, but it only stores the start
position of each xml node, not the end. Also this information is in
line/column number format (via System.Xml.IXmlLineInfo). I could work
out the character index from the line/column, but preferably I would
like to store the positions as the XML is being read.
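To make the line/column fallback concrete, here is a rough sketch of the conversion I have in mind. It is in Python purely for brevity, the function name is my own, and I am assuming the line and column numbers are both 1-based and that I still have the raw text to hand:

```python
def line_col_to_offset(text, line, col):
    """Convert a 1-based (line, col) pair to a 0-based character offset."""
    # Record the offset at which each line starts in the raw text.
    line_starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            line_starts.append(i + 1)
    return line_starts[line - 1] + (col - 1)


xml = "<tagA>\n<tagB>sometext</tagB>\n</tagA>"
offset = line_col_to_offset(xml, 2, 7)  # line 2, column 7 is the 's' of "sometext"
```

This only holds if the offsets are measured against exactly the text the parser saw, so any line-ending normalisation along the way would throw the numbers off.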
My first thought was to extend System.IO.StringReader (StreamReaderEx)
to keep track of its current position by overriding the two Read()
methods. I can then extend XmlReader to somehow provide me with the
character position, perhaps by keeping a reference to the
StreamReaderEx. This is a bit messy but should work (I think!). It
also limits me to loading an XmlDocument via a StreamReaderEx.
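As a rough illustration of the position-tracking reader idea (again in Python for brevity; PositionTrackingReader is a made-up name, not my real StreamReaderEx), the wrapper just counts the characters it hands out:

```python
import io


class PositionTrackingReader:
    """Wraps a text stream and counts the characters handed out so far."""

    def __init__(self, stream):
        self._stream = stream
        self.position = 0  # number of characters consumed so far

    def read(self, size=-1):
        chunk = self._stream.read(size)
        self.position += len(chunk)
        return chunk


reader = PositionTrackingReader(io.StringIO("<tagA><tagB>sometext</tagB></tagA>"))
reader.read(6)                       # consume "<tagA>"
after_open_tag = reader.position     # position just past the first tag
reader.read()                        # consume the rest
at_end = reader.position             # total characters read
```

One thing I am unsure about: the count reflects what the consumer has pulled from the stream, so if the XML reader buffers ahead, the position may run past the node it is currently reporting.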
The remaining problem I have is that I can store the start character
position in the overridden CreateElement()/CreateAttribute() methods,
but where should I plug into the XmlDocument to store the end
positions?
Perhaps I am going about this the wrong way? Surely this position info
is already there somewhere, and I just need to extend the node classes
to store it?
As background I have recently been using a JavaCC/JJTree generated
(JavaScript) parser. The parse tree generated gives me a tree of nodes,
each node then has a reference to its first and last tokens (that
make up that node). Each token knows its start & end position within
the original input stream (because I extended the code to store this
info when the token was created). Using this approach gives me all the
info I want. I want to avoid using JavaCC for my Xml as it is a
non-standard way of handling Xml. Future maintainers of the code will
wonder what the heck I was doing!
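For anyone unfamiliar with that first-token/last-token arrangement, here is a toy sketch of the shape of it, applied to my example above. It is Python rather than JavaCC output, the Token and Node classes are hypothetical stand-ins for the generated ones, and the tokenizer only copes with plain open/close tags and text (no attributes, comments or self-closing tags):

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    start: int  # index of the first character
    end: int    # index of the last character (inclusive)


@dataclass
class Node:
    name: str
    first: Token
    last: Token = None
    children: list = field(default_factory=list)

    @property
    def span(self):
        # A node's extent comes from its first and last tokens.
        return (self.first.start, self.last.end)


def parse(text):
    # Toy tokenizer: each token is either a tag or a run of text.
    tokens = [Token(m.group(), m.start(), m.end() - 1)
              for m in re.finditer(r"<[^>]+>|[^<]+", text)]
    root = Node("#root", tokens[0], tokens[-1])
    stack = [root]
    for tok in tokens:
        if tok.text.startswith("</"):
            stack.pop().last = tok  # the closing tag is the node's last token
        elif tok.text.startswith("<"):
            node = Node(tok.text[1:-1], tok)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].children.append(Node("#text", tok, tok))
    return root


root = parse("<tagA><tagB>sometext</tagB></tagA>")
tag_a = root.children[0]
tag_b = tag_a.children[0]
text_node = tag_b.children[0]
```

Here tag_a.span, tag_b.span and text_node.span give the start/end pairs from my example, which is exactly the information I would like to get out of the XmlDocument nodes.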
Thanks for reading this far,
Colin