Extending XmlDocument and associated classes to provide character positions.

OK, here is what I wish to do. I have an XML file that I want to read
into an XmlDocument. I then want to be able to interrogate the
XmlNodes to find both their start AND end character positions within
the original file.

So e.g.

<tagA><tagB>sometext</tagB></tagA>
^     ^     ^      ^      ^      ^
0     6     12     19     26     33

tagA: start=0, end=33
tagB: start=6, end=26
sometext: start=12, end=19

I have seen the LineInfo example within the .net docs, see:

"Extending the DOM"
ms-help://MS.VSCC/MS.MSDNVS/cpguide/html/cpconextendingdom.htm

and

www.gotdotnet.com/userfiles/XMLDom/extendDOM.zip

This goes some way to doing what I want, but it only stores the start
position of each XML node, not the end. Also, this information is in
line/column number format (via System.Xml.IXmlLineInfo). I could work
out the character index from the line/column, but preferably I would
like to store the positions as the XML is being read.
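
For reference, the line/column to character index conversion mentioned above could look roughly like the sketch below. LineOffsetIndex and ToIndex are hypothetical names, not part of System.Xml. The sketch assumes that IXmlLineInfo values are 1-based and that the parser counts "\n", "\r\n" and a lone "\r" each as one line break; both assumptions should be checked against the reader actually in use. Note also that, as far as I recall, the position reported for an element is that of its name rather than of the "<" before it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of System.Xml): maps a 1-based line/column
// pair, as reported via IXmlLineInfo, to a 0-based character index.
public sealed class LineOffsetIndex
{
    // lineStarts[n] is the character index at which line n+1 begins.
    private readonly List<int> lineStarts = new List<int>();

    public LineOffsetIndex(string text)
    {
        lineStarts.Add(0); // line 1 starts at index 0
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                lineStarts.Add(i + 1);
            }
            else if (text[i] == '\r' && (i + 1 >= text.Length || text[i + 1] != '\n'))
            {
                // A lone "\r" is assumed to end a line; "\r\n" is counted at the "\n".
                lineStarts.Add(i + 1);
            }
        }
    }

    public int ToIndex(int lineNumber, int linePosition)
    {
        return lineStarts[lineNumber - 1] + (linePosition - 1);
    }
}
```

The index is built once from the full text of the file, so each lookup afterwards is a single list access.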

My first thought was to extend System.IO.StringReader (StreamReaderEx)
to keep track of its current position by overriding the two Read()
methods. I can then extend XmlReader to somehow provide me with the
character position, perhaps by keeping a reference to the
StreamReaderEx. This is a bit messy but should work (I think!). It
also limits me to loading an XmlDocument via a StreamReaderEx.

The remaining problem I have is that I can store the start character
position in the overridden CreateElement()/CreateAttribute() methods,
but where should I plug into the XmlDocument to store the end
positions?
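
One place where end positions are at least observable is the reader itself: XmlTextReader reports a separate EndElement node for each closing tag and exposes LineNumber/LinePosition at that point. The sketch below is only an illustration under that assumption. ElementSpans and Scan are hypothetical names, the code walks the reader directly rather than hooking into XmlDocument, and the positions are still line/column pairs rather than character indexes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: pairs each element's start position with the position
// reported for its closing tag by walking an XmlTextReader.
public static class ElementSpans
{
    // Returns one entry per element, in the order the elements are closed:
    // "name: start=(line,col) end=(line,col)".
    public static List<string> Scan(string xml)
    {
        List<string> results = new List<string>();
        Stack<string> open = new Stack<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                string where = "(" + reader.LineNumber + "," + reader.LinePosition + ")";
                if (reader.NodeType == XmlNodeType.Element)
                {
                    string start = reader.Name + ": start=" + where;
                    if (reader.IsEmptyElement)
                    {
                        // <x/> produces no EndElement node, so close it here.
                        results.Add(start + " end=" + where);
                    }
                    else
                    {
                        open.Push(start);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    results.Add(open.Pop() + " end=" + where);
                }
            }
        }
        return results;
    }
}
```

The stack is what ties a closing tag back to its opening tag; an XmlDocument subclass would need an equivalent hook at the point where the loader consumes the EndElement node.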

Perhaps I am going about this the wrong way? Surely this position info
is already there somewhere, and I just need to extend the node classes
to store it?

As background, I have recently been using a JavaCC/JJTree generated
(JavaScript) parser. The parse tree generated gives me a tree of nodes;
each node then has a reference to its first and last tokens (that
make up that node). Each token knows its start & end position within
the original input stream (because I extended the code to store this
info when the token was created). Using this approach gives me all the
info I want. I want to avoid using JavaCC for my XML as it is a
non-standard way of handling XML. Future maintainers of the code will
wonder what the heck I was doing!

Thanks for reading this far,

Colin
Nov 11 '05 #1
Colin Green wrote:
OK, here is what I wish to do. I have an XML file that I want to read
into an XmlDocument. I then want to be able to interrogate the
XmlNodes to find both their start AND end character positions within
the original file.

Maybe it's easier to calculate end position based on
start position + length
or
next-node-start-position - 1
?
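
As a minimal sketch of the two calculations, using the inclusive end convention from the example in the question (sometext starts at 12, is 8 characters long, and ends at 19): EndPositions, FromLength and FromNextStart are hypothetical names. The length must be the length of the raw source text; a node's Value can be shorter than the source when entity references or CDATA sections are involved, so this is an assumption that only holds for plain text.

```csharp
using System;

// Hypothetical helper illustrating the two calculations suggested above.
public static class EndPositions
{
    // Inclusive end of a run of raw source text: start + length - 1.
    public static int FromLength(int start, int length)
    {
        return start + length - 1;
    }

    // Inclusive end of a token that is immediately followed by another token.
    public static int FromNextStart(int nextStart)
    {
        return nextStart - 1;
    }
}
```

With the positions from the question, FromLength(12, 8) gives 19 for sometext, and FromNextStart(6) gives 5 as the last character of the opening tag of tagA.
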
--
Oleg Tkachenko
http://www.tkachenko.com/blog
Multiconn Technologies, Israel

Nov 11 '05 #2
