Extending XmlDocument and associated classes to provide character positions.

OK, here is what I wish to do. I have an XML file that I want to read
into an XmlDocument. I then want to be able to interrogate the
XmlNodes to find both their start AND end character positions within
the original file.

So e.g.

<tagA><tagB>sometext</tagB></tagA>
^     ^     ^      ^      ^      ^
0     6     12     19     26     33

tagA: start=0, end=33
tagB: start=6, end=26
sometext: start=12, end=19

I have seen the LineInfo example within the .NET docs, see:

"Extending the DOM"
ms-help://MS.VSCC/MS.MSDNVS/cpguide/html/cpconextendingdom.htm

and

www.gotdotnet.com/userfiles/XMLDom/extendDOM.zip

This goes some way towards doing what I want, but it only stores the start
position of each XML node, not the end. Also, this information is in
line/column number format (via System.Xml.IXmlLineInfo). I could work
out the character index from the line/column, but preferably I would
like to store the positions as the XML is being read.
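
The line/column to character index conversion mentioned above can be sketched as follows. This is a language-neutral illustration in Python, not .NET code. The function name char_index is hypothetical. It assumes 1-based line and column numbers (the convention IXmlLineInfo uses), a 0-based character index, and lines ending in \n, \r\n, or \r.

```python
import re

def char_index(text, line, column):
    """Convert a 1-based (line, column) pair to a 0-based character index."""
    # Offsets at which each line begins: 0, then just past every line break.
    line_starts = [0] + [m.end() for m in re.finditer(r"\r\n|\r|\n", text)]
    if not 1 <= line <= len(line_starts):
        raise ValueError("line out of range")
    return line_starts[line - 1] + (column - 1)

xml = "<tagA>\n<tagB>sometext</tagB>\n</tagA>"
index = char_index(xml, 2, 7)  # line 2, column 7: the 's' of "sometext"
print(index, xml[index])
```

The cost is one pass over the text to find the line starts. If many nodes need converting, the line_starts list could be built once and reused.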

My first thought was to extend System.IO.StreamReader (as StreamReaderEx)
to keep track of its current position by overriding the two Read()
methods. I could then extend XmlReader to somehow provide me with the
character position, perhaps by keeping a reference to the
StreamReaderEx. This is a bit messy but should work (I think!). It
also limits me to loading an XmlDocument via a StreamReaderEx.

The remaining problem is that I can store the start character
position in the overridden CreateElement()/CreateAttribute() methods,
but where should I plug into the XmlDocument to store the end
positions?
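
As a point of comparison, and not an answer for XmlDocument itself, an event-driven parser makes the end position available when the end-tag event fires. The sketch below uses Python's xml.parsers.expat module and its CurrentByteIndex attribute. The function name element_spans is hypothetical. It reports byte offsets, which equal character offsets only for single-byte encodings such as ASCII.

```python
import xml.parsers.expat

def element_spans(xml_bytes):
    """Return (name, start, end) inclusive byte offsets for each element."""
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []  # stack of (name, offset of the start tag's '<')
    spans = []

    def on_start(name, attrs):
        # During a callback, CurrentByteIndex is where the event began.
        open_elements.append((name, parser.CurrentByteIndex))

    def on_end(name):
        open_name, start = open_elements.pop()
        # CurrentByteIndex is the '<' of the end tag; its '>' closes the node.
        # Assumption: empty-element tags (<a/>) have no literal '>' inside
        # an attribute value, otherwise this search stops too early.
        end = xml_bytes.index(b">", parser.CurrentByteIndex)
        spans.append((open_name, start, end))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_bytes, True)
    return spans

spans = element_spans(b"<tagA><tagB>sometext</tagB></tagA>")
print(spans)
```

Elements are reported in the order they close, so inner elements appear before the elements that contain them.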

Perhaps I am going about this the wrong way? Surely this position info
is already there somewhere, and I just need to extend the node classes
to store it?

As background, I have recently been using a JavaCC/JJTree-generated
(JavaScript) parser. The parse tree it generates gives me a tree of nodes;
each node has a reference to its first and last tokens (the ones that
make up that node). Each token knows its start and end position within
the original input stream (because I extended the code to store this
info when the token was created). Using this approach gives me all the
info I want. I want to avoid using JavaCC for my XML as it is a
non-standard way of handling XML. Future maintainers of the code would
wonder what the heck I was doing!

Thanks for reading this far,

Colin
Nov 11 '05 #1
Colin Green wrote:
OK, here is what I wish to do. I have an XML file that I want to read
into an XmlDocument. I then want to be able to interrogate the
XmlNodes to find both their start AND end character positions within
the original file.

Maybe it's easier to calculate the end position based on
start position + length
or
next-node-start-position - 1
?
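
Both suggestions can be checked against the example from the original post. This is a minimal Python sketch; the token list is written out by hand, not produced by a parser. With the inclusive end positions used in that example, "start position + length" needs a further - 1.

```python
sample = "<tagA><tagB>sometext</tagB></tagA>"
# Hand-listed tokens of the sample, in document order (an assumption for
# illustration; a real implementation would get these from the parser).
tokens = ["<tagA>", "<tagB>", "sometext", "</tagB>", "</tagA>"]

# Start offset of each token: the running total of the preceding lengths.
starts = []
position = 0
for token in tokens:
    starts.append(position)
    position += len(token)

# Option 1: inclusive end = start + length - 1.
ends_by_length = [s + len(t) - 1 for s, t in zip(starts, tokens)]

# Option 2: inclusive end = start of the next token - 1.
# The last token has no successor, so it ends at the end of the input.
ends_by_next = [n - 1 for n in starts[1:]] + [len(sample) - 1]

print(starts, ends_by_length, ends_by_next)
```

Both options give the same ends here. Note that "next" means the next token in the input, not the next sibling node: tagB ends at 26, one before the </tagA> token that starts at 27.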
--
Oleg Tkachenko
http://www.tkachenko.com/blog
Multiconn Technologies, Israel

Nov 11 '05 #2
