Hi,
The docs say:
"The Xml-document is not loaded into memory when using XmlTextReader, as
opposed to using the DOM where the entire document is loaded in memory"
But when using XmlTextReader, how can I parse the document if it is not
loaded? Something must be loaded, no?
Thanks,
Chris
Chris wrote:
> "The Xml-document is not loaded into memory when using XmlTextReader, as opposed to using the DOM where the entire document is loaded in memory"
> but, when using XmlTextReader, how can I parse then if the document is not loaded? something must be loaded no?
XmlTextReader is a streaming, forward-only, non-caching reader. It reads the
input in small chunks and holds only one XML node in memory at a time, so
something is loaded, just never the whole document.
--
Oleg Tkachenko [XML MVP] http://blog.tkachenko.com
> the docs say: "The Xml-document is not loaded into memory when using XmlTextReader, as opposed to using the DOM where the entire document is loaded in memory"
> but, when using XmlTextReader, how can I parse then if the document is not loaded? something must be loaded no?
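XmlTextReader provides a sequential, forward-only, read-only view of the XML.
In general, one works with XmlTextReader like this:
    while (reader.Read())   // Read() returns false at the end of the document
    {
        // Handle the node just read, e.g. by checking
        // reader.NodeType, reader.Name and reader.Value.
    }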
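To make the loop above concrete, here is a minimal sketch of a complete read loop. The class and method names (PullLoopDemo, DescribeNodes) are made up for illustration; only the XmlTextReader calls are real API. It shows that the reader only ever exposes the node it is currently positioned on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullLoopDemo
{
    // Walks the input one node at a time and returns a short description
    // of every start tag, text node and end tag it passes over.
    public static List<string> DescribeNodes(TextReader input)
    {
        var seen = new List<string>();
        using (var reader = new XmlTextReader(input))
        {
            // Skip whitespace-only nodes so the result stays short.
            reader.WhitespaceHandling = WhitespaceHandling.None;

            // Read() advances to the next node and returns false at the end.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        seen.Add("start " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        seen.Add("text " + reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        seen.Add("end " + reader.Name);
                        break;
                }
            }
        }
        return seen;
    }
}
```

Calling PullLoopDemo.DescribeNodes(new StringReader(xml)) works the same way for a StreamReader over a very large file, since nothing before the current node is kept around.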
If your data follows a predefined schema, you may want to declare a class,
annotate it with XML serialization attributes, and use XmlSerializer to
read and write the XML.
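A small sketch of the XmlSerializer route, assuming a made-up Person class and element names (none of these come from the original question):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data class; the element and attribute names are illustrative.
[XmlRoot("person")]
public class Person
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class SerializerDemo
{
    // Turns an XML string into a Person instance.
    public static Person FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Person));
        using (var reader = new StringReader(xml))
        {
            return (Person)serializer.Deserialize(reader);
        }
    }

    // Turns a Person instance back into an XML string.
    public static string ToXml(Person person)
    {
        var serializer = new XmlSerializer(typeof(Person));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            return writer.ToString();
        }
    }
}
```

With this, SerializerDemo.FromXml("<person id=\"7\"><name>Chris</name></person>") gives an object you can work with directly, and no hand-written parsing loop is needed.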
--
Vladimir Nesterovsky
e-mail: vl******@nesterovsky-bros.com
home: http://www.nesterovsky-bros.com
Even outside the .NET world, there have been for some time two ways to
read XML. I've heard them referred to as "DOM" and "SAX".
"DOM" (short for "Document Object Model") parsers read the entire XML
document and build a representation of it as a hierarchy of objects in
memory. There are DOM parsers for Java, C, C++, and other languages as
well as the ones built into .NET.
DOM is, generally speaking, the easiest way in which to deal with XML
documents, but it has the disadvantage that it loads the entire
document into memory, which can be a problem if you have a
many-megabyte document.
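For comparison, a minimal DOM sketch using XmlDocument. The document layout ("/orders/order" with a "customer" attribute) and the names DomDemo and CustomerNames are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DomDemo
{
    // Loads the whole document, then looks up nodes anywhere in the tree.
    public static List<string> CustomerNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);   // the complete tree is in memory after this call

        var names = new List<string>();
        // Random access via XPath; "/orders/order" is a made-up layout.
        foreach (XmlNode order in doc.SelectNodes("/orders/order"))
        {
            names.Add(order.Attributes["customer"].Value);
        }
        return names;
    }
}
```

The convenience is that any node can be reached at any time; the cost is that LoadXml (or Load) has to build every node before the first query runs.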
If you are reading XML into ADO.NET you really don't have any choice
but to use DOM in some form, because all of MS's automated
XML-to-ADO.NET tools read the entire document into a dataset.
"SAX" (short for "Simple API for XML") parsers read one XML
token at a time. You supply callback methods that the parser should
call when it encounters certain kinds of things in the document. For
example, "call this method whenever an element starts", and inside it
you check whether the element is called "Address". SAX parsers are
extremely resource efficient, because they hold only one XML token at a
time. However, they leave it up to the calling application to maintain
state. When your callback is told about an "Address" element, you have
no idea where in the document you are, only that you hit an element
called "Address". For this reason, programming for SAX parsers can be a
pain in the butt.
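To illustrate the state-keeping burden, here is a rough sketch of SAX-style callbacks. As far as I know the .NET base class library does not ship a SAX parser, so the IElementHandler interface and the PushParser driver below are hypothetical; the driver uses XmlTextReader underneath purely to keep the example self-contained. The handler has to maintain its own stack to find out where each "Address" element sits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical callback interface in the spirit of SAX; not a .NET API.
public interface IElementHandler
{
    void StartElement(string name);
    void EndElement(string name);
}

public static class PushParser
{
    // Pushes start/end events into the handler as the input is scanned.
    public static void Parse(TextReader input, IElementHandler handler)
    {
        using (var reader = new XmlTextReader(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    string name = reader.Name;
                    bool empty = reader.IsEmptyElement;
                    handler.StartElement(name);
                    // <a/> produces no EndElement node, so close it here.
                    if (empty) handler.EndElement(name);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    handler.EndElement(reader.Name);
                }
            }
        }
    }
}

// The handler must remember its own position: the parser only reports
// "an element called X started", never where in the document that is.
public class PathTrackingHandler : IElementHandler
{
    private readonly Stack<string> open = new Stack<string>();
    public readonly List<string> AddressPaths = new List<string>();

    public void StartElement(string name)
    {
        open.Push(name);
        if (name == "Address")
        {
            // Stack enumerates innermost first, so reverse it for a path.
            string[] parts = open.ToArray();
            Array.Reverse(parts);
            AddressPaths.Add("/" + string.Join("/", parts));
        }
    }

    public void EndElement(string name)
    {
        open.Pop();
    }
}
```

Without the stack, the two "Address" elements in a document such as an order with a customer address and a shipping address would be indistinguishable inside the callback.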
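MS claims that XmlTextReader improves upon the SAX parser. The difference
is that it is a "pull" parser (your code asks the reader for the next
node) rather than a "push" parser that calls back into your code. Still, I
remember thinking that it really wasn't a leap forward in technology,
back when I was investigating .NET's XML support.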
I ended up writing what I consider the best balance between SAX and
DOM, and something that I wish MS (and Java, and ... ) would include
in their standard libraries: a sequential, forward-only parser that
reads an XML document's repeating record content one DOM tree at a
time.
In brief, there are two kinds of XML documents. Some represent
documents with little or no repeating structure (such as MS Word
files). For these you use DOM. Many others, however, represent large
record sets, where each "record" has complex substructure. DOM is
overkill for these, because you don't need all of the records in
memory at once: you're processing them serially, one by one. SAX,
however, is too simplistic and makes it difficult to work with each
record. What you really want is a parser that, given some information
about what constitutes a "record" in your XML document, reads one
"record" at a time into a mini DOM tree.
This is what I built for our own use here, and it works well. It reads
only a small portion of an XML file into memory at one time, but each
portion comes in as a DOM tree that is easy to work with.
Good insight.
You can, however, easily build your program on top of XmlTextReader by
writing a custom XML reader that reads a record at a time. At the API level,
the designers can't make the choice of whether to read records or not:
how would you decide generically what is a record in your structure and
what is not? XmlTextReader is a low-level, forward-only streaming parser. You
could have built your record reader on top of it, if you haven't already done so.
Can you take your program and apply it to any XML? How would you know which
element is a record and which is not? XmlTextReader is a parser that reads any
XML and at the same time checks conformance to the XML 1.0 spec. What you
described is a custom XML parser solution for your needs. You could have used
XmlTextReader underneath to read tokens and report only one record at a time.
(ReadOuterXml and ReadInnerXml do report one structure, in a sense.)
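As a rough illustration of the ReadOuterXml idea, the sketch below yields the markup of each element with a given name, one at a time. The names OuterXmlRecords and recordName are invented here, and treating "every element with this name" as a record is an assumption of the example, not a general rule.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class OuterXmlRecords
{
    // Returns the markup of each element named recordName, one at a time.
    public static IEnumerable<string> Read(TextReader input, string recordName)
    {
        using (var reader = new XmlTextReader(input))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                {
                    // ReadOuterXml returns the element with all its content
                    // and leaves the reader on the node that follows it,
                    // so Read() must not be called again on this pass.
                    yield return reader.ReadOuterXml();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

The branch structure matters: calling Read() after ReadOuterXml() would step over the node the reader has already advanced to, which silently skips every other record when records are adjacent.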
HTH,
Amol
"Bruce Wood" <br*******@canada.com> wrote in message
news:a6**************************@posting.google.com...
> Even outside the .NET world, there have been for some time two ways to read XML. I've heard them referred to as "DOM" and "SAX".
> [...]
My DOM-tree-at-a-time XML parser is, in fact, built on top of
XmlTextReader and is generic. I called it XmlFragmentReader and it's a
subclass of XmlTextReader.
The generic parser needs one extra piece of information in order to do
its work, and adds one extra property for retrieving information.
The additional piece of information it needs is an XPath expression
for the node that encloses what I would call the "record". If I have a
document that looks like this:
<document>
  <header>
  </header>
  <data>
    <thing>
      <firstData />
    </thing>
    <thing>
      <secondData />
    </thing>
  </data>
  <footer>
  </footer>
</document>
Then I would pass "/document/data" to the XmlFragmentReader constructor.
This tells it that every child element of the element at
"/document/data" should be returned as a complete DOM tree. So, on the
document above, my fragment reader would return
<thing>
  <firstData />
</thing>
as the first DOM tree, and
<thing>
  <secondData />
</thing>
as the second DOM tree. The third read would return "end of document".
The only remaining hitch is what to do about the rest of the document.
For this, the XmlFragmentReader has a RemainingDocument property that
returns the rest of the document _excluding any records read, and only as
far as it has been read thus far_. So, upon opening the document and up to
and including the second read, the RemainingDocument property would return
<document>
  <header>
  </header>
  <data>
  </data>
</document>
as its DOM tree, because it can read at least as far as the opening
<data> tag. After the third read, at the end of the document, the
RemainingDocument property would return
<document>
  <header>
  </header>
  <data>
  </data>
  <footer>
  </footer>
</document>
as its DOM tree, because the third read scans all the way to the end
looking for the next <data> tag. There is, of course, no requirement
that all tags inside <data> be the same, nor that there be only one
<data> tag, only that anything not a child element of the XPath
"/document/data" is built into the RemainingDocument progressively as
the XmlTextReader passes over the document, and that anything inside
the XPath "/document/data" is returned one by one as a sequence of DOM
trees. This copes quite nicely with the vast majority of documents
containing repeating data, providing the benefits of DOM trees with the
memory savings of a SAX-style parser.
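For anyone who wants to try the idea, here is a rough sketch of how such a reader could look. It is not the original XmlFragmentReader: it wraps an XmlTextReader instead of subclassing it, it compares a plain "/a/b" path string rather than evaluating real XPath, and it copies only elements and text into RemainingDocument (attributes, comments and so on are dropped for brevity). The names FragmentReaderSketch and ReadFragment are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Illustrative stand-in for the reader described above; all names here
// are assumptions, not the original code.
public class FragmentReaderSketch : IDisposable
{
    private readonly XmlTextReader reader;
    private readonly string containerPath;      // e.g. "/document/data"
    private readonly Stack<string> open = new Stack<string>();
    private XmlNode current;                     // insertion point in the remainder

    public XmlDocument RemainingDocument { get; } = new XmlDocument();

    public FragmentReaderSketch(TextReader input, string containerPath)
    {
        reader = new XmlTextReader(input) { WhitespaceHandling = WhitespaceHandling.None };
        this.containerPath = containerPath;
        current = RemainingDocument;
        reader.Read();                           // position on the first node
    }

    private string CurrentPath()
    {
        string[] parts = open.ToArray();
        Array.Reverse(parts);                    // Stack enumerates innermost first
        return "/" + string.Join("/", parts);
    }

    // Returns the next record as a small DOM tree, or null at end of document.
    public XmlNode ReadFragment()
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (CurrentPath() == containerPath)
                {
                    // ReadNode builds a subtree for this element only and
                    // leaves the reader on the node after its end tag.
                    return new XmlDocument().ReadNode(reader);
                }
                // Not a record: copy the element into the remainder.
                XmlElement copy = RemainingDocument.CreateElement(reader.Name);
                current.AppendChild(copy);
                if (!reader.IsEmptyElement)
                {
                    open.Push(reader.Name);
                    current = copy;
                }
            }
            else if (reader.NodeType == XmlNodeType.EndElement)
            {
                open.Pop();
                current = current.ParentNode;
            }
            else if (reader.NodeType == XmlNodeType.Text)
            {
                current.AppendChild(RemainingDocument.CreateTextNode(reader.Value));
            }
            reader.Read();
        }
        return null;
    }

    public void Dispose()
    {
        reader.Close();
    }
}
```

Typical use would be a loop such as: for (XmlNode rec = fr.ReadFragment(); rec != null; rec = fr.ReadFragment()) { ... }, followed by a look at fr.RemainingDocument once the loop ends. Only one record's subtree is alive at a time, so memory use depends on the largest record rather than on the size of the file.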