XmlTextReader versus DOM - .NET Framework

Chris

Hi,

The docs say:

"The Xml-document is not loaded into memory when using XmlTextReader, as
opposed to using the DOM where the entire document is loaded in memory"

But when using XmlTextReader, how can it parse the document if the document
is not loaded? Something must be loaded, no?

Thanks,
Chris

Nov 12 '05 #1

Oleg Tkachenko [MVP]

Chris wrote:

> "The Xml-document is not loaded into memory when using XmlTextReader, as
> opposed to using the DOM where the entire document is loaded in memory"
>
> but, when using XmlTextReader, how can I parse then if the document is not
> loaded ?
> something must be loaded no ?

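XmlTextReader is a streaming, forward-only, non-caching reader. So yes,
something is loaded, but only a little: it reads and holds in memory only
one XML node at a time (plus a small buffer of the input), never the whole
document.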
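
A minimal sketch of what "one node at a time" looks like in practice. The class and method names are made up for illustration, and the input comes from a string only to keep the example self-contained. Newer code would typically obtain a reader from XmlReader.Create, but the loop is the same.

```csharp
using System;
using System.IO;
using System.Xml;

static class NodeAtATime
{
    // Walks the XML one node at a time and counts the start tags.
    // At any moment only the current node is visible through the reader.
    public static int CountElements(string xml)
    {
        int elements = 0;
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())   // advances to the next node; false at end of input
            {
                Console.WriteLine("{0,-12} depth={1} name={2}",
                                  reader.NodeType, reader.Depth, reader.Name);
                if (reader.NodeType == XmlNodeType.Element)
                    elements++;
            }
        }
        return elements;
    }
}
```

Once Read() has moved on, the previous node is gone; anything you need later has to be copied out while the reader is positioned on it.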

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com

Nov 12 '05 #2

Vladimir Nesterovsky

> the docs say :
>
> "The Xml-document is not loaded into memory when using XmlTextReader, as
> opposed to using the DOM where the entire document is loaded in memory"
>
> but, when using XmlTextReader, how can I parse then if the document is not
> loaded ?
> something must be loaded no ?

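XmlTextReader provides a sequential, forward-only, read-only view of the XML.
In general, one works with XmlTextReader like this:

    while (reader.Read())   // advance to the next node; false at end of input
    {
        // Handle the current node: reader.NodeType, reader.Name, reader.Value.
        // Other ReadXXX methods (e.g. ReadString) can consume more of it here.
    }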

If you are going to store data that follows a predefined schema, consider
declaring a class, annotating it with XML serialization attributes, and
using XmlSerializer to read and write the XML.
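
As a rough sketch of the XmlSerializer route: the Person type and its element and attribute names below are invented for illustration and are not taken from any real schema.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical record type; the XML names are assumptions for this example.
[XmlRoot("person")]
public class Person
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class PersonXml
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Person));

    // Builds a Person from its XML representation.
    public static Person Read(string xml)
    {
        using (var input = new StringReader(xml))
            return (Person)Serializer.Deserialize(input);
    }

    // Produces the XML representation of a Person.
    public static string Write(Person person)
    {
        using (var output = new StringWriter())
        {
            Serializer.Serialize(output, person);
            return output.ToString();
        }
    }
}
```

For example, PersonXml.Read("<person id=\"7\"><name>Ann</name></person>") should give a Person with Id 7 and Name "Ann", with no hand-written reader loop involved.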

--
Vladimir Nesterovsky
e-mail: vl******@nesterovsky-bros.com
home: http://www.nesterovsky-bros.com

Nov 12 '05 #3

Bruce Wood

Even outside the .NET world, there have been for some time two ways to
read XML. I've heard them referred to as "DOM" and "SAX".

"DOM" (short for "Document Object Model") parsers read the entire XML
document and build a representation of it as a hierarchy of objects in
memory. There are DOM parsers for Java, C, C++, and other languages as
well as the ones built into .NET.

DOM is, generally speaking, the easiest way in which to deal with XML
documents, but it has the disadvantage that it loads the entire
document into memory, which can be a problem if you have a
many-megabyte document.
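
For comparison, a small sketch of the DOM approach in .NET using XmlDocument. The document shape and the XPath below are assumptions chosen for the example.

```csharp
using System.Xml;

static class DomExample
{
    // Loads the whole document into memory, then navigates it with XPath.
    public static string FirstCity(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);   // the entire tree is built here, before any query runs
        XmlNode city = doc.SelectSingleNode("/people/person/address/city");
        return city == null ? null : city.InnerText;
    }
}
```

The convenience is that any node can be reached in any order once the tree exists; the cost is that the tree for the whole input has to exist first.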

If you are reading XML into ADO.NET you really don't have any choice
but to load the whole document in some form, because all of MS's automated
XML-to-ADO.NET tools read the entire document into a DataSet.

"SAX" (short for "Simple API for XML") parsers read one XML
token at a time. You supply callback methods that the parser should
call when it encounters certain kinds of things in the document. For
example, "Call this method whenever an element starts." SAX parsers are
extremely resource efficient, because they hold only one XML token at a
time. However, they leave it up to the calling application to maintain
state. When your start-of-element method is called for an element named
"Address", you have no idea where in the document you are, only
that you hit an element called "Address". For this reason
programming for SAX parsers can be a pain in the butt.
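
.NET does not ship a SAX parser, so the sketch below only imitates the callback style on top of XmlTextReader to show the state problem. PushStyle, Parse and PathsOf are hypothetical names, not a real API. The callbacks receive nothing but a name, so the caller has to keep its own stack to know where an "Address" element sits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class PushStyle
{
    // Hypothetical SAX-like driver: reports every start and end tag by name.
    public static void Parse(string xml, Action<string> onStart, Action<string> onEnd)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    bool empty = reader.IsEmptyElement;
                    string name = reader.Name;
                    onStart(name);
                    if (empty) onEnd(name);   // <x/> produces no separate end node
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    onEnd(reader.Name);
                }
            }
        }
    }

    // Caller-side bookkeeping: a stack of open elements gives each hit a path.
    public static List<string> PathsOf(string xml, string target)
    {
        var open = new Stack<string>();
        var paths = new List<string>();
        Parse(xml,
            name =>
            {
                open.Push(name);
                if (name == target)
                {
                    string[] parts = open.ToArray();   // innermost element first
                    Array.Reverse(parts);
                    paths.Add("/" + string.Join("/", parts));
                }
            },
            name => open.Pop());
        return paths;
    }
}
```

Without the stack in PathsOf, a billing address and a shipping address would be indistinguishable inside the callback.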

MS claims that XmlTextReader improves upon the SAX parser, but I
remember thinking that it really wasn't a leap forward in technology,
back when I was investigating .NET's XML support.

I ended up writing what I consider the best balance between SAX and
DOM, and something that I wish MS (and Java, and ... ) would include
in their standard libraries: a sequential, forward-only parser that
reads an XML document's repeating record content one DOM tree at a
time.

In brief, there are two kinds of XML documents. Some represent
documents with little or no repeating structure (such as MS Word
files). For these you use DOM. However, many XML files represent large
record sets, where each "record" has complex substructure. DOM is
overkill for these, because you don't need all of the records in
memory at once: you're processing them serially, one by one. SAX,
however, is too simplistic and makes it difficult to work with each
record. What you really want is a parser that, given some information
about what constitutes a "record" in your XML document, reads one
"record" at a time into a mini DOM tree.

This is what I built for our own use here, and it works well. It reads
only a small portion of an XML file into memory at one time, but each
portion comes in as a DOM tree that is easy to work with.

Nov 12 '05 #4

Amol Kher [MSFT]

Good insight.

You can, however, easily build your program on top of XmlTextReader by
writing a custom XML reader that reads a record at a time. At the API level,
the designers can't choose whether the reader should read records or
not: how would you decide generically what is a record in your structure and
what is not? XmlTextReader is a low-level, forward-only streaming parser. You
could have built your record reader on top of it, if you haven't already done so.

Can you take your program and apply it to any XML? How would you know which
is a record and which is not? XmlTextReader is a parser that reads any XML and
at the same time checks conformance to the XML 1.0 spec. What you described is a
custom XML parser solution to your needs. You could have used XmlTextReader
underneath to read tokens and report only one record at a time. (ReadOuterXml
and ReadInnerXml do report one structure, in a sense.)
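
A hedged sketch of the ReadOuterXml idea: it assumes a "record" is simply any element with a given name, which is an assumption of this example rather than anything XmlTextReader knows about. OuterXmlRecords is a made-up helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class OuterXmlRecords
{
    // Yields the markup of each element named recordName, one at a time.
    public static IEnumerable<string> Read(TextReader input, string recordName)
    {
        using (var reader = new XmlTextReader(input))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                    yield return reader.ReadOuterXml();   // consumes the element and moves past it
                else
                    reader.Read();                        // any other node: just advance
            }
        }
    }
}
```

Note that ReadOuterXml already leaves the reader on the node after the record, so calling Read() again after it would skip a node.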

HTH,
Amol

Nov 12 '05 #5

Bruce Wood

My DOM-tree-at-a-time XML parser is, in fact, built on top of
XmlTextReader and is generic. I called it XmlFragmentReader and it's a
subclass of XmlTextReader.

The generic parser needs one extra piece of information in order to do
its work, and adds one extra property for retrieving information.

The additional piece of information it needs is an XPath expression
for the node that encloses what I would call the "record". If I have a
document that looks like this:

  <document>
    <header>
    </header>
    <data>
      <thing>
        <firstData />
      </thing>
      <thing>
        <secondData />
      </thing>
    </data>
    <footer>
    </footer>
  </document>

Then I would pass "/document/data" to the XmlFragmentReader
constructor. This tells it that every child element of
"/document/data" should be returned as a complete DOM tree. So, on the
document above, my fragment reader would return

  <thing>
    <firstData />
  </thing>

as the first DOM tree, and

  <thing>
    <secondData />
  </thing>

as the second DOM tree. The third read would return "end of document".

The only remaining hitch is what to do about the rest of the document.
For this, the XmlFragmentReader has a RemainingDocument property that
returns the rest of the document, _excluding any records read_ and _as
read thus far_. So, upon opening the document and after the first
read, the RemainingDocument property would return

  <document>
    <header>
    </header>
    <data>
    </data>
  </document>

as its DOM tree because it can read at least as far as the opening
<data> tag. After the second read, and at the end of document, the
RemainingDocument property would return

  <document>
    <header>
    </header>
    <data>
    </data>
    <footer>
    </footer>
  </document>

as its DOM tree because after the second read it would read all the
way to the end looking for the next <data> tag. There is, of course,
no requirement that all tags inside <data> must be the same, nor that
there be only one <data> tag, only that anything not a child element
of the XPath "/document/data" is built into the RemainingDocument
progressively as the XmlTextReader passes over the document, and that
anything inside the XPath "/document/data" is returned one by one as a
sequence of DOM trees. This copes quite nicely with the vast majority
of documents containing repeating data, providing the benefits of DOM
trees with the memory savings of a SAX-style parser.
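
The XmlFragmentReader source is not shown in this thread, so the following is only a guess at the general technique, not the actual implementation. It assumes the container is given as a plain slash-separated path of element names (not general XPath), it uses XmlDocument.ReadNode to build each mini DOM tree, and it leaves out RemainingDocument entirely. FragmentSketch and ReadRecords are invented names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class FragmentSketch
{
    // Yields every child element of containerPath (e.g. "/document/data")
    // as a standalone XmlElement, one at a time.
    public static IEnumerable<XmlElement> ReadRecords(TextReader input, string containerPath)
    {
        var doc = new XmlDocument();      // owner document for the fragments
        var open = new Stack<string>();   // names of the currently open elements
        using (var reader = new XmlTextReader(input))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (CurrentPath(open) == containerPath)
                    {
                        // ReadNode builds the subtree and leaves the reader
                        // positioned on the node that follows it.
                        yield return (XmlElement)doc.ReadNode(reader);
                        continue;
                    }
                    if (!reader.IsEmptyElement)
                        open.Push(reader.Name);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    open.Pop();
                }
                reader.Read();
            }
        }
    }

    private static string CurrentPath(Stack<string> open)
    {
        string[] parts = open.ToArray();  // innermost element first
        System.Array.Reverse(parts);
        return "/" + string.Join("/", parts);
    }
}
```

Run against the sample document above with "/document/data", this should yield two <thing> elements in order. Each one is an ordinary XmlElement, so SelectSingleNode and the rest of the DOM API work on it while the rest of the file stays unread.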

Nov 12 '05 #6
