MikeB wrote:
> I originally considered defining a class for each schema version and
> using the XmlSerializer class to construct the appropriate one from the
> XML document. However, this is where another potential issue raises its
> head: the XML files are rather large: 50+ MB and over 1 million lines.
>
> I suspect that using the XmlSerializer with documents of this size is
> probably not appropriate. Am I correct?
If you deserialize an XML document with XmlSerializer, you get .NET
objects held in memory. It is hard to predict how much memory a 50 MB
document will consume once deserialized; you will have to run some tests,
and of course you will also have to take into account what kind of
systems the users of your application have. PCs are now being sold with
3 GB of RAM, so I wouldn't completely rule out using XmlSerializer to
deserialize your large XML.
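A rough way to run such a test is to compare GC.GetTotalMemory before and after the Deserialize call on a representative sample file. This is only a sketch under assumptions: the Record/RecordSet classes and the `<records>`/`<record>` element names are hypothetical stand-ins for one of your schema versions, and GC.GetTotalMemory gives only an approximate figure for the managed heap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes standing in for one schema version; adjust to your schema.
public class Record
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

[XmlRoot("records")]
public class RecordSet
{
    // Each <record> child element becomes one entry in this list.
    [XmlElement("record")]
    public List<Record> Items { get; set; } = new List<Record>();
}

public static class MemoryProbe
{
    // Deserializes the whole document into memory and returns the approximate
    // growth of the managed heap in bytes. GC.GetTotalMemory(true) forces a
    // collection first so the before/after figures are less noisy.
    public static long MeasureDeserialization(Stream xml, out RecordSet result)
    {
        // Create the serializer before measuring so its one-time setup cost is excluded.
        var serializer = new XmlSerializer(typeof(RecordSet));
        long before = GC.GetTotalMemory(true);
        result = (RecordSet)serializer.Deserialize(xml);
        long after = GC.GetTotalMemory(true);
        return after - before;
    }
}
```

Running this against a few samples of different sizes (say 5 MB and 10 MB) should give a rough per-MB figure to extrapolate to the 50 MB files, although the result depends heavily on how many small objects and strings your classes create.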
> Bearing this in mind, I could construct the object model by using an
> XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
> is that AIUI, I will then have to manually handle the schema differences.
Note that as of .NET 2.0, using XmlTextReader directly is no longer
recommended; you should create an XmlReader with XmlReader.Create and
suitable XmlReaderSettings instead.
Other than that, you are right: XmlReader is fast but forward-only, which
keeps its memory footprint low, so it is the .NET XML API to use for
parsing large XML documents.
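A minimal sketch of the XmlReader.Create approach, assuming the document arrives as a TextReader (XmlReader.Create also has overloads taking a file path or a Stream). It only inspects NodeType in a single forward-only pass, here simply counting elements by local name, so memory use stays roughly constant regardless of file size. The StreamingScan class and the counting task are illustrative, not part of any API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingScan
{
    // Counts element occurrences by local name in one forward-only pass.
    // Only the current node is held by the reader, not the whole document.
    public static Dictionary<string, int> CountElements(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            IgnoreComments = true,
            IgnoreProcessingInstructions = true
        };

        var counts = new Dictionary<string, int>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Missing keys yield n == 0, so the first hit stores 1.
                    counts.TryGetValue(reader.LocalName, out int n);
                    counts[reader.LocalName] = n + 1;
                }
            }
        }
        return counts;
    }
}
```

Handling schema differences this way means branching on element and attribute names yourself inside the loop, which is exactly the manual work you describe.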
You can, however, combine XmlReader with other APIs such as
XPathDocument/XPathNavigator, XmlSerializer, or LINQ to XML (in .NET
3.5): process the whole document with XmlReader, but pass subtrees on
to the other APIs for more convenience or power in extracting the data
you are looking for.
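Passing subtrees to XmlSerializer could look like the following sketch. It assumes the large document is essentially a long sequence of repeated elements; the `<item>` element, its `id` attribute, the `<value>` child and the Item class are all hypothetical. XmlReader.ReadSubtree limits the serializer to a single element, so only one Item object is built at a time while the outer reader streams through the rest.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type for one repeated element of the large document.
[XmlRoot("item")]
public class Item
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("value")]
    public string Value { get; set; }
}

public static class SubtreeDeserializer
{
    // Streams the document with XmlReader and hands each <item> subtree to
    // XmlSerializer, yielding the deserialized objects one by one.
    public static IEnumerable<Item> ReadItems(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(Item));
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "item")
                {
                    Item item;
                    // ReadSubtree exposes only this <item>; once it is disposed, the outer
                    // reader sits on the item's end tag (or the element itself if empty),
                    // so the next Read() continues with the following node.
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        item = (Item)serializer.Deserialize(subtree);
                    }
                    yield return item;
                }
            }
        }
    }
}
```

Because the method is an iterator, the caller can process each Item and let it go before the next one is read.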
For instance, with LINQ to XML you can use XNode.ReadFrom
http://msdn.microsoft.com/en-us/libr....readfrom.aspx
to consume a subtree.
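With LINQ to XML (.NET 3.5 or later) the same streaming pattern could be sketched around XNode.ReadFrom as below; the StreamElements method and its element-name parameter are illustrative. Note the shape of the loop: XNode.ReadFrom advances the reader past the element it consumes, so an unconditional Read() in a `while (reader.Read())` loop would skip the node immediately following each matched element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LinqStreaming
{
    // Yields each element with the given name as a small XElement tree,
    // while the rest of the document is only streamed past by XmlReader.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // next node, so Read() must not be called in this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement can then be queried with the usual LINQ to XML axes and conversions, e.g. `(string)e.Attribute("n")`.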
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/