
Handling multiple schemas and large files in XML

Hi

I hope that this is the correct place to post this question.

I'm looking at developing an application which will enable me to import
and process some data that is made available to me as XML.

One complication is that the providers of the data have published two
different schema versions. Whilst effectively describing the same data,
the 2nd schema is a significant refactoring of the first and so is
almost totally different in structure. I also can't rule out the
possibility that they will issue further versions too. I'd ideally like
to be able to handle both of these schemas, and I'd also like to be able to
support new versions with the minimum of fuss.
From knowledge of the application domain, I am also fairly sure that the
essential data will remain stable across schema versions.

I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
XML document. However, this is where another potential issue raises its
head: the XML files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?
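
For reference, this is roughly the sort of thing I had in mind for the
serializer approach (the class and element names below are invented, not
from the real schemas):

using System.IO;
using System.Xml.Serialization;

// One class per schema version; names here are placeholders only.
[XmlRoot("DataSet", Namespace = "urn:provider:v1")]
public class DataSetV1
{
    [XmlElement("Record")]
    public RecordV1[] Records;
}

public class RecordV1
{
    [XmlAttribute("id")]
    public string Id;

    [XmlElement("Value")]
    public string Value;
}

public static class ImporterV1
{
    public static DataSetV1 Load(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DataSetV1));
        using (FileStream stream = File.OpenRead(path))
        {
            // Deserializes the *whole* document into memory in one go.
            return (DataSetV1)serializer.Deserialize(stream);
        }
    }
}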

Thankfully, it's not necessary to load the entire document in one go as
the user won't need to visualise *all* the data at once. Instead, they
will home into a section of the data and drill down for detail in
tree-like fashion. Because of this, the application's internal object
model can represent just the data that the user is interested in.

Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.
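
Something along these lines (again, "Record" and "id" are just placeholder
names, not from the real schemas):

using System.Xml;

public static class Importer
{
    public static void Load(string path)
    {
        // Forward-only pass over the file; only the nodes of interest are
        // turned into objects in the internal model.
        using (XmlTextReader reader = new XmlTextReader(path))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    string id = reader.GetAttribute("id");
                    // build the corresponding object here, with separate
                    // handling for each schema version
                }
            }
        }
    }
}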

I'd appreciate it if anyone could suggest better approaches. I'm fairly
new to both .NET and XML so please point out if I'm completely off the
mark here. Any suggestions at all are greatly appreciated.

TIA
MikeB
Sep 21 '08 #1
2 Replies


MikeB wrote:
I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
XML document. However, this is where another potential issue raises its
head: the XML files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?
If you deserialize an XML document with XmlSerializer then you get .NET
objects held in memory. It is hard to tell how much memory a 50 MB
document will consume; you will have to run some tests, and of course you
will also have to take into account what kind of systems the users of
your application have. Nowadays PCs are sold with 3 GB of RAM, so I
wouldn't rule out completely that you can use XmlSerializer to
deserialize your large XML.

Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.
Note that as of .NET 2.0 XmlTextReader is deprecated; you should instead
create an XmlReader with XmlReader.Create and appropriate XmlReaderSettings.
Other than that you are right: XmlReader is fast but forward-only, and it
maintains a low memory footprint that way, so it is the .NET XML API of
choice for parsing large XML documents.
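A minimal sketch (the file and schema names here are only examples):

using System.Xml;
using System.Xml.Schema;

public static class ReaderExample
{
    public static void Process()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Optional: validate against one of the provider's schemas while reading.
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "provider-v2.xsd");

        using (XmlReader reader = XmlReader.Create("data.xml", settings))
        {
            while (reader.Read())
            {
                // forward-only processing, low memory footprint
            }
        }
    }
}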
You can, however, combine XmlReader with other APIs such as
XPathDocument/XPathNavigator, XmlSerializer or LINQ to XML (in .NET 3.5):
process the whole document with XmlReader but hand subtrees on to the
other API, which gives you more comfort and power to extract the data you
are looking for.
For instance, with LINQ to XML you have XNode.ReadFrom
http://msdn.microsoft.com/en-us/libr....readfrom.aspx
to consume a subtree.
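A rough sketch of that combination (the element name "Record" is just a
placeholder):

using System.Xml;
using System.Xml.Linq;

public static class StreamingExample
{
    public static void Process()
    {
        using (XmlReader reader = XmlReader.Create("data.xml"))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    // ReadFrom consumes the whole subtree and leaves the reader
                    // positioned after it, so do not call Read() again here.
                    XElement record = (XElement)XNode.ReadFrom(reader);
                    // query the subtree with LINQ to XML here
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}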

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Sep 21 '08 #2

On 21 Sep, 11:49, Martin Honnen <mahotr...@yahoo.de> wrote:
[snip]
Martin. Thanks for that.
/MikeB
Sep 23 '08 #3
