
Handling multiple schemas and large files in XML

Hi

I hope that this is the correct place to post this question.

I'm looking at developing an application which will enable me to import
and process some data that is made available to me as XML.

One complication is that the providers of the data have published two
different schema versions. Whilst effectively describing the same data,
the 2nd schema is a significant refactoring of the first and so is
almost totally different in structure. I also can't rule out the
possibility that they will issue further versions too. I'd ideally like
to be able to handle both of these schemas, and I'd also like to be able to
support new versions with the minimum of fuss.
From knowledge of the application domain, I am also fairly sure that the
essential data will remain stable across schema versions.

I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
XML document. However, this is where another potential issue raises its
head: the XML files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?
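
For reference, this is roughly the sort of thing I had in mind for the
serializer approach (the class and element names below are invented, not
from the real schemas):

using System.IO;
using System.Xml.Serialization;

// One class per schema version; names here are placeholders only.
[XmlRoot("DataSet", Namespace = "urn:provider:v1")]
public class DataSetV1
{
    [XmlElement("Record")]
    public RecordV1[] Records;
}

public class RecordV1
{
    [XmlAttribute("id")]
    public string Id;

    [XmlElement("Value")]
    public string Value;
}

public static class ImporterV1
{
    public static DataSetV1 Load(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DataSetV1));
        using (FileStream stream = File.OpenRead(path))
        {
            // Deserializes the *whole* document into memory in one go.
            return (DataSetV1)serializer.Deserialize(stream);
        }
    }
}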

Thankfully, it's not necessary to load the entire document in one go as
the user won't need to visualise *all* the data at once. Instead, they
will home into a section of the data and drill down for detail in
tree-like fashion. Because of this, the application's internal object
model can represent just the data that the user is interested in.

Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.
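
Something along these lines (again, "Record" and "id" are just placeholder
names, not from the real schemas):

using System.Xml;

public static class Importer
{
    public static void Load(string path)
    {
        // Forward-only pass over the file; only the nodes of interest are
        // turned into objects in the internal model.
        using (XmlTextReader reader = new XmlTextReader(path))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    string id = reader.GetAttribute("id");
                    // build the corresponding object here, with separate
                    // handling for each schema version
                }
            }
        }
    }
}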

I'd appreciate it if anyone could suggest better approaches. I'm fairly
new to both .NET and XML so please point out if I'm completely off the
mark here. Any suggestions at all are greatly appreciated.

TIA
MikeB
Sep 21 '08 #1
2 Replies


MikeB wrote:
I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
XML document. However, this is where another potential issue raises its
head: the XML files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?
If you deserialize an XML document with XmlSerializer then you get .NET
objects held in memory. It is hard to tell how much memory a 50 MB
document will consume; you will have to run some tests, and of course you
will also have to take into account what kind of systems the users of
your application have. Nowadays PCs are sold with 3 GB of RAM, so I
wouldn't rule out completely that you can use XmlSerializer to
deserialize your large XML.

Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.
Note that as of .NET 2.0 XmlTextReader is deprecated; you should instead
create an XmlReader with XmlReader.Create and appropriate XmlReaderSettings.
Other than that you are right: XmlReader is fast but forward-only, and it
maintains a low memory footprint that way, so it is the .NET XML API of
choice for parsing large XML documents.
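A minimal sketch (the file and schema names here are only examples):

using System.Xml;
using System.Xml.Schema;

public static class ReaderExample
{
    public static void Process()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Optional: validate against one of the provider's schemas while reading.
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "provider-v2.xsd");

        using (XmlReader reader = XmlReader.Create("data.xml", settings))
        {
            while (reader.Read())
            {
                // forward-only processing, low memory footprint
            }
        }
    }
}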
You can, however, combine XmlReader with other APIs such as
XPathDocument/XPathNavigator, XmlSerializer or LINQ to XML (in .NET 3.5):
process the whole document with XmlReader but hand subtrees on to the
other API, which gives you more comfort and power to extract the data you
are looking for.
For instance, with LINQ to XML you have XNode.ReadFrom
http://msdn.microsoft.com/en-us/libr....readfrom.aspx
to consume a subtree.
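A rough sketch of that combination (the element name "Record" is just a
placeholder):

using System.Xml;
using System.Xml.Linq;

public static class StreamingExample
{
    public static void Process()
    {
        using (XmlReader reader = XmlReader.Create("data.xml"))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                {
                    // ReadFrom consumes the whole subtree and leaves the reader
                    // positioned after it, so do not call Read() again here.
                    XElement record = (XElement)XNode.ReadFrom(reader);
                    // query the subtree with LINQ to XML here
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}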

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Sep 21 '08 #2

On 21 Sep, 11:49, Martin Honnen <mahotr...@yahoo.de> wrote:
[snip]
Martin. Thanks for that.
/MikeB
Sep 23 '08 #3
