Bytes | Software Development & Data Engineering Community

XSLT Compare two documents and output differences

Greetings,

I am relatively new to what I would call advanced XSLT/XPath, and I
am after some advice from those in the know. I am trying to work out
a mechanism in XSLT that compares two source documents and writes the
node-sets that are "different" (changed or new) to new XML files using
xsl:result-document.

To describe the problem I have provided some example data below, along
with a portion of my current XSLT. I have changed the meaning of the
data to make it less specific to my project, in case the suggestions
we get here prove useful to others.

OK, so the problem is as follows:

- We have a source document, "SourceData.xml", containing a catalogue
of "Fish" provided by a partner so that we can update our internal
databases.

- The process requires that we take each <datarecord> node and parse
it into our internal format using our naming conventions.

- We also have to perform a replacement on their "location" element,
whose values do not map directly to our "habitat" values. I have done
this by loading a lookup file called "DataMapping.xml" into a global
variable. I then declare an xsl:key on the @clientname attribute of
the <entry> element. When I need a value, I copy the client's value
into a variable, switch to the lookup document's context using the
xsl:for-each trick, and then perform the lookup with key(x,y).

- Each <datarecord> node in the source produces a new XML file
containing a single <updateRecord> element with our structure beneath it.
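
Since Saxon-B implements XSLT 2.0, the xsl:for-each context switch used for the habitat lookup may not be necessary: in XSLT 2.0, key() accepts a third argument naming the document (or subtree) to search. The sketch below shows the same lookup that way, assuming the DataMapping.xml and SourceData.xml layouts given further down; to keep it short it writes every <habitat> to the main output (the <habitats> wrapper is a hypothetical name) instead of using xsl:result-document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <!-- Lookup table, loaded once into a global variable -->
  <xsl:variable name="dataMapping" select="doc('DataMapping.xml')" />
  <!-- Index every ocean entry by the partner's name for it -->
  <xsl:key name="oceans" match="mapsection[@name='oceans']/entry"
      use="@clientname" />
  <xsl:template match="/">
    <habitats>
      <xsl:for-each select="main/datarecord">
        <habitat>
          <!-- Third argument: search $dataMapping rather than the
               document that contains the current datarecord -->
          <xsl:value-of
              select="key('oceans', location, $dataMapping)/@internalname" />
        </habitat>
      </xsl:for-each>
    </habitats>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample data, the record with <location>Pacific</location> should yield <habitat>Pacific Ocean</habitat>; a location with no matching entry yields an empty <habitat/>.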

All of this works fine (oddly enough), and we have been quite impressed
with how XSLT handles it. HOWEVER, we have just been told that the
partner who supplies our source XML cannot filter the records they
send us down to only the new or recently modified ones; in fact they
have to send us pretty much their entire database. There is no option
for them to change this, and to make matters worse the source file
could grow to upwards of 50,000 records, making it over 120MB.

I have been asked to look at ways to compare the previous day's source
XML against the incoming one and output only those records which are
new or have changed. I am currently doing this in the code wrapping the
XSLT transformation, but it's going to get really slow when there are
50k records.

The rules are:

- Both documents will have an identical structure
- Both documents will have ~95% the same content
- Each source <datarecord> is uniquely identified by a compound key:
<species> + <subspecies>
- A modified record is one where the value of any element within the
<datarecord> has changed
- A new record is obviously one not found in the previous day's XML
- We want to either produce a single XML file containing the new or
modified records *OR* incorporate the required XSLT into our current
GenerateDataSegments.xsl

I have been thinking about loading one document as the source and
then using document() to load the previous file (its filename passed
as a global param), but frankly I'm a little lost as to how to attack
it after that.
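
One way to attack it, sketched under stated assumptions: index the previous day's records with an xsl:key on the compound species/subspecies value, then, for each incoming record, fetch its counterpart with the three-argument key() of XSLT 2.0 (supported by Saxon-B). A record is copied if it has no counterpart (new) or if deep-equal() reports that its child elements differ (modified). The parameter name previousPath and the '|' separator inside the key value are hypothetical choices; the separator assumes '|' never occurs in species or subspecies, and a relative previousPath is resolved against the stylesheet's location, so an absolute URI is safer.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Hypothetical parameter: URI of the previous day's source file -->
  <xsl:param name="previousPath" required="yes" />
  <xsl:variable name="previous" select="doc($previousPath)" />
  <!-- Compound key: species + subspecies identifies a record -->
  <xsl:key name="record" match="datarecord"
      use="concat(species, '|', subspecies)" />
  <xsl:template match="/">
    <main>
      <xsl:for-each select="main/datarecord">
        <!-- Yesterday's record with the same key, searched in $previous -->
        <xsl:variable name="old"
            select="key('record', concat(species, '|', subspecies), $previous)" />
        <!-- New: no counterpart. Modified: child elements differ.
             Only element children are compared, so the indentation
             whitespace between them is ignored. -->
        <xsl:if test="empty($old) or not(deep-equal(*, $old[1]/*))">
          <xsl:copy-of select="." />
        </xsl:if>
      </xsl:for-each>
    </main>
  </xsl:template>
</xsl:stylesheet>
```

With the sample files, this should copy the 23/23 record (absent yesterday) and the 23/25 record (location changed from Southern to Indian), and skip 17/3. To fold the filter into GenerateDataSegments.xsl instead, the same test could move into the predicate of its xsl:for-each, e.g. select="main/datarecord[not(deep-equal(*, key('record', concat(species, '|', subspecies), $previous)[1]/*))]"; an empty lookup compares unequal, so new records pass that predicate too. Because the processor can build the index on the previous file once, each lookup should avoid rescanning that file, though actual timings will depend on the processor.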

If the answer is that there is no decent way of doing this in XSLT
without killing the load on the machine, does anyone know of a fully
automatable command-line tool or service that can do the "compare and
output differences" bit? Open source or commercial is fine by me. For
the record, I'm currently using the latest build of Saxon-B.

<!-- SourceData.xml -->

<?xml version="1.0" encoding="UTF-8"?>
<main>
  <datarecord>
    <species>23</species>
    <subspecies>23</subspecies>
    <location>Pacific</location>
    <name>Blue Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>23</species>
    <subspecies>25</subspecies>
    <location>Indian</location>
    <name>Purple Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>17</species>
    <subspecies>3</subspecies>
    <location>Atlantic</location>
    <name>Ringed Oaf Fish</name>
  </datarecord>
  ...
</main>

<!-- DataMapping.xml -->

<?xml version="1.0" encoding="UTF-8"?>
<mapping>
  <mapsection name="oceans">
    <entry clientname="Pacific" internalname="Pacific Ocean" />
    <entry clientname="Atlantic" internalname="Atlantic Ocean" />
    <entry clientname="Indian" internalname="Indian Ocean" />
    <entry clientname="Southern" internalname="Southern Ocean" />
  </mapsection>
</mapping>

<!-- GenerateDataSegments.xsl -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:param name="outputPath" />
  <xsl:variable name="dataMapping"
      select="document('DataMapping.xml')" />
  <xsl:key name="oceans" match="mapsection[@name='oceans']/entry"
      use="@clientname" />
  <xsl:template match="/">
    <xsl:for-each select="main/datarecord">
      <!-- One output file per record, numbered by document position -->
      <xsl:result-document
          href="file:///{$outputPath}-{count(ancestor::node()|preceding::*)}.xml">
        <updateRecord>
          <family><xsl:value-of select="species" /></family>
          <genus><xsl:value-of select="subspecies" /></genus>
          <habitat>
            <xsl:variable name="clientHabitat" select="location" />
            <!-- Switch context to the mapping document so key() searches it -->
            <xsl:for-each select="$dataMapping">
              <xsl:value-of select="key('oceans', $clientHabitat)/@internalname" />
            </xsl:for-each>
          </habitat>
          <fullname><xsl:value-of select="name" /></fullname>
        </updateRecord>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

<!-- PreviousSourceData.xml: one record missing, and a value changed in another -->

<?xml version="1.0" encoding="UTF-8"?>
<main>
  <datarecord>
    <species>23</species>
    <subspecies>25</subspecies>
    <location>Southern</location>
    <name>Purple Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>17</species>
    <subspecies>3</subspecies>
    <location>Atlantic</location>
    <name>Ringed Oaf Fish</name>
  </datarecord>
  ...
</main>

Thanks in advance for your time and assistance,

Al

Jun 22 '07 #1
On Jun 22, 3:36 am, super.radd...@gmail.com wrote:
> [...]

You could try using the node assertion mechanics of XSLTUnit
(http://xsltunit.org/#notEqual):

<xsltu:test id="test-title">
  <xsl:call-template name="xsltu:assertEqual">
    <xsl:with-param name="id" select="'full-value'"/>
    <xsl:with-param name="nodes1">
      <xsl:apply-templates
          select="document('library.xml')/library/book[isbn='0836217462']/title"/>
    </xsl:with-param>
    <xsl:with-param name="nodes2">
      <h1>Being a Dog Is a Full-Time Job</h1>
    </xsl:with-param>
  </xsl:call-template>
</xsltu:test>

Jun 22 '07 #2
I am trying not to use any extensions. I ended up using the following,
which works perfectly.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:param name="fileCurrentPath" />
  <xsl:param name="filePreviousPath" />
  <xsl:variable name="fileCurrent"
      select="document($fileCurrentPath, /)" />
  <xsl:variable name="filePrevious"
      select="document($filePreviousPath, /)" />
  <xsl:template match="/">
    <main>
      <xsl:apply-templates select="$fileCurrent//datarecord"
          mode="addedchanged" />
    </main>
  </xsl:template>
  <xsl:template match="//datarecord" mode="addedchanged">
    <xsl:variable name="varSpecies" select="species" />
    <xsl:variable name="varSubspecies" select="subspecies" />
    <xsl:choose>
      <!-- Same species + subspecies existed yesterday:
           copy the record only if its string value has changed -->
      <xsl:when test="$filePrevious//datarecord[species=$varSpecies][subspecies=$varSubspecies]">
        <xsl:if test="not(.=$filePrevious//datarecord[species=$varSpecies][subspecies=$varSubspecies])">
          <xsl:copy-of select="." />
        </xsl:if>
      </xsl:when>
      <!-- No matching record yesterday: it is new -->
      <xsl:otherwise>
        <xsl:copy-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
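
The stylesheet above looks up each incoming record with $filePrevious//datarecord[...], which, unless the processor optimizes it, may rescan the whole previous file for every current record; at 50,000 records that is on the order of 50,000 × 50,000 = 2.5 billion record visits. A keyed variant that stays within XSLT 1.0 (no extensions) and reuses the xsl:for-each context switch from GenerateDataSegments.xsl might look like the sketch below. The key name prevRecord and the '|' separator are hypothetical; the string-value comparison is kept from the original.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:param name="fileCurrentPath" />
  <xsl:param name="filePreviousPath" />
  <xsl:variable name="fileCurrent"
      select="document($fileCurrentPath, /)" />
  <xsl:variable name="filePrevious"
      select="document($filePreviousPath, /)" />
  <!-- Hypothetical key: index records by species|subspecies -->
  <xsl:key name="prevRecord" match="datarecord"
      use="concat(species, '|', subspecies)" />
  <xsl:template match="/">
    <main>
      <xsl:for-each select="$fileCurrent//datarecord">
        <xsl:variable name="current" select="." />
        <xsl:variable name="id" select="concat(species, '|', subspecies)" />
        <!-- In XSLT 1.0, key() searches the context node's document,
             so switch context to the previous file before calling it -->
        <xsl:variable name="changed">
          <xsl:for-each select="$filePrevious">
            <xsl:variable name="old" select="key('prevRecord', $id)" />
            <!-- New (no match) or modified (no match with equal string value) -->
            <xsl:if test="not($old) or not($current = $old)">1</xsl:if>
          </xsl:for-each>
        </xsl:variable>
        <xsl:if test="string($changed) = '1'">
          <xsl:copy-of select="." />
        </xsl:if>
      </xsl:for-each>
    </main>
  </xsl:template>
</xsl:stylesheet>
```

Whether this is measurably faster depends on the processor, but building an index once and looking records up by key is the usual remedy for a nested //-search like this. As in the original, comparing string values includes the whitespace between child elements, so a change in indentation alone would flag records as modified.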

Jun 23 '07 #3