XSLT Compare two documents and output differences

Greetings,

I am relatively new to, what I would call, advanced XSLT/XPath and I
am after some advice from those in the know. I am attempting to figure
out a mechanism within XSLT to compare the difference between two
source documents and output node-sets which are "different" (changed
or new) to new XML files using xsl:result-document.

To describe the problem I have provided some example data below along
with a portion of my current XSLT. I have changed the meaning of
the data to make it less specific to my project just in case the
suggestions we get here prove useful to others.

OK, so the problem is as follows:

- We have a source document "SourceData.xml" containing a catalogue
of "Fish" provided for us by a partner so that we can update our
internal databases.

- The process requires that we take each <datarecord> node and parse
it into our internal format using our naming conventions

- We also have to perform a replacement against their "location"
element which does not map to our "habitat" values. I have done this
by loading a lookup file called "DataMapping.xml" into a global
variable. I then assign an xsl:key to the @clientname attribute of the
<entry> element. When I need to get the value I grab the client's value
into a variable, switch to the lookup document's context using the
xsl:for-each trick and then perform a lookup using key(x,y).

- Each <datarecord> node in the Source will produce a new XML file
containing a single <updateRecord> element with our structure beneath

All of this works fine (oddly enough) and we have been quite impressed
with how XSLT handles all this. HOWEVER, we have just been told that
the partner who supplies our Source XML is not able to filter the
records they send us to only contain those new or recently modified,
in fact they have to send us pretty much their entire database. There
is no option for them to change this and to make matters worse the
source file could grow to upwards of 50,000 records, making it over
120MB.

I have been asked to look at ways to compare the previous day's Source
XML against the one coming in and output only those records which are
new or have changed. I am currently doing this in the code wrapping the
XSLT Transformation, but it's going to get real slow when there are
50k records.

The rules are:

- Both documents will have an identical structure
- Both documents will have ~95% the same content
- The source document <datarecord> has a compound key to make it
unique: <species> + <subspecies>
- A modified record consists of any change to the payload value of
the elements within a <datarecord>
- A new record is obviously one not found in the previous day's XML
- We only want to produce either a single XML containing new or
modified records *OR* incorporate the required XSLT into our current
GenerateDataSegments.xsl

I have been thinking about loading one document as the source and
then using document() to load the previous filename (passed as a Global
Param), but frankly I'm a little lost as to how to attack it after
that.

If the answer is that there is no decent way of doing this in XSLT
without killing the load on the machine, does anyone know of a fully
automatable Command Line tool or Service that can do the "compare and
output differences" bit? Open Source or Commercial is fine by me. For
the record, I'm currently using the latest build of Saxon-B.

<!-- SourceData.xml -->

<?xml version="1.0" encoding="UTF-8"?>
<main>
  <datarecord>
    <species>23</species>
    <subspecies>23</subspecies>
    <location>Pacific</location>
    <name>Blue Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>23</species>
    <subspecies>25</subspecies>
    <location>Indian</location>
    <name>Purple Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>17</species>
    <subspecies>3</subspecies>
    <location>Atlantic</location>
    <name>Ringed Oaf Fish</name>
  </datarecord>
  ...
</main>

<!-- DataMapping.xml -->

<?xml version="1.0" encoding="UTF-8"?>
<mapping>
  <mapsection name="oceans">
    <entry clientname="Pacific" internalname="Pacific Ocean" />
    <entry clientname="Atlantic" internalname="Atlantic Ocean" />
    <entry clientname="Indian" internalname="Indian Ocean" />
    <entry clientname="Southern" internalname="Southern Ocean" />
  </mapsection>
</mapping>

<!-- GenerateDataSegments.xsl -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:param name="outputPath" />
  <!-- Lookup document mapping the partner's location names to ours -->
  <xsl:variable name="dataMapping"
      select="document('DataMapping.xml')" />
  <xsl:key name="oceans" match="mapsection[@name='oceans']/entry"
      use="@clientname" />
  <xsl:template match="/">
    <xsl:for-each select="main/datarecord">
      <!-- One output file per datarecord -->
      <xsl:result-document
          href="file:///{$outputPath}-{count(ancestor::node()|preceding::*)}.xml">
        <updateRecord>
          <family><xsl:value-of select="species" /></family>
          <genus><xsl:value-of select="subspecies" /></genus>
          <habitat>
            <xsl:variable name="clientHabitat" select="location" />
            <!-- Switch context to the lookup document so key() searches it -->
            <xsl:for-each select="$dataMapping">
              <xsl:value-of
                  select="key('oceans', $clientHabitat)/@internalname"/>
            </xsl:for-each>
          </habitat>
          <fullname><xsl:value-of select="name" /></fullname>
        </updateRecord>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

<!-- PreviousSourceData.xml - missing one record and value changed in another -->

<?xml version="1.0" encoding="UTF-8"?>
<main>
  <datarecord>
    <species>23</species>
    <subspecies>25</subspecies>
    <location>Southern</location>
    <name>Purple Bopper Fish</name>
  </datarecord>
  <datarecord>
    <species>17</species>
    <subspecies>3</subspecies>
    <location>Atlantic</location>
    <name>Ringed Oaf Fish</name>
  </datarecord>
  ...
</main>

Thanks in advance for your time and assistance,

Al

Jun 22 '07 #1
You could try using the node assertion mechanics of XSLT Unit
(http://xsltunit.org/#notEqual):

<xsltu:test id="test-title">
  <xsl:call-template name="xsltu:assertEqual">
    <xsl:with-param name="id" select="'full-value'"/>
    <xsl:with-param name="nodes1">
      <xsl:apply-templates
          select="document('library.xml')/library/book[isbn='0836217462']/title"/>
    </xsl:with-param>
    <xsl:with-param name="nodes2">
      <h1>Being a Dog Is a Full-Time Job</h1>
    </xsl:with-param>
  </xsl:call-template>
</xsltu:test>

Jun 22 '07 #2
I am trying not to use any extensions. I ended up using the following,
which works perfectly.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <!-- Paths of the two files to compare, passed in by the caller -->
  <xsl:param name="fileCurrentPath" />
  <xsl:param name="filePreviousPath" />
  <xsl:variable name="fileCurrent"
      select="document($fileCurrentPath, /)" />
  <xsl:variable name="filePrevious"
      select="document($filePreviousPath, /)" />
  <xsl:template match="/">
    <main>
      <xsl:apply-templates select="$fileCurrent//datarecord"
          mode="addedchanged"/>
    </main>
  </xsl:template>
  <xsl:template match="datarecord" mode="addedchanged">
    <xsl:variable name="varSpecies" select="species"/>
    <xsl:variable name="varSubspecies" select="subspecies"/>
    <!-- Record with the same compound key in the previous file, if any -->
    <xsl:variable name="previous"
        select="$filePrevious//datarecord[species=$varSpecies][subspecies=$varSubspecies]"/>
    <xsl:choose>
      <xsl:when test="$previous">
        <!-- Changed record: the string values (all text, concatenated) differ -->
        <xsl:if test="not(. = $previous)">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <!-- New record: no match on species + subspecies -->
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
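
With 50k records, the //datarecord[...] predicate above rescans the previous file for every current record, so run time may grow roughly quadratically. Below is a minimal sketch of a keyed variant. It assumes an XSLT 2.0 processor (Saxon-B, mentioned in the first post, is one) and that the current day's file is supplied as the principal source document rather than through fileCurrentPath. The key name recordByKey and the '|' separator are made up for illustration, and the separator is assumed never to occur in species or subspecies values. deep-equal() compares element structure rather than concatenated text, and xsl:strip-space is there so that indentation differences are not reported as changes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Ignore whitespace-only text nodes in both input documents -->
  <xsl:strip-space elements="*"/>

  <!-- URI of the previous day's file, same parameter name as above -->
  <xsl:param name="filePreviousPath"/>
  <xsl:variable name="filePrevious"
      select="document($filePreviousPath, /)"/>

  <!-- Hypothetical key: index each datarecord by its compound key.
       The '|' separator is an assumption, see the note above. -->
  <xsl:key name="recordByKey" match="datarecord"
      use="concat(species, '|', subspecies)"/>

  <xsl:template match="/">
    <main>
      <xsl:for-each select="main/datarecord">
        <!-- Three-argument key() (XSLT 2.0) searches the previous document -->
        <xsl:variable name="previous"
            select="key('recordByKey',
                        concat(species, '|', subspecies),
                        $filePrevious)"/>
        <!-- Emit the record if it is new, or if it differs structurally -->
        <xsl:if test="empty($previous)
                      or not(deep-equal(., $previous[1]))">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </main>
  </xsl:template>
</xsl:stylesheet>
```

The same xsl:if could probably wrap the xsl:result-document in GenerateDataSegments.xsl if one output file per changed record is wanted. Memory use on a 120MB input has not been measured here, so treat this as a starting point rather than a tested solution.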

Jun 23 '07 #3
