Tommy wrote:
The problem is how to achieve the transformation below.
The source XML contains tons of repeating structures like the one below. Each
Item node contains a Person element and an Insurance element that correlates
to a Person element through the person id.
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
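As a sketch, the sample with those four corrections applied might look like
this. The name of the enclosing element (Items) is an assumption made for
illustration; any name would do.

```xml
<?xml version="1.0"?>
<!-- "Items" is a hypothetical wrapper name, not taken from the original data -->
<Items>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>
```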
The goal is to regroup into a one-to-many structure (one Person to many
Insurance elements), like below:
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and dissect it into
Hashtables so that I could easily regroup. However, since the source files are
really big (approximately 50M each, with 70000 repeating items), my way of
doing it is obviously too memory-consuming. I am frustrated: after a whole day
of sitting quietly I still cannot figure out a better way. I would really
appreciate any help.
If you really wanted to do it in XSLT, you could write:
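<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:output method="xml"/>

  <!-- Index every Insurance element by the person id it refers to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <!-- Emit one Item per distinct person: skip a Person whose id
       has already occurred earlier in the document -->
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <!-- Pull in every Insurance that references this person -->
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>

  <!-- Copy an Insurance only when it is requested from a Person group -->
  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Suppress each Insurance at its original position -->
  <xsl:template match="Insurance"/>

</xsl:stylesheet>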
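The preceding::Person test can force the processor to rescan earlier Person
elements for every Person it meets, which gets expensive at 70000 items. A
possible variant, sketched here and not part of the stylesheet above, is the
keyed grouping idiom often called Muenchian grouping: a second key on Person
lets the processor pick the first Person for each id by lookup. The key name
"per" and the output wrapper element Items are assumptions made for this
sketch, and the input is assumed to be well-formed with a single enclosing
element.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- "per" is a hypothetical key name: Person elements indexed by id -->
  <xsl:key name="per" match="Person" use="@id"/>
  <!-- Insurance elements indexed by the person id they refer to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <xsl:template match="/">
    <!-- "Items" is an assumed wrapper so the result has one root element -->
    <Items>
      <!-- Select only the first Person in document order for each id -->
      <xsl:for-each
        select="//Person[generate-id() = generate-id(key('per', @id)[1])]">
        <Item>
          <!-- Copy the Person, then every Insurance that references it -->
          <xsl:copy-of select="."/>
          <xsl:copy-of select="key('ins', @id)"/>
        </Item>
      </xsl:for-each>
    </Items>
  </xsl:template>

</xsl:stylesheet>
```

On the three-item sample (once made well-formed) this should give two groups:
p123 with blabla1 and blabla3, and p456 with blabla2. Unlike the stylesheet
above, copy-of carries over every attribute of Person, not just id and name.
It still needs the whole document in memory, so it addresses the processing
time but not the memory footprint.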
But for a file that size the processing time would be rather long, and
as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do it in {insert
language of choice here}.
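A minimal sketch of what such a linear extraction could look like, assuming
well-formed input and attribute values that contain no commas, quotes or line
breaks (XSLT 1.0 has no built-in CSV quoting). The leading P and I record tags
are an assumption made here so that one pass can feed two tables; duplicate
Person rows would be left for the database to de-duplicate.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:output method="text" encoding="iso-8859-1"/>

  <!-- One line per Person: P,id,name (the P tag is a hypothetical convention) -->
  <xsl:template match="Person">
    <xsl:text>P,</xsl:text>
    <xsl:value-of select="@id"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- One line per Insurance: I,ref,detail -->
  <xsl:template match="Insurance">
    <xsl:text>I,</xsl:text>
    <xsl:value-of select="@ref"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@detail"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Drop whitespace and any other text between elements -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Each element is visited once and no lookups are made, so the running time
grows linearly with the input. Most XSLT 1.0 processors still build the full
document tree before transforming, so the saving here is mainly in time; the
grouping itself then happens in the database.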
///Peter
--
XML FAQ:
http://xml.silmaril.ie/