Tommy wrote:
The problem is how to achieve the transformation described below.
The source XML contains a large number of repeating structures like the ones
below. Each Item node contains a Person element and an Insurance element that
is correlated to a Person element by the person id.
<Item>
<Person id=”p123” name=”someone 1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone 1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone 1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
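
To make those four corrections concrete, here is a small sketch in Python (my choice for illustration, not something the original poster mentioned). It holds a corrected copy of the sample and checks that a standard XML parser accepts it. The root element name Items and the helper name is_well_formed are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Corrected sample: Person and Insurance are self-closing, the quotes are
# plain ASCII ", the end-tags are </Item>, and everything sits inside one
# enclosing element. The root name "Items" is an assumption.
CORRECTED = """\
<Items>
  <Item>
    <Person id="p123" name="someone 1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone 1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone 1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>
"""

# Same kind of mistakes as the posted sample: unclosed Person, lowercase </item>.
BROKEN = '<Item><Person id="p123" name="someone 1"></item>'


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(CORRECTED))
print(is_well_formed(BROKEN))
```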
The goal is to regroup this into a one (Person) to many (Insurance) structure,
like the one below:
<Item>
<Person id=”p123” name=”someone 1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and split it into
hashtables so that I could easily regroup. However, since the source files are
really big (approximately 50 MB each, with 70000 repeating items), my
approach obviously consumes too much memory. I am frustrated: after a whole
day of sitting quietly I still cannot figure out a better way. I would really
appreciate any help.
If you really wanted to do it in XSLT, you could write:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="xml"/>
  <xsl:key name="ins" match="Insurance" use="@ref"/>
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>
  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>
  <xsl:template match="Insurance"/>
</xsl:stylesheet>
But for a file that size the processing time would be rather long, and
as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do it in {insert
language of choice here}.
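
As one illustration of the "language of choice" step, here is a minimal Python sketch. It is not the database route suggested above: it assumes the input has already been made well-formed under a hypothetical Items root, reads it in a single streaming pass, and keeps only the id, name and detail strings in memory. Whether that fits comfortably for 70000 items is an assumption, and the function name regroup is made up for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def regroup(source):
    """Stream the input and group Insurance entries under the Person they reference."""
    names = {}                    # person id -> name (first occurrence wins)
    details = defaultdict(list)   # person id -> insurance details, in document order
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Person":
            names.setdefault(elem.get("id"), elem.get("name"))
        elif elem.tag == "Insurance":
            details[elem.get("ref")].append(elem.get("detail"))
        elif elem.tag == "Item":
            elem.clear()          # release the children of the Item just processed
    # Build the one-to-many output; the root name "Items" is an assumption.
    root = ET.Element("Items")
    for pid, name in names.items():
        item = ET.SubElement(root, "Item")
        ET.SubElement(item, "Person", id=pid, name=name)
        for detail in details.get(pid, []):
            ET.SubElement(item, "Insurance", ref=pid, detail=detail)
    return root


# Small stand-in for the real file, using the corrected form of the sample.
SAMPLE = b"""<Items>
<Item><Person id="p123" name="someone 1"/><Insurance ref="p123" detail="blabla1"/></Item>
<Item><Person id="p123" name="someone 1"/><Insurance ref="p456" detail="blabla2"/></Item>
<Item><Person id="p456" name="someone 1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""

result = regroup(io.BytesIO(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

For the real data you would pass an open file (or a file name) instead of the in-memory sample. An Insurance whose ref matches no Person id is silently dropped in this sketch, which may or may not be what you want.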
///Peter
--
XML FAQ:
http://xml.silmaril.ie/