Tommy wrote:[color=blue]
> The problem is how to achieve the transformation as below:
>
> The source xml contains tons of repeating structure like below, each item
> node contains a person element and a insurance element that correlate to the
> Person element with the person id.
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> </item>
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p456” detail=”blabla2”>
> </item>
> <Item>
> <Person id=”p456” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </item>[/color]
This isn't well-formed XML. It might be SGML. If you want to process it
as XML, the closing > of the empty Person and Insurance elements must be
preceded by a /; the typographic curly quotes must be replaced by regular
" chars; the end-tags for the Item elements must be </Item> (not lowercase
i, because XML names are case-sensitive); and there must be a single
outermost enclosing element.
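With those four fixes applied, your three sample items would look something like the following. The wrapper name Items is my own invention, since you didn't say what the real outermost element is called; use whatever your file actually has.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- "Items" is a hypothetical root element; the original post does not name one -->
<Items>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>
```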
[color=blue]
> The goal is to regroup to a structure of 1(Person) to many(Insurance), like
> below
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </Item>
> My initial idea was to load the source into memory and dissect into
> Hashtables so that I could easily regroup. However, since the source file is
> really big (approximate 50M each with 70000 repeating items), obviously my
> way of doing it is too memory consuming. I am frustrated, after a whole day
> sitting quietly and cannot figure out a better way, I would really appreciate
> any help.[/color]
If you really wanted to do it in XSLT, you could write:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml"/>

  <!-- Index every Insurance element by the person id it refers to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <!-- Emit an Item only for the first Person carrying a given id -->
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>

  <!-- Copy an Insurance only when it is pulled in through the key -->
  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Suppress Insurance elements met during the normal tree walk -->
  <xsl:template match="Insurance"/>

</xsl:stylesheet>
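The preceding:: test in the stylesheet above looks back over every earlier Person each time, which is where most of the time goes. A variant, offered only as a sketch, picks the first Person per id through a second key instead (the trick usually called Muenchian grouping). The key name person and the Items wrapper are names I made up for illustration, and this still needs the whole document in memory, so it helps with speed rather than with size.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Insurance elements indexed by the person id they refer to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>
  <!-- Person elements indexed by their own id ("person" is a made-up key name) -->
  <xsl:key name="person" match="Person" use="@id"/>

  <!-- "Items" is a hypothetical wrapper so the result has a single root element -->
  <xsl:template match="/">
    <Items>
      <!-- Select only the first Person in document order for each distinct id -->
      <xsl:apply-templates
          select="//Person[generate-id() = generate-id(key('person', @id)[1])]"/>
    </Items>
  </xsl:template>

  <!-- One Item per distinct person, followed by all matching Insurance elements -->
  <xsl:template match="Person">
    <Item>
      <xsl:copy-of select="."/>
      <xsl:copy-of select="key('ins', @id)"/>
    </Item>
  </xsl:template>

</xsl:stylesheet>
```

Note that copy-of reproduces the whole Person element, so any attributes beyond id and name would be carried over as well.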
But for a file that size the processing time would be rather long, because
the preceding:: test rescans every earlier Person for each Person it meets,
and as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do the regrouping from
there in {insert
language of choice here}.
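The CSV extraction could be as small as the sketch below: one row per Item, no grouping, no lookups. The column names in the header row are my own choice, and I am assuming the attribute values contain no double quotes or line breaks; if they do, the quoting needs more work.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="text" encoding="iso-8859-1"/>

  <xsl:template match="/">
    <!-- Header row; these column names are hypothetical -->
    <xsl:text>person_id,person_name,insurance_ref,insurance_detail&#10;</xsl:text>
    <!-- One quoted CSV row per Item, in document order -->
    <xsl:for-each select="//Item">
      <xsl:value-of select="concat('&quot;', Person/@id, '&quot;,&quot;',
                                   Person/@name, '&quot;,&quot;',
                                   Insurance/@ref, '&quot;,&quot;',
                                   Insurance/@detail, '&quot;&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Once that is loaded into a table, the one-person-to-many-insurances view is a matter of matching insurance_ref against person_id, which is exactly the kind of work a database does well.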
///Peter
--
XML FAQ:
http://xml.silmaril.ie/