  #1  
March 3rd, 2006, 05:05 AM
Tommy
Guest
Complex XML transformation in a better performance way?

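The problem is how to achieve the transformation described below.

The source XML contains a large number of repeating structures like the ones
below. Each Item node contains a Person element and an Insurance element. An
Insurance element is linked to a Person element through the person id (its ref
attribute matches the Person's id), and it does not necessarily belong to the
Person in the same Item.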
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
The goal is to regroup the data into a one-to-many structure (one Person, many
Insurance elements), like below:
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and split it into
Hashtables so that I could easily regroup. However, the source files are
really big (approximately 50 MB each, with 70,000 repeating items), so that
approach consumes too much memory. I am frustrated: after a whole day of
sitting with this I cannot figure out a better way. I would really appreciate
any help.

Thanks in advance

  #2  
March 5th, 2006, 09:35 PM
Peter Flynn
Guest
Re: Complex XML transformation in a better performance way?

Tommy wrote:
> The problem is how to achieve the transformation as below:
>
> The source xml contains tons of repeating structure like below, each item
> node contains a person element and a insurance element that correlate to the
> Person element with the person id.
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> </item>
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p456” detail=”blabla2”>
> </item>
> <Item>
> <Person id=”p456” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </item>

This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
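
To illustrate the points above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It shows the first item as posted being rejected by an XML parser, while a corrected version parses. The root element name Items and the helper is_well_formed are assumptions made for this example; any single enclosing element would do.

```python
import xml.etree.ElementTree as ET

# First item as posted: curly quotes, unclosed empty elements, </item>
AS_POSTED = """<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>"""

# Same item with the four fixes applied (root name "Items" is assumed)
CORRECTED = """<Items>
<Item>
<Person id="p123" name="someone1"/>
<Insurance ref="p123" detail="blabla1"/>
</Item>
</Items>"""


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(AS_POSTED))   # False
print(is_well_formed(CORRECTED))   # True
```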
> The goal is to regroup to a structure of 1(Person) to many(Insurance), like
> below
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </Item>
> My initial idea was to load the source into memory and dissect into
> Hashtables so that I could easily regroup. However, since the source file is
> really big (approximate 50M each with 70000 repeating items), obviously my
> way of doing it is too memory consuming. I am frustrated, after a whole day
> sitting quietly and cannot figure out a better way, I would really appreciate
> any help.

If you really wanted to do it in XSLT, you could write:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml"/>

  <!-- Index every Insurance element by the person id it refers to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <!-- Only the first Person carrying a given id produces an Item -->
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <!-- Pull in every Insurance whose ref matches this Person -->
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Suppress Insurance elements during the normal traversal -->
  <xsl:template match="Insurance"/>

</xsl:stylesheet>

But for a file that size the processing time would be rather long, and
as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do it in {insert
language of choice here}.
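
As one possible reading of that advice, here is a minimal sketch in Python, assuming the input has first been corrected to well-formed XML. It skips the CSV step and instead streams the file once with xml.etree.ElementTree.iterparse, stores flat rows in SQLite, and writes the regrouped items from a query. The root element name Items, the table and index names, and the functions load and regroup are all assumptions made for this example, not anything from the thread.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Corrected form of the sample from the question (root element name assumed)
SAMPLE = b"""<Items>
<Item><Person id="p123" name="someone1"/><Insurance ref="p123" detail="blabla1"/></Item>
<Item><Person id="p123" name="someone1"/><Insurance ref="p456" detail="blabla2"/></Item>
<Item><Person id="p456" name="someone1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""


def load(source, db):
    """Stream the XML once and store Person/Insurance as flat rows."""
    db.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE insurance (ref TEXT, detail TEXT)")
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "Item":
            continue
        for person in elem.findall("Person"):
            # OR IGNORE keeps the first Person seen for each id
            db.execute("INSERT OR IGNORE INTO person VALUES (?, ?)",
                       (person.get("id"), person.get("name")))
        for ins in elem.findall("Insurance"):
            db.execute("INSERT INTO insurance VALUES (?, ?)",
                       (ins.get("ref"), ins.get("detail")))
        root.clear()  # discard processed Items so the tree does not grow
    db.execute("CREATE INDEX insurance_ref ON insurance (ref)")
    db.commit()


def regroup(db, out):
    """Write one Item per person with every Insurance that refers to it."""
    out.write("<Items>\n")
    persons = db.execute("SELECT id, name FROM person ORDER BY id").fetchall()
    for pid, name in persons:
        out.write("<Item>\n<Person id=%s name=%s/>\n"
                  % (quoteattr(pid), quoteattr(name)))
        rows = db.execute(
            "SELECT detail FROM insurance WHERE ref = ? ORDER BY rowid", (pid,))
        for (detail,) in rows:
            out.write("<Insurance ref=%s detail=%s/>\n"
                      % (quoteattr(pid), quoteattr(detail)))
        out.write("</Item>\n")
    out.write("</Items>\n")


db = sqlite3.connect(":memory:")  # use a file path for real data
load(io.BytesIO(SAMPLE), db)
result = io.StringIO()
regroup(db, result)
print(result.getvalue())
```

For a real 50 MB file, passing a file path to sqlite3.connect and an open output file to regroup would keep both the rows and the result out of memory; the in-memory database and StringIO here are only for the demonstration.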

///Peter
--
XML FAQ: http://xml.silmaril.ie/
 
