Bytes | Software Development & Data Engineering Community
Complex XML transformation with better performance?

The problem is how to achieve the transformation shown below.

The source XML contains a large number of repeating structures like the one
below. Each Item node contains a Person element and an Insurance element; the
Insurance element's ref attribute ties it to a Person element by person id.
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
The goal is to regroup this into a one-to-many structure (one Person, many
Insurance elements), as below:
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and split it into
Hashtables so that I could easily regroup. However, since the source files are
really big (approximately 50M each, with 70,000 repeating items), that
approach consumes too much memory. After a whole day I still cannot figure out
a better way, so I would really appreciate any help.

Thanks in advance

Mar 3 '06 #1
Tommy wrote:
> The problem is how to achieve the transformation as below:
>
> The source xml contains tons of repeating structure like below, each item
> node contains a person element and a insurance element that correlate to the
> Person element with the person id.
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> </item>
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p456” detail=”blabla2”>
> </item>
> <Item>
> <Person id=”p456” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </item>

This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
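
As a rough illustration of those four corrections, here is the sample with each fix applied, checked with Python's standard xml.etree.ElementTree parser. The root element name Items and the helper is_well_formed are assumptions made for this sketch; the post does not name an enclosing element.

```python
import xml.etree.ElementTree as ET

# The sample from the post with the four fixes applied: self-closing
# Person/Insurance tags, straight quotes, </Item> end-tags, and a single
# enclosing element. The name "Items" is an assumption, not from the post.
SAMPLE = """\
<Items>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>
"""

# Undo one fix (lowercase end-tag on the first Item) to show a rejection.
BROKEN = SAMPLE.replace("</Item>", "</item>", 1)


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("fixed sample parses:", is_well_formed(SAMPLE))
    print("lowercase end-tag parses:", is_well_formed(BROKEN))
```

Any conforming XML parser should behave the same way; ElementTree is used here only because it ships with Python.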
> The goal is to regroup to a structure of 1(Person) to many(Insurance), like
> below
> <Item>
> <Person id=”p123” name=”someone1”>
> <Insurance ref=”p123” detail=”blabla1”>
> <Insurance ref=”p123” detail=”blabla3”>
> </Item>
> My initial idea was to load the source into memory and dissect into
> Hashtables so that I could easily regroup. However, since the source file is
> really big (approximate 50M each with 70000 repeating items), obviously my
> way of doing it is too memory consuming. I am frustrated, after a whole day
> sitting quietly and cannot figure out a better way, I would really appreciate
> any help.


If you really wanted to do it in XSLT, you could write:

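<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml"/>

  <!-- Index every Insurance element by the person id it refers to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <!-- Emit one Item per distinct person: skip any Person whose id
       has already appeared earlier in the document -->
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>

  <!-- Copy an Insurance element when it is pulled in through the key -->
  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Suppress Insurance elements in their original position -->
  <xsl:template match="Insurance"/>

</xsl:stylesheet>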

But for a file that size the processing time would be rather long, and
as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do it in {insert
language of choice here}.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Mar 5 '06 #2
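
For readers who want to try the "load it into a database" suggestion from the reply above, here is a minimal sketch in Python. It swaps the CSV-via-XSLT extraction step for a direct streaming parse with xml.etree.ElementTree.iterparse into SQLite, so it is a variation on the advice rather than the exact procedure described. The function names (load, regroup, write_grouped), the table and index names, and the Items root element are all assumptions made for illustration, and the input is assumed to be well-formed XML.

```python
import io
import sqlite3
import sys
import xml.etree.ElementTree as ET

# Well-formed version of the sample from the post; the <Items> root is assumed.
SAMPLE = b"""<Items>
<Item><Person id="p123" name="someone1"/><Insurance ref="p123" detail="blabla1"/></Item>
<Item><Person id="p123" name="someone1"/><Insurance ref="p456" detail="blabla2"/></Item>
<Item><Person id="p456" name="someone1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""


def load(source, db):
    """Stream the XML once, copying each Person and Insurance into SQLite."""
    db.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE insurance (ref TEXT, detail TEXT)")
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event != "end" or elem.tag != "Item":
            continue
        for person in elem.iter("Person"):
            # OR IGNORE keeps only the first row seen for each person id.
            db.execute("INSERT OR IGNORE INTO person VALUES (?, ?)",
                       (person.get("id"), person.get("name")))
        for ins in elem.iter("Insurance"):
            db.execute("INSERT INTO insurance VALUES (?, ?)",
                       (ins.get("ref"), ins.get("detail")))
        root.clear()  # discard finished Items so the tree never grows
    db.execute("CREATE INDEX insurance_ref ON insurance (ref)")
    db.commit()


def regroup(db):
    """Yield (person_id, name, [insurance details]) once per distinct person."""
    persons = db.execute("SELECT id, name FROM person ORDER BY id").fetchall()
    for pid, name in persons:
        rows = db.execute(
            "SELECT detail FROM insurance WHERE ref = ? ORDER BY rowid", (pid,))
        yield pid, name, [detail for (detail,) in rows]


def write_grouped(db, out):
    """Write one Item per person, followed by that person's Insurance elements."""
    out.write("<Items>\n")
    for pid, name, details in regroup(db):
        item = ET.Element("Item")
        ET.SubElement(item, "Person", id=pid, name=name)
        for detail in details:
            ET.SubElement(item, "Insurance", ref=pid, detail=detail)
        out.write(ET.tostring(item, encoding="unicode") + "\n")
    out.write("</Items>\n")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # pass a file path instead for large inputs
    load(io.BytesIO(SAMPLE), db)
    write_grouped(db, sys.stdout)
```

Because each finished Item is cleared from the tree as soon as its rows are inserted, only the database grows with the input. In this sketch an Insurance whose ref matches no Person is silently dropped from the output, and a Person with no matching Insurance is written as an Item containing only the Person; whether that is the desired behavior depends on the data.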
