Complex XML transformation in a better performance way?

The problem is how to achieve the transformation below.

The source XML contains a large number of repeating structures like the one
below. Each Item node contains a Person element and an Insurance element, and
each Insurance element refers to a Person element by the person id.
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
The goal is to regroup the data into a one-to-many structure (one Person to
many Insurance elements), like below:
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and split it into
Hashtables so that I could easily regroup it. However, the source files are
really big (approximately 50 MB each, with 70,000 repeating items), so this
approach consumes too much memory. After a whole day of thinking about it I
still cannot figure out a better way, and I would really appreciate any help.

Thanks in advance

Mar 3 '06 #1
Tommy wrote:
The problem is how to achieve the transformation below.

The source XML contains a large number of repeating structures like the one
below. Each Item node contains a Person element and an Insurance element, and
each Insurance element refers to a Person element by the person id.
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p456” detail=”blabla2”>
</item>
<Item>
<Person id=”p456” name=”someone1”>
<Insurance ref=”p123” detail=”blabla3”>
</item>
This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
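
As a rough illustration of those four fixes, the sketch below hands a parser one Item as posted and then a corrected version of the whole sample. It uses only the Python standard library. The `<Items>` wrapper name, the constant names, and the `is_well_formed` helper are assumptions made for the example, not anything from the original post.

```python
import xml.etree.ElementTree as ET

# One Item as posted: curly quotes, empty elements that are never
# closed, a lowercase </item> end-tag, and no enclosing root element.
AS_POSTED = """<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
</item>"""

# The same data with the fixes applied. <Items> is a hypothetical
# name for the outermost element; any name would do.
WELL_FORMED = """<Items>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>"""


def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(AS_POSTED))    # False
    print(is_well_formed(WELL_FORMED))  # True
```

Any conforming XML parser should reject the first string, so the fixes have to be made before XSLT or any other XML tooling can be applied.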
The goal is to regroup the data into a one-to-many structure (one Person to
many Insurance elements), like below:
<Item>
<Person id=”p123” name=”someone1”>
<Insurance ref=”p123” detail=”blabla1”>
<Insurance ref=”p123” detail=”blabla3”>
</Item>
My initial idea was to load the source into memory and split it into
Hashtables so that I could easily regroup it. However, the source files are
really big (approximately 50 MB each, with 70,000 repeating items), so this
approach consumes too much memory. After a whole day of thinking about it I
still cannot figure out a better way, and I would really appreciate any help.


If you really wanted to do it in XSLT, you could write:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml"/>

  <xsl:key name="ins" match="Insurance" use="@ref"/>

  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="Insurance"/>

</xsl:stylesheet>

But for a file that size the processing time would be rather long, and
as you point out, it would need lots of memory. Far better to extract
it all to CSV with a very simple linear XSLT routine and load it into a
database (or use a database XML-import system), and do it in {insert
language of choice here}.
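
One way the database route might look is sketched below in Python, with SQLite standing in for "a database". It is not the exact pipeline described above: it skips the intermediate CSV file and replaces the linear XSLT extraction with a streaming parse that inserts rows directly. The table names, the function names `load_items` and `write_grouped`, and the `<Items>` wrapper element are all assumptions made for the example, and the input is assumed to be already well-formed with `id`, `name`, `ref` and `detail` present on every element.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def load_items(source, conn):
    """Stream <Item> elements from a binary file object into two tables."""
    conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE insurance (seq INTEGER PRIMARY KEY, ref TEXT, detail TEXT)"
    )
    conn.execute("CREATE INDEX insurance_ref ON insurance (ref)")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Item":
            continue
        person = elem.find("Person")
        insurance = elem.find("Insurance")
        if person is not None:
            # INSERT OR IGNORE keeps the first occurrence of each person id.
            conn.execute(
                "INSERT OR IGNORE INTO person VALUES (?, ?)",
                (person.get("id"), person.get("name")),
            )
        if insurance is not None:
            conn.execute(
                "INSERT INTO insurance (ref, detail) VALUES (?, ?)",
                (insurance.get("ref"), insurance.get("detail")),
            )
        # Drop the children of the Item just processed; only the emptied
        # Item element stays attached to the root.
        elem.clear()
    conn.commit()


def write_grouped(conn, out):
    """Write one <Item> per person, followed by every Insurance that refers to it."""
    out.write("<Items>\n")
    for pid, name in conn.execute("SELECT id, name FROM person ORDER BY rowid"):
        out.write("  <Item>\n")
        out.write(f"    <Person id={quoteattr(pid)} name={quoteattr(name)}/>\n")
        rows = conn.execute(
            "SELECT detail FROM insurance WHERE ref = ? ORDER BY seq", (pid,)
        )
        for (detail,) in rows:
            out.write(
                f"    <Insurance ref={quoteattr(pid)} detail={quoteattr(detail)}/>\n"
            )
        out.write("  </Item>\n")
    out.write("</Items>\n")


# A corrected, well-formed version of the sample from the question.
SAMPLE = """<Items>
  <Item><Person id="p123" name="someone1"/><Insurance ref="p123" detail="blabla1"/></Item>
  <Item><Person id="p123" name="someone1"/><Insurance ref="p456" detail="blabla2"/></Item>
  <Item><Person id="p456" name="someone1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would suit a large input better
    load_items(io.BytesIO(SAMPLE.encode("utf-8")), conn)
    out = io.StringIO()
    write_grouped(conn, out)
    print(out.getvalue())
```

Because each Item is inserted and then cleared as soon as its end-tag is seen, the parser never holds the full content of the document at once, and the grouping is done by an indexed lookup rather than by rescanning earlier elements. A Person that no Insurance refers to is still written, with an empty group.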

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Mar 5 '06 #2
