
Help to Process two very big xml files....

Question posted by: fuel (Guest) on June 27th, 2008 07:07 PM
Hello,
I have two big XML files (around 50-60 MB each), and I need to
process the data within each of them. The problem is that I need to
compare each node in one file with the nodes in the other XML
file. After iterating through all the nodes, I need to find the
nodes that have changed or have been newly introduced.
Assume the following XML structure:

<?xml version="1.0"?>
<root>
<nodeToProcess>

</nodeToProcess>
.....
</root>

I have two such XML files. I keep one XML file as the reference and
compare it with the other. To solve this problem, I thought I could
use XPath. However, for now, only DOM-based XPath processors are
available. Since the files are very large, I don't think I can afford DOM
(memory constraint).

How can I approach this problem? What would be the right way to
start?

P.S. (I am trying to access these elements through Java.)


Reply #2 posted by: Manuel Collado (Guest) on June 27th, 2008 07:07 PM

Re: Help to Process two very big xml files....
fuel wrote:
> How can I approach this problem? What would be the right way to
> start?


There are ready-to-run tools for differencing XML files. Please google
for xml-diff.
fuel wrote:
> P.S. (I am trying to access these elements through Java.)


Some of the tools are written in Java, and some of them are open source.

I don't know how well these tools perform with big files.
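
If memory is the main concern, one alternative to a ready-made tool is to stream both files instead of loading them. The following is only a minimal sketch using the StAX API from the standard Java library, not a definitive implementation. It assumes that each nodeToProcess element carries a unique "id" attribute (hypothetical; the original post does not say how nodes are matched between the two files) and that nodeToProcess elements are not nested. It compares only child element names and text, ignoring child attributes, and it uses String.hashCode() as a cheap fingerprint, so a hash collision could hide a change. The class and method names are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingXmlCompare {
    private static final String TARGET = "nodeToProcess"; // element name from the post
    private static final String KEY_ATTR = "id";          // ASSUMPTION: hypothetical key attribute

    // Streams the document and calls handler(id, contentHash) once per target element.
    static void scan(Reader xml, BiConsumer<String, Integer> handler) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            String id = null;                        // non-null while inside a keyed target element
            StringBuilder content = new StringBuilder();
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (TARGET.equals(r.getLocalName())) {
                            id = r.getAttributeValue(null, KEY_ATTR); // null if missing: node is skipped
                            content.setLength(0);
                        } else if (id != null) {
                            content.append('<').append(r.getLocalName()).append('>');
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (id != null) content.append(r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (id != null && TARGET.equals(r.getLocalName())) {
                            handler.accept(id, content.toString().trim().hashCode());
                            id = null;
                        }
                        break;
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    // Indexes the reference file, then streams the candidate and reports new or changed ids.
    static List<String> newOrChanged(Reader reference, Reader candidate) {
        Map<String, Integer> index = new HashMap<>();
        scan(reference, index::put);
        List<String> result = new ArrayList<>();
        scan(candidate, (id, hash) -> {
            Integer old = index.get(id);
            if (old == null) result.add("new: " + id);
            else if (!old.equals(hash)) result.add("changed: " + id);
        });
        return result;
    }

    public static void main(String[] args) {
        String ref = "<root><nodeToProcess id=\"a\">1</nodeToProcess>"
                   + "<nodeToProcess id=\"b\">2</nodeToProcess></root>";
        String cand = "<root><nodeToProcess id=\"a\">1</nodeToProcess>"
                    + "<nodeToProcess id=\"b\">3</nodeToProcess>"
                    + "<nodeToProcess id=\"c\">4</nodeToProcess></root>";
        // prints [changed: b, new: c]
        System.out.println(newOrChanged(new StringReader(ref), new StringReader(cand)));
    }
}
```

Only one fingerprint per reference node is kept in memory, so the footprint grows with the number of nodes rather than with the file size. For real files, a Reader opened on each file (for example with Files.newBufferedReader) could be passed in place of the StringReader.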

Hope this helps.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Reply #3 posted by: Martin Honnen (Guest) on June 27th, 2008 07:07 PM

Re: Help to Process two very big xml files....
fuel wrote:
> How can I approach this problem? What would be the right way to
> start?


Considering that current desktop systems have 1, 2, or 3 GB of main
memory, I don't think you will run into problems performing XPath on 60 MB
files. Just make sure that the Java VM is allowed to allocate enough
memory: http://java.sun.com/javase/6/docs/t...ndows/java.html
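
As a rough illustration of this approach, the sketch below parses a document into a DOM tree and evaluates an XPath expression against it using only the standard javax.xml APIs. The class and method names are made up for illustration, and the tiny inline document stands in for a real file. The JVM heap limit can be raised with the standard -Xmx option (for example, java -Xmx1g XPathOnDom); the 1g value is just an example, not a measured requirement.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathOnDom {

    // Parses the XML text into a DOM tree and returns how many nodes the expression selects.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        // Maximum heap this JVM may use; it is controlled by the -Xmx option.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MB): " + maxHeapMb);

        String sample = "<root><nodeToProcess/><nodeToProcess/></root>";
        // prints 2
        System.out.println(countMatches(sample, "/root/nodeToProcess"));
    }
}
```

For a file on disk, builder.parse(new java.io.File(path)) could replace the InputSource call. A DOM tree usually takes several times the size of the file on disk, so checking the max heap figure before loading two 60 MB files is a sensible first step.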


--

Martin Honnen
http://JavaScript.FAQTs.com/

Reply #4 posted by: jimmy Zhang (Guest) on August 4th, 2008 01:55 AM

Re: Help to Process two very big xml files....
You should check out vtd-xml, which is ideally suited for the task you
described...
http://vtd-xml.sf.net




 