Help processing two very big XML files

Hello,
I have two big XML files (around 50-60 MB each), and I need to
process the data in each of them. The problem is that I need to
compare each node in one file with the nodes in the other XML
file. After iterating through all the nodes, I need to find the
nodes that have changed or have been newly introduced.
Assume the following XML structure:

<?xml version="1.0"?>
<root>
<nodeToProcess>

</nodeToProcess>
.....
</root>

I have two such XML files. I keep one XML file as the reference and
compare it with the other. To solve this problem, I thought I could
use XPath. However, for now, only DOM-based XPath processors are
available. Since the files are very large, I don't think I can afford
DOM (memory constraint).

How can I approach this problem? What would be the right way to start?

P.S. I am trying to access these elements through Java.
Jun 27 '08 #1
fuel wrote:
> How can I approach this problem? What would be the right way to start?

There are ready-to-run tools for diffing XML files. Try searching
for "xml-diff".

> P.S. I am trying to access these elements through Java.

Some of the tools are written in Java, and some of them are open source.

I don't know how these tools perform on big files.
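
If none of the ready-made tools copes with files of this size, a streaming comparison is one possible alternative. The following is only a minimal sketch, and it rests on assumptions that are not in the original post: each <nodeToProcess> is assumed to carry a unique `id` attribute (the post does not say how nodes are matched between the two files), whitespace-only text is treated as insignificant, and attribute order is assumed stable. The class and method names are made up for illustration. It uses the StAX parser that ships with the JDK, so neither file is ever held in memory as a DOM; only one small fingerprint per node is kept.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingXmlCompare {

    // Streams one file and maps each node's id (ASSUMED attribute, not in the
    // original post) to a hash of its serialized content.
    static Map<String, Integer> fingerprints(InputStream xml) {
        Map<String, Integer> result = new LinkedHashMap<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent text and CDATA so text arrives as CHARACTERS events.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader r = factory.createXMLStreamReader(xml);
            StringBuilder content = null; // non-null only inside a <nodeToProcess>
            String id = null;
            int depth = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (content == null) {
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "nodeToProcess".equals(r.getLocalName())) {
                        id = r.getAttributeValue(null, "id");
                        content = new StringBuilder();
                        appendStartTag(r, content);
                        depth = 1;
                    }
                } else if (event == XMLStreamConstants.START_ELEMENT) {
                    appendStartTag(r, content);
                    depth++;
                } else if (event == XMLStreamConstants.CHARACTERS && !r.isWhiteSpace()) {
                    content.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    content.append("</").append(r.getLocalName()).append('>');
                    if (--depth == 0) {
                        // A 32-bit hash keeps memory low but can collide in rare
                        // cases; a stronger digest would be safer in practice.
                        result.put(id, content.toString().hashCode());
                        content = null;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return result;
    }

    // Serializes the current start tag, including its attributes in document order.
    private static void appendStartTag(XMLStreamReader r, StringBuilder out) {
        out.append('<').append(r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            out.append(' ').append(r.getAttributeLocalName(i))
               .append("=\"").append(r.getAttributeValue(i)).append('"');
        }
        out.append('>');
    }

    // Reports nodes of `other` that are missing from, or differ from, `reference`.
    static List<String> compare(InputStream reference, InputStream other) {
        Map<String, Integer> ref = fingerprints(reference);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Integer> e : fingerprints(other).entrySet()) {
            Integer old = ref.get(e.getKey());
            if (old == null) {
                report.add("new: " + e.getKey());
            } else if (!old.equals(e.getValue())) {
                report.add("changed: " + e.getKey());
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StreamingXmlCompare reference.xml other.xml");
            return;
        }
        try (InputStream a = new FileInputStream(args[0]);
             InputStream b = new FileInputStream(args[1])) {
            compare(a, b).forEach(System.out::println);
        }
    }
}
```

Memory use here grows with the number of nodes rather than with file size. If the nodes have no natural key, this approach does not apply as written and a real XML diff tool is probably the better choice.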

Hope this helps.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
Jun 27 '08 #2
fuel wrote:
> How can I approach this problem? What would be the right way to start?

Considering that current desktop systems have 1, 2, or 3 GB of main
memory, I don't think you will run into problems running XPath on 60 MB
files. Just make sure that the Java VM is allowed to allocate enough
memory: http://java.sun.com/javase/6/docs/te...dows/java.html
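
To make that concrete, here is a minimal sketch of the DOM-plus-XPath route using only classes that ship with the JDK. The XPath `/root/nodeToProcess` follows the structure shown in the question; the class name `DomXPathDemo` and the method `countNodes` are made up for illustration. The heap limit is raised with the standard `-Xmx` launcher option, for example `java -Xmx1g DomXPathDemo big.xml`, where `1g` is only an example value.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomXPathDemo {

    // Parses the whole document into a DOM tree, then selects nodes with XPath.
    static int countNodes(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/root/nodeToProcess", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java -Xmx1g DomXPathDemo file.xml");
            return;
        }
        // Print the heap ceiling so it is easy to confirm that -Xmx took effect.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxMb);
        String systemId = new File(args[0]).toURI().toString();
        System.out.println("nodeToProcess count: " + countNodes(new InputSource(systemId)));
    }
}
```

A DOM tree usually takes several times the size of the file on disk, so it is worth trying this on the real 60 MB files before settling on a heap size.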
--

Martin Honnen
http://JavaScript.FAQTs.com/
Jun 27 '08 #3
You should check out vtd-xml, which is ideally suited for the task you
described...
http://vtd-xml.sf.net
Aug 4 '08 #4
