Bytes | Software Development & Data Engineering Community

huge XML files, XSLT memory problems, Java & SAX...

I have 2 XML data files that I want to extract data from simultaneously
and transform with XSLT to generate a report. The first file is huge,
and when XSLT builds the DOM tree in memory, it runs out of space.

I only need a few branches of elements from the original XML, so I am
seeking a recommended way of building a DOM for XSLT of only the
elements that I need. I'm writing a Java application that invokes Xalan,
and I was reading up on SAX parsers this afternoon... I'm sure this is a
common problem, and as such there is probably a clean and easy way to do
it, but I haven't found it yet...

thanks,
Jeff

Feb 8 '06 #1
Jeff Calico wrote:
[snip]

1. Save your XML files in an XML database.
2. Use XQuery to retrieve only the data you want.
3. Transform that data with your stylesheet.
4. Voilà: better performance, because an XML database retrieves the data
you want quickly from a huge set, and you only load the elements you need
into memory.

The only problem is that most XML databases are still in development.
Just google it.
Feb 8 '06 #2
Jeff Calico wrote:
> and transform with XSLT to generate a report. The first file is huge
> and when XSLT builds the DOM tree in memory, it runs out of space.


This has become a FAQ. The usual answer is not to use a DOM.
By the way, what do you consider a huge file?
DOMs should work up to a few hundred MB of XML if
you have enough RAM for your DOM.
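To make "not using a DOM" concrete, here is a minimal sketch using the StAX pull API (`javax.xml.stream`). This is a swapped-in technique, not one proposed in this thread: StAX became standard in the JDK with Java 6, which postdates the discussion. The class name `StaxExtract` and the element names `report`, `item` and `name` are hypothetical. The parser holds only the current event, so memory use is bounded by whatever text you decide to keep.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxExtract {
    /** Collects the text of every element named {@code wanted}; no tree is ever built. */
    static List<String> textOf(String xml, String wanted) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals(wanted)) {
                        // Reads up to the matching end tag; it throws if the
                        // element has child elements, so target leaf elements.
                        out.add(r.getElementText());
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<report><item><name>a</name><junk>x</junk></item>"
                + "<item><name>b</name></item></report>";
        System.out.println(textOf(xml, "name")); // [a, b]
    }
}
```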
Feb 8 '06 #3
Jeff Calico wrote:
> I only need a few branches of elements from the original XML, so I am
> seeking a recommended way of building a DOM for XSLT of only the
> elements that I need.


SAX through a SAX filter that selects the information you're concerned
with and thence into a SAX-to-DOM builder if you need an in-memory model.
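A minimal sketch of that pipeline using only standard JAXP classes, assuming a JDK whose built-in `TransformerFactory` accepts a `SAXSource` that wraps an `XMLFilterImpl` (the JDK's bundled XSLTC does). `FilteredDomBuilder`, `SubtreeFilter` and the element names are hypothetical. The filter always forwards the document element, so the DOM keeps a single root, plus any subtree rooted at a chosen element name. An identity `Transformer` writing to a `DOMResult` plays the role of the SAX-to-DOM builder. `Set.of` needs Java 9 or later.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class FilteredDomBuilder {
    /** Forwards the document element plus any subtree rooted at a kept element name. */
    static final class SubtreeFilter extends XMLFilterImpl {
        private final Set<String> keep;
        private int depth;     // current element depth; the document element is 1
        private int keepDepth; // depth where the current kept subtree began; 0 = outside

        SubtreeFilter(XMLReader parent, Set<String> keep) {
            super(parent);
            this.keep = keep;
        }

        // The root is always forwarded so the DOM still has exactly one document element.
        private boolean forwarding() {
            return depth == 1 || keepDepth != 0;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            depth++;
            if (keepDepth == 0 && keep.contains(local)) keepDepth = depth;
            if (forwarding()) super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (forwarding()) super.endElement(uri, local, qName);
            if (depth == keepDepth) keepDepth = 0;
            depth--;
        }

        @Override
        public void characters(char[] ch, int start, int len) throws SAXException {
            if (keepDepth != 0) super.characters(ch, start, len); // drop text outside kept subtrees
        }
    }

    /** Parses with SAX, filters the event stream, and builds a DOM of the kept parts only. */
    static Document build(String xml, Set<String> keep) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // ensures localName is reported
            XMLReader parser = spf.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(new SubtreeFilter(parser, keep),
                    new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();
            // An identity Transformer serves as the SAX-to-DOM builder.
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><header>h</header><item><name>a</name></item>"
                + "<junk>x</junk><item><name>b</name></item></report>";
        Document doc = build(xml, Set.of("item"));
        System.out.println(doc.getDocumentElement().getTextContent()); // ab
    }
}
```

This sketch filters only elements and character data. `XMLFilterImpl` passes property settings such as the SAX lexical handler straight to the underlying parser, so comments and similar lexical events may reach the builder unfiltered.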

Note that DOM implementations can vary in their efficiency; I once wrote
a DOM subset that required only six words of memory per node (not
counting text contents), and Xalan-j still uses my DTM data model
internally because it's more efficient than a traditional
Java-object-based DOM implementation (as well as being a better
impedance match to the XPath data model abstraction).

As others have said: What do you consider "huge"? Exceeding physical
memory? Exceeding _virtual_ memory?
Feb 9 '06 #4

Tjerk Wolterink wrote:

> 1. Save your XML files in an XML database.

[snip]

Thanks for your reply, Tjerk. We have considered a database, although
not an XML database per se, but it seems a better option to immediately
discard the elements that we don't need, and that will of course be much
faster too.

--Jeff

Feb 9 '06 #5
Thanks for your reply, Jurgen. I did do some searching in the archives of
this newsgroup before posting, but I didn't find what I thought I might. As
I understand it, XSLT requires a DOM to exist, so if I don't want to write
my own XSLT functionality, then I must have one.

I do not yet know the exact size of the files I must process, but I would
expect them to be much less than 100 MB (I hope!). However, I have heard
several co-workers talk about DOMs gobbling up all available memory, so I
want to avoid even the possibility of that. I did get an out-of-memory error
with XML Spy when using it to perform a transform on a *small* file of the
type I am working with.

Anyway, the real issue is not to construct a huge DOM for no good reason. I
don't need all that data, just 3 or 4 important nodes and their children...

--Jeff

Feb 9 '06 #6
Thanks for the reply, Joe. I expect I will be using Xalan-j, and hence your
earlier work :-)

As I mentioned in my reply to Jurgen above, I don't know the real sizes of
the XML data files yet, only that I should expect big ones, and I did crash
XMLSpy with a fairly small data file while prototyping. As usual, we have
the issues of speed and memory space; the best solution is not to process
what we don't need, right from the beginning.

Would you happen to remember the names of the classes that do the SAX
filtering ---> filtered DOM building? If they are not on the tip of your
tongue, I can and shall certainly look them up. But it is again the issue
of speed (I'm slow) and memory allocation (my brain is running low on
space) :-)

--Jeff

Feb 9 '06 #7
Jeff Calico wrote:
> As I understand it, XSLT requires a DOM to exist
Uhm... Not exactly. XSLT may use a DOM internally (or may use other data
models). But most XSLT processors can accept input from a file, a text
stream, a SAX stream, or a DOM... and will output to any of those. So
you don't have to explicitly create a DOM in order to use XSLT, and
XSLT's internal representation may (or may not) be more efficient than a
general-purpose DOM.
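As a sketch of that point using the standard JAXP API: the caller hands the processor a stream and never builds a DOM itself. `StreamTransform` and the `COUNT_ITEMS` stylesheet are hypothetical. Whatever tree the processor builds internally is its own choice (Xalan uses its DTM model, as noted earlier in the thread).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamTransform {
    static final String COUNT_ITEMS =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies a stylesheet to XML text; the caller never creates a DOM. */
    static String run(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            // The processor parses the input stream itself and picks its own
            // internal tree representation (not necessarily a W3C DOM).
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<r><item/><item/></r>", COUNT_ITEMS)); // 2
    }
}
```

For a file on disk, `new StreamSource(file)` with a `java.io.File` works the same way.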
> I do not yet know the exact size of files I must process, but I would
> expect them to be much less than 100 MB (I hope!)
We run documents that size through Xalan on a regular basis.
> However, I have heard several co-workers talk about
> DOMs gobbling up all available memory
The DOM is just an API. How much memory a DOM needs depends on which DOM
implementation you're using as well as on the exact characteristics of
the document being processed.
> I did get an out-of-memory error with XML Spy when using it to perform
> a transform on a *small* file of the type I am working with.
That may be a problem in the transformation, or you may have set the
limits on your environment too low.
> Anyway, the real issue is not to construct a huge DOM for no good
> reason. I don't need all that data, just 3 or 4 important nodes and
> their children...


If that's the case, a hand-coded SAX solution will probably be more
efficient than an XSLT solution... for now. Recognizing and optimizing
these cases is an ongoing area of research for XSLT developers.
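For comparison, a minimal hand-coded SAX handler of the kind described, assuming the wanted data can be identified by element name alone. `SaxExtract`, `collect` and the element names are hypothetical. It returns the full text content of each matching element (including text inside nested children) and keeps nothing else in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxExtract {
    /** Returns the text content of each element named {@code wanted}, in document order. */
    static List<String> collect(String xml, String wanted) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a wanted element
            private int nested;         // open child elements inside the wanted element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (text != null) {
                    nested++;
                } else if (qName.equals(wanted)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (text != null) text.append(ch, start, len); // everything else is discarded
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (text == null) return;
                if (nested > 0) {
                    nested--;
                } else {
                    found.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<r><item>a<b>1</b></item><skip>x</skip><item>c</item></r>";
        System.out.println(collect(xml, "item")); // [a1, c]
    }
}
```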
Feb 9 '06 #8
> Would you happen to remember the names of the classes that do the
> SAX filtering ---> filtered DOM building?


SAX-driven DOM builders are pretty common; many DOM implementations ship
with one, and if not, generic ones are a standard intro-to-XML class
exercise, so there are lots of them running around.
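The JDK's JAXP API includes one such builder: an identity `TransformerHandler` obtained from `SAXTransformerFactory` accepts SAX calls and, given a `DOMResult`, produces a DOM. The sketch below drives it by hand to show that anything able to make `ContentHandler` calls, such as a filter, can feed it. `SaxToDom`, `domBuilder` and the `item`/`id` names are hypothetical.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.helpers.AttributesImpl;

final class SaxToDom {
    /** Returns an identity TransformerHandler that turns SAX calls into a DOM. */
    static TransformerHandler domBuilder(DOMResult result) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX input");
        }
        TransformerHandler h = ((SAXTransformerFactory) tf).newTransformerHandler();
        h.setResult(result); // must be set before the first SAX event
        return h;
    }

    /** Drives the builder by hand, the way a SAX filter would forward events. */
    static Document demo() {
        try {
            DOMResult result = new DOMResult();
            TransformerHandler builder = domBuilder(result);
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "id", "id", "CDATA", "42");
            char[] text = "hello".toCharArray();
            builder.startDocument();
            builder.startElement("", "item", "item", atts);
            builder.characters(text, 0, text.length);
            builder.endElement("", "item", "item");
            builder.endDocument();
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = demo();
        System.out.println(doc.getDocumentElement().getAttribute("id")
                + " " + doc.getDocumentElement().getTextContent()); // 42 hello
    }
}
```

In a real pipeline you would hand the builder to a filter through `setContentHandler` instead of calling it directly.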

Filtering: That's up to you. You need to implement a class which is a
SAX handler, accepting the SAX event calls and tracking them to decide
what does and doesn't have to be passed along to another handler (in
this case, the DOM builder). Very standard bit of SAX programming, and
in fact a bit too standard for me to actually have kept pointers to
examples. Any good SAX tutorial ought to give you all the info you need
to do this -- modulo the hassle of figuring out what criteria you need
to use to decide what is and isn't worth passing along.
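A minimal sketch of such a filter, assuming the selection criterion is an absolute element path such as `/report/item` (a hypothetical choice; real criteria may also need attributes or namespaces). It extends `org.xml.sax.helpers.XMLFilterImpl`, keeps a stack of open element names, forwards the document element and each matching subtree, and drops everything else. Here the filtered stream goes straight into an XSLT `Transformer` through a `SAXSource`, so the application never builds a DOM; the processor still builds its own tree, but only of the forwarded parts. `PathFilterTransform`, `PathFilter` and `NAMES_XSL` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class PathFilterTransform {
    /** Forwards the root plus every subtree whose element path equals keepPath. */
    static final class PathFilter extends XMLFilterImpl {
        private final String keepPath;                         // e.g. "/report/item"
        private final Deque<String> open = new ArrayDeque<>(); // open element names, top = innermost
        private int keepDepth;                                 // depth where a kept subtree began; 0 = none

        PathFilter(XMLReader parent, String keepPath) {
            super(parent);
            this.keepPath = keepPath;
        }

        private String currentPath() {
            StringBuilder sb = new StringBuilder();
            // descendingIterator runs from the root (pushed first) to the innermost element
            open.descendingIterator().forEachRemaining(n -> sb.append('/').append(n));
            return sb.toString();
        }

        private boolean forwarding() {
            return open.size() == 1 || keepDepth != 0;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            open.push(local);
            if (keepDepth == 0 && currentPath().equals(keepPath)) keepDepth = open.size();
            if (forwarding()) super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (forwarding()) super.endElement(uri, local, qName);
            if (open.size() == keepDepth) keepDepth = 0;
            open.pop();
        }

        @Override
        public void characters(char[] ch, int start, int len) throws SAXException {
            if (keepDepth != 0) super.characters(ch, start, len);
        }
    }

    /** Runs the stylesheet over the filtered SAX stream. */
    static String run(String xml, String keepPath, String xsl) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            SAXSource input = new SAXSource(new PathFilter(parser, keepPath),
                    new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(input, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static final String NAMES_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/><xsl:template match='/'>"
            + "<xsl:for-each select='//name'><xsl:value-of select='.'/><xsl:text>;</xsl:text></xsl:for-each>"
            + "</xsl:template></xsl:stylesheet>";

    public static void main(String[] args) {
        String xml = "<report><header>h</header><item><name>a</name></item>"
                + "<misc><item><name>z</name></item></misc><item><name>b</name></item></report>";
        System.out.println(run(xml, "/report/item", NAMES_XSL)); // a;b;
    }
}
```

Because the stylesheet only ever sees the forwarded events, `//name` matches the names under `/report/item` and not the one under `/report/misc/item`.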

Lemme see if I've got anything on tap that's simple enough to be a good
pedagogical illustration...
Feb 9 '06 #9
Jeff Calico wrote:
> XSLT requires a DOM to exist, so if I wish to not write my own XSLT
> functionality, then I must have one.
That's also my understanding of the problem.
> I do not yet know the exact size of files I must process, but I would
> expect them to be much less than 100 MB (I hope!). However, I have heard
> several co-workers talk about DOMs gobbling up all available memory, so
> I want to avoid even the possibility of that.
This sounds like your XSLT implementation has a problem.
> I did get an out-of-memory error with XML Spy when using it to perform
> a transform on a *small* file of the type I am working with.
This confirms my guess about a problem with your XSLT implementation.
> Anyway, the real issue is not to construct a huge DOM for no good
> reason. I don't need all that data, just 3 or 4 important nodes and
> their children...


I have read such postings several times over the
last few months, but I can't give you a pointer right now.
Feb 9 '06 #10

