I have 2 XML data files that I want to extract data from simultaneously
and transform with XSLT to generate a report. The first file is huge
and when XSLT builds the DOM tree in memory, it runs out of space.
I only need a few branches of elements from the original XML, so I am
seeking a recommended way of building a DOM for XSLT of only the
elements that I need. I'm writing a Java application that invokes Xalan,
and I have been reading up on SAX parsers this afternoon... I'm sure this
is a common
problem, and as such, there is probably a clean and easy way to do it,
but I haven't found that one yet...
thanks,
Jeff
Jeff Calico wrote: I have 2 XML data files that I want to extract data from simultaneously
and transform with XSLT to generate a report. The first file is huge and when XSLT builds the DOM tree in memory, it runs out of space.
I only need a few branches of elements from the original XML, so I am seeking a recommended way of building a DOM for XSLT of only the elements that I need. I'm writing a Java application that invokes Xalan, and reading up on SAX parsers this afternoon... I'm sure this is a common problem, and as such, there is probably a clean and easy way to do it, but I haven't found that one yet...
thanks, Jeff
1. Save your XML files in an XML database.
2. Use XQuery to retrieve only the data you want.
3. Transform that data with your stylesheet.
4. Voila: better performance, because an XML database retrieves the
data you want quickly from a huge set.
And you only load the elements you need into memory.
The only problem is that most XML databases are still in development.
Just google it.
Jeff Calico wrote: and transform with XSLT to generate a report. The first file is huge and when XSLT builds the DOM tree in memory, it runs out of space.
This has become a FAQ. The usual answer is to not use a DOM.
By the way, what do you consider a huge file ?
DOMs should work up to a few hundred MB of XML if
you have enough RAM for your DOM.
Jeff Calico wrote: I only need a few branches of elements from the original XML, so I am seeking a recomended way of building a DOM for XSLT of only the elements that I need.
SAX through a SAX filter that selects the information you're concerned
with and thence into a SAX-to-DOM builder if you need an in-memory model.
Note that DOM implementations can vary in their efficiency; I once wrote
a DOM subset that required only six words of memory per node (not
counting text contents), and Xalan-j still uses my DTM data model
internally because it's more efficient than a traditional
Java-object-based DOM implementation (as well as being a better
impedance match to the XPath data model abstraction).
As others have said: What do you consider "huge"? Exceeding physical
memory? Exceeding _virtual_ memory?
Tjerk Wolterink wrote: 1. Save your xml files in an xml-databases.
[snip]
Thanks for your reply, Tjerk. We have considered a database, although
not an XML database per se, but it seems a better option to immediately
discard the elements that we don't need, and that will of course be much
faster as well.
--Jeff
Thanks for your reply, Jurgen. I did do some searching in the archives
of this newsgroup before posting, but I didn't find what I thought I
might. As I understand it, XSLT requires a DOM to exist, so if I wish to
not write my own XSLT functionality, then I must have one.
I do not yet know the exact size of the files I must process, but I would
expect them to be much less than 100 MB (I hope!). However, I have heard
several co-workers talk about DOMs gobbling up all available memory, so I
want to avoid even the possibility of that.
I did get an out-of-memory error with XML Spy when using it to perform
a transform on a *small* file of the type I am working with.
Anyway, the real issue is not to construct a huge DOM for no good
reason. I don't need all that data, just 3 or 4 important nodes and
their children...
--Jeff
Thanks for the reply, Joe. I expect I will be using Xalan-j and hence
your earlier work :-)
As I mentioned in my reply to Jurgen above, I don't know the real sizes
of the XML data files yet, only that I should expect big ones, and I did
crash XMLSpy with a fairly small data file while prototyping. As usual,
we have the issues of speed and memory space; the best solution is to
not process what we don't need, right from the beginning.
Would you happen to remember the names of the classes that do the
SAX filtering ---> filtered DOM building? If they are not on the tip
of your tongue, I can and shall certainly look them up. But it is again
the issue of speed (I'm slow) and memory allocation (brain is running low
on space) :-)
--Jeff
Jeff Calico wrote: As I understand it, XSLT requires a DOM to exist
Uhm... Not exactly. XSLT may use a DOM internally (or may use other data
models). But most XSLT processors can accept input from a file, a text
stream, a SAX stream, or a DOM... and will output to any of those. So
you don't have to explicitly create a DOM in order to use XSLT, and
XSLT's internal representation may (or may not) be more efficient than a
general-purpose DOM.
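As a rough illustration of the point above (not code from this thread), the standard JAXP API lets you hand the processor streams and never build a DOM yourself. The class name `StreamTransform` and the use of in-memory strings are assumptions made for the sketch; with real files you would pass `new StreamSource(new File(...))` instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; only the javax.xml.transform calls are real API.
public class StreamTransform {

    /** Applies a stylesheet to XML text; the caller never creates a DOM. */
    public static String transform(String xml, String xslt) throws TransformerException {
        // The factory returns whichever processor is configured (Xalan, the JDK default, ...).
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));

        StringWriter out = new StringWriter();
        // Stream in, stream out: the internal tree model is the processor's business.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The processor still builds its own internal model of the input, so this avoids a second, explicit DOM rather than removing the memory cost entirely.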
> I do not yet know the exact size of files I must process, but I would expect them to be much less than 100 MB (I hope!)
We run documents that size through Xalan on a regular basis.
> However, I have heard several co-workers talk about DOMs gobbling up all available memory
The DOM is just an API. How much memory a DOM needs depends on which DOM
implementation you're using as well as on the exact characteristics of
the document being processed.
> I did get an out-of-memory error with XML Spy when using it to perform a transform on a *small* file of the type I am working with.
That may be a problem in the transformation, or you may have set the
limits on your environment too low.
> Anyway, the real issue is not to construct a huge DOM for no good reason. I don't need all that data, just 3 or 4 important nodes and their children...
If that's the case, a hand-coded SAX solution will probably be more
efficient than an XSLT solution... for now. Recognizing and optimizing
these cases is an ongoing area of research for XSLT developers.
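A minimal sketch of what such a hand-coded SAX solution might look like, assuming all you need is the text of a few named elements. The class name `ElementTextExtractor` and the element names used with it are hypothetical, and the sketch assumes the wanted elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collects the text of every element with a given name.
public class ElementTextExtractor extends DefaultHandler {
    private final String wanted;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a wanted element

    public ElementTextExtractor(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equals(wanted)) {
            values.add(current.toString());
            current = null;
        }
    }

    /** Parses the XML once and returns the text of each matching element, in document order. */
    public static List<String> extract(String xml, String elementName) throws Exception {
        ElementTextExtractor handler = new ElementTextExtractor(elementName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```

Nothing outside the wanted elements is ever stored, so memory use depends on what you keep rather than on the size of the input.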
> Would you happen to remember the names of the classes that do the SAX filtering ---> filtered DOM building
SAX-driven DOM builders are pretty common; many DOM implementations ship
with one, and if not, generic ones are a standard intro-to-XML class
exercise, so there are lots of them around.
Filtering: That's up to you. You need to implement a class which is a
SAX handler, accepting the SAX event calls and tracking them to decide
what does and doesn't have to be passed along to another handler (in
this case, the DOM builder). Very standard bit of SAX programming, and
in fact a bit too standard for me to actually have kept pointers to
examples. Any good SAX tutorial ought to give you all the info you need
to do this -- modulo the hassle of figuring out what criteria you need
to use to decide what is and isn't worth passing along.
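One way the filter-plus-DOM-builder arrangement described above could be sketched with the JDK's own classes. This is an illustration under stated assumptions, not the poster's code: the class name `SubtreeFilter` and the keep-by-element-name rule are invented for the example, the root element is always passed through so the result stays well-formed, and only element and text events are filtered.

```java
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards the root element plus every subtree whose
// element name is in 'keep', and drops everything else.
public class SubtreeFilter extends XMLFilterImpl {
    private final Set<String> keep; // local names of elements whose subtrees are kept
    private int depth = 0;          // depth of the current element (root = 1)
    private int keepDepth = 0;      // depth at which the current kept subtree began, 0 = none

    public SubtreeFilter(XMLReader parent, Set<String> keep) {
        super(parent);
        this.keep = keep;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        if (keepDepth == 0 && keep.contains(localName)) {
            keepDepth = depth; // entering a subtree we want
        }
        if (depth == 1 || keepDepth > 0) {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == 1 || keepDepth > 0) {
            super.endElement(uri, localName, qName);
        }
        if (keepDepth == depth) {
            keepDepth = 0; // leaving the kept subtree
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (keepDepth > 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the input through the filter and builds a DOM of only the kept parts. */
    public static Document buildFilteredDom(InputSource input, Set<String> keep) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so localName is populated
        SubtreeFilter filter = new SubtreeFilter(spf.newSAXParser().getXMLReader(), keep);

        // An identity TransformerHandler serves as the SAX-to-DOM builder here.
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler domBuilder = stf.newTransformerHandler();
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        domBuilder.setResult(new DOMResult(doc));

        filter.setContentHandler(domBuilder);
        filter.parse(input);
        return doc;
    }
}
```

The resulting `Document` can be wrapped in a `DOMSource` and handed to the XSLT processor. The hard part, as noted above, is the decision rule inside `startElement`; matching on element name alone is just the simplest possible criterion.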
Lemme see if I've got anything on tap that's simple enough to be a good
pedagogical illustration...
Jeff Calico wrote: XSLT requires a DOM to exist, so if I wish to not write my own XSLT functionality, then I must have one.
That's also my understanding of the problem.
> I do not yet know the exact size of files I must process, but I would expect them to be much less than 100 MB (I hope!). However, I have heard several co-workers talk about DOMs gobbling up all available memory, so I want to avoid even the possibility of that.
This sounds like your XSLT implementation has a problem.
> I did get an out-of-memory error with XML Spy when using it to perform a transform on a *small* file of the type I am working with.
This confirms my guess about a problem with your XSLT implementation.
> Anyway, the real issue is not to construct a huge DOM for no good reason. I don't need all that data, just 3 or 4 important nodes and their children...
I have read such postings several times over the
last few months, but I can't give you a pointer right now.