xslt processing and memory needs

Hi,

I want to read records from a database and export them in an arbitrary
format.
My idea was to feed a class with a String array fetched from the
database and let this class fire SAX events as input to the XSLT processor.

The basic class hierarchy is:
org.xml.sax.XMLReader [Interface]
          ^
          |
  AbstractXMLReader
          ^
          |
StringArrayXMLReader --------------------> TransformerHandler

The TransformerHandler is associated with a StreamResult
(ByteArrayOutputStream)
and the XSL-Stylesheet.

In a first attempt I gave StringArrayXMLReader an init and a close
method, which fire startDocument()/endDocument() along with some header
information.
The problem here is that the XSLT processing takes place on endDocument(),
so all rows from the database are buffered until endDocument() is fired.
Because the result set from the database could be really large, this is
not an option for me.
To recap, the fired SAX events were equivalent to:
<?xml version="1.0" encoding="UTF-8"?>
<dataset>

<row><value>1-String</value><value>2-String</value><value>3-String</value></row>

<row><value>R1-String</value><value>R2-String</value><value>R3-String</value></row>
</dataset>

XSL was:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <table border="1">
      <xsl:apply-templates select="dataset/row"/>
    </table>
  </xsl:template>

  <xsl:template match="dataset/row">
    <tr>
      <xsl:apply-templates select="value"/>
    </tr>
  </xsl:template>

  <xsl:template match="value">
    <td>
      <xsl:value-of select='.'/>
    </td>
  </xsl:template>
</xsl:stylesheet>
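
For readers who want to reproduce the setup, here is a minimal sketch of the single-document approach described above, using only the standard JAXP classes. The class name DatasetEventSource and its init/row/close methods are hypothetical stand-ins for StringArrayXMLReader, whose actual code is not shown in the post, and the result goes to a Writer instead of a ByteArrayOutputStream. The stylesheet is the one above, collapsed into a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical stand-in for StringArrayXMLReader: init/row/close fire SAX events. */
final class DatasetEventSource {
    /** The stylesheet from the post, collapsed into one string. */
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'><table border='1'>"
        + "<xsl:apply-templates select='dataset/row'/></table></xsl:template>"
        + "<xsl:template match='dataset/row'><tr>"
        + "<xsl:apply-templates select='value'/></tr></xsl:template>"
        + "<xsl:template match='value'><td><xsl:value-of select='.'/></td></xsl:template>"
        + "</xsl:stylesheet>";

    private static final AttributesImpl NO_ATTRS = new AttributesImpl();
    private final TransformerHandler handler;

    DatasetEventSource(String xsl, Writer out) throws TransformerConfigurationException {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        handler = factory.newTransformerHandler(new StreamSource(new StringReader(xsl)));
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        handler.setResult(new StreamResult(out));
    }

    /** Opens the document and the <dataset> root element. */
    void init() throws SAXException {
        handler.startDocument();
        handler.startElement("", "dataset", "dataset", NO_ATTRS);
    }

    /** Fires <row><value>...</value>...</row> for one database row. */
    void row(String[] values) throws SAXException {
        handler.startElement("", "row", "row", NO_ATTRS);
        for (String value : values) {
            handler.startElement("", "value", "value", NO_ATTRS);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "value", "value");
        }
        handler.endElement("", "row", "row");
    }

    /** Closes the root element and the document. */
    void close() throws SAXException {
        handler.endElement("", "dataset", "dataset");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        DatasetEventSource source = new DatasetEventSource(XSL, out);
        source.init();
        source.row(new String[] {"1-String", "2-String", "3-String"});
        source.row(new String[] {"R1-String", "R2-String", "R3-String"});
        // Shows how much output exists before the document is closed.
        System.out.println("chars written before close(): " + out.getBuffer().length());
        source.close();
        System.out.println(out);
    }
}
```

Running main prints the number of characters written before close(), which makes it easy to check on a given processor whether any output appears before endDocument().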

My next idea was to treat each row of the database result set as
one input document for the processor and to provide the header
information manually.
I also modified my XSL stylesheet and moved the startDocument()/endDocument()
calls into the parse method.
The processor input is now:

<?xml version="1.0" encoding="UTF-8"?>
<row><value>1-String</value><value>2-String</value><value>3-String</value></row>

and the XSL now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <tr>
      <xsl:apply-templates select="row/value"/>
    </tr>
  </xsl:template>

  <xsl:template match="value">
    <td>
      <xsl:value-of select='.'/>
    </td>
  </xsl:template>
</xsl:stylesheet>

With this change I get a NullPointerException in
org.apache.xalan.transformer.TransformerImpl.run
on the second call of my StringArrayXMLReader.parse method.
The first call looks OK: the fired SAX events are converted properly by
the XSLT processor.
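
The post does not show enough code to say why the NullPointerException occurs. One guess, and it is only a guess, is that the same TransformerHandler instance receives a second startDocument() after it has already finished a transformation. The sketch below sidesteps that question: it compiles the row stylesheet once into a Templates object and creates a fresh TransformerHandler for every row, writing the table header and footer by hand. RowAtATimeTransformer, transformRow, and render are hypothetical names, and the output goes to a Writer for simplicity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper: one small transformation per database row. */
final class RowAtATimeTransformer {
    /** The per-row stylesheet from the post, collapsed into one string. */
    static final String ROW_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'><tr>"
        + "<xsl:apply-templates select='row/value'/></tr></xsl:template>"
        + "<xsl:template match='value'><td><xsl:value-of select='.'/></td></xsl:template>"
        + "</xsl:stylesheet>";

    private static final AttributesImpl NO_ATTRS = new AttributesImpl();
    private final SAXTransformerFactory factory =
        (SAXTransformerFactory) TransformerFactory.newInstance();
    private final Templates templates;
    private final Writer out;

    RowAtATimeTransformer(Writer out) throws TransformerConfigurationException {
        this.out = out;
        // Compile the stylesheet once; handlers created from it are cheap.
        this.templates = factory.newTemplates(new StreamSource(new StringReader(ROW_XSL)));
    }

    /** Transforms one row as a complete document, using a new handler each time. */
    void transformRow(String[] values)
            throws SAXException, TransformerConfigurationException {
        TransformerHandler handler = factory.newTransformerHandler(templates);
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        handler.setResult(new StreamResult(out));
        handler.startDocument();
        handler.startElement("", "row", "row", NO_ATTRS);
        for (String value : values) {
            handler.startElement("", "value", "value", NO_ATTRS);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "value", "value");
        }
        handler.endElement("", "row", "row");
        handler.endDocument();
    }

    /** Writes the header and footer manually and transforms each row in between. */
    static String render(List<String[]> rows) throws Exception {
        StringWriter out = new StringWriter();
        out.write("<table border=\"1\">");
        RowAtATimeTransformer transformer = new RowAtATimeTransformer(out);
        for (String[] row : rows) {
            transformer.transformRow(row);
        }
        out.write("</table>");
        return out.toString();
    }
}
```

With this arrangement only one row is held by the processor at a time, at the cost of keeping the header and footer outside the stylesheet.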

Besides this specific problem, I have a more general question:
Is it possible to transform XML with an XSLT processor without having
the whole document in memory?
Because my result set is very large, I cannot read it into memory.
I tried firing SAX events because I thought it would give me the
least overhead.
However, treating each row as a complete document is not tidy either,
because I lose the power of XSLT for my header information.
Maybe I am misunderstanding something completely.
How would an XML/XSLT expert solve the problem?

Apr 18 '06 #1
th***************@osp-dd.de wrote:
> Is it possible to transform XML with a xslt processor without having
> the whole document in memory ?
In general, no, because the XSLT language has complete random access to
the source document.

If your stylesheet reliably reads only forward through the document,
some processors can take advantage of that if configured properly. For
example, Xalan's SQL extension, for its own convenience, doesn't attempt
to present the entire document at once and requires that you write the
stylesheet with that restriction in mind.

Automating recognition of that sort of opportunity and using it for
optimization is one of the holy grails of XSLT processing, and is an
ongoing area of research. If you search the archives of the Xalan
mailing list for the keywords "streaming", "pruning", and "filtering"
you'll find discussion of the challenges involved in making this work
properly. I know folks who are still trying to make it work.

Meanwhile, some XSLT processors (e.g. Xalan) switched from using DOMs
internally to using non-object-based data models for memory efficiency
reasons; that may help you.

In fact, that brings up another possibility. IBM's recently added a
native XML model to DB2, letting it function as a true XML database as
well as a relational database, and I believe they have implemented
XQuery for that. They may have implemented XSLT as well; if not, XQuery
is functionally interchangeable with XSLT 2.0 (XQuery and XSLT2 are
actually two sides of the same design effort) so you could probably
rewrite your stylesheet as an XQuery. This would let you leverage DB2's
intelligence in memory management. I believe trial/beta copies are
available from IBM's website.
> I tried the SAX-fire event mechanism because I thought it gives me the
> fewest overhead.


That's another solution, of course: Write your own SAX-based processing
layer, storing only the information you Really Need in memory. If you
want to use that together with XSLT, what I'd suggest is that you create
a SAX-based filter to discard portions of your document that the
stylesheet doesn't need to see, then run XSLT on the result.
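
As an illustration of the filtering idea, here is a minimal sketch built on the standard org.xml.sax.helpers.XMLFilterImpl. It drops every element with a given name, together with its whole subtree, before the events reach the transformer. The class name SkipElementFilter and the element name passed to it are hypothetical. The identity transformer in filter() only stands in for the real stylesheet so that the effect is visible as text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX filter that discards one kind of element and everything inside it. */
final class SkipElementFilter extends XMLFilterImpl {
    private final String skippedName;
    private int skipDepth; // greater than 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skippedName) {
        super(parent);
        this.skippedName = skippedName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals(skippedName)) {
            skipDepth++;
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses xml, removes the named elements, and serializes what is left. */
    static String filter(String xml, String skippedName) throws Exception {
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        XMLReader reader = parsers.newSAXParser().getXMLReader();
        // Identity transformer as a placeholder; a stylesheet transformer would go here.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(
            new SAXSource(new SkipElementFilter(reader, skippedName),
                          new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }
}
```

Note that the processor may still build a tree of whatever passes the filter, so this reduces memory use only in proportion to what is discarded.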

If you really need to render everything, you can try writing a SAX-based
solution from the ground up, which may allow you to avoid storing
anything in memory for the long term -- IF your rendering permits you to
do so. Or you may find it makes sense to actually re-read the source
document several times rather than keeping too much in memory. As I say,
we do hope XSLT will eventually be optimized to the point where it can
make these decisions for you... but for now, there are still cases where
the right thing to do is drop down to a lower-level language, just as a
Java programmer will occasionally find they have to drop down to the
bytecode level or JNI native code to get the performance they need.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Apr 18 '06 #2

Joe Kesselman wrote:
> th***************@osp-dd.de wrote:
>> Is it possible to transform XML with a xslt processor without having
>> the whole document in memory ?
> In general, no, because the XSLT language has complete random access to
> the source document.
>
> If your stylesheet reliably reads only forward through the document,
> some processors can take advantage of that if configured properly. For
> example, Xalan's SQL extension, for its own convenience, doesn't attempt
> to present the entire document at once and requires that you write the
> stylesheet with that restriction in mind.


Actually, my stylesheets are very simple.
They also all work on a very simple XML input:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
<row><value>1-String</value><value>2-String</value></row>
<row><value>R1-String</value><value>R2-String</value></row>
</dataset>

The problem is that there can be a very large number of row entries.

> Automating recognition of that sort of opportunity and using it for
> optimization is one of the holy grails of XSLT processing, and is an
> ongoing area of research. If you search the archives of the Xalan
> mailing list for the keywords "streaming", "pruning", and "filtering"
> you'll find discussion of the challenges involved in making this work
> properly. I know folks who are still trying to make it work.
>
> Meanwhile, some XSLT processors (eg Xalan) switched from using DOMs
> internally to using non-object-based data models for memory efficiency
> reasons; that may help you.

I do not create a DOM document as input, only SAX events.
I think the problem is not the XML part but the XSLT processing part.
The TransformerHandler starts its work only when the endDocument()
call happens. When I read many <row> records, it
must buffer all the data read from the database in memory.

> In fact, that brings up another possibility. IBM's recently added a
> native XML model to DB2, letting it function as a true XML database as
> well as a relational database, and I believe they have implemented
> XQuery for that. They may have implemented XSLT as well; if not, XQuery
> is functionally interchangeable with XSLT 2.0 (XQuery and XSLT2 are
> actually two sides of the same design effort) so you could probably
> rewrite your stylesheet as an XQuery. This would let you leverage DB2's
> intelligence in memory management. I believe trial/beta copies are
> available from IBM's website.

We use an Oracle database, but I cannot use its internal XML
capabilities because
I'm tied to our company's database access framework. I can only build the
output on top of
the database result sets.

>> I tried the SAX-fire event mechanism because I thought it gives me the
>> fewest overhead.
> That's another solution, of course: Write your own SAX-based processing
> layer, storing only the information you Really Need in memory. If you
> want to use that together with XSLT, what I'd suggest is that you create
> a SAX-based filter to discard portions of your document that the
> stylesheet doesn't need to see, then run XSLT on the result.

I think the "then run XSLT on the result" step is the problem.
When I start the fetch from the database, I call an init() method, which
calls startDocument(). Then I fetch row by row and call parse() for each row.
parse() fires the <row><value>... events.
When all records have been read from the database, I call a close() method,
which in turn
calls endDocument(). Only now does the XSLT processing take place.

> If you really need to render everything, you can try writing a SAX-based
> solution from the ground up, which may allow you to avoid storing
> anything in memory for the long term -- IF your rendering permits you to
> do so. Or you may find it makes sense to actually re-read the source
> document several times rather than keeping too much in memory.

Because I fetch the data from the database, I cannot read it multiple
times.
But this is not the point. I fetch only the data I really need, put it
in a String array,
and produce the corresponding startElement, endElement, and characters
calls on my ContentHandler.
The code is at:
http://randspringer.de/sax_row.tar
http://randspringer.de/sax.tar

> As I say,
> we do hope XSLT will eventually be optimized to the point where it can
> make these decisions for you... but for now, there are still cases where
> the right thing to do is drop down to a lower-level language, just as a
> Java programmer will occasionally find they have to drop down to the
> bytecode level or JNI native code to get the performance they need.
>
> --
> () ASCII Ribbon Campaign | Joe Kesselman
> /\ Stamp out HTML e-mail! | System architexture and kinetic poetry


Apr 19 '06 #3
As I said: XSLT processors are still learning when they can
stream/prune/filter the incoming data.

For now, and for your particular case, you'll be better off hand-coding
a SAX-based solution.
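
For the simple dataset/row/value case in this thread, a hand-coded solution can be even smaller than a SAX pipeline. The sketch below is one possible reading of that advice, not code from the thread: it writes the HTML table directly to a Writer, one row at a time, so nothing but the current row is held in memory. HtmlTableWriter, write, and escape are hypothetical names, and the Iterator stands in for whatever the database access framework returns.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical hand-coded replacement for the stylesheet: streams rows straight to HTML. */
final class HtmlTableWriter {
    /** Writes one <tr> per row as soon as the row is available. */
    static void write(Iterator<String[]> rows, Writer out) throws IOException {
        out.write("<table border=\"1\">\n");
        while (rows.hasNext()) {
            out.write("<tr>");
            for (String value : rows.next()) {
                out.write("<td>");
                out.write(escape(value));
                out.write("</td>");
            }
            out.write("</tr>\n");
        }
        out.write("</table>\n");
    }

    /** Escapes the characters that are significant in HTML text; null becomes empty. */
    static String escape(String text) {
        if (text == null) {
            return "";
        }
        // '&' must be replaced first so the other entities are not escaped twice.
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        write(Arrays.asList(
            new String[] {"1-String", "2-String", "3-String"},
            new String[] {"R1-String", "R2-String", "R3-String"}).iterator(), out);
        System.out.print(out);
    }
}
```

The trade-off is the one discussed in the thread: the output format is now fixed in Java code instead of in an exchangeable stylesheet.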
Apr 19 '06 #4
