huge XML files, XSLT memory problems, Java & SAX...

I have two XML data files that I want to extract data from simultaneously
and transform with XSLT to generate a report. The first file is huge,
and when the XSLT processor builds the DOM tree in memory, it runs out
of space.

I only need a few branches of elements from the original XML, so I am
looking for a recommended way of building a DOM for XSLT that contains
only the elements I need. I'm writing a Java application that invokes
Xalan, and I've been reading up on SAX parsers this afternoon... I'm sure
this is a common problem, and as such there is probably a clean and easy
way to do it, but I haven't found it yet...

thanks,
Jeff

Feb 8 '06
Don't write Java; use a pipeline. The Markup pipeline demo site [1]
includes a pipeline that splits a large document into small chunks for
validation, and a similar approach would work here.

ht

[1] http://www.markup.co.uk/showcase/
--
Henry S. Thompson, Markup Technology Ltd.
4 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- +44 (0) 7866 471 388
Fax: (44) 131 650-4587, e-mail: ht@markuptechnology.com
URL: http://www.markup.co.uk/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Feb 10 '06 #11
> Lemme see if I've got anything on tap that's simple enough to be a good
> pedagogical illustration...


I don't have anything really good on hand, but look for examples of the
use of org.xml.sax.helpers.XMLFilterImpl.

That starts out as a no-op filter which just passes everything through.
What you'd need to do is add enough logic to recognize which portions of
the document you're interested in, and pass those (and only those) along
to the next stage of processing. Plus, probably, the document element
(or a synthesized document element) so it's well-formed XML. Handling
namespaces properly complicates this somewhat but not horribly.
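
A minimal sketch of that kind of prefilter, not a tested production filter. The class name SubtreeFilter, the "wanted" set of local names, and the sample document are all made up for illustration. It assumes a namespace-aware parser (so that localName is populated) and forwards namespace prefix mappings unchanged rather than handling them properly. An identity Transformer stands in for the real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical prefilter: keeps the document element plus subtrees rooted at wanted elements. */
public class SubtreeFilter extends XMLFilterImpl {
    private final Set<String> wanted; // local names of subtree roots to keep (illustrative)
    private int depth;                // element depth in the source document
    private int keepDepth;            // nesting depth inside a kept subtree; 0 means outside

    public SubtreeFilter(XMLReader parent, Set<String> wanted) {
        super(parent);
        this.wanted = wanted;
    }

    @Override
    public void startDocument() throws SAXException {
        depth = 0;
        keepDepth = 0;
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        if (keepDepth > 0) {
            keepDepth++;                       // already inside a kept subtree
        } else if (wanted.contains(localName)) {
            keepDepth = 1;                     // entering a kept subtree
        }
        // Always pass the document element so the output stays well-formed.
        if (depth == 1 || keepDepth > 0) {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == 1 || keepDepth > 0) {
            super.endElement(uri, localName, qName);
        }
        if (keepDepth > 0) {
            keepDepth--;
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (keepDepth > 0) {
            super.characters(ch, start, length); // drop text outside kept subtrees
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (keepDepth > 0) {
            super.ignorableWhitespace(ch, start, length);
        }
    }

    /** Puts the filter in front of an identity transform and returns the filtered XML. */
    public static String filter(String xml, Set<String> wanted) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // needed so localName is reported
        XMLReader reader = spf.newSAXParser().getXMLReader();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        SAXSource source = new SAXSource(new SubtreeFilter(reader, wanted),
                new InputSource(new StringReader(xml)));
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report><noise>x</noise><item id=\"1\">a<b>c</b></item></report>";
        System.out.println(filter(xml, Collections.singleton("item")));
    }
}
```

To run a real stylesheet instead of the identity transform, the Transformer could be created with newTransformer(new StreamSource(...)) and handed the same SAXSource. Note that a wanted element nested inside a dropped one comes out as a direct child of the document element.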

Note that the next stage has to be aware that it's seeing a filtered
view of the document; if you've passed along only some subtrees, search
patterns that look at the context they appeared in may of course not
work as expected. For example, if you're prefiltering before running a
stylesheet, some of the XPaths in that stylesheet may have to be rewritten.
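
To make the XPath point concrete, here is a small sketch using the JDK's javax.xml.xpath API. The two documents and the paths are invented for illustration: ORIGINAL has a "section" wrapper around the items, and FILTERED is what a prefilter might emit after dropping that wrapper.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FilteredXPathDemo {
    // Hypothetical source document: items live under a <section> wrapper.
    static final String ORIGINAL =
        "<report><section><item>a</item><item>b</item></section></report>";
    // Hypothetical prefiltered view: only <item> subtrees and the document element survive.
    static final String FILTERED =
        "<report><item>a</item><item>b</item></report>";

    /** Counts the nodes selected by the given path in the given document. */
    static int count(String xml, String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(" + path + ")",
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // The path written against the original document finds both items there...
        System.out.println(count(ORIGINAL, "/report/section/item")); // prints 2
        // ...but finds nothing in the filtered view, because <section> is gone.
        System.out.println(count(FILTERED, "/report/section/item")); // prints 0
        // The path has to be rewritten for the filtered structure.
        System.out.println(count(FILTERED, "/report/item"));         // prints 2
    }
}
```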

As I say, I haven't had much trouble running recent versions of Xalan on
large documents... but this kind of explicit prefiltering may save you
some cycles and storage, at the cost of requiring more cycles of
developer time to create and maintain it.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Feb 11 '06 #12

Thanks, Joe. I shall take a look at org.xml.sax.helpers.XMLFilterImpl.

--Jeff

Feb 13 '06 #13
