huge XML files, XSLT memory problems, Java & SAX... - Page 2

Jeff Calico

I have 2 XML data files that I want to extract data from simultaneously
and transform with XSLT to generate a report. The first file is huge,
and when XSLT builds the DOM tree in memory, it runs out of space.

I only need a few branches of elements from the original XML, so I am
seeking a recommended way of building a DOM for XSLT of only the
elements that I need. I'm writing a Java application that invokes Xalan,
and reading up on SAX parsers this afternoon... I'm sure this is a common
problem, and as such, there is probably a clean and easy way to do it,
but I haven't found that one yet...

thanks,
Jeff

Feb 8 '06


Henry S. Thompson

Don't write Java, use a pipeline -- Markup pipeline demo site [1]
includes a pipeline to split a large document into small chunks for
validation, similar approach would work for.

ht

[1] http://www.markup.co.uk/showcase/
--
Henry S. Thompson, Markup Technology Ltd.
4 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- +44 (0) 7866 471 388
Fax: (44) 131 650-4587, e-mail: ht@markuptechnology.com
URL: http://www.markup.co.uk/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Feb 10 '06 #11

Joe Kesselman

> Lemme see if I've got anything on tap that's simple enough to be a good
> pedagogical illustration...

I don't have anything really good on hand, but look for examples of use
of org.xml.sax.helpers.XMLFilterImpl.

That starts out as a no-op filter which just passes everything through.
What you'd need to do is add enough logic to recognize which portions of
the document you're interested in, and pass those (and only those) along
to the next stage of processing. Plus, probably, the document element
(or a synthesized document element) so it's well-formed XML. Handling
namespaces properly complicates this somewhat but not horribly.
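
For readers who want a starting point, here is a minimal sketch of the kind of filter described above. It is not code from this thread: the class name SubtreeFilter, the helper filterToString, and the element names in main are all made up for illustration. It assumes a namespace-aware parser and selects subtrees by local name only. It filters elements and text; other events (processing instructions, prefix mappings) are left to XMLFilterImpl's default pass-through.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical example: forwards the document element plus every subtree
// whose root element has one of the wanted local names; drops the rest.
class SubtreeFilter extends XMLFilterImpl {
    private final Set<String> wanted;
    private int depth = 0;   // nesting depth in the original document
    private int inside = 0;  // nesting depth within a kept subtree (0 = outside)

    SubtreeFilter(XMLReader parent, Set<String> wanted) {
        super(parent);
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        depth++;
        if (inside > 0) {
            inside++;                                   // descendant of a kept element
            super.startElement(uri, localName, qName, atts);
        } else if (wanted.contains(localName)) {
            inside = 1;                                 // a kept subtree starts here
            super.startElement(uri, localName, qName, atts);
        } else if (depth == 1) {
            super.startElement(uri, localName, qName, atts); // keep document element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (inside > 0) {
            inside--;
            super.endElement(uri, localName, qName);
        } else if (depth == 1) {
            super.endElement(uri, localName, qName);
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inside > 0) super.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        if (inside > 0) super.ignorableWhitespace(ch, start, length);
    }

    // Runs the filter in front of an identity transform and returns the result.
    static String filterToString(String xml, Set<String> wanted) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);  // needed so localName is populated
            XMLReader parser = spf.newSAXParser().getXMLReader();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(new SubtreeFilter(parser, wanted),
                              new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><noise>x</noise><item id=\"1\">a</item></report>";
        System.out.println(filterToString(xml, Set.of("item")));
    }
}
```

To prefilter ahead of a real stylesheet rather than an identity transform, the same SAXSource could be handed to a Transformer built from that stylesheet, so the processor only ever builds its tree from the events the filter lets through.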

Note that the next stage has to be aware that it's seeing a filtered
view of the document; if you've passed along only some subtrees, search
patterns that look at the context they appeared in may of course not
work as expected. For example, if you're prefiltering before running a
stylesheet, some of the XPaths in that stylesheet may have to be rewritten.
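
A small illustration of that point, using invented documents (the element names report, section, and item are assumptions, not anything from the original poster's data). If the prefilter drops a wrapper element, a path that went through the wrapper no longer matches in the filtered view and has to be shortened:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example: the same XPath evaluated against the original
// document and against the view a prefilter might pass along.
class FilteredViewXPath {
    // What the stylesheet would have seen without prefiltering.
    static final String ORIGINAL =
        "<report><section><item>a</item><item>b</item></section></report>";
    // What it sees if the filter keeps only <item> subtrees plus the root.
    static final String FILTERED =
        "<report><item>a</item><item>b</item></report>";

    // Counts the nodes that the given path selects in the given document.
    static int count(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                "count(" + path + ")",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Path written for the original structure.
        System.out.println(count(ORIGINAL, "/report/section/item"));
        // Same path on the filtered view: the wrapper is gone.
        System.out.println(count(FILTERED, "/report/section/item"));
        // Path rewritten for the filtered view.
        System.out.println(count(FILTERED, "/report/item"));
    }
}
```

Patterns that do not depend on ancestors, such as a template matching just item, would be unaffected by this kind of filtering.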

As I say, I haven't had much trouble running recent versions of Xalan on
large documents... but this kind of explicit prefiltering may save you
some cycles and storage, at the cost of requiring more cycles of
developer time to create and maintain it.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Feb 11 '06 #12

Jeff Calico

Thanks, Joe. I shall take a look at org.xml.sax.helpers.XMLFilterImpl.

--Jeff

Feb 13 '06 #13
