DOM sub trees whilst SAX'ing in perl?

I need to process some XML files that are rather large.
However their structure may usefully be expressed
as
<!ELEMENT FILE (RECORD)+>
..
..
..

Each record is a few KB. The files are many tens of megabytes.

I would (dearly) like to use DOM to process each record,
since it's easier to get my head round than SAX events.

But I don't want to pull the whole file into
a DOM tree; it's too big.

These people have come up with a perfect (and obvious?)
solution:
http://www.devsphere.com/xml/saxdomix/

But I'm coding in a Perl environment.

Is there a similar Module, generating separate
DOM sub trees for Perl?

BugBear
Jul 20 '05 #1
bugbear wrote:
> Is there a similar Module, generating separate
> DOM sub trees for Perl?

It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
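
For readers who want to see the shape of this, here is a minimal sketch of per-record processing with XML::Twig. The sample document, the `id` attribute, and the `NAME` child element are assumptions made up for illustration; only the FILE/RECORD structure comes from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; for a real file, call $twig->parsefile($path).
my $xml = <<'XML';
<FILE>
  <RECORD id="1"><NAME>alpha</NAME></RECORD>
  <RECORD id="2"><NAME>beta</NAME></RECORD>
</FILE>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per RECORD, as soon as its end tag has been parsed.
        # $record is a small tree holding just that record.
        RECORD => sub {
            my ($t, $record) = @_;
            push @seen,
                $record->att('id') . ':' . $record->first_child_text('NAME');
            $t->purge;    # release everything parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
```

The call to `purge` is what keeps memory use roughly proportional to one record rather than to the whole file.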

--
mirod
Jul 20 '05 #2
Michel Rodriguez wrote:
> It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.

OK. That does the right thing, but I'd prefer to stay with standards
(i.e. SAX and DOM) if possible. I'll keep looking, and bear
XML::Twig in mind as a fallback.

BugBear
Jul 20 '05 #3
SL
> OK. That does the right thing; I'd prefer to stay with standards
> (i.e. SAX and DOM) if possible. I'll keep looking, and bear
> XML::Twig in mind as a fall back position.

I haven't used it in a while, but there is (or was) a package on CPAN that
does what you want: DocSplitter in XML::SAX::Machines. It lets you split
a SAX stream into several smaller documents by emitting a startDocument()
event before, and an endDocument() event after, a particular element. For
instance, you can split your stream on each RECORD element, so that each
filter further down the pipeline processes a RECORD element as the root
element of a distinct document. This is particularly useful with the
XML::Filter::XSLT filter by Matt Sergeant. If you want to merge the results
of the transformation back into one big document, you can use a "Merger" in
the pipeline package; it works with the splitter to remove the extra
startDocument() and endDocument() events. Machines provides several
facilities for dealing with SAX pipelines.

HTH,
SL
Jul 20 '05 #4
SL wrote:
> [...] there is (or was) a package doing what
> you want on CPAN: DocSplitter in XML::SAX::Machines. [...]


So how do I get my DOM(s)?

BugBear
Jul 20 '05 #5
SL
> So how do I get my DOM(s)?

Look at the XML::Filter::XSLT::LibXSLT filter: it uses
XML::LibXML::SAX::Builder to build a DOM from the SAX events it receives.
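
A rough sketch of that idea, assuming a hand-written handler rather than DocSplitter itself: it wraps each RECORD in its own document events and passes them to a fresh XML::LibXML::SAX::Builder, so one small DOM comes out per record. The `RecordDomBuilder` class, its `record_name` and `on_record` parameters, and the sample data are hypothetical names invented for this example; namespace prefix mappings, comments, and CDATA events are not forwarded.

```perl
package RecordDomBuilder;
use strict;
use warnings;
use base 'XML::SAX::Base';
use XML::LibXML;
use XML::LibXML::SAX::Builder;

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new();
    $self->{record_name} = $args{record_name};   # element to split on
    $self->{on_record}   = $args{on_record};     # callback, receives a DOM
    $self->{builder}     = undef;                # set only inside a record
    $self->{depth}       = 0;
    return $self;
}

sub start_element {
    my ($self, $el) = @_;
    if (!$self->{builder} && $el->{LocalName} eq $self->{record_name}) {
        # Start a new, separate document for this record.
        $self->{builder} = XML::LibXML::SAX::Builder->new;
        $self->{builder}->start_document({});
        $self->{depth} = 0;
    }
    return unless $self->{builder};
    $self->{depth}++;
    $self->{builder}->start_element($el);
}

sub characters {
    my ($self, $data) = @_;
    $self->{builder}->characters($data) if $self->{builder};
}

sub end_element {
    my ($self, $el) = @_;
    my $builder = $self->{builder} or return;
    $builder->end_element($el);
    return if --$self->{depth} > 0;
    # The record's own end tag: close the document and hand over the DOM.
    $builder->end_document({});
    $self->{builder} = undef;
    $self->{on_record}->($builder->result);
}

package main;
use XML::LibXML::SAX;

# Hypothetical sample input.
my $xml = '<FILE><RECORD><NAME>alpha</NAME></RECORD>'
        . '<RECORD><NAME>beta</NAME></RECORD></FILE>';

my @names;
my $handler = RecordDomBuilder->new(
    record_name => 'RECORD',
    on_record   => sub {
        my ($dom) = @_;    # an XML::LibXML::Document rooted at RECORD
        push @names, $dom->documentElement->findvalue('NAME');
    },
);
XML::LibXML::SAX->new(Handler => $handler)->parse_string($xml);

print "$_\n" for @names;
```

For a file on disk, `parse_uri` can be used in place of `parse_string`, so the whole input is never held as one tree.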

SL
Jul 20 '05 #6
