"drop-in" DOM replacement for minidom?

We've run into minidom's inability to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?
Jul 18 '05 #1
Quoting Paul Miller (pa**@fxtech.com):
We've run into minidom's inability to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?


I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems. PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.

Obviously it won't help in your case.
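
For anyone who can loosen the full-DOM requirement, here is a rough sketch of the pulldom approach mentioned above, written for present-day Python. The element name "entry", the "id" attribute, and the helper iter_records are made up for illustration. The idea is that only one small subtree is expanded into DOM nodes at a time, rather than the whole document.

```python
import io
from xml.dom import pulldom


def iter_records(source, tag):
    """Yield one minidom subtree per <tag> element (hypothetical helper)."""
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build DOM nodes for this element only, not the whole file.
            events.expandNode(node)
            yield node


# Tiny in-memory stand-in for a large file on disk.
sample = io.BytesIO(b"<log><entry id='1'>a</entry><entry id='2'>b</entry></log>")
ids = [node.getAttribute("id") for node in iter_records(sample, "entry")]
```

Each yielded node is an ordinary minidom element, so existing minidom code can usually be reused on the pieces, as long as it never needs to see the whole tree at once.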

I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile, you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental") that builds the parse
tree in C, with a minimal memory consumption.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

http://www.reportlab.com/xml/pyrxp.html

Best of luck!

--G.

--
Geoff Gerrietts "Whenever people agree with me I always
<geoff at gerrietts net> feel I must be wrong." --Oscar Wilde

Jul 18 '05 #2
Harry George <ha************@boeing.com> wrote in message news:<xq*************@cola2.ca.boeing.com>...
Paul Miller <pa**@fxtech.com> writes:

Switching to
SAX was a major improvement in mem usage and thus in parse time.


As an alternative you can easily build a custom, lightweight, Object
Model. I'm using one designed naively to reflect the set of elements
used in the several XML schemas we use. I use SAX to parse the
document into our object model and have the convenience of programming
with the nicer (in some ways DOM like) interface.

Basically there is a class Element which (since Python 2.2, where
built-in types can be subclassed) is a subclass of list. By convention
it can contain either a unicode string (CDATA) or another element. The
XML attributes can be either stored as a dictionary or, as I eventually
did, directly as attributes of the class. Record the parent element
(aka location), add some methods such as nextSibling(), and you're on
your way.
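
A minimal sketch of that idea in present-day Python, not the poster's actual code. The names TreeBuilder and parse_string are assumptions, and attributes are kept in a plain dict here rather than as instance attributes.

```python
import xml.sax


class Element(list):
    """A list whose items are child Elements or text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        super().__init__()
        self.tag = tag
        self.attrs = dict(attrs.items()) if attrs is not None else {}
        self.parent = parent

    def nextSibling(self):
        """Return the item after this one in the parent, or None."""
        if self.parent is None:
            return None
        siblings = self.parent
        # Compare by identity: list equality would treat two empty
        # Elements as equal, so list.index() is not safe here.
        for i, item in enumerate(siblings):
            if item is self:
                return siblings[i + 1] if i + 1 < len(siblings) else None
        return None


class TreeBuilder(xml.sax.ContentHandler):
    """SAX handler that assembles Elements as events arrive."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        node = Element(name, attrs, parent=self.current)
        # "is None" matters: an Element with no children is falsy.
        if self.current is None:
            self.root = node
        else:
            self.current.append(node)
        self.current = node

    def characters(self, content):
        # Skip whitespace-only text to keep the tree small.
        if self.current is not None and content.strip():
            self.current.append(content)

    def endElement(self, name):
        self.current = self.current.parent


def parse_string(text):
    builder = TreeBuilder()
    xml.sax.parseString(text.encode("utf-8"), builder)
    return builder.root


root = parse_string("<doc><a id='1'>hi</a><b/></doc>")
first = root[0]
```

Because Element is a list, len(root), root[0], and plain for loops all work on the tree without any extra API.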

In our case I've adopted a naive approach, i.e. there is a separate
class for every type of XML element (all of which ultimately derive from
Element). This suffers from being non-general (it is tied to the
specific set of schemas we use), but it has the advantage that you
don't have to look up what kind of Element you are dealing with and
decide what to do with it; you can use polymorphism instead.
Further, there is no conceptual difference between a chunk of XML and
the Python object structure (i.e. Elements within Elements) used to
represent it.
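
To make the class-per-element idea concrete, here is a small sketch in present-day Python. The schema is invented: the tags "order" and "item", the attributes "price" and "qty", the CLASS_FOR_TAG table, and the total() method are all assumptions for illustration only.

```python
import xml.sax


class Element(list):
    """Base class: a list of child Elements."""

    def __init__(self, attrs=None, parent=None):
        super().__init__()
        self.attrs = dict(attrs.items()) if attrs is not None else {}
        self.parent = parent

    def total(self):
        # Default behaviour: delegate to the children.
        return sum(child.total() for child in self)


class Order(Element):
    """Uses the inherited total(): the sum of its items."""


class Item(Element):
    def total(self):
        return float(self.attrs["price"]) * int(self.attrs.get("qty", 1))


# Hypothetical lookup table: one class per element type in the schema.
CLASS_FOR_TAG = {"order": Order, "item": Item}


class Builder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        cls = CLASS_FOR_TAG.get(name, Element)
        node = cls(attrs, parent=self.current)
        if self.current is None:
            self.root = node
        else:
            self.current.append(node)
        self.current = node

    def endElement(self, name):
        self.current = self.current.parent


builder = Builder()
xml.sax.parseString(
    b"<order><item price='2.50' qty='2'/><item price='1.00'/></order>",
    builder,
)
order = builder.root
order_total = order.total()
```

The caller never inspects tag names: order.total() simply dispatches to whichever class each node happens to be.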

It was because Python was so ideally suited to this kind of thing
that I originally adopted it. As an aside, I wrote an XSLT sheet
which reads the various xml-schema files (I only write DTDs myself,
relying on converters to generate xsd) and writes out the Python stub
code (i.e. creates the basic class definition for each element, adding
the appropriate attributes etc.). This saves a lot of boring boilerplate
typing and allows for quick and accurate code updates if new
attributes are added to the schema.

Going about it in this kind of way, you get something of much lighter
weight than DOM, but which does have that nice structural (as opposed
to SAX's event-driven) way of working with XML.
Jul 18 '05 #3
On Wed, 13 Aug 2003 11:09:39 -0500, Paul Miller <pa**@fxtech.com> wrote:
We've run into minidom's inability to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?

I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
Can you assume that you are dealing with valid (error-free) XML as input?

Regards,
Bengt Richter
Jul 18 '05 #4
Geoff Gerrietts <ge***@gerrietts.net> wrote in message news:<ma**********************************@python. org>...
Quoting Paul Miller (pa**@fxtech.com):
We've run into minidom's inability to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?

I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.


My response is usually more like "what are you using a single 30MB
XML file for?"

I've long maintained that when working with XML, modest document sizes
are very important, regardless of what tools you're using.

But that having been said, some documents are 30MB, and it makes sense
that they're 30MB, and that's just the way it is.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems.

4Suite used to have byte-ordering problems, originally reported under
Solaris 9, and also affecting some Mac OS X users. Those are fixed
now.

PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.

Obviously it won't help in your case.

pulldom is always worth considering.

http://www-106.ibm.com/developerwork...tipulldom.html

I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile,

Which I hardly expect to be a problem.

you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental")

cDomlette hasn't been experimental for nearly a year now. We use it
heavily in production.

that builds the parse
tree in C, with a minimal memory consumption.

And fast parse and mutation time.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Your memories must be from long ago :-) That API is how it's been for
a while.

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

http://www.reportlab.com/xml/pyrxp.html

Best of luck!


Ditto.

--Uche
http://uche.ogbuji.net
Jul 18 '05 #5
>>Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?

I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
Can you assume that you are dealing with valid (error-free) XML as input?


Yes, it is assumed to be valid. We don't even use a DTD. But we use the DOM
to point to later nodes in the tree by following references in nodes higher
in the tree.

But, building a sparse object model initially and resolving references
later might be the right solution.
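
One possible shape for that, sketched in present-day Python. It assumes references are expressed as a "ref" attribute naming another element's "id" attribute; both attribute names, the Node and SparseBuilder classes, and the sample tags are invented for illustration. Nodes are indexed as they stream past, and forward references are resolved once the whole document has been seen.

```python
import xml.sax


class Node:
    """Sparse node: tag, attributes, children, and a resolved reference."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []
        self.target = None  # set once references are resolved


class SparseBuilder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []
        self.by_id = {}     # id value -> Node
        self.pending = []   # nodes whose "ref" is not yet resolved

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)
        if "id" in node.attrs:
            self.by_id[node.attrs["id"]] = node
        if "ref" in node.attrs:
            # The target may appear later in the file, so defer the lookup.
            self.pending.append(node)

    def endElement(self, name):
        self.stack.pop()

    def endDocument(self):
        # Second pass: every id is known now. Dangling refs stay None.
        for node in self.pending:
            node.target = self.by_id.get(node.attrs["ref"])
        self.pending = []


def load(text):
    builder = SparseBuilder()
    xml.sax.parseString(text.encode("utf-8"), builder)
    return builder.root


doc = load("<scene><use ref='m1'/><mesh id='m1' name='cube'/><use ref='x'/></scene>")
use, mesh, dangling = doc.children
```

After loading, use.target is the mesh node itself, so following a reference is a plain attribute access rather than a search through the tree.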
Jul 18 '05 #6
