"drop-in" DOM replacement for minidom?

Paul Miller

We've run into minidom's inabilty to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferrably written in C?

Jul 18 '05 #1

Subscribe Post Reply

2486

Geoff Gerrietts

Quoting Paul Miller (pa**@fxtech.com):

We've run into minidom's inabilty to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferrably written in C?

I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems. PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.

Obviously it won't help in your case.

I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile, you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental") that builds the parse
tree in C, with a minimal memory consumption.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

http://www.reportlab.com/xml/pyrxp.html

Best of luck!

--G.

--
Geoff Gerrietts "Whenever people agree with me I always
<geoff at gerrietts net> feel I must be wrong." --Oscar Wilde

Jul 18 '05 #2

Armin Wittfoth

Harry George <ha************@boeing.com> wrote in message news:<xq*************@cola2.ca.boeing.com>...

Paul Miller <pa**@fxtech.com> writes:

Switching to
SAX was a major improvement in mem usage and thus in parse time.

As an alternative you can easily build a custom, lightweight, Object
Model. I'm using one designed naively to reflect the set of elements
used in the several XML schemas we use. I use SAX to parse the
document into our object model and have the convenience of programming
with the nicer (in some ways DOM like) interface.

Basically there is a class Element which (since 2.2) is a child of
list. By convention it can contain either a unicode string (CDATA) or
another element. The XML attributes can be either stored as a
dictionary or, as I eventually did, directly as attributes of the
class. Record the parent element (aka location), add some methods
such as nextSibling() etc and you're on your way.

In our case I've adopted a naive approach, ie there is a separate
class for every type of XML element (which all ultimately derive from
Element). This suffers from being non-general (ie specific, to the
specific set of schema we use), but it has the advantage that you
don't have to look up what kind of Element you are dealing with and
determine what to do with it, but can use polymorphism nicely.
Further there is no conceptual difference between a chunk of XML, and
the python object structure (ie Elements within Elements) used to
represent it.

It was because Python was so ideally suited to this kind of thing,
that I originally adopted it. As an aside I wrote an XLST sheet,
which reads the various xml-schema files (I only write DTDs myself,
relying on converters to generate xsd), and writes out the python stub
code, (ie creates the basic class definition for each element adding
the appropriate attributes etc), saving a lot of boring boilerplate
typing and allows for quick and accurate code updates if new
attributes are added to the schema.

Going about it in this kind of way, you get something of much lighter
weight than DOM, but which does have that nice structural (as opposed
to SAX's event-driven) way of working with XML.

Jul 18 '05 #3

Bengt Richter

On Wed, 13 Aug 2003 11:09:39 -0500, Paul Miller <pa**@fxtech.com> wrote:

We've run into minidom's inabilty to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferrably written in C?

I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
Can you assume that you are dealing with valid (error-free) XML as input?

Regards,
Bengt Richter

Jul 18 '05 #4

Uche Ogbuji

Geoff Gerrietts <ge***@gerrietts.net> wrote in message news:<ma**********************************@python. org>...

Quoting Paul Miller (pa**@fxtech.com):
We've run into minidom's inabilty to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferrably written in C?
I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.

My response is usually more like "what are you using XML for a single
30MB file for?"

I've long maintained that when working with XML, modest document sizes
is very important, regardless of what tools you're using.

But that having been said, some documents are 30MB, and it makes sense
that they're 30MB, and that's just the way it is.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems.
4Suite used to have byte-ordering problems, originally reported under
Solaris 9, and also affecting some Mac OS X users. Those are fixed
now.

PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.

Obviously it won't help in your case.
pulldom is always worth considering.

http://www-106.ibm.com/developerwork...tipulldom.html
I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile,
Which I hardly expect to be a problem.
you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental")
cDomlette hasn't been experimental for nearly a year now. We use it
heavily in production.

that builds the parse
tree in C, with a minimal memory consumption.
And fast parse and mutation time.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

http://uche.ogbuji.net/tech/akara/no...1-01/domlettes
Your memories must be from long ago :-) That API is how it's been for
a while.

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

http://www.reportlab.com/xml/pyrxp.html

Best of luck!

Ditto.

--Uche
http://uche.ogbuji.net

Jul 18 '05 #5

Paul Miller

>>Has someone done a more efficient minidom replacement module that we can

just drop in? Preferrably written in C?

I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
Can you assume that you are dealing with valid (error-free) XML as input?

Yes, it is assumed to be valid. We don't even use a DTD. But we use the DOM
to point to later nodes in the tree by following references in nodes higher
in the tree.

But, building a sparse object model initially and resolving references
later might be the right solution.

Jul 18 '05 #6

by: Miles Davenport | last post by:

I would like some advice on what Java server-side alternatives their are to an applet which is a shopping cart application which allows the user to drag-and-drop individual items into "an order"...

Java

win32: "LoadBarState failed" and flaming death

by: gcash | last post by:

I'm doing stuff using COM to automate IE, for some boring repetitive portal tasks I have to do. Anyway, every so often I'll make some sort of error and things will crash, and I'll have to do...

Python

"Web Controls"

by: Archana | last post by:

I am looking for a web based drawing control but have had no luck findin one. I want something that is similar to Visio where there is a palette o objects that get dragged to a drawing surface or...

.NET Framework

How to Make "Mouse-Over" Drop-Down Menus

by: Larry R Harrison Jr | last post by:

I am looking for javascript and a basic tutorial on how to make mouse-over drop-down menus--the type that when you "hover" over a subject links relevant to that subject "emerge" which you can then...

Javascript

PDF "File Download" window?

by: AES/newspost | last post by:

On many web sites or pages (including my own home page) clicking on certain links will start downloading a PDF file, sometimes without the author having provided any warning in the text of the page...

HTML / CSS

"Drop down" effect missing from Listbox

by: Kyle Blaney | last post by:

I am using a Listbox and can not get the "drop down" effect. My listbox is populated with 20 items and a vertical scrollbar is automatically added. One item is visible. When I click on the...

C# / C Sharp

How to: set the selected "drop down list" value at run time.

by: charliewest | last post by:

I need to set the selected drop down list value at run time. I am aware of the method "SelectIndex" however this works only if you know the precise location of the value within the ListItem...

C# / C Sharp

How to execute sql command just like "Drop DATABASE " and "Restore DATABASE "?

by: Risen | last post by:

Hi,all, I want to execute SQL command " DROP DATABASE mydb" and "Restore DATABASE ....." in vb.net 2003. But it always shows error. If any body can tell me how to execute sql command as above?...

Visual Basic .NET

type="text/css" (?)

by: Rich | last post by:

Is the link rel="stylesheet" supposed to be real plain text, or would some word processor format such as Word/Pad work? This sample stylesheet seems garbled if downloaded and opened with...

HTML / CSS

"Simple" drop down menu?

by: Doogly | last post by:

Hello everyone, I'm really new to Javascript and was wondering if anyone could give me code for a nice looking drop down menu. Ideally it would do that little scrolly animation thingy when it...

Javascript

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

"drop-in" DOM replacement for minidom?

Similar topics