"drop-in" DOM replacement for minidom?

We've run into minidom's inability to handle large (20+MB) XML files, and
need a replacement that can handle it. Unfortunately, we're pretty
dependent on a DOM, so a pulldom or SAX replacement is likely out of the
question for now.

Has someone done a more efficient minidom replacement module that we can
just drop in? Preferably written in C?
Jul 18 '05 #1
Quoting Paul Miller (pa**@fxtech.com):
> We've run into minidom's inability to handle large (20+MB) XML files, and
> need a replacement that can handle it. Unfortunately, we're pretty
> dependent on a DOM, so a pulldom or SAX replacement is likely out of the
> question for now.
>
> Has someone done a more efficient minidom replacement module that we can
> just drop in? Preferably written in C?

I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems. PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.

Obviously it won't help in your case.

I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile, you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental") that builds the parse
tree in C, with minimal memory consumption.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

http://www.reportlab.com/xml/pyrxp.html

Best of luck!

--G.

--
Geoff Gerrietts <geoff at gerrietts net>
"Whenever people agree with me I always feel I must be wrong." --Oscar Wilde

Jul 18 '05 #2
Harry George <ha************@boeing.com> wrote in message news:<xq*************@cola2.ca.boeing.com>...
> Paul Miller <pa**@fxtech.com> writes:
>
> Switching to SAX was a major improvement in mem usage and thus in parse time.


As an alternative you can easily build a custom, lightweight object
model. I'm using one designed naively to reflect the set of elements
used in the several XML schemas we use. I use SAX to parse the
document into our object model and have the convenience of programming
with the nicer (in some ways DOM-like) interface.

Basically there is a class Element which (since Python 2.2) is a subclass
of list. By convention it can contain either a unicode string (CDATA) or
another element. The XML attributes can be either stored as a
dictionary or, as I eventually did, directly as attributes of the
class. Record the parent element (aka location), add some methods
such as nextSibling() etc. and you're on your way.
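
For anyone who wants to try this, here is a minimal sketch of the idea in current Python 3, not the code I actually run. The names `TreeBuilder`, `parse_string`, `next_sibling` and `text` are made up for illustration, and this version keeps the XML attributes in a dictionary. Siblings are located by identity rather than with `list.index()`, because two list subclasses with equal contents compare equal.

```python
import xml.sax


class Element(list):
    """A lightweight node: the children live in the list itself."""

    def __init__(self, name, attrs=None, parent=None):
        super().__init__()
        self.name = name
        # XML attributes kept in a plain dictionary.
        self.attrs = dict(attrs.items()) if attrs is not None else {}
        self.parent = parent  # None for the root element

    def next_sibling(self):
        """Return the node that follows this one in its parent, or None."""
        if self.parent is None:
            return None
        siblings = self.parent
        for i, node in enumerate(siblings):
            if node is self:  # compare by identity, not by list equality
                return siblings[i + 1] if i + 1 < len(siblings) else None
        return None

    def text(self):
        """Concatenate the string children of this element."""
        return "".join(child for child in self if isinstance(child, str))


class TreeBuilder(xml.sax.ContentHandler):
    """SAX handler that assembles Element objects as events arrive."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        node = Element(name, attrs, parent=self.current)
        if self.current is None:
            self.root = node
        else:
            self.current.append(node)
        self.current = node

    def characters(self, content):
        # Whitespace-only runs are dropped to keep the tree small.
        if self.current is not None and content.strip():
            self.current.append(content)

    def endElement(self, name):
        self.current = self.current.parent


def parse_string(xml_text):
    """Parse a str of XML and return the root Element."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

Note that an empty Element is falsy (it is an empty list), which is why the checks above use `is None` rather than plain truth tests.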

In our case I've adopted a naive approach, i.e. there is a separate
class for every type of XML element (all of which ultimately derive from
Element). This suffers from being non-general (i.e. specific to the
set of schemas we use), but it has the advantage that you
don't have to look up what kind of Element you are dealing with and
determine what to do with it; you can use polymorphism instead.
Further, there is no conceptual difference between a chunk of XML and
the Python object structure (i.e. Elements within Elements) used to
represent it.
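
A rough sketch of the class-per-element idea, again in current Python 3 and with invented names: the `title` and `para` elements, the `render()` method, the `ELEMENT_CLASSES` table and the `build()` helper are all hypothetical, standing in for whatever your own schemas define. Here the XML attributes are stored directly on the instance.

```python
import xml.sax


class Element(list):
    """Base node; the children are stored in the list itself."""

    def __init__(self, attrs, parent=None):
        super().__init__()
        # Store XML attributes directly as instance attributes.
        for key, value in attrs.items():
            setattr(self, key, value)
        # Set last so an XML attribute called "parent" cannot clobber it.
        self.parent = parent

    def render(self):
        """Default behaviour: concatenate whatever the children render."""
        return "".join(
            child if isinstance(child, str) else child.render()
            for child in self
        )


class Title(Element):  # hypothetical <title> element
    def render(self):
        return super().render().upper() + "\n"


class Para(Element):  # hypothetical <para> element
    def render(self):
        return super().render() + "\n"


# Maps tag names to classes; unknown tags fall back to plain Element.
ELEMENT_CLASSES = {"title": Title, "para": Para}


class Builder(xml.sax.ContentHandler):
    """SAX handler that instantiates the class registered for each tag."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        cls = ELEMENT_CLASSES.get(name, Element)
        node = cls(attrs, parent=self.current)
        if self.current is None:
            self.root = node
        else:
            self.current.append(node)
        self.current = node

    def characters(self, content):
        if self.current is not None:
            self.current.append(content)

    def endElement(self, name):
        self.current = self.current.parent


def build(xml_bytes):
    """Parse bytes of XML and return the root node."""
    handler = Builder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root
```

Calling `render()` on the root then dispatches to the right class at every level without any tag-name lookups in the calling code, which is the point of the approach.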

It was because Python was so ideally suited to this kind of thing
that I originally adopted it. As an aside, I wrote an XSLT stylesheet
which reads the various XML Schema files (I only write DTDs myself,
relying on converters to generate the xsd) and writes out the Python stub
code (i.e. creates the basic class definition for each element, adding
the appropriate attributes etc.). This saves a lot of boring boilerplate
typing and allows for quick and accurate code updates if new
attributes are added to the schema.

Going about it in this kind of way, you get something of much lighter
weight than DOM, but which does have that nice structural (as opposed
to SAX's event-driven) way of working with XML.
Jul 18 '05 #3
On Wed, 13 Aug 2003 11:09:39 -0500, Paul Miller <pa**@fxtech.com> wrote:
> We've run into minidom's inability to handle large (20+MB) XML files, and
> need a replacement that can handle it. Unfortunately, we're pretty
> dependent on a DOM, so a pulldom or SAX replacement is likely out of the
> question for now.
>
> Has someone done a more efficient minidom replacement module that we can
> just drop in? Preferably written in C?

I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
Can you assume that you are dealing with valid (error-free) XML as input?

Regards,
Bengt Richter
Jul 18 '05 #4
Geoff Gerrietts <ge***@gerrietts.net> wrote in message news:<ma**********************************@python.org>...
> Quoting Paul Miller (pa**@fxtech.com):
> > We've run into minidom's inability to handle large (20+MB) XML files, and
> > need a replacement that can handle it. Unfortunately, we're pretty
> > dependent on a DOM, so a pulldom or SAX replacement is likely out of the
> > question for now.
> >
> > Has someone done a more efficient minidom replacement module that we can
> > just drop in? Preferably written in C?
>
> I've posted on a related topic in the past, when a friend of mine was
> blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
> every response I got was of the general form "well what the hell are
> you using DOM for? are you defective?" Some were more diplomatic than
> others.

My response is usually more like "why are you using XML for a single
30MB file?"

I've long maintained that when working with XML, keeping document sizes
modest is very important, regardless of what tools you're using.

But that having been said, some documents are 30MB, and it makes sense
that they're 30MB, and that's just the way it is.

> My friend also had some more challenging problems. He was running on a
> DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
> byte-ordering problems.

4Suite used to have byte-ordering problems, originally reported under
Solaris 9, and also affecting some Mac OS X users. Those are fixed
now.

> PyRXP wouldn't compile for him, if I recall
> correctly -- or maybe there were licensing problems? Anyway, he
> ultimately settled on using pulldom; that gave him simplicity, speed,
> and a small enough memory profile that it satisfied his needs.
>
> Obviously it won't help in your case.

pulldom is always worth considering.

http://www-106.ibm.com/developerwork...tipulldom.html
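
For readers who haven't used it, here is a minimal sketch of the pulldom style using the standard library's `xml.dom.pulldom`, written for current Python 3. The helper name `iter_records` and the `log`/`entry` element names are invented for the example. The idea is that you stream through the document and expand only the subtrees you care about into ordinary minidom nodes, so the usual DOM methods still work on each piece.

```python
import io
from xml.dom import pulldom


def iter_records(stream, tag):
    """Yield one fully built minidom subtree per <tag> element."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull in just this element's descendants; the rest of the
            # document is never built into a tree.
            events.expandNode(node)
            yield node


# Hypothetical sample document; a real caller would pass an open file.
sample = io.BytesIO(
    b"<log><entry id='1'>a</entry><entry id='2'>b</entry></log>"
)
for entry in iter_records(sample, "entry"):
    print(entry.getAttribute("id"), entry.firstChild.data)
```

Whether this fits depends on how far apart the nodes you need to relate are; it works best when each record can be processed on its own.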
> I don't think you'll find something that precisely mimics the minidom
> module's interface, so you're going to hafta do some retooling.
> However, I believe that if you can get 4Suite to compile,

Which I hardly expect to be a problem.

> you might
> find some love in there. There's a cDomlette component (labelled at
> the time of my last reading as "experimental")

cDomlette hasn't been experimental for nearly a year now. We use it
heavily in production.

> that builds the parse
> tree in C, with minimal memory consumption.

And fast parse and mutation time.

> Here's a link to something that should tell you how to make it work
> (though when I personally used cDomlette, I seem to remember it being
> harder than this....)
>
> http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Your memories must be from long ago :-) That API is how it's been for
a while.

> Also, you may be interested in looking at the comparisons done by the
> PyRXP folks on their page:
>
> http://www.reportlab.com/xml/pyrxp.html
>
> Best of luck!

Ditto.

--Uche
http://uche.ogbuji.net
Jul 18 '05 #5
>> Has someone done a more efficient minidom replacement module that we can
>> just drop in? Preferably written in C?
>
> I'm curious how DOM dependent you really are. I.e., what minidom methods do you really use?
> Can you assume that you are dealing with valid (error-free) XML as input?


Yes, it is assumed to be valid. We don't even use a DTD. But we use the DOM
to point to later nodes in the tree by following references in nodes higher
in the tree.

But, building a sparse object model initially and resolving references
later might be the right solution.
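
A rough sketch of what that could look like, in current Python 3 and under some loud assumptions: that targets carry an `id` attribute and that referring nodes carry a `ref` attribute (both attribute names are hypothetical, as are `Node`, `RefCollector` and `load`). SAX builds only the sparse nodes during the parse, and references are resolved in a second pass once every id has been seen.

```python
import xml.sax


class Node:
    """Sparse node: keeps only what is needed to follow references."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.target = None  # filled in by the second pass, if resolvable


class RefCollector(xml.sax.ContentHandler):
    """First pass: index nodes by id and remember unresolved references."""

    def __init__(self):
        super().__init__()
        self.by_id = {}    # id value -> Node
        self.pending = []  # nodes whose "ref" still needs resolving

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if "id" in node.attrs:      # assumed attribute name
            self.by_id[node.attrs["id"]] = node
        if "ref" in node.attrs:     # assumed attribute name
            self.pending.append(node)


def load(xml_bytes):
    """Parse, then resolve references; returns (by_id, pending)."""
    handler = RefCollector()
    xml.sax.parseString(xml_bytes, handler)
    # Second pass: all ids are known now, so forward references resolve.
    for node in handler.pending:
        node.target = handler.by_id.get(node.attrs["ref"])
    return handler.by_id, handler.pending
```

A reference whose id never appears is left with `target` set to None, so the caller can decide whether that is an error.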
Jul 18 '05 #6
