iterate over a series of nodes in an XML file

rajarshi.guha

Hi, I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
.....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However this is not scalable and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

    for myid in getMyIDList(document):
        print myid

I realize that I can do this with generators, but I can't see how I can
incorporate generators into my handler class (which is a subclass of
xml.sax.ContentHandler).

Any pointers would be appreciated

Thanks,
Rajarshi

Jul 5 '06 #1


Diez B. Roggisch

ra***********@gmail.com wrote:

Hi, I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However this is not scalable and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

    for myid in getMyIDList(document):
        print myid

I realize that I can do this with generators, but I can't see how I can
incorporate generators into my handler class (which is a subclass of
xml.sax.ContentHandler).

Any pointers would be appreciated

Use ElementTree. Or one of the other packages that implement its very
pythonic interface, lxml or cElementTree.

Otherwise, you don't have much chance of using SAX to create a generator,
other than by reading the whole document into memory (which rather defeats
the purpose of SAX in the first place) or by creating a separate thread that
feeds an iterable through a queue.
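
The thread-and-queue option could be sketched roughly as follows. This is a minimal illustration in current Python 3, not code from the thread: the names `QueueingHandler`, `iter_ids_threaded` and the `maxsize` parameter are made up for the example, and it assumes the `<idlist>`/`<myID>` layout from the question.

```python
import io
import queue
import threading
import xml.sax


class QueueingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: puts the text of each <myID> on a queue."""

    def __init__(self, out_queue):
        super().__init__()
        self._out = out_queue
        self._buf = None  # None means "not inside a myID element"

    def startElement(self, name, attrs):
        if name == "myID":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "myID":
            self._out.put("".join(self._buf))
            self._buf = None


_DONE = object()  # sentinel that marks the end of the document


def iter_ids_threaded(source, maxsize=100):
    """Yield myID texts while a worker thread runs the SAX parser."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so the parser cannot run far ahead

    def worker():
        try:
            xml.sax.parse(source, QueueingHandler(q))
            q.put(_DONE)
        except Exception as exc:  # hand parse errors over to the consumer
            q.put(exc)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID><myID>3</myID></idlist>")
for my_id in iter_ids_threaded(sample):
    print(my_id)
```

The bounded queue is what keeps memory use flat: the parser thread blocks once `maxsize` items are waiting. One caveat of this sketch is that a consumer which stops early leaves the worker blocked on `put()`; the thread is marked as a daemon so that it does not keep the process alive.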

Alternatively, there are parsers out there that implement a PULL style of
parsing instead of the PUSH style that SAX uses. But before you start with
these, try ElementTree.
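
For reference, one pull-style parser that ships with Python is `xml.dom.pulldom`. A rough sketch of how it might be used here, in current Python 3, is shown below; the function name `iter_ids_pull` is invented for the example and the `<myID>` tag is taken from the question.

```python
import io
from xml.dom import pulldom


def iter_ids_pull(source):
    """Yield the text of each <myID> element using a pull parser."""
    events = pulldom.parse(source)  # accepts a filename or a file object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "myID":
            # Pull in only this element's subtree; the rest of the
            # document is never built into a tree.
            events.expandNode(node)
            yield "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )


sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID></idlist>")
for my_id in iter_ids_pull(sample):
    print(my_id)
```

Because the calling code asks for the next event instead of being called back, the loop drops naturally into a generator, which is the part that is awkward with a plain SAX handler.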

Diez

Jul 5 '06 #2

Stefan Behnel

ra***********@gmail.com wrote:

I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However this is not scalable and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

    for myid in getMyIDList(document):
        print myid

You can try lxml 1.1.

http://cheeseshop.python.org/pypi/lxml/1.1alpha

Some documentation is here:
http://codespeak.net/svn/lxml/trunk/doc/api.txt

I haven't tested it, but you should be able to do this:

    from lxml.etree import iterparse
    last = None
    for event, myid in iterparse(document_url, tag="myID"):
        print myid.text
        if last is not None:
            last.getparent().remove(last)
        last = myid

Internally, iterparse builds up a tree, so the last three lines are there to
remove the myid elements from the tree that were already handled. This saves a
lot of memory for large documents.

Stefan

Jul 5 '06 #3

rajarshi.guha

Stefan Behnel wrote:

ra***********@gmail.com wrote:
I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

Thanks to everybody for the pointers. ElementTree is what I ended up
using, and my code looks like this (based on the ElementTree tutorial code):

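    import xml.etree.ElementTree as ET  # elementtree.ElementTree before Python 2.5

    def extractIds(filename):
        f = open(filename, 'r')
        context = ET.iterparse(f, events=('start', 'end'))
        context = iter(context)
        # the first event is the start of the root element
        event, root = context.next()

        for event, elem in context:
            if event == 'end' and elem.tag == 'Id':
                yield elem.text
                # discard the elements that have already been handled
                root.clear()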

As a result I can do:

    for id in extractIds(someFileName):
        print id  # or do something else with each id
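
On current Python 3, where ElementTree lives in the standard library as `xml.etree.ElementTree`, the same generator might look like the sketch below. The name `extract_ids` and the `tag` parameter are illustrative; the default of `"myID"` follows the sample document from the question, whereas the code above tests for `'Id'`.

```python
import io
import xml.etree.ElementTree as ET


def extract_ids(source, tag="myID"):
    """Yield the text of every <tag> element without keeping the whole tree."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Runs when the caller asks for the next item: drop the
            # children that have already been handed out.
            root.clear()


sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID><myID>3</myID></idlist>")
for my_id in extract_ids(sample):
    print(my_id)
```

`source` can be a filename or an open binary file object. Clearing the root after each element is what keeps memory use roughly constant for large files.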

Jul 5 '06 #4

Steve M

I see you've had success with elementtree, but in case you are still
thinking about SAX, here is an approach that might interest you. The
idea is basically to turn your program inside-out by writing a
standalone function to process one myID node. This function has nothing
to do with SAX or parsing the XML tree. This function becomes a
callback that you pass to your SAX handler to call on each node.

    import xml.sax

    def myID_callback(data):
        """Process the text of one myID node - boil it, mash it, stick it
        in a list..."""
        print data

    class MyHandler(xml.sax.ContentHandler):
        def __init__(self, myID_callback):
            # a buffer to collect text data that may or may not be needed later
            self.current_text_data = []
            self.myID_callback = myID_callback

        def characters(self, data):
            """Accumulate characters. startElement("myID") resets it."""
            self.current_text_data.append(data)

        def startElement(self, name, attributes):
            if name == 'myID':
                self.current_text_data = []

        def endElement(self, name):
            if name == 'myID':
                data = "".join(self.current_text_data)
                self.myID_callback(data)

    filename = 'idlist.xml'
    xml.sax.parse(filename, MyHandler(myID_callback))
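
If an iterator is still wanted on top of a SAX handler, one possible variation is to feed the parser in chunks and hand out whatever the handler has collected after each chunk. The sketch below is in current Python 3 and relies on the default parser returned by `xml.sax.make_parser()` supporting the incremental `feed()`/`close()` interface, which the bundled expat reader does. The names `CollectingHandler`, `iter_ids_feed` and `chunk_size` are made up for the example.

```python
import io
import xml.sax


class CollectingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects finished myID texts in a list."""

    def __init__(self):
        super().__init__()
        self.ready = []   # completed myID texts not yet handed out
        self._buf = None  # None means "not inside a myID element"

    def startElement(self, name, attrs):
        if name == "myID":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "myID":
            self.ready.append("".join(self._buf))
            self._buf = None


def iter_ids_feed(stream, chunk_size=64 * 1024):
    """Yield myID texts, parsing the stream one chunk at a time."""
    handler = CollectingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)         # callbacks fire for this chunk only
        yield from handler.ready   # hand out what has been completed so far
        handler.ready.clear()
    parser.close()                 # flush the parser and check well-formedness
    yield from handler.ready


sample = io.BytesIO(b"<idlist><myID>1</myID><myID>22</myID></idlist>")
for my_id in iter_ids_feed(sample, chunk_size=8):
    print(my_id)
```

Only the IDs completed within one chunk are held in memory at a time, so the chunk size bounds the buffering without needing a second thread.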

Jul 5 '06 #5
