
iterate over a series of nodes in an XML file

Hi, I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
.....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX-based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However, this is not scalable, and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

for myid in getMyIDList(document):
    print(myid)

I realize that I can do this with generators, but I can't see how I can
incorporate generators into my handler class (which is a subclass of
xml.sax.ContentHandler).

Any pointers would be appreciated

Thanks,
Rajarshi

Jul 5 '06 #1
ra***********@gmail.com wrote:
Hi, I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX-based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However, this is not scalable, and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

for myid in getMyIDList(document):
    print(myid)

I realize that I can do this with generators, but I can't see how I can
incorporate generators into my handler class (which is a subclass of
xml.sax.ContentHandler).

Any pointers would be appreciated
Use ElementTree. Or one of the other packages that implement its very
pythonic interface, lxml or cElementTree.

Otherwise, you don't have much chance of using SAX to create a generator,
short of reading the whole document into memory (which somewhat defeats the
purpose of SAX in the first place) or creating a separate thread that
feeds an iterable through a queue.
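
For what it's worth, the thread-plus-queue idea could look roughly like the sketch below. This is only an illustration under assumptions, not code from the thread: the names `iter_my_ids`, `_QueueingHandler` and `_DONE` are made up, it is written for Python 3, and it relies only on the standard `queue`, `threading` and `xml.sax` modules.

```python
import queue
import threading
import xml.sax

_DONE = object()  # hypothetical sentinel marking the end of the stream


class _QueueingHandler(xml.sax.ContentHandler):
    """Push the text of every <myID> element onto a queue."""

    def __init__(self, out_queue):
        super().__init__()
        self._queue = out_queue
        self._chunks = None  # None while we are outside a <myID> element

    def startElement(self, name, attrs):
        if name == "myID":
            self._chunks = []

    def characters(self, content):
        # SAX may deliver the text of one element in several pieces.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "myID":
            self._queue.put("".join(self._chunks))
            self._chunks = None


def iter_my_ids(source, maxsize=100):
    """Yield <myID> texts while a worker thread runs the SAX parser."""
    # Bounded queue, so the parser cannot run far ahead of the consumer.
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            xml.sax.parse(source, _QueueingHandler(q))
            q.put(_DONE)
        except Exception as exc:  # hand parse errors over to the consumer
            q.put(exc)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

One caveat of this sketch: if the consumer stops iterating early, the worker stays blocked on the full queue until the process exits (it is a daemon thread), so it is more a demonstration of the idea than something to ship as-is.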

Alternatively, there are parsers out there that implement a PULL style of
parsing instead of the PUSH style that SAX uses. But before you start with
these, try ElementTree.
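
As one example of the pull style, the standard library ships `xml.dom.pulldom`. The sketch below is an assumption about how it might be applied to the `<myID>` case (Python 3; the function name `iter_my_ids_pull` is made up for illustration):

```python
from xml.dom import pulldom


def iter_my_ids_pull(source):
    """Yield the text of each <myID> element using a pull parser."""
    events = pulldom.parse(source)  # accepts a filename or a file object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "myID":
            # Build a small DOM subtree for this one element only.
            events.expandNode(node)
            yield "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
```

Because the caller drives the loop and asks for the next event, this fits naturally into a generator, which is exactly what the push model of SAX makes awkward.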

Diez
Jul 5 '06 #2
ra***********@gmail.com wrote:
I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

Currently, I have written a SAX-based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However, this is not scalable, and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:

for myid in getMyIDList(document):
    print(myid)
You can try lxml 1.1.

http://cheeseshop.python.org/pypi/lxml/1.1alpha

Some documentation is here:
http://codespeak.net/svn/lxml/trunk/doc/api.txt

I haven't tested it, but you should be able to do this:

from lxml.etree import iterparse

last = None
for event, myid in iterparse(document_url, tag="myID"):
    print(myid.text)
    if last is not None:
        last.getparent().remove(last)
    last = myid

Internally, iterparse builds up a tree, so the last three lines are there to
remove the myid elements from the tree that were already handled. This saves a
lot of memory for large documents.

Stefan
Jul 5 '06 #3

Stefan Behnel wrote:
ra***********@gmail.com wrote:
I have an XML file which contains entries of the form:

<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>

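Thanks to everybody for the pointers. ElementTree is what I ended up
using, and my code looks like this (based on the ElementTree tutorial code):

import xml.etree.ElementTree as ET

def extractIds(filename):
    # Stream the file instead of building the whole tree at once.
    with open(filename, 'rb') as f:
        context = iter(ET.iterparse(f, events=('start', 'end')))
        # The first event is the 'start' of the root element.
        event, root = next(context)

        for event, elem in context:
            if event == 'end' and elem.tag == 'myID':
                yield elem.text
                # Drop the children handled so far to keep memory use flat.
                root.clear()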

As a result I can do:

for id in extractIds(someFileName):
    print(id)  # or do something else with each ID
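
To illustrate why the generator version scales, here is a small self-contained sketch (Python 3) that takes only the first few IDs from a larger document. The in-memory `sample` document and the name `extract_ids` are made up for the demonstration; the pattern is the same `iterparse` plus `root.clear()` approach as above.

```python
import io
import itertools
import xml.etree.ElementTree as ET


def extract_ids(source, tag="myID"):
    """Lazily yield the text of each matching element."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            root.clear()  # drop children that were already handled


# Hypothetical sample data standing in for the real file.
sample = io.BytesIO(
    b"<idlist>"
    + b"".join(b"<myID>%d</myID>" % i for i in range(1, 10001))
    + b"</idlist>"
)

# Only as much of the document is parsed as is needed for three IDs.
first_three = list(itertools.islice(extract_ids(sample), 3))
print(first_three)
```

The consumer decides how far to read, so stopping early costs nothing beyond the part of the input that was already parsed.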

Jul 5 '06 #4
I see you've had success with ElementTree, but in case you are still
thinking about SAX, here is an approach that might interest you. The
idea is basically to turn your program inside-out by writing a
standalone function to process one myID node. This function has nothing
to do with SAX or parsing the XML tree. This function becomes a
callback that you pass to your SAX handler to call on each node.

import xml.sax

def myID_callback(data):
    """Process the text of one myID node - boil it, mash it, stick it
    in a list..."""
    print(data)

class MyHandler(xml.sax.ContentHandler):
    def __init__(self, myID_callback):
        xml.sax.ContentHandler.__init__(self)
        # a buffer to collect text data that may or may not be needed later
        self.current_text_data = []
        self.myID_callback = myID_callback

    def characters(self, data):
        """Accumulate characters. startElement("myID") resets it."""
        self.current_text_data.append(data)

    def startElement(self, name, attributes):
        if name == 'myID':
            self.current_text_data = []

    def endElement(self, name):
        if name == 'myID':
            data = "".join(self.current_text_data)
            self.myID_callback(data)

filename = 'idlist.xml'
xml.sax.parse(filename, MyHandler(myID_callback))
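
Since the callback can be any callable, a bound method such as `list.append` works too. The following is a minimal self-contained sketch of that variation (Python 3); the class name `MyIDHandler`, the inline sample document and the `collected` list are assumptions made for the example.

```python
import xml.sax


class MyIDHandler(xml.sax.ContentHandler):
    """Call `callback` with the text of every <myID> element."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "myID":
            self.chunks = []  # start collecting text for a new element

    def characters(self, content):
        self.chunks.append(content)

    def endElement(self, name):
        if name == "myID":
            self.callback("".join(self.chunks))


collected = []
xml.sax.parseString(
    b"<idlist><myID>1</myID><myID>2</myID></idlist>",
    MyIDHandler(collected.append),  # "stick it in a list"
)
print(collected)
```

Note that this still inverts control: the parser calls your code, so it cannot be driven from a plain `for` loop the way a generator can.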

Jul 5 '06 #5
