Stuart McGraw wrote:
I have a broad (~200K nodes) but shallow XML file
I want to parse with ElementTree. There are too many
nodes to read into memory simultaneously so I use
iterparse() to process each node sequentially.
Now I find I need to get and save the input file line
number of each node. Googling turned up a way
to do it by subclassing FancyTreeBuilder,
(http://groups.google.com/group/comp....9553b4b?hl=en&)
but that tries to read everything at once.
Is there a way to do something similar with iterparse()?
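Something like this could work: wrap the file so that each read() call hands the parser a single line, and count the lines as they go by.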
import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""

class FileWrapper:
    def __init__(self, source):
        self.source = source
        self.lineno = 0
    def read(self, bytes):
        # ignore the requested size and feed the parser one line at a time
        s = self.source.readline()
        self.lineno += 1
        return s

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
    if event == "start":
        # lineno is the line most recently handed to the parser
        print f.lineno, event, elem
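
For anyone on Python 3, here is a rough port of the same idea, assuming the standard library's xml.etree.ElementTree in place of the separate elementtree package. The names LineCountingReader and start_lines are made up for this sketch. It also saves the line number of each element and clears finished elements to keep memory use down, which is my reading of what the question asks for rather than anything from the code above.

```python
import io
import xml.etree.ElementTree as ET

DATA = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""


class LineCountingReader:
    """Hypothetical wrapper: hands the parser one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0

    def read(self, size=-1):
        # The requested size is ignored on purpose: returning a single line
        # keeps self.lineno in step with what the parser has been given.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def start_lines(source):
    """Return a list of (tag, line number) pairs, one per start event."""
    reader = LineCountingReader(source)
    found = []
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            found.append((elem.tag, reader.lineno))
        else:
            # Drop the element's text and children once it is complete.
            elem.clear()
    return found


for tag, line in start_lines(io.StringIO(DATA)):
    print(line, tag)
```

The number recorded is the line most recently handed to the parser, so a start tag that spans several lines should be reported at the line where it closes, and several elements opened on one line share that line's number. For a real file, something like start_lines(open("source.xml", encoding="utf-8")) should work the same way.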
</F>