elementtree: line numbers and iterparse

I have a broad (~200K nodes) but shallow XML file
that I want to parse with ElementTree. There are too many
nodes to read into memory simultaneously, so I use
iterparse() to process each node sequentially.

Now I find I need to get and save the input file line
number of each node. Googling turned up a way
to do it by subclassing FancyTreeBuilder
(http://groups.google.com/group/comp....9553b4b?hl=en&),
but that tries to read everything at once.

Is there a way to do something similar with iterparse()?

Sep 13 '06 #1
Stuart McGraw wrote:
> Is there a way to do something similar with iterparse()?

Something like this could work:

# Python 2, using the standalone elementtree package
import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""

class FileWrapper:
    def __init__(self, source):
        self.source = source
        self.lineno = 0  # number of lines handed to the parser so far
    def read(self, bytes):
        # ignore the requested size and return one line per call,
        # so lineno stays in step with the events being reported
        s = self.source.readline()
        self.lineno += 1
        return s

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
    if event == "start":
        print f.lineno, event, elem

</F>
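
For readers on Python 3, the same trick might look like the sketch below. It assumes the standard-library xml.etree.ElementTree in place of the standalone elementtree package; the names LineCountingReader and start_lines are made up for illustration and are not part of any library. The count is only as good as the one-line-per-read assumption: a start tag that spans several lines would be reported at the line where the tag is completed, at the earliest.

```python
import io
import xml.etree.ElementTree as ET

DATA = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""


class LineCountingReader:
    """Hypothetical file-like wrapper: one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0  # number of lines handed to the parser so far

    def read(self, size=-1):
        # Ignore the requested size on purpose: returning exactly one
        # line keeps lineno in step with the events iterparse() reports.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def start_lines(source):
    """Return a (line number, tag) pair for every start event."""
    reader = LineCountingReader(source)
    found = []
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            found.append((reader.lineno, elem.tag))
        else:
            # The element is complete; drop its content to limit memory use.
            elem.clear()
    return found


if __name__ == "__main__":
    # For a real file, open("source.xml", "rb") could replace the StringIO.
    for lineno, tag in start_lines(io.StringIO(DATA)):
        print(lineno, tag)
```

Reading one line at a time is slower than the large chunks iterparse() normally requests, which is the price paid for knowing which line the parser has just seen.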

Sep 13 '06 #2

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message news:ma************************************@python.org...
> Stuart McGraw wrote:
>> Is there a way to do something similar with iterparse()?
>
> something like this could work:
> ...snip...

Indeed it does. Many thanks!

Sep 13 '06 #3
