elementtree: line numbers and iterparse

I have a broad (~200K nodes) but shallow XML file
I want to parse with ElementTree. There are too many
nodes to read into memory simultaneously, so I use
iterparse() to process each node sequentially.

Now I find I need to get and save the input file line
number of each node. Googling turned up a way
to do it by subclassing FancyTreeBuilder,
(http://groups.google.com/group/comp....9553b4b?hl=en&)
but that tries to read everything at once.

Is there a way to do something similar with iterparse()?

Sep 13 '06 #1
Stuart McGraw wrote:
> I have a broad (~200K nodes) but shallow XML file
> I want to parse with ElementTree. There are too many
> nodes to read into memory simultaneously, so I use
> iterparse() to process each node sequentially.
>
> Now I find I need to get and save the input file line
> number of each node. Googling turned up a way
> to do it by subclassing FancyTreeBuilder,
> (http://groups.google.com/group/comp....9553b4b?hl=en&)
> but that tries to read everything at once.
>
> Is there a way to do something similar with iterparse()?

something like this could work:

import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""

class FileWrapper:
    def __init__(self, source):
        self.source = source
        self.lineno = 0
    def read(self, bytes):
        # hand the parser one line at a time, ignoring the requested size
        s = self.source.readline()
        self.lineno += 1
        return s

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
    if event == "start":
        print f.lineno, event, elem

</F>

Sep 13 '06 #2
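
For anyone reading this on Python 3, here is a minimal sketch of the same file-wrapper idea using the standard library's xml.etree.ElementTree. The names LineCountingReader and start_lines are made up for illustration and are not part of ElementTree. It assumes iterparse() yields the events for each chunk before it asks for the next one, which is what makes the counter line up with the line just read. The elem.clear() call on "end" events is an addition aimed at the memory concern in the question, not something from the reply above.

```python
import io
import xml.etree.ElementTree as ET

DATA = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""


class LineCountingReader:
    """Hypothetical wrapper: gives the parser one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0

    def read(self, size=-1):
        # Ignore the requested size and return a single line, so the
        # counter matches the line the parser is currently working on.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def start_lines(source):
    """Return (line number, tag) for every start event in source."""
    reader = LineCountingReader(source)
    found = []
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            found.append((reader.lineno, elem.tag))
        else:
            # Release the finished element's text, attributes and children.
            elem.clear()
    return found


if __name__ == "__main__":
    for lineno, tag in start_lines(io.StringIO(DATA)):
        print(lineno, tag)
```

Note that the number recorded is the line on which the start tag was completed, so a tag whose attributes spill over several lines is reported at its last line. Feeding the parser one line at a time is also slower than reading in large chunks, and a file with everything on one line gains nothing from this approach.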

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message news:ma************************************@python.org...
> Stuart McGraw wrote:
>> Now I find I need to get and save the input file line
>> number of each node. Googling turned up a way
>> to do it by subclassing FancyTreeBuilder,
>> (http://groups.google.com/group/comp....9553b4b?hl=en&)
>> but that tries to read everything at once.
>>
>> Is there a way to do something similar with iterparse()?
>
> something like this could work:
> ...snip...

Indeed it does. Many thanks!

Sep 13 '06 #3
