
Re: XML -> Tab-delimited text file (using lxml)

Gibson wrote:
> I'm attempting to do the following:
> A) Read/scan/iterate/etc. through a semi-large XML file (about 135 MB)
> B) Grab specific fields and output to a tab-delimited text file
> [...]
> out = open('output.txt','w')
> cat = etree.parse('catalog.xml')

Use iterparse() instead of parsing the file into memory completely.

untested:

from lxml import etree

for _, item in etree.iterparse('catalog.xml', tag='Item'):
    # do some cleanup to save memory: drop siblings that were already handled
    previous_item = item.getprevious()
    while previous_item is not None:
        previous_item.getparent().remove(previous_item)
        previous_item = item.getprevious()

    # now read the data
    id = item.get('ID')
    collect = {}
    for child in item:
        if child.tag != 'ItemVal':
            continue
        collect[child.get('ValueId')] = child.get('value')

    # a parenthesized print works on both Python 2 and Python 3
    print("%s\t%s\t%s\t%s" % ((id,) + tuple(
        collect[key] for key in ['name', 'description', 'image'])))

Stefan
Nov 19 '08 #1
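
For part B of the question (writing a tab-delimited file rather than printing), here is a minimal sketch of the same streaming idea. It is not the code from the post: it swaps in the standard library's xml.etree.ElementTree.iterparse so it runs without lxml, and it uses csv.writer with a tab delimiter. The sample XML, the FIELDS list, and the items_to_tsv helper are made up for illustration; only the element and attribute names (Item, ItemVal, ID, ValueId, value) come from the snippet above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names follow the
# snippet in the thread (Item, ItemVal, ID, ValueId, value).
SAMPLE = b"""<Catalog>
  <Item ID="1">
    <ItemVal ValueId="name" value="Widget"/>
    <ItemVal ValueId="description" value="A small widget"/>
    <ItemVal ValueId="image" value="widget.png"/>
  </Item>
  <Item ID="2">
    <ItemVal ValueId="name" value="Gadget"/>
    <ItemVal ValueId="image" value="gadget.png"/>
  </Item>
</Catalog>"""

FIELDS = ["name", "description", "image"]


def items_to_tsv(source, out):
    """Stream <Item> elements from source and write one tab-separated row each."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Item":
            continue
        collect = {
            child.get("ValueId"): child.get("value")
            for child in elem
            if child.tag == "ItemVal"
        }
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow([elem.get("ID")] + [collect.get(k, "") for k in FIELDS])
        # Drop the item's children and attributes; the emptied element
        # itself stays attached to the root.
        elem.clear()


buf = io.StringIO()
items_to_tsv(io.BytesIO(SAMPLE), buf)
result = buf.getvalue()
print(result, end="")
```

With a real file you would pass the path (or an open binary file) as source and an output file opened with newline='' as out. The stdlib parser has no getprevious()/getparent(), so the cleanup here is elem.clear() rather than the sibling removal shown in the post.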


On Nov 19, 11:03 am, Stefan Behnel <stefan...@behnel.de> wrote:
> Use iterparse() instead of parsing the file into memory completely.
>
> *stuff*
>
> Stefan

That worked wonders. Thanks a lot, Stefan.

So, iterparse() uses an iterate -> parse method instead of parse() and
iter()'s parse -> iterate method (if that makes any sense)?
Nov 19 '08 #2
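
One rough way to check the question above is to count how much input has been read by the time the first element shows up. The sketch below does that with the standard library's xml.etree.ElementTree (an assumption made so it runs without lxml; lxml's read sizes may differ). CountingReader and the generated document are hypothetical helpers, not anything from the thread.

```python
import io
import xml.etree.ElementTree as ET


class CountingReader(io.BytesIO):
    """BytesIO that records how many bytes the parser has pulled so far."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


# Hypothetical document: many small <Item> elements, a few hundred KB in total.
doc = b"<Catalog>" + b"".join(
    b'<Item ID="%d"/>' % i for i in range(20000)
) + b"</Catalog>"

# iterparse(): the first element arrives after only part of the input was read.
src = CountingReader(doc)
for _, elem in ET.iterparse(src, events=("end",)):
    if elem.tag == "Item":
        read_at_first_item = src.bytes_read
        break

# parse() then iter(): the whole input is consumed before iteration can start.
src2 = CountingReader(doc)
tree = ET.parse(src2)
first = next(tree.iter("Item"))
read_before_iter = src2.bytes_read

print(len(doc), read_at_first_item, read_before_iter)
```

If the numbers come out as expected, the middle value is well below the document size while the last one equals it, which matches the "parse while iterating" versus "parse, then iterate" reading.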
