NEWB: reverse traversal of xml file

Hi,

I have an XML file of about 140 MB that looks like this:

<book>
  <record>
    ...
    <wordpartWTS>1</wordpartWTS>
  </record>
  <record>
    ...
    <wordpartWTS>2</wordpartWTS>
  </record>
  <record>
    ...
    <wordpartWTS>1</wordpartWTS>
  </record>
</book>

I want to traverse it from bottom to top and add another field to each
record, e.g. <totalWordPart>1</totalWordPart>, giving the highest value
of wordpartWTS for the word that the record belongs to.

So if the wordparts for the first ten records were 1 2 1 1 1 2 3 4 1 2,
I want totalWordPart to be 2 2 1 1 4 4 4 4 2 2.

I figure the easiest way to do this is to go through the file backwards.
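The backward idea can be sketched on an in-memory list: scanning from the end, the last record of each word carries that word's highest part number, so the value can be carried backwards until the previous word ends. This is a minimal Python sketch, assuming that part numbers rise within a word and that a new word starts whenever the number does not increase; totals_backward is a hypothetical helper. Note that it needs every part number in a list before it can start, which is the price of going backwards through a 140 MB file.

```python
def totals_backward(parts):
    """Compute totalWordPart for each record by scanning from the end.

    Assumes part numbers rise within a word and a new word starts when the
    number does not increase, so a word's last record holds its highest part.
    """
    totals = [0] * len(parts)
    current = None
    for i in range(len(parts) - 1, -1, -1):
        # Record i ends a word if it is the last record, or if the next
        # record does not continue counting upward.
        if i == len(parts) - 1 or parts[i + 1] <= parts[i]:
            current = parts[i]
        totals[i] = current
    return totals


print(totals_backward([1, 2, 1, 1, 1, 2, 3, 4, 1, 2]))
# [2, 2, 1, 1, 4, 4, 4, 4, 2, 2]
```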

Any ideas how to do this with an XML data file?

Thanks

May 23 '06 #1

You don't need to go backwards. Iterate from the beginning and use
itertools.groupby:

from itertools import groupby

def enumerate_words(parts):
    # Number the words: a new word starts whenever the part number
    # does not increase.
    word_num = 0
    prev = 0
    for part in parts:
        if prev >= part:
            word_num += 1
        prev = part
        yield word_num, part

def get_word_num(item):
    return item[0]

parts = 1, 2, 1, 1, 1, 2, 3, 4, 1, 2
for word_num, word in groupby(enumerate_words(parts), get_word_num):
    parts_list = list(word)
    # Parts rise within a word, so the last one is the highest.
    max_part = parts_list[-1][1]
    for word_num, part_num in parts_list:
        print(max_part, part_num)

prints:

2 1
2 2
1 1
1 1
4 1
4 2
4 3
4 4
2 1
2 2
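To apply the same forward grouping to the XML file itself, here is a minimal sketch using the standard library's xml.etree.ElementTree.iterparse. It assumes the layout shown in the question (a <book> root without attributes holding <record> elements, each with exactly one integer <wordpartWTS>) and the same word-boundary rule as enumerate_words; annotate_records is a hypothetical name. Only the records of the word currently being read are kept in memory before they are written out with their new <totalWordPart> child.

```python
import io
import xml.etree.ElementTree as ET


def annotate_records(source, out):
    """Stream <record> elements from source (a filename or binary file),
    add a <totalWordPart> child to each, and write the result to out.

    Assumes each record has exactly one integer <wordpartWTS> and that a
    new word starts whenever the part number does not increase.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)            # first event: start of the root (<book>)
    out.write(f"<{root.tag}>\n")
    pending = []                      # records of the word being read
    prev = 0

    def flush():
        # All records of one word are known: write them with their total.
        if not pending:
            return
        total = max(int(rec.findtext("wordpartWTS")) for rec in pending)
        for rec in pending:
            ET.SubElement(rec, "totalWordPart").text = str(total)
            out.write(ET.tostring(rec, encoding="unicode"))
        pending.clear()

    for event, elem in events:
        if event != "end" or elem.tag != "record":
            continue
        part = int(elem.findtext("wordpartWTS"))
        if prev >= part:              # part number did not increase: new word
            flush()
        pending.append(elem)
        prev = part
        root.clear()                  # detach finished records from the tree
    flush()
    out.write(f"</{root.tag}>\n")


# Usage on the example from the question; io objects stand in for real files.
sample = "<book>" + "".join(
    f"<record><wordpartWTS>{p}</wordpartWTS></record>"
    for p in [1, 2, 1, 1, 1, 2, 3, 4, 1, 2]
) + "</book>"
result = io.StringIO()
annotate_records(io.BytesIO(sample.encode()), result)
print(result.getvalue())
```

For the real file, source would be the path of the 140 MB input and out an output file opened in text mode. Calling root.clear() after each record is what keeps the tree from growing; the records still waiting in pending stay alive because the list holds references to them.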

May 23 '06 #2
But will this work if I don't know the parts in advance? I only know them
by reading through the file, which has 450,000 lines.

May 23 '06 #3
manstey wrote:
> But will this work if I don't know parts in advance.

Yes, it will work, as long as the highest part number in the whole file
is not very large. The algorithm only needs to store N records in memory,
where N is the highest part number in the file.

> I only know parts
> by reading through the file, which has 450,000 lines.


Lines or records? Either way, I tested it with a sequence of 10,000,000
part numbers, standing in for ten million records, generated like this:

def many_numbers():
    # One million words of ten parts each (1..10): ten million records.
    for n in range(1000000):
        for part in range(1, 11):
            yield part

parts = many_numbers()

The code processed it in 13 seconds using virtually no memory. That is
the advantage of iterators and generators: you can process long
sequences without allocating much memory.
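To check the memory claim on input whose length is not known in advance, here is a minimal Python 3 sketch. with_totals and the stats dictionary are hypothetical names added for illustration, and many_numbers is adapted to take a word count. The input is a generator, so the whole sequence never exists in memory, and the code reports the largest number of part numbers buffered at any one time.

```python
from itertools import groupby


def enumerate_words(parts):
    # Same boundary rule as in the reply: a new word starts whenever the
    # part number does not increase.
    word_num, prev = 0, 0
    for part in parts:
        if prev >= part:
            word_num += 1
        prev = part
        yield word_num, part


def with_totals(parts, stats):
    """Lazily yield (total, part) pairs for any iterable of part numbers.

    Only the current word's parts are buffered; stats["peak"] records the
    largest number of parts held at once.
    """
    for _, word in groupby(enumerate_words(parts), key=lambda item: item[0]):
        word_parts = [part for _, part in word]
        stats["peak"] = max(stats["peak"], len(word_parts))
        total = max(word_parts)
        for part in word_parts:
            yield total, part


def many_numbers(n_words):
    # Hypothetical test input: n_words words with parts 1..10 each.
    for _ in range(n_words):
        yield from range(1, 11)


stats = {"peak": 0}
count = sum(1 for _ in with_totals(many_numbers(100_000), stats))
print(count, stats["peak"])  # 1000000 10
```

A million records pass through, but never more than ten part numbers are held at once, which is the "N records in memory" bound when parts count up from 1.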

May 24 '06 #4