Bytes | Software Development & Data Engineering Community

"groupby" is brilliant!

Hi all

This is probably old hat to most of you, but for me it was a
revelation, so I thought I would share it in case someone has a similar
requirement.

I had to convert an old program that does a traditional pass through a
sorted data file, breaking on a change of certain fields, processing
each row, accumulating various totals, and doing additional processing
at each break. I am not using a database for this one, as the file
sizes are not large - a few thousand rows at most. I am using csv
files, and using the csv module so that each row is nicely formatted
into a list.

The traditional approach is quite fiddly, saving the values of the
various break fields, comparing the values on each row with the saved
values, and taking action if the values differ. The more break fields
there are, the fiddlier it gets.

I was going to do the same in python, but then I vaguely remembered
reading about 'groupby'. It took a little while to figure it out, but
once I had cracked it, it transformed the task into one of utter
simplicity.

Here is an example. Imagine a transaction file sorted by branch,
account number, and date, and you want to break on all three.

-----------------------------
import csv
from itertools import groupby
from operator import itemgetter

BRN = 0
ACC = 1
DATE = 2

reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)

for brn, brnList in groupby(rows, itemgetter(BRN)):
    for acc, accList in groupby(brnList, itemgetter(ACC)):
        for date, dateList in groupby(accList, itemgetter(DATE)):
            for row in dateList:
                [do something with row]
            [do something on change of date]
        [do something on change of acc]
    [do something on change of brn]
-----------------------------
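The bracketed placeholders can be filled in, for example, to accumulate per-branch totals. A runnable sketch in modern Python 3, with hypothetical in-memory rows standing in for trans.csv and an invented AMOUNT column:

```python
from itertools import groupby
from operator import itemgetter

BRN, ACC, DATE, AMOUNT = 0, 1, 2, 3

# Hypothetical pre-sorted rows standing in for trans.csv.
rows = [
    ['B1', 'A1', '2006-06-01', 10],
    ['B1', 'A1', '2006-06-01', 5],
    ['B1', 'A2', '2006-06-02', 7],
    ['B2', 'A3', '2006-06-03', 2],
]

totals = {}  # per-branch totals, filled in at the branch break
for brn, brn_rows in groupby(rows, itemgetter(BRN)):
    brn_total = 0
    for acc, acc_rows in groupby(brn_rows, itemgetter(ACC)):
        for date, date_rows in groupby(acc_rows, itemgetter(DATE)):
            for row in date_rows:
                brn_total += row[AMOUNT]
    totals[brn] = brn_total

print(totals)  # {'B1': 22, 'B2': 2}
```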

Hope someone finds this of interest.

Frank Millman

Jun 13 '06 #1
vpr
Hi Frank

This is one of the reasons why I love Python: you can write readable code.
I strive to write clean code, but I find that exception-handling code (e.g.
try:) makes my code ugly and significantly harder to read. Does anyone have
any good pointers for a former C++ / Perl coder?

/vpr
Frank Millman wrote:
[snip]
Jun 13 '06 #2
reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)

This is untested, but you might think about converting your explicit "for...
append" loop into either a list comp,

rows = [row for row in reader]

or just a plain list constructor:

rows = list(reader)

Neh?

-- Paul
(Oh, and I like groupby too! Combine it with sort to quickly create
histograms.)

import random
import itertools

# tally a histogram of a list of values from 1-10
dataValueRange = range(1, 11)
data = [random.choice(dataValueRange) for i in xrange(10000)]

hist = [ (k, len(list(g))) for k, g in itertools.groupby(sorted(data)) ]
print hist

histAsDict = dict((k, len(list(g))) for k, g in itertools.groupby(sorted(data)))
print histAsDict

Gives:

[(1, 979), (2, 1034), (3, 985), (4, 969), (5, 1020), (6, 975), (7, 981), (8, 1070), (9, 1003), (10, 984)]
{1: 979, 2: 1034, 3: 985, 4: 969, 5: 1020, 6: 975, 7: 981, 8: 1070, 9: 1003, 10: 984}
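One caveat the histogram relies on: groupby only groups *consecutive* equal values, which is why the sort matters. A small sketch (invented data) showing the difference:

```python
from itertools import groupby

data = [1, 1, 2, 1]

# Unsorted input: groupby yields one group per consecutive run.
runs = [(k, len(list(g))) for k, g in groupby(data)]
print(runs)  # [(1, 2), (2, 1), (1, 1)]

# Sorted input: each distinct key appears exactly once.
hist = [(k, len(list(g))) for k, g in groupby(sorted(data))]
print(hist)  # [(1, 3), (2, 1)]
```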
Jun 13 '06 #3

Paul McGuire wrote:
[snip]
rows = [row for row in reader]

or just a plain list constructor:

rows = list(reader)

Yup, they both work fine.

There may be times when you want to massage the data before appending
it, in which case you obviously have to do it the long way. Otherwise
these are definitely neater, the last one especially.

You could even do it as a one-liner -
rows = list(csv.reader(open('trans.csv', 'rb')))

It still looks perfectly readable to me.

Thanks

Frank

Jun 13 '06 #4
Frank,
I would just like to thank you for this timely post. I am working on a
reporting project that needed "groupby" functionality, and I was going to
sit down this morning to rework some "very ugly code" into some "not quite
so ugly code".

Your post got me pointed in the "right" direction, and the end result will
be much more flexible and a lot more maintainable.

Thanks.

Jun 13 '06 #5
Frank Millman wrote:
reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)


Why do you create a list of rows instead of just iterating over the
reader directly?
--
Benji York
Jun 13 '06 #6
Frank Millman wrote:
[snip]
I'm sure I'm going to get a lot of flak on this list for proposing to
turn nested for-loops into a recursive function, but I couldn't help
myself. This seems simpler to me, but for others it may be difficult
to look at, and these people will undoubtedly complain.
import csv
from itertools import groupby
from operator import itemgetter

reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)

def brn_doer(row):
    [doing something with brn here]

def acc_doer(date):
    [you get the idea]

[etc.]

doers = [brn_doer, acc_doer, date_doer, row_doer]

def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        doit(alist, doers[1:], i+1)
        doers[0](r)

doit(rows, doers, 0)

Now all of those ugly for loops become one recursive function. Bear in
mind, it's not all that 'elegant', but it looks nicer, is more succinct,
abstracts the process, and scales to arbitrary depth. Tragically,
however, it has been generalized, which is likely to raise some hackles
here. And, oh yes, it didn't answer exactly your question (which you
didn't really have). I'm sure I will regret this because, as you will
find, suggesting code on this list with additional utility is somewhat
discouraged by the vociferous few who make a religion out of 'import this'.

Also, I still have no idea what 'groupby' does. It looks interesting
though, thanks for pointing it out.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jun 13 '06 #7
James Stroud wrote:
[snip]


Forgot to test for stopping condition:
def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        if len(doers) > 1:
            doit(alist, doers[1:], i+1)
        doers[0](r)

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jun 13 '06 #8
Not related to itertools.groupby, but the csv.reader object...

If for some reason you have malformed CSV files, with embedded newlines
or something of that effect, it will raise an exception. To skip those,
you will need a construct of something like this:

import csv

raw_csv_in = file('filenamehere.csv')
for raw_line in raw_csv_in:
    try:
        # Do something to raw_line here maybe, if necessary, to "clean it up"
        row = csv.reader([raw_line]).next()
        # Do your stuff here
    except csv.Error:
        pass  # or do something more appropriate if the record is important

May not be applicable in your case, but has stung me a few times...
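A sketch of the same skip-bad-lines idea in modern Python 3 (the sample lines are invented, and strict=True is used so the csv module actually rejects badly quoted input):

```python
import csv

lines = ['a,b\n', '"x"y,z\n', 'c,d\n']  # middle line has bad quoting

good = []
for raw_line in lines:
    try:
        # strict=True makes the reader raise csv.Error on bad quoting
        good.append(next(csv.reader([raw_line], strict=True)))
    except csv.Error:
        pass  # skip rows the csv module rejects

print(good)  # [['a', 'b'], ['c', 'd']]
```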

All the best,

Jon.
Frank Millman wrote:
[snip]

Jun 13 '06 #9
On 13/06/2006 6:28 PM, Paul McGuire wrote:
(Oh, and I like groupby too! Combine it with sort to quickly create
histograms.)

# tally a histogram of a list of values from 1-10
dataValueRange = range(1,11)
data = [random.choice(dataValueRange) for i in xrange(10000)]

hist = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]


That len(list(g)) looks like it uses O(N) memory just to find out what N
is :-(

The best I could come up with is sum(itertools.imap(lambda x: 1, g)) --
but that does look a bit ugly ...

Jun 13 '06 #10
John Machin wrote:
[snip]
That len(list(g)) looks like it uses O(N) memory just to find out what N
is :-(

Not at all! A python list *knows* its length at all times. len() is a
constant time lookup of an internal attribute.

Gary Herron


Jun 13 '06 #11
On 14/06/2006 8:06 AM, Gary Herron wrote:
[snip]
Not at all! A python list *knows* its length at all times. len() is a
constant time lookup of an internal attribute.


Did you see any reference to time in what I wrote? Did you notice the
word "memory" at all?

My point is that "g" is an iterator, and list(g) actually builds a list
of size N, merely in order to use len(that_list) to count the number of
items that g will produce.
Jun 13 '06 #12
Gary Herron wrote:
[snip]
Not at all! A python list *knows* its length at all times. len() is a
constant time lookup of an internal attribute.


The point is that you had to create the list in the first place. g is an iterator.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jun 13 '06 #13
On 14/06/2006 8:38 AM, Robert Kern wrote:
[snip]
The point is that you had to create the list in the first place. g is an iterator.


I didn't have to create a list in the first place. Paul did. The point
of my post was to avoid the memory grab caused by list(g) by seeking a
way that just counted g's output.

Sorry for the confusion my lack of clarity has evidently caused. I'll
rephrase:

That whole construct

    len(list(g))

looks like it uses O(N) memory just to find out what N is. Better?
Jun 13 '06 #14
"John Machin" <sj******@lexicon.net> wrote in message news:44**************@lexicon.net...
[snip]
That whole construct len(list(g)) looks like it uses O(N) memory just to
find out what N is. Better?


Ok, ok!!! Here's a non-O(N)-memory version, using a generator expression
to count the number of items in each group.

hist = [ (k, sum(1 for _g in g)) for k, g in itertools.groupby(sorted(data)) ]
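A runnable version of the same tally (modern Python 3, so xrange becomes range; the seed is only there to make the run reproducible):

```python
import itertools
import random

random.seed(0)
data = [random.choice(range(1, 11)) for _ in range(10000)]

# Count each group lazily instead of materializing it with list()
hist = [(k, sum(1 for _ in g)) for k, g in itertools.groupby(sorted(data))]

assert [k for k, _ in hist] == list(range(1, 11))  # every key appears once
assert sum(n for _, n in hist) == 10000            # counts cover all samples
```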

-- Paul
Jun 14 '06 #15

Benji York wrote:
[snip]
Why do you create a list of rows instead of just iterating over the
reader directly?


A - didn't think of it - good idea

B - can't always do it -
B1 - if the file is not sorted, I have to sort the rows first
B2 - if I need to update the file, I can modify the rows in place, and then call
    csv.writer(open('trans.csv','wb')).writerows(rows)

BTW, I know that B2 is simplistic - to be safe I should rename, then
write, then unlink. I will do that for production code.

BTW2, an alternative to B2 is
reader = csv.reader(open('trans.csv', 'rb'))
newtrans = open('newtrans.csv', 'wb')
writer = csv.writer(newtrans)
for row in reader:
    [process and modify row]
    writer.writerow(row)
newtrans.close()
[unlink and rename]

Could be useful if the file is large. Food for thought.
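The [unlink and rename] step can be sketched as write-temp-then-swap, for example (modern Python 3; the rewrite() helper and its temp-file handling are invented for illustration, not Frank's code):

```python
import csv
import os
import tempfile

def rewrite(path):
    """Rewrite a csv file safely: write a temp file, then swap it in."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix='.csv')
    with os.fdopen(fd, 'w', newline='') as out, open(path, newline='') as src:
        writer = csv.writer(out)
        for row in csv.reader(src):
            # [process and modify row here]
            writer.writerow(row)
    os.replace(tmp, path)  # atomic swap; the original is never half-written
```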

Thanks

Frank

Jun 14 '06 #16
Frank Millman <fr***@chagford.com> wrote:
[snip]


BTW, if and when you do need a list for some other purpose,

rows = list(reader)

may be slightly better than the for/append loop; and if you need a
sorted list, perhaps

rows = sorted(reader)

similarly.
Alex
Jun 14 '06 #17
James Stroud <js*****@ucla.edu> wrote:
...
def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        if len(doers) > 1:
            doit(alist, doers[1:], i+1)
        doers[0](r)


Isn't this making N useless slices (thus copies, for most kinds of
sequences) for a doers of length N? Since you're passing i anyway, it
seems to me that:

def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        if len(doers) > i+1:
            doit(alist, doers, i+1)
        doers[i](r)

is equivalent to your code, but avoids these slices (thus copies).
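The indexed version can be exercised end to end. A runnable sketch (modern Python 3) with invented handler functions that just record the order in which they fire:

```python
from itertools import groupby
from operator import itemgetter

calls = []  # records (level, key) in the order the handlers fire
doers = [
    lambda k: calls.append(('brn', k)),
    lambda k: calls.append(('acc', k)),
    lambda k: calls.append(('row', k)),
]

rows = [('B1', 'A1', 1), ('B1', 'A1', 2), ('B1', 'A2', 3)]

def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        if len(doers) > i + 1:
            doit(alist, doers, i + 1)
        doers[i](r)

doit(rows, doers)
print(calls)
# innermost handlers fire first, then each break handler on the way out:
# [('row', 1), ('row', 2), ('acc', 'A1'), ('row', 3), ('acc', 'A2'), ('brn', 'B1')]
```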
Alex
Jun 14 '06 #18
Alex Martelli wrote:
[snip]
is equivalent to your code, but avoids these slices (thus copies).


Yes, it does seem the copies are useless. Thank you.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jun 14 '06 #19
Alex Martelli wrote:
[snip]
is equivalent to your code, but avoids these slices (thus copies).


Actually, I remember why I wrote it that way--because the itemgetter
argument may not start at zero (admitting that I haven't yet played with
itemgetter at all). Maybe pop(0) is better than a copy?
def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        doer = doers.pop(0)
        if doers:  # empty list tests as False
            doit(alist, doers, i+1)
        doer(r)
James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jun 14 '06 #20
James Stroud <js*****@ucla.edu> wrote:
[snip]
Maybe pop(0) is better than a copy?

def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        doer = doers.pop(0)
        if doers:  # empty list tests as False
            doit(alist, doers, i+1)
        doer(r)


No, doers.pop(0) is O(N), just like the slicing doers[1:].

To support the indexing into itemgetter possibly being different from
that into doers (under the hypothesis that efficiency matters here, of
course!-) I'd rather do something like:

def doit(rows, doers, i=0):
    L = len(doers)
    def _aux(rows, j):
        if L <= j:
            return
        doer = doers[j]
        for r, alist in groupby(rows, itemgetter(i+j)):
            _aux(alist, j+1)
            doer(r)
    _aux(rows, 0)

haven't tested this, but it seems a straightforward semantics-preserving
transformation -- the hoisting out of the recursion of the computation
of len(), and out of the loop of the termination test and indexing,
being trivial optimizations (which I'm not averse to using, even though
trivial, because I think they make the code clearer).

Whether all this is worth it, or not, is in a sense besides the point --
I like this anyway, even just as an example of how the best way to
introduce recursion may often be, not in the "official" function (the
one that gets called by client code), but in a "private" auxiliary
function inside the "official" one.
Alex
Jun 15 '06 #21

This thread has been closed and replies have been disabled. Please start a new discussion.
