By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,117 Members | 2,178 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,117 IT Pros & Developers. It's quick & easy.

"Collapsing" a list into a list of changes

P: n/a
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Thanks,
Alan McIntyre
http://www.esrgtech.com
Jul 18 '05 #1
Share this Question
Share on Google+
42 Replies


P: n/a
Alan McIntyre wrote:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?


Well, this does about the same thing, but using enumerate and a list
comprehension:

py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> [item for i, item in enumerate(lst) if i == 0 or item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

Similar code that doesn't check 'if i == 0' each time through:

py> itr = enumerate(lst)
py> itr.next()
(0, 0)
py> [lst[0]] + [item for i, item in itr if item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

I don't know if either of these is really more elegant though...

Steve
Jul 18 '05 #2

P: n/a
On Fri, 04 Feb 2005 12:43:37 -0500, Alan McIntyre wrote:
def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?


I think that's pretty elegant; I read it and immediately understood what
you were doing. There may be some performance tweaks you could make if you
were doing this to large lists, and my instincts say to re-write it as an
iterator if you use it a lot like:

for item in collapse(yourList):

but other than that which may not even apply, "straightforward" is
generally a *good* thing, don't you think? :-)

Jul 18 '05 #3

P: n/a
On Fri, Feb 04, 2005 at 12:43:37PM -0500, Alan McIntyre wrote:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

If you are using python2.4,
import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])] [0, 1, 2, 3, 2, 4, 5]
Since this is 2.4 you could also return a generator expression.
def iter_collapse(myList): .... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
.... i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i <generator object at 0xb7df6b2c> list(i) [0, 1, 2, 3, 2, 4, 5]

-Jack

Jul 18 '05 #4

P: n/a
Your first example is along the lines of what I was thinking when I said
"elegant." :) I was looking for something that I could drop into one
or two lines of code; I may not do that if I'm writing code that will
have to be maintained, but it's still nice to know how to do it.

Thanks :)
Alan

Steven Bethard wrote:
Well, this does about the same thing, but using enumerate and a list
comprehension:

py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> [item for i, item in enumerate(lst) if i == 0 or item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

Similar code that doesn't check 'if i == 0' each time through:

py> itr = enumerate(lst)
py> itr.next()
(0, 0)
py> [lst[0]] + [item for i, item in itr if item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

I don't know if either of these is really more elegant though...

Steve

Jul 18 '05 #5

P: n/a
Alan McIntyre wrote:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?


Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
.... enumeration = enumerate(iterable)
.... _, lastitem = enumeration.next()
.... yield lastitem
.... for i, item in enumeration:
.... if item != lastitem:
.... yield item
.... lastitem = item
....
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

STeVe
Jul 18 '05 #6

P: n/a
I think you're right; sometimes I'm susceptible to the "I can do that in
one line of code" temptation. :)

Since this current bit of code will probably end up in something that's
going to be maintained, I will probably stick with the straightforward
method just to be nice to the maintainer (especially if it's me!).

Jeremy Bowers wrote:
I think that's pretty elegant; I read it and immediately understood what
you were doing. There may be some performance tweaks you could make if you
were doing this to large lists, and my instincts say to re-write it as an
iterator if you use it a lot like:

for item in collapse(yourList):

but other than that which may not even apply, "straightforward" is
generally a *good* thing, don't you think? :-)

Jul 18 '05 #7

P: n/a
Jack,

I'm not using 2.4 yet; still back in 2.3x. :) Thanks for the examples,
though - they are clear enough that I will probably use them when I upgrade.

Thanks,
Alan

Jack Diederich wrote:
If you are using python2.4,

import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])]
[0, 1, 2, 3, 2, 4, 5]
Since this is 2.4 you could also return a generator expression.

def iter_collapse(myList):
... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...
i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i
<generator object at 0xb7df6b2c>
list(i)


[0, 1, 2, 3, 2, 4, 5]

-Jack

Jul 18 '05 #8

P: n/a
Alan McIntyre wrote:
....
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].
....
Is there an elegant way to do this, or should I just stick with the
code above?

def changes( dataset ): .... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
.... print list(changes(data ))


which is quite readable/elegant IMO.

Have fun,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
PyCon is coming...

Jul 18 '05 #9

P: n/a
Mike C. Fletcher wrote:
Alan McIntyre wrote:
...
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].

...
Is there an elegant way to do this, or should I just stick with the
code above?


>>> def changes( dataset ):

... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))

which is quite readable/elegant IMO.


But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
.... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
.... last = object()
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

STeVe
Jul 18 '05 #10

P: n/a
Steven Bethard <st************@gmail.com> writes:
Mike C. Fletcher wrote:

[...]
>>> def changes( dataset ):

... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.


But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]


Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?

clever-is-evil-ly y'rs,
John
Jul 18 '05 #11

P: n/a
John J. Lee wrote:
Steven Bethard <st************@gmail.com> writes:

Mike C. Fletcher wrote:


[...]
>>> def changes( dataset ):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.


But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?


In that case:

Py> def collapsed(iterable):
.... itr = iter(iterable)
.... last = itr.next()
.... yield last
.... for value in itr:
.... if value != last:
.... yield value
.... last = value
....
Py> from Tkinter import _flatten #**
Py> lst = list(_flatten([[i] * 5 for i in range(10)]))
Py> lst
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5
, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9]
Py> list(collapsed(lst))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

** Hmm, this function really is quite handy when generating test data. Maybe in
Python 2.5 the line should be spelt "from itertools import flatten"

Cheers,
Nick.

--
Nick Coghlan | nc******@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Jul 18 '05 #12

P: n/a
Jack Diederich wrote:
Since this is 2.4 you could also return a generator expression.

def iter_collapse(myList):


... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...


But why write a function that returns a generator expression, when you could
just turn the function itself into a generator?

Py>def iter_collapse(myList):
.... for x in it.groupby(myList):
.... yield x[0]

Cheers,
Nick.

--
Nick Coghlan | nc******@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Jul 18 '05 #13

P: n/a
Steven Bethard <st************@gmail.com> wrote:
Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...


Hmmmm, what role does the enumeration play here? I don't see how you're
using it, at all. Why not just:

def collapse(iterable):
it = iter(iterable)
lastitem = it.next()
yield lastitem
for item in it:
if item != lastitem:
yield item
lastitem = item

that's basically just the same as your code but without the strangeness
of making an enumerate for the purpose of ignoring what the enumerate
adds to an ordinary iterator.
Alex
Jul 18 '05 #14

P: n/a

Alan McIntyre wrote:
Hi all,

I have a list of items that has contiguous repetitions of values, but the number and location of the repetitions is not important, so I just need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].
Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code above?

Thanks,
Alan McIntyre
http://www.esrgtech.com


Here is a version using dictionary properties, ie no duplication of
keys.

def condense(l):
d={}
for item in l:
d[item]=1
l=d.keys()
return l

Cheers
Tony

Jul 18 '05 #15

P: n/a
Tony,

Actually I only want to remove a certain kind of duplication; if an item
occurs twice - say like this: [1,1,1,2,2,2,1,1,1], then I need to keep
the order and occurrence of the individual values: [1,2,1]. Using a
dict as you proposed loses the order of occurrence, as well as multiple
occurrences of groups of the same item.

If I didn't need those two qualities of the list to be preserved,
though, I think I'd use something like your solution (if I was using a
Python older than 2.3) or Steve Coats' solution posted above using Set.

Thanks!
Alan

Tony wrote:
Here is a version using dictionary properties, ie no duplication of
keys.

def condense(l):
d={}
for item in l:
d[item]=1
l=d.keys()
return l

Cheers
Tony

Jul 18 '05 #16

P: n/a
Alex,

Wow, that method turns out to be the fastest so far in a simple
benchmark on Python2.3 (on my machine, of course, YMMV); it takes 14%
less time than the one that I deemed most straightforward. :)

Thanks,
Alan

Alex Martelli wrote:
Hmmmm, what role does the enumeration play here? I don't see how you're
using it, at all. Why not just:

def collapse(iterable):
it = iter(iterable)
lastitem = it.next()
yield lastitem
for item in it:
if item != lastitem:
yield item
lastitem = item

that's basically just the same as your code but without the strangeness
of making an enumerate for the purpose of ignoring what the enumerate
adds to an ordinary iterator.
Alex

Jul 18 '05 #17

P: n/a
On Sat, Feb 05, 2005 at 02:31:08PM +1000, Nick Coghlan wrote:
Jack Diederich wrote:
Since this is 2.4 you could also return a generator expression.

>def iter_collapse(myList):


... return (x[0] for (x) in
it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...


But why write a function that returns a generator expression, when you
could just turn the function itself into a generator?

Py>def iter_collapse(myList):
... for x in it.groupby(myList):
... yield x[0]

Here is where I say, "because it is faster!"
except it isn't. maybe. The results are unexpected, anyway.

import timeit
import itertools as it

def collapse1(myl):
for x in it.groupby(myl):
yield x[0]

def collapse2(myl):
return (x[0] for (x) in it.groupby(myl))

list_str = '[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]'

print "collapse1", timeit.Timer('collapse1(%s)' % (list_str), 'from __main__ import collapse1').timeit()
print "collapse2", timeit.Timer('collapse2(%s)' % (list_str), 'from __main__ import collapse2').timeit()
print "list(collapse1)", timeit.Timer('list(collapse1(%s))' % (list_str), 'from __main__ import collapse1').timeit()
print "list(collapse2)", timeit.Timer('list(collapse2(%s))' % (list_str), 'from __main__ import collapse2').timeit()

collapse1 1.06855082512
collapse2 3.40627384186
list(collapse1) 8.31489896774
list(collapse2) 9.49903011322

The overhead of creating the generator expression seems much higher than
creating the equivalent function. If you subtract our the setup difference
actually running through the whole iterator is slightly faster for genexps
than functions that yield. At a guess it has something to do with how
they handle lexical scope and some byte code differences.

I said guess, right?

-Jack
Jul 18 '05 #18

P: n/a

Alan McIntyre wrote:
Tony,

Actually I only want to remove a certain kind of duplication; <snip>


How about this one liner?
def condense(m):
print [m[0]]+[m[k] for k in range(1,len(m)) if
m[k]!=m[k-1]]
b=[1,1,1,2,2,2,1,1,1]
condense(b)
Tony Clarke

Jul 18 '05 #19

P: n/a
Alex Martelli wrote:
Steven Bethard <st************@gmail.com> wrote:

Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

Hmmmm, what role does the enumeration play here?


None =) Sorry, it was an accidental carry-over from the list solution
that looked at lst[i-1]. I thought about correcting it when I realized
this (after posting), but I was too lazy. ;)

Steve
Jul 18 '05 #20

P: n/a
Thanks to everybody that responded; I appreciate all the input, even if
I didn't respond to all of it individually. :)
Jul 18 '05 #21

P: n/a
This is a prefect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

def straightforward_collapse_secu(lst):
return lst and reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:],
[lst[0]]) or []

print straightforward_collapse(a_lst)
print straightforward_collapse_secu([])

--END SNAP

Regards

Francis Girard

Le vendredi 4 Février 2005 20:08, Steven Bethard a écrit*:
Mike C. Fletcher wrote:
Alan McIntyre wrote:
...
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].


...
Is there an elegant way to do this, or should I just stick with the
code above?

>>> def changes( dataset ):


... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))

which is quite readable/elegant IMO.


But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

STeVe


Jul 18 '05 #22

P: n/a
Francis Girard wrote:
This is a prefect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
return*reduce(lambda*v,e:*v[-1]!=e*and*v+[e]*or*v,*lst[1:],*[lst[0]])


reduce() magically increases the number of function calls by len(a_list).
There will be no good use cases for reduce() until Python can inline
functions.

Apodictically Yours
Peter

Jul 18 '05 #23

P: n/a
Alan McIntyre <al***********@esrgtech.com> wrote:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?
p=[1,1,1,1,1,4,4,4,8,8,9]
filter(lambda y: y>0, map(lambda x,y: x==y and -1 or x,[0]+p,p+[0])) [1, 4, 8, 9]

Z powazaniem
Adam Przybyla
Jul 18 '05 #24

P: n/a
Zut !

I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."

Francis Girard

Le lundi 7 Février 2005 08:24, Peter Otten a écrit*:
Francis Girard wrote:
This is a prefect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
return*reduce(lambda*v,e:*v[-1]!=e*and*v+[e]*or*v,*lst[1:],*[lst[0]])


reduce() magically increases the number of function calls by len(a_list).
There will be no good use cases for reduce() until Python can inline
functions.

Apodictically Yours
Peter


Jul 18 '05 #25

P: n/a
Francis Girard wrote:
Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."


Submit a patch :-)

Peter
Jul 18 '05 #26

P: n/a
Francis Girard wrote:
I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.


Clarity aside[1], can you give an example of where reduce is as
efficient as the eqivalent loop in Python?

Steve

[1] I generally find most reduce solutions much harder to read, but
we'll set aside my personal limitations for the moment. ;)
Jul 18 '05 #27

P: n/a
On Mon, Feb 07, 2005 at 07:07:10PM +0100, Francis Girard wrote:
Zut !

I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."


I am guessing you are joking, right? I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Laugh and the world thinks you're an idiot.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCB7i6gPqu395ykGsRAvzgAKCnnxlkuwfR5sHRDd8zCi mRTFi5rQCgoCwB
oRkSfburxlGnGrbQbhCZpho=
=2pvS
-----END PGP SIGNATURE-----

Jul 18 '05 #28

P: n/a
No. Just wanted to naively use one of the base tool Python had to offer. I
tought it was very usable and very readable. I might be wrong.

Is there someone on this list using this tool and happy with it ? Or is my
mind too much targeted on FP paradigm and most of you really think that all
the functions that apply another function to each and every elements of a
list are bad (like "reduce", "map", "filter") ?

Francis Girard

Le lundi 7 Février 2005 19:25, Steven Bethard a écrit*:
Francis Girard wrote:
I'm very sorry that there is no good use case for the "reduce" function
in Python, like Peter Otten pretends. That's an otherwise very useful
tool for many use cases. At least on paper.


Clarity aside[1], can you give an example of where reduce is as
efficient as the eqivalent loop in Python?

Steve

[1] I generally find most reduce solutions much harder to read, but
we'll set aside my personal limitations for the moment. ;)


Jul 18 '05 #29

P: n/a
Francis Girard wrote:
Is there someone on this list using this tool and happy with it ? Or is my
mind too much targeted on FP paradigm and most of you really think that all
the functions that apply another function to each and every elements of a
list are bad (like "reduce", "map", "filter") ?


I know there are definitely proponents for map and filter, especially
for simple cases like:

map(int, lst)
filter(str.strip, lst)

Note that, unlike reduce, map and filter aren't really going to increase
the number of function calls. Consider the equivalent list comprehensions:

[int(x) for x in lst]
[x for x in lst if str.strip(x)] [1]

The list comprehensions also require the same number of function calls
in these cases. Of course, in cases where a function does not already
exist, map and filter will require more function calls. Compare:

map(lambda x: x**2 + 1, lst)

with

[x**2 + 1 for x in lst]

Where the LC allows you to essentially "inline" the function. (You can
dis.dis these to see the difference if you like.)
As far as my personal preferences go, while the simple cases of map and
filter (the ones using existing functions) are certainly easy enough for
me to read, for more complicated cases, I find things like:

[x**2 + 1 for x in lst]
[x for x in lst if (x**2 + 1) % 3 == 1]

much more readable than the corresponding:

map(lambda x: x**2 + 1, lst)
filter(lambda x: (x**2 + 1) % 3 == 1, lst)

especially since I avoid lambda usage, and would have to write these as:

def f1(x):
return x**2 + 1
map(f1, lst)

def f2(x):
return (x**2 + 1) % 3 == 1
map(f2, lst)

(I actually find the non-lambda code clearer, but still more complicated
than the list comprehensions.)

Given that I use list comprehensions for the complicated cases, I find
it to be more consistent if I use list comprehensions in all cases, so I
even tend to avoid the simple map and filter cases.
As far as reduce goes, I've never seen code that I thought was clearer
using reduce than using a for-loop. I have some experience with FP, and
I can certainly figure out what a given reduce call is doing given
enough time, but I can always understand the for-loop version much more
quickly.
Of course, YMMV.

STeVe
[1] While it's not technically equivalent, this would almost certainly
be written as:
[x for x in lst if x.strip()]
which in fact takes better advantage of Python's duck-typing -- it will
work for unicode objects too.
Jul 18 '05 #30

P: n/a
John Lenton wrote:
For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))


Gah! I'll never understand why people use lambda, which is intended to
create _anonymous_ functions, to create named functions! This code is
basically equivalent to:

def factorial(n):
return reduce(operator.mul, range(1, n+1))

except that factorial.__name__ is now the meaningful 'factorial',
instead of '<lambda>'. See "Inappropriate use of Lambda" in
http://www.python.org/moin/DubiousPython
My personal beef with the inappropriate use of lambda aside, and
ignoring the fact that the reduce solution as written fails for
factorial(0), some timings are in order. Given fact.py:

----------------------------------------------------------------------
import operator

def f1(n):
result = 1
for x in xrange(1, n+1):
result *= x
return result

def f2(n):
return reduce(operator.mul, range(1, n+1))

def f3(n):
return reduce(operator.mul, xrange(1, n+1))
----------------------------------------------------------------------

I get the following timeit results (with Python 2.4):

$ python -m timeit -s "import fact" "[fact.f1(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.53 msec per loop

$ python -m timeit -s "import fact" "[fact.f2(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.46 msec per loop

$ python -m timeit -s "import fact" "[fact.f3(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.17 msec per loop

So it does look like reduce is actually faster in this case, especially
if you use xrange instead of range.

Thanks for the example!

Steve
Jul 18 '05 #31

P: n/a
Le lundi 7 Février 2005 19:51, John Lenton a écritÂ*:
On Mon, Feb 07, 2005 at 07:07:10PM +0100, Francis Girard wrote:
Zut !

I'm very sorry that there is no good use case for the "reduce" function
in Python, like Peter Otten pretends. That's an otherwise very useful
tool for many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."
I am guessing you are joking, right?


Of course I am joking. I meant the exact contrary. Only wanted to show that
Peter did exaggerate. Thank you for citing me with the full context.
I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))


Great.

Regards

Francis Girard

Jul 18 '05 #32

P: n/a
Le lundi 7 Février 2005 20:30, Steven Bethard a écrit*:
Francis Girard wrote:
Is there someone on this list using this tool and happy with it ? Or is
my mind too much targeted on FP paradigm and most of you really think
that all the functions that apply another function to each and every
elements of a list are bad (like "reduce", "map", "filter") ?
I know there are definitely proponents for map and filter, especially
for simple cases like:

map(int, lst)
filter(str.strip, lst)

Note that, unlike reduce, map and filter aren't really going to increase
the number of function calls. Consider the equivalent list comprehensions:

[int(x) for x in lst]
[x for x in lst if str.strip(x)] [1]

The list comprehensions also require the same number of function calls
in these cases. Of course, in cases where a function does not already
exist, map and filter will require more function calls. Compare:

map(lambda x: x**2 + 1, lst)

with

[x**2 + 1 for x in lst]

Where the LC allows you to essentially "inline" the function. (You can
dis.dis these to see the difference if you like.)

I see.


As far as my personal preferences go, while the simple cases of map and
filter (the ones using existing functions) are certainly easy enough for
me to read, for more complicated cases, I find things like:

[x**2 + 1 for x in lst]
[x for x in lst if (x**2 + 1) % 3 == 1]

much more readable than the corresponding:

map(lambda x: x**2 + 1, lst)
filter(lambda x: (x**2 + 1) % 3 == 1, lst)

I agree.
especially since I avoid lambda usage, and would have to write these as:
Why avoid "lambda" usage ? You find them too difficult to read (I mean in
general) ?

def f1(x):
return x**2 + 1
map(f1, lst)

def f2(x):
return (x**2 + 1) % 3 == 1
map(f2, lst)

(I actually find the non-lambda code clearer, but still more complicated
than the list comprehensions.)

Given that I use list comprehensions for the complicated cases, I find
it to be more consistent if I use list comprehensions in all cases, so I
even tend to avoid the simple map and filter cases.
As far as reduce goes, I've never seen code that I thought was clearer
using reduce than using a for-loop. I have some experience with FP, and
I can certainly figure out what a given reduce call is doing given
enough time, but I can always understand the for-loop version much more
quickly.

I think it's a question of habits. But I agree that we should always go with
code we find easy to read and that we think others find the same.
I think I will stop using that function in Python if Python practionners find
it difficult to read in general. There is no point in being cool just for
being cool.

Thank you
Francis Girard

Of course, YMMV.

STeVe
[1] While it's not technically equivalent, this would almost certainly
be written as:
[x for x in lst if x.strip()]
which in fact takes better advantage of Python's duck-typing -- it will
work for unicode objects too.


Jul 18 '05 #33

P: n/a
Francis Girard wrote:
Le lundi 7 Février 2005 20:30, Steven Bethard a écrit :
especially since I avoid lambda usage, and would have to write these as:


Why avoid "lambda" usage ? You find them too difficult to read (I mean in
general) ?


Yup, basically a readability thing. I also tend to find that if I
actually declare the function, I can often find a way to refactor things
to make that function useful in more than one place.

Additionally, 'lambda' is on Guido's regrets list, so I'm avoiding it's
use in case it gets yanked for Python 3.0. I think most code can be
written now without it, and in most cases code so written is clearer.
Probably worth looking at is a thread I started that went through some
stdlib uses of lambda and how they could be rewritten:

http://mail.python.org/pipermail/pyt...er/257990.html

Many were rewritable with def statements, list comprehensions, the
operator module functions, or unbound or bound methods.

Steve
Jul 18 '05 #34

P: n/a
John Lenton wrote:
On Mon, Feb 07, 2005 at 07:07:10PM +0100, Francis Girard wrote:
Zut !

I'm very sorry that there is no good use case for the "reduce" function
in Python, like Peter Otten pretends. That's an otherwise very useful
tool for many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."


I am guessing you are joking, right? I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))


I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

$ cat factorial.py

#ugly but efficient
factorial = [1]
for i in range(1, 50):
factorial.append(i*factorial[-1])
import operator
def factorial_reduce(n):
return reduce(operator.mul, xrange(1, n+1), 1)

if __name__ == "__main__":
assert factorial[0] == factorial_reduce(0) == 1
assert factorial[5] == factorial_reduce(5) == 1*2*3*4*5

$ python2.4 -m timeit -s'import factorial as f' 'f.factorial_reduce(30)'
100000 loops, best of 3: 13.3 usec per loop
$ python2.4 -m timeit -s'import factorial as f' 'f.factorial[30]'
1000000 loops, best of 3: 0.328 usec per loop

Having factorial raise a UseStirlingsFormulaError for large values is left
as an exercise to the reader :-)
Peter

Jul 18 '05 #35

P: n/a
On Mon, Feb 07, 2005 at 09:39:11PM +0100, Peter Otten wrote:
John Lenton wrote:
For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))


I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

[...]


that's cheating: you moved the calculation into the setup. You aren't
timing what you say you are.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
I'd rather push my Harley than ride a rice burner.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCB9WegPqu395ykGsRAmQFAJ96qzLBiM+/DmYt7+Snpf2jGYoNYACcD70U
DkQ/nRGUFFex22oWpTQ7zpc=
=zvS1
-----END PGP SIGNATURE-----

Jul 18 '05 #36

P: n/a
Le lundi 7 Février 2005 21:21, Steven Bethard a écrit*:
Francis Girard wrote:
Le lundi 7 Février 2005 20:30, Steven Bethard a écrit :
especially since I avoid lambda usage, and would have to write these as:


Why avoid "lambda" usage ? You find them too difficult to read (I mean in
general) ?


Yup, basically a readability thing. I also tend to find that if I
actually declare the function, I can often find a way to refactor things
to make that function useful in more than one place.

Additionally, 'lambda' is on Guido's regrets list, so I'm avoiding it's
use in case it gets yanked for Python 3.0. I think most code can be
written now without it, and in most cases code so written is clearer.
Probably worth looking at is a thread I started that went through some
stdlib uses of lambda and how they could be rewritten:

http://mail.python.org/pipermail/pyt...er/257990.html

Many were rewritable with def statements, list comprehensions, the
operator module functions, or unbound or bound methods.

Steve


I see. I personnaly use them frequently to bind an argument of a function with
some fixed value. Modifying one of the example in
http://mail.python.org/pipermail/pyt...er/257990.html
I frequently have something like :

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x: x+2,
'add2')

If "Guido" don't like lambdas, he would have to give me some way to easily do
this. Something like (or similar to fit Python syntax) :

SimpleXMLRPCServer.py: server.register_1arg-place_function(\+2, 'add2')

This would be great.

Regards

Francis Girard

server.register_function(operator.add, 'add')
Jul 18 '05 #37

P: n/a
Francis Girard wrote:
I see. I personnaly use them frequently to bind an argument of a function with
some fixed value. Modifying one of the example in
http://mail.python.org/pipermail/pyt...er/257990.html
I frequently have something like :

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x: x+2,
'add2')


PEP 309 has already been accepted, so I assume it will appear in Python 2.5:

http://www.python.org/peps/pep-0309.html

Your code could be written as:

server.register_1arg-place_function(partial(operator.add, 2))

If your task is actually what you describe above -- "to bind an argument
of a function with some fixed value"[1] -- then in general, you should
be able to write this as[2]:

function_with_fixed_value = partial(function, fixed_value)
Steve

[1] As opposed to binding a name to be used in an _expression_ as you do
in your example.

[2] The partial function can also be used to fix multiple argument
values and keyword argument values, if that's necessary for your purposes.
Jul 18 '05 #38

P: n/a
John Lenton wrote:
On Mon, Feb 07, 2005 at 09:39:11PM +0100, Peter Otten wrote:
John Lenton wrote:
> For example, the fastest way
> to get the factorial of a (small enough) number in pure python is
>
> factorial = lambda n: reduce(operator.mul, range(1, n+1))


I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

[...]


that's cheating: you moved the calculation into the setup. You aren't
timing what you say you are.


*You* are cheating when you take a predefined function implemented in C
(operator.mul) and then claim you are using pure Python. This gives you the
extra 50 percent that overcompensate the inefficiency of reduce() -- but
only by a hair's breadth:

$ python2.4 -m timeit -s'mul = lambda x, y: x * y' -s'a=1' 'mul(a, a)'
1000000 loops, best of 3: 0.522 usec per loop
$ python2.4 -m timeit -s'from operator import mul' -s'a=1' 'mul(a, a)'
1000000 loops, best of 3: 0.345 usec per loop
Peter


Jul 18 '05 #39

P: n/a
Le lundi 7 Février 2005 22:53, Steven Bethard a écrit*:
Francis Girard wrote:
I see. I personnaly use them frequently to bind an argument of a function
with some fixed value. Modifying one of the example in
http://mail.python.org/pipermail/pyt...er/257990.html
I frequently have something like :

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x:
x+2, 'add2')
PEP 309 has already been accepted, so I assume it will appear in Python
2.5:

http://www.python.org/peps/pep-0309.html

Your code could be written as:

server.register_1arg-place_function(partial(operator.add, 2))

If your task is actually what you describe above -- "to bind an argument
of a function with some fixed value"[1] -- then in general, you should
be able to write this as[2]:

function_with_fixed_value = partial(function, fixed_value)


Great ! Great ! Great !
Really, Python is going in the right direction all of the time !
Very good news.

Regards

Francis Girard

Steve

[1] As opposed to binding a name to be used in an _expression_ as you do
in your example.

[2] The partial function can also be used to fix multiple argument
values and keyword argument values, if that's necessary for your purposes.


Jul 18 '05 #40

P: n/a
Peter Otten wrote:
*You* are cheating when you take a predefined function implemented in C
(operator.mul) and then claim you are using pure Python.


operator is a standard module in Python. if you have a desperate need
to be pointless in public, consider using another newsgroup.

</F>

Jul 18 '05 #41

P: n/a
Fredrik Lundh wrote:
Peter Otten wrote:
*You* are cheating when you take a predefined function implemented in C
(operator.mul) and then claim you are using pure Python.


operator is a standard module in Python. if you have a desperate need
to be pointless in public, consider using another newsgroup.

</F>


No.

Peter

Jul 18 '05 #42

P: n/a
[Alan McIntyre]
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

import itertools
[k for k, v in itertools.groupby(lst)]

[0, 1, 2, 3, 2, 4, 5]
Raymond Hettinger
Jul 18 '05 #43

This discussion thread is closed

Replies have been disabled for this discussion.