
# Profiling, sum-comprehension vs reduce

This must be because of the implementation, right? Shouldn't reduce be
faster since it iterates once over the list?
Doesn't sum first construct the list and then sum it?

-----------------------
>>> ============================ RESTART ============================
reduce with named function: 37.9864357062
reduce with nested, named function: 39.4710288598
reduce with lambda: 39.2463927678
sum comprehension: 25.9530121845
>>> ============================ RESTART ============================
reduce with named function: 36.4529584067
reduce with nested, named function: 37.6278529813
reduce with lambda: 38.2629448715
sum comprehension: 26.0197561422
>>>

from timeit import Timer

def named(x, y):
    return x + y

def rednamed(lst):
    return reduce(named, lst)

def rednn(lst):
    def nested(x, y):
        return x + y
    return reduce(nested, lst)

def redlambda(lst):
    return reduce(lambda x, y: x + y, lst)

def com(lst):
    return sum(x for x in lst)

s = xrange(101)

t1 = Timer('rednamed(s)', 'from __main__ import rednamed, s')
t2 = Timer('rednn(s)', 'from __main__ import rednn, s')
t3 = Timer('redlambda(s)', 'from __main__ import redlambda, s')
t4 = Timer('com(s)', 'from __main__ import com, s')

print "reduce with named function: ", t1.timeit()
print "reduce with nested, named function: ", t2.timeit()
print "reduce with lambda: ", t3.timeit()
print "sum comprehension: ", t4.timeit()
---------------------------------------
also, using range instead of xrange doesn't seem to incur a
performance penalty:
>>>
reduce with named function: 36.7560729087
reduce with nested, named function: 38.5393266463
reduce with lambda: 38.3852953378
sum comprehension: 27.9001007111
>>>
Sep 13 '08 #1
On Sat, 13 Sep 2008 01:06:22 -0700, cnb wrote:
This must be because of the implementation, right? Shouldn't reduce be faster
since it iterates once over the list? Doesn't sum first construct the
list and then sum it?
No it doesn't. Why should it?
also, using range instead of xrange doesn't seem to incur a
performance penalty:
(De)Allocating a list of length 100 isn't very slow. Try some million
elements. And watch the memory consumption too.
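Marc's point can be made visible with sys.getsizeof (shown in modern Python, where range is lazy like Python 2's xrange; this is an illustrative sketch, not part of the original thread):

```python
import sys

n = 1000000

# list(range(n)) materializes a million elements up front...
big_list = list(range(n))

# ...while a lazy range object only stores start/stop/step bookkeeping.
lazy = range(n)

print(sys.getsizeof(big_list))  # several megabytes of pointer storage
print(sys.getsizeof(lazy))      # a few dozen bytes, independent of n
```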

Ciao,
Marc 'BlackJack' Rintsch
Sep 13 '08 #2
Doesn't sum first construct the list and then sum it?
def com(lst):
return sum(x for x in lst)
You construct a generator over an existing list in your code.
Try sum([x for x in lst]) to see the effect of additional list
construction. And while you're at it, try the simple sum(lst).
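The suggested comparison, written out as a rough timing sketch (Python 3 syntax; absolute numbers are machine-dependent, the ordering is the point):

```python
from timeit import timeit

lst = list(range(101))

# sum over a generator expression: one extra lazy layer on top of the list.
t_gen = timeit('sum(x for x in lst)', globals=globals(), number=100000)

# sum over a list comprehension: a second, temporary list is built first.
t_lc = timeit('sum([x for x in lst])', globals=globals(), number=100000)

# plain sum of the existing list: no extra layer at all.
t_flat = timeit('sum(lst)', globals=globals(), number=100000)

print('generator expression:', t_gen)
print('list comprehension:  ', t_lc)
print('plain sum(lst):      ', t_flat)
```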

Cheers,

- harold -
Sep 13 '08 #3
On Sep 13, 10:06 am, cnb <circularf...@yahoo.se> wrote:
This must be because of the implementation, right? Shouldn't reduce be
faster since it iterates once over the list?
Doesn't sum first construct the list and then sum it?
No, sum also iterates over the sequence just once and doesn't create a
list. It is probably implemented similarly to

def sum(sequence, start=0):
    it = iter(sequence)
    total = start
    for i in it:
        total += i
    return total

but then implemented in C for speed. Reduce is probably implemented
similarly, except that it calls the supplied function at each step.
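A rough pure-Python sketch of what reduce might look like (written for Python 3, where the real, C-implemented version lives in functools; this is an illustration, not CPython's actual code):

```python
def my_reduce(function, sequence, initial=None):
    """Fold `function` left-to-right over `sequence`."""
    it = iter(sequence)
    if initial is None:
        # Without an initial value, the first element seeds the accumulator.
        total = next(it)
    else:
        total = initial
    for item in it:
        # One Python-level call per element -- the overhead that sum() avoids.
        total = function(total, item)
    return total

print(my_reduce(lambda x, y: x + y, range(101)))  # 5050, same as sum(range(101))
```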
Make sure that you understand the difference between a generator
expression and a list comprehension, and that [f(x) for x in something]
is (almost) equal to list(f(x) for x in something), so you can emulate
a LC by applying the list constructor to the equivalent GE.
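That equivalence, illustrated (Python 3 syntax):

```python
lst = [1, 2, 3, 4]

# List comprehension: builds the whole result list in memory at once.
squares_lc = [x * x for x in lst]

# Generator expression: yields values lazily, one at a time.
squares_ge = (x * x for x in lst)

# Applying list() to the generator expression gives (almost) the same
# thing as the list comprehension.
print(squares_lc)        # [1, 4, 9, 16]
print(list(squares_ge))  # [1, 4, 9, 16]

# sum() can consume a generator expression directly, no list needed.
print(sum(x * x for x in lst))  # 30
```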

HTH,
Bas
Sep 13 '08 #4
On Sat, 13 Sep 2008 01:06:22 -0700, cnb wrote:
This must be because of the implementation, right? Shouldn't reduce be faster
since it iterates once over the list? Doesn't sum first construct the
list and then sum it?
What makes you think that?

Given the speed of sum(), it sure doesn't look like it's generating a
full list before summing. Why would it?

reduce with named function: 37.9864357062
reduce with nested, named function: 39.4710288598
reduce with lambda: 39.2463927678
sum comprehension: 25.9530121845
If you want to see reduce really shine, time it with a C-based function
rather than one written in pure Python:
... 'from operator import add').repeat(number=10000)
[19.724750995635986, 19.410486936569214, 19.614511013031006]
>>>
... 'def add(x, y): return x+y').repeat(number=10000)
[45.210143089294434, 44.814558982849121, 46.906874895095825]
You probably won't see much (if any) benefit for small lists, so make
sure your test is on a significantly-sized input list.

Of course, sum() is even faster than reduce:
>>> Timer('sum(xrange(10000))').repeat(number=10000)
[9.814924955368042, 8.7169640064239502, 9.5062401294708252]
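The same comparison can be reproduced in modern Python, where reduce has moved to functools (numbers will differ by machine; the relative ordering is what matters):

```python
from functools import reduce
from operator import add
from timeit import timeit

n = 10000

# reduce with a C-level callable (operator.add): no Python frame per element.
t_c = timeit('reduce(add, range(n))', globals=globals(), number=200)

# reduce with a pure-Python lambda: a Python call for every element.
t_py = timeit('reduce(lambda x, y: x + y, range(n))', globals=globals(), number=200)

# sum: the addition loop is built in, no callable involved at all.
t_sum = timeit('sum(range(n))', globals=globals(), number=200)

print('reduce + operator.add:', t_c)
print('reduce + lambda:      ', t_py)
print('sum:                  ', t_sum)
```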

--
Steven
Sep 13 '08 #5

Steven D'Aprano wrote:
If you want to see reduce really shine, time it with a C-based function
rather than one written in pure Python:
... 'from operator import add').repeat(number=10000)
[19.724750995635986, 19.410486936569214, 19.614511013031006]
... 'def add(x, y): return x+y').repeat(number=10000)
[45.210143089294434, 44.814558982849121, 46.906874895095825]
....
Of course, sum() is even faster than reduce:
>>> Timer('sum(xrange(10000))').repeat(number=10000)
[9.814924955368042, 8.7169640064239502, 9.5062401294708252]
'Of course', because the irreducible difference between reduce(add, seq)
and sum(seq) is that reduce has to call an add function while sum has
the operation built in, in place of a call.

Sep 13 '08 #6
Terry Reedy <tj*****@udel.edu> writes:
>Of course, sum() is even faster than reduce:
>>>> Timer('sum(xrange(10000))').repeat(number=10000)
[9.814924955368042, 8.7169640064239502, 9.5062401294708252]

'Of course', because the irreducible difference between reduce(add, seq)
and sum(seq) is that reduce has to call an add function while sum has
the operation built in, in place of a call.
Note that, despite appearances, it's not as built-in as one might
wish. sum(seq) is still completely generic and works on all
number-like objects (in fact on all objects that define an __add__
operation except strings, which are explicitly forbidden). This means
that it can't just go ahead and execute a C addition, it must properly
call PyNumber_Add (the C API call equivalent to Python's "+"
operator), which will then inspect the objects and invoke the
appropriate implementation of addition. On the other hand,
operator.add is defined to do exactly that -- call PyNumber_Add on its
arguments and return the result.
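The genericity described here is easy to observe: operator.add dispatches through the same machinery as the + operator, so it works for any type defining __add__, and sum() is similarly generic, with the string exception noted above (a small demonstration, not part of the original post):

```python
from operator import add

# operator.add is the Python-level face of PyNumber_Add: the same
# generic dispatch as the + operator.
print(add(1, 2))      # 3
print(add(1.5, 2.5))  # 4.0
print(add([1], [2]))  # [1, 2] -- list concatenation also goes through +

# sum() is generic too: with a list as the start value, it concatenates.
print(sum([[1], [2], [3]], []))  # [1, 2, 3]

# Strings, however, are explicitly rejected by sum():
try:
    sum(['a', 'b'], '')
except TypeError as exc:
    print('TypeError:', exc)
```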

It's not entirely obvious why summing numbers using the same method,
repeatedly calling PyNumber_Add, would be significantly slower for
reduce than for sum. Admittedly reduce has to execute a Python-level
function call which requires allocating and filling out a tuple, only
to reach PyNumber_Add, which sum calls immediately. But
builtin_reduce() is actually careful to optimize those repeated
function calls, for example by reusing the same tuple across the loop.
Sep 13 '08 #7
On Sep 13, 9:27 pm, Hrvoje Niksic <hnik...@xemacs.org> wrote:
Note that, despite appearances, it's not as built-in as one might
wish. sum(seq) is still completely generic and works on all
number-like objects (in fact on all objects that define an __add__
operation except strings, which are explicitly forbidden). This means
that it can't just go ahead and execute a C addition, it must properly
call PyNumber_Add (the C API call equivalent to Python's "+"
operator), which will then inspect the objects and invoke the
appropriate implementation of addition.
The time machine strikes again! Try using Py2.6.
The built-in sum() function is much smarter and faster.
It does in fact use C addition.
Raymond
Sep 16 '08 #8
