Feature suggestion: sum() ought to use a compensated summation algorithm


I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
    s = 0.0  # running sum
    c = 0.0  # compensation: low-order bits lost in previous additions
    for x in arr:
        y = x - c        # correct the next term by the running error
        t = s + y        # add it; low-order bits of y may be lost here
        c = (t - s) - y  # (t - s) recovers what was actually added
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát
Jun 27 '08 #1
Szabolcs Horvát <sz******@gmail.com> writes:

[...]
A little research shows that Mathematica uses a "compensated
summation" algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
    s = 0.0
    c = 0.0
    for x in arr:
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?

Szabolcs Horvát
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud
Jun 27 '08 #2
On Sat, 03 May 2008 18:50:34 +0200, Szabolcs Horvát wrote:
[...]

I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
Built-in sum should work with everything, not just floats. I think it
would be a useful addition to the standard math module.

If you know C, you could write a patch to mathmodule.c and submit it to
the Python devs.

--
Ivan
Jun 27 '08 #3
Szabolcs Horvát <sz******@gmail.com> wrote:
[...]
A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:
I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with the most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg:

>>> from random import random
>>> data = [random() for x in xrange(1000000)]
>>> mean = sum(data)/len(data)
>>> print sum(x - mean for x in data)/len(data)
1.64152134108e-14
>>> mean = sum(sorted(data))/len(data)
>>> print sum(x - mean for x in data)/len(data)
5.86725534824e-15
It isn't as good as the compensated sum but it is easy!
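The data here is all positive, so plain sorted() already orders by
magnitude; with mixed signs, "smallest magnitude first" means sorting by
absolute value. A minimal sketch of that variant, with a made-up helper
name:

def magnitude_sorted_sum(arr):
    # Accumulate small-magnitude terms first so they are not swamped
    # by a large running total.
    return sum(sorted(arr, key=abs))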
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jun 27 '08 #4
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.

But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).

Or, to go further, provide a switch to choose between the two methods,
and use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)

(Just some thoughts ...)
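As a rough sketch of what that might look like (a made-up wrapper name,
not the actual built-in; it reuses compSum from above):

def sum_with_compensation(seq, start=0):
    # Use Kahan summation only when every element is a plain float;
    # otherwise fall back to the ordinary built-in sum().
    items = list(seq)
    if items and all(isinstance(x, float) for x in items):
        return start + compSum(items)
    return sum(items, start)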
Jun 27 '08 #5
On May 3, 3:44 pm, Szabolcs Horvát <szhor...@gmail.com> wrote:
[...]
The benefits should be weighed against the common case. For example, do
you find an error of 10^-14 unbearable? How often will someone add
1,000,000 numbers in a typical scientific application written in Python?

I believe that moving this to a third-party library could be better.
What about numpy? Doesn't it already have something similar?
Jun 27 '08 #6
Szabolcs Horvát <sz******@gmail.com> writes:
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent
the misuse of sum to join strings? (As I said, I'm only an occasional
user.)
I think that's right: anything with an __add__ method, apart from
strings, can be sum()ed.
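A quick sketch illustrating both halves of that (the TypeError text is
the CPython 2.5 message, quoted from memory):

class Acc(object):
    """Trivial wrapper whose __add__ makes it summable."""
    def __init__(self, v):
        self.v = v
    def __add__(self, other):
        return Acc(self.v + other.v)

print sum([Acc(1), Acc(2), Acc(3)], Acc(0)).v   # -> 6

sum(['abc', 'efg'], '')
# TypeError: sum() can't sum strings [use ''.join(seq) instead]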

--
Arnaud
Jun 27 '08 #7
Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)
What you wrote is nonsensical there, no different from 'a' + 1 -- which
is why it quite rightly raises a TypeError.

You're trying to add a list to a string, which is nonsensical. You add
strings to strings, or lists to lists, but mixing them up doesn't make
sense. Python can't guess what you mean when you write something like
['abc', 'def'] + '' -- which is the functional equivalent of your call
to sum -- and so doesn't try. It indicates this by raising a TypeError.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
Jun 27 '08 #8
Hello!

Erik Max Francis writes:
Szabolcs Horvát wrote:
>Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to
prevent the misuse of sum to join strings? (As I said, I'm only
an occasional user.)

What you wrote is nonsensical there, no different from 'a' + 1 --
which is why it quite rightly raises a TypeError.
No, the above expression should yield '' + 'abc' + 'efg'; look at the
signature of sum in the docs.

Cheers,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: br*****@jabber.org
(See http://ime.webhop.org for further contact info.)
Jun 27 '08 #9
Torsten Bronger wrote:
No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.
You're absolutely right, I misread it. Sorry about that.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
Jun 27 '08 #10
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like a shortcut for
reduce(operator.add, ...), but I was mistaken.

reduce() is more forgiving:

>>> reduce(operator.add, ['abc', 'efg'], '')  # it works
'abcefg'

--
Ivan
Jun 27 '08 #11
On May 3, 10:13 pm, hdante <hda...@gmail.com> wrote:
I believe that moving this to a third-party library could be better.
What about numpy? Doesn't it already have something similar?
Yes, Kahan summation makes sense for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.
Jun 27 '08 #12

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator.add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator.add, ['abc', 'efg'], '' ) # it works
'abcefg'
Hm, it works for lists:

>>> sum(([1], [2]), [])
[1, 2]

However, I find the second-argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

--
Best Regards,
Med Venlig Hilsen,
Thomas

Jun 27 '08 #13
On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:
[...]
However, I find the second-argument hack ugly. Does the sum way have any
performance advantages over the reduce way?
Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operator.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

--
Ivan
Jun 27 '08 #14
On May 3, 7:12 pm, Ivan Illarionov <ivan.illario...@gmail.com> wrote:
[...]
Hm, it works for lists:
>>> sum(([1], [2]), [])
[1, 2]
So it's not reduce() that is more forgiving, it's sum() that makes an
exception for strings only.

However, I find the second-argument hack ugly. Does the sum way have
any performance advantages over the reduce way?

Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operator.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop
Not really; you measure the import and the attribute access time in
the second case. sum() is 2x faster for adding numbers only:

# Adding integers
python -mtimeit --setup="x=[1]*100" "sum(x)"
100000 loops, best of 3: 4.87 usec per loop
python -mtimeit --setup="from operator import add; x=[1]*100"
"reduce(add,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding floats
python -mtimeit --setup="x=[1.0]*100" "sum(x)"
100000 loops, best of 3: 5.14 usec per loop
python -mtimeit --setup="from operator import add; x=[1.0]*100"
"reduce(add,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding tuples
python -mtimeit --setup="x=[(1,)]*100" "sum(x,())"
10000 loops, best of 3: 61.6 usec per loop
python -mtimeit --setup="from operator import add; x=[(1,)]*100"
"reduce(add,x,())"
10000 loops, best of 3: 66.3 usec per loop

# Adding lists
python -mtimeit --setup="x=[[1]]*100" "sum(x,[])"
10000 loops, best of 3: 79.9 usec per loop
python -mtimeit --setup="from operator import add; x=[[1]]*100"
"reduce(add,x,[])"
10000 loops, best of 3: 84.3 usec per loop

George
Jun 27 '08 #15
On Sat, 03 May 2008 17:43:57 -0700, George Sakkis wrote:
[...]
Thanks for the correction. I forgot about --setup.

--
Ivan
Jun 27 '08 #16
On May 3, 7:05 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
On May 3, 10:13 pm, hdante <hda...@gmail.com> wrote:
I believe that moving this to a third-party library could be better.
What about numpy? Doesn't it already have something similar?

Yes, Kahan summation makes sense for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
No, optimizing compilers shouldn't discard floating point operations
by default, since it changes program behavior. If they do, at least
there should be a flag to turn it off.
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.
Jun 27 '08 #17
Szabolcs Horvát <sz******@gmail.com> wrote:
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
There's a thread you might find interesting:

http://groups.google.com/group/comp....7cef7d4429aa3a

In that thread I suggested that Python ought to implement sum by adding
together each pair of values, then each pair of results, and so on. This
means that for input values of a similar magnitude the intermediate results
stay balanced. The implementation doesn't actually do all the first-level
sums first; it builds up a stack containing the sums of 2^k, 2^(k-1), ..., 2
values. It also works for other types, and for strings or lists it has the
advantage of minimising the number of large string or list concatenations.
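A sketch of that stack-based scheme, reconstructed from the description
above (not Duncan's original code):

def sumpairs(seq):
    # stack[k] holds a partial sum of 2^k consecutive items, so values
    # of similar magnitude get added to each other.
    stack = []
    count = 0
    for x in seq:
        stack.append(x)
        count += 1
        n = count
        while n % 2 == 0:  # merge pairs while the item count is even
            top = stack.pop()
            stack[-1] = stack[-1] + top
            n //= 2
    if not stack:
        return 0
    total = stack.pop()
    while stack:  # fold the remaining O(log n) partial sums, keeping order
        total = stack.pop() + total
    return total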

If you follow that thread far enough you'll find a post by Jussi Salmela
which shows that summing adjacent pairs performs as well as Kahan summation
(he says 'almost as good' but in fact the only time they get different
results the 'correct' answer is 500000.000005 and Kahan and sumpairs get
the two nearest representable results either side: 500000.00000499998 and
500000.00000500004 respectively).

I never did get round to re-coding my sumpairs() function in C; I would be
interested to see just how much overhead it introduces (and I suspect it
would be faster than the existing sum in some cases, such as for lists).
Jun 27 '08 #18
On Sun, 04 May 2008 12:58:25 -0300, Duncan Booth <du**********@invalid.invalid> wrote:
Szabolcs Horvát <sz******@gmail.com> wrote:
>I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

There's a thread you might find interesting:

http://groups.google.com/group/comp....7cef7d4429aa3a

In that thread I suggested that Python ought to implement sum by adding
together each pair of values, then each pair of results and so on. This
means that for input values of a similar magnitude the intermediate results
stay balanced. The implementation doesn't actually do all the first level
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement. But if you know that you're adding floating point numbers you can use whatever algorithm best fits you. Or use numpy arrays; I think they implement Kahan summation or a similar algorithm.

--
Gabriel Genellina

Jun 27 '08 #19
Duncan Booth wrote:
Szabolcs Horvát <sz******@gmail.com> wrote:
[...]
Thanks for this link! Though some of the thread is missing, it was an
interesting read.

This sounds like a very good idea: it is more generally applicable than
the Kahan summation because it does not require subtraction/negation, so
it would work with user-defined types, too.

While now I am convinced that using Kahan summation by default is not
such a good idea, I cannot see any serious disadvantages/problems with
pairwise summation!
Jun 27 '08 #20
Gabriel Genellina wrote:
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement.
It does not _require_ this, but using an __add__ that is not commutative
and associative, or has side effects, would qualify as a serious misuse,
anyway. So I think that this isn't a real disadvantage (it can always
be documented that sum() expects __add__ to have these properties).

But you are right that summing floats with a requirement of high
precision is a quite specific use case, so there are no serious
arguments to do this, either.
Jun 27 '08 #21
On May 3, 9:50 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
See:
http://aspn.activestate.com/ASPN/Coo.../Recipe/393090
Here's the program (pardon my style, I'm a newbie/occasional user):
. . .
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
See:
http://svn.python.org/view/sandbox/t...81&view=markup
Raymond

Jun 27 '08 #22
However, I find the second-argument hack ugly. Does the sum way have any
performance advantages over the reduce way?
Yes, sum() is faster:
....
Not really; you measure the import and the attribute access time in
the second case. sum() is 2x faster for adding numbers only:
Try it with Py2.6 or Py3.0. The sum() function has been optimized and
should be at least 6x faster for summing ints and floats. It should
even beat psyco-optimized sums!
Raymond
Jun 27 '08 #23
On May 5, 9:37 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
Gabriel Genellina wrote:
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement.

It does not _require_ this, but using an __add__ that is not commutative
and associative, or has side effects, would qualify as a serious misuse,
anyway.
Sorry, I meant associative only, not commutative.
Jun 27 '08 #24
"Gabriel Genellina" <ga*******@yahoo.com.arwrote:
On Sun, 04 May 2008 12:58:25 -0300, Duncan Booth
<du**********@invalid.invalid> wrote:
[...]
Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement. But if you know that you're adding
floating point numbers you can use whatever algorithm best fits you. Or
use numpy arrays; I think they implement Kahan summation or a similar
algorithm.
Yes, my argument is more along the lines that it should have been
implemented that way in the first place, but changing things now would
require time machine intervention. :)
Jun 27 '08 #25
Szabolcs <sz******@gmail.com> wrote:
On May 5, 9:37 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
Gabriel Genellina wrote:
Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
It does not _require_ this, but using an __add__ that is not
commutative and associative, or has side effects, would qualify as a
serious misuse, anyway.

Sorry, I meant associative only, not commutative.
Unfortunately an __add__ which is not associative is actually perfectly
reasonable. For example you could have a Tree class where adding two trees
returns a new tree, so (a+b)+c gives you:

    .
   / \
  .   c
 / \
a   b

but a+(b+c) gives:

    .
   / \
  a   .
     / \
    b   c

Jun 27 '08 #26
On May 5, 12:24 pm, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
[...]

Unfortunately an __add__ which is not associative is actually perfectly
reasonable.
Well, 'reasonable' is a subjective notion in this case :-) I have
never seen a non-associative use of the plus sign in maths, so I would
tend to take associativity for granted when seeing a + b + c in code,
therefore I would avoid such uses. (Multiplication signs are a
different story.) Of course others may feel differently about this,
and technically there's nothing in the way of using a non-associative
__add__! :)
For example you could have a Tree class where adding two trees
returns a new tree, so (a+b)+c gives you:

    .
   / \
  .   c
 / \
a   b

but a+(b+c) gives:

    .
   / \
  a   .
     / \
    b   c
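A minimal sketch of such a Tree class (an illustration, not code from
the thread):

class Tree(object):
    """Binary tree; __add__ joins two trees under a new root node."""
    def __init__(self, left=None, right=None, label=None):
        self.left = left
        self.right = right
        self.label = label
    def __add__(self, other):
        return Tree(left=self, right=other)

a, b, c = Tree(label='a'), Tree(label='b'), Tree(label='c')
t1 = (a + b) + c  # the root's left child is the subtree (a+b)
t2 = a + (b + c)  # the root's right child is the subtree (b+c)
# t1 and t2 are structurally different, so this __add__ is not associative.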
Jun 27 '08 #27
On May 3, 4:31 pm, Thomas Dybdahl Ahle <lob...@gmail.com> wrote:
[...]
However, I find the second-argument hack ugly.
Does the sum way have any performance advantages over the reduce way?
The relative differences are irrelevant. The important thing is they
both perform quadratically, which makes them the wrong choice (unless
you can guarantee your inputs are trivial).

$ python2.5 -m timeit 'x = [[0]*1000]*10' 'sum(x, [])'
1000 loops, best of 3: 566 usec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*100' 'sum(x, [])'
10 loops, best of 3: 106 msec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*1000' 'sum(x, [])'
10 loops, best of 3: 18.1 sec per loop

Per element of the outer list, that's 0.0566, 1.06, 18.1 msec.
Nasty. Use this instead:

$ python2.5 -m timeit -s 'x = [[0]*1000]*10' 'y = []' 'for a in x:
y.extend(a)'
10000 loops, best of 3: 80.2 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100' 'y = []' 'for a in x:
y.extend(a)'
1000 loops, best of 3: 821 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*1000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 24.3 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*10000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 241 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 2.35 sec per loop

Per element of the outer list that's 0.00802, 0.00821, 0.0243,
0.0241, 0.0235 msec. At 1000 elements it seemed to cross some
threshold (maybe cache related, maybe allocation related), but overall
it scales linearly. Know your algorithm's complexity.
Jun 27 '08 #28
Gabriel Genellina wrote:
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement. But if you know that you're adding floating point numbers you can use whatever algorithm best fits you. Or use numpy arrays; I think they implement Kahan summation or a similar algorithm.
No, we do a straightforward sum, too. Less magic in the default case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jun 27 '08 #29
On Mon, 05 May 2008 05:08:15 -0300, Raymond Hettinger <py****@rcn.com>
wrote:
On May 3, 9:50 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

See:
http://aspn.activestate.com/ASPN/Coo.../Recipe/393090
Awesome!
See:
http://svn.python.org/view/sandbox/t...81&view=markup
Wonderful!

--
Gabriel Genellina

Jun 27 '08 #30

"Szabolcs Horvát" <sz******@gmail.comwrote in message
news:fv***********@toralf.uib.no...
| Gabriel Genellina wrote:
| >
| Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)

Jun 27 '08 #31
On May 8, 3:09 pm, "Terry Reedy" <tjre...@udel.edu> wrote:
"Szabolcs Horvát" <szhor...@gmail.comwrote in message

news:fv***********@toralf.uib.no...| Gabriel Genellina wrote:

| >
| Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)
There's no point switching to a special algorithm unless a float is
detected. What you're really arguing is seq + seq + float is not
commutative. In either case we may have to hurt you. ;)
Jun 27 '08 #32
"Terry Reedy" <tj*****@udel.eduwrote:
>
"Szabolcs Horvát" <sz******@gmail.comwrote in message
news:fv***********@toralf.uib.no...
| Gabriel Genellina wrote:
| >
| Python doesn't require __add__ to be associative, so this should
| not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not
| commutative and associative, or has side effects, would qualify as a
| serious misuse,

seq + seq is not commutative ;-)
And three days ago Szabolcs already posted a followup to that
comment saying: "Sorry, I meant associative only, not commutative.".
Jun 27 '08 #33
