473,804 Members | 2,136 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Feature suggestion: sum() ought to use a compensated summation algorithm


I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát
Jun 27 '08 #1
32 2194
Szabolcs Horvát <sz******@gmail .comwrites:

[...]
A little research shows that Mathematica uses a "compensate d
summation" algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?

Szabolcs Horvát
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud
Jun 27 '08 #2
On Sat, 03 May 2008 18:50:34 +0200, Szabolcs Horvát wrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát
Built-in sum should work with everything, not just floats. I think it
would be useful addition to standard math module.

If you know C you could write a patch to mathmodule.c and submit it to
Python devs.

--
Ivan
Jun 27 '08 #3
Szabolcs Horvát <sz******@gmail .comwrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:
I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg
>>from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
1.64152134108e-14
>>mean = sum(sorted(data ))/len(data)
print sum(x - mean for x in data)/len(data)
5.86725534824e-15
>>>
It isn't as good as the compensated sum but it is easy!
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Jun 27 '08 #4
Arnaud Delobelle wrote:
>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.

But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).

Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)

(Just some thoughts ...)
Jun 27 '08 #5
Torsten Bronger wrote:
No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.
You're absolutely right, I misread it. Sorry about that.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
Jun 27 '08 #6
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
>>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'

--
Ivan
Jun 27 '08 #7
On May 3, 10:13 pm, hdante <hda...@gmail.c omwrote:
I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?
Yes, Kahan summation makes sence for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.
Jun 27 '08 #8

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'
Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

--
Best Regards,
Med Venlig Hilsen,
Thomas

Jun 27 '08 #9
On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:
On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
>On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operato r.add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operato r.add, ['abc', 'efg'], '' ) # it works 'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly. Does the sum way have any
performance advantages over the reduce way?
Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operato r.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

--
Ivan
Jun 27 '08 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

15
2135
by: Jordan Rastrick | last post by:
First, a disclaimer. I am a second year Maths and Computer Science undergraduate, and this is my first time ever on Usenet (I guess I'm part of the http generation). On top of that, I have been using Python for a grand total of about a fortnight now. Hence, I apologise if what follows is a stupid suggestion, or if its already been made somewhere else, or if this is not the appropriate place to make it, etc. But I did honestly do some...
1
4629
by: Quarco | last post by:
Hi, Suppose I have a query like: SELECT products.name AS product, SUM(IF(stock.invoice=0,1,0)) AS in_stock, SUM(IF(shopcart.status=1,1,0)) AS reserved FROM products LEFT JOIN stock ON products.id=stock.product_id
52
3980
by: Paddy | last post by:
I was browsing the Voidspace blog item on "Flattening Lists", and followed up on the use of sum to do the flattening. A solution was: I would not have thought of using sum in this way. When I did help(sum) the docstring was very number-centric: It went further, and precluded its use on strings:
9
7803
by: asaguiar | last post by:
Hi, Given the pseudocode explanation of the Kahan algorithm at http://en.wikipedia.org/wiki/Kahan_summation_algorithm, I tried to implement it in C. It is supposed to minimize the effect of base conversion errors after repeated sum operations. However, the effect is null. The rounding error persists without change. My implementation is:
29
1959
by: n00m | last post by:
http://www.spoj.pl/problems/SUMFOUR/ 3 0 0 0 0 0 0 0 0 -1 -1 1 1 Answer for this input data is 33. My solution for the problem is ======================================================================
2
1512
by: isurujaya | last post by:
Hai Masters, I have a problem in getting the summation of Mysql database several fields. It is like this. My database name -BAR Table - Pda There are several fields in the table like A1,A2, A3 ... in a one record. And data( in int type) has been entered already. Numbers have been entered as data. I want to display the Sum of A1,A2 and A3.. ( total amount of all fiedls in a one record- displaying by one by one acording to...
0
9715
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9595
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
1
10356
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10099
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
6869
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5536
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5675
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4314
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3836
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.