Feature suggestion: sum() ought to use a compensated summation algorithm

=?ISO-8859-1?Q?Szabolcs_Horv=E1t?=

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát

Jun 27 '08 #1

Subscribe Reply

2194

1
2
3
>
Last »

Arnaud Delobelle

Szabolcs Horvát <sz******@gmail .comwrites:

[...]

A little research shows that Mathematica uses a "compensate d
summation" algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?

Szabolcs Horvát

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud

Jun 27 '08 #2

Ivan Illarionov

On Sat, 03 May 2008 18:50:34 +0200, Szabolcs HorvÃ¡t wrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs HorvÃ¡t

Built-in sum should work with everything, not just floats. I think it
would be useful addition to standard math module.

If you know C you could write a patch to mathmodule.c and submit it to
Python devs.

--
Ivan

Jun 27 '08 #3

Nick Craig-Wood

Szabolcs Horvát <sz******@gmail .comwrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg

>>from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

1.64152134108e-14

>>mean = sum(sorted(data ))/len(data)
print sum(x - mean for x in data)/len(data)

5.86725534824e-15

>>>

It isn't as good as the compensated sum but it is easy!

I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick

Jun 27 '08 #4

=?ISO-8859-1?Q?Szabolcs_Horv=E1t?=

Arnaud Delobelle wrote:

>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.

But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).

Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)

(Just some thoughts ...)

Jun 27 '08 #5

Erik Max Francis

Torsten Bronger wrote:

No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.

You're absolutely right, I misread it. Sorry about that.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis

Jun 27 '08 #6

Ivan Illarionov

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:

Arnaud Delobelle wrote:
>>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'

--
Ivan

Jun 27 '08 #7

sturlamolden

On May 3, 10:13 pm, hdante <hda...@gmail.c omwrote:

I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?

Yes, Kahan summation makes sence for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.

Jun 27 '08 #8

Thomas Dybdahl Ahle

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:

Arnaud Delobelle wrote:
>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

--
Best Regards,
Med Venlig Hilsen,
Thomas

Jun 27 '08 #9

Ivan Illarionov

On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
>On Sat, 03 May 2008 20:44:19 +0200, Szabolcs HorvÃ¡t wrote:

Arnaud Delobelle wrote:

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operato r.add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operato r.add, ['abc', 'efg'], '' ) # it works 'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly. Does the sum way have any
performance advantages over the reduce way?

Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operato r.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

--
Ivan

Jun 27 '08 #10

Similar topics

2135

Suggesting a new feature - "Inverse Generators"

by: Jordan Rastrick | last post by:

First, a disclaimer. I am a second year Maths and Computer Science undergraduate, and this is my first time ever on Usenet (I guess I'm part of the http generation). On top of that, I have been using Python for a grand total of about a fortnight now. Hence, I apologise if what follows is a stupid suggestion, or if its already been made somewhere else, or if this is not the appropriate place to make it, etc. But I did honestly do some...

Python

4629

2 x SUM(IF.. and 2x LEFT JOIN

by: Quarco | last post by:

Hi, Suppose I have a query like: SELECT products.name AS product, SUM(IF(stock.invoice=0,1,0)) AS in_stock, SUM(IF(shopcart.status=1,1,0)) AS reserved FROM products LEFT JOIN stock ON products.id=stock.product_id

MySQL Database

3980

sum and strings

by: Paddy | last post by:

I was browsing the Voidspace blog item on "Flattening Lists", and followed up on the use of sum to do the flattening. A solution was: I would not have thought of using sum in this way. When I did help(sum) the docstring was very number-centric: It went further, and precluded its use on strings:

Python

7803

Implementation of Kahan sum algorithm

by: asaguiar | last post by:

Hi, Given the pseudocode explanation of the Kahan algorithm at http://en.wikipedia.org/wiki/Kahan_summation_algorithm, I tried to implement it in C. It is supposed to minimize the effect of base conversion errors after repeated sum operations. However, the effect is null. The rounding error persists without change. My implementation is:

C / C++

1959

To count number of quadruplets with sum = 0

by: n00m | last post by:

http://www.spoj.pl/problems/SUMFOUR/ 3 0 0 0 0 0 0 0 0 -1 -1 1 1 Answer for this input data is 33. My solution for the problem is ======================================================================

Python

1512

Sum of MYSQL fields

by: isurujaya | last post by:

Hai Masters, I have a problem in getting the summation of Mysql database several fields. It is like this. My database name -BAR Table - Pda There are several fields in the table like A1,A2, A3 ... in a one record. And data( in int type) has been entered already. Numbers have been entered as data. I want to display the Sum of A1,A2 and A3.. ( total amount of all fiedls in a one record- displaying by one by one acording to...

MySQL Database

9715

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9595

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10356

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

10099

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

6869

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5536

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5675

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4314

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3836

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP