Duncan Booth kirjoitti:

Dennis Lee Bieber <wl*****@ix.netcom.comwrote:

> For floating point, smallest magnitude to largest IS the most

precise.

Pretend you only have 2 significant digits of precision...

10 + 0.4 + 0.4 + 0.4 =10

0.4 + 0.4 + 0.4 + 10 =11

and if you try the way I suggested then initial input order doesn't

matter:

(10 + 0.4) = 10, (0.4 + 0.4) = 0.8, (10 + 0.8) = 11

(0.4 + 0.4) = 0.8, (10 + 0.4) = 10, (0.8 + 10) = 11

Pretend you ran the example code I posted. Specifically the bit where

the output is:

all the same

builtin sum 1000000.0000016732

pairwise sum 1000000.00001

It isn't so much the order of the initial values that matters

(especially if the values are all similar). What *really* matters is

keeping the magnitude of the intermediate results as small as possible.

Summing in pairs ensures that you keep the precision as long as

possible. The order of the initial values is less important than this

(although it does still have a minor effect). For a solution which works

with a sequence of unknown length and doesn't require lookahead I don't

think you can do much better than that.

BTW, in case someone thinks my example numbers are a bit too weird you

can show the same problem averaging a list of 10 digit floating point

numbers with exact representations:

v = [9999999999.] * 10000000

print "builtin sum ", (sum(v)/len(v))

print "pairwise sum", (sumpairs(v)/len(v))

gives:

builtin sum 9999999999.91

pairwise sum 9999999999.0

In both cases the total is large enough to go beyond the range of

integers that can be expressed exactly in floating point and as soon as

that happens the builtin sum loses precision on every subsequent

addition. pairwise summation only loses precision on a very few of the

additions.

P.S. Apologies for the sloppy coding in the sumpairs function. It should

of course do the final addition manually otherwise it is going to give

odd results summing lists or strings or anything else where the order

matters. Revised version:

def sumpairs(seq):

tmp = []

for i,v in enumerate(seq):

if i&1:

tmp[-1] += v

i = i + 1

n = i & -i

while n 2:

t = tmp.pop(-1)

tmp[-1] = tmp[-1] + t

n >>= 1

else:

tmp.append(v)

while len(tmp) 1:

t = tmp.pop(-1)

tmp[-1] = tmp[-1] + t

return tmp[0]

Warning: I'm no floating point guru! Awfully long post following!

I've run a couple of tests and it seems to me that Dennis Lee Bieber is

on the trail of the truth when he claims that smallest magnitude to

the largest is the way to do the summation. Actually it isn't THE way

although it diminishes the error. I was sort of astonished because I

also had somewhere along the years formed the same notion as Dennis.

"Your" pairwise method beats the former method by a large margin

although I don't understand why. To tell you the truth: I started out to

show you were wrong because intuitively (for me at least) the former

method should work better than yours.

I also found a method called "Kahan summation algorithm" which probably

is the best way to do this.

What have I done then? I used your examples and code and added a

straight summation and the Kahan method using the pseudocode presented

in the Wikipedia article. I also made an experiment using randomized

floats merged wildly into a crazy list.

The results (as I interpret them):

1. The straight summation is the same method that the builtin sum uses

because their results are equal(ly poor).

2. Sorting to increasing order helps these two methods to have better

accuracy.

3. The pairwise method is almost as good as the Kahan method.

Here's the code and the results:

================================================== =======================

CODE:

================================================== =======================

import random

def sumpairs(seq):

tmp = []

for i,v in enumerate(seq):

if i&1:

tmp[-1] += v

i = i + 1

n = i & -i

while n 2:

t = tmp.pop(-1)

tmp[-1] = tmp[-1] + t

n >>= 1

else:

tmp.append(v)

while len(tmp) 1:

t = tmp.pop(-1)

tmp[-1] = tmp[-1] + t

return tmp[0]

def straightSum(seq):

result = 0.0

for elem in seq: result += elem

return result

def kahanSum(input):

sum = input[0]

n = len(input)

c = 0.0 # A running compensation for lost low-order

bits.

for i in range(1, n):

y = input[i] - c # So far, so good: c is zero.

t = sum + y # Alas, sum is big, y small,

# so low-order digits of y are lost.

c = (t - sum) - y # (t - sum) recovers the high-order part of y;

# subtracting y recovers -(low part of y)

sum = t # Algebraically, c should always be zero.

# Beware eagerly optimising compilers!

continue # Next time around, the lost low part will be

# added to y in a fresh attempt.

return sum

print '\n======================v = [1000000, 0.00001] * 1000000'

v = [1000000, 0.00001] * 1000000

print "CORRECT ", '500000.000005 <========'

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print "sorted"

v.sort()

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print "reverse sorted"

v.reverse()

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print '\n======================v = [1000000]*1000000 + [0.00001]*1000000'

v = [1000000]*1000000 + [0.00001]*1000000

print "CORRECT ", '500000.000005 <========'

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print '\n======================v = [0.00001]*1000000 + [1000000]*1000000'

v = [0.00001]*1000000 + [1000000]*1000000

print "CORRECT ", '500000.000005 <========'

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print '\n======================v = [1000000.00001] * 1000000'

v = [1000000.00001] * 1000000

print "CORRECT ", '1000000.00001 <========'

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print '\n======================Randomized lists'

print '========lst1 = [random.random() for i in range(1000000)]'

print '========lst2 = [1000000*random.random() for i in range(1000000)]'

N = 100000

lst1 = [random.random() for i in range(N)]

lst2 = [1000000.0 + random.random() for i in range(N)]

v = []

for i in range(N):

v.append(lst1[i])

v.append(lst2[i])

v.append(lst2[i])

v.append(lst2[i])

v.append(lst1[i])

v.append(lst2[i])

v.append(lst2[i])

v.append(lst2[i])

v.append(lst2[i])

print '========v = "Crazy merge of lst1 and lst2'

print 'min, max'

print 'lst1:', min(lst1), max(lst1)

print 'lst2:', min(lst2), max(lst2)

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print "sorted"

v.sort()

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

print "reverse sorted"

v.reverse()

print "builtin ", repr(sum(v)/len(v))

print "straight", repr(straightSum(v)/len(v))

print "pairwise", repr(sumpairs(v)/len(v))

print "Kahan ", repr(kahanSum(v)/len(v))

================================================== =======================

RESULTS:

================================================== =======================

======================v = [1000000, 0.00001] * 1000000

CORRECT 500000.000005 <========

builtin 500000.00000083662

straight 500000.00000083662

pairwise 500000.00000499998

Kahan 500000.00000499998

sorted

builtin 500000.00000499998

straight 500000.00000499998

pairwise 500000.00000499998

Kahan 500000.00000499998

reverse sorted

builtin 500000.0

straight 500000.0

pairwise 500000.00000500004

Kahan 500000.00000499998

======================v = [1000000]*1000000 + [0.00001]*1000000

CORRECT 500000.000005 <========

builtin 500000.0

straight 500000.0

pairwise 500000.00000500004

Kahan 500000.00000499998

======================v = [0.00001]*1000000 + [1000000]*1000000

CORRECT 500000.000005 <========

builtin 500000.00000499998

straight 500000.00000499998

pairwise 500000.00000499998

Kahan 500000.00000499998

======================v = [1000000.00001] * 1000000

CORRECT 1000000.00001 <========

builtin 1000000.0000016732

straight 1000000.0000016732

pairwise 1000000.00001

Kahan 1000000.00001

======================Randomized lists

========lst1 = [random.random() for i in range(1000000)]

========lst2 = [1000000*random.random() for i in range(1000000)]

========v = "Crazy merge of lst1 and lst2

min, max

lst1: 1.59494420963e-005 0.999990993581

lst2: 1000000.00001 1000001.0

builtin 777778.27774196747

straight 777778.27774196747

pairwise 777778.27774199261

Kahan 777778.27774199261

sorted

builtin 777778.2777420698

straight 777778.2777420698

pairwise 777778.27774199261

Kahan 777778.27774199261

reverse sorted

builtin 777778.27774202416

straight 777778.27774202416

pairwise 777778.27774199261

Kahan 777778.27774199261

================================================== =======================

Cheers,

Jussi