I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
Here's the program (pardon my style, I'm a newbie/occasional user):
from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:
def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
Szabolcs Horvát 32 2194
Szabolcs Horvát <sz******@gmail .comwrites:
[...]
A little research shows that Mathematica uses a "compensate d
summation" algorithm. Indeed, using the algorithm described at http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:
def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?
Szabolcs Horvát
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
--
Arnaud
On Sat, 03 May 2008 18:50:34 +0200, Szabolcs Horvát wrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
Here's the program (pardon my style, I'm a newbie/occasional user):
from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
around ~ 10^-17:
def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
Szabolcs Horvát
Built-in sum should work with everything, not just floats. I think it
would be useful addition to standard math module.
If you know C you could write a patch to mathmodule.c and submit it to
Python devs.
--
Ivan
Szabolcs Horvát <sz******@gmail .comwrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
Here's the program (pardon my style, I'm a newbie/occasional user):
from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
A little research shows that Mathematica uses a "compensate d summation"
algorithm. Indeed, using the algorithm described at http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:
I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.
Eg
>>from random import random data = [random() for x in xrange(1000000)] mean = sum(data)/len(data) print sum(x - mean for x in data)/len(data)
1.64152134108e-14
>>mean = sum(sorted(data ))/len(data) print sum(x - mean for x in data)/len(data)
5.86725534824e-15
>>>
It isn't as good as the compensated sum but it is easy!
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
sum() gets used for any numerical types not just floats...
--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Arnaud Delobelle wrote:
>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')
and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)
Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.
But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).
Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)
(Just some thoughts ...)
Torsten Bronger wrote:
No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.
You're absolutely right, I misread it. Sorry about that.
--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
>> sum() works for any sequence of objects with an __add__ method, not just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.
reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'
--
Ivan
On May 3, 10:13 pm, hdante <hda...@gmail.c omwrote:
I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?
Yes, Kahan summation makes sence for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.
On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
>
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like shortcut of
reduce(operator .add, ...), but I was mistaken.
reduce() is more forgiving:
reduce(operator .add, ['abc', 'efg'], '' ) # it works
'abcefg'
Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]
However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?
--
Best Regards,
Med Venlig Hilsen,
Thomas
On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:
On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
>On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like shortcut of reduce(operato r.add, ...), but I was mistaken.
reduce() is more forgiving: reduce(operato r.add, ['abc', 'efg'], '' ) # it works 'abcefg'
Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]
However I find the seccond argument hack ugly. Does the sum way have any
performance advantages over the reduce way?
Yes, sum() is faster:
$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop
$ python -m timeit "import operator" \
"reduce(operato r.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop
--
Ivan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Jordan Rastrick |
last post by:
First, a disclaimer. I am a second year Maths and Computer Science
undergraduate, and this is my first time ever on Usenet (I guess I'm
part of the http generation). On top of that, I have been using Python
for a grand total of about a fortnight now. Hence, I apologise if what
follows is a stupid suggestion, or if its already been made somewhere
else, or if this is not the appropriate place to make it, etc. But I
did honestly do some...
|
by: Quarco |
last post by:
Hi,
Suppose I have a query like:
SELECT
products.name AS product,
SUM(IF(stock.invoice=0,1,0)) AS in_stock,
SUM(IF(shopcart.status=1,1,0)) AS reserved
FROM products
LEFT JOIN stock ON products.id=stock.product_id
|
by: Paddy |
last post by:
I was browsing the Voidspace blog item on "Flattening Lists", and
followed up on the use of sum to do the flattening.
A solution was:
I would not have thought of using sum in this way. When I did help(sum)
the docstring was very number-centric: It went further, and precluded
its use on strings:
|
by: asaguiar |
last post by:
Hi,
Given the pseudocode explanation of the Kahan algorithm at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm, I tried to
implement it in C. It is supposed to minimize the effect of base
conversion errors after repeated sum operations.
However, the effect is null. The rounding error persists without
change.
My implementation is:
|
by: n00m |
last post by:
http://www.spoj.pl/problems/SUMFOUR/
3
0 0 0 0
0 0 0 0
-1 -1 1 1
Answer for this input data is 33.
My solution for the problem is
======================================================================
| |
by: isurujaya |
last post by:
Hai Masters,
I have a problem in getting the summation of Mysql database several fields.
It is like this.
My database name -BAR
Table - Pda
There are several fields in the table like A1,A2, A3 ... in a one record.
And data( in int type) has been entered already. Numbers have been entered as data.
I want to display the Sum of A1,A2 and A3.. ( total amount of all fiedls in a one record- displaying by one by one acording to...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
| |
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |