473,320 Members | 1,810 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

SUM(double_value) is not deterministic?

Hi,

I'm using predefined SUM(() function on DOUBLE column. Surpisingly the
sum can give different results depending on the order of values in my
table.

Is that correct?

I can explain it, as adding two double values can entail rouding the
result, so the order of adding influence the final result.

Actually I cannot change my select statement to cast double column to
decimal, or to change the type in the database. I'm looking for a new
SUM(double) aggregate function, which internaly will be adding values
using decimal, f.e. decimal(31,15) and which finally return casted
double value.
Here is a code to replicate the issue:
---------------------------------------------------------------
create table table1 (value double)
@
create procedure load()
begin
declare c int default 0;
while (c < 100000)
do

set c = c + 1;
insert into table1 values ( cast(rand() as double) );

end while;
end
@
call load()
@
create table table2 like table1
@
insert into table2 select * from table1 order by value
@
select sum(value) from table1
@
select sum(value) from table2
@
select double(sum(cast(value as decimal(31,15)))) from table1
@
results:
+4,99359445173494E+004
+4,99359445173499E+004

The correct one in my case is:
+4,99359445173498E+004

Regards,
Artur Wronski

Apr 25 '06 #1
8 2860
Artur wrote:
Hi,

I'm using predefined SUM(() function on DOUBLE column. Surpisingly the
sum can give different results depending on the order of values in my
table.

Is that correct?

I can explain it, as adding two double values can entail rouding the
result, so the order of adding influence the final result.

Actually I cannot change my select statement to cast double column to
decimal, or to change the type in the database. I'm looking for a new
SUM(double) aggregate function, which internaly will be adding values
using decimal, f.e. decimal(31,15) and which finally return casted
double value.
Here is a code to replicate the issue:
---------------------------------------------------------------
create table table1 (value double)
@
create procedure load()
begin
declare c int default 0;
while (c < 100000)
do

set c = c + 1;
insert into table1 values ( cast(rand() as double) );

end while;
end
@
call load()
@
create table table2 like table1
@
insert into table2 select * from table1 order by value
@
select sum(value) from table1
@
select sum(value) from table2
@
select double(sum(cast(value as decimal(31,15)))) from table1
@
results:
+4,99359445173494E+004
+4,99359445173499E+004

The correct one in my case is:
+4,99359445173498E+004

I'm no expert in the matter, but I can't say that I'm terribly surprised.

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Apr 26 '06 #2
First, It is well known that the result of sum of floating point
numbers would be different by sequence of summation of numbers, in
mathematical area.
To get more accurate result, usually summed from small numbers to large
numbers.

That might be the reason of the different results of SUM on table1 and
table2.

But, I don't know this means SUM is not deterministic (It might be
depending on DB2 specification(definition of deterministic)).

Second, You tried
select double(sum(cast(value as decimal(31,15)))) from table1
I thought that an error(drop some lowest digits) may be introduced at
the time of converting from float to decimal.
So, this way may not usefull to get correct result.

Last, my question is how do you get correct
one(+4,99359445173498E+004)?

Apr 26 '06 #3
The problem is that having two different sessions, one from CLP, one
from CLI application I get different results of the sum running the
same query on the same static table.

Let's say that I have only 3 values, and in my case let's assume the
precision is only 3 digits:
1.14
2.23
7.99

I can do the sum in two ways, that give different final result:

1.14 + 2.23 -> 3.37 + 7.99 -> 11.4
2.23 + 7.99 -> 10.2 + 1.14 ->11.3

And in my case the correct sum is:
1.14 + 2.23 -> 3.37 + 7.99 -> 11.36 -> 11.4
2.23 + 7.99 -> 10.22 + 1.14 -> 11.36 -> 11.4

Correct in my definition means not order depended. I know that in
general it can be difficult to achieve, but the idea is to add values
internally using different type (which can hold the intermediate result
without truncation), or even with predefined, each time the same order
(fe. not to start adding the highest with lowest value).

I wish to have a way to writh my own sum(double) aggregate function to
replace the standard one (my customer is having badly coded report
which stores data in double field, and have no access to the source
code).

-- Artur Wronski

Apr 26 '06 #4
> I can do the sum in two ways, that give different final result:

1.14 + 2.23 -> 3.37 + 7.99 -> 11.4
2.23 + 7.99 -> 10.2 + 1.14 ->11.3

How did you calculate these expressions?
Especially my question is that why interim result(10.2) in second
expression has only one decimal digit .

Here is an example of calculation of second expression by SQL. I got
correct interim result and final sum.
SELECT interim
, interim + 1.14 fimnal
FROM (VALUES 2.23 + 7.99 ) Q (interim);
--------------------------------------------------------------------

INTERIM FIMNAL
------- -------
10.22 11.36

1 record(s) selected.

Apr 27 '06 #5
I forgot you wrote "precision is only 3 digits".
But, how do you know scale of interim result without knowing data
values boforehand, if you want to make scale of interim result also 3?
If calculated by SQL and you didn't specify precision and scale of
interim result, precisions and scales will be followings.
1.14 + 2.23 -> 3.37 + 7.99 [ -> 11.36] -> 11.3
DEC(3,2) + DEC(3,2) -> DEC(4,2) + DEC(3,2) [ -> DEC(6,2) ] -> DEC(3,1)
2.23 + 7.99 -> 10.22 + 1.14 [ -> 11.36] ->11.3
DEC(3,2) + DEC(3,2) -> DEC(4,2) + DEC(3,2) [ -> DEC(6,2) ] -> DEC(3,1)
To get correct answer with 3 digits, it would be better to use ROUND
for final result.

SELECT interim
, DEC(ROUND( (interim + DEC(7.99, 3 , 2)), 1), 3, 1) fimnal
FROM (VALUES DEC(1.14, 3,2) + DEC(2.23, 3 , 2) ) Q (interim);
--------------------------------------------------------------------

INTERIM FIMNAL
------- ------
3.37 11.4

1 record(s) selected.
------------------------- Commands Entered -------------------------
SELECT interim
, DEC(ROUND( (interim + DEC(1.14, 3 , 2)), 1), 3, 1) fimnal
FROM (VALUES DEC(2.23, 3,2) + DEC(7.99, 3 , 2) ) Q (interim);
--------------------------------------------------------------------

INTERIM FIMNAL
------- ------
10.22 11.4

1 record(s) selected.

Apr 27 '06 #6
Artur,

When you add the smallest double 0.000000.......000001 to all your
double numbers and then sum up and round, it is giving the right total.
I tried this

select round(sum(myfield+0.0000000000000000000001), 2) from mytable

You can replace 2 with any number you want.

regards,

Mehmet

Apr 27 '06 #7
Artur wrote:
Hi,

I'm using predefined SUM(() function on DOUBLE column. Surpisingly the
sum can give different results depending on the order of values in my
table.

Is that correct?


There is a very nicely written summary on floating point numbers here:

http://docs.sun.com/source/806-3568/ncg_goldberg.html

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Apr 28 '06 #8
Thank you guys for you prompt response.

I'm convinced to analyze value ranges in the database and suggest the
database owner to change double to some more precise type.

-- Artur Wronski

Apr 28 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

17
by: John Hunter | last post by:
I have a largish data set (1000 observations x 100 floating point variables), and some of the of the data are missing. I want to try a variety of clustering, neural network, etc, algorithms on the...
699
by: mike420 | last post by:
I think everyone who used Python will agree that its syntax is the best thing going for it. It is very readable and easy for everyone to learn. But, Python does not a have very good macro...
2
by: claus.hirth | last post by:
I wrote a stored procedure that uses a prepared INSERT INTO statement in order to play with the PREPARE and EXECUTE keywords. In transcript 1 below the call to that stored procedure does not...
3
by: lenygold via DBMonster.com | last post by:
One column table: ALL_SUM -------------------------------------------------------------------- 73237155+73240240+73243230+73249335 73237155+73240240+73246345 ...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.