Why different sums for these two functions for large sets of data?

Eric Lilja

Hello, consider the following two functions:

/* function foo() */
void foo()
{
float y = 0.0f;
float sum = 0.0f;

for(int i = 0; i < num; ++i)
{
for(int j = 0; j < num; ++j)
{
const float& x = myDataX[i];

y = myDataY[j];

sum += x + y;
}
}

std::cout << "sum = " << sum << std::endl;
}

/* function bar() */
void bar()
{
float sum = 0.0f;
float y_sum = 0.0f;

for(int y = 0; y < num; ++y)
y_sum += myDataY[y];

for(int x = 0; x < num; ++x)
{
float x_times_num = myDataX[x] * num;

float value = (x_times_num + y_sum);
sum += value;
}

std::cout << "sum = " << sum << std::endl;
}

If myDataX and myDataY are small arrays, both functions produce the correct
sum and bar() seems to be a bit faster (I removed code for timing when
posting). But for large sets of data, bar() produces a too large sum but is
very much faster. Why? Where's the bug in bar()? I have a feeling it has to
do with the fact that I'm using floats and floats don't always behave as one
might expect unless you understand in detail how they're represented
internally. I want to fix bar() so it produces the correct sum for large
data sets, and, hopefully still doing it faster than foo(). I am compiling
with debug info and will all optimizations disabled.

If you see any bugs, have suggestions on how to improve the speed of my
functions, please reply. I need to calculate the sum calculated by foo() and
bar() as fast as possible.

Example data set that works for both:
const int num = 4;
myDataX = new float[num];
myDataY = new float[num];
myDataX[0] = 1;
myDataX[1] = 2;
myDataX[2] = 3;
myDataX[3] = 4;

myDataY[0] = 5;
myDataY[1] = 6;
myDataY[2] = 7;
myDataY[3] = 8;

Example data set (large) where bar() produces an incorrect sum:
const int num = 20000;
myDataX = new float[num];
myDataY = new float[num];

std::fill(myDataX, &myDataX[num-1], 0.1f);
std::fill(myDataY, &myDataY[num-1], 0.1f);

These variables are static globals in my test program.

Thanks for any replies

/ E

Jul 23 '05 #1

Subscribe Post Reply

1539

Howard

"Eric Lilja" <mi****************************@gmail.com> wrote in message
news:d6**********@news.island.liu.se...

Hello, consider the following two functions:

std::fill(myDataX, &myDataX[num-1], 0.1f);
std::fill(myDataY, &myDataY[num-1], 0.1f);

Is that correct? I get totally bogus results when trying to run this code.
Try setting num to 4 for this case, and see what I mean. What do you get?
I get large negative values from both. I think the fill statement isn't
right. (Although I don't know how to fix it, since I've never used
std::fill with an array.)

A few suggestions:

1) why not use a vector, instead of an array?
2) why not use double instead of float, (since it has better precision,
which will help in any large summing of float values)?
3) isn't that formula equal to just: num times (the sum of the Xs plus the
sum of the Ys)?

-Howard

Jul 23 '05 #2

Donovan Rebbechi

On 2005-05-24, Eric Lilja <mi****************************@gmail.com> wrote:

Hello, consider the following two functions:

/* function foo() */
void foo()
{
float y = 0.0f;
float sum = 0.0f;

for(int i = 0; i < num; ++i)
{
for(int j = 0; j < num; ++j)
{
const float& x = myDataX[i];

y = myDataY[j];

sum += x + y;
}
}

std::cout << "sum = " << sum << std::endl;
}

/* function bar() */
void bar()
{
float sum = 0.0f;
float y_sum = 0.0f;

for(int y = 0; y < num; ++y)
y_sum += myDataY[y];

for(int x = 0; x < num; ++x)
{
float x_times_num = myDataX[x] * num;

float value = (x_times_num + y_sum);
sum += value;
}

std::cout << "sum = " << sum << std::endl;
}

If myDataX and myDataY are small arrays, both functions produce the correct
sum and bar() seems to be a bit faster (I removed code for timing when
posting). But for large sets of data, bar() produces a too large sum but is
very much faster. Why? Where's the bug in bar()? I have a feeling it has to
do with the fact that I'm using floats and floats don't always behave as one
might expect unless you understand in detail how they're represented
internally. I want to fix bar() so it produces the correct sum for large
data sets, and, hopefully still doing it faster than foo(). I am compiling
with debug info and will all optimizations disabled.

If you see any bugs, have suggestions on how to improve the speed of my
functions, please reply. I need to calculate the sum calculated by foo() and
bar() as fast as possible.

Example data set that works for both:
const int num = 4;
myDataX = new float[num];
myDataY = new float[num];
myDataX[0] = 1;
myDataX[1] = 2;
myDataX[2] = 3;
myDataX[3] = 4;

myDataY[0] = 5;
myDataY[1] = 6;
myDataY[2] = 7;
myDataY[3] = 8;

Example data set (large) where bar() produces an incorrect sum:
const int num = 20000;
myDataX = new float[num];
myDataY = new float[num];

std::fill(myDataX, &myDataX[num-1], 0.1f);
std::fill(myDataY, &myDataY[num-1], 0.1f);

These variables are static globals in my test program.

Thanks for any replies

Are you sure you've got this the right way around ? Your code looks very
inefficient. If you're trying to compute the sum

Sum(i=0..num-1,j=0..num-1) (x_i + y_j)

then you're doing this very inefficiently. Note that this sum exands as

Sum(i,j) x_i + Sum(i,j) y_j

Now because x_i is independent of j, and x_j is independent of x_i, you can
collapse a subscript of each sum, and get:

size * (Sum(i) x_i +sum(j) x_j)
Sum

The resulting code looks like this:

float bar ( float*x, float* y, int size )
{
double sumx = 0, sumy = 0;
for (int i = 0; i < size; ++i)
{
sumx += *x++;
sumy += *y++;
}
return size*(sumx + sumy);
}

or alternatively, as an STL one liner:
float bar ( float*x, float* y, int size )
{
return size * (std::accumulate(x,x+size,0.f) +
std::accumulate(y,y+size,0.f)
}

This should be (and is, according to my test with your data) more accurate
than your foo() because it only performs 2*(size)+1 floating point additions,
and one multiplication, compared to yours : 2*size*size floating point
additions. When I ran the test (with my version of bar), I got these results:
foo: 4.1943e+06
bar: 8e+07

even with only 200 elements in the array, I get:
foo: 8003.11
bar: 8000

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

Jul 23 '05 #3

Eric Lilja

Hi Donovan and thanks for your reply (see below),

"Donovan Rebbechi" wrote:

On 2005-05-24, Eric Lilja <mi****************************@gmail.com>
wrote: [my OP snipped]

Are you sure you've got this the right way around ? Your code looks very
inefficient. If you're trying to compute the sum

Sum(i=0..num-1,j=0..num-1) (x_i + y_j)

then you're doing this very inefficiently. Note that this sum exands as

Sum(i,j) x_i + Sum(i,j) y_j

Now because x_i is independent of j, and x_j is independent of x_i, you
can
collapse a subscript of each sum, and get:

size * (Sum(i) x_i +sum(j) x_j)
Sum

The resulting code looks like this:

float bar ( float*x, float* y, int size )
{
double sumx = 0, sumy = 0;
for (int i = 0; i < size; ++i)
{
sumx += *x++;
sumy += *y++;
}
return size*(sumx + sumy);
}

I tried that version and compared it with the original version that I was
given (that I've been trying to improve upon and that I didn't show in my
OP). Here's a complete program with output for the two varaints. As you can
see the results are equal for the small data set but way off for the large
data set:

#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <iostream>

static double
un_optimized(const float *myDataX, const float *myDataY,
std::size_t num)
{
unsigned int i, j;
float sum = 0;

for( i = 0; i < num; i++ )
{
for( j = 0; j < num; j++ )
{
float x = myDataX[i];
float y = myDataY[j];
float value = x+y;
sum+=value;
}
}

return sum;
}

static double
optimized(const float *myDataX, const float *myDataY, std::size_t num)
{
double sumx = 0, sumy = 0;

for(unsigned int i = 0; i < num; ++i)
{
sumx += *myDataX++;
sumy += *myDataY++;
}

return (num*(sumx + sumy));
}

int
main()
{
std::size_t num = 20000;
float *myDataX = new float[num];
float *myDataY = new float[num];

std::fill(myDataX, &myDataX[num], 0.1f);
std::fill(myDataY, &myDataY[num], 0.1f);
//std::memset(myDataX, 1, num);
//std::memset(myDataY, 1, num);

std::cout << std::fixed << std::showpoint << std::setprecision(8);

std::cout << "sum for un_optimized(): "
<< un_optimized(myDataX, myDataY, num) << std::endl;
std::cout << "sum for optmized(): "
<< optimized(myDataX, myDataY, num) << std::endl;

delete [] myDataX;
delete [] myDataY;

num = 4;
float myDataX2[] = {1,2,3,4};
float myDataY2[] = {5,6,7,8};

std::cout << "sum for un_optimized(): "
<< un_optimized(myDataX2, myDataY2, num) << std::endl;
std::cout << "sum for optmized(): "
<< optimized(myDataX2, myDataY2, num) << std::endl;

return 0;
}

Output:
sum for un_optimized(): 4194304.00000000
sum for optmized(): 80000001.19209290
sum for un_optimized(): 144.00000000
sum for optmized(): 144.00000000

Any comments??
or alternatively, as an STL one liner:
float bar ( float*x, float* y, int size )
{
return size * (std::accumulate(x,x+size,0.f) +
std::accumulate(y,y+size,0.f)
}

This should be (and is, according to my test with your data) more accurate
than your foo() because it only performs 2*(size)+1 floating point
additions,
and one multiplication, compared to yours : 2*size*size floating point
additions. When I ran the test (with my version of bar), I got these
results:
foo: 4.1943e+06
bar: 8e+07

even with only 200 elements in the array, I get:
foo: 8003.11
bar: 8000

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

/ Eric

Jul 23 '05 #4

Shezan Baig

Eric Lilja wrote:

static double
optimized(const float *myDataX, const float *myDataY, std::size_t num)
{
double sumx = 0, sumy = 0;

for(unsigned int i = 0; i < num; ++i)
{
sumx += *myDataX++;
sumy += *myDataY++;
}

return (num*(sumx + sumy));
}

How about just:

static double optimized(const float *myDataX,
const float *myDataY,
std::size_t num)
{
float x = *myDataX, y = *myDataY;
return num * num * x * y;
}
(Since you are relying on the fact that all elements in each array
contain the same value, you don't even need the loop).

std::fill(myDataX, &myDataX[num], 0.1f);
std::fill(myDataY, &myDataY[num], 0.1f);

Hope this helps,
-shez-

Jul 23 '05 #5

Larry I Smith

Eric Lilja wrote:

Hello, consider the following two functions:

/* function foo() */
void foo()
{
float y = 0.0f;
float sum = 0.0f;

for(int i = 0; i < num; ++i)
{
for(int j = 0; j < num; ++j)
{
const float& x = myDataX[i];

y = myDataY[j];

sum += x + y;
}
}

std::cout << "sum = " << sum << std::endl;
}

/* function bar() */
void bar()
{
float sum = 0.0f;
float y_sum = 0.0f;

for(int y = 0; y < num; ++y)
y_sum += myDataY[y];

for(int x = 0; x < num; ++x)
{
float x_times_num = myDataX[x] * num;

float value = (x_times_num + y_sum);
sum += value;
}

std::cout << "sum = " << sum << std::endl;
}

If myDataX and myDataY are small arrays, both functions produce the correct
sum and bar() seems to be a bit faster (I removed code for timing when
posting). But for large sets of data, bar() produces a too large sum but is
very much faster. Why? Where's the bug in bar()? I have a feeling it has to
do with the fact that I'm using floats and floats don't always behave as one
might expect unless you understand in detail how they're represented
internally. I want to fix bar() so it produces the correct sum for large
data sets, and, hopefully still doing it faster than foo(). I am compiling
with debug info and will all optimizations disabled.

If you see any bugs, have suggestions on how to improve the speed of my
functions, please reply. I need to calculate the sum calculated by foo() and
bar() as fast as possible.

Example data set that works for both:
const int num = 4;
myDataX = new float[num];
myDataY = new float[num];
myDataX[0] = 1;
myDataX[1] = 2;
myDataX[2] = 3;
myDataX[3] = 4;

myDataY[0] = 5;
myDataY[1] = 6;
myDataY[2] = 7;
myDataY[3] = 8;

Example data set (large) where bar() produces an incorrect sum:
const int num = 20000;
myDataX = new float[num];
myDataY = new float[num];

std::fill(myDataX, &myDataX[num-1], 0.1f);
std::fill(myDataY, &myDataY[num-1], 0.1f);

These variables are static globals in my test program.

Thanks for any replies

/ E

That should be:

std::fill(myDataX, myDataX + num, 0.1f);
std::fill(myDataY, myDataY + num, 0.1f);

The 2nd arg needs to be an 'end()' iterator,
i.e. 'one past the end'.

Larry
--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #6

Larry I Smith

Eric Lilja wrote:
[snip]

I tried that version and compared it with the original version that I was
given (that I've been trying to improve upon and that I didn't show in my
OP). Here's a complete program with output for the two varaints. As you can
see the results are equal for the small data set but way off for the large
data set:

#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <iostream>

static double
un_optimized(const float *myDataX, const float *myDataY,
std::size_t num)
{
unsigned int i, j;
float sum = 0;

Oops, the above MUST be 'double sum = 0;'
Otherwise it will overflow for large values of 'num'.

for( i = 0; i < num; i++ )
{
for( j = 0; j < num; j++ )
{
float x = myDataX[i];
float y = myDataY[j];
float value = x+y;
sum+=value;
}
}

return sum;
}

static double
optimized(const float *myDataX, const float *myDataY, std::size_t num)
{
double sumx = 0, sumy = 0;

for(unsigned int i = 0; i < num; ++i)
{
sumx += *myDataX++;
sumy += *myDataY++;
}

return (num*(sumx + sumy));
}

int
main()
{
std::size_t num = 20000;
float *myDataX = new float[num];
float *myDataY = new float[num];

std::fill(myDataX, &myDataX[num], 0.1f);
std::fill(myDataY, &myDataY[num], 0.1f);
//std::memset(myDataX, 1, num);
//std::memset(myDataY, 1, num);

std::cout << std::fixed << std::showpoint << std::setprecision(8);

std::cout << "sum for un_optimized(): "
<< un_optimized(myDataX, myDataY, num) << std::endl;
std::cout << "sum for optmized(): "
<< optimized(myDataX, myDataY, num) << std::endl;

delete [] myDataX;
delete [] myDataY;

num = 4;
float myDataX2[] = {1,2,3,4};
float myDataY2[] = {5,6,7,8};

std::cout << "sum for un_optimized(): "
<< un_optimized(myDataX2, myDataY2, num) << std::endl;
std::cout << "sum for optmized(): "
<< optimized(myDataX2, myDataY2, num) << std::endl;

return 0;
}

Output:
sum for un_optimized(): 4194304.00000000
sum for optmized(): 80000001.19209290
sum for un_optimized(): 144.00000000
sum for optmized(): 144.00000000

Any comments??

[snip]

Change 'sum' to 'double' in un_optimized() (see my comments above)
then this program outputs:

sum for un_optimized(): 80000001.19209290
sum for optmized(): 80000001.19209290
sum for un_optimized(): 144.00000000
sum for optmized(): 144.00000000
Regards,
Larry
--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #7

Niels Dybdahl

"Eric Lilja" <mi****************************@gmail.com> wrote in message
news:d6**********@news.island.liu.se...

Hello, consider the following two functions:

/* function foo() */
void foo()
{
float y = 0.0f;
float sum = 0.0f;

for(int i = 0; i < num; ++i)
{
for(int j = 0; j < num; ++j)
{
const float& x = myDataX[i];

y = myDataY[j];

sum += x + y;
}
}

std::cout << "sum = " << sum << std::endl;
}

/* function bar() */
void bar()
{
float sum = 0.0f;
float y_sum = 0.0f;

for(int y = 0; y < num; ++y)
y_sum += myDataY[y];

for(int x = 0; x < num; ++x)
{
float x_times_num = myDataX[x] * num;

float value = (x_times_num + y_sum);
sum += value;
}

std::cout << "sum = " << sum << std::endl;
}

If myDataX and myDataY are small arrays, both functions produce the correct sum and bar() seems to be a bit faster (I removed code for timing when
posting). But for large sets of data, bar() produces a too large sum but is very much faster. Why? Where's the bug in bar()?

floats usually have a precision of 6-7 digits. When your result becomes
larger than that compared to the values that you are adding, you loose the
values. This is a bigger problem in foo() because you only add x+y in each
loop. In bar() you add num*x+num*y in each loop, which is a larger value.
If you use double instead you will have around 15 digits precision and that
will solve the problem in both functions.

Niels Dybdahl

Jul 23 '05 #8

Karl Heinz Buchegger

Eric Lilja wrote:

[snip]

Any comments??

Until you know what you do and are willing and have the knowledge
to fight that beast:
* avoid 'float'
* and use 'double' instead.

--
Karl Heinz Buchegger
kb******@gascad.at

Jul 23 '05 #9

Old Wolf

Eric Lilja wrote:

/* function foo() */
void foo()
{
float y = 0.0f;
float sum = 0.0f;

for(int i = 0; i < num; ++i)
{
for(int j = 0; j < num; ++j)
{
const float& x = myDataX[i];
y = myDataY[j];
sum += x + y;
}
}
This is a bizarre way to write that algorithm, consider:

for(int i = 0; i < num; ++i)
for(int j = 0; j < num; ++j)
sum += myDataX[i] + myDataY[j];
If myDataX and myDataY are small arrays, both functions produce
the correct sum and bar() seems to be a bit faster. But for large
sets of data, bar() produces a too large sum but is very much faster.
Why? Where's the bug in bar()?
What makes you so sure that foo() is right and bar() is wrong?

Every floating point operation incurs a small precision error,
and foo does many many more operations than bar, so my guess
would be that bar's result is more correct.
I want to fix bar() so it produces the correct sum for large
data sets, and, hopefully still doing it faster than foo().
1. Use the simpler version everyone's already suggested
2. Use 'double' for your intermediate results. That will probably
be faster than using 'float' , as an added bonus.
3. In your compiler, turn on 'fast floating point' option. This
will exponentially increase the speed of your program, but
it may violate ISO C requirements for what to do when precision
is lost (ie. you may get a slightly different answer, for
better or worse).

Why is 'double' maybe faster ? Many CPUs have builtin instructions
for arithmetic in 'double', and when you store the result in a
float, it has to perform a truncation operation (and if it has to
conform to ISO C, an even slower truncation operation).
Example data set (large) where bar() produces an incorrect sum:
const int num = 20000;
myDataX = new float[num];
myDataY = new float[num];

std::fill(myDataX, &myDataX[num-1], 0.1f);
std::fill(myDataY, &myDataY[num-1], 0.1f);
Those fills are wrong, you are not filling the last member.
As well as the option suggested by Larry, consider this one:

std::fill_n(myDataX, num, 0.1f);

An even better option would be to use RAII:

std::vector<float> myDataX(num, 0.1f);

The correct answer to this calculation is 20000 *
(20000 * (0.1 + 0.1)) ie 80000000.

In your other post you wrote: sum for optmized(): 80000001.19209290

So it looks like bar() is indeed giving the more accurate
answer than foo().

Jul 23 '05 #10

Similar topics

Sums and StringBuffers

by: guyzdancin | last post by:

I have developed a parsing program to handle large csv files and compute sums. The program was developed and successfully tested using only String objects. I want replace String objects with ...

Java

Confused about functions

by: agentxx04 | last post by:

Hi. Our assignment was to creat a program that can find the average, median & mode of a #of integers. Here's my program: #include<stdio.h> int main() { int item; int a, b, t, mode; int...

C / C++

Functions taking pointers to different types as arguments

by: Mikhail Teterin | last post by:

Hello! I'd like to have a variable of a pointer-to-function type. The two possible values are of type (*)(FILE *) and (*)(void *). For example: getter = straight ? fgetc : gzgetc; nextchar...

C / C++

Variadic functions calling variadic functions with the argument list, HLL bit shifts on LE processors

by: Ross A. Finlayson | last post by:

Hi, I hope you can help me understand the varargs facility. Say I am programming in ISO C including stdarg.h and I declare a function as so: void log_printf(const char* logfilename, const...

C / C++

Enumeration idioms: Values from different enumerations

by: Ben Finney | last post by:

Antoon Pardon wrote: > I just downloaded your enum module for python > and played a bit with it. IMO some of the behaviour makes it less > usefull. Feedback is appreciated. I'm hoping to...

Python

Collecting different execution statistics of C++ programs

by: romixnews | last post by:

Hi, I'm facing the problem of analyzing a memory allocation dynamic and object creation dynamics of a very big C++ application with a goal of optimizing its performance and eventually also...

C / C++

Ordering Sums

by: DrewKull | last post by:

Hello, I have a query which sums check boxes (so 0 or -1) then Abs the sum so i can get the number of check box 'checked' per column... based on start and end date. SELECT Abs(Sum(CommCon)) AS ,...

Microsoft Access / VBA

How to build a procedure that returns different numbers of columns as a result based on a parameter

by: serge | last post by:

/* Subject: How to build a procedure that returns different numbers of columns as a result based on a parameter. You can copy/paste this whole post in SQL Query Analyzer or Management Studio...

Microsoft SQL Server

A deque containing different types of objects (with a common baseclass)

by: Juha Nieminen | last post by:

I'm sure this is not a new idea, but I have never heard about it before. I'm wondering if this could work: Assume that you have a common base class and a bunch of classes derived from it, and...

C / C++

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice