
# problem returning floating point

I have written this small app to explain an issue I'm having with a larger program. In the following code I'm taking 10 ints from the keyboard. In the call to average() these 10 ints are added and divided by the number of ints to return the average, a float.

Can someone see why, for example, if the total is 227 and it is divided by 10, I get a return of 22.00000 instead of the correct answer 22.7?

Thanks in advance
Pete

    #include <stdio.h>

    float average(int, int[]); /* Prototype returning float */

    int main()
    {
        int num[10];   /* array to hold 10 ints */
        float avg;     /* float to hold the average */
        int i, size=10;

        printf("Enter 10 integers: \n");

        /* Takes ten ints from the keyboard */
        for(i=0; i
11 Replies

Peter said:

> I have written this small app to explain an issue I'm having with a larger program. In the following code I'm taking 10 ints from the keyboard. In the call to average() these 10 ints are then added and Divided by the number of ints to return the average, a float. Can someone see why I get for example: if the total is 227 then divided by 10 I get a return of 22.00000 instead of the correct answer 22.7.

227.0 / 10.0 is indeed 22.7, within reason - the nature of floating point is such that you can't always get an answer as precisely as you'd like. So:

    227.0 / 10.0 is 22.7 (ish)
    227.0 / 10   is 22.7 (ish)
    227   / 10.0 is 22.7 (ish)
    227   / 10   is 22 (precisely)

Spot the difference? One modification will suffice to fix the above problem:

> float average(int count, int numbers[]) { int total=0;

Change this to:

    float total=0;

-- 
Richard Heathfield
"Usenet is a strange place" - dmr 29 July 1999

Nov 6 '08 #2
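The four divisions in the reply above can be checked with a minimal sketch. The function names `average_truncated` and `average_converted` are hypothetical, chosen only for illustration. The first mirrors an `int / int` division whose result is converted to float only afterwards; the second converts one operand before dividing.

```c
/* Hypothetical helpers, not taken from the original program. */

/* Both operands are int, so the division is an integer division.
   The fraction is discarded before the result becomes a float. */
float average_truncated(int total, int count)
{
    return total / count;
}

/* Converting one operand to float first makes the compiler convert
   the other operand too, so the division is done in floating point. */
float average_converted(int total, int count)
{
    return (float)total / count;
}
```

Called with a total of 227 and a count of 10, the first function returns 22 and the second returns a value close to 22.7.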

Hi

On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:

> Peter said:
>> float average(int count, int numbers[]){
>>   int total=0;
>>   int i;
>>   float ans;
>>   for(i=0; i
>
> int total=0;
>
> Change this to:
>
> float total=0;

Not strictly wrong, but bad advice. As long as the size of the integer used is big enough to hold the sum, it is both faster and more accurate to do the addition in an integer.

A much better change to make is:

    ans = (double)total / (double)count;

Also, it is massively stupid to calculate the division inside each loop, although a half-decent compiler will realise this and optimise it away.

My version of this function would look like:

    float average( int count, int *numbers ){
      int total= 0;
      int *stop= numbers + count;

      while( numbers < stop )
        total+= *numbers++;

      return (double)total / count;
    }

I have removed the subscripting in favour of comparing pointers, to improve speed, in case this is performance critical.

In the case where an int couldn't hold the maximum possible sum of the ten inputs, I would use a long or long long. In the case where that was still not enough, I would use two nested loops, the inner one adding up using integers, and the outer one using doubles. Again, this would give both increased accuracy and increased speed.

Nov 6 '08 #3
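The two-nested-loops idea at the end of the post above is described only in words, so here is one possible reading of it as a sketch. The name `average_blocked` and the block size `BLOCK` are assumptions made for illustration, not anything from the thread. The sketch also assumes `long long` is wide enough to hold `BLOCK` values of type `int`, which holds for a 32-bit `int` and a 64-bit `long long`.

```c
#include <stddef.h>

/* Assumed block size: small enough that BLOCK ints cannot overflow
   a long long when int is 32 bits and long long is 64 bits. */
#define BLOCK 1024

/* Hypothetical sketch: the inner loop adds up one block exactly in an
   integer, the outer loop accumulates the block sums in a double. */
double average_blocked(size_t count, const int *numbers)
{
    double total = 0.0;
    size_t i = 0;

    if (count == 0)
        return 0.0;               /* avoid dividing by zero */

    while (i < count) {
        long long block_sum = 0;
        size_t end = (count - i > BLOCK) ? i + BLOCK : count;

        for (; i < end; i++)
            block_sum += numbers[i];

        total += (double)block_sum;
    }

    return total / (double)count;
}
```

The double accumulator is touched once per block rather than once per element, so there are far fewer places where rounding can occur.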

viza wrote:

> In the case where an int couldn't hold the maximum possible sum of the ten inputs, I would use a long or long long. In the case where that was still not enough, I would use two nested loops, the inner one adding up using integers, and the outer one using doubles. Again, this would give both increased accuracy and increased speed.

Loss of precision in a case like this arises when a small number is added to a large one. A reasonable approach therefore is to add up each half of the numbers and then add these together, using the same method recursively to add up the halves. Something like this (untested):

    double sum_ints(int count, int *numbers)
    {
        if(count == 0)
            return 0.0;
        else if(count == 1)
            return (double)numbers[0];
        else
            return sum_ints(count/2, numbers) +
                   sum_ints(count-count/2, numbers+count/2);
    }

-- Richard
-- 
Please remember to mention me / in tapes you leave behind.

Nov 6 '08 #4
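To make the effect of halving visible on a tiny input, the sketch below applies the same recursive scheme to `float` values instead of summing ints into a `double`. That switch is an assumption made purely so the rounding shows up with five elements; the names `naive_sum` and `pairwise_sum` are hypothetical. It further assumes `float` is an IEEE 754 single-precision type evaluated without extra precision.

```c
#include <stddef.h>

/* Left-to-right sum: every small value is added to the growing total. */
float naive_sum(size_t count, const float *values)
{
    float total = 0.0f;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Pairwise sum: split the input in half, sum each half recursively,
   then add the two partial sums together. */
float pairwise_sum(size_t count, const float *values)
{
    size_t half;

    if (count == 0)
        return 0.0f;
    if (count == 1)
        return values[0];

    half = count / 2;
    return pairwise_sum(half, values)
         + pairwise_sum(count - half, values + half);
}
```

Under those assumptions, for the input `{16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f}` the left-to-right sum stays at 16777216, because each 1 is rounded away, while the pairwise sum collects the small values first and yields 16777220.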

On 6 Nov, 12:19, viza wrote:

> I have removed the subscripting in favour of comparing pointers, to improve speed, in case this is performance critical.

do you have access to a compiler where this makes any difference?

-- 
Nick Keighley

Nov 6 '08 #5

Hi

On Thu, 06 Nov 2008 04:39:57 -0800, Nick Keighley wrote:

> On 6 Nov, 12:19, viza wrote:
>> I have removed the subscripting in favour of comparing pointers, to improve speed, in case this is performance critical.
>
> do you have access to a compiler where this makes any difference?

No, not noticeably, if at all. The use of one less variable makes the C look tidier to me, quite aside from what the compiler outputs.

viza

Nov 6 '08 #6

On 6 Nov, 14:20, viza wrote:

> Hi
>
> On Thu, 06 Nov 2008 04:39:57 -0800, Nick Keighley wrote:
>> On 6 Nov, 12:19, viza wrote:
>>> I have removed the subscripting in favour of comparing pointers, to improve speed, in case this is performance critical.
>>
>> do you have access to a compiler where this makes any difference?
>
> No, not noticeably, if at all. The use of one less variable makes the C look tidier to me, quite aside from what the compiler outputs.

I don't see how you are saving a variable. You wrote (I fiddled with layout a bit)

    float average( int count, int *numbers )
    {
        int total= 0;
        int *stop= numbers + count;

        while( numbers < stop )
            total+= *numbers++;

        return (double)total / count;
    }

I'd write

    float average (int count, int *numbers)
    {
        int total = 0;
        int i;

        for (i = 0; i < count; i++)
            total += numbers [i];

        return (double)total / count;
    }

I don't find your version significantly easier to understand. The termination condition seems a little obscure.

-- 
Nick Keighley

Nov 6 '08 #7

Hi

On Thu, 06 Nov 2008 16:05:28 +0000, Richard Heathfield wrote:

> viza said:
>> On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
>
>> As long as the size of the integer used is big enough to hold the sum,
>
> We don't know this, so you're already on dodgy ground.

We don't know that float is either. As soon as you add two of anything in C you make an assumption that the result will fit in the type.

>> it is both faster and more accurate to do the addition in an integer.
>
> Chapter and verse, please. (Math co-processors are commonplace nowadays.)

I did test specifically this function when implementing several image shrinking functions. Integer was faster on both x86 and x86-64, which was all I had to hand. 64-bit integers were even faster than float or double on one 32-bit CPU that I tested.

>> A much better change to make is: ans = (double)total / (double)count;
>
> Introducing two spurious casts, and risking overflow on the total.

Your method fails first. The range where all integers can be represented in a float is much smaller than the range of the same size int. Once you pass that limit then if some of your input ints are small they will be completely lost and if many of them are small (wrt the eventual mean) then the output of the float method will be less precise than in the integer method.

> int data[] = { INT_MAX, INT_MAX - 1, INT_MAX - 2 };

>> or long long.
>
> Might not be available.

Ok, my method simply cannot handle integers of the largest type available on the system which are close to their maximum value. In many situations it is possible to know a priori that this will not be the case, and achieve a more accurate result more quickly by using my method.

By your method the OP would discard accuracy when that may not have been necessary or even acceptable, and would have run slower on many systems.

Have a nice day.

viza

Nov 6 '08 #9
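The claim that a float represents a smaller range of integers exactly than an int of the same size can be probed with a short sketch. All three function names are hypothetical. The sketch assumes a 32-bit `int`, an IEEE 754 single-precision `float` (24 significant bits) and a `double` able to hold every `int` exactly.

```c
/* Hypothetical helper: returns 1 if n survives conversion to float.
   The comparison is done in double so that no float is converted
   back to int, which could overflow near INT_MAX. */
int fits_in_float(int n)
{
    float f = (float)n;   /* rounds when n needs more than 24 bits */
    return (double)f == (double)n;
}

/* Accumulate in a float, as with "float total". */
float sum_as_float(int count, const int *numbers)
{
    float total = 0.0f;
    int i;

    for (i = 0; i < count; i++)
        total += numbers[i];
    return total;
}

/* Accumulate in an integer type wider than int. */
long long sum_as_integer(int count, const int *numbers)
{
    long long total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += numbers[i];
    return total;
}
```

Under those assumptions, 16777216 (2 to the power 24) fits in a float but 16777217 does not. Summing `{16777216, 1, 1, 1}` in a float leaves the total at 16777216, whereas the integer accumulator gives 16777219.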

viza said:

> Hi
>
> On Thu, 06 Nov 2008 16:05:28 +0000, Richard Heathfield wrote:
>> viza said:
>>> A much better change to make is: ans = (double)total / (double)count;
>>
>> Introducing two spurious casts, and risking overflow on the total.
>
> Your method fails first.

Not according to my testing.

-- 
Richard Heathfield
"Usenet is a strange place" - dmr 29 July 1999

Nov 6 '08 #10

On Nov 6, 4:19 am, viza wrote:

> Hi
>
> On Thu, 06 Nov 2008 07:51:03 +0000, Richard Heathfield wrote:
>> Peter said:
>>> float average(int count, int numbers[])
>>> {
>>>   int total=0;
>>>   int i;
>>>   float ans;
>>>   for(i=0; i

Hi

On Thu, 06 Nov 2008 13:13:01 -0800, user923005 wrote:

> On Nov 6, 4:19 am, viza wrote:
>> As long as the size of the integer used is big enough to hold the sum, it is both faster and more accurate to do the addition in an integer.
>
> Also, floating point operations are not appreciably slower and many modern systems have special hardware for floating point to accelerate it. See, for example: http://www.cygnus-software.com/paper...dinfinity.html
>
> One floating point addition per cycle is going to be hard to beat.

Ok, if you use SIMD it gets much faster, but most compilers can't (or don't by default) use those instructions in a loop like this, so if you are only writing standard C and using a common compiler then integer addition is still much faster.

I take your point that both ints and floats can store ints exactly over a certain range; that range is larger for ints though.

>> I have removed the subscripting in favour of comparing pointers, to improve speed, in case this is performance critical.
>
> More nonsense.

The compiler will probably do that for you but it can't hurt.

viza

Nov 7 '08 #12
