
# Integers vs floats

P: 12 Can someone explain the output I'm getting from the following 2 code portions? In the first, I've used n and sizeOfArrays as long doubles, and in the second I've made sizeOfArrays an integer, as this is the form needed for setting the number of elements in an array. However, the moment it's in integer form, the output is "wrong", i.e. not what I want it to be. Why is this? How can I get around it?

```cpp
long double h = 0.1;
long double n = 1.5/h;
long double sizeOfArrays = n + 1;

cout.setf(ios::fixed);
cout.precision(7);

cout << n << " and  " << sizeOfArrays << endl;
```

Output is 15.0000000 and 16.0000000

```cpp
long double h = 0.1;
long double n = 1.5/h;
int sizeOfArrays = n + 1;

cout.setf(ios::fixed);
cout.precision(7);

cout << n << " and  " << sizeOfArrays << endl;
```

Output is 15.0000000 and 15

If I set n as an integer, it gives an output value of 14, too. This I could understand better, and setting the array size as "n+1" naturally wouldn't work while n was a float, but I thought I could get around it all by creating a new variable called sizeOfArrays set as an integer. The results are above.

May 9 '07 #1
6 Replies

P: 12 To explain more clearly what output I was expecting: doing the calculation manually, n should be 15, and therefore sizeOfArrays should be 16. Declaring n as an integer makes its value 14, and declaring sizeOfArrays as an integer makes it 15. This is what I don't understand. I have to make one or the other an integer to be able to use it as the number of elements in the array. The reason I want to do this all in code (when I know what the answers are!) is that I want to be able to change the value of h at will. n, and therefore the desired array size, depends on h, so computing it in code saves me from going through the rest of the program and changing every bit that depends on h. May 9 '07 #2

Expert 100+ P: 1,251 What you are experiencing is rounding error. 0.1 has no exact binary representation, so 1.5/h can come out slightly below 15, which is evidently what happened here. When a floating-point number is converted to an integral type, the fractional part is simply discarded (truncated toward zero); nothing is rounded. A number like 14.9999999999999999 will print as 15.0000000 in the code you provided, because the output is rounded to 7 decimal places, but it is not actually 15. (If the 9s repeated indefinitely, it would actually be equal to 15. There is a proof, but I'm going to leave that up to you as an exercise.) So the truncation results in 14 for n, and 15 rather than 16 for n + 1. This is an unfortunate side effect of representing real numbers, which may need infinitely many digits, with a finite number of bits.

A possible workaround would be to add a very small number to the value before converting it, say 1e-8 (I picked that somewhat arbitrarily, given you are printing to 7 digits). In principle a more precise way would be to add a suitably scaled multiple of epsilon() from std::numeric_limits; note that epsilon() on its own is the spacing between representable values near 1.0, so it may be too small to help for a value near 15.

BTW, what is your reason for using long doubles? I do not see any need for them at the precision you require. Adrian May 9 '07 #4
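The difference between plain truncation and the small-offset workaround described above can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: the helper names `truncate_to_int` and `to_int_with_tolerance` are hypothetical, and the default tolerance of 1e-8 is simply the value suggested in the reply.

```cpp
#include <cassert>

// Plain conversion: the fractional part is discarded (truncation toward zero).
int truncate_to_int(double x) {
    return static_cast<int>(x);
}

// Hypothetical helper: nudge x upward by a small tolerance before truncating,
// so a value just below an integer (e.g. 14.9999999999) lands on that integer.
// Assumes x is non-negative and tol is much smaller than 1.
int to_int_with_tolerance(double x, double tol = 1e-8) {
    assert(x >= 0.0 && tol >= 0.0);
    return static_cast<int>(x + tol);
}
```

Here `truncate_to_int(14.9999999999)` gives 14, while `to_int_with_tolerance(14.9999999999)` gives 15. The tolerance is a judgment call: if it is too large, values that genuinely lie below the next integer are pushed up as well.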

 P: 12 Thanks, Jos, I'll read through that. Adrian, the code is a portion of a program for numerical methods, so basically I'm just using long doubles because it's the highest accuracy I could use. Perhaps doubles would in fact suffice. But I stopped using regular floats very early on in these types of programs, presumably because I found the accuracy insufficient at the time, and I can't remember now whether I found the same problem with doubles. Thanks for your reply, too - I'll see what I can do based on that. Is there no way just to round off instead of truncate? May 9 '07 #5

 P: 12 All right, adding 1e-8 did the trick. Thanks! May 9 '07 #6

Expert 100+ P: 1,251 Yeah, try using the round function ;) See http://www.codecogs.com/reference/ma...hp?alias=round. Adrian May 9 '07 #7
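As a rough sketch of the rounding approach applied to the original problem, the array size could be computed along these lines. It assumes a compiler that provides `std::lround` in `<cmath>` (C++11 or later), which is not necessarily the function behind the link above. The names `intervals` and `make_grid` are hypothetical, and `std::vector` is swapped in for a plain array because its size can be chosen at run time.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Number of intervals of width h that fit in [0, length], rounded to the
// nearest integer rather than truncated.
long intervals(long double length, long double h) {
    assert(h > 0.0L && length >= 0.0L);
    return std::lround(length / h);
}

// Hypothetical helper: a grid with one more point than there are intervals.
// Each entry holds the position i * h of the i-th grid point.
std::vector<long double> make_grid(long double length, long double h) {
    const long n = intervals(length, h);
    std::vector<long double> grid(static_cast<std::size_t>(n) + 1);
    for (long i = 0; i <= n; ++i) {
        grid[static_cast<std::size_t>(i)] = i * h;
    }
    return grid;
}
```

With `h = 0.1`, `intervals(1.5L, h)` gives 15 and `make_grid(1.5L, h)` has 16 elements, so changing h only requires changing one value. Rounding is appropriate here because length / h is expected to be a whole number up to floating-point error; if it is not, a deliberate choice between rounding up and rounding down is needed instead.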