
# Floating point subtraction with FLT_MAX error

Just a small little program. I can't figure out what I am doing wrong.

```c
#include <stdio.h>
#include <float.h>

int main()
{
    double max = FLT_MAX;
    double sub = 16703.627681;
    double result = max - sub;

    printf("%f - %f = %f\n", max, sub, result);
    return 0;
}
```

Output:

```
340282346638528859811704183484516925440.000000 - 16703.627681 = 340282346638528859811704183484516925440.000000
```

Any help would be highly appreciated.

Thanks

Oct 26 '07 #1
17 Replies

On Oct 25, 7:56 pm, spooler...@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> .... snip ...

It appears that you are trying to print a double (%Lf) by using a float (%f), hence the error you are seeing. Check out the link below for more info on the printf() function and its modifiers.

http://www.cplusplus.com/reference/c...io/printf.html

Keith
http://www.doubleblackdesign.com
http://www.doubleblackdesign.com/forums

Oct 26 '07 #2

On Oct 25, 5:06 pm, husterk wrote:

> It appears that you are trying to print a double (%Lf) by using a float (%f), hence the error you are seeing.
> .... snip ...

Tried that already. Same results.

With %lf:

```
340282346638528859811704183484516925440.000000 - 16703.627681 = 340282346638528859811704183484516925440.000000
```

With %LF (isn't that for long double?):

```
340282346638528859811704183484516925440.000000 - 16703.627681 = 0.000000
```

Thanks for the reply. Also, as far as I know, printf does not care whether it is %f or %lf, since everything is passed as a double; it is scanf that complains. Correct me if I am wrong.

Oct 26 '07 #3

sp********@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> .... snip ...
> double max = FLT_MAX;
> .... snip ...

What happens if you do:

```c
double max = DBL_MAX;
```

Oct 26 '07 #4

In article <11**********************@z9g2000hsf.googlegroups.com>, sp********@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> double max = FLT_MAX;
> double sub = 16703.627681;
> double result = max - sub;

Floating point does not have unlimited precision. What you have discovered is that near FLT_MAX, the numbers that your floating-point system is able to represent are more than 16703 apart.

Different systems use different schemes for floating point. One of the most common is IEEE 754: http://en.wikipedia.org/wiki/IEEE_fl...point_standard

--
"I will speculate that [...] applications [...] could actually see a performance boost for most users by going dual-core [...] because it is running the adware and spyware that [...] are otherwise slowing down the single CPU that user has today" -- Herb Sutter

Oct 26 '07 #5

sp********@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> .... snip ...

I've changed your code slightly for formatting reasons.

```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    double max = FLT_MAX;
    double sub = 16703.627681;
    double result = max - sub;

    printf("%f - %f =\n%f\n", max, sub, result);
    return 0;
}
```

Output:

```
340282346638528859820000000000000000000.000000 - 16703.627681 =
340282346638528859820000000000000000000.000000
```

FLT_MAX has a magnitude of about 1e+38, while double has a precision of only 16 digits or so. Your subtrahend (minuend?) is simply too small to make a difference.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Oct 26 '07 #6

On Fri, 26 Oct 2007 00:06:27 -0000, husterk wrote:

> It appears that you are trying to print a double (%Lf) by using a float (%f), hence the error you are seeing. Check out the link below for more info on the printf() function and its modifiers.

Did you write this in an absent-minded moment, or are you actually this mistaken about printf() conversion specifiers?

> http://www.cplusplus.com/reference/c...io/printf.html

Even the page you reference states:

"L  The argument is interpreted as a long double (only applies to floating point specifiers: e, E, f, g and G)."

It is undefined behavior to pass a double to printf() with a "%Lf" conversion specifier, although it will probably work on implementations, like Microsoft's compilers for 32-bit Windows, where double and long double have the same size and representation.

"%f" is, has always been, and always will be the correct conversion specifier for double.

> Keith

Also, please set your posting software to add a proper signature delimiter, namely "-- \n".

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html

Oct 26 '07 #7

sp********@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> .... snip ...
> Any help would be highly appreciated.

```c
/* The values FLT_EPSILON, DBL_EPSILON, and LDBL_EPSILON are defined to
   be, for each type, the difference x between 1.0 and the least value
   greater than 1.0 that is representable in that type, so 1.0 + x != 1.0.
   This suggests (but does not guarantee) that for a value y > 0.0,
   y*(1.0+x) is the smallest value larger than y.  These values are
   closely linked to the number of significant bits in the representation
   of the type.  There is a limit FLT_DIG, DBL_DIG, or LDBL_DIG
   representing the guaranteed number of (decimal) digits of precision.
   Check the following program. */

#include <stdio.h>
#include <float.h>
#include <math.h>

int main()
{
    double max = FLT_MAX;
    double sub = 16703.627681;
    double diff;

    printf("The following values are all dependent on the implementation.\n"
           "They may or may not be similar to the values for your "
           "implementation.\n\n");

    printf("FLT_DIG = %d\n", FLT_DIG);
    printf("FLT_MAX = %.*g, FLT_EPSILON = %.*g,\n"
           "FLT_MAX - FLT_MAX/(1.0+FLT_EPSILON) = %.*g,\n"
           "log10(1.0/FLT_EPSILON - 1.0) = %.*g\n\n",
           FLT_DIG, FLT_MAX, FLT_DIG, FLT_EPSILON,
           FLT_DIG, FLT_MAX - FLT_MAX / (1.0 + FLT_EPSILON),
           FLT_DIG, log10(1.0 / FLT_EPSILON - 1.0));

    printf("DBL_DIG = %d\n", DBL_DIG);
    printf("DBL_MAX = %.*g, DBL_EPSILON = %.*g,\n"
           "DBL_MAX - DBL_MAX/(1.0+DBL_EPSILON) = %.*g,\n"
           "log10(1.0/DBL_EPSILON - 1.0) = %.*g\n\n",
           DBL_DIG, DBL_MAX, DBL_DIG, DBL_EPSILON,
           DBL_DIG, DBL_MAX - DBL_MAX / (1.0 + DBL_EPSILON),
           DBL_DIG, log10(1.0 / DBL_EPSILON - 1.0));

    printf("LDBL_DIG = %d\n", LDBL_DIG);
    printf("LDBL_MAX = %.*Lg, LDBL_EPSILON = %.*Lg,\n"
           "LDBL_MAX - LDBL_MAX/(1.0+LDBL_EPSILON) = %.*Lg,\n"
           "log10(1.0/LDBL_EPSILON - 1.0) = %.*g\n\n",
           LDBL_DIG, LDBL_MAX, LDBL_DIG, LDBL_EPSILON,
           LDBL_DIG, LDBL_MAX - LDBL_MAX / (1.0 + LDBL_EPSILON),
           DBL_DIG, log10(1.0 / LDBL_EPSILON - 1.0));

    diff = max * (1. - 1. / (1.0 + DBL_EPSILON));
    printf("max is a double with value %.*g.\n"
           "We expect the next lower distinguishable double to\n"
           " differ from max by about %.*g.\n"
           "The original poster wanted to distinguish a value %.*g\n"
           "less than max, but this value is only"
           " %.*g * the (likely) smallest significant difference.\n",
           DBL_DIG, max, DBL_DIG, diff, DBL_DIG, sub, DBL_DIG, sub / diff);
    return 0;
}
```

[Output]

```
The following values are all dependent on the implementation.
They may or may not be similar to the values for your implementation.

FLT_DIG = 6
FLT_MAX = 3.40282e+38, FLT_EPSILON = 1.19209e-07,
FLT_MAX - FLT_MAX/(1.0+FLT_EPSILON) = 4.05648e+31,
log10(1.0/FLT_EPSILON - 1.0) = 6.92369

DBL_DIG = 15
DBL_MAX = 1.79769313486232e+308, DBL_EPSILON = 2.22044604925031e-16,
DBL_MAX - DBL_MAX/(1.0+DBL_EPSILON) = 3.99168061906944e+292,
log10(1.0/DBL_EPSILON - 1.0) = 15.653559774527

LDBL_DIG = 18
LDBL_MAX = 1.18973149535723177e+4932, LDBL_EPSILON = 1.08420217248550443e-19,
LDBL_MAX - LDBL_MAX/(1.0+LDBL_EPSILON) = 1.28990947194073851e+4913,
log10(1.0/LDBL_EPSILON - 1.0) = 18.9648897268308

max is a double with value 3.40282346638529e+38.
We expect the next lower distinguishable double to
 differ from max by about 7.55578592223147e+22.
The original poster wanted to distinguish a value 16703.627681
less than max, but this value is only 2.21070684809276e-19 * the (likely) smallest significant difference.
```

Oct 26 '07 #8

sp********@gmail.com wrote:

```c
#include <stdio.h>
#include <float.h>

int main() {
    double max = FLT_MAX;
    double sub = 16703.627681;
    double result = max - sub;                    /* illegal */

    printf("%f - %f = %f\n", max, sub, result);   /* illegal */
    return 0;
}
```

I reformatted your code for sanity. See the added comments. A float is not a double. An object is not a constant.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
--
Posted via a free Usenet account from http://www.teranews.com

Oct 26 '07 #9

"CBFalconer" wrote:

> sp********@gmail.com wrote:
>> double result = max - sub; /* illegal */
>> printf("%f - %f = %f\n", max, sub, result); /* illegal */
>
> I reformatted your code for sanity. See the added comments. A float is not a double. An object is not a constant.

What is illegal on the lines you commented?

--
Chqrlie.

Oct 26 '07 #10

sp********@gmail.com wrote:

> Just a small little program. I can't figure out what I am doing wrong.
> .... snip ...

Try this:

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    double max = FLT_MAX;
    double next = nextafter(max, DBL_MAX);
    double min_diff = next - max;

    printf("%f - %f = %f\n", next, max, min_diff);
    return 0;
}
```

On my desktop, I got:

```
340282346638528897590636046441678635008.000000 - 340282346638528859811704183484516925440.000000 = 37778931862957161709568.000000
```

nextafter(x, y) is the next representable number after x, in the direction of y. So near FLT_MAX adjacent doubles are about 3.8e+22 apart, and adding 16703.627681 to (or subtracting it from) FLT_MAX gives a number that is not sufficiently different from FLT_MAX to have a different representation from FLT_MAX.

Note: nextafter() was added in the 1999 version of the C standard; if you're compiling in C90 mode, it might not be available.

Oct 26 '07 #11

Charlie Gordon wrote:

> "CBFalconer" wrote:
>> sp********@gmail.com wrote:
>>> double result = max - sub; /* illegal */
>>> printf("%f - %f = %f\n", max, sub, result); /* illegal */
>>
>> I reformatted your code for sanity. See the added comments. A float is not a double. An object is not a constant.
>
> What is illegal on the lines you commented?

An object is not a constant. A float is not a double.

Oct 26 '07 #12

In article <47***************@yahoo.com>, CBFalconer wrote:

> sp********@gmail.com wrote:
>> double result = max - sub; /* illegal */
>> printf("%f - %f = %f\n", max, sub, result); /* illegal */
>
> I reformatted your code for sanity. See the added comments. A float is not a double. An object is not a constant.

Your first comment, "a float is not a double":

1A) The comment could potentially make a difference in

    double max = FLT_MAX;

but only if FLT_MAX could exceed DBL_MAX; otherwise the usual promotions take care of converting the float to double for storage in max. {In C89 I don't immediately see a prohibition against FLT_MAX being bigger than DBL_MAX; it is implied by the usual promotions, though.}

1B) The comment could potentially make a difference in

    double sub = 16703.627681;

but an unsuffixed floating constant such as 16703.627681 already has type double, and it is well within the representable range of a double.

1C) The comment could potentially make a difference in

    printf("%f - %f = %f\n", max, sub, result);

but according to C89 4.9.6.1 "The fprintf Function":

> f  The double argument is converted to decimal notation in the style [-]ddd.ddd, where the number of digits after the decimal-point character is equal to the precision specification.

Therefore it is completely legal (and required!) to pass a double in a position to be printed with a %f format.

Your second comment, "An object is not a constant": yes, but so what?

C89 3.5.7 Initialization:

> All the expressions in an initializer for an object that has static storage duration or in an initializer list for an object that has aggregate or union type shall be constant expressions.
>
> If the declaration of an identifier has block scope, and the identifier has external or internal linkage, the declaration shall have no initializer for the identifier.

Neither of these applies: the object "result" has automatic storage duration, not static storage duration, and it has block scope but no external or internal linkage. Therefore it is legal to initialize "result", and it is legal to initialize it with a non-constant expression.

If you have different interpretations of the standards, you are invited to cite the appropriate sections and clauses.

Oct 26 '07 #13

On Oct 26, 9:50 am, rober...@ibd.nrc-cnrc.gc.ca (Walter Roberson) wrote:

> Your first comment, "a float is not a double":
> .... snip ...
> If you have different interpretations of the standards, you are invited to cite the appropriate sections and clauses.

Thank you everyone for your replies. They cleared up a lot of my confusion about floating point.

Oct 26 '07 #14

Walter Roberson wrote:

> CBFalconer wrote:
>> I reformatted your code for sanity. See the added comments. A float is not a double. An object is not a constant.
>
> Your first comment, "a float is not a double":
> .... snip ...
> Therefore it is completely legal (and required!) to pass a double in a position to be printed with a %f format.
>
> Your second comment, "An object is not a constant": yes, but so what?

The "float is not a double" comment was wrong; I was erroneously referring to the printf statement. The point of the second was that the "double result = ...;" line requires a constant to perform the initialization, or so I thought. I guess the fact that it is an automatic object makes a difference.

This leaves very little useful in my post. I shall hang my head in abject shame, and accept wet noodle flogging.

Oct 26 '07 #15

CBFalconer wrote:

> Charlie Gordon wrote:
>> What is illegal on the lines you commented?
>
> An object is not a constant. A float is not a double.

Both of those repeated statements are correct. Neither is responsive to the question.

The correct answer is that *nothing* is illegal on either line. Neither line uses an object in a context that requires a constant, or vice versa. Neither line uses a float in a context that requires a double, or vice versa. The program is legal and portable (and fails to be strictly conforming only because its output is implementation-defined).

If you disagree, please explain without repeating your previous statements.

The problem that I presume the OP was worried about is that ``max - sub'' yielded the same value as ``max''. In fact, this is to be expected, given the finite precision of floating-point operations.

All floating-point expressions in the program, other than FLT_MAX, are of type double. "%f" is a correct format for printing a value of type double. ("%f" can also be used for type float, since float arguments are promoted to type double. "%Lf" is for type long double, which is not used here.)

Initializing an object of type double with the value of FLT_MAX is odd, and probably not what the OP really intended, but it's perfectly legal. Using DBL_MAX would make more sense here.

--
Keith Thompson (The_Other_Keith) ks***@mib.org
San Diego Supercomputer Center <*>
"We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 26 '07 #16

Keith Thompson wrote:

> CBFalconer wrote:
>> An object is not a constant. A float is not a double.
>
> Both of those repeated statements are correct. Neither is responsive to the question.
>
> The correct answer is that *nothing* is illegal on either line. Neither line uses an object in a context that requires a constant, or vice versa. Neither line uses a float in a context that requires a double, or vice versa. The program is legal and portable (and fails to be strictly conforming only because its output is implementation-defined).
>
> If you disagree, please explain without repeating your previous statements.

I don't disagree, and I made an abject apology about 6 hours ago.

Oct 27 '07 #17

"CBFalconer" wrote:

> Charlie Gordon wrote:
>> What is illegal on the lines you commented?
>
> An object is not a constant. A float is not a double.

What object? As far as floats go, I can see FLT_MAX being converted to double to be stored into max. The %f format in printf takes a double, as provided. result is initialized with a non-constant expression, but what problem does that pose for a local variable with automatic storage?

I fail to see any problem with this code.

--
Chqrlie

Oct 27 '07 #18
