
# Can a double always represent an int exactly?

 P: n/a I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. Will a double always be able to exactly represent any value of type int? Could someone please point me to an explanation of how this is ensured, given that the details of a type's realization vary with the platform? Thanks. Fred P.S. I am not worried about overflowing the int value range, just about the guaranteed precise representation of int by double. Nov 14 '05 #1
22 Replies

 P: n/a In article <41***************@doe.carleton.ca>, Fred Ma wrote: I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. Will a double always be able to exactly represent any value of type int? Could someone please point me to an explanation of how this is ensured, given that the details of a type realization varies with the platform? I don't know whether the C Standard specifies anything to this effect. But here is an implementation-specific observation. On a machine with 64-bit doubles which follow the IEEE specification, the mantissa part is 52 stored bits plus one hidden bit, 53 in all, therefore integers as large as 2 to the 53rd power should be exactly representable. In particular, if the machine has 32-bit ints, they are all exactly representable as doubles. On my machine, which has 32-bit ints and 64-bit doubles, the following yields the exact answer: printf("%30.15f\n", 1.0 + pow(2.0, 52.)); However the following stretches it too far and the answer is inexact: printf("%30.15f\n", 1.0 + pow(2.0, 53.)); -- rr Nov 14 '05 #2

 P: n/a Rouben Rostamian wrote: I don't know whether the C Standard specifies anything to this effect. But here is an implementation-specific observation. On a machine with 64-bit doubles which follow the IEEE specification, the mantissa part is 53 bits (plus one hidden bit as well) therefore integers as large as around 2 to the 50th power should be exactly representable. In particular, if the machine has 32-bit ints, they are all exactly representable as doubles. On my machine, which has 32-bit ints and 64-bit doubles, the following yields the exact answer: printf("%30.15f\n", 1.0 + pow(2.0, 52.)); However the following stretches it too far and the answer is inexact: printf("%30.15f\n", 1.0 + pow(2.0, 53.)); I realize that if a double actually uses twice as many bits as ints, the mantissa should be big enough that imprecision should never arise. I'm just concerned about whether this can be relied upon. My faith in what seems normal has been shaken after finding that long has the same number of bits as int in some environments. What if double has the same number of bits as ints in some environments? Some of those bits will be taken up by the exponent, and the mantissa will actually have fewer bits than an int. Hence, it will be less precise than ints within the value range of ints. Fred Nov 14 '05 #3

 P: n/a > I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. Will a double always be able to exactly represent any value of type int? No. There is nothing prohibiting an implementation from choosing int = 64-bit signed integer, and double = 64-bit IEEE double, which has only 53 mantissa bits. Integers outside the range +/- 2**53 may be rounded. > Could someone please point me to an explanation of how this is ensured, given that the details of a type realization varies with the platform? It is NOT ensured. Gordon L. Burditt Nov 14 '05 #4

 P: n/a In article <41***************@doe.carleton.ca> Fred Ma writes: I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. This is correct (well, I can imagine a weird implementation that deliberately makes "double"s have constant precision by often wasting a lot of space; it seems quite unlikely though). Note that ceil() returns a double, not an int. Will a double always be able to exactly represent any value of type int? This is implementation-dependent. If "double" is not very precise but INT_MAX is very large, it is possible that not all "int"s can be represented. This is one reason ceil() returns a double (though a small one at best -- the main reason is so that ceil(1.6e35) can still be 1.6e35, for instance). Could someone please point me to an explanation of how this is ensured, given that the details of a type realization varies with the platform? I am not sure what you mean by "this", especially with the PS: P.S. I am not worried about overflowing the int value range, just about the guaranteed precise representation of int by double. .... but let me suppose you are thinking of a case that actually occurs if we substitute "float" for "double" on most of today's implementations. Here, we get "interesting" effects near 8388608.0 and 16777216.0. Values below 16777216.0 step by ones: 8388608.0 is followed immediately by 8388609.0, for instance, and 16777215.0 is followed immediately by 16777216.0. On the other hand, below (float)(1<<23) or above (float)(1<<24), we step by 1/2 or 2 respectively. 
Using nextafterf() (if you have it) and variables set to the right values, you might printf() some results and find: nextafterf(8388608.0, -inf) = 8388607.5 nextafterf(16777216.0, +inf) = 16777216.2 So all ceil() has to do with values that are at least 8388608.0 (in magnitude) is return those values -- they are already integers. It is only values *below* this area that can have fractional parts. Of course, when we use actual "double"s on today's real (IEEE style) implementations, the tricky point is not 2-sup-23 but rather 2-sup-52. The same principle applies, though: values that meet or exceed some magic constant (in either positive or negative direction) are always integral, because they have multiplied away all their fraction bits by their corresponding power of two. Since 2-sup-23 + 2-sup-22 + ... + 2-sup-0 is a sum of integers, it must itself be an integer. Only if the final terms of the sum involve negative powers of two can it contain fractions. The other "this" you might be wondering about is: how do you drop off the fractional bits? *That* one depends (for efficiency reasons) on the CPU. The two easy ways are bit-twiddling, and doing addition followed by subtraction. In both cases, we just want to zero out any mantissa (fraction) bits that represent negative powers of two. The bit-twiddling method does it with the direct and obvious way: mask them out. The add-and-subtract method uses the normalization hardware to knock them out. If normalization is slow (e.g., done in software or with a microcode loop), the bit-twiddling method is generally faster. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: forget about it http://web.torek.net/torek/index.html Reading email is like searching for food in the garbage, thanks to spammers. Nov 14 '05 #5

 P: n/a Fred Ma wrote: I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. I am not sure what you mean here, but a double is a floating-point type and like all such has a precision of some fixed number of significant digits. This precision does not vary, but for large exponents the difference between one number and the next higher one can be fairly large. Will a double always be able to exactly represent any value of type int? Not necessarily. If, as is common, a double is 64 bits wide with 53 bits of precision, and (as is less common) int is also 64 bits wide then there are some values of type int which can not be exactly represented by a double. Could someone please point me to an explanation of how this is ensured, given that the details of a type realization varies with the platform? Thanks. Fred P.S. I am not worried about overflowing the int value range, just about the guaranteed precise representation of int by double. -- Erik Trulsson er******@student.uu.se Nov 14 '05 #6

 P: n/a Fred Ma wrote: Rouben Rostamian wrote: [...] On my machine, which has 32-bit ints and 64-bit doubles, the following yields the exact answer: printf("%30.15f\n", 1.0 + pow(2.0, 52.)); However the following stretches it too far and the answer is inexact: printf("%30.15f\n", 1.0 + pow(2.0, 53.)); I realize that if a double actually uses twice as many bits as ints, the mantissa should be big enough that imprecision should never arise. I'm just concerned about whether this can be relied upon. This can't be relied upon. My faith in what seems normal has been shaken after finding that long has the same number of bits as int in some environments. Actually in most environments these days. (Most Unix variants on 32-bit systems have both int and long as 32 bits wide.) What if double has the same number of bits as ints in some environments? Some of those bits will be taken up by the exponent, and the mantissa will actually have fewer bits than an int. Hence, it will be less precise than ints within the value range of ints. Correct, and this can indeed happen. -- Erik Trulsson er******@student.uu.se Nov 14 '05 #7

 P: n/a On 22 Oct 2004 00:07:14 GMT, Fred Ma wrote in comp.lang.c: I'm using the expression "int a = ceil( SomeDouble )". The man page says that ceil returns the smallest integer that is not less than SomeDouble, represented as a double. However, my understanding is that a double has nonuniform precision throughout its value range. Will a double always be able to exactly represent any value of type int? Could someone please point me to an explanation of how this is ensured, given that the details of a type realization varies with the platform? Thanks. Fred P.S. I am not worried about overflowing the int value range, just about the guaranteed precise representation of int by double. As others have mentioned, on 64-bit platforms some integer types, and perhaps even type int on some, have 64 bits and doubles usually have fewer mantissa bits than this. What I haven't seen anyone else point out, so far, is the fact that this implementation-defined characteristic is available to your program via the macros DECIMAL_DIG and DBL_DIG in <float.h>. -- Jack Klein Home: http://JK-Technology.Com FAQs for comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html comp.lang.c++ http://www.parashift.com/c++-faq-lite/ alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html Nov 14 '05 #8

 P: n/a Jack Klein wrote: As others have mentioned, on 64-bit platforms some integer types, and perhaps even type int on some, have 64 bits and doubles usually have fewer mantissa bits than this. What I haven't seen anyone else point out, so far, is the fact that this implementation-defined characteristic is available to your program via the macros DECIMAL_DIG and DBL_DIG in <float.h>. Hi, Jack, I found these definitions at Dinkum: DECIMAL_DIG #define DECIMAL_DIG <#if expression >= 10> [added with C99] The macro yields the minimum number of decimal digits needed to represent all the significant digits for type long double. FLT_DIG #define FLT_DIG <#if expression >= 6> The macro yields the precision in decimal digits for type float. I guess the point is that one can infer the bit-width of the mantissa from them. Thanks. Fred Nov 14 '05 #10

 P: n/a "Fred Ma" wrote in message news:41***************@doe.carleton.ca... Will a double always be able to exactly represent any value of type int? Whether (strictly speaking) it will or won't I wouldn't dare to say given the plethora of representations in use. What I *can* say from my own experience is "Do not count on it". Since the mantissa can (within its limits) represent an integer exactly, you can simply set the exponent to 1 and the integer could be represented exactly. However, M_PI/M_PI seldomly equals 1.000000. Nov 14 '05 #11

 P: n/a Fred Ma wrote: Jack Klein wrote: As others have mentioned, on 64-bit platforms some integer types, and perhaps even type int on some, have 64 bits and doubles usually have fewer mantissa bits than this. What I haven't seen anyone else point out, so far, is the fact that this implementation-defined characteristic is available to your program via the macros DECIMAL_DIG and DBL_DIG in <float.h>. I found these definitions at Dinkum: DECIMAL_DIG #define DECIMAL_DIG <#if expression >= 10> [added with C99] The macro yields the minimum number of decimal digits needed to represent all the significant digits for type long double. FLT_DIG #define FLT_DIG <#if expression >= 6> The macro yields the precision in decimal digits for type float. I guess the point is that one can infer the bit-width of the mantissa from them. Thanks. Umh, for the "bit width" rather use DBL_MANT_DIG, after you made sure that FLT_RADIX is 2 (which is the base you expect). If you want to know the highest exactly representable number (in the "contiguous" subset, of course), you can calculate it from there or use (assuming base 2) 2.0/DBL_EPSILON. Use a conversion to unsigned int and back to find out whether unsigned can hold this value. Cheers Michael -- E-Mail: Mine is a /at/ gmx /dot/ de address. Nov 14 '05 #12

 P: n/a Michael Mair wrote: [...] Umh, for the "bit width" rather use DBL_MANT_DIG, after you made sure that FLT_RADIX is 2 (which is the base you expect). If you want to know the highest exactly representable number (in the "contiguous" subset, of course), you can calculate it from there or use (assuming base 2) 2.0/DBL_EPSILON. Use a conversion to unsigned int and back to find out whether unsigned can hold this value. Thanks, Michael. Fred Nov 14 '05 #13

 P: n/a dandelion wrote: M_PI/M_PI seldomly equals 1.000000. I imagine that would depend on how division is implemented. Fred Nov 14 '05 #14

 P: n/a "Fred Ma" wrote in message news:41***************@doe.carleton.ca... dandelion wrote: M_PI/M_PI seldomly equals 1.000000. I imagine that would depend on how division is implemented. Fred Of course, that's why I wrote "seldomly". And which implementation would return 1.000000, exactly? I'm curious. Try a few CPU's/FPU's and check the results. I'll buy you a beer if you find one. I wonder what all that 'epsilon-squared' stuff was good for back in HIO and why the informatics teacher kept hammering us with "Never compare two floats for equality! Never!". Must have been a geek, worrying about such detail. Nov 14 '05 #15

 P: n/a "dandelion" writes: "Fred Ma" wrote in message dandelion wrote: M_PI/M_PI seldomly equals 1.000000. I imagine that would depend on how division is implemented. Of course, that's why I wrote "seldomly". And which implementation would return 1.000000, exactly? I'm curious. Try a few CPU's/FPU's and check the results. I'll buy you a beer if you find one. I just tried this on a wide variety of systems; M_PI/M_PI compares equal to 1.0 on all but one of them. (The exception was a Cray SV1.) Here's the program I used:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double var_M_PI = M_PI;
    double ratio = M_PI / M_PI;
    double var_ratio = var_M_PI / var_M_PI;
    printf("M_PI = %g\n", M_PI);
    printf("var_M_PI = %g\n", var_M_PI);
    printf("ratio = %g\n", ratio);
    printf("ratio %s 1.0\n", ratio == 1.0 ? "==" : "!=");
    printf("var_ratio = %g\n", var_ratio);
    printf("var_ratio %s 1.0\n", var_ratio == 1.0 ? "==" : "!=");
    return 0;
}
```

Caveats: A moderately clever compiler could compute the value at compilation time (I didn't check this, but I didn't use any optimization options). And of course M_PI is non-standard. -- Keith Thompson (The_Other_Keith) ks***@mib.org San Diego Supercomputer Center <*> We must do something. This is something. Therefore, we must do this. Nov 14 '05 #16

 P: n/a Keith Thompson wrote: [...] I just tried this on a wide variety of systems; M_PI/M_PI compares equal to 1.0 on all but one of them. (The exception was a Cray SV1.) [...] Caveats: A moderately clever compiler could compute the value at compilation time (I didn't check this, but I didn't use any optimization options). And of course M_PI is non-standard. In Canada, Moosehead beer is pretty good. :) Seriously, I wasn't implying that practical implementations of division were necessarily sophisticated enough to recognize equivalence of numerator and denominator. What I should have said was that I can see such a discrepancy arising, since division is not straightforward to implement. I'm talking about cases that aren't optimized away at compile time. Fred Nov 14 '05 #17

 P: n/a A few minor corrections... In article I wrote (in part): .. a double has nonuniform precision throughout its value range. This is correct (well, I can imagine a weird implementation that deliberately makes "double"s have constant precision by often wasting a lot of space; it seems quite unlikely though). It occurs to me now that "precision" is not properly defined here. When dealing with scientific notation and decimal numbers, something like 1.23e+10 is less precise than 1.230e+10. The precision here is determined by the number of digits in the mantissa (which is why we have to use the "e+10" notation to suppress "unwanted" trailing zeros). Using this definition of precision, and keeping in mind that most computers today use powers of 2 (binary floating point) rather than powers of ten (decimal floating point), we actually do have "constant precision", such as "always exactly 24 bits of mantissa" (provided we ignore those pesky "denorms" :-) ). This is of course not what the original poster and I meant by "precision" (as illustrated below) -- we were referring to digits beyond the decimal point after conversion to printed form via "%f", for instance. Note, however, that IBM "hex float" (as used on S/360 -- floating point with a radix of 16 instead of 2) really *does* have "precision wobble": the number of "useful" bits in the mantissa changes as numbers change in magnitude. This gives the numerical analysis folks headaches. IEEE floating point is rather better behaved. I need to fix one more typo though: ... [using] "float" ... on most of today's implementations. Here, we get "interesting" effects near 8388608.0 and 16777216.0. Values below 16777216.0 step by ones: 8388608.0 is followed immediately by 8388609.0, for instance, and 16777215.0 is followed immediately by 16777216.0. On the other hand, below (float)(1<<23) or above (float)(1<<24), we step by 1/2 or 2 respectively. 
Using nextafterf() (if you have it) and variables set to the right values, you might printf() some results and find: nextafterf(8388608.0, -inf) = 8388607.5 nextafterf(16777216.0, +inf) = 16777216.2 This last line should read: nextafterf(16777216.0, +inf) = 16777218.0 (I typed this all in manually, rather than writing C code to call nextafterf(), display the results as above, and then cut-and-paste -- so I added 0.2 instead of 2.0 when I made the change by hand.) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: forget about it http://web.torek.net/torek/index.html Reading email is like searching for food in the garbage, thanks to spammers. Nov 14 '05 #18

 P: n/a Chris Torek wrote: A few minor corrections... [...] I need to fix one more typo though: nextafterf(16777216.0, +inf) = 16777216.2 This last line should read: nextafterf(16777216.0, +inf) = 16777218.0 Chris, thanks for the correction. I think I got the gist of it from your original post. I did a blanket reply elaborating on it, Fri. Oct. 22 Message-ID <41***************@doe.carleton.ca>. Thanks for helping me get my brain around it, and if you have any comments on that, I'm certainly interested. Fred Nov 14 '05 #19

 P: n/a In article <41***************@doe.carleton.ca> Fred Ma writes: .... > Seriously, I wasn't implying that practical implementations of > division were necessarily sophisticated enough to recognize > equivalence of numerator and denominator. What I should have > said was that I can see such a discrepancy arising, since > division is not straightforward to implement. I'm talking about > cases that aren't optimized away at compile time. It is not straightforward to implement. Nevertheless, whenever the FPU conforms to the IEEE standard the division *must* deliver the exact answer if the quotient is representable. So all systems using such FPU's (and that is the majority at this moment) should deliver 1.0 when confronted with a/a, in whatever way it is disguised. To get division right is not straightforward, but it is not so very difficult either. That Keith Thompson found that it was not the case on a Cray SV1 is entirely because that system does not have an IEEE-conforming floating point system. (That machine does not have a divide instruction. It calculates an approximation of the inverse of the denominator and multiplies with the numerator, and one Newton iteration is performed. Due to some quirks it may give an inexact result. If I remember right, the smallest integral division that is inexact is 17.0/17.0.) -- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131 home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/ Nov 14 '05 #20

 P: n/a dandelion wrote: "Fred Ma" wrote in message news:41***************@doe.carleton.ca... dandelion wrote: M_PI/M_PI seldomly equals 1.000000. I imagine that would depend on how division is implemented. Fred Of course, that's why I wrote "seldomly". And which implementation would return 1.000000, exactly? I'm curious. Try a few CPU's/FPU's and check the results. I'll buy you a beer if you find one. This one does it for me.. is it the printf fscking up, or ..

```c
#include <math.h>
#include <stdio.h>

float divide_me(float f)
{
    return f / M_PI;
}

int main(void)
{
    printf("%.26f\n", divide_me(M_PI));
    return 0;
}
```

Nov 14 '05 #21

 P: n/a Caveats: A moderately clever compiler could compute the value at compilation time (I didn't check this, but I didn't use any optimization options). And of course M_PI is non-standard. Your beer is waiting... Nice and cold. You earned it, I picked a silly example. Nov 14 '05 #22

 P: n/a "Dik T. Winter" wrote: [...] To get division right is not straightforward, but it is not so very difficult either. That Keith Thompson found that it was not the case on a Cray SV1 is entirely because that system does not have an IEEE-conforming floating point system. Thanks for that. It's useful to know. Fred Nov 14 '05 #23
