In article <11**********************@r23g2000prd.googlegroups.com>,
Steven Woody <na********@gmail.com> wrote:
>On Aug 28, 11:49 am, rober...@ibd.nrc-cnrc.gc.ca (Walter Roberson)
>wrote:
>>In article <1188270577.057465.25...@z24g2000prh.googlegroups.com>,
>>Steven Woody <narkewo...@gmail.com> wrote:
>>>long i = nnn;
>>>long j;
>>>double d;
>>>d = i;
>>>j = ( long )d;
>>>in this case, i == j ?
>Thanks for all your inputs. I now understand: if nnn is so large that
>it cannot be represented in a double with a zero exponent, it becomes
>inexact.
Ummm, not exactly.
The following applies to the most common way of representing floating
point numbers (the IEEE 754 binary formats). Other schemes differ a bit
in the details:
Take the integer and write it out in binary, with no leading 0s.
Count the number of bits and subtract 1; the result is the
"unbiased" exponent. Internally, a constant (the bias) is added to
this exponent to produce the "biased exponent" that is actually
stored (the reason for this has to do with storing numbers less
than 1, whose exponents are negative). Thus, the unbiased exponent is
non-zero for any integer greater than 1. Now discard the leading 1 bit
from the binary representation of the integer, *leaving any leading
0s that follow it*. Store this binary number into the available
mantissa bits (e.g., 23 bits for IEEE 32-bit floats, 52 bits for
IEEE 64-bit doubles), starting from the "left" (highest bit position)
and progressing towards the right. If you run out of binary digits
before (or just as) you run out of available bits, then the integer
has been stored exactly; the remaining mantissa bits are padded with
binary 0s. If you run out of mantissa bits while any of the leftover
binary digits is a 1, then the integer cannot be stored exactly. (If
all the leftover digits are 0s, the value is still exact: they are
trailing zeros, and the exponent already accounts for them.)
Or, in simpler terms: count the binary digits of the integer from its
leading 1 down to its lowest 1 bit (ignoring any trailing 0s), and
subtract 1 from that count. If the result is more than the number of
mantissa bits available, then you cannot store the integer exactly.
>Since my system has a 16-bit integer and a 32-bit double, I
>believe this will not happen, and i in the above code always equals j.
There are two problems with that.
A) Your code is written around -long-, not -int-, and it is not
valid in C for a long to be as small as 16 bits: C requires long to
have at least 32 bits (LONG_MAX must be at least 2147483647).
B) In any conforming C implementation, double cannot be as small as
32 bits: 32 bits is not enough to meet the requirement that DBL_DIG
(the number of decimal digits that can be reliably stored) be at
least 10. Laid out as an IEEE 32-bit float, 32 bits is only enough
for 6 decimal digits of reliable storage. The minimum number of bits
needed to meet the C90 constraints on the range and precision of
double is 43.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers