An almost infinite quantity of numbers can be represented as a finite
binary sequence. For sequences of length N there are 2^N possibilities.
That was the starting point of my research. At the time I wanted
nothing more than to write a simple rounding proc with the limitations
imposed by computer math spelled out in advance, like "here it will be
pretty exact, here it will not, and this is why". Later I noticed - or
thought I noticed - a semi-mystical correlation between these limits
and Nalimov's semantic spectra when ported onto computer programs, so
the problem acquired a linguistic - and thus a personal - interest.
That is not showing off; I just wanted to explain why I'm spending
more time on this than originally planned - with the manual rounding
question itself shifted to the periphery.
The answer to "What decimal float can be stored exactly in IEEE-754
form?" turned out to be surprisingly hard to find for a
non-mathematician. The majority of online resources simply state that
"Not all numbers can be stored exactly in IEEE-754. For instance N
cannot." I gathered a whole collection of these "for instance"s from
all around and nearly went berserk over it. By pure chance, while
checking for the exact English math terms, I found the definition of
the dyadic fraction (a.k.a. dyadic rational)
<http://en.wikipedia.org/wiki/Dyadic_fraction>
and my question got answered for the Turing machine, thus - besides
everything else - for unlimited internal storage space:
Any decimal number representable as a dyadic rational is also
representable as a finite binary sequence and thus can be stored
exactly in binary form.
Including the definition of dyadic rational into the theorem itself,
the "1st VK's theorem" would be:
---
Any decimal number can be stored exactly in binary form if and only if
the decimal number in question is representable as the rational X/2^Y
(X divided by 2 to the power of Y), where X is an integer and Y is a
natural number including 0
---
This way the vulgar fraction in reduced form can always tell whether
the number is representable exactly in binary form:
92 = 92/1 = 92/2^0 <= dyadic fraction, so can be stored exactly
9.2 = 9/1 + 2/10 = 90/10 + 2/10 = 92/10 = 46/5 <= non-dyadic, so
cannot be stored exactly
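The reduced-fraction check above can be sketched in code. The helper
names are mine, not from any library, and it assumes X and Y are plain
positive integers small enough for exact integer math in a double:

```javascript
// Sketch: is the vulgar fraction x/y a dyadic rational?
// A fraction is dyadic when its reduced denominator is a power of 2.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function isDyadic(x, y) {
  var d = y / gcd(x, y);          // denominator of the reduced fraction
  while (d % 2 === 0) d /= 2;     // strip all factors of 2
  return d === 1;                 // only powers of 2 remained?
}

isDyadic(92, 1);  // true:  92 = 92/2^0, can be stored exactly
isDyadic(46, 5);  // false: 9.2 = 46/5, cannot be stored exactly
```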
As the joke with the "1st VK's theorem" is getting a bit old, let's
call it something more serious from now on, for instance "the theorem
of the dyadic subset".
It is interesting to point out that the theorem applies to any binary
storage medium, so it doesn't depend on a particular standard or size:
IEEE-754 single, double or n-byte is irrelevant.
This way for any set N of real numbers expressed in decimal numeral
form, the subset of numbers stored exactly in binary form consists of
the numbers for which this equation can be solved: R = X/2^Y
Math-savvies may try to find a formula for the subset size relative to
the whole set size. Empirically it is possible to say that binary
storage space is a rather unfortunate medium for decimal input, so
most of the time any engine will be losing/factoring/restoring and
overall playing with the precision.
> An IEEE Double can have two signs, 2^52 mantissas, and almost 2^11
> ordinary exponents; or it can be NaN or +Infinity or -Infinity. It
> can be +0 or -0. but I forget where they fit into that scheme.
IEEE-754 is rather tricky in this aspect, and soon comes the "2nd
VK's theorem" :-)
But before that some grounds:
An IEEE-754-DP-FP (double-precision floating-point) number takes 64
bits of storage space, with the msb for the sign, 11 bits for the
exponent and 52 bits for the mantissa: 1 + 11 + 52 = 64
Trick 1:
Because the exponent part must be able to represent both positive and
negative exponents, it is not stored directly but as the result of the
addition ActualValue+1023 (biased by 1023). This way if the
IEEE-754-DP-FP form has the exponent part 1023 then the actual
exponent is 1023-1023=0, and if the exponent part is 1 then the actual
exponent is 1-1023=-1022
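Trick 1 can be verified from script. A sketch, assuming an engine with
typed arrays (DataView); the helper name is mine:

```javascript
// Sketch: read back the biased exponent of a double via its raw bytes
function biasedExponent(x) {
  var view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  // exponent bits 62..52 sit in the top of the first 32-bit word
  return (view.getUint32(0) >>> 20) & 0x7FF;
}

biasedExponent(1);    // 1023, so the actual exponent is 1023-1023 = 0
biasedExponent(0.5);  // 1022, actual exponent 1022-1023 = -1
```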
Trick 2:
The mantissa is presumed to be in normalized form, thus with the radix
point placed after the first non-zero digit: 1.23, 2.0456 etc. Because
in the binary system the only non-zero digit is 1, this makes it
possible not to store the first bit but simply presume it (the
"hidden" bit). It means that if the mantissa part stores, say,
0010100000001010000000101000000010100000001010000011
then the actual bit sequence it represents is
1.0010100000001010000000101000000010100000001010000011
It allows IEEE-754-DP-FP to have 53 bits of mantissa with only 52
physical bits.
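Those 52+1 bits are directly observable: integers up to 2^53 are
stored exactly, while one bit beyond is not. A quick sketch:

```javascript
// 53 effective mantissa bits (52 stored + 1 hidden): 2^53 is the edge
var p = Math.pow(2, 53);
p - 1 === p - 2;   // false: below 2^53 every integer is distinct
p + 1 === p;       // true: 2^53 + 1 does not fit, rounds back to 2^53
```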
Trick 3:
If the exponent part contains only zeros: 00000000000, then both
tricks 1 and 2 are overridden and the stored number becomes
denormalized. In this case the exponent is assumed to be -1022 and the
mantissa doesn't have any hidden bit: "what you see is what you get".
This shift helps to operate with very small numbers; in particular it
keeps comparison operations for very small values correct.
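The denormalized range is visible from JavaScript too. A sketch, with
2^-1022 being the smallest normalized double:

```javascript
// Smallest normalized double vs. the denormalized range below it
var smallestNormal = Math.pow(2, -1022);   // ~2.2250738585072014e-308
var smallestDenormal = Number.MIN_VALUE;   // 2^-1074, a denormal
smallestDenormal > 0;                      // true
smallestDenormal < smallestNormal;         // true: lives below normals
smallestDenormal / 2;                      // 0: underflows completely
```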
Extra 1
The normalized numbers always have the "hidden" a.k.a. "implicit" bit
added - see Trick 2. Because of that the only way to represent zero
(0) in IEEE-754 is by using a denormalized number. Indeed, 0 in
IEEE-754 is the number where both exponent and mantissa parts contain
all zeros. Because the sign bit is always present, it still can be set
to 0 or 1: thus there can be positive zero 0 and negative zero -0
In JavaScript -0 == 0
That actually matches what IEEE-754 itself demands: comparisons ignore
the sign of zero. The distinction is still observable, though: 1/-0
gives -Infinity while 1/0 gives +Infinity, and some languages expose
the sign in output formatting, to the endless surprise of newbies.
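In JavaScript the sign bit of zero, while ignored by comparison, can
still be fished out through division:

```javascript
// -0 compares equal to 0, yet the sign bit is physically stored
-0 === 0;   // true: comparison ignores the sign of zero
1 / -0;     // -Infinity: the sign bit resurfaces
1 / 0;      // Infinity
```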
Extra 2
An IEEE-754 number with the exponent all 1s and the mantissa all 0s
has a special meaning: it denotes Infinity. With the sign bit cleared
or set we get +Infinity and -Infinity
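In ECMAScript these values come out of overflow and division by zero;
a quick check:

```javascript
// Infinity appears on overflow and on division by zero
Number.MAX_VALUE * 2;   // Infinity: overflows past the largest double
-1 / 0;                 // -Infinity
Infinity === Infinity;  // true - unlike NaN, Infinity equals itself
```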
Extra 3
An IEEE-754 number with the exponent all 1s and the mantissa with at
least one bit set denotes NaN (not a number). There are many kinds of
NaN depending on the mantissa bit pattern. The most regular one is
Quiet NaN (QNaN), with the msb of the mantissa set:
? 11111111111 1... further whatever
S EEEEEEEEEEE M...
QNaN simply informs that the performed operation has no mathematically
defined return value. You get QNaN when, say, performing
parseInt('foobar', 10);
There is also Signalling NaN (SNaN), with the msb of the mantissa
cleared:
? 11111111111 0... further whatever but at least one bit 1
S EEEEEEEEEEE M...
SNaN is used to raise exceptions in math operations; AFAIK it is not
used in ECMAScript implementations.
Because the mantissa content is not regulated for NaN except for the
msb, there can potentially be billions of NaN values. This way I'm
taking back my older opinion that "there are not any physical reasons
behind the rule that NaN != NaN". In fact there is one, and a good
one.
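That rule is easy to trip over in ECMAScript: NaN must be tested with
isNaN (or the x !== x trick), never with equality:

```javascript
// NaN compares unequal to everything, including itself
var n = parseInt('foobar', 10);  // NaN: no defined return value
n === n;    // false: the only value in the language unequal to itself
isNaN(n);   // true: the proper test
```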
> Any number which can be expressed exactly as a sum of powers of 0.5
> (or 2.0) can be held exactly,
That is the "1st VK's theorem" in reverse - expressed via an
Egyptian-fraction-like method. IMO the definition via the dyadic
rational is clearer and stricter, though I may be biased.
> provided that the range of the powers does not exceed about 53, and
> the extreme powers are neither too big nor too small (960 to -960
> should be safe).
Right, the "1st VK's theorem" - in any form - applies to the Turing
machine. On a real PC with IEEE-754-DP-FP we hit the mantissa storage
limit of 52+1 bits.
So the actual set N of real decimal numbers represented exactly in
IEEE-754-DP-FP will be a subset of a subset: first the numbers
satisfying the dyadic subset theorem, then the subset of these
satisfying the storage limit condition.
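The "subset of a subset" is easy to demonstrate: take a number that
passes the dyadic subset theorem but fails the 53-bit storage limit.
The variable names here are mine:

```javascript
// Dyadic is necessary but not sufficient on real hardware.
// 2^53 + 1 is dyadic ((2^53 + 1)/2^0) yet needs 54 mantissa bits.
var fits = Math.pow(2, 52) + 1;   // 53 significant bits: stored exactly
var lost = Math.pow(2, 53) + 1;   // 54 significant bits: rounded away
fits - Math.pow(2, 52);           // 1: the +1 survived
lost - Math.pow(2, 53);           // 0: the +1 was lost to rounding
```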
I'm playing with it right now.
btw I'm reading your site as well.