I've been trying to perform a calculation that has been running into
an underflow (insufficient precision) problem in Microsoft Excel, which
calculates using at most 15 significant digits. For this purpose, that
isn't enough.
I was reading a book about some of the financial scandals of the
1990s called "Inventing Money: The Story of Long-Term Capital Management
and the Legends Behind It" by Nicholas Dunbar. On page 95, he mentions
that the 1987 stock-market crash was designated in economists' computer
models as a "20-sigma event". That is, their models (which obviously were
badly flawed!) put such an event in the exponential "tails" of a Gaussian
(normal) distribution, outside the range of +/- 20 standard deviations from
the mean. In other words, vastly unlikely.
So I wanted to know: just how unlikely is that? Where did they get
the data to support that idea, anyway?
I put together an Excel spreadsheet, attempting to calculate this
probability. But after about 8 sigma I ran into the limits of Excel's
15-digit
precision and could go no further. Since the odds against an 8-sigma event
are already about 819 trillion to 1, a 20-sigma event is obviously going to
be so unlikely that it's not even worth talking about; it would probably
never happen in the lifetime of the universe. Further evidence that those
computer models had some serious problems.
To perform that calculation, I used Excel's built-in error function
ERF(), and here's the output:
  s     Confidence Interval      Probability                 Odds against
 1.0    68.268949131685700%      31.7310508683143000000%                       3
 1.5    86.638554269553900%      13.3614457304461000000%                       7
 2.0    95.449972950730900%       4.55002704926911000000%                     22
 2.5    98.758066814134400%       1.24193318586556000000%                     81
 3.0    99.730020387427800%       0.26997961257220200000%                    370
 3.5    99.953474183672100%       0.04652581632794690000%                  2,149
 4.0    99.993665751560700%       0.00633424843932140000%                 15,787
 4.5    99.999320465373800%       0.00067953462623560100%                147,160
 5.0    99.999942669685300%       0.00005733031473997840%              1,744,278
 5.5    99.999996202087500%       0.00000379791249560668%             26,330,254
 6.0    99.999999802682500%       0.00000019731752898267%            506,797,346
 6.5    99.999999991968000%       0.00000000803199728949%         12,450,203,405
 7.0    99.999999999744000%       0.00000000025596191833%        390,683,116,666
 7.5    99.999999999993600%       0.00000000000638378239%     15,664,694,356,071
 8.0    99.999999999999900%       0.00000000000000000000%    818,836,295,885,545
 8.5   100.00000000000000%        0.00000000000000000000%    #DIV/0!
  ..    ..                        ..                          ..
  s     =ERF(An/SQRT(2))          =1-Bn                       =1/Cn
(The 'n' in the cell formulas above represents the row number).
Evidently this calculation hits the limit of Excel's computational
precision at about 8 sigma.
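A side note on where that cutoff probably comes from, assuming Excel stores
its numbers internally as 64-bit IEEE doubles (I believe it does, even though
it only displays 15 digits): the largest double below 1.0 differs from 1.0 by
only about 1.1E-16, so once ERF(s/SQRT(2)) rounds up to exactly 1, the
"=1-Bn" subtraction has nothing left to return. The 8-sigma tail of about
1.2E-15 sits only a few steps above that floor. A tiny C program that
illustrates the floor:
==========================================================================
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* The largest double strictly less than 1.0 is 1 - DBL_EPSILON/2,
       so ~1.1E-16 is the smallest nonzero value that "1 - x" can
       produce when x is just below 1. */
    double just_below_one = 1.0 - DBL_EPSILON / 2.0;

    printf("DBL_DIG             = %d\n", DBL_DIG);       /* 15 decimal digits */
    printf("DBL_EPSILON         = %.3E\n", DBL_EPSILON); /* ~2.2E-16 */
    printf("1 - (largest x < 1) = %.3E\n", 1.0 - just_below_one);

    return 0;
}
==========================================================================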
For what it's worth, ERF(z) is defined as (2/sqrt(pi)) * INT[(0,z) exp(-t²)
dt], and the area under the "normal" Bell curve from -z to +z is just
ERF(z/sqrt(2)). There's a discussion of it here:
http://mathworld.wolfram.com/Erf.html, here:
http://mathworld.wolfram.com/ConfidenceInterval.html, and here:
http://jove.prohosting.com/~skripty/page_295.htm.
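Just to convince myself of that identity numerically (before reaching for
any library), here's a quick-and-dirty check that integrates both
definitions with Simpson's rule and compares them against the table above;
the PI constant and the 1000-subinterval choice are just my own:
==========================================================================
#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Composite Simpson's rule on [a,b] with n subintervals (n must be even). */
static double simpson(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    int i;

    for (i = 1; i < n; i++)
        sum += ((i % 2) ? 4.0 : 2.0) * f(a + i * h);

    return sum * h / 3.0;
}

/* Integrand in the definition of ERF: (2/sqrt(pi)) * exp(-t^2). */
static double erf_integrand(double t)
{
    return 2.0 / sqrt(PI) * exp(-t * t);
}

/* Standard normal (Bell curve) density. */
static double normal_density(double t)
{
    return exp(-t * t / 2.0) / sqrt(2.0 * PI);
}

int main(void)
{
    double z;

    /* The two columns should agree: the area under the Bell curve from
       -z to +z equals erf(z/sqrt(2)). */
    for (z = 1.0; z <= 4.0; z += 1.0)
    {
        double erf_val = simpson(erf_integrand, 0.0, z / sqrt(2.0), 1000);
        double area    = simpson(normal_density, -z, z, 1000);

        printf("z = %.1f   erf(z/sqrt(2)) = %.12f   area(-z..+z) = %.12f\n",
               z, erf_val, area);
    }

    return 0;
}
==========================================================================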
So I decided to try to tackle this in C. I downloaded a package
called the GNU Scientific Library (http://www.gnu.org/software/gsl/) -
there's a nice precompiled binary for Microsoft Visual C++ at
http://www.network-theory.co.uk/gsl/freedownloads.html. I wrote a little
piece of code to try it out:
==========================================================================
#include <stdio.h>

#include <gsl/gsl_math.h>    /* for M_SQRT2 */
#include <gsl/gsl_sf_erf.h>  /* for gsl_sf_erf() */

int main(void)
{
    double odds;
    double index;
    double sigma;
    double result;

    for (index = 1.0; index <= 8.5; index += 0.5)
    {
        sigma  = index / M_SQRT2;        /* erf() argument for an index-sigma interval */
        result = 1 - gsl_sf_erf(sigma);  /* two-sided tail probability */
        odds   = 1 / result;             /* odds against */

        /* \345 prints the sigma character in the OEM code page */
        printf("P(%2.1f \345) = %.18E \t %-1.18E\n", index, result, odds);
    }

    return 0;
}
==========================================================================
And here's its output:
P(1.0 s) = 3.173105078629142600E-001 3.151487187534375500E+000
P(1.5 s) = 1.336144025377161700E-001 7.484223115226846800E+000
P(2.0 s) = 4.550026389635841700E-002 2.197789450799282900E+001
P(2.5 s) = 1.241933065155231800E-002 8.051963733448130300E+001
P(3.0 s) = 2.699796063260206900E-003 3.703983473449563900E+002
P(3.5 s) = 4.652581580710801700E-004 2.149344364311446000E+003
P(4.0 s) = 6.334248366623995700E-005 1.578719276732396800E+004
P(4.5 s) = 6.795346249477418600E-006 1.471595358480670300E+005
P(5.0 s) = 5.733031437360480700E-007 1.744277893686913900E+006
P(5.5 s) = 3.797912495606681200E-008 2.633025382119182500E+007
P(6.0 s) = 1.973175289826656400E-009 5.067973459610119500E+008
P(6.5 s) = 8.031997289492665000E-011 1.245020340467724800E+010
P(7.0 s) = 2.559619183273298400E-012 3.906831166662759400E+011
P(7.5 s) = 6.383782391594650100E-014 1.566469435607129100E+013
P(8.0 s) = 1.221245327087672200E-015 8.188362958855447500E+014
P(8.5 s) = 0.000000000000000000E+000 1.#INF00000000000000E+000
Same limits as in Excel, it seems. GSL's gsl_sf_erf() function takes a
double-precision argument and returns a double. My Visual C++ compiler (v.
6.0) specifies that both double and long double use an 8-byte
representation, even though its documentation also describes long double
this way: "The long double contains 80 bits: 1 for sign, 15 for exponent,
and 64 for mantissa. Its range is +/- 1.2E4932 with at least 19 digits of
precision."
Strangely enough, I only seem to be getting about 17 digits of precision,
which is what you would expect if long double is really just a 64-bit IEEE
double (a 53-bit mantissa is good for roughly 15-17 decimal digits).
Regardless, I estimate that this calculation will require at least 38
additional digits of precision. Interesting: according to IEEE 754,
positive underflow shouldn't occur in single precision until a positive
value drops below about 1.4E-045, and in double precision not until it
drops below about 1.0E-308.
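As a quick check on whether underflow is actually what's biting me, I tried
the little program below. My guess (not verified against the GSL sources)
is that the zero at 8.5 sigma has nothing to do with underflow: erf(8.5/
sqrt(2)) is so close to 1 that it rounds to exactly 1.0 in a 64-bit double,
so the subtraction 1 - erf(...) returns exactly 0 long before anything gets
anywhere near DBL_MIN.
==========================================================================
#include <stdio.h>
#include <float.h>

#include <gsl/gsl_math.h>
#include <gsl/gsl_sf_erf.h>

int main(void)
{
    /* If underflow were the problem, trouble would start only near
       DBL_MIN (~2.2E-308). Instead, erf(8.5/sqrt(2)) already rounds
       to exactly 1.0, so 1 - erf(...) is exactly 0. */
    double e = gsl_sf_erf(8.5 / M_SQRT2);

    printf("DBL_MIN           = %E\n", DBL_MIN);
    printf("erf(8.5/sqrt(2))  = %.18E\n", e);
    printf("rounds to 1.0?      %s\n", (e == 1.0) ? "yes" : "no");
    printf("1 - erf(...)      = %.18E\n", 1.0 - e);

    return 0;
}
==========================================================================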
So here's what I'd like to know.
Are there 64-bit implementations of something like GSL which would
produce more precise output on appropriate OS/hardware platforms like
WinXP-64, Solaris v. 7-9, Tru64 Unix, Linux, etc? Has anyone
implemented a 128-bit long long double datatype or equivalent?
Can a calculation like this be performed with some kind of arbitrary-
precision numeric package (something like Michael Ring's MAPM or PHP
BCMath), or evaluated directly by Mathematica or the java.math package?
Or maybe I'm just confused and there's an easier way to do this which
I'm just not seeing.
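One possibility along those lines that I haven't really explored yet:
gsl_sf_erf.h also declares gsl_sf_erfc() and gsl_sf_log_erfc(), which (as I
read the header) compute the complement 1 - erf(x) directly rather than by
subtraction. If they really do avoid the cancellation, ordinary doubles
should be able to reach 20 sigma, since the tail itself ought to be
somewhere around 1E-88, nowhere near the 1E-308 underflow threshold. A
sketch of what I have in mind:
==========================================================================
#include <stdio.h>

#include <gsl/gsl_math.h>
#include <gsl/gsl_sf_erf.h>

int main(void)
{
    double index;

    for (index = 1.0; index <= 20.0; index += 1.0)
    {
        /* Tail probability P(|X| > index*sigma), computed directly. */
        double tail     = gsl_sf_erfc(index / M_SQRT2);

        /* Natural log of the same tail; divide by ln(10) to get the
           base-10 order of magnitude of the odds against. */
        double log_tail = gsl_sf_log_erfc(index / M_SQRT2);

        printf("P(%4.1f s) = %.18E   log10(odds against) = %.2f\n",
               index, tail, -log_tail / M_LN10);
    }

    return 0;
}
==========================================================================
If that works, the 20-sigma line should come out somewhere around 1E-88,
i.e. odds against on the order of 1E+88 to 1, but I'd still be curious
about the arbitrary-precision route as a way of checking it.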
By the way, I don't know of anyone who's bothered to tabulate the
values of this function nearly this far: most such texts (Zwillinger's CRC
Handbooks, Abramowitz & Stegun, Gradshteyn & Ryzhik, etc) only go to about
4.0 sigma.
Any ideas?
-- Dave Schulman (ca******@gate.net)