I've been trying to perform a calculation that has been running into

an underflow (insufficient precision) problem in Microsoft Excel, which

calculates using at most 15 significant digits. For this purpose, that

isn't enough.

I was reading a book about some of the financial scandals of the

1990s called "Inventing Money: The Story of Long-Term Capital Management

and the Legends Behind It" by Nicholas Dunbar. On page 95, he mentions

that the 1987 stock-market crash was designated in economists' computer

models as a "20-sigma event". That is, their models (which obviously were

badly flawed!) put such an event in the exponential "tails" of a normal

Gaussian distribution, outside the range +/- 20 standard deviations from

the mean. In other words - vastly unlikely.

So I wanted to know: just how unlikely is that? Where did they get

the data to support that idea, anyway?

I put together an Excel spreadsheet, attempting to calculate this

probability. But after about 8 sigma I ran into the limits of Excel's

15-digit

precision and could go no further. Since the odds of an 8-sigma event

are about 819 trillion to 1 against, a 20-sigma event is obviously going to

be so unlikely that it's not even worth talking about; it would probably

never happen in the lifetime of the universe. Further evidence that those

computer models had some serious problems.

To perform that calculation, I used Excel's built-in error function

ERF(), and here's the output:

  s    Confidence Interval    Probability               Odds against

 1.0   68.268949131685700%    31.7310508683143000000%   3
 1.5   86.638554269553900%    13.3614457304461000000%   7
 2.0   95.449972950730900%    4.55002704926911000000%   22
 2.5   98.758066814134400%    1.24193318586556000000%   81
 3.0   99.730020387427800%    0.26997961257220200000%   370
 3.5   99.953474183672100%    0.04652581632794690000%   2,149
 4.0   99.993665751560700%    0.00633424843932140000%   15,787
 4.5   99.999320465373800%    0.00067953462623560100%   147,160
 5.0   99.999942669685300%    0.00005733031473997840%   1,744,278
 5.5   99.999996202087500%    0.00000379791249560668%   26,330,254
 6.0   99.999999802682500%    0.00000019731752898267%   506,797,346
 6.5   99.999999991968000%    0.00000000803199728949%   12,450,203,405
 7.0   99.999999999744000%    0.00000000025596191833%   390,683,116,666
 7.5   99.999999999993600%    0.00000000000638378239%   15,664,694,356,071
 8.0   99.999999999999900%    0.00000000000000000000%   818,836,295,885,545
 8.5   100.00000000000000%    0.00000000000000000000%   #DIV/0!

  s    =ERF(An/SQRT(2))       =1-Bn                     =1/Cn

(The 'n' in the cell formulas above represents the row number).

Evidently this calculation hits the limit of Excel's computational precision at about 8 sigma: once the tail probability drops below roughly 1E-015, ERF(s/SQRT(2)) is indistinguishable from 1 at 15 significant digits, so the subtraction in column C simply returns zero.

For what it's worth, ERF(z) is defined as 2/sqrt(pi) * INT[(0,z) exp(-t²) dt], and the area under the "normal" Bell curve from -z to z is just ERF(z/sqrt(2)). There's a discussion of it here:

http://mathworld.wolfram.com/Erf.html, here:

http://mathworld.wolfram.com/ConfidenceInterval.html, and here:

http://jove.prohosting.com/~skripty/page_295.htm.
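For anyone who wants to check that relationship numerically, here's a quick sketch (not part of my spreadsheet): it integrates the Gaussian with a plain trapezoid rule and compares the result against erf(z/sqrt(2)). The erf() call here is the C99 one from <math.h>, which Visual C++ 6.0 doesn't supply; GSL's gsl_sf_erf could be substituted. The helper name erf_by_trapezoid is just my own.

==========================================================================
/* Check the definition erf(x) = (2/sqrt(pi)) * INT[(0,x) exp(-t^2) dt]
 * and the identity P(-z < X < z) = erf(z/sqrt(2)) for a standard normal X.
 * The integral is done with a plain trapezoid rule. */
#include <stdio.h>
#include <math.h>

static double erf_by_trapezoid(double x, int steps)
{
    const double two_over_sqrt_pi = 1.1283791670955126;  /* 2 / sqrt(pi) */
    double h = x / steps;
    double sum = 0.5 * (1.0 + exp(-x * x));  /* endpoints exp(0) and exp(-x^2) */
    int i;

    for (i = 1; i < steps; i++)
    {
        double t = i * h;
        sum += exp(-t * t);
    }
    return two_over_sqrt_pi * h * sum;
}

int main(void)
{
    double z;

    for (z = 1.0; z <= 3.0; z += 1.0)
    {
        double x = z / sqrt(2.0);
        printf("%.1f sigma: trapezoid = %.12f   erf = %.12f\n",
               z, erf_by_trapezoid(x, 100000), erf(x));
    }
    return 0;
}
==========================================================================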

So I decided to try to tackle this in C. I downloaded a package

called the GNU Scientific Library (http://www.gnu.org/software/gsl/) -

there's a nice precompiled binary for Microsoft Visual C++ at

http://www.network-theory.co.uk/gsl/freedownloads.html. I wrote a little

piece of code to try it out:

==========================================================================

#include <stdio.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_sf_erf.h>

int main(void)
{
    double odds;
    double index;
    double sigma;
    double result;

    /* Step from 1.0 to 8.5 standard deviations in half-sigma increments. */
    for (index = 1.0; index <= 8.5; index += 0.5)
    {
        sigma  = index / M_SQRT2;          /* argument for erf()             */
        result = 1 - (gsl_sf_erf (sigma)); /* P(|X| > index std. deviations) */
        odds   = 1 / result;               /* odds against, "odds to 1"      */

        /* result and odds are plain doubles, so the conversions are %E
           rather than %LE; \345 is the sigma character in code page 437. */
        printf("P(%2.1f \345) = %.18E \t %-1.18E\n", index, result, odds);
    }

    return 0;
}

==========================================================================
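(For anyone trying this with GCC rather than Visual C++, the usual GSL link line is something along the lines of "gcc sigma20.c -lgsl -lgslcblas -lm"; the source file name is just an example.)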

And here's its output:

P(1.0 s) = 3.173105078629142600E-001    3.151487187534375500E+000
P(1.5 s) = 1.336144025377161700E-001    7.484223115226846800E+000
P(2.0 s) = 4.550026389635841700E-002    2.197789450799282900E+001
P(2.5 s) = 1.241933065155231800E-002    8.051963733448130300E+001
P(3.0 s) = 2.699796063260206900E-003    3.703983473449563900E+002
P(3.5 s) = 4.652581580710801700E-004    2.149344364311446000E+003
P(4.0 s) = 6.334248366623995700E-005    1.578719276732396800E+004
P(4.5 s) = 6.795346249477418600E-006    1.471595358480670300E+005
P(5.0 s) = 5.733031437360480700E-007    1.744277893686913900E+006
P(5.5 s) = 3.797912495606681200E-008    2.633025382119182500E+007
P(6.0 s) = 1.973175289826656400E-009    5.067973459610119500E+008
P(6.5 s) = 8.031997289492665000E-011    1.245020340467724800E+010
P(7.0 s) = 2.559619183273298400E-012    3.906831166662759400E+011
P(7.5 s) = 6.383782391594650100E-014    1.566469435607129100E+013
P(8.0 s) = 1.221245327087672200E-015    8.188362958855447500E+014
P(8.5 s) = 0.000000000000000000E+000    1.#INF00000000000000E+000

Same limits as in Excel, it seems. GSL's gsl_sf_erf function takes a

double-precision argument and returns double. My Visual C++ compiler (v. 6.0) actually stores both double and long double in the same 8-byte representation, even though its documentation describes an 80-bit extended format: "The long double contains 80 bits: 1 for sign, 15 for exponent, and 64 for mantissa. Its range is +/- 1.2E4932 with at least 19 digits of precision."
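As a sanity check, here's a small sketch that asks the compiler what each type actually provides; under Visual C++ 6.0 I'd expect it to report that long double is stored exactly like double (8 bytes, 15 decimal digits), whatever the 80-bit description says:

==========================================================================
/* Report the size and decimal precision of the floating-point types. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("double:      %u bytes, %d decimal digits (DBL_DIG)\n",
           (unsigned) sizeof(double), DBL_DIG);
    printf("long double: %u bytes, %d decimal digits (LDBL_DIG)\n",
           (unsigned) sizeof(long double), LDBL_DIG);
    return 0;
}
==========================================================================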

Strangely enough, I only seem to be getting 17 digits of precision.

Regardless, I estimate that this calculation will require at least 38

additional digits of precision. Interesting: according to IEEE 754, positive underflow shouldn't set in for single precision until a value drops below about 1.4E-045 or so, and for double precision not until about 1.0E-308.
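To make that range-versus-precision distinction concrete, here's one more small sketch that just prints the relevant limits from <float.h>. The 8-sigma tail value above (about 1.2E-015) is nowhere near either underflow threshold; the wall comes from the 15-17 significant digits a double can carry.

==========================================================================
/* Range vs. precision for IEEE 754 doubles, straight from <float.h>. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Smallest positive normalized double (~2.2E-308); gradual underflow
       with denormals extends representable values down to ~4.9E-324. */
    printf("DBL_MIN     = %E\n", DBL_MIN);

    /* Gap between 1.0 and the next representable double (~2.2E-016):
       once 1 - ERF(s/SQRT(2)) falls much below this, erf has already
       rounded to exactly 1 and the subtraction returns zero. */
    printf("DBL_EPSILON = %E\n", DBL_EPSILON);

    return 0;
}
==========================================================================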

So here's what I'd like to know.

Are there 64-bit implementations of something like GSL which would

produce more precise output on appropriate OS/hardware platforms like

WinXP-64, Solaris v. 7-9, Tru64 Unix, Linux, etc? Has anyone

implemented a 128-bit long long double datatype or equivalent?

Can a calculation like this be performed with some kind of arbitrary-

precision numeric package (something like Michael Ring's MAPM or PHP

BCMath), or evaluated directly by Mathematica or the java.math package?

Or maybe I'm just confused and there's an easier way to do this that I'm not seeing.

By the way, I don't know of anyone who's bothered to tabulate the

values of this function nearly this far: most such texts (Zwillinger's CRC

Handbooks, Abramowitz & Stegun, Gradshteyn & Ryzhik, etc) only go to about

4.0 sigma.

Any ideas?

-- Dave Schulman (ca******@gate.net)