
Floating-point bit hacking: iterated nextafter() without loop?

I am having trouble with floating point addition because of the limited
accuracy of floating types. Consider e.g. x0 += f(n) where x0 and f are
of some floating type. Sometimes x0 is much larger than f(n) so that
x0 + f(n) == x0. For example, x0 could be 2**300*(1.1234...) while f(n)
is 2**100*(1.4256...). If x0 + f(n) == x0 I still sometimes want the
addition of f(n) to x0 to have some effect on x0. A good solution seems
to be to update x0 to the nth value greater than x0. Something
equivalent to this:

for ( ; n != 0; n-- )
{
x0 = nextafter( x0, DBL_MAX );
}

The problem with the above loop is that it may take too long because n
could be quite large. Moreover, very many such loops would be executed.

Disappointingly, gcc doesn't seem to be able to optimize iterated
nextafter calls at all.

However, with only some modest bit hacking and without assuming much
about the floating point implementation it is possible to calculate
the nth greater value fast. At the end of this post is a program
showing such a solution. Of course it is not guaranteed to work as
intended, but is there any non-exotic platform where it will fail? If
there are any places where the solution could fail, what would the
consequences be? Is there a better solution to the problem than the
one below?
Daniel Vallstrom

// Tests consecutive nextafter calls and a way to compute the same thing fast
// without iterated nextafter calls.
// Daniel Vallstrom, 041014.
// Compile with e.g.: gcc -std=c99 -pedantic -Wall -O3 -lm nextafterNTest.c

/* The program should print something like:
n: 0x1000000 (16777216)
x0: 0x1.001p+8
xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
The xn loop took 0.8 cpu seconds.
m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022
e: exp2( floor( log2( x0 ) ) ): 0x1p+8
z: e | m: 0x1.0000001p+8
z1: z - e: 0x1p-20
z2: x0 + z1: 0x1.0010001p+8 (==xn)
u: n unrolled nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8 (==xn)
The u loops took 0.7 cpu seconds.
*/

#include <stdio.h>
#include <float.h>
#include <math.h>
#include <assert.h>
#include <time.h>
#define nextafter2M( x, y ) (x) = nextafter((x), (y)); (x) = nextafter((x), (y))
#define nextafter4M( x, y ) nextafter2M( (x), (y) ); nextafter2M( (x), (y) )
#define nextafter8M( x, y ) nextafter4M( (x), (y) ); nextafter4M( (x), (y) )
#define nextafter16M( x, y ) nextafter8M( (x), (y) ); nextafter8M( (x), (y) )
#define nextafter32M( x, y ) nextafter16M( (x), (y) ); nextafter16M( (x), (y) )
#define nextafter64M( x, y ) nextafter32M( (x), (y) ); nextafter32M( (x), (y) )
int main( void )
{
clock_t clockAtLoopStart; // Used to time the start of loops.
clock_t clockAtLoopEnd; // Used to time the end of loops.

// The number of nextafter iterations to use.
unsigned int n = ( 1u << 24 );
printf( "\nn: %#x (%u)\n", n, n );

// The base double to apply nextafter to.
double x0 = 0x1.001p8;
printf( "x0: %a\n", x0 );

// The nth double greater than x0.
double xn = x0;
clockAtLoopStart = clock();
for ( unsigned int k = 0; k != n; k++ )
{
xn = nextafter( xn, DBL_MAX );
}
clockAtLoopEnd = clock();
printf( "xn: n nextafter( x0, DBL_MAX ) iterations: %a\n", xn );
// xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
printf( " The xn loop took %.1f cpu seconds.\n",
(double) ( clockAtLoopEnd - clockAtLoopStart ) / CLOCKS_PER_SEC );
// The xn loop took 0.8 cpu seconds.
// The goal is to compute xn fast. At least gcc -O3 does *not* optimize
// iterated nextafter calls (the above loop). If the loop (n) is long (or
// many loops have to be done) it takes too long to run.

// Calculate a useful mantissa:
double m = nextafter( 0.0, DBL_MAX );
m = n*m;
printf( "m: n * nextafter( 0.0, DBL_MAX ): %a\n", m );
// m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022

// Calculate a useful exponent part.
double e = exp2( floor( log2( x0 ) ) );
printf( "e: exp2( floor( log2( x0 ) ) ): %a\n", e );
// e: exp2( floor( log2( x0 ) ) ): 0x1p+8

// Combine e and m.
double z = e;
for ( unsigned char * zPtr = (unsigned char *) &z, * zEnd = zPtr + sizeof(z),
* mPtr = (unsigned char *) &m; zPtr != zEnd; zPtr++, mPtr++ )
{
*zPtr |= *mPtr;
}
printf( "z: e | m: %a\n", z );
// z: e | m: 0x1.0000001p+8

// Remove the faulty first mantissa bit in z, the one coming from e.
double z1 = z - e;
printf( "z1: z - e: %a\n", z1 );
// z1: z - e: 0x1p-20

// Finally we are ready to add the value to x0.
double z2 = x0 + z1;
printf( "z2: x0 + z1: %a (==xn)\n", z2 );
// z2: x0 + z1: 0x1.0010001p+8 (==xn)

assert( z2 == xn );
// Test if unrolling the nextafter calls helps. (It doesn't (besides
// the speed-up from the unrolling itself).)

double u = x0;

clockAtLoopStart = clock();
for ( unsigned int k = n/64; k != 0; k-- )
{
nextafter64M( u, DBL_MAX );
}

for ( unsigned int k = n%64; k != 0; k-- )
{
u = nextafter( u, DBL_MAX );
}
clockAtLoopEnd = clock();

printf( "u: n unrolled nextafter( x0, DBL_MAX ) iterations: %a (==xn)\n",
u );
// u: n unrolled nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8 (==xn)
assert( u == xn );
printf( " The u loops took %.1f cpu seconds.\n",
(double) ( clockAtLoopEnd - clockAtLoopStart ) / CLOCKS_PER_SEC );
// The u loops took 0.7 cpu seconds.
return 0;
}
Nov 14 '05 #1
Ack! No replies! Bump. Or should I take the fact that there have
been no replies to mean that the program supplied in the first
post does work in practice?

da**************@gmail.com (Daniel Vallstrom) wrote in message news:<62**********************@posting.google.com>...
I am having trouble with floating point addition because of the limited
accuracy of floating types. Consider e.g. x0 += f(n) where x0 and f are
of some floating type. Sometimes x0 is much larger than f(n) so that
x0 + f(n) == x0. For example, x0 could be 2**300*(1.1234...) while f(n)
is 2**100*(1.4256...). If x0 + f(n) == x0 I still sometimes want the
addition of f(n) to x0 to have some effect on x0. A good solution seems
to be to update x0 to the nth value greater than x0.
For the background to the problem, I forgot to say that f(n)>0 and
increases with n, i.e. n0<n1 -> f(n0)<f(n1). Also, the various
floating values involved aren't God-given but rather some heuristic
measures. Hence, when you want to add f(n) to x0, it sometimes makes
sense to increment x0 n times instead of doing nothing when the
accuracy of the floating type is too small (i.e. when x0 + f(n) == x0).
Something equivalent to this:

for ( ; n != 0; n-- )
{
x0 = nextafter( x0, DBL_MAX );
}

The problem with the above loop is that it may take too long because n
could be quite large. Moreover, very many such loops would be executed.


To see that the approach involving iterated nextafter() calls is
infeasible, consider that values of n could on average be around
2**24, like in the program in the first post. Each such n-iterated
nextafter() loop will take about a second to compute. Furthermore,
thousands or millions such loops could very well have to be executed.

[snip suggested solution to the problem]

Any input is appreciated. Regards,
Daniel Vallstrom
Nov 14 '05 #2
In article <62**************************@posting.google.com> da**************@gmail.com (Daniel Vallstrom) writes:
Ack! No replies! Bump. Or should I take the fact that there have
been no replies to mean that the program supplied in the first
post does work in practice?
Perhaps everybody is stumped? You will not find much floating-point
expertise in this newsgroup.
da**************@gmail.com (Daniel Vallstrom) wrote in message news:<62**********************@posting.google.com>...
I am having trouble with floating point addition because of the limited
accuracy of floating types. Consider e.g. x0 += f(n) where x0 and f are
of some floating type. Sometimes x0 is much larger than f(n) so that
x0 + f(n) == x0. For example, x0 could be 2**300*(1.1234...) while f(n)
is 2**100*(1.4256...). If x0 + f(n) == x0 I still sometimes want the
addition of f(n) to x0 to have some effect on x0. A good solution seems
to be to update x0 to the nth value greater than x0.


For the background to the problem, I forgot to say that f(n)>0 and
increases with n, i.e. n0<n1 -> f(n0)<f(n1). Also, the various
floating values involved aren't God-given but rather some heuristic
measures. Hence, when you want to add f(n) to x0, it sometimes makes
sense to increment x0 n times instead of doing nothing when the
accuracy of the floating type is too small (i.e. when x0 + f(n) == x0).
Something equivalent to this:

for ( ; n != 0; n-- )
{
x0 = nextafter( x0, DBL_MAX );
}

The problem with the above loop is that it may take too long because n
could be quite large. Moreover, very many such loops would be executed.


To see that the approach involving iterated nextafter() calls is
infeasible, consider that values of n could on average be around
2**24, like in the program in the first post. Each such n-iterated
nextafter() loop will take about a second to compute. Furthermore,
thousands or millions such loops could very well have to be executed.


My suggestion for the loop would be breaking up the floating-point number in
its constituent parts, and work with them. In that case you add n (suitably
scaled) to the mantissa and recombine. Be careful with overflow in the
mantissa when you add... See frexp and ldexp (at least on my system).
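
A minimal sketch of that idea, assuming FLT_RADIX == 2 and a normal,
positive x (the helper name addNUlps is not from this thread, and it
ignores the mantissa-overflow caveat mentioned above as well as zero,
subnormal and non-finite inputs):

#include <float.h>
#include <math.h>

double addNUlps( double x, unsigned int n )
{
    int xExp;
    double frac = frexp( x, &xExp );    // x == frac * 2^xExp, frac in [0.5, 1)
    frac += n * ( DBL_EPSILON / 2.0 );  // one ulp of frac is DBL_EPSILON / 2
    return ldexp( frac, xExp );         // recombine mantissa and exponent
}
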
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Nov 14 '05 #3
Accumulated ugly effects of top-posting fixed, I hope.

"Dik T. Winter" wrote:
(Daniel Vallstrom) writes:
da**************@gmail.com (Daniel Vallstrom) wrote in message

I am having trouble with floating point addition because of the
limited accuracy of floating types. Consider e.g. x0 += f(n) where
x0 and f are of some floating type. Sometimes x0 is much larger
than f(n) so that x0 + f(n) == x0. For example, x0 could be
2**300*(1.1234...) while f(n) is 2**100*(1.4256...). If x0 + f(n)
== x0 I still sometimes want the addition of f(n) to x0 to have
some effect on x0. A good solution seems to be to update x0 to the
nth value greater than x0.


For the background to the problem, I forgot to say that f(n)>0 and
increases with n, i.e. n0<n1 -> f(n0)<f(n1). Also, the various
floating values involved aren't God-given but rather some heuristic
measures. Hence, when you want to add f(n) to x0, it sometimes
makes sense to increment x0 n times instead of doing nothing when
the accuracy of the floating type is too small (i.e. when x0 + f(n)
== x0).
.... snip ...
Ack! No replies! Bump. Or should I take the fact that there have
been no replies to mean that the program supplied in the first
post does work in practice?


Perhaps everybody is stumped? You will not find much floating-point
expertise in this newsgroup.


I would attempt to keep the accumulated corrections separate.
Ideally such a value would be accumulated starting with the
smallest correction, but we can't necessarily have everything.

double x0, xnow, delta, n;

for (n = 1, x0 = XSTART, delta = 0; n <= NLAST; n++) {
delta += f(n);
xnow = x0 + delta;
}

which should at least mitigate the problem. FP arithmetic is
inherently an approximation, so we can never have everything. At
best we can define the error bounds. In this case the bounds will
be a function of the relative magnitudes of delta, f(n), and x0.

--
"I support the Red Sox and any team that beats the Yankees"

"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan"
Nov 14 '05 #4
Daniel Vallstrom wrote:
Ack! No replies! Bump.
Bump? This is Usenet. Good newsreaders don't bump, they just add
responses to the original thread, which stays put, visually speaking.
Or should I take the fact that there have
been no replies to mean that the program supplied in the first
post does work in practice?


Take it to mean that the comp.lang.c folks don't know enough about
floating-point math at the bit level to help you.

I'd suggest some other newsgroups, but I don't know enough to help you
there, either. comp.programming is probably a good catchall for this
sort of thing. Or you could find some web discussion groups.
Nov 14 '05 #5
"Dik T. Winter" <Di********@cwi .nl> wrote in message news:<I5******* *@cwi.nl>...
In article <62************ **************@ posting.google. com> da************* *@gmail.com (Daniel Vallstrom) writes:
> da************* *@gmail.com (Daniel Vallstrom) wrote in message news:<62******* *************** ****@posting.go ogle.com>...
> > I am having trouble with floating point addition because of the limited
> > accuracy of floating types. Consider e.g. x0 += f(n) where x0 and f are
> > of some floating type. Sometimes x0 is much larger than f(n) so that
> > x0 + f(n) == x0. For example, x0 could be 2**300*(1.1234...) while f(n)
> > is 2**100*(1.4256...). If x0 + f(n) == x0 I still sometimes want the
> > addition of f(n) to x0 to have some effect on x0. A good solution seems
> > to be to update x0 to the nth value greater than x0.

>
> For the background to the problem, I forgot to say that f(n)>0 and
> increases with n, i.e. n0<n1 -> f(n0)<f(n1). Also, the various
> floating values involved aren't God-given but rather some heuristic
> measures. Hence, when you want to add f(n) to x0, it sometimes makes
> sense to increment x0 n times instead of doing nothing when the
> accuracy of the floating type is too small (i.e. when x0 + f(n) == x0).
>
> > Something equivalent to this:
> >
> > for ( ; n != 0; n-- )
> > {
> > x0 = nextafter( x0, DBL_MAX );
> > }
> >
> > The problem with the above loop is that it may take too long because n
> > could be quite large. Moreover, very many such loops would be executed.

>
> To see that the approach involving iterated nextafter() calls is
> infeasible, consider that values of n could on average be around
> 2**24, like in the program in the first post. Each such n-iterated
> nextafter() loop will take about a second to compute. Furthermore,
> thousands or millions such loops could very well have to be executed.


My suggestion for the loop would be breaking up the floating-point number in
its constituent parts, and work with them. In that case you add n (suitably
scaled) to the mantissa and recombine. Be careful with overflow in the
mantissa when you add... See frexp and ldexp (at least on my system).


Thanks! That got me thinking. I'll add an unpolished new
solution at the end of this post. (I ought to have found
it directly but I never moved on after failing to find
standard library functions that directly could do the
bit hacking (mantissa-exponent combining) that I wanted.)

I guess that when the floating point exponent base (FLT_RADIX)
isn't 2, the code won't work, at least not without some fixing.

Daniel Vallstrom
// Tests consecutive nextafter calls and a way to compute the same thing fast
// without iterated nextafter calls.
// Daniel Vallstrom, 041018.
// Compile with e.g.: gcc -std=c99 -pedantic -Wall -O3 -lm nextafterNTest.c

/* The program should print something like:
n: 0x1000000 (16777216)
x0: 0x1.001p+8
xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
The xn loop took 0.8 cpu seconds.
m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022
e: exp2( floor( log2( x0 ) ) ): 0x1p+8
z: e | m: 0x1.0000001p+8
z1: z - e: 0x1p-20
z2: x0 + z1: 0x1.0010001p+8 (==xn)
x0Exp: frexp( x0, &x0Exp ): 9
mnorm: DBL_MIN + m: 0x1.0000001p-1022
zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): 0x1.0000001p+8 (==z)
zb1: zb - e: 0x1p-20
zb2: x0 + zb1: 0x1.0010001p+8 (==xn)
*/

#include <stdio.h>
#include <float.h>
#include <math.h>
#include <assert.h>
#include <time.h>
int main( void )
{
#if FLT_RADIX != 2
#error Case not supported.
#endif

clock_t clockAtLoopStart; // Used to time the start of loops.
clock_t clockAtLoopEnd; // Used to time the end of loops.

// The number of nextafter iterations to use. Can be anything.
unsigned int n = ( 1u << 24 );
printf( "\nn: %#x (%u)\n", n, n );

// The base double to apply nextafter to. Can be anything.
double x0 = 0x1.001p8;
printf( "x0: %a\n", x0 );

// The nth double greater than x0.
double xn = x0;
clockAtLoopStart = clock();
for ( unsigned int k = 0; k != n; k++ )
{
xn = nextafter( xn, DBL_MAX );
}
clockAtLoopEnd = clock();
printf( "xn: n nextafter( x0, DBL_MAX ) iterations: %a\n", xn );
// xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
printf( " The xn loop took %.1f cpu seconds.\n",
(double) ( clockAtLoopEnd - clockAtLoopStart ) / CLOCKS_PER_SEC );
// The xn loop took 0.8 cpu seconds.
// The goal is to compute xn fast. At least gcc -O3 does *not* optimize
// iterated nextafter calls (the above loop). If the loop (n) is long (or
// many loops have to be done) it takes too long to run.

// Calculate a useful mantissa:
double m = nextafter( 0.0, DBL_MAX );
m = n*m;
printf( "m: n * nextafter( 0.0, DBL_MAX ): %a\n", m );
// m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022

// Calculate a useful exponent part.
double e = exp2( floor( log2( x0 ) ) );
printf( "e: exp2( floor( log2( x0 ) ) ): %a\n", e );
// e: exp2( floor( log2( x0 ) ) ): 0x1p+8

// Combine e and m.
double z = e;
for ( unsigned char * zPtr = (unsigned char *) &z, * zEnd = zPtr + sizeof(z),
* mPtr = (unsigned char *) &m; zPtr != zEnd; zPtr++, mPtr++ )
{
*zPtr |= *mPtr;
}
printf( "z: e | m: %a\n", z );
// z: e | m: 0x1.0000001p+8

// Remove the faulty first mantissa bit in z, the one coming from e.
double z1 = z - e;
printf( "z1: z - e: %a\n", z1 );
// z1: z - e: 0x1p-20

// Finally we are ready to add the value to x0.
double z2 = x0 + z1;
printf( "z2: x0 + z1: %a (==xn)\n", z2 );
// z2: x0 + z1: 0x1.0010001p+8 (==xn)

assert( z2 == xn );
// Here is an approach using frexp and ldexp instead of bit-oring.

int x0Exp;
frexp( x0, &x0Exp );
printf( "x0Exp: frexp( x0, &x0Exp ): %d\n", x0Exp );
// x0Exp: frexp( x0, &x0Exp ): 9

// Normalize the mantissa.
double mnorm = DBL_MIN + m;
printf( "mnorm: DBL_MIN + m: %a\n", mnorm );
// mnorm: DBL_MIN + m: 0x1.0000001p-1022

double zb = ldexp( mnorm, x0Exp-DBL_MIN_EXP );
printf( "zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): %a (==z)\n", z );
// zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): 0x1.0000001p+8 (==z)

assert( zb == z );
// The rest is just like above.

// Remove the faulty first mantissa bit in zb.
double zb1 = zb - e;
printf( "zb1: zb - e: %a\n", zb1 );
// zb1: zb - e: 0x1p-20

// We are ready to add the value to x0.
double zb2 = x0 + zb1;
printf( "zb2: x0 + zb1: %a (==xn)\n", zb2 );
// zb2: x0 + zb1: 0x1.0010001p+8 (==xn)

assert( zb2 == xn );
return 0;
}
Nov 14 '05 #6
Daniel Vallstrom wrote:
I am having trouble with floating point addition because of the limited
accuracy of floating types. Consider e.g. x0 += f(n) where x0 and f are
of some floating type. Sometimes x0 is much larger than f(n) so that
x0 + f(n) == x0. For example, x0 could be 2**300*(1.1234...) while f(n)
is 2**100*(1.4256...). If x0 + f(n) == x0 I still sometimes want the
addition of f(n) to x0 to have some effect on x0. A good solution seems
to be to update x0 to the nth value greater than x0. Something
equivalent to this:

for ( ; n != 0; n-- )
{
x0 = nextafter( x0, DBL_MAX );
}
[...]


Look up "Kahan's summation formula."

--
Er*********@sun.com

Nov 14 '05 #7
"Dik T. Winter" <Di********@cwi .nl> wrote in message news:<I5******* *@cwi.nl>...
In article <62************ **************@ posting.google. com> da************* *@gmail.com (Daniel Vallstrom) writes:
> da************* *@gmail.com (Daniel Vallstrom) wrote in message news:<62******* *************** ****@posting.go ogle.com>...
> > [snip question asking how to compute the following loop in a fast manner]
> >
> > for ( ; n != 0; n-- )
> > {
> > x0 = nextafter( x0, DBL_MAX );
> > }
My suggestion for the loop would be breaking up the floating-point number in
its constituent parts, and work with them. In that case you add n (suitably
scaled) to the mantissa and recombine. Be careful with overflow in the
mantissa when you add... See frexp and ldexp (at least on my system).


Thanks again. Here is a simple solution:

int x0Exp;
frexp( x0, &x0Exp );
x0 += ldexp( n * nextafter( 0.0, DBL_MAX ), x0Exp - DBL_MIN_EXP );

I'm still miffed that I didn't find the above solution directly
(even though hindsight is always 20/20). I'll post a full program with
the solution at the end.
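
Packaged as a small helper with a guarded call site, the same three
lines might look like this (the function name and the caller are
illustrative only, not part of the solution itself, and FLT_RADIX == 2
is assumed):

#include <float.h>
#include <math.h>

// Advance x by n ulps, as in the frexp/ldexp solution above.
static double nextafterN( double x, unsigned int n )
{
    int xExp;
    frexp( x, &xExp );
    return x + ldexp( n * nextafter( 0.0, DBL_MAX ), xExp - DBL_MIN_EXP );
}

// Typical use where plain addition would otherwise be absorbed:
//   if ( x0 + fn == x0 ) x0 = nextafterN( x0, n ); else x0 += fn;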

Daniel Vallstrom
// Tests consecutive nextafter calls and ways to compute the same thing fast
// without iterated nextafter calls.
// Daniel Vallstrom, 041019.
// Compile with e.g.: gcc -std=c99 -pedantic -Wall -O3 -lm nextafterNTest.c

/* The program should print something like:
n: 0x1000000 (16777216)
x0: 0x1.001p+8
xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
The xn loop took 0.8 cpu seconds.
m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022
e: exp2( floor( log2( x0 ) ) ): 0x1p+8
z: e | m: 0x1.0000001p+8
z1: z - e: 0x1p-20
z2: x0 + z1: 0x1.0010001p+8 (==xn)
x0Exp: frexp( x0, &x0Exp ): 9
mnorm: DBL_MIN + m: 0x1.0000001p-1022
zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): 0x1.0000001p+8 (==z)
zb1: zb - e: 0x1p-20 (==z1)
zb2: x0 + zb1: 0x1.0010001p+8 (==xn)
zc1: ldexp( m, x0Exp-DBL_MIN_EXP ): 0x1p-20 (==z1)
zc2: x0 + zc1: 0x1.0010001p+8 (==xn)
*/

#include <stdio.h>
#include <float.h>
#include <math.h>
#include <assert.h>
#include <time.h>
int main( void )
{
#if FLT_RADIX != 2
#error Case not supported.
#endif

clock_t clockAtLoopStart; // Used to time the start of loops.
clock_t clockAtLoopEnd; // Used to time the end of loops.

// The number of nextafter iterations to use. Can be anything.
unsigned int n = ( 1u << 24 );
printf( "\nn: %#x (%u)\n", n, n );

// The base double to apply nextafter to. Can be anything.
double x0 = 0x1.001p8;
printf( "x0: %a\n", x0 );

// The nth double greater than x0.
double xn = x0;
clockAtLoopStart = clock();
for ( unsigned int k = 0; k != n; k++ )
{
xn = nextafter( xn, DBL_MAX );
}
clockAtLoopEnd = clock();
printf( "xn: n nextafter( x0, DBL_MAX ) iterations: %a\n", xn );
// xn: n nextafter( x0, DBL_MAX ) iterations: 0x1.0010001p+8
printf( " The xn loop took %.1f cpu seconds.\n",
(double) ( clockAtLoopEnd - clockAtLoopStart ) / CLOCKS_PER_SEC );
// The xn loop took 0.8 cpu seconds.
// The goal is to compute xn fast. At least gcc -O3 does *not* optimize
// iterated nextafter calls (the above loop). If the loop (n) is long (or
// many loops have to be done) it takes too long to run.

// Calculate a useful mantissa:
double m = nextafter( 0.0, DBL_MAX );
m = n*m;
printf( "m: n * nextafter( 0.0, DBL_MAX ): %a\n", m );
// m: n * nextafter( 0.0, DBL_MAX ): 0x0.0000001p-1022

// Calculate a useful exponent part.
double e = exp2( floor( log2( x0 ) ) );
printf( "e: exp2( floor( log2( x0 ) ) ): %a\n", e );
// e: exp2( floor( log2( x0 ) ) ): 0x1p+8

// Combine e and m.
double z = e;
for ( unsigned char * zPtr = (unsigned char *) &z, * zEnd = zPtr + sizeof(z),
* mPtr = (unsigned char *) &m; zPtr != zEnd; zPtr++, mPtr++ )
{
*zPtr |= *mPtr;
}
printf( "z: e | m: %a\n", z );
// z: e | m: 0x1.0000001p+8

// Remove the faulty first mantissa bit in z, the one coming from e.
double z1 = z - e;
printf( "z1: z - e: %a\n", z1 );
// z1: z - e: 0x1p-20

// Finally we are ready to add the value to x0.
double z2 = x0 + z1;
printf( "z2: x0 + z1: %a (==xn)\n", z2 );
// z2: x0 + z1: 0x1.0010001p+8 (==xn)

assert( z2 == xn );
// Here is an approach using frexp and ldexp instead of bit-oring.

int x0Exp;
frexp( x0, &x0Exp );
printf( "x0Exp: frexp( x0, &x0Exp ): %d\n", x0Exp );
// x0Exp: frexp( x0, &x0Exp ): 9

// Normalize the mantissa.
double mnorm = DBL_MIN + m;
printf( "mnorm: DBL_MIN + m: %a\n", mnorm );
// mnorm: DBL_MIN + m: 0x1.0000001p-1022

// Increase the exponent of mnorm to match the exponent of x0.
double zb = ldexp( mnorm, x0Exp-DBL_MIN_EXP );
printf( "zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): %a (==z)\n", zb );
// zb: ldexp( mnorm, x0Exp-DBL_MIN_EXP ): 0x1.0000001p+8 (==z)

assert( zb == z );
// The rest is just like above.

// Remove the faulty first mantissa bit in zb.
assert( e == ldexp( 1.0, x0Exp-1 ) );
double zb1 = zb - e;
printf( "zb1: zb - e: %a (==z1)\n", zb1 );
// zb1: zb - e: 0x1p-20 (==z1)
assert( zb1 == z1 );

// We are ready to add the value to x0.
double zb2 = x0 + zb1;
printf( "zb2: x0 + zb1: %a (==xn)\n", zb2 );
// zb2: x0 + zb1: 0x1.0010001p+8 (==xn)
assert( zb2 == xn );
// Here is a better solution.

double zc1 = ldexp( m, x0Exp-DBL_MIN_EXP );
printf( "zc1: ldexp( m, x0Exp-DBL_MIN_EXP ): %a (==z1)\n", zc1 );
// zc1: ldexp( m, x0Exp-DBL_MIN_EXP ): 0x1p-20 (==z1)
assert( zc1 == z1 );

// Add the value to x0.
double zc2 = x0 + zc1;
printf( "zc2: x0 + zc1: %a (==xn)\n", zc2 );
// zc2: x0 + zc1: 0x1.0010001p+8 (==xn)
assert( zc2 == xn );
return 0;
}
Nov 14 '05 #8
