
assert( x > 0.0 && 1 && x == 0.0 ) holding

I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );

holding. (Actually, my problem is the dual one where I get
failed assertions for assertions that at first thought ought
to hold, but that's not important.)

At the end is a full program containing the above seemingly
inconsistent assertion. On my x86 using gcc the assertion
doesn't fail when x is of type double.

AFAIK, what is happening is that at first x temporarily resides
in an 80-bit register with higher precision than the normal
64 bits. Hence x is greater than 0.0 at first even though
the "real" 64-bit value is 0.0. If you cast the value to long
double and print it, you can see that it is indeed slightly
larger than 0.0 at first, but then becomes 0.0.

Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?

What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?
Daniel Vallstrom

/* Tests weird inconsistent floating point behavior resulting in
something like "assert( x > 0.0 && 1 && x == 0.0 );" holding!
Daniel Vallstrom, 041030.
Compile with e.g.: gcc -std=c99 -pedantic -Wall -O -lm fpbug.c
*/

#include <stdio.h>
#include <math.h>
#include <assert.h>

int main( void )
{
double x = nextafter( 0.0, -1.0 ) * nextafter( 0.0, -1.0 );

/* The putchar-conjunct below is just something arbitrary in */
/* order to clear the x-register as a side-effect. At least */
/* that's what I guess is happening. */
assert( x > 0.0 && putchar('\n') && x == 0.0 );

return 0;
}
Nov 14 '05 #1
27 Replies


Daniel Vallstrom wrote:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );


Testing whether a floating point number is equal to zero is an
indefinite thing, due to roundoff, truncation, and other FP errors. You
can never really be sure if it's equal, although the IEEE specification
will define 0 to be within a certain FP range.

Is there some other way you can structure the logic?

--
Thanks,
Elliott C. Bäck
--------------------------
www.elliottback.com
www.spreadIE.com
Nov 14 '05 #2

On 29 Oct 2004 16:39:04 -0700, Daniel Vallstrom
<da**************@gmail.com> wrote:
Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?
It all depends (as the late Professor Joad used to say) on what you mean
by equality of floating point numbers. Or rather, on what your compiler
and processor think it means. Can you get your compiler to generate
assembler code and check? It may be that it is looking for "absolute
value less than a very small amount" for equality, and the value is not
quite zero. Or perhaps your multiply which generated it resulted in
underflow and a NAN ("Not A Number", a special value indicating that
something odd happened) which is evaluated as 'zero' for equality but
still has a sign.
What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?


I don't understand what you are trying to achieve by having a condition
which should always evaluate as false like that. Or are you expecting y
(or x) to change value in the short time between testing y > 0 and y ==
0? The putchar() in the middle just confuses things more. Is this
something you found in a larger program? The nextafter() function call
would imply that.

Try putting in printfs tracing the value of x, with ridiculously high
precision:

printf("%-30.30g\n", x);

That should show whether the value is actually zero (with no rounding)
or not.

(I can't get it to fail even with very small values of x on my gcc
2.95.4 Debian GNU/Linux Duron 1200 system...)

Chris C
Nov 14 '05 #3

Elliott Back <el*****@cornell.edu> wrote in message news:<Bs*********************@twister.nyroc.rr.com>...
Daniel Vallstrom wrote:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );
Testing whether a floating point number is equal to zero is an
indefinite thing, due to roundoff, truncation, and other FP errors. You
can never really be sure if it's equal, although the IEEE specification
will define 0 to be within a certain FP range.


I don't think that's the issue here. I don't mind the limited
precision. My problem is the flip-flop behavior, showing as x
being one thing at first, then suddenly another thing without
anything happening in between. If it's the 0.0 you object to, you
can replace that with a y != 0.0. I'll add such an example at
the end of the post.

Is there some other way you can structure the logic?


Don't know what you mean by this. If you mean weakening the
assertions (in the real program) I would never do that. The
volatile solution seems better.
Daniel Vallstrom

/* Tests weird inconsistent floating point behavior resulting in
something like "assert( x > y && 1 && x == y );" holding!
Daniel Vallstrom, 041030.
Compile with e.g.: gcc -std=c99 -pedantic -Wall -O -lm fpbug2.c
*/

#include <stdio.h>
#include <math.h>
#include <assert.h>

int main( void )
{
double y = nextafter( 0.0, 1.0 );
double x = y + 0x1.0p-2 * y;

/* The putchar-conjunct below is just something arbitrary in */
/* order to clear the x-register as a side-effect. At least */
/* that's what I guess is happening. */
assert( x > y && putchar('\n') && x == y );

return 0;
}
Nov 14 '05 #4

In article <Bs*********************@twister.nyroc.rr.com>,
Elliott Back <el*****@cornell.edu> wrote:
Daniel Vallstrom wrote:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );


Testing whether a floating point number is equal to zero is an
indefinite thing, due to roundoff, truncation, and other FP errors. You
can never really be sure if it's equal, although the IEEE specification
will define 0 to be within a certain FP range.


This is just stupid.

Testing whether a floating point number is equal to zero is a perfectly
reasonable thing to do on any C implementation that conforms to IEEE 754
- unfortunately, the x86 stack-based floating-point arithmetic is
absolutely braindamaged (SSE2 has fixed this, so the problem will go
away in the next few years) and does not conform to IEEE 754 in the most
common mode that it is used.
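As an aside, on gcc targets with SSE2 the fix alluded to here is already available today via compiler flags (the flag names below are gcc's; this assumes gcc 3.1 or later and an SSE2-capable CPU):

```shell
# Keep all float/double arithmetic in 64-bit SSE registers instead
# of the 80-bit x87 stack (gcc, x86):
gcc -std=c99 -pedantic -Wall -O -msse2 -mfpmath=sse fpbug.c -lm
```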
Nov 14 '05 #5

In article <sl******************@ccserver.keris.net>,
Chris Croughton <ch***@keristor.net> wrote:
On 29 Oct 2004 16:39:04 -0700, Daniel Vallstrom
<da**************@gmail.com> wrote:
Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?


It all depends (as the late Professor Joad used to say) on what you mean
by equality of floating point numbers. Or rather, on what your compiler
and processor think it means. Can you get your compiler to generate
assembler code and check? It may be that it is looking for "absolute
value less than a very small amount" for equality, and the value is not
quite zero. Or perhaps your multiply which generated it resulted in
underflow and a NAN ("Not A Number", a special value indicating that
something odd happened) which is evaluated as 'zero' for equality but
still has a sign.


In both cases the implementation would not be conforming to IEEE 754
(which it isn't anyway, as the assert proves). In an implementation that
conforms to IEEE 754, x compares equal to zero if and only if x is
either a positive zero or x is a negative zero, no "less than some very
small amount" bullshit. And a NaN definitely doesn't compare equal to
anything, not even equal to itself, and most definitely not equal to
zero, and it also doesn't compare greater than anything, including zero.

What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?


I don't understand what you are trying to achieve by having a condition
which should always evaluate as false like that. Or are you expecting y
(or x) to change value in the short time between testing y > 0 and y ==
0? The putchar() in the middle just confuses things more. Is this
something you found in a larger program? The nextafter() function call
would imply that.


Looks like that is exactly what is happening: Either the value x is at
the same time greater than zero and equal to zero, or it changes between
the evaluation of (x > 0.0) and (x == 0.0). The first must not happen on
any implementation that conforms to IEEE 754, the second must not happen
on any Standard C implementation.
Nov 14 '05 #6

da**************@gmail.com (Daniel Vallstrom) writes:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );

holding. (Actually, my problem is the dual one where I get
failed assertions for assertions that at first thought ought
to hold, but that's not important.)

At the end is a full program containing the above seemingly
inconsistent assertion. On my x86 using gcc the assertion
doesn't fail when x is of type double.

AFAIK, what is happening is that at first x temporarily resides
in an 80-bit register with higher precision than the normal
64 bits. Hence x is greater than 0.0 at first even though
the "real" 64-bit value is 0.0. If you cast the value to long
double and print it, you can see that it is indeed slightly
larger than 0.0 at first, but then becomes 0.0.

Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?

What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?

[program snipped]


It looks to me like your analysis is right, including what to do about
resolving it. One minor suggestion: you might try writing the assert
without an explicit temporary, thusly:

assert( *(volatile double *)&x > 0.0 && putchar('\n')
&& *(volatile double *)&x == 0.0 );

which has a somewhat nicer feel in the context of the assert usage.
If this has the same behavior as the assert code with the explicit
temporary (and I think it should) you might want to use this form
instead, perhaps with a suitable CPP macro for the 'volatile' access.

Alternatively, you might use 'nextafter()' to compute the
smallest non-zero double, and test

assert( x >= smallest_nonzero_double && putchar('\n') && x == 0.0 );

This idea might give you another way of thinking about the problem
you're trying to solve. Floating point numbers are tricky; if you're
going to be testing them in assert's you probably want to think about
what conditions you're testing very, very carefully. Not that I
think you don't know that already. :)

Nov 14 '05 #7

Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...
da**************@gmail.com (Daniel Vallstrom) writes:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );

holding. (Actually, my problem is the dual one where I get
failed assertions for assertions that at first thought ought
to hold, but that's not important.)
To clarify, the real construction I'm having problem with
looks something like this:

assert( y >= f(...) );

The assertion is correct (because previous calculations
effectively have yielded just that) but erroneously fails
because the f value gets extra precision yielding it
strictly larger than y.

At the end is a full program containing the above seemingly
inconsistent assertion. On my x86 using gcc the assertion
doesn't fail when x is of type double.

AFAIK, what is happening is that at first x temporarily resides
in an 80-bit register with higher precision than the normal
64 bits. Hence x is greater than 0.0 at first even though
the "real" 64-bit value is 0.0. If you cast the value to long
double and print it, you can see that it is indeed slightly
larger than 0.0 at first, but then becomes 0.0.

Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?

What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?

[program snipped]
It looks to me like your analysis is right, including what to do about
resolving it.


Good to know. I'll go with the volatile solution.
One minor suggestion: you might try writing the assert
without an explicit temporary, thusly:

assert( *(volatile double *)&x > 0.0 && putchar('\n')
&& *(volatile double *)&x == 0.0 );

which has a somewhat nicer feel in the context of the assert usage.
If this has the same behavior as the assert code with the explicit
temporary (and I think it should) you might want to use this form
instead, perhaps with a suitable CPP macro for the 'volatile' access.
I thought about a volatile cast (only the simple
(volatile double) though; yours seems safer) but felt
very unsure of the meaning of a volatile cast. Having now
glanced through the standard I'm still unsure (even if it
might work on a test-example).

However, while looking through the standard, I found this
(C99, 6.5.4 Cast operators, p81):
86) If the value of the expression is represented with
greater precision or range than required by the type
named by the cast (6.3.1.8), then the cast specifies a
conversion even if the type of the expression is
the same as the named type.

Hence, a simple (double)x should work. But I tried that
before posting the original post and it didn't work! I.e.

assert( (double)x > 0.0 && putchar('\n') && x == 0.0 );

still doesn't fail. Looks like a bug in gcc. I'll add a full
program showing the bug at the end. To be fair, gcc doesn't
claim to be a C99 compiler but perhaps earlier standards also
guaranteed footnote 86) to hold? It sounds sensible. (The use
of nextafter is unimportant and can be coded in pre-C99.)


Alternatively, you might use 'nextafter()' to compute the
smallest non-zero double, and test

assert( x >= smallest_nonzero_double && putchar('\n') && x == 0.0 );
This is unacceptable since it would make the assertion
strictly weaker (the real assertion in the real program
that is; the above is strictly stronger).


This idea might give you another way of thinking about the problem
you're trying to solve. Floating point numbers are tricky; if you're
going to be testing them in assert's you probably want to think about
what conditions you're testing very, very carefully. Not that I
think you don't know that already. :)


I'm trying but gcc and the hardware failing me is not being helpful ;p
Daniel Vallstrom
/* Tests weird inconsistent floating point behavior resulting in something
like "assert( (double)x > 0.0 && 1 && x == 0.0 );" holding! At least
on my x86 the assertion doesn't fail using gcc. This shows a bug in gcc.
Daniel Vallstrom, 041031.
Compile with e.g.: gcc -std=c99 -pedantic -Wall -O -lm fpbug3.c
*/

#include <stdio.h>
#include <math.h>
#include <assert.h>

int main( void )
{
double x = 0x1.0p-2 * nextafter( 0.0, 1.0 );

/* The putchar-conjunct below is just something arbitrary in */
/* order to clear the x-register as a side-effect. At least */
/* that's what I guess is happening. */
assert( (double)x > 0.0 && putchar('\n') && (double)x == 0.0 );

return 0;
}
Nov 14 '05 #8

In article <news:62**************************@posting.google.com>
Daniel Vallstrom <da**************@gmail.com> wrote:
To clarify, the real construction I'm having problem with
looks something like this:

assert( y >= f(...) );

The assertion is correct (because previous calculations
effectively have yielded just that) but erroneously fails
because the f value gets extra precision yielding it
strictly larger than y.
[massive snippage]
I thought about a volatile cast (only the simple
(volatile double) though; yours seems safer) but felt
very unsure of the meaning of a volatile cast.
The meaning is up to the implementation (but this is true even for
ordinary "volatile double" variables). Using a separate volatile
double variable will work on today's gcc variants, at least, and
in my opinion is a reasonable thing to ask of them.
However, while looking through the standard, I found [...]
Hence, a simple (double)x should work. But I tried that
before posting the original post and it didn't work!
... Looks like a bug in gcc. ... gcc doesn't
claim to be a C99 compiler but perhaps earlier standards also
guaranteed footnote 86) to hold?


Yes, this is "supposed to work" even in C89. GCC has an option,
"-ffloat-store", that gets it closer to conformance in both cases.
The problem is that you do not want to use -ffloat-store, not
only because it apparently does not always work[%], but also because
it has horrible effects on performance.

Ultimately, the problem boils down to "the x86 CPU does not implement
IEEE single- and double-precision floating point quickly, but rather
only IEEE-extended".[%%] With "-ffloat-store", gcc gives you a choice
between "goes fast and doesn't work" or "crawls so slowly as to be
useless, but works". :-) The "volatile double" trick is a sort
of compromise that will probably get you there.

[% I am not sure what it is that is supposed to not-work when
using -ffloat-store.]

[%% There is a precision control field in the FPU control word,
but it affects only the mantissa, not the exponent. Depending on
your particular numbers, manipulating this field may or may not
suffice.]
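For completeness, the option discussed above is spelled like this (gcc; the performance caveat stands):

```shell
# Store every float/double result to memory after each operation,
# sacrificing speed for (closer-to-)64-bit semantics:
gcc -std=c99 -pedantic -Wall -O -ffloat-store fpbug.c -lm
```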
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #9

On 31 Oct 2004 09:01:25 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
To clarify, the real construction I'm having problem with
looks something like this:

assert( y >= f(...) );

The assertion is correct (because previous calculations
effectively have yielded just that) but erroneously fails
because the f value gets extra precision yielding it
strictly larger than y.


For floating point values of y other than zero and some integers, this is a
doomed exercise. I realise you've already considered and discounted the
inherent inaccuracy of floats, but except in special circumstances, you
can't do this kind of comparison.

And by the way, are you using an assert to perform a check in release code?
Thats generally a bad idea.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Nov 14 '05 #10

Mark McIntyre <ma**********@spamcop.net> wrote in message news:<48********************************@4ax.com>...
On 31 Oct 2004 09:01:25 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
To clarify, the real construction I'm having problem with
looks something like this:

assert( y >= f(...) );

The assertion is correct (because previous calculations
effectively have yielded just that) but erroneously fails
because the f value gets extra precision yielding it
strictly larger than y.
For floating point values of y other than zero and some integers, this is a
doomed exercise. I realise you've already considered and discounted the
inherent inaccuracy of floats, but except in special circumstances, you
can't do this kind of comparison.


Sure you can. As said, the problem was that x86 floating point sucks
and that gcc is buggy. But even given that, the workaround is simple
and works fine. The workaround, as said, looks like this:

volatile double yvol = (double)y;
volatile double fvol = (double)f(...);
assert( yvol >= fvol );

The double casts guarantee this to work with any correct compiler.
And it also works with gcc since gcc handles volatiles in a reasonable
way.

Furthermore there is no performance hit to talk about, at least in this
case.

And by the way, are you using an assert to perform a check in release code?
Thats generally a bad idea.


Where did this question come from? Anyway, if anything, the opposite
is true. For example, what about that big blackout in northeast
America some time ago? IIRC, the reason it got so widespread was
that persons supervising the grid didn't notice anything wrong for
hours. And the reason for that was because the system diagnosing the
grid had entered into a state not anticipated by the makers I think.
Instead of failing an assert --- which presumably would have
alerted the technicians --- the program kept on going as if
nothing was wrong, showing no changes. (That's the picture of the
blackout I have at least. Truthfully, I'm not completely sure that
all the details in that story are correct:-)

Of course, if it's better that a program keeps going on, even if
its state is screwed up, then so be it. The point is that in many
situations it is better to keep the assertions active.

I'm a big fan of a Hoare flavored assert system, effectively
guaranteeing that the program is correct with high probability.
For example, say that you are coding a sorting function. Then,
besides assertions in the sorting function itself, there should
be a big assertion at the end, asserting that all the properties
that should hold at the end actually do so, e.g. that the array
say is actually sorted.

Some assertions could take a lot of time and even change the
complexity of the program, from e.g. O(n) to O(n^2). Hence you
have to layer the assertions according to how much they slow
down the program.

If you feel that you can't use the assertion construction
provided by C you should implement your own equivalent system
rather than define NDEBUG.

In anticipation, a counter-argument to assertions that sometimes
comes up is that if you think assertions are so important you
should instead handle the cases properly. I.e. instead of doing
"assert(p)" you should do "if (!p) ...". That argument shows
a misunderstanding of what asserts are about, namely assuring
that the program is correct, not handling cases. Assertions
look like "assert(true)" and it's nonsense to instead go
"if (!true)...".
Daniel Vallstrom
Nov 14 '05 #11

On 1 Nov 2004 09:36:22 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
Mark McIntyre <ma**********@spamcop.net> wrote in message news:<48********************************@4ax.com>. ..
For floating point values of y other than zero and some integers, this is a
doomed exercise. I realise you've already considered and discounted the
inherent inaccuracy of floats, but except in special circumstances, you
can't do this kind of comparison.


Sure you can.


Really? Go tell that to Goldberg et al.
As said, the problem was that x86 floating point sucks
and that gcc is buggy.
Maybe.
But even given that, the workaround is simple
and works fine. The workaround, as said, looks like this:

volatile double yvol = (double)y;
So you're telling me that casting FP values to doubles and using the
volatile keyword removes the imprecision inherent in storing irrational
numbers in finite binary format? If so, this is a special extension to gcc,
a) not guaranteed by the Standard and b) offtopic here.
And by the way, are you using an assert to perform a check in release code?
Thats generally a bad idea.


Where did this question come from?


What does that matter? You post to CLC, you expect to find people noticing
all sorts of other things.
Anyway, if anything, the opposite is true.
assert() calls abort(). This is a bad way to handle errors.
For example, what about that big blackout in northeast
America some time ago? IIRC, the reason it got so widespread was
that persons supervising the grid didn't notice anything wrong for
hours. And the reason for that was because the system diagnosing the
grid had entered into a state not anticipated by the makers I think.
Instead of failing an assert --- which presumably would have
alerted the technicians --- the program kept on going as if
nothing was wrong, showing no changes.
Mhm. So an assert() would have blacked out the *entire* US, including
bringing down the computer systems, sealing the technicians in their
airtight computer facility etc. Good solution.
Of course, if it's better that a program keeps going on, even if
its state is screwed up, then so be it. The point is that in many
situations it is better to keep the asssertions active.
I disagree, but YMMV. Remind me not to ask you to work on any
mission-critical systems of mine.... :-)
In anticipation, a counter-argument to assertions that sometimes
comes up is that if you think assertions are so important you
should instead handle the cases properly.


Indeed you should.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Nov 14 '05 #12

da**************@gmail.com (Daniel Vallstrom) writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...
da**************@gmail.com (Daniel Vallstrom) writes:

[lots snipped]
It looks to me like your analysis is right, including what to do about
resolving it.


Good to know. I'll go with the volatile solution.


After thinking about this more I'm coming around to the point of view
that using 'volatile' (or something similar) in the assertion is not
the right answer. But first let me address the use of 'volatile'.

One minor suggestion: you might try writing the assert
without an explicit temporary, thusly:

assert( *(volatile double *)&x > 0.0 && putchar('\n')
&& *(volatile double *)&x == 0.0 );

which has a somewhat nicer feel in the context of the assert usage.
If this has the same behavior as the assert code with the explicit
temporary (and I think it should) you might want to use this form
instead, perhaps with a suitable CPP macro for the 'volatile' access.


I thought about a volatile cast (only the simple
(volatile double) though; yours seems safer) but felt
very unsure of the meaning of a volatile cast. Having now
glanced through the standard I'm still unsure (even if it
might work on a test-example).


The '*(volatile double *)&x' idea came up in a thread (in fact it was
a response from Chris Torek) answering a question of mine about
volatile. As I recall that thread also concluded that just casting
directly, as in '(volatile double) x' is not guaranteed to work. Also
I remember that the '*(volatile double *)&x' approach is mentioned
explicitly in the standard (or perhaps the rationale - I don't always
remember which is which) as the right way to do this kind of forced
access.

Using a volatile variable should work equally well. My model (well,
part of it anyway) for volatile is assigning to a volatile variable
means a "store" must be done, and referencing a volatile variable
means a "load" must be done. I think that model is good operationally
even if it doesn't correspond exactly to what's said in the standard.

Furthermore, this suggests another means to accomplish the forced
access that doesn't suffer the restriction of needing an L-value
(which the address-casting approach has) or need another local
variable:

static inline int
double_GE( volatile double a, volatile double b ){
return a >= b;
}

. . .

assert( double_GE( y, f() ) );

Using this approach on the test program caused the "can't possibly
succeed" assertion to fail, as desired. Incidentally, leaving
off the 'volatile' specifiers on the parameters left the assertion
in the "can't possibly succeed, yet it does" state.

However, while looking through the standard, I found this
(C99, 6.5.4 Cast operators, p81):
86) If the value of the expression is represented with
greater precision or range than required by the type
named by the cast (6.3.1.8), then the cast specifies a
conversion even if the type of the expression is
the same as the named type.

Hence, a simple (double)x should work. But I tried that
before posting the original post and it didn't work!
[snip related test with gcc]


It seems right that the standard mandates that casting to double
should make the asssertion do what you'd expect, and it therefore
seems right that the program not doing that means gcc has a bug.
Personally I think this specification is mildly insane; if x is a
double (and assuming x has a floating point value rather than NaN),
the expression 'x == (double)x' should ALWAYS be true. But even if
we accept that using a '(double)' cast will correct the 64/80 bit
problem, this still seems like the wrong way to solve the problem,
for the same reason that using a 'volatile' forced access seems
wrong - see below.

Alternatively, you might use 'nextafter()' to compute the
smallest non-zero double, and test

assert( x >= smallest_nonzero_double && putchar('\n') && x == 0.0 );


This is unacceptable since it would make the assertion
strictly weaker (the real assertion in the real program
that is; the above is strictly stronger).


I hear you. My next question is, what is it that you are really
trying to guarantee?

This idea might give you another way of thinking about the problem
you're trying to solve. Floating point numbers are tricky; if you're
going to be testing them in assert's you probably want to think about
what conditions you're testing very, very carefully. Not that I
think you don't know that already. :)


I'm trying but gcc and the hardware failing me is not being helpful ;p


No kidding.

Rewriting the assertion - whether using (double) or volatile or some
other mechanism - and doing nothing else isn't the right way to solve
the problem. Here's my reasoning.

Why are you writing the assertion in the first place? Presumably it's
to guarantee that some other piece of code that depends on that
assertion being true is going to execute at some point in the (perhaps
near) future. For example, if

assert( y >= f() );

perhaps we are going to form the expression 'y - f()' and depend on
that value being non-negative.

Whatever it is that we're depending on should be exactly expressed by
what is in the assertion (or at least, should be implied by what is in
the assertion). If we write the assertion one way, and the later code
a different way, that won't be true - the assertion could succeed, and
the later code fail, or vice versa. So, whatever it is we do, the
assertion should be written in the very same form as the code that
the assertion is supposed to protect. Thus, if we have

static inline double
double_minus( volatile double a, volatile double b ){
return a - b;
}

/* make sure y - f() is non-negative */
assert( double_minus( y, f() ) >= 0 );

then the later code should use

difference = double_minus( y, f() );

and not

difference = y - f();

You see what I mean? That's related to my question about
reformulating the assertion expression, so that the assertion
expression and the subsequent code that depends on it are guaranteed
to be in sync, whatever it is that the subsequent code really needs to
guarantee. Because the subsequent code may have just the same
problems that the code in the assertion expression has.

Picking up an earlier part of the message:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );

holding. (Actually, my problem is the dual one where I get
failed assertions for assertions that at first thought ought
to hold, but that's not important.)


To clarify, the real construction I'm having problem with
looks something like this:

assert( y >= f(...) );

The assertion is correct (because previous calculations
effectively have yielded just that) but erroneously fails
because the f value gets extra precision yielding it
strictly larger than y.


I've been in touch with the person who is now chairing the IEEE-754
revision committee, and he's going to take up this whole question with
the committee. He's asked for a code sample that illustrates the
'y >= f()' problem; if you get that to me I'll forward it to him.
If it's too large to post please feel free to send it to me in email.

Nov 14 '05 #13

P: n/a


Mark McIntyre wrote:
On 1 Nov 2004 09:36:22 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:

Mark McIntyre <ma**********@spamcop.net> wrote in message news:<48********************************@4ax.com>...


For floating point values of y other than zero and some integers, this is a
doomed exercise. I realise you've already considered and discounted the
inherent inaccuracy of floats, but except in special circumstances, you
can't do this kind of comparison.


Sure you can.


Really? Go tell that to Goldberg et al.


Why? As the OP used nextafter() in his failing example, at least this
is correct; and for many choices of f(), this also is possible.
If f() is keeping track of the way everything is rounded, then you
certainly can ask for >=. The point is just that you have to be
very careful.
BTW: I rather like the "extended" version of the Goldberg paper provided
by Sun.

As said, the problem was that x86 floating point sucks
and that gcc is buggy.


Maybe.


With certainty.

But even given that, the workaround is simple
and works fine. The workaround, as said, looks like this:

volatile double yvol = (double)y;

So you're telling me that casting FP values to doubles and using the
volatile keyword removes the imprecision inherent in storing irrational
numbers in finite binary format? If so, this is a special extension to gcc,
a) not guaranteed by the Standard and b) offtopic here.


The gcc/x86 issue has been brought up time and again -- in fact,
my first post to c.l.c was about that. Most people initially
are not sure whether the double cast has to work as intended
(as the standard also intends) or whether there is some
exception to the rule that they do not know about.
The reason the volatile variable and the volatile double * cast
trick work is a more faithful implementation of the volatile
semantics.
Further: We are not talking about irrational numbers but rather
about numbers representable by long double and double.

And by the way, are you using an assert to perform a check in release code?
That's generally a bad idea.
[snip]Anyway, if anything, the opposite is true.


assert() calls abort(). This is a bad way to handle errors.


Correct.
One remark: Many people create their own assertion macro which
does more cleanup, spits out more information and so on.

However, one usually places assertions where one thinks that
they always will hold and #defines them away for the release
version -- error checking and cleanup still has to take place.
Cheers,
Michael
--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #14

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...

[lotsoftextsnipped]
I hear you. My next question is, what is it that you are really
trying to guarantee?

This idea might give you another way of thinking about the problem
you're trying to solve. Floating point numbers are tricky; if you're
going to be testing them in assert's you probably want to think about
what conditions you're testing very, very carefully. Not that I
think you don't know that already. :)


I'm trying, but gcc and the hardware failing me is not helping. ;p


No kidding.

Rewriting the assertion - whether using (double) or volatile or some
other mechanism - and doing nothing else isn't the right way to solve
the problem. Here's my reasoning.

[carefully crafted argument snipped]

I agree, no amount of reformulation trickery will make his problems go
away if the flaw is in ignoring a basic rule for floating point
calculations: avoid doing arithmetic with denormalized numbers. The
result x is obviously denormalized (or even underflowed) at its point
of use, and therefore the 64-bit representation vanishes into 0.0. For
a calculation whose result can end up in a range close to 0, do
yourself a favour and limit its result to machine precision (i.e. if
|x| < epsilon then x = 0).

Mark
Nov 14 '05 #15

P: n/a
In <62**************************@posting.google.com> da**************@gmail.com (Daniel Vallstrom) writes:
I'm having problems with inconsistent floating point behavior
resulting in e.g.

assert( x > 0.0 && putchar('\n') && x == 0.0 );

holding. (Actually, my problem is the dual one where I get
failed assertions for assertions that at first thought ought
to hold, but that's not important.)

At the end is a full program containing the above seemingly
inconsistent assertion. On my x86 using gcc the assertion
doesn't fail when x is of type double.

AFAIK, what is happening is that at first x temporarily resides
in an 80-bit register with a higher precision than the normal
64 bits. Hence x is greater than 0.0 at first even though
the "real" 64-bit value is 0.0. If you cast the value to long
double and print it you can see that it indeed is slightly
larger than 0.0 at first, but then becomes 0.0.

Is not failing "assert( x > 0.0 && 1 && x == 0.0 );"
acceptable?
Yes. However, assert((double)x > 0.0 && (double)x == 0.0) *must* fail.
It doesn't with gcc on x86; this is a well-known bug that the
gcc people don't seem very eager to fix.
What is the best workaround to the problem? One possibility is
using volatile intermediate variables, when needed, like this:

volatile double y = x;
assert( y > 0.0 && putchar('\n') && y == 0.0 );

Is that the best solution?
Yes, if it works. gcc does very aggressive optimisations in this
area and happily ignores my casts above, although this is forbidden by
the standard. You may actually want to use memcpy to copy the value of
x into y and have a look at the generated code.
/* Tests weird inconsistent floating point behavior resulting in
something like "assert( x > 0.0 && 1 && x == 0.0 );" holding!
Daniel Vallstrom, 041030.
Compile with e.g: gcc -std=c99 -pedantic -Wall -O fpbug.c -lm
*/


Adding -ffloat-store will fix your particular problem. However, this is
not a panacea for all the precision-related bugs of gcc, and it slows
down floating-point-intensive code. It is, however, the first thing you
may want to try when puzzled by gcc's behaviour.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #16

P: n/a
Mark McIntyre <ma**********@spamcop.net> writes:
On 1 Nov 2004 09:36:22 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
But even given that, the workaround is simple
and works fine. The workaround, as said, looks like this:

volatile double yvol = (double)y;


So you're telling me that casting FP values to doubles and using the
volatile keyword removes the imprecision inherent in storing irrational
numbers in finite binary format? If so, this is a special extension to gcc,
a) not guaranteed by the Standard and b) offtopic here.


Since Daniel included a citation (it was section 6.5.4 I believe)
about '(double)' forcing a conversion in certain cases, the issue is
certainly on-topic, regardless of what the ultimate resolution of the
question [about '(double)' removing extra precision] might be.

That's also true for the question about whether 'volatile' implies
moving out of extended precision. Regardless of whether it does
or does not remove extended precision, it's on-topic to ask if
the standard requires it to.

Just by the way, I don't think the answer to either of those questions
is as clear-cut as the "not guaranteed" statement implies. So if
there's an argument supporting that position, it would be good to
hear it, including relevant citations.

Nov 14 '05 #17

P: n/a
so************@yahoo.com (Mark Piffer) writes:
I agree, no amount of reformulation trickery will make his problems go
away if the flaw is in ignoring a basic rule for floating point
calculations: avoid doing arithmetic with denormalized numbers. The
result x is obviously denormalized (or even underflowed) at its point
of use, and therefore the 64-bit representation vanishes into 0.0. For
a calculation whose result can end up in a range close to 0, do
yourself a favour and limit its result to machine precision (i.e. if
|x| < epsilon then x = 0).


It's true that the result here is denormalized in a sense (it's as
normalized as it's possible for this value to be, given how floating
point numbers are represented). But the larger problem is more
pervasive than just numbers close to zero, or denormalized numbers.

What is the right way to handle the larger problem?
Nov 14 '05 #18

P: n/a
On Tue, 02 Nov 2004 10:50:20 +0100, in comp.lang.c , Michael Mair
<Mi**********@invalid.invalid> wrote:


Mark McIntyre wrote:
On 1 Nov 2004 09:36:22 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
Really? Go tell that to Goldberg et al.
Why? As the OP used nextafter() in his failing example, at least this
is correct; and for many choices of f(), this also is possible.


I wonder if you actually *read* my original comment, which said
exactly the same thing as yours.
If f() is keeping track of the way everything is rounded, then you
certainly can ask for >=.
I suspect you can't, based on experience, but really don't have the energy
to argue, especially as my son is burbling in my left ear about some sort
of computerised walking car....
The point just is that you have to be very careful.


Indeed.
As said, the problem was that x86 floating point sucks
and that gcc is buggy.


Maybe.


With certainty.


That was "maybe, but in the context of CLC topicality, who knows?"!
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 14 '05 #19

P: n/a
In article <cc**************************@posting.google.com>,
so************@yahoo.com (Mark Piffer) wrote:
I agree, no amount of reformulation trickery will make his problems go
away if the flaw is in ignoring a basic rule for floating point
calculations: avoid doing arithmetic with denormalized numbers.


Avoid doing floating point arithmetic with tiny numbers on a processor
that is so brain-damaged that it cannot decide if they are denormalized
or not. On a proper implementation of IEEE 754, denormalized numbers are
no problem.
Nov 14 '05 #20

P: n/a
In article <kf*************@alumnus.caltech.edu>,
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
so************@yahoo.com (Mark Piffer) writes:
I agree, no amount of reformulation trickery will make his problems go
away if the flaw is in ignoring a basic rule for floating point
calculations: avoid doing arithmetic with denormalized numbers. The
result x is obviously denormalized (or even underflowed) at its point
of use, and therefore the 64-bit representation vanishes into 0.0. For
a calculation whose result can end up in a range close to 0, do
yourself a favour and limit its result to machine precision (i.e. if
|x| < epsilon then x = 0).


It's true that the result here is denormalized in a sense (it's as
normalized as it's possible for this value to be, given how floating
point numbers are represented). But the larger problem is more
pervasive than just numbers close to zero, or denormalized numbers.

What is the right way to handle the larger problem?


You could have a look at the Java language specification, where this
problem is handled. The first Java implementations required that all
double operations are actually done in double precision and not with
higher precision or a higher exponent range. A very reasonable demand
which unfortunately made correct implementations on x86 10 times slower
than incorrect ones in extreme cases. So now Java has "strict fp" and
"non-strict fp". In the "non-strict" case, an extended exponent range is
allowed in absolutely well-defined situations.
Nov 14 '05 #21

P: n/a
Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...
da**************@gmail.com (Daniel Vallstrom) writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...
da**************@gmail.com (Daniel Vallstrom) writes:
>
> [lots snipped]

[large snip]
Furthermore, this suggests another means to accomplish the forced
access that doesn't suffer the restriction of needing an L-value
(which the address-casting approach has) or need another local
variable:

static inline int
double_GE( volatile double a, volatile double b ){
return a >= b;
}

. . .

assert( double_GE( y, f() ) );

Using this approach on the test program caused the "can't possibly
succeed" assertion to fail, as desired. Incidentally, leaving
off the 'volatile' specifiers on the parameters left the assertion
in the "can't possibly succeed, yet it does" state.
Nice. This is a clean solution. However, I haven't incorporated
it yet and am still using the old trusted solution with the
explicit extra volatile variable. Truthfully, even though I believe
that your cleaner solution will work, I feel somewhat reluctant to
convert to it since sadly I have lost trust in gcc. After all, the
simple double cast is also supposed to work. Reading e.g. Dan Pop
elsethread doesn't help to bring back trust in gcc either.

[snip]
It seems right that the standard mandates that casting to double
should make the asssertion do what you'd expect, and it therefore
seems right that the program not doing that means gcc has a bug.
Personally I think this specification is mildly insane; if x is a
double (and assuming x has a floating point value rather than NaN),
the expression 'x == (double)x' should ALWAYS be true.
Agreed.
But even if
we accept that using a '(double)' cast will correct the 64/80 bit
problem, this still seems like the wrong way to solve the problem,
for the same reason that using a 'volatile' forced access seems
wrong - see below.


[snip]
My next question is, what is it that you are really
trying to guarantee?
Together with what you write at the bottom of your reply it seems
that you are asking for a full-blown example containing the problem?
I'll try to cut out the relevant parts of the program with the problem
and patch it up to some smaller program.

[snip]
Rewriting the assertion - whether using (double) or volatile or some
other mechanism - and doing nothing else isn't the right way to solve
the problem. Here's my reasoning.

Why are you writing the assertion in the first place? Presumably it's
to guarantee that some other piece of code that depends on that
assertion being true is going to execute at some point in the (perhaps
near) future.
Yes and no. I'm an assert junkie and like to take a Hoare/DBC
[see e.g. wikipedia?] like approach where I assert key properties
to hold inside a function implementing an algorithm but also at
the end, when a function is finished, assert that all properties
that should hold also do so. So I'm not really worrying about some
future code but am just asserting that 1) the function is correct
and 2) that all the properties promised by the function hold. In
this sense I'm not protecting code as such but rather properties
code can use.

In the program I'll try to post at the end, the main property
to hold at the end of the function is that an array was sorted
"modulo double casts" (i.e. after removing excessive precision).
That's all you can aim for. The property is not that the array
is "absolutely sorted also for 80 bit double and you can do
whatever you want". For example, say that element i is >= element
j according to the sorting function, and that i's floating value
is x and j's is y. Then it's asserted that (double)x >= (double)y
but it might still be the case that y > x (using extra precision).
(Actually, this is precisely the case in the program at the end.)

With this approach the simple volatile solution is sufficient,
AFAICS.

For example, if

assert( y >= f() );

perhaps we are going to form the expression 'y - f()' and depend on
that value being non-negative.

Whatever it is that we're depending on should be exactly expressed by
what is in the assertion (or at least, should be implied by what is in
the assertion). If we write the assertion one way, and the later code
a different way, that won't be true - the assertion could succeed, and
the later code fail, or vice versa. So, whatever it is we do, the
assertion should be written in the very same form as the code that
the assertion is supposed to protect. Thus, if we have

static inline double
double_minus( volatile double a, volatile double b ){
return a - b;
}

/* make sure y - f() is non-negative */
assert( double_minus( y, f() ) >= 0 );

then the later code should use

difference = double_minus( y, f() );

and not

difference = y - f();

You see what I mean?
I guess so but it doesn't really apply to the DBC approach.
It seems better to try specify more general properties to hold
and try to make the properties as strong as possible. I am
having trouble seeing a real situation where there would
be a difference. Can you give a more concrete example?

That's related to my question about
reformulating the assertion expression, so that the assertion
expression and the subsequent code that depends on it are guaranteed
to be in sync, whatever it is that the subsequent code really needs to
guarantee. Because the subsequent code may have just the same
problems that the code in the assertion expression has.
Maybe the program example I'll post at the end will clear up what
I'm trying to do.

[snip]
I've been in touch with the person who is now chairing the IEEE-754
revision committee, and he's going to take up this whole question with
the committee. He's asked for a code sample that illustrates the
'y >= f()' problem; if you get that to me I'll forward it to him.
If it's too large to post please feel free to send it to me in email.


Alright, here is a sorting program. (I'll add the background to the
program in parenthesis. The program originates from a constraint
solver. You want to sort variables according to some strength
heuristic in order to branch on the strongest variables first.) The
program sorts an array a=[2,4,6...] according to a certain strength
measure. Each element (variable) p in 'a' has two strength measures
associated with it, str[p] and str[p+1] (the strength of p and its
negation -p respectively). The full strength measure of p (or rather
the pair (p,-p)) is then str[p]*str[p+1] (because the proof
derivations/search simplifications possible is proportional to the
product).

Without the volatile qualifiers, assertions fail on my x86. The output
I get with str defined as below is:

4 0x8p-1077 0x0.0000000000001p-1022
6 0x8p-2151 0x0p+0
8 0x8p-2151 0x0p+0
10 0x8p-2151 0x0p+0
2 0x8p-1079 0x0p+0

You can see that the array is not sorted w.r.t. 80-bit precision.

As said, I just cut out the program and tried to patch it up. It
seems to work but I haven't read through it yet. That will have
to wait until tomorrow though. This post is already too long as
it is. Let me know if something didn't make sense:-)
Daniel Vallstrom

/* Implementation of quicksort with floating point comparisons and
assertions somewhat akin to a Hoare or DBC style.
Daniel Vallstrom, 041102.
Compile with e.g: gcc -std=c99 -pedantic -Wall -O fpqsort.c -lm
*/
#include <stdio.h>
#include <assert.h>
#include <math.h>
#include <stdlib.h>
// The higher AssertionLevel is, the more expensive assertions are made.
#define AssertionLevel 5

#define assert3(p) assert(p)

// The main floating point type.
typedef double Strength;
// Swaps 'a' and 'b' using temp as temporary storage.
#define swapM( a, b, temp ) { (temp) = (a); (a) = (b); (b) = (temp); }
// Sorts a, b and c optimally using temp as temporary storage. f o ('+'s)
// will be applied to the elements to be sorted and then they are compared
// using <. The largest element will be placed first.
#define sort3FPM( a, b, c, temp, f, s ) \
{ \
if ( (f)((s)+(a)) < (f)((s)+(b)) ) \
{ \
if ( (f)((s)+(a)) < (f)((s)+(c)) ) \
{ \
if ( (f)((s)+(b)) < (f)((s)+(c)) ) \
{ \
swapM( (a), (c), (temp) ); \
} \
else \
{ \
(temp) = (a); \
(a) = (b); \
(b) = (c); \
(c) = (temp); \
} \
} \
else \
{ \
swapM( (a), (b), (temp) ); \
} \
} \
else \
{ \
if ( (f)((s)+(b)) < (f)((s)+(c)) ) \
{ \
if ( (f)((s)+(a)) < (f)((s)+(c)) ) \
{ \
(temp) = (a); \
(a) = (c); \
(c) = (b); \
(b) = (temp); \
} \
else \
{ \
swapM( (b), (c), (temp) ); \
} \
} \
} \
}

// Sorts the array a. f o ('+'s) will be applied to the elements to be
// sorted and then they are compared using <. The largest element will be
// placed first. s is an array of strength for each literal. n elements
// are sorted.
// For now we'll do a recursive quicksort.
static void srt( unsigned int * a, Strength (*f)( Strength * x ),
Strength * s, unsigned int n )
{
unsigned int tmp;

switch ( n )
{
case 0:
case 1:

break;

case 2:

if ( f( s+a[0] ) < f( s+a[1] ) )
{
swapM( a[0], a[1], tmp );
}

break;

case 3:

sort3FPM( a[0], a[1], a[2], tmp, f, s );

break;

default:

{
// Pointer to the right of the pivot. *c and all values to the
// right of c is <= p, the pivot value.
unsigned int * c = a + n - 1;

{
// Set up the pivot using the first, middle and last positions.

// Pivot pointer.
unsigned int * pPtr = a + n/2;

sort3FPM( *a, *pPtr, *c, tmp, f, s );

// The pivot value.
Strength p = f(s+*pPtr);
// Sort in halves w.r.t the pivot.

// Pointer to the left of the pivot. *b and all values to the
// left of b is >= p.
unsigned int * b = a;

while ( 1 )
{
// Search for a value < p to the left.
do
{
b++;
}
while ( f(s+*b) >= p && b != c );

if ( b == c )
{
#ifndef NDEBUG
// Because of a bug in gcc, a (Strength) cast doesn't
// ensure that excessive precision is stripped. Hence
// the use of the intermediate volatile variables.
volatile Strength xvol = (Strength)f( s + c[-1] );
volatile Strength pvol = (Strength)p;
assert( xvol >= pvol );
xvol = f( s + *c );
assert( xvol <= pvol );
#endif

break;
}

// Search for a value > p to the right.
do
{
c--;
}
while ( f(s+*c) <= p && c != b );

if ( c == b )
{
#ifndef NDEBUG
volatile Strength xvol = (Strength)f( s + c[-1] );
volatile Strength pvol = (Strength)p;
assert( xvol >= pvol );
xvol = f( s + *c );
assert( xvol <= pvol );
#endif

break;
}

swapM( *c, *b, tmp );
}
}

assert( c-a >= 1 );
assert( n - (c-a) >= 1 );

srt( a, f, s, c-a );
srt( c, f, s, n - (c-a) );
}

break;
}
}
// Sorts the array a. f o ('+'s) will be applied to the elements to be
// sorted and then they are compared using <. The largest element will be
// placed first. s is an array of strength for each literal. n elements
// are sorted.
static void sort( unsigned int * a, Strength (*f)( Strength * x ),
Strength * s, unsigned int n )
{
srt( a, f, s, n );

// Assert that a is sorted.
#if AssertionLevel >= 3 && !defined(NDEBUG)
{
unsigned int * aEnd = a + n;

if ( a != aEnd )
{
// The sum of all variables in a. Used to ensure with high
// probability that each variable occurs exactly once.
unsigned int sum = *a;

#if 1
printf( "\n %lu %La %a\n", (unsigned long int) *a,
(long double) f(s+a[0]), f(s+a[0]) ); // tmp print
#endif
for ( a++; a != aEnd; a++ )
{
volatile Strength xvol = (Strength)f(s+a[-1]);
volatile Strength yvol = (Strength)f(s+a[0]);
assert3( xvol >= yvol );
#if 1
printf( " %lu %La %a\n", (unsigned long int) *a,
(long double) f(s+a[0]), yvol ); // tmp print
#endif
sum += *a;
}

assert( 2*sum == n * ( 2*n + 2 ) );
}
}
#endif
}
// Returns a measure of the combined strength of p and -p given the individual
// strength of p and -p. varStrPtr[0] is the strength of the unnegated variable
// (p), varStrPtr[1] is the strength of the negated variable (-p).
static Strength litPairStrengthF( Strength * varStrPtr )
{
return varStrPtr[0] * varStrPtr[1];
}
int main( void )
{
// The number of elements to be sorted.
unsigned int n = 5;

// The array to be sorted. Contains the numbers 2, 4, 6, 8, ...
// a[i] is considered >= a[j] if litPairStrengthF( str + a[i] ) >=
// litPairStrengthF( str + a[j] ), where str is defined below.
unsigned int * a = malloc( sizeof *a * n );

Strength * str = malloc( sizeof *str * ( 2*n + 2 ) );

if ( a == NULL || str == NULL )
{
printf( "Error: not enough memory.\n" );
return EXIT_FAILURE;
}

// Init a.
for ( unsigned int k = 2, * b = a, * aEnd = a + n; b != aEnd; b++, k+=2 )
{
*b = k;
}

// Init str. This is arbitrary.
for ( Strength * s = str, * strEnd = str + 2*n + 2; s != strEnd; s++ )
{
*s = nextafter( 0.0, 1.0 );
}
str[2] = nextafter( 0.0, 1.0 );
str[3] = 0x1.0p-2;
str[4] = nextafter( 0.0, 1.0 );
str[5] = 0x1.0p0;

sort( a, litPairStrengthF, str, n );

return 0;
}
Nov 14 '05 #22

P: n/a
Mark McIntyre <ma**********@spamcop.net> wrote in message news:<61********************************@4ax.com>...
On 1 Nov 2004 09:36:22 -0800, in comp.lang.c , da**************@gmail.com
(Daniel Vallstrom) wrote:
Mark McIntyre <ma**********@spamcop.net> wrote in message news:<48********************************@4ax.com>...

For floating point values of y other than zero and some integers, this is a
doomed exercise. I realise you've already considered and discounted the
inherent inaccuracy of floats, but except in special circumstances, you
can't do this kind of comparison.


Sure you can.


Really? Go tell that to Goldberg et al.


No need. This was the closest quote from Goldberg on the subject
I could be bothered to find (Goldberg, What Every Computer
Scientist Should Know About Floating Point Arithmetic (1991), p42):

"Incidentally, some people think that the solution to such anomalies
is never to compare floating-point numbers for equality, but instead
to consider them equal if they are within some error bound E. This is
hardly a cure-all because it raises as many questions as it answers.
What should the value of E be? If x < 0 and y > 0 are within E, should
they really be considered to be equal, even though they have different
signs? Furthermore, the relation defined by this rule,
a ~ b iff |a - b| < E, is not an equivalence relation, because a ~ b and
b ~ c do not imply that a ~ c."
[snap]
Daniel Vallstrom
Nov 14 '05 #23

P: n/a
On 2 Nov 2004 15:28:23 -0800,
Daniel Vallstrom <da**************@gmail.com> wrote:


"Incidentally, some people think that the solution to such anomalies
is never to compare floating-point numbers for equality, but instead
to consider them equal if they are within some error bound E. This is
hardly a cure-all because it raises as many questions as it answers.
What should the value of E be? If x < 0 and y > 0 are within E, should
they really be considered to be equal, even though they have different
signs? Furthermore, the relation defined by this rule,
a ~ b iff |a - b| < E, is not an equivalence relation, because a ~ b and
b ~ c do not imply that a ~ c."


Then both x and y are within epsilon of zero so both should be practically
equal to zero and thus equal to each other. These problems are handled
in the mathematical discipline of numerical analysis, and it isn't really
trivial.
Villy
Nov 14 '05 #24

P: n/a
da**************@gmail.com (Daniel Vallstrom) wrote in message news:<62**************************@posting.google.com>...
Tim Rentsch <tx*@alumnus.caltech.edu> wrote in message news:<kf*************@alumnus.caltech.edu>...
[snips]
My next question is, what is it that you are really
trying to guarantee? Why are you writing the assertion in the first place? Presumably it's
to guarantee that some other piece of code that depends on that
assertion being true is going to execute at some point in the (perhaps
near) future.


To reiterate a bit more clearly, I'm guaranteeing what the assertion
says (e.g. (double)x>=(double)y), nothing more, nothing less.
[snip]

Regarding the program, I forgot to say exactly what values the elements
in the original floating point array (str) can take. The answer is *any*
positive value including 0 but not infinite. They can also be of some
special negative value.

[snip]
// Swaps 'a' and 'b' using temp as temporary storage.
#define swapM( a, b, temp ) { (temp) = (a); (a) = (b); (b) = (temp); }


Hmm, did I write that? ;) With proper use of curly braces it shouldn't
matter, but generally it's of course better to write swap macros like this:

#define swapM( a, b, temp ) ( (temp) = (a), (a) = (b), (b) = (temp) )

or

#define swapM( a, b, temp ) \
do { (temp) = (a); (a) = (b); (b) = (temp); } while(0)
Comments on the program in the parent post are welcome.
Daniel Vallstrom
Nov 14 '05 #25

P: n/a
da**************@gmail.com (Daniel Vallstrom) writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

[SNIP]

My next question is, what is it that you are really
trying to guarantee?


Together with what you write at the bottom of your reply it seems
that you are asking for a full-blown example containing the problem?
I'll try to cut out the relevant parts of the program with the problem
and patch it up to some smaller program.


Yes that's what I was asking. The posted program was very helpful,
just what I was looking for.

I'm due for a followup on this. I've gone about three rounds of
emails with the IEEE-754 folks. The answer is pretty simple once
it's boiled down.
What should happen: what SHOULD happen is that the program should
work, without any need of casting or any 'volatile' specifiers.
The reason for this is how C99 'double's are handled for assignment
and function return values. In particular,

y = ... some expression ...;

while( f(xyz) <= y ) ...

In both of these cases the values for y and f() should be 64 bit
values, because: assignment should cause a value to be converted to
64 bit double; and, a function return result should be converted to
64 bit double. Those rules are how C99 programs are supposed to work.
If these rules were followed, the program would (I'm pretty sure)
simply work, asserts included, with no casting or 'volatile' tricks
required.
What does happen: what DOES happen is that these conversions aren't
done according to specification. Part of the explanation for why this
is so is the x86 architecture - it doesn't[*] have a way to force
conversion from 80 bit to 64 bit except by storing in memory and then
loading again. Since this extra storing and loading causes horrible
performance, it normally isn't done (and standard be d***ed :). You
might fault gcc for taking this attitude, but it looks like plenty of
other compilers do exactly the same thing, which is to say, they have
a switch to make the "right" semantics happen but otherwise don't
necessarily follow the "right" semantics exactly.
[*] at least, not in all models.
What is a good way to fix it: at certain places we need to force
conversion to 64 bit doubles. This is best done (IMO) by function
encapsulation:

static inline double
double64( double d ){
    return * (volatile double *) &d;
}

and then using 'double64()' as needed in the application. For
the program posted, changing 'litPairStrengthF' so it reads

return double64( varStrPtr[0] * varStrPtr[1] );

seems to fix both the sorting behavior and the assert's -- that is,
the sorting order has changed (and looks right), and the assert's no
longer need 'volatile' to succeed.

Note that this change corresponds to making this function behave
according to the C99 semantics described by the IEEE-754 group.
Disclaimer: I speak for myself, not the IEEE-754 committee. I
believe my comments here are consistent with what I've heard from
them, but these are my comments not theirs.
(I have some other program comments but I think it's better to
do those off-line rather than in the NG. If you'd like those
Daniel please send me an email. Did you get the earlier email
I sent you?)
Nov 14 '05 #26

Christian Bau <ch***********@cbau.freeserve.co.uk> writes:
In article <kf*************@alumnus.caltech.edu>,
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
so************@yahoo.com (Mark Piffer) writes:
I agree, no amount of reformulation trickery will make his problems go
away if the flaw is in ignoring a basic rule for floating point
calculations: avoid doing arithmetic with denormalized numbers. The
result x is obviously denormalized (or even underflowed) at its point
of use and therefore the 64-bit representation vanishes into 0.0. For
a calculation which could end up in a range close to 0, do
yourself a favour and limit its result to machine precision (i.e. if
|x| < epsilon then x = 0).


It's true that the result here is denormalized in a sense (it's as
normalized as it's possible for this value to be, given how floating
point numbers are represented). But the larger problem is more
pervasive than just numbers close to zero, or denormalized numbers.

What is the right way to handle the larger problem?


You could have a look at the Java language specification, where this
problem is handled. The first Java implementations required that all
double operations are actually done in double precision and not with
higher precision or a higher exponent range. A very reasonable demand
which unfortunately made correct implementations on x86 10 times slower
than incorrect ones in extreme cases. So now Java has "strict fp" and
"non-strict fp". In the "non-strict" case, an extended exponent range is
allowed in absolutely well-defined situations.


You've misunderstood the problem I was asking about. I know about
strict fp in Java. It's perfectly easy (if not always especially
convenient) to get strict fp semantics in C. The question is about
something altogether different, viz., how to get good results even
in the presence of non-deterministically differing precisions at
different points in program execution.
Nov 14 '05 #27

Tim Rentsch wrote:
What is a good way to fix it: at certain places we need to force
conversion to 64 bit doubles. This is best done (IMO) by function
encapsulation:

static inline double
double64( double d ){
    return * (volatile double *) &d;
}

and then using 'double64()' as needed in the application.
Agreed, this is the cleanest solution. What one really wants is to
cast a double to double, e.g. (double)x, and have the cast discard
the extra precision. Since that isn't working, something like
double64(x) --- or perhaps forceDoubleCast(x) --- seems like
the next best thing.
For
the program posted, changing 'litPairStrengthF' so it reads

return double64( varStrPtr[0] * varStrPtr[1] );

seems to fix both the sorting behavior and the assert's -- that is,
the sorting order has changed (and looks right), and the assert's no
longer need 'volatile' to succeed.
One could perhaps do that but it would be different from
the original program. The straightforward way would be to
replace all the constructions involving volatile in the
original program with a double64() construction. After all,
all the volatile constructions in the original program are
nothing but more elaborate ways of doing a double64 call
(or a double cast if that would have worked).

Casting already in the 'litPairStrengthF' function might
not be a good idea for a couple of reasons:

(1) The function could be supplied by an external user and
you don't want her to have to bother with subtle double64
calls. (You could add a double64 wrapper call to all the
litPairStrengthF calls though, but see (2).)

(2) The function is or could be used in other, more time
critical, code. Doing the cast inside the function could,
as you say, slow down the real program too much. At the
very least it would slow down the sorting. You only
want to force the casts when it is necessary.

(Of course, you had no way of knowing about this, not (1) at least.)

With your double64 call in litPairStrengthF, the sorting order
changes only by accident. One can't guarantee in this way that the
values also get sorted w.r.t. their 80-bit values. (But I don't care
about that.) E.g. with the initialization below (in addition to
the str init loop), the values still don't get 80 bit sorted even
with your double64 addition in 'litPairStrengthF':

str[7] = 0x1.0p-2;
str[5] = 0x1.0p0;
(I have some other program comments but I think it's better to
do those off-line rather than in the NG. If you'd like those
Daniel please send me an email. Did you get the earlier email
I sent you?)


Feel free to post them here or, if you think that's more appropriate,
email me. I got your mail.
Daniel Vallstrom
Nov 14 '05 #28
