Bytes IT Community

x/x=1.0 in double-Arithmetik

Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

With best regards,
Tobias

Aug 24 '07 #1
9 Replies


Tobias wrote:
Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?
I'm not that proficient with IEEE 754 floating point arithmetic, but
AFAIK this is wrong if x or y is positive infinity; there may well be
other such problems with "special values".

Cheers,
Daniel

--
Got two Dear-Daniel-Instant Messages
by MSN, associate ICQ with stress--so
please use good, old E-MAIL!
Aug 24 '07 #2

Tobias wrote:
Let:

double x,y;
Uninitialised. They contain indeterminate values, possibly ones that
can cause a hardware exception. And, of course, there is no
guarantee that 'x' and 'y' contain the same indeterminate value.
double z = max(x,y);
Passing indeterminate values to 'max' probably leads to undefined
behaviour.
Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?
No guarantees because of indeterminate values used.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 24 '07 #3

On 2007-08-24 18:47, Tobias wrote:
Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?
No. The first comparison requires that x or y is zero and the other is
not greater than zero, which the standard cannot guarantee for arbitrary
inputs.

The second and third require exact arithmetic to always be true; due to
the way floating point operations round, this might not always be the
case, but it will be close.

--
Erik Wikström
Aug 24 '07 #4

Passing indeterminate values to 'max' probably leads to undefined
behaviour.
I'm sorry, I should have written:

#include <algorithm>

bool foo(double x, double y)
{
    double z = std::max(x, y);
    return ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 );
}

That would make the real question more obvious.

Thanks for the comment anyway.

With best regards,
Tobias

Aug 24 '07 #5

AFAIK this is wrong if x or y is positive infinity; there may well be
other such problems with "special values".
Thanks for the comment.
In my special application infinity will not occur.

With best regards,
Tobias
Aug 24 '07 #6

The second and third require exact arithmetic to always be true; due to
the way floating point operations round, this might not always be the
case, but it will be close.
That is the real question. I'm especially interested in the case when
x and y become very small.

Somebody assured me that dividing a double by the same double (in the
same representation) always gives exactly the double 1.0.

But might there be a problem with normalization or with representation
of the double numbers in the registers of the numerical core?

Best regards,
Tobias

Aug 24 '07 #7

On Fri, 24 Aug 2007 11:13:05 -0700, Tobias wrote:
>The second and third require exact arithmetic to always be true; due to
the way floating point operations round, this might not always be the
case, but it will be close.

That is the real question. I'm especially interested in the case when x
and y become very small.

Somebody assured me that dividing a double by the same double (in the
same representation) always gives exactly the double 1.0.
That is true for IEEE 754 conforming arithmetic. Basic arithmetic
operations must be correctly rounded, i.e. conceptually the
operation is carried out with infinite precision and then rounded (using
the current rounding mode) to a representable number.
But might there be a problem with normalization or with representation
of the double numbers in the registers of the numerical core?
Since on the Intel x87 architecture double and extended precision are
mixed, and the actual CPU instructions make it very hard to avoid this,
most IEEE 754 guarantees are moot (thank you, Intel).

For stable numerical behaviour you need to avoid the x87 architecture,
or force your compiler to use the newer SSE2 instruction set exclusively
for floating point arithmetic, if possible.

--
Markus Schoder
Aug 24 '07 #8

pan
In article <13*************@corp.supernews.com>, "Alf P.
Steinbach" <al***@start.no> wrote:
* Tobias:
> ( x/z == 1.0 )
Unless x and y are special values, with modern floating point
implementations this is guaranteed. Floating point division, with a
modern floating point implementation, isn't inherently inexact: it
just rounds the result to some specific number of binary digits.
Hence x/x, with normal non-zero numbers, produces exactly 1.
I had a funny experience that allows me to say that the number of
binary digits is not so specific, but can vary unexpectedly =)

I had this compare predicate which gave me funny assertions when used
in combination with particular optimizations:
(unchecked code, just to give the idea)

#include <cmath>

struct Point {
    double x, y;
};

bool anglesorter(Point a, Point b) {
    return std::atan2(a.y, a.x) < std::atan2(b.y, b.x);
}

which, passed to a sort function, led to assertions when a point in
the sequence was replicated. The assertion (inserted by Microsoft as a
check for the predicates) has this condition:

pred(a, b) && !pred(b, a)

which is quite funny. After disassembling I found out that one of the
two atan2 calls was kept in an x87 register (80-bit), while the other
was stored in memory (64-bit precision), so the result of

anglesorter(a, a)

would give true for some values of a.

Now, this doesn't look the same as the OP's example, but I wouldn't
trust the floating point optimizations again: what happens if foo gets
inlined, its parameters are in extended-precision registers, but the
result of max, for some odd reason, does get trimmed to
double precision?

I must admit I don't like the alternative version either:

x/z == 1.0

becomes

fabs(x/z-1.0) < numeric_limits<double>::epsilon()

or, avoiding divisions:

fabs(x-z) < numeric_limits<double>::epsilon()*z

because they are both unreadable.
I'd try to avoid the need for such a comparison in the first
place.
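If the comparison is unavoidable, the two quoted formulas read better wrapped in named helpers (the names near_one and near_eq are inventions for readability, not code from the thread; note the division-free variant assumes z is positive):

```cpp
#include <cmath>
#include <limits>

// fabs(x/z - 1.0) < epsilon, as quoted above
bool near_one(double x, double z) {
    return std::fabs(x / z - 1.0) < std::numeric_limits<double>::epsilon();
}

// fabs(x - z) < epsilon * z, the division-free variant (assumes z > 0)
bool near_eq(double x, double z) {
    return std::fabs(x - z) < std::numeric_limits<double>::epsilon() * z;
}
```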

--
Marco

--
I'm trying a new usenet client for Mac, Nemo OS X.
You can download it at http://www.malcom-mac.com/nemo

Aug 25 '07 #9

On 2007-08-25 07:44:49 -0400, "Alf P. Steinbach" <al***@start.no> said:
* pan:
which is quite funny. After disassembling i found out that one of the
two atan2 was kept in a x87 register (80-bit), while the other was
stored inmemory (64-bit precision), so the result of

anglesorter(a, a)

would give true for some values of a.

Now, this doesn't look the same as the OP's example, but I wouldn't
trust the floating point optimizations again: what happens if foo gets
inlined, its parameters are into extended precision register, but the
result ofmax, for some odd reason, does get trimmed to
double-precision?

This sounds like a case of not really having established the reason;
e.g., with a clearly established reason you'd be able to say that under
such and such conditions, x == x would yield false (which it does for
IEEE NaN). That said, Visual C++ 6.0 or thereabouts had notoriously
unreliable floating point optimization. If I had to guess, I'd guess it
was that ten-year-old compiler, but simply encountering a NaN or two
without understanding it could also be the Real Problem.
But this example isn't x == x, it's anglesorter(x, x), where
anglesorter applies the same computation to both of its arguments. C
and C++ both allow the implementation to do computations involving
floating point values at higher precision than that of the actual type.
For x87, this means that computations involving doubles (64 bits) can
be done at full width for the processor (80 bits). When you compare the
result of atan2 computed at 80 bits with the result of atan2 rounded to
64 bits, the results will almost certainly be different. And that's
allowed by the language definition (in C99 you can check how the
implementation handles this by looking at the value of the macro
FLT_EVAL_METHOD).

What isn't allowed is hanging on to the extra precision across a store:

double x1 = atan2(a.y, a.x);
double x2 = atan2(b.y, b.x);
return x1 < x2;

In this case, the values of both x1 and x2 must be rounded to double,
even if the compiler elides the actual store and holds them in
registers. Many compilers skip this step, because it's an impediment to
optimization, unless you set a command line switch to tell the compiler
that you really mean it.
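One way to insist on that rounding without compiler switches is to force each intermediate through a volatile double, which obliges the compiler to materialize a 64-bit store (a sketch; round_to_double and this Point layout are assumptions, not code from the thread):

```cpp
#include <cmath>

struct Point { double x, y; };

// Storing through a volatile double forces the value out of any
// 80-bit x87 register and rounds it to 64-bit double precision.
inline double round_to_double(double v) {
    volatile double tmp = v;
    return tmp;
}

bool anglesorter(Point a, Point b) {
    double x1 = round_to_double(std::atan2(a.y, a.x));
    double x2 = round_to_double(std::atan2(b.y, b.x));
    return x1 < x2;  // pred(p, p) is now reliably false
}
```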

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: A Tutorial and Reference"
(www.petebecker.com/tr1book)

Aug 25 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.