
# x/x == 1.0 in double arithmetic

Tobias wrote:

Let:

    double x, y;
    double z = max(x, y);

Does the standard ensure that

    ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

With best regards, Tobias
Aug 24 '07 #1
9 Replies

Daniel wrote:

> Does the standard ensure that ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 ) always gives true?

I'm not that proficient with IEEE 754 floating-point arithmetic, but AFAIK this is wrong if x or y is positive infinity; there may well be other such problems with "special values".

Cheers, Daniel
Aug 24 '07 #2

V wrote:

> double x, y;

Uninitialised. They contain indeterminate values, possibly ones that can cause a hardware exception. And, of course, there is no guarantee that x and y contain the same indeterminate value.

> double z = max(x, y);

Passing indeterminate values to 'max' probably leads to undefined behaviour.

> Does the standard ensure that ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 ) always gives true?

No guarantees, because indeterminate values are used.

V
Aug 24 '07 #3

Erik Wikström wrote:

> Does the standard ensure that ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 ) always gives true?

No. The first condition requires that max(x, y) be exactly zero, i.e. that at least one of x and y is zero and neither is positive, which the standard cannot guarantee. The second and third conditions require exact arithmetic to always be true; because of the way floating-point operations work this might not always be the case, but the result will be close.

Erik Wikström
Aug 24 '07 #4

Tobias wrote:

> Passing indeterminate values to 'max' probably leads to undefined behaviour.

I'm sorry, I should have written:

    bool foo(double x, double y)
    {
        double z = max(x, y);
        return ( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 );
    }

That makes the real question more obvious. Thanks for the comment anyway.

With best regards, Tobias
Aug 24 '07 #5
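For reference, a self-contained version of Tobias's predicate (a sketch; `std::max` from `<algorithm>` stands in for the unqualified `max`):

```cpp
#include <algorithm>

// Tobias's predicate, made self-contained. z is the larger input, so on an
// IEEE 754 platform (and absent extended-precision surprises) x/z or y/z is
// a finite non-zero value divided by itself, which should be exactly 1.0.
bool foo(double x, double y)
{
    double z = std::max(x, y);
    return (z == 0.0) || (x / z == 1.0) || (y / z == 1.0);
}
```
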

Tobias wrote:

> AFAIK this is wrong if x or y is positive infinity; there may well be other such problems with "special values".

Thanks for the comment. In my particular application infinity will not occur.

With best regards, Tobias
Aug 24 '07 #6

Tobias wrote:

> The second and third conditions require exact arithmetic to always be true; because of the way floating-point operations work this might not always be the case, but the result will be close.

That is the real question. I'm especially interested in the case where x and y become very small. Somebody assured me that dividing a double by the same double (in the same representation) always gives exactly the double 1.0. But might there be a problem with normalization, or with the representation of the double numbers in the registers of the numerical core?

Best regards, Tobias
Aug 24 '07 #7
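The "very small" case can be probed directly. A sketch, assuming IEEE 754 doubles: for any finite non-zero x, the mathematically exact value of x/x is 1, and 1.0 is representable, so correctly rounded division must return exactly 1.0, even for subnormal ("denormalized") values:

```cpp
#include <limits>

// Sketch (assumes IEEE 754 doubles): x / x is exactly 1.0 for any finite
// non-zero x, including subnormals, because the exact quotient 1 is
// representable and division is correctly rounded.
bool self_division_is_one(double x)
{
    return x / x == 1.0;
}

// Smallest positive subnormal and smallest positive normal double.
const double tiny   = std::numeric_limits<double>::denorm_min();
const double normal = std::numeric_limits<double>::min();
```

The one exception is x == 0.0: then 0/0 is NaN, which compares unequal to everything, including 1.0.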

On Fri, 24 Aug 2007 11:13:05 -0700, Markus Schoder wrote:

> Somebody assured me that dividing a double by the same double (in the same representation) always gives exactly the double 1.0.

That is true for IEEE 754 conforming arithmetic. Basic arithmetic operations must be performed with exact rounding, i.e. conceptually the operation is carried out with infinite precision and then rounded (using the current rounding mode) to the nearest representable number.

> But might there be a problem with normalization or with representation of the double numbers in the registers of the numerical core?

Since on the Intel x87 architecture double and extended precision are mixed, and the actual CPU instructions make it very hard to avoid this, most IEEE 754 guarantees are moot (thank you Intel). For stable numerical behaviour you need to avoid the x87 unit, or force your compiler to use the newer SSE2 instruction set exclusively for floating-point arithmetic, if possible.

Markus Schoder
Aug 24 '07 #8
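Whether Markus's warning applies to a given build can be checked from code: C99 (and C++11's `<cfloat>`) defines the macro `FLT_EVAL_METHOD` for exactly this. A minimal sketch:

```cpp
#include <cfloat>

// FLT_EVAL_METHOD reports how the implementation evaluates floating-point
// expressions: 0 = each operation in the precision of its type (typical
// with SSE2), 1 = float promoted to double, 2 = everything evaluated as
// long double (typical with x87), -1 = indeterminate.
int float_eval_method()
{
    return FLT_EVAL_METHOD;
}
```

A value of 0 means intermediate results are not silently widened, so comparisons like `x/z == 1.0` behave as the IEEE 754 model predicts.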

In article <13*************@corp.supernews.com>, "Alf P. Steinbach" wrote:

> ( x/z == 1.0 )
>
> Unless x and y are special values, with modern floating point implementations this is guaranteed. Floating point division, with a modern floating point implementation, isn't inherently inexact: it just rounds the result to some specific number of binary digits. Hence x/x, with normal non-zero numbers, produces exactly 1.

I had a funny experience that allows me to say that the number of binary digits is not so specific, but can vary unexpectedly =)

I had this compare predicate which gave me funny assertions when used in combination with particular optimizations (unchecked code, just to give the idea):

    struct Point { double x, y; };
    bool anglesorter(Point a, Point b)
    {
        return atan2(a.y, a.x) < atan2(b.y, b.x);
    }

which, passed to a sort function, led to assertions when a point in the sequence was replicated. The assertion (inserted by Microsoft as a check for the predicates) has this condition:

    pred(a, b) && !pred(b, a)

which is quite funny. After disassembling I found out that one of the two atan2 results was kept in an x87 register (80-bit precision), while the other was stored in memory (64-bit precision), so anglesorter(a, a) would return true for some values of a.

Now, this doesn't look the same as the OP's example, but I wouldn't trust the floating-point optimizations again: what happens if foo gets inlined, its parameters are in extended-precision registers, but the result of max, for some odd reason, does get trimmed to double precision?

I must admit I don't like the alternative version either:

    x/z == 1.0

becomes

    fabs(x/z - 1.0) < numeric_limits<double>::epsilon()

or, avoiding divisions:

    fabs(x - z) < numeric_limits<double>::epsilon() * z

because they are both unreadable. I'd try to avoid the need for such a compare in the first place.

Marco
Aug 25 '07 #9
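One way to "avoid the need for such a compare in the first place", as Marco suggests, is to not compare transcendental results at all. A sketch of my own (not from the thread): order angles by half-plane, then by the sign of the cross product. Identical inputs give cross == 0, so the predicate is irreflexive by construction, regardless of register precision:

```cpp
struct Point { double x, y; };

// Angle comparison without atan2. half 0 covers angles in [0, pi),
// half 1 covers [pi, 2*pi). Within a half-plane, a positive cross product
// means a comes strictly before b (counter-clockwise). angle_less(a, a)
// yields cross == 0 and returns false, keeping the strict-weak-ordering
// check happy. (A full version would also break ties between collinear
// points, e.g. by distance from the origin.)
bool angle_less(Point a, Point b)
{
    auto half = [](Point p) {
        return (p.y < 0.0 || (p.y == 0.0 && p.x < 0.0)) ? 1 : 0;
    };
    int ha = half(a), hb = half(b);
    if (ha != hb) return ha < hb;
    double cross = a.x * b.y - a.y * b.x;
    return cross > 0.0;
}
```

Such a comparator can be handed to `std::sort` without tripping the debug predicate check Marco describes.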

On 2007-08-25 07:44:49 -0400, Pete Becker wrote, replying to "Alf P. Steinbach":

>> After disassembling I found out that one of the two atan2 results was kept in an x87 register (80-bit precision), while the other was stored in memory (64-bit precision), so anglesorter(a, a) would return true for some values of a. Now, this doesn't look the same as the OP's example, but I wouldn't trust the floating-point optimizations again: what happens if foo gets inlined, its parameters are in extended-precision registers, but the result of max, for some odd reason, does get trimmed to double precision?
>
> This sounds like a case of not really having established the reason; e.g., with a clearly established reason you'd be able to say that under such and such conditions, x == x would yield false (which it does for IEEE NaN). That said, Visual C++ 6.0 or thereabouts had notoriously unreliable floating point optimization. If I had to guess, I'd guess that ten-year-old compiler, but simply encountering a NaN or two without understanding it could also be the Real Problem.

But this example isn't x == x, it's anglesorter(x, x), where anglesorter applies the same computation to both of its arguments. C and C++ both allow the implementation to do computations involving floating-point values at higher precision than that of the actual type. For x87, this means that computations involving doubles (64 bits) can be done at the full width of the processor (80 bits). When you compare the result of atan2 computed at 80 bits with the result of atan2 rounded to 64 bits, the results will almost certainly be different. And that's allowed by the language definition (in C99 you can check how the implementation handles this by looking at the value of the macro FLT_EVAL_METHOD).

What isn't allowed is hanging on to the extra precision across a store:

    double x1 = atan2(a.y, a.x);
    double x2 = atan2(b.y, b.x);
    return x1 < x2;

In this case, the values of both x1 and x2 must be rounded to double, even if the compiler elides the actual store and holds them in registers. Many compilers skip this step, because it's an impediment to optimization, unless you set a command-line switch to tell the compiler that you really mean it.

Pete Becker
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and Reference" (www.petebecker.com/tr1book)
Aug 25 '07 #10
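Pete's "command-line switch" varies by compiler (gcc's is `-ffloat-store`; that flag name comes from gcc's documentation, not this thread). When the switch is unavailable, a common workaround is to make the temporaries `volatile`, which forces the compiler to perform the 64-bit stores instead of keeping the values in 80-bit registers. A sketch under that assumption:

```cpp
#include <cmath>

// Variant of the thread's anglesorter that forces both atan2 results
// through an actual 64-bit store before comparing. The volatile qualifier
// compels the compiler to spill x1 and x2 to memory, rounding away any
// extra x87 precision, so both sides of the comparison are plain doubles.
// (Costs extra stores/loads per call; a sketch, not tuned.)
bool angle_less_stored(double ax, double ay, double bx, double by)
{
    volatile double x1 = std::atan2(ay, ax);
    volatile double x2 = std::atan2(by, bx);
    return x1 < x2;
}
```

With this version, identical inputs compare identical values of the same width, so the predicate is never true for a replicated point.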
