By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
458,124 Members | 1,569 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 458,124 IT Pros & Developers. It's quick & easy.

Re: long double precision

P: n/a
James Kanze <ja*********@gmail.comwrites:
On Nov 16, 10:55 pm, "BobR" <removeBadB...@worldnet.att.netwrote:
>Markus Moll wrote in message...
What does MSVC++ say about sizeof(long double) vs sizeof(double)?
>I asked My Second Virtual Cousin (twice added), and he said nothing! <G>
>In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
not even close to "twice the size" (If we're talking number of bits).
[ assembler == a386 ]

And g++ on a PC, at least in some configurations, uses 8 and 12
bytes.

At the hardware level, there are 10 bytes of information in an
Intel long double.
True, with some additional considerations. The commonly used IEEE 754
floating point formats are

single precision: 32 bits including 1 sign bit, 23 significand bits
(with an implicit leading 1, for 24 total), and 8 exponent bits

double precision: 64 bits including 1 sign bit, 52 significand bits
(with an implicit leading 1, for 24 total), and 11 exponent bits

double extended precision: 80 bits including 1 sign bit, 64 significand
bits (no implicit leading 1), and 15 exponent bits.

The native format of the x87 FPU is "double extended". We run into this
from time to time when floating point computations compile with more
optimization give slightly different results. The issue is that one
optimiziation is to hold intermediate results in 80-bit FPU registers
instead of rounding them down to fit in 64-bit memory locations. gcc
offers the -ffloat-store to suppress this optimization for that very
reason.

In addition, the x87 control register allows one to select between
extended (the default), double and single precision. Try this (on
Linux/gcc/glibc), for example:

#include <fpu_control.h>

void set_double(void)
{
unsigned short cw;
_FPU_GETCW(cw);
cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);
}

and similarly for "set_extended". Note that this will make all kinds of
trouble for you because libm depends on having the FPU in extended
precision mode.

Now comes the real kicker: the SSE/SSE2/SSE3 vector co-processors do not
support double extended precision; only single and double precision.
gcc and icc are both using these co-processors pretty extensively now,
so you're not guaranteed to get even intermediate results done in
double-extended arithmetic.

Going back on-topic, if you want to peek at the binary representations
of IEEE floating-point numbers, you might enjoy the template included
below. I supply typedefs for single and double precision, writing the
typedef for double extended precision is left as an exercise to the
reader.

// IEEE Floating-Point template
// Copyright (C) 2007 Charles M. "Chip" Coldwell <co******@gmail.com>

// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

#ifndef IEEEFLOAT_HH
#define IEEEFLOAT_HH

template
<typename _float_t, typename _sint_t, typename _uint_t, int _mbits, int _ebits>
class ieee_float {
public:
typedef _uint_t uint_t;
typedef _sint_t sint_t;
typedef _float_t float_t;
enum { mbits = _mbits, ebits = _ebits };

#ifdef __BIG_ENDIAN__
uint_t s:1;
uint_t e:ebits;
uint_t m:mbits;
#else
uint_t m:mbits;
uint_t e:ebits;
uint_t s:1;
#endif

static const uint_t mdenom = ((uint_t)1 << mbits);
static const uint_t ebias = ((uint_t)1 << (ebits - 1)) - 1;

sint_t sign(void) const { return 1 - 2*s; }
sint_t exponent(void) const { return e ? e - ebias : -(ebias - 1); }
uint_t mantissa(void) const { return ((uint_t)(!!e) << mbits) | m; }
bool infinity(void) const { return (e == ((1 << ebits) - 1)) && (m == 0); }
bool nan(void) const { return (e == ((1 << ebits) - 1)) && (m != 0); }
bool denormal(void) const { return (e == 0) && (m != 0); }

ieee_float(float_t f) { *reinterpret_cast<float_t *>(this) = f; }
ieee_float(uint_t u) { *reinterpret_cast<uint_t *>(this) = u; }

operator float_t() const { return *reinterpret_cast<const float_t *>(this); }
operator uint_t() const { return *reinterpret_cast<const uint_t *>(this); }
};

typedef ieee_float<float, int, unsigned, 23, 8>
single_precision;
typedef ieee_float<double, long long, unsigned long long, 52, 11>
double_precision;

#endif

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
GPG Key ID: 852E052F
GPG Key Fingerprint: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F
Jun 27 '08 #1
Share this question for a faster answer!
Share on Google+

This discussion thread is closed

Replies have been disabled for this discussion.