Bytes IT Community

float? double?

hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?

thanks
Erick

Aug 18 '06 #1
60 Replies


Erick wrote:
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?
That's a fairly open-ended question.

However, considering that C floating-point math is conducted in
double precision (with conversions to and from single precision only to
store or retrieve values from variables), and that (for certain kinds
of function call) single-precision floating-point values ("float") are
promoted to double-precision values ("double") automagically, I'd have
to say that the only benefit float has over double is that float may
take less storage space than double.

For my money, unless space is an issue, it's probably better to stick
with double as the default floating-point format for your code.

Just my 2 cents' worth.
--
Lew Pitcher

Aug 18 '06 #2

In article <11*********************@m79g2000cwm.googlegroups.com>,
Erick <er*********@yahoo.es> wrote:
>I've read some lines about the difference between float and double
>data types... but, in the real world, which is best? When should we
>use float or double?

Which is best, a pickup truck, or a half-tonne truck?
float and double are defined in terms of minimum precision allowed
for each. On any given platform, it is not required that there is
*any* difference between the two: if the float data type meets the
minimum requirements that C imposes on the double data type,
then the two could be exactly the same.

Traditionally, float was faster than double but offered less
precision. double never offers -less- precision than float, but
these days, it is not uncommon to find computers on which double
is as fast (or faster than) float. Also, float never occupies -more-
permanent storage than does double, and sometimes memory size
is the biggest factor (but these days you usually just go out and
buy more memory if you need it.)

There are also still computers which do not implement either float or
double in hardware, so from a speed perspective, sometimes both
are significantly worse than using integer arithmetic instead. But
there are also numerous computers these days on which integer arithmetic
is slower than double -- computers being sold into markets where
(say) 95% of the operations requiring speed are likely to be
floating point operations, so the development resources are spent
primarily on accelerating floating point. Thus, from a speed
perspective, you cannot trust that floating point will be either
slower or faster than integer arithmetic. But there -are- times
when using integer arithmetic can be absolutely crucial for
preserving required accuracy.

All of which is to say, "it depends" ;-)
Size, speed, precision: for any given task, any of them might be
the key factor. Speed is particularly variable: a CPU upgrade
without changing anything else might completely alter the speed factors.
--
If you lie to the compiler, it will get its revenge. -- Henry Spencer
Aug 18 '06 #3

In article <11**********************@m79g2000cwm.googlegroups.com>,
Lew Pitcher <lp******@sympatico.ca> wrote:
>However, considering that C floating-point math is conducted in
double-precision (with conversions to and from single precision only to
store or retrieve values from variables),
That's incorrect, at least for C89 (I haven't checked C99.)

The "usual arithmetic conversions" choose long double if necessary,
then double if necessary, then:

Otherwise, if either operand has type float, the other operand
is converted to float.

The descriptions of operations such as binary + refer to the
usual arithmetic conversions, but definitely do NOT say that
float is promoted to double for the purposes of the calculation.
--
If you lie to the compiler, it will get its revenge. -- Henry Spencer
Aug 18 '06 #4

Erick wrote:
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?
float would be used when better performance is required (faster, or less
storage used). double, on most current platforms, gives more range and
precision, and generally requires less care about numerical issues.
Aug 18 '06 #5

Tim Prince wrote:
Erick wrote:
>hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?
float would be used when better performance is required (faster, or less
storage used).
Be certain that this is true for your platform before making such a
decision. It's entirely possible that your FPU datapath is more
efficient with doubles than with a shorter format. It's also possible
that float and double are the same datatype.

I would suggest the possibility that for a given application, increased
precision or range could conceivably be a liability. And in projects
I've worked on, I've seen routines that were dependent on assumptions
about the internal representation of the data (bad, not my fault, not in
my power to fix).

Anyway, check before assuming that float is more efficient than double.
It's entirely possible that the assumptions are backwards from the
results.
double, on most current platforms, gives more range and
precision, and generally requires less care about numerical issues.
I don't know about "most current platforms" because every time I make an
assumption about that, I'm reminded that what I think is the predominant
platform may be far from correct... I would seriously consider checking
into whether the current ARM chips have double fpu registers, and how
the compilers for that platform define float and double...

Now granted, in my tiny corner of the world, floats are 4 bytes and
doubles are 8 and long doubles 12.
Shoot me if I ever write anything that depends on that. (Wait, no, I
*have to* do that still, don't shoot.)
Aug 18 '06 #6

Erick wrote:
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?

thanks
Erick
double means DOUBLE PRECISION. Note the word precision here.

When you want more precision, use double precision, and if that
doesn't cut it use long double.

Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that; the Intel/AMD implementation gives you 18.

If precision is not important (you are only interested
in a rough approximation) float is great.

Aug 18 '06 #7

jacob navia wrote:
double means DOUBLE PRECISION. Note the word precision here.

When you want more precision, use double precision, and if that
doesn't cut it use long double.

Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that, using Intel/Amd implementation gives you 18.

I'm not up to date on the ISO specs, but I don't remember any
requirements like the ones you mention; only that long doubles be at
least as long as doubles, which themselves must be at least as long as
floats, and that there is a minimum range of -10^37 through 10^37.

It may be reasonable to assume IEEE754 on some (very common) platforms,
but is it strictly compliant to do so?
Aug 18 '06 #8

jmcgill wrote:
jacob navia wrote:
>double means DOUBLE PRECISION. Note the word precision here.

When you want more precision, use double precision, and if that
doesn't cut it use long double.

Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that, using Intel/Amd implementation gives you 18.

I'm not up to date on the ISO specs, but I don't remember any
requirements like the ones you mention; only that long doubles be at
least as long as doubles, which themselves must be at least as long as
floats, and that there is a minimum range of -10^37 through 10^37.

It may be reasonable to assume IEEE754 on some (very common) platforms,
but is it strictly compliant to do so?
Do not confuse the C99 standard (ISO) with IEEE 754 (floating point).
Aug 18 '06 #9

jacob navia wrote:
Do not confuse C99 standard (ISO) and IEEE 754 (floating point)
I wouldn't, or at least, I would be very explicit about my intentions if
doing so. But I've seen comments on this thread today that kind of
scare me. (By "kind of", I mean, I don't really give a damn, since none
of the posters work for me or are my students ;-)
Aug 18 '06 #10

Erick wrote:
>
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best?
When should we use float or double?
Use float, when you want the smallest floating point type.
In C89, use long double, when you want the floating point type
with the greatest range.
Use double, all the rest of the time.

--
pete
Aug 19 '06 #11

>I've read some lines about the difference between float and double
>data types... but, in the real world, which is best? When should we
>use float or double?
I've read about the difference between rat poison, condoms,
and sunscreen. In the real world, which is the best?

It depends on what you're trying to do.

Typical floats have less than 7 digits of precision (and they aren't
guaranteed to have more than 6). If you're planning on doing
accounting with amounts up to 10 million dollars and expect the
total of a bunch of numbers to be accurate to a penny for the
accountants, don't use floats. (Also, use integer quantities of
cents whether or not you use a variable type of int or float or
double or long double). Use doubles, long doubles, or long longs.

If you are attempting to store large quantities of shoe sizes, which
for normal people don't need much in precision but keeping them all
in memory is a requirement, use float (or perhaps use short with
the size multiplied by 10).

Performance issues are muddled. It is not guaranteed that the
performance of float is worse than or better than or about the same
as the performance of doubles. Yep, on some systems the performance
of floats is worse (convert to double, do operation, convert back),
on some it's better (less bits to worry about), and on some it's
the same, and on some it's worse, better, or the same depending on
what you are trying to do.

Aug 19 '06 #12

jmcgill <jm*****@email.arizona.edu> writes:
jacob navia wrote:
>double means DOUBLE PRECISION. Note the word precision here.
When you want more precision, use double precision, and if that
doesn't cut it use long double.
Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that, using Intel/Amd implementation gives you 18.
I think the name "double" probably comes from Fortran, where "DOUBLE
PRECISION" is exactly twice the size of "FLOAT". (I'm not much of a
Fortran person, so I could easily be mistaken.)
I'm not up to date on the ISO specs, but I don't remember any
requirements like the ones you mention; only that long doubles be at
least as long as doubles, which themselves must be at least as long as
floats, and that there is a minimum range of -10^37 through 10^37.

It may be reasonable to assume IEEE754 on some (very common) platforms,
but is it strictly compliant to do so?
IEEE 754 is certainly the most common set of floating-point formats,
but it's not required by the C standard; C implementations exist on a
number of platforms with other FP formats (IBM, Cray, VAX, etc.).

The C standard requires:
FLT_DIG >= 6
DBL_DIG >= 10
LDBL_DIG >= 10

FLT_MAX >= 1E+37
DBL_MAX >= 1E+37
LDBL_MAX >= 1E+37

Obviously IEEE 754 exceeds these minimal requirements.

Note that double and long double, to be conforming, need at least 40
bits (if I've done the math correctly); float could be as few as 27 or
so. An implementation with 32-bit float and 64-bit double and long
double could easily be conforming. (There's been some confusion about
this with some compilers not supporting long double, when they could
easily have just given it the same characteristics as double.)

But it's not at all obvious that float is going to be *faster* than
double. The common wisdom, as I understand it, is to use double
rather than float *unless* you really need to save storage space, but
the tradeoffs could vary on different systems.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 19 '06 #13

Ark
jmcgill wrote:
jacob navia wrote:
>Do not confuse C99 standard (ISO) and IEEE 754 (floating point)

I wouldn't, or at least, I would be very explicit about my intentions if
doing so. But I've seen comments on this thread today that kind of
scare me. (By "kind of", I mean, I don't really give a damn, since none
of the posters work for me or are my students ;-)
Check, however, the normative Annex F. If your compiler defines
__STDC_IEC_559__, it should be pretty close.
Aug 19 '06 #14

In article <44*********************@news.orange.fr> jacob navia <ja***@jacob.remcomp.fr> writes:
.....
When you want more precision, use double precision, and if that
doesn't cut it use long double.
When precision is a problem, it is much better to analyse *why* it is a
problem. That is much better than going to double, and if that does not
cut it, going to long double.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 19 '06 #15

pete <pf*****@mindspring.com> writes:
Erick wrote:
>>
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best?
When should we use float or double?

Use float, when you want the smallest floating point type.
In C89, use long double, when you want the floating point type
with the greatest range.
Use double, all the rest of the time.
Why the "In C89" qualification? long double is the largest
floating-point type in both C89/C90 and C99; hardly anyone should have
to worry about pre-C89 implementations these days.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 19 '06 #16

jacob navia wrote:
Erick wrote:
>hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?

thanks
Erick

double means DOUBLE PRECISION. Note the word precision here.

When you want more precision, use double precision, and if that
doesn't cut it use long double.

Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that, using Intel/Amd implementation gives you 18.

If precision is not important (you are only interested
in a rough approximation) float is great.
Hi Jake, Joe here. Comment ca va? My machines are x86 from Intel and AMD
and SPARC from Sun. All claim IEEE 754 floating point.

The precision of a floating point type is binary and tied to the width
of the mantissa. On my machines, the mantissa of a 32-bit float is 24
bits wide and the mantissa of a 64-bit double is 53 bits wide.

The 24-bit mantissa of the float demands 8 decimal digits for its
representation. The 53-bit double mantissa demands 16 decimal digits.

I had occasion, some time ago, to express float and double as text and
then from text back to float and double. Exactly.

Given a double, text is..

char buf[30]; /* more than enough */

sprintf(buf, "%.16e", dbl);

For a float..

sprintf(buf, "%.8e", flt);

Now use of atof() or strtod() will take the text back to floating point.
Exactly.

I went through all this to avoid the tragedy of importing binary files
from disparate systems. Endianness is also an issue that disappears when
you and your recipient agree that the file format is 'text' instead of
'binary'.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Aug 19 '06 #17

Keith Thompson wrote:
>
pete <pf*****@mindspring.com> writes:
Erick-wrote:
>
hi all...

I've read some lines about the
difference between float and double
data types... but, in the real world, which is best?
When should we use float or double?
Use float, when you want the smallest floating point type.
In C89, use long double, when you want the floating point type
with the greatest range.
Use double, all the rest of the time.

Why the "In C89" qualification? long double is the largest
floating-point type in both C89/C90 and C99;
I was thinking of long long int, and got confused.
hardly anyone should have
to worry about pre-C89 implementations these days.
--
pete
Aug 19 '06 #18

Dik T. Winter wrote:
In article <44*********************@news.orange.fr> jacob navia <ja***@jacob.remcomp.fr> writes:
....
When you want more precision, use double precision, and if that
doesn't cut it use long double.

When precision is a problem, it is much better to analyse *why* it is a
problem. That is much better than going to double, and if that does not
cut it, going to long double.
Right you are. The lowly float with 24-bit mantissa is precise to one in
sixteen million. How close do you need to be? :-)

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Aug 19 '06 #19

Joe Wright wrote:
Dik T. Winter wrote:
>jacob navia <ja***@jacob.remcomp.fr> writes:
....
>>When you want more precision, use double precision, and if that
doesn't cut it use long double.

When precision is a problem, it is much better to analyse *why*
it is a problem. That is much better than going to double, and
if that does not cut it, going to long double.

Right you are. The lowly float with 24-bit mantissa is precise to
one in sixteen million. How close do you need to be? :-)
You can often get away with much worse. 25 years ago I had a
system with 24 bit floats, which yielded 4.8 digits precision, but
was fast (for its day) and rounded properly. This was quite
adequate to do least square fits to 3rd order polynomials, which
involve some not too well behaved matrix inversions.

The proper rounding is critical. Early on I checked some
operations with a Basic implementation, and got poorer results
because the Basic truncated rather than rounding, even though it
had 8 more bits of precision! I think that was one of Microsoft's.

--
"The power of the Executive to cast a man into prison without
formulating any charge known to the law, and particularly to
deny him the judgement of his peers, is in the highest degree
odious and is the foundation of all totalitarian government
whether Nazi or Communist." -- W. Churchill, Nov 21, 1943
Aug 19 '06 #20

"Keith Thompson" <ks***@mib.org> wrote:
jmcgill <jm*****@email.arizona.edu> writes:
But it's not at all obvious that float is going to be *faster* than
double. The common wisdom, as I understand it, is to use double
rather than float *unless* you really need to save storage space, but
the tradeoffs could vary on different systems.
By a great deal. The PlayStation 2 came with a compiler that supported
double, but only in software emulation. The single-precision floating-point
unit was, on the other hand, extremely fast, at least for its day.
--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Aug 19 '06 #21


"Dik T. Winter" <Di********@cwi.nl> wrote in message
news:J4********@cwi.nl...
In article <44*********************@news.orange.fr> jacob navia
<ja***@jacob.remcomp.fr> writes:
....
When you want more precision, use double precision, and if that
doesn't cut it use long double.

When precision is a problem, it is much better to analyse *why* it is a
problem. That is much better than going to double, and if that does not
cut it, going to long double.
There's a lot of truth there.
My first ever 3D program rotated a cube quite nicely for a second, then
pulled it apart as errors accumulated. I immediately fixed the problem by
going to double.

Had double not been available, I would have realised that there is no need
to make incremental changes in the xyz coordinates. Simply increment the
rotation. I would have had a much better program for it.
--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Aug 19 '06 #22

Joe Wright <jo********@comcast.net> writes:
>Dik T. Winter wrote:
>>In article <44*********************@news.orange.fr> jacob navia
<ja***@jacob.remcomp.fr> writes:
....
> When you want more precision, use double precision, and if that
doesn't cut it use long double.
When precision is a problem, it is much better to analyse *why* it
is a problem. That is much better than going to double, and if
that does not cut it, going to long double.

Right you are. The lowly float with 24-bit mantissa is precise to one
in sixteen million. How close do you need to be? :-)
It depends on the application (and on the quality of the
implementation).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 19 '06 #23

In article <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
jacob navia wrote:
double means DOUBLE PRECISION. Note the word precision here.
When you want more precision, use double precision, and if that
doesn't cut it use long double.
....
I think the name "double" probably comes from Fortran, where "DOUBLE
PRECISION" is exactly twice the size of "FLOAT". (I'm not much of a
Fortran person, so I could easily be mistaken.)
And originally also exactly twice the precision. A double precision
number was implemented as two single precision numbers. In many cases
handled in software.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 19 '06 #24

jacob navia wrote:
>
Erick wrote:
hi all...

I've read some lines about the difference between float and double
data types... but, in the real world, which is best? When should we
use float or double?

thanks
Erick

double means DOUBLE PRECISION. Note the word precision here.

When you want more precision, use double precision, and if that
doesn't cut it use long double.

Precision means the number of significant digits you get
for the calculations. float gives you 6 digits precision,
(assuming IEEE 754) double gives you 15, and long double
more than that, using Intel/Amd implementation gives you 18.

If precision is not important (you are only interested
in a rough approximation) float is great.
float is a low-ranking type
which is subject to the default argument promotions.
double is the more natural type.

--
pete
Aug 19 '06 #25

In article <gv******************************@comcast.com>,
Joe Wright <jo********@comcast.net> wrote:
>The 24-bit mantissa of the float demands 8 decimal digits for its
representation. The 53-bit double mantissa demands 16 decimal digits.
>I had occasion, some time ago, to express float and double as text and
then from text back to float and double. Exactly.
>Given a double, text is..
>char buf[30]; /* more than enough */
sprintf(buf, "%.16e", dbl);
>For a float..
sprintf(buf, "%.8e", flt);
>Now use of atof() or strtod() will take the text back to floating point.
Exactly.
Hmmm -- it is not obvious to me that exact conversion will happen in
that case, Joe. 8 or 16 decimal digits gets you to the point at which
you can precisely pin down the last decimal digit displayed, but there
may have been up to around 3 additional bits worth of information
stored without being able to select the precise decimal digit for
output.

For example, the system might know that the bottom 3 bits are 011, but
be unable to decide whether to output a 4 (.375 rounded up through
((.45 minus epsilon) rounded down), or a 5 (.45 rounded up through
((.5 minus epsilon) rounded down)). The alternative is to print out
more digits than are really present, in order to get enough
information to fill the bottom bits.
--
Is there any thing whereof it may be said, See, this is new? It hath
been already of old time, which was before us. -- Ecclesiastes
Aug 20 '06 #26


"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
Joe Wright <jo********@comcast.net> writes:
>Dik T. Winter wrote:
>>In article <44*********************@news.orange.fr> jacob navia
<ja***@jacob.remcomp.fr> writes:
....
When you want more precision, use double precision, and if that
doesn't cut it use long double.
When precision is a problem, it is much better to analyse *why* it
is a problem. That is much better than going to double, and if
that does not cut it, going to long double.

Right you are. The lowly float with 24-bit mantissa is precise to one
in sixteen million. How close do you need to be? :-)

It depends on the application (and on the quality of the
implementation).
There's hardly any application where an accuracy of 1 in 16 million is not
acceptable. For instance if you are machining space shuttle parts it is
unlikely they go to a tolerance of more than about 1 in 10000.
The real problem is that errors can propagate. If you multiply by a million,
suddenly you only have an accuracy of 1 in 16.
--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Aug 20 '06 #27

Walter Roberson wrote:
In article <gv******************************@comcast.com>,
Joe Wright <jo********@comcast.net> wrote:
>The 24-bit mantissa of the float demands 8 decimal digits for its
representation. The 53-bit double mantissa demands 16 decimal digits.

>I had occasion, some time ago, to express float and double as text and
then from text back to float and double. Exactly.
>Given a double, text is..
>char buf[30]; /* more than enough */
sprintf(buf, "%.16e", dbl);
>For a float..
sprintf(buf, "%.8e", flt);
>Now use of atof() or strtod() will take the text back to floating point.
Exactly.

Hmmm -- it is not obvious to me that exact conversion will happen in
that case, Joe. 8 or 16 decimal digits gets you to the point at which
you can precisely pin down the last decimal digit displayed, but there
may have been up to around 3 additional bits worth of information
stored without being able to select the precise decimal digit for
output.

For example, the system might know that the bottom 3 bits are 011, but
be unable to decide whether to output a 4 (.375 rounded up through
((.45 minus epsilon) rounded down), or a 5 (.45 rounded up through
((.5 minus epsilon) rounded down)). The alternative is to print out
more digits than are really present, in order to get enough
information to fill the bottom bits.
I suggest you are wrong, Walter. What three extra bits are you talking
about? My point is that a given float printed with sprintf(buff,"%.8e",f)
will produce a string that, when presented to atof() or strtod(), will
produce the original float value exactly.

Same for sprintf(buff,"%.16e",d) for double.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Aug 20 '06 #28

Joe Wright wrote:
Walter Roberson wrote:
>In article <gv******************************@comcast.com>,
Joe Wright <jo********@comcast.net> wrote:
>>The 24-bit mantissa of the float demands 8 decimal digits for its
representation. The 53-bit double mantissa demands 16 decimal digits.

>>I had occasion, some time ago, to express float and double as text
and then from text back to float and double. Exactly.
>>Given a double, text is..
>>char buf[30]; /* more than enough */
sprintf(buf, "%.16e", dbl);
>>For a float..
sprintf(buf, "%.8e", flt);
>>Now use of atof() or strtod() will take the text back to floating
point. Exactly.

Hmmm -- it is not obvious to me that exact conversion will happen in
that case, Joe. 8 or 16 decimal digits gets you to the point at which
you can precisely pin down the last decimal digit displayed, but there
may have been up to around 3 additional bits worth of information
stored without being able to select the precise decimal digit for output.

For example, the system might know that the bottom 3 bits are 011, but
be unable to decide whether to output a 4 (.375 rounded up through
((.45 minus epsilon) rounded down), or a 5 (.45 rounded up through
((.5 minus epsilon) rounded down)). The alternative is to print out
more digits than are really present, in order to get enough
information to fill the bottom bits.

I suggest you are wrong, Walter. What three extra bits are you talking
about? My point is that a given float printed with sprintf(buff,"%.8e",f)
will produce a string that, when presented to atof() or strtod(), will
produce the original float value exactly.

Same for sprintf(buff,"%.16e",d) for double.
According to the IEEE 754 standards, %.9e format for float data type,
and %.17e for double, are required to avoid losing accuracy, and this
can be supported only within well defined ranges. Standard C doesn't
assure you that IEEE754 is followed, but it cannot improve on it.
Aug 20 '06 #29

>There's hardly any application where an accuracy of 1 in 16 million is not
>acceptable.
Two common exceptions to this are currency and time.

Accountants expect down-to-the-penny (or whatever the smallest unit
of currency is) accuracy no matter what. And governments spend
trillions of dollars a year.

If your time base is in the year 1AD, and you subtract two current-day
times (stored in floats) to get an interval, you can get rounding
error in excess of an hour. Even for POSIX time (epoch 1 Jan 1970),
you still have rounding errors in excess of 1 minute.
>For instance if you are machining space shuttle parts it is
unlikely they go to a tolerance of more than about 1 in 10000.
The real problem is that errors can propagate. If you multiply by a million,
suddenly you only have an accuracy of 1 in 16.
If you had a precision of 1 in 16 million, and you multiply by a
million (an exact number), you still have 1 in 16 million. You
lose precision when you SUBTRACT nearly-equal numbers. If you
subtract two POSIX times about 1.1 billion seconds past the epoch,
but store these in floats before the subtraction, your result for
the difference is only accurate to within a minute. This stinks if
the real difference is supposed to be 5 seconds.

Aug 20 '06 #30

>Hmmm -- it is not obvious to me that exact conversion will happen in
>that case, Joe. 8 or 16 decimal digits gets you to the point at which
you can precisely pin down the last decimal digit displayed, but there
may have been up to around 3 additional bits worth of information
stored without being able to select the precise decimal digit for
output.
>For example, the system might know that the bottom 3 bits are 011, but
be unable to decide whether to output a 4 (.375 rounded up through
((.45 minus epsilon) rounded down), or a 5 (.45 rounded up through
((.5 minus epsilon) rounded down)). The alternative is to print out
more digits than are really present, in order to get enough
information to fill the bottom bits.
He *is* printing more digits than are guaranteed to exist.
A float is guaranteed to have 6 significant decimal digits. For IEEE
floats, this number is about 6.9 digits. But he's printing 8 digits,
which is (I suspect, I haven't tested this) necessary to ensure that
every representable value has a unique representation.

Aug 20 '06 #31

In article <GO******************************@comcast.com> Joe Wright <jo********@comcast.net> writes:
....
I suggest you are wrong Walter. What three extra bits are you talking
about? My point is that a given float printed with sprintf(buff,"%.8e",f)
will produce a string that when presented to atof() or strtod() will
produce the original float value exactly.
(And "%.16e" for double.)

That is right for IEEE. To get round-trip exactness when reading in
and printing back again the maximum number of decimal digits allowed is
floor((p - 1) log_10 b), where p is the number of base b digits.
Round-trip exactness the other way around requires
ceil(p log_10 b + 1) decimal digits.

For IEEE that means FLT_DIG=6 and DBL_DIG=15. For correct conversion
in all cases the other way around you need 9 digits for float and
17 digits for double. In a "%.e" format you have to subtract 1 from the
required number (because there is always a digit printed in front),
so 8 and 16 are good enough for IEEE.

That a total of 8 digits is not enough for floats can be shown
with the pair a = 1073741824.0 and b = 1073741760.0.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 21 '06 #32

In article <12*************@corp.supernews.com> go***********@burditt.org (Gordon Burditt) writes:
There's hardly any application where an accuracy of 1 in 16 million is not
acceptable.

Two common exceptions to this are currency and time.

Accountants expect down-to-the-penny (or whatever the smallest unit
of currency is) accuracy no matter what. And governments spend
trillions of dollars a year.
Right. One of the reasons to use fixed point for this, and not floating
point. With fixed point you can get the rounding as it should be.
If your time base is in the year 1AD, and you subtract two current-day
times (stored in floats) to get an interval, you can get rounding
error in excess of an hour. Even for POSIX time (epoch 1 Jan 1970),
you still have rounding errors in excess of 1 minute.
Again, a good reason not to use floating point for this, but fixed point.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 21 '06 #33

In article <m2******************@newssvr21.news.prodigy.com> tp*****@nospammyrealbox.com writes:
....
According to the IEEE 754 standards, %.9e format for float data type,
and %.17e for double, are required to avoid losing accuracy, and this
can be supported only within well defined ranges.
Are you sure? 9 digits and 17 digits are sufficient, but %.9e gives 10
digits and %.17e gives 18 digits. I think you committed the same error
I did at first, not counting the digit before the decimal point.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 21 '06 #34

In article <12*************@corp.supernews.com> go***********@burditt.org (Gordon Burditt) writes:
....
He *is* printing more digits than are guaranteed to exist.
Hrm. What do you mean with "guaranteed to exist"?
A float is guaranteed to have 6 significant decimal digits.
No. It is guaranteed that a decimal number with 6 significant decimal
digits, when read in and printed out again with the same precision
will yield the original number.
For IEEE
floats, this number is about 6.9 digits. But he's printing 8 digits,
which is (I suspect, I haven't tested this) necessary to ensure that
every representable value has a unique representation.
No, he is printing 9 digits (do not forget the leading digit on %.e
formats), and these are indeed necessary.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 21 '06 #35

Gordon Burditt wrote:
>There's hardly any application where an accuracy of 1 in 16 million is not
acceptable.

Two common exceptions to this are currency and time.

Accountants expect down-to-the-penny (or whatever the smallest unit
of currency is) accuracy no matter what. And governments spend
trillions of dollars a year.

If your time base is in the year 1AD, and you subtract two current-day
times (stored in floats) to get an interval, you can get rounding
error in excess of an hour. Even for POSIX time (epoch 1 Jan 1970),
you still have rounding errors in excess of 1 minute.
>For instance if you are machining space shuttle parts it is
unlikely they go to a tolerance of more than about 1 in 10000.
The real problem is that errors can propagate. If you multiply by a million,
suddenly you only have an accuracy of 1 in 16.

If you had a precision of 1 in 16 million, and you multiply by a
million (an exact number), you still have 1 in 16 million. You
lose precision when you SUBTRACT nearly-equal numbers. If you
subtract two POSIX times about 1.1 billion seconds past the epoch,
but store these in floats before the subtraction, your result for
the difference is only accurate to within a minute. This stinks if
the real difference is supposed to be 5 seconds.
You're making all this up, aren't you? Posix time today is somewhere
around 1,156,103,121 seconds since the Epoch. We are therefore a little
over half way to the end of Posix time in early 2038. Total Posix
seconds are 2^31 or 2,147,483,648 seconds. I would not expect to treat
such a number with a lowly float with only a 24-bit mantissa. I do point
out that double has a 53-bit mantissa and is very much up to the task.

Aside: Why was time_t defined as a 32-bit signed integer? What was
supposed to happen when time_t reached LONG_MAX + 1? Why was no time
before the Epoch considered? Arrogance of young men, I assume.

The double type would have been a much better choice for time_t.

We must try to remain clear ourselves about the difference between
accuracy and precision. It is the representation that offers the
precision, it is our (the programmer's) calculations that may provide
accuracy.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Aug 21 '06 #36

Joe Wright wrote:
>
Aside: Why was time_t defined as 32-bit signed integer? What was
supposed to happen when time_t assumes LONG_MAX + 1 ? Why was there no
time to be considered before the Epoch. Arrogance of young men I assume.
For the same reason we had the year 2K bug?
The double type would have been a much better choice for time_t.
Probably not on the hardware available in the '70s.

--
Ian Collins.
Aug 21 '06 #37

In article <H_******************************@comcast.com>
Joe Wright <jo********@comcast.net> wrote:
>Aside: Why was time_t defined as 32-bit signed integer?
Why was int32_t defined as a 16-bit integer?

(For that matter, why *do* cats paint?)

Seriously, though:
>The double type would have been a much better choice for time_t.
The original Unix time was a 16-bit type.

The original Unix epoch was moved several times (three, I think).

Then they got sick of that, and finally went to 32-bit (and 24-bit,
for disk block numbers; anyone remember "l3tol()"?) integers, and
eventually added "long" to the C language. After that came "unsigned
long", and now we have "long long" and "unsigned long long" and
there is no reason[%] not to make time_t a 64-bit type, as it is on
some systems.

[% Well, "backwards compatibility", especially with all those
binary file formats. Some people planned ahead, and some did not.
Some code will break, and some will not.]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 21 '06 #38

In article <H_******************************@comcast.com> Joe Wright <jo********@comcast.net> writes:
....
Aside: Why was time_t defined as 32-bit signed integer?
What operating system?
What was
supposed to happen when time_t assumes LONG_MAX + 1 ? Why was there no
time to be considered before the Epoch. Arrogance of young men I assume.
Why were only two digits used to specify the year?
The double type would have been a much better choice for time_t.
Not at all. time_t should be an integral type, otherwise you can get
problems with rounding.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Aug 21 '06 #39

Chris Torek <no****@torek.net> writes:
In article <H_******************************@comcast.com>
Joe Wright <jo********@comcast.net> wrote:
>>Aside: Why was time_t defined as 32-bit signed integer?

Why was int32_t defined as a 16-bit integer?

(For that matter, why *do* cats paint?)

Seriously, though:
>>The double type would have been a much better choice for time_t.

The original Unix time was a 16-bit type.

The original Unix epoch was moved several times (three, I think).
Um, are you sure about that? 16 bits with 1-second resolution only
covers about 18 hours. Even 1-minute resolution only covers about a
month and a half.

1970 was very early in the history of Unix. I wouldn't think there'd
have been much time to shift the epoch.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 21 '06 #40

Joe Wright <jo********@comcast.net> writes:
[...]
Aside: Why was time_t defined as 32-bit signed integer? What was
supposed to happen when time_t assumes LONG_MAX + 1 ? Why was there no
time to be considered before the Epoch. Arrogance of young men I
assume.
Arrogance would have been assuming that the system they were designing
would still be in use 68 years later.

The C standard only says that time_t is a numeric type.
The double type would have been a much better choice for time_t.
I disagree. If you want 1-second resolution, a 64-bit signed integer
gives you more than enough range. If you use a floating-point type,
you get very fine resolution near the epoch, and relatively poor
resolution farther away, which doesn't seem particularly useful.

<OT>Assuming a Unix-style time_t (a signed integer type with 1-second
resolution with 0 representing 1970-01-01 00:00:00 GMT), there's
plenty of time before 2038 to expand it to 64 bits; it's already
happened on many systems.</OT>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 21 '06 #41

In article <ec*********@news2.newsguy.com> I wrote, in part:
>The original Unix time was a 16-bit type.

The original Unix epoch was moved several times (three, I think).
In article <ln************@nuthaus.mib.org>
Keith Thompson <ks***@mib.org> wrote:
>Um, are you sure about that? 16 bits with 1-second resolution only
covers about 18 hours. Even 1-minute resolution only covers about a
month and a half.
Oops, you are correct that it was not 16 bits (it was 32), but I was
correct about the moved epochs. See
<http://www.tuhs.org/Archive/PDP-11/Distributions/research/Dennis_v3/Readme.nsys>.

(In any case, the "real" point -- that time_t is not specified as
32-bit, or even signed -- still stands. Unsigned 32-bit takes one
to a bit beyond 2100, and of course signed or unsigned 64-bit is
better.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 21 '06 #42

Keith Thompson wrote:
>
.... snip ...
>
<OT>Assuming a Unix-style time_t (a signed integer type with 1-second
resolution with 0 representing 1970-01-01 00:00:00 GMT), there's
plenty of time before 2038 to expand it to 64 bits; it's already
happened on many systems.</OT>
Having the epoch start in 1970, or even 1978 (Digital Research) is
foolish, when 1968 or 1976 would simplify leap year calculations.
This is also an argument for using 1900.

--
Chuck F (cb********@yahoo.com) (cb********@maineline.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE maineline address!

Aug 21 '06 #43


CBFalconer <cb********@yahoo.com> writes:
Keith Thompson wrote:
>>
... snip ...
>>
<OT>Assuming a Unix-style time_t (a signed integer type with 1-second
resolution with 0 representing 1970-01-01 00:00:00 GMT), there's
plenty of time before 2038 to expand it to 64 bits; it's already
happened on many systems.</OT>

Having the epoch start in 1970, or even 1978 (Digital Research) is
foolish, when 1968 or 1976 would simplify leap year calculations.
This is also an argument for using 1900.
It's not really an argument for 1900, which *wasn't* a leap year.
(I've seen systems that use 1901 because of this.)

But IMHO the leap year issue just isn't that big a deal. The
calculations aren't that hard, and it's a solved problem.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 21 '06 #45



CBFalconer wrote On 08/21/06 04:36,:
Keith Thompson wrote:

... snip ...
>><OT>Assuming a Unix-style time_t (a signed integer type with 1-second
resolution with 0 representing 1970-01-01 00:00:00 GMT), there's
plenty of time before 2038 to expand it to 64 bits; it's already
happened on many systems.</OT>


Having the epoch start in 1970, or even 1978 (Digital Research) is
foolish, when 1968 or 1976 would simplify leap year calculations.
This is also an argument for using 1900.
If concerns about leap year are to govern the choice,
the zero point should be xxxx-03-01 00:00:00, where xxxx
is a multiple of 400.

However, leap year calculations are not so important
that they should govern such a choice. Everyone who's
anyone already knows that the Right Thing To Do is to
define the zero point as 1858-11-17 00:00:00.

--
Er*********@sun.com

Aug 21 '06 #46

CBFalconer wrote:
Keith Thompson wrote:
.... snip ...
><OT>Assuming a Unix-style time_t (a signed integer type with 1-second
resolution with 0 representing 1970-01-01 00:00:00 GMT), there's
plenty of time before 2038 to expand it to 64 bits; it's already
happened on many systems.</OT>

Having the epoch start in 1970, or even 1978 (Digital Research) is
foolish, when 1968 or 1976 would simplify leap year calculations.
This is also an argument for using 1900.
It would be an even better argument for using 1600-03-01.
--
Clark S. Cox III
cl*******@gmail.com
Aug 21 '06 #47

av
On Sat, 19 Aug 2006 10:13:41 -0400, CBFalconer wrote:
>Joe Wright wrote:
>Dik T. Winter wrote:
>>jacob navia <ja***@jacob.remcomp.fr> writes:
....
When you want more precision, use double precision, and if that
doesn't cut it use long double.

When precision is a problem, it is much better to analyse *why*
it is a problem. That is much better than going to double, and
if that does not cut it, going to long double.

Right you are. The lowly float with 24-bit mantissa is precise to
one in sixteen million. How close do you need to be? :-)

You can often get away with much worse. 25 years ago I had a
system with 24 bit floats, which yielded 4.8 digits precision, but
was fast (for its day) and rounded properly. This was quite
I think it is not useful to round numbers; the only place where
rounding numbers seems to be an issue is on input/output.
Aug 21 '06 #48


av wrote:
On Sat, 19 Aug 2006 10:13:41 -0400, CBFalconer wrote:
Joe Wright wrote:
Dik T. Winter wrote:
jacob navia <ja***@jacob.remcomp.fr> writes:
....
When you want more precision, use double precision, and if that
doesn't cut it use long double.

When precision is a problem, it is much better to analyse *why*
it is a problem. That is much better than going to double, and
if that does not cut it, going to long double.

Right you are. The lowly float with 24-bit mantissa is precise to
one in sixteen million. How close do you need to be? :-)
You can often get away with much worse. 25 years ago I had a
system with 24 bit floats, which yielded 4.8 digits precision, but
was fast (for its day) and rounded properly. This was quite

i think it is not useful to round numbers
Then don't use floating point (you think e.g. 1/3 has an exact
floating point representation on your machine?).

However, it will come as a shock to many people to learn that floating
point is not useful.

-William Hughes

Aug 21 '06 #49



av wrote On 08/21/06 13:38,:
On Sat, 19 Aug 2006 10:13:41 -0400, CBFalconer wrote:
>>
You can often get away with much worse. 25 years ago I had a
system with 24 bit floats, which yielded 4.8 digits precision, but
was fast (for its day) and rounded properly. This was quite

i think it is not useful to round numbers; the only place where
rounding numbers seems to be an issue is on input/output
double d = sqrt(2.0);

Assuming the machine has a finite amount of memory, how
do you propose to carry out this calculation without some
kind of rounding?

--
Er*********@sun.com

Aug 21 '06 #50
