Float precision (0.5678f becomes 0.56779998540878)

void Test()
{
    float fValue = 0.5678f; // Value is 0.567800
    double dValue = (double)fValue; // Value is 0.56779998540878
}

Is there any way I can round a float to 6 positions behind the decimal point?
That way the value would be 0.567800 again?

Jul 19 '05 #1
Mad Butch wrote:
void Test()
{
float fValue = 0.5678f; // Value is 0.567800
double dValue = (double)fValue; // Value is 0.56779998540878
}

Is there any way I can round a float to 6 positions behind the decimal point?
That way the value would be 0.567800 again?


try this:

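#include <iostream>
#include <cmath>

int main()
{
    float fValue = 0.5678f;
    double dValue = static_cast<double>(fValue); // widening is exact, no rounding here

    std::cout.precision(15);
    std::cout.width(20); // width applies to the next output only

    std::cout << dValue << std::endl;

    // Round to 6 decimal places: scale up, round half up, scale back down.
    dValue = std::floor( 1e6L * fValue + 0.5L ) / 1e6L;

    std::cout.precision(15);
    std::cout.width(20);

    std::cout << dValue << std::endl;
}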

Jul 19 '05 #2

"Gianni Mariani" <gi*******@mariani.ws> wrote in message
news:bk********@dispatch.concentric.net...
Mad Butch wrote:
void Test()
{
float fValue = 0.5678f; // Value is 0.567800
double dValue = (double)fValue; // Value is 0.56779998540878
}

Is there any way I can round a float to 6 positions behind the decimal point?
That way the value would be 0.567800 again?


try this:

#include <iostream>
#include <cmath>

int main()
{
float fValue = 0.5678f;
double dValue = (double)fValue;

std::cout.precision(15);
std::cout.width(20);

std::cout << dValue << std::endl;

dValue = std::floor( 1e6l * fValue + 0.5l ) / 1e6l;

std::cout.precision(15);
std::cout.width(20);

std::cout << dValue << std::endl;
}


Without trying that code out, it appears that it *writes out* a result that
looks like what the OP was asking for.

However, that is not what I got from the OP's question. There was no "cout"
in his code. I *think* what was being asked was how to actually get the
value stored in memory to have the same value as a double that it had as a
float. And the answer to that is, in general, you can't!

Floating-point values are stored in binary, not decimal, and there is (in
most cases) no way to get a specific floating-point value accurate to some
number of *decimal* places. It can only be accurate to some number of
*binary* digits, because it's binary, not decimal.

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).
-Howard

Jul 19 '05 #3
Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made a
mistake). Outside that range, I don't think you can find two consecutive
integers that can be precisely represented.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #4
On Wed, 17 Sep 2003 18:57:29 GMT, Kevin Goodsell
<us*********************@neverbox.com> wrote in comp.lang.c++:
Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.


Sure there is. The C++ standard specifically inherits <float.h> from
ISO C, under either that name or <cfloat>, and makes the following
normative: ISO C subclause 7.1.5, 5.2.4.2.2, 5.2.4.2.1.

DBL_DIG, FLT_DIG, and LDBL_DIG are required to be available and have
the same meaning they have in C, namely the maximum number of decimal
digits a whole number can contain and be guaranteed to be converted to
the floating point type and back again with the original value.

These might be available in <limits> as well, I haven't looked it up.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Jul 19 '05 #5
Jack Klein wrote:
On Wed, 17 Sep 2003 18:57:29 GMT, Kevin Goodsell
<us*********************@neverbox.com> wrote in comp.lang.c++:

There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

Sure there is. The C++ standard specifically inherits <float.h> from
ISO C, under either that name or <cfloat>, and makes the following
normative: ISO C subclause 7.1.5, 5.2.4.2.2, 5.2.4.2.1.

DBL_DIG, FLT_DIG, and LDBL_DIG are required to be available and have
the same meaning they have in C, namely the maximum number of decimal
digits a whole number can contain and be guaranteed to be converted to
the floating point type and back again with the original value.

These might be available in <limits> as well, I haven't looked it up.


Oh, OK. Most of those macros relating to floating point types are a
mystery to me. Thanks for the correction.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #6


Kevin Goodsell wrote:

Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made a
mistake). Outside that range, I don't think you can find two consecutive
integers that can be precisely represented.


I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0
30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #7
Karl Heinz Buchegger wrote in news:3F***************@gascad.at:
As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made
a mistake). Outside that range, I don't think you can find two
consecutive integers that can be precisely represented.
I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0


Note that 0.3 can't be stored (exactly) in binary, 3 can and so
can 0.5, 0.25, 0.125 ... etc. I mean binary as base 2 not just
as a collection of bits (which is probably what you meant).

30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
0.3 can't be represented exactly as an integer multiplied by a
power of 2; 30 can. My point here is that your argument is
backwards: a floating point value is an integer multiplied by
some other integer (usually 2 but possibly 10) raised to the
power of yet another integer. I.e. you can infer things about
a non-integer by making reasoned arguments about integers, not
the other way around.

Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.


I get your point but jack said: "Outside that range, I don't think you
can find two consecutive integers that can be precisely represented."

You missed "...consecutive ..." I think.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 19 '05 #8


Rob Williscroft wrote:

0.3 can't be represented exactly as an integer multiplied by a
power of 2; 30 can. My point here is that your argument is
backwards: a floating point value is an integer multiplied by
some other integer (usually 2 but possibly 10) raised to the
power of yet another integer. I.e. you can infer things about
a non-integer by making reasoned arguments about integers, not
the other way around.


I'm not sure I understand what you are trying to say to me.
My point is (it's hard for me to express all of this, since
I am not a native English speaker, so be patient)

the mantissa can be seen as the sum of fractions.
Thus you need to find the coefficients for

\[
\sum_{i=1}^{N} \mathrm{bit}_i \cdot \frac{1}{2^{i}},
\qquad N = \mathtt{sizeof(double)} \times 8 \quad \text{(assuming 8 bits per byte)}
\]

such that this sum equals the requested floating point number.

e.g.
\[
0.3 = 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4} + 0 \cdot \frac{1}{8} + 0 \cdot \frac{1}{16} + 1 \cdot \frac{1}{32} + \dots
\]

thus the bit sequence for 0.3 starts with 01001....
I don't know if this repeated summing finally ends up at 0.3
(within the limited bits of double), but if it does not
(replace 0.3 with some other number if necessary), then all
2-multiples (0.6, 1.2, 2.4, 4.8, 9.6, ... ) of that number will
also not be representable.

Ohh. Now I see where I have gone wrong and what you mean.
I started with some number and tried to end up at an
integer by doubling. I tried to base my doubt on the fact
that not all numbers in the range [0, 1) are representable by
a sum_of_fractions.
But in fact the opposite is happening.
I should have started with an integer; by halving that (and
incrementing the exponent) I will always end up with a number
in the range [0, 1) where such a sum_of_fractions exists.

mantissa   exponent
3.0        0
1.5        1
0.75       2

\[
0.75 = \frac{1}{2} + \frac{1}{4},
\qquad \left(\frac{1}{2} + \frac{1}{4}\right) \cdot 2^{2} = 3.0
\]

Kevin was right and I was wrong.
Thanks for making me think, although
I should have done that before replying.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #9
Rob Williscroft wrote in news:Xns93FA6F98C7E3EukcoREMOVEfreenetrtw@195.129.110.130:
I get your point but jack said: "Outside that range, I don't think you
can find two consecutive integers that can be precisely represented."


My apologies to Kevin and Jack; for some reason I thought Karl was
responding to Jack and not Kevin, and attributed Kevin's statements
to Jack.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 19 '05 #10
Karl Heinz Buchegger wrote:

I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0
30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.


It looks like you've worked it out (I didn't totally follow the
discussion), but let me explain my logic, and show how integers are
represented in IEEE doubles.

First, the IEEE double is 64 bits: 1 sign bit, 11 exponent bits, and 52
mantissa bits. The mantissa actually has one more 'implied' bit - its
value is always 1, and it is logically placed to the left of the other
52 bits. Now, integer value 1 is represented like this:

1|.0000[...]0000

Where the 1 is not actually stored, but is implied (represented here by
using a vertical bar to separate it from the physical bits). All the
physical mantissa bits are zero ([...] is used so I don't have to type
all 52 bits), and the binary point (represented with a dot) is placed
between the implied 1 bit and the 'real' mantissa bits. The exponent
isn't shown, but it is implied by the position of the binary point (you
run out of mantissa long before you run out of exponent, so there's no
need to watch for exponent overflow in this example).

Moving on, the integer 2 is represented this way:

1|0.0000[...]0000

Only the exponent (position of the binary point) changes. 3 looks like this:

1|1.0000[...]0000

4-9:

1|00.000[...]0000
1|01.000[...]0000
1|10.000[...]0000
1|11.000[...]0000
1|000.00[...]0000
1|001.00[...]0000

If you continue this for a very, very long time, you arrive at this:

1|1111[...]1111.

All the mantissa bits are set to 1. This isn't quite the end, though.
You can add one more to get this:

1|0000[...]0000|0.

Now there's an implied 0 on the right side of the mantissa. Actually,
there's a lot of implied zeros over there, they just haven't come into
play until now. This should give the value I listed as the upper limit
(pow(2, 53)).

At this point, you can't add 1 and get anything different. It would
require flipping the implied 0 bit, which you can't do. You can add 2
and get this, though:

1|0000[...]0001|0.

There are obviously many more integers that can be precisely
represented, but none of them are contiguous - given two contiguous
integers, one must have its least significant bit set to 1 (in other
words, must be odd), but all integers above this point use an implied 0
bit for their least significant bit. As you go higher, more implied 0
bits are used, making the distance from one exact integer to the next
even greater. For a while, you have only even integers. Then, only
integers that are divisible by 4, then 8, 16, etc. (until you run out of
exponent, a very long time later).

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #11
