Float precision (0.5678f becomes 0.56779998540878)

void Test()
{
    float fValue = 0.5678f;           // Value is 0.567800
    double dValue = (double)fValue;   // Value is 0.56779998540878
}

Is there any way I can round a float at 6 positions behind the decimal point?
That way the value would be 0.567800 again?

Jul 19 '05 #1
Mad Butch wrote:
void Test()
{
float fValue = 0.5678f; // Value is 0.567800
double dValue = (double)fValue; // Value is 0.56779998540878
}

Is there anyway I can round a float at 6 positions behind the decimal?
That way the value would be 0.567800 again?


try this:

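#include <iostream>
#include <cmath>

int main()
{
    float fValue = 0.5678f;
    double dValue = (double)fValue;

    // Show the widened float with 15 significant digits
    std::cout.precision(15);
    std::cout.width(20);

    std::cout << dValue << std::endl;

    // Scale by 1e6, round to the nearest whole number, scale back
    dValue = std::floor( 1e6l * fValue + 0.5l ) / 1e6l;

    // width() is reset after every output, so set it again
    std::cout.precision(15);
    std::cout.width(20);

    std::cout << dValue << std::endl;
}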

Jul 19 '05 #2

"Gianni Mariani" <gi*******@mariani.ws> wrote in message
news:bk********@dispatch.concentric.net...
Mad Butch wrote:
void Test()
{
float fValue = 0.5678f; // Value is 0.567800
double dValue = (double)fValue; // Value is 0.56779998540878
}

Is there anyway I can round a float at 6 positions behind the decimal?
That way the value would be 0.567800 again?


try this:

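#include <iostream>
#include <cmath>

int main()
{
    float fValue = 0.5678f;
    double dValue = (double)fValue;

    // Show the widened float with 15 significant digits
    std::cout.precision(15);
    std::cout.width(20);

    std::cout << dValue << std::endl;

    // Scale by 1e6, round to the nearest whole number, scale back
    dValue = std::floor( 1e6l * fValue + 0.5l ) / 1e6l;

    // width() is reset after every output, so set it again
    std::cout.precision(15);
    std::cout.width(20);

    std::cout << dValue << std::endl;
}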


Without trying that code out, it appears that it *writes out* a result that
looks like what the OP was asking for.

However, that is not what I got from the OP's question. There was no "cout"
in his code. I *think* what was being asked was how to actually get the
value stored in memory to have the same value as a double that it had as a
float. And the answer to that is, in general, you can't!

Floating-point values are stored in binary, not decimal, and there is (in most
cases) no way to get a specific floating-point value accurate to some number
of *decimal* places. It can only be accurate to some number of *binary*
digits, because it's binary, not decimal.

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).
-Howard

Jul 19 '05 #3
Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made a
mistake). Outside that range, I don't think you can find two consecutive
integers that can be precisely represented.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #4
On Wed, 17 Sep 2003 18:57:29 GMT, Kevin Goodsell
<us*********************@neverbox.com> wrote in comp.lang.c++:
Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.


Sure there is. The C++ standard specifically inherits <float.h> from
ISO C, with either that name or <cfloat>, and makes the following
normative: ISO C subclause 7.1.5, 5.2.4.2.2, 5.2.4.2.1.

DBL_DIG, FLT_DIG, and LDBL_DIG are required to be available and have
the same meaning they have in C, namely the maximum number of decimal
digits a whole number can contain and be guaranteed to be converted to
the floating point type and back again with the original value.

These might be available in <limits> as well, I haven't looked it up.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Jul 19 '05 #5
Jack Klein wrote:
On Wed, 17 Sep 2003 18:57:29 GMT, Kevin Goodsell
<us*********************@neverbox.com> wrote in comp.lang.c++:

There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

Sure there is. The C++ standard specifically inherits <float.h> from
ISO C, with either that name or <cfloat>, and makes the following
normative: ISO C subclause 7.1.5, 5.2.4.2.2, 5.2.4.2.1.

DBL_DIG, FLT_DIG, and LDBL_DIG are required to be available and have
the same meaning they have in C, namely the maximum number of decimal
digits a whole number can contain and be guaranteed to be converted to
the floating point type and back again with the original value.

These might be available in <limits> as well, I haven't looked it up.


Oh, OK. Most of those macros relating to floating point types are a
mystery to me. Thanks for the correction.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #6


Kevin Goodsell wrote:

Howard wrote:

One should, in general, never rely on a floating-point number to be exactly
anything (except perhaps a whole number, such as zero).


There's usually a pretty broad range of contiguous integers that can be
exactly represented, but as far as I know there's no guarantee of this
in the standard.

As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made a
mistake). Outside that range, I don't think you can find two consecutive
integers that can be precisely represented.


I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0
30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #7
Karl Heinz Buchegger wrote in news:3F***************@gascad.at:
As a specific example, the IEEE format that is used on a number of
systems (including Intel x86 systems) has doubles that can represent
every integer from something like -9,007,199,254,740,992 to
+9,007,199,254,740,992 (I worked that out myself, so I may have made
a mistake). Outside that range, I don't think you can find two
consecutive integers that can be precisely represented.
I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0


Note that 0.3 can't be stored (exactly) in binary, 3 can and so
can 0.5, 0.25, 0.125 ... etc. I mean binary as base 2 not just
as a collection of bits (which is probably what you meant).

30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
0.3 can't be represented exactly as an integer multiplied by a
power of 2; 30 can. My point here is that your argument is
backwards: a floating-point value is an integer
multiplied by some other integer (usually 2, but possibly 10)
raised to the power of yet another integer. I.e. you can
infer things about a non-integer by making reasoned arguments
about integers, not the other way around.

Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.


I get your point but jack said: "Outside that range, I don't think you
can find two consecutive integers that can be precisely represented."

You missed "...consecutive ..." I think.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 19 '05 #8


Rob Williscroft wrote:

0.3 can't be represented exactly as an integer multiplied by a
power of 2; 30 can. My point here is that your argument is
backwards: a floating-point value is an integer
multiplied by some other integer (usually 2, but possibly 10)
raised to the power of yet another integer. I.e. you can
infer things about a non-integer by making reasoned arguments
about integers, not the other way around.


I'm not sure I understand what you are trying to say to me.
My point is (it's hard for me to express all of this, since
I am not a native English speaker, so be patient)

the mantissa can be seen as the sum of fractions.
Thus you need to find the coefficients for

$$\sum_{i=0}^{N} \mathrm{bit}_i \cdot \frac{1}{2^{i}}, \qquad N = \mathtt{sizeof(double)} \cdot 8 \ \text{(assuming 8 bits per byte)}$$

such that this sum equals the requested floating point number.

e.g.

$$0.3 = 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4} + 0 \cdot \frac{1}{8} + 0 \cdot \frac{1}{16} + 1 \cdot \frac{1}{32} + \dots$$

thus the bit sequence for 0.3 starts with 01001....
I don't know if this repeated summing finally ends up at 0.3
(within the limited bits of a double), but if it does not
(replace 0.3 with some other number if necessary), then all
power-of-two multiples (0.6, 1.2, 2.4, 4.8, 9.6, ...) of that number
will also not be representable.

Ohh. Now I see where I have gone wrong and what you mean.
I started with some number and tried to end up at an
integer by doubling. I tried to base my doubt on the fact
that not all numbers in the range [0, 1) are representable by
a sum_of_fractions.
But in fact the opposite is happening.
I should have started with an integer: by halving it (and
incrementing the exponent) I will always end up with a number
in the range [0, 1) where such a sum_of_fractions exists.

mantissa   exponent
3.0        0
1.5        1
0.75       2

0.75 = 0.5 + 0.25

$$\left(\frac{1}{2} + \frac{1}{4}\right) \cdot 2^{2} = 3.0$$
Kevin was right and I was wrong.
Thanks for making me think, although
I should have done that before replying.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 19 '05 #9
Rob Williscroft wrote in news:Xns93FA6F98C7E3EukcoREMOVEfreenetrtw@
195.129.110.130:
I get your point but jack said: "Outside that range, I don't think you
can find two consecutive integers that can be precisely represented."


My apologies to Kevin and Jack. For some reason I thought Karl was
responding to Jack and not Kevin, and attributed Kevin's statements
to Jack.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 19 '05 #10
Karl Heinz Buchegger wrote:

I doubt that.
The only difference between 0.3 and 30 as a floating point number
is the exponent.

0.3 is stored (of course in binary) as 0.3 E 0
30.0 is stored as 0.3 E 2

so if the same mantissa is used and we know that 0.3
cannot be represented exactly by a binary floating point
number, then 30 also cannot be represented exactly.
Of course the real thing is different, since the exponent
usually is not base 10, but base 2, but the principle is the
same.


It looks like you've worked it out (I didn't totally follow the
discussion), but let me explain my logic, and show how integers are
represented in IEEE doubles.

First, the IEEE double is 64 bits. 1 sign bit, 11 exponent bits, and 52
mantissa bits. The mantissa actually has one more 'implied' bit - its
value is always 1, and it is logically placed to the left of the other
52 bits. Now, integer value 1 is represented like this:

1|.0000[...]0000

Where the 1 is not actually stored, but is implied (represented here by
using a vertical bar to separate it from the physical bits). All the
physical mantissa bits are zero ([...] is used so I don't have to type
all 52 bits), and the binary point (represented with a dot) is placed
between the implied 1 bit and the 'real' mantissa bits. The exponent
isn't shown, but it is implied by the position of the binary point (you
run out of mantissa long before you run out of exponent, so there's no
need to watch for exponent overflow in this example).

Moving on, the integer 2 is represented this way:

1|0.0000[...]0000

Only the exponent (position of the binary point) changes. 3 looks like this:

1|1.0000[...]0000

4-9:

1|00.000[...]0000
1|01.000[...]0000
1|10.000[...]0000
1|11.000[...]0000
1|000.00[...]0000
1|001.00[...]0000

If you continue this for a very, very long time, you arrive at this:

1|1111[...]1111.

All the mantissa bits are set to 1. This isn't quite the end, though.
You can add one more to get this:

1|0000[...]0000|0.

Now there's an implied 0 on the right side of the mantissa. Actually,
there's a lot of implied zeros over there, they just haven't come into
play until now. This should give the value I listed as the upper limit
(pow(2, 53)).

At this point, you can't add 1 and get anything different. It would
require flipping the implied 0 bit, which you can't do. You can add 2
and get this, though:

1|0000[...]0001|0.

There are obviously many more integers that can be precisely
represented, but none of them are contiguous - given two contiguous
integers, one must have its least significant bit set to 1 (in other
words, must be odd), but all integers above this point use an implied 0
bit for their least significant bit. As you go higher, more implied 0
bits are used, making the distance from one exact integer to the next
even greater. For a while, you have only even integers. Then, only
integers that are divisible by 4, then 8, 16, etc. (until you run out of
exponent, a very long time later).

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #11
