
Converting floating point types: round-off error

Consider the equation (flight dynamics stuff):

Yaw (Degrees) = Azimuth Angle(Radians) * 180 (Degrees) /
3.1415926535897932384626433832795 (Radians)

There's a valid reason to use single precision floating point types.
The number of decimal digits guaranteed to be correct on my
implementation is 6 (i.e. numeric_limits<float>::digits10 == 6).

From my reading of the IEEE standard, I could paraphrase the issue
surrounding conversion to a string and back _without_ loss of
precision as follows:

If a float is converted to a decimal string with at least 6 significant
decimal digits, and then converted back to a float, then the final
number must match the original.

IOW: given
float a = 1.0f;
float aa = 0.0f;
std::stringstream s;
s.precision(6);
s << std::scientific << a;
s >> aa;
assert(a != aa);

No sweat

I have to serialize the Yaw answer above. The question: Is it safe to
state that my PI representation is useless beyond six significant
digits? I'd like for the C++ source to reflect my Matlab models but
I'm starting to get concerned here with the conversion aspect.

Is there a good source out there that will show me how far out I could
represent a value (say, PI) for both single and double precision
before truncation/round-off loss kicks in? (I tend to struggle with
numeric_limits at times, coupled with all the idiosyncrasies of
machines and floating point types.)
Oct 4 '08 #1
ma740988 <ma******@gmail.com> wrote:
> IOW: given
> float a = 1.0f;
> float aa = 0.0f;
> std::stringstream s;
> s.precision(6);
> s << std::scientific << a;
> s >> aa;
> assert(a != aa);

You mean: assert(a == aa)?

> No sweat
>
> I have to serialize the Yaw answer above. The question: Is it safe to
> state that my PI representation is useless beyond six significant
> digits? I'd like for the C++ source to reflect my Matlab models but

No, it's not safe to assume that. The sizes of most types, including
float, are implementation-specific, so there is no guarantee that a
float needs only 6 digits on every implementation.

Moreover, floating-point numbers are by nature inexact, so there seems to
be not much point for requiring the *exact* equality after serialization
and deserialization. Any numeric algorithms using the result should be
stable against slight deviations.

> I'm starting to get concerned here with the conversion aspect.
>
> Is there a good source out there that will show me how far out I could
> represent a value ( say PI ) for both single and double precision
> before truncation/round off loss kicks in? ( I tend to struggle with

Do you mean the truncation and rounding performed by the compiler when
compiling the literal constant in the code into the double
representation? Why should you worry about this?

If you are serializing the end result with precision 6 anyway, then there
is not much point in going much beyond this in the PI constant (at least in
the case of linear relations between them, like your example formula). On
the other hand, the excess precision does no harm. If you have already
bothered to write down the constant with 30 decimal places, I would
leave it as is and not worry about it any more.

I have an itching feeling I have not understood your actual problem...
Paavo

Oct 5 '08 #2
"ma740988" <ma******@gmail.com> wrote in message
news:ae**********************************@z72g2000hsb.googlegroups.com...
> Consider the equation (flight dynamics stuff):
>
> Yaw (Degrees) = Azimuth Angle(Radians) * 180 (Degrees) /
> 3.1415926535897932384626433832795 (Radians)
Note that here 3.141... is converted to a double, not a float.
>
> There's a valid reason to use single precision floating point types.
> The number of decimal digits guaranteed to be correct on my
> implementation is 6. (i.e numeric_limits < float >::digits10 = 6 )
> ...
>
> I have to serialize the Yaw answer above. The question: Is it safe to
> state that my PI representation is useless beyond six significant
> digits? I'd like for the C++ source to reflect my Matlab models but
> I'm starting to get concerned here with the conversion aspect.

The 6 digits is a minimal guarantee. That is, if you convert a number with 6
significant decimal digits to a float and then back to a string, the result is the same.

When converting PI to a float, you want the floating point number that is
closest to PI, not a float number that, when converted to a
string, will have the same 6 digits as PI.
So using more digits can give you a single precision floating point number
closer to PI.

Note also that every operation can result in a rounding error. So if you
divide by an approximation of PI, and the result cannot be represented exactly
as a floating point number, the result will be rounded.
So even if your input is accurate to 6 digits, you should use double (or
even long double) to perform your calculations.

See "27 bits are not enough for 8-digit accuracy" by I. Bennett Goldberg,
"What every computer scientist should know about floating-point arithmetic"
by David Goldberg, and the home page of William Kahan
(http://www.cs.berkeley.edu/~wkahan/) for more info.

Greetings,
Hans.

Oct 5 '08 #3
On 2008-10-05 05:45:56 -0400, Paavo Helde <no****@ebi.ee> said:
> Moreover, floating-point numbers are by nature inexact,
No, floating-point numbers are exact. The problem is that they don't
represent real numbers, and if you assume that they do, you don't get
the results you expect.
> so there seems to
> be not much point for requiring the *exact* equality after serialization
> and deserialization.
It's not simple, but it can be done.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Oct 5 '08 #4
Pete Becker <pe**@versatilecoding.com> wrote:
> On 2008-10-05 05:45:56 -0400, Paavo Helde <no****@ebi.ee> said:
>> Moreover, floating-point numbers are by nature inexact,
>
> No, floating-point numbers are exact.
Yes, that's what I meant; I should have expressed myself better. I have a
physics background, and there the only exact thing is, by definition, the actual
physical quantity, which can never be measured exactly (modulo counting of
integers, of course).

Paavo
Oct 5 '08 #5

I'd like to thank you all (James - as always) for clearing up my
confusion here. The claim was made that since the data is being
serialized I'll be subject to round-off and truncation errors beyond 6
digits (for single precision floating point types). As Hans pointed
out, the 6 digits is a minimal guarantee. As you all pointed out (and
I'm paraphrasing), the excess precision doesn't hurt. I have a
follow-on question. Now, given this source:
#include <cstring>   // memcpy
#include <iostream>

class Serializer {

template <typename T>
void Swap( T& var ) {
char* start,*end;
char tmp;
start = (char * ) &var;
end = (char *)&var;
end += sizeof(T)-1;
while(start < end) {
tmp = *start;
*start = *end;
*end = tmp;
start++;end--;
}
}

public :
//////////////////////////////////////////
/// @name Overloaded functions using "double"
//////////////////////////////////////////
char* put_data(char* out, const double& source) {
*(double *)out = source;
return out + sizeof(double);
}
char* get_data(double& target, char* source) {
target = *(double *)source;
return source + sizeof(double);
}
char* put_swapped_data(char* out, const double& source) {
*(double *)out = source;
Swap(*(double *)out);
return out + sizeof(double);
}
char* get_swapped_data(double& target, char* source) {
target = *(double *)source;
Swap(target);
return source + sizeof(double);
}
//////////////////////////////////////////
/// @name Overloaded functions using "float"
//////////////////////////////////////////
char* put_data(char* out, const float& source) {
*(float *)out = source;
return out + sizeof(float);
}
char* get_data(float& target, char* source) {
target = *(float *)source;
return source + sizeof(float);
}
char* put_swapped_data(char* out, const float& source) {
*(float *)out = source;
Swap(*(float *)out);
return out + sizeof(float);
}
char* GetSwappedData(float& target, char* source) {
target = *(float *)source;
Swap(target);
return source + sizeof(float);
}

///////////////////////////////////////////////
/// @name Overloaded functions using "unsigned char"
///////////////////////////////////////////////
char* put_data(char* out, const unsigned char& source) {
*(unsigned char *)out = source;
return out + sizeof(unsigned char);
}
char* get_data(unsigned char& target, char* source) {
target = *(unsigned char *)source;
return source + sizeof(unsigned char);
}
char* put_swapped_data(char* out, const unsigned char& source) {
*(unsigned char *)out = source;
return out + sizeof(unsigned char);
}
char* GetSwappedData(unsigned char& target, char* source) {
target = *(unsigned char *)source;
return source + sizeof(unsigned char);
}

///////////////////////////////////////////////
/// @name Overloaded functions using short*
///////////////////////////////////////////////
char* put_data(char* out, short* source,unsigned length16BitUnits)
{
memcpy(out, (char *)source, length16BitUnits*2);
return out + length16BitUnits*2;
}

char* get_data(short* target, char* source,unsigned
length16BitUnits) {
memcpy((char *)target,source,length16BitUnits*2);
return source+length16BitUnits*2;
}

char* put_swapped_data(char* out, short* source,unsigned
length16BitUnits) {
unsigned i;
char* tmp = put_data(out,source,length16BitUnits);
short* tmpShort = (short *)out;
for(i = 0; i < length16BitUnits; i++) {
Swap(*tmpShort);
tmpShort++;
}
return tmp;
}
char* GetSwappedData(short* target, char* source,unsigned
length16BitUnits) {
unsigned i;
char* tmp = get_data(target,source,length16BitUnits);
short* tmpShort=target;
for(i = 0; i < length16BitUnits; i++) {
Swap(*tmpShort);
tmpShort++;
}
return tmp;
}

};

int main () {
Serializer is ;
int const size_of_type = 40 ;
double source = 3.1415926535897932384626433832795;
char buffer [ size_of_type ] = { 0 };
char *ptr = is.put_data ( buffer, source ) ;
//std::cout << *ptr << std::endl;
std::cin.get() ;
}

First, I think this ought to be written using generic programming.
That aside (I'm not a fan of char*, but the vendor's string facility,
from what I understand, is lacking), how would I verify that the
contents of source are in the buffer, and what is the required buffer
size? (i.e., based on the function prototype, should the buffer size be
the size of the type, double in this case?)

Oct 6 '08 #6
On Oct 6, 2:38 pm, ma740988 <ma740...@gmail.com> wrote:
> template <typename T>
> void Swap( T& var ) {
> char* start,*end;
> char tmp;
> start = (char * ) &var;
> end = (char *)&var;
That's a reinterpret_cast. That should tell you immediately
that something is wrong.
> end += sizeof(T)-1;
> while(start < end) {
> tmp = *start;
> *start = *end;
> *end = tmp;
> start++;end--;
> }
> }
And the entire function looks very much like std::reverse< char* >,
called with a reinterpret_cast, e.g.:

#include <algorithm>

template< typename T >
void swap( T& var )
{
    std::reverse( reinterpret_cast< char* >( &var ),
                  reinterpret_cast< char* >( &var + 1 ) ) ;
}

Any attempt to access the argument after having called this
function (unless T is a character type) is undefined behavior.
If T is an integral type, it will simply give an unspecified
value on most modern machines; if T is a floating point type,
there's a good chance of a core dump.
> public :
> //////////////////////////////////////////
> /// @name Overloaded functions using "double"
> //////////////////////////////////////////
> char* put_data(char* out, const double& source) {
> *(double *)out = source;
And this will core dump 7 times in 8 on my machine (Sun Sparc).
> return out + sizeof(double);
> }
> char* get_data(double& target, char* source) {
> target = *(double *)source;
As will this.
> return source + sizeof(double);
> }
> char* put_swapped_data(char* out, const double& source) {
> *(double *)out = source;
And this.
> Swap(*(double *)out);
> return out + sizeof(double);
> }
> char* get_swapped_data(double& target, char* source) {
> target = *(double *)source;
And this.
> Swap(target);
> return source + sizeof(double);
> }
> //////////////////////////////////////////
> /// @name Overloaded functions using "float"
> //////////////////////////////////////////
> char* put_data(char* out, const float& source) {
> *(float *)out = source;
> return out + sizeof(float);
> }
> char* get_data(float& target, char* source) {
> target = *(float *)source;
> return source + sizeof(float);
> }
> char* put_swapped_data(char* out, const float& source) {
> *(float *)out = source;
> Swap(*(float *)out);
> return out + sizeof(float);
> }
> char* GetSwappedData(float& target, char* source) {
> target = *(float *)source;
> Swap(target);
> return source + sizeof(float);
> }
As above, except these will only core dump 3 times in 4, rather
than 7 in 8.

You can't take a char*, and assign a float or a double to it;
there's no guarantee that it is a legal address for a float or a
double.
> First I think this ought to be written to use generic
> programming. That aside (I'm not a fan of char* but the
> vendor string facility - from what i understand is lacking),
> how would I verify that the contents of source is in the
> buffer and what is the required buffer size? (i.e based on the
> function prototype - should the buffer size be size of type -
> double in this case )
I'm not too sure what you're trying to do, but it looks like
you're playing funny games with types, which will get you into
trouble in the long run. (There are a few that you can
sometimes play, if you have to for performance reasons, but you
really have to know what you are doing.)

If the problem is just serialization, I'd go with your original
attempt to use textual formatting. It's a lot easier to debug,
for starters. For IEEE floating point, it is guaranteed that 9
decimal digits suffice for a round trip conversion for float,
and 17 for double (provided the conversion routines are
correct). See http://www.validlab.com/goldberg/paper.pdf, in
particular the section "Binary to Decimal Conversion".

--
James Kanze (GABI Software) email:ja*********@gmail.com
Object-oriented software consulting
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Oct 6 '08 #7
On Oct 6, 11:06 am, James Kanze <james.ka...@gmail.com> wrote:
> You can't take a char*, and assign a float or a double to it;
> there's no guarantee that it is a legal address for a float or a
> double.
Point taken.
> I'm not too sure what you're trying to do, but it looks like
> you're playing funny games with types, which will get you into
> trouble in the long run. (There are a few that you can
> sometimes play, if you have to for performance reasons, but you
> really have to know what you are doing.)
Well, the developer hired to do this came up with the source shown. I'm a
huge fan of std::string, so when I saw 'char *' used to the extent it is
in this code I became nervous (sadly, support for std::string is
always limited or non-existent when you go the embedded route; not
sure why). Long story short, he had to take off on a 3-week vacation, and
I was trying to understand what his issue was with serializing the
data given the requirements I gave him.
> If the problem is just serialization, I'd go with your original
> attempt to use textual formatting. It's a lot easier to debug,
> for starters.
Got it. I'll have him figure out the right way to do this the 'C' way
(used sparingly), since stringstream and string are
off limits.
Oct 7 '08 #8
