
# sanity check - floating point comparison

#include <algorithm>
#include <cstddef>
#include <deque>
#include <iostream>
#include <iterator>
#include <limits>

template <class T> inline bool isEqual( const T& a, const T& b,
    const T epsilon = std::numeric_limits<T>::epsilon() )
{
    const T diff = a - b;
    return ( diff <= epsilon ) && ( diff >= -epsilon );
}

int main()
{
    std::deque<double> pt ;
    pt.push_back ( 2.3123 );
    pt.push_back ( 4.3445 );
    pt.push_back ( 1.343 );
    pt.push_back ( 4.3445 );
    pt.push_back ( 4.3445 );
    pt.push_back ( 2.3123 );
    std::deque<double> jt ;
    jt.push_back ( 4.3445 );
    jt.push_back ( 2.3123 );

    // holds indices into pt, so store them as std::size_t
    std::deque<std::size_t> results;
    // results should have - 1, 3, 4, 0, 5

    for ( std::size_t idx ( 0 ) ; idx < jt.size(); ++idx )
    {
        for ( std::size_t kdx ( 0 ); kdx < pt.size(); ++kdx )
        {
            if ( isEqual<double>( jt [ idx ], pt [ kdx ] ) )
            {
                results.push_back ( kdx );
            }
        }
    }
    std::copy ( results.begin(), results.end(),
        std::ostream_iterator<std::size_t> ( std::cout, "\n" ) );
}

My intent: I'll search the pt container for the values in the jt
container. If found, I'll store - in the result container - the
index/location where the value was found in the pt container. For
example: search the pt container for the first element in the jt
container. So I found 4.3445 at positions 1, 3 and 4. Similarly for
2.3123, which was found within the pt container at 0 and 5.

Result will print, 1, 3, 4, 0, 5. Works.

What makes me nervous though is the floating point comparison. After
all, numeric_limits is not defined on one platform (using gcc 2.96).
That said, I was opting to use iterators with - I think - std::distance,
but I'll end up doing floating point comparison anyway, in which case
my own version ( with my own comparator - isEqual) works best. Correct?

Apr 25 '06 #1
In message <11**********************@u72g2000cwu.googlegroups .com>,
ma740988 <ma******@gmail.com> writes

template <class T> inline bool isEqual( const T& a, const T& b,
const T epsilon = std::numeric_limits<T>::epsilon() )
{
const T diff = a - b;
return ( diff <= epsilon ) && ( diff >= -epsilon );
}

int main()
{
std::deque<double> pt ;
pt.push_back ( 2.3123 );
pt.push_back ( 4.3445 );
pt.push_back ( 1.343 );
pt.push_back ( 4.3445 );
pt.push_back ( 4.3445 );
pt.push_back ( 2.3123 );
std::deque<double> jt ;
jt.push_back ( 4.3445 );
jt.push_back ( 2.3123 );

std::deque<double> results;
// results should have - 1, 3, 4, 0, 5

for ( int idx ( 0 ) ; idx < jt.size(); ++idx )
{
for ( int kdx ( 0 ); kdx < pt.size(); ++kdx )
{
if ( isEqual<double>( jt [ idx ], pt [ kdx ] ) )
{
results.push_back ( kdx );
}
}
}
std::copy ( results.begin(), results.end(),
std::ostream_iterator<int> ( std::cout, "\n" ) );
}

> My intent. I'll search the pt container for the values in the jt
> container. If found, I'll store - in the result container - the
> index/location where the value was found in the pt container. For
> example: Search pt container for first element in jt container. So I
> found 4 at positions 1, 3 and 4. Similarly for 2. 2 was found within
> the pt container at 0 and 5.
>
> Result will print, 1, 3, 4, 0, 5. Works.
>
> What makes me nervous though is the floating point comparison. After
> all numeric_limits is not defined on one platform (using gcc 2.96).

?

> That said, I was opting to use iterators with - I think std::distance

? Use of iterators versus indexing is a completely different question
from how to compare the stored values.

> but I'll end up doing floating point comparison anyways, in which case
> my own version ( with my own comparator - isEqual) works best. Correct?

IMO no.

I can't imagine anyone deliberately setting out to produce a compiler on
which the same floating literal 2.3123 etc., wouldn't produce the same
double value internally in each place in the code where it's used, if
you use the same compiler flags. (If you're generating the numbers by
arithmetic it's a different matter, of course.)

Even if the compiler did produce different values for the same literal,
there's no reason to suppose that the results would pass isEqual().
epsilon() has its uses in numerical analysis, but I don't think this is
an appropriate one. In effect it's guaranteed that if X==1.0, (a) X and
X + epsilon() are distinct values and (b) X and X + epsilon()/2 are not,
but for other values of X one of those assertions may not be true.

To specify fuzzy floating-point comparisons correctly you generally need
some knowledge of the domain being modelled, but here you're trying to
second-guess a compiler problem that probably doesn't even exist.

Finally, if you really must use your own comparator, don't call it
isEqual.

--
Richard Herring
Apr 25 '06 #2

IMO no.

I can't imagine anyone deliberately setting out to produce a compiler on
which the same floating literal 2.3123 etc., wouldn't produce the same
double value internally in each place in the code where it's used, if
you use the same compiler flags. (If you're generating the numbers by
arithmetic it's a different matter, of course.)

So if I understand you correctly, simply doing
if ( jt [ idx ] == pt [ kdx ] )
would suffice?

Apr 25 '06 #3
Hello,

ma740988 wrote:

template <class T> inline bool isEqual( const T& a, const T& b,
const T epsilon = std::numeric_limits<T>::epsilon() )
{
const T diff = a - b;
return ( diff <= epsilon ) && ( diff >= -epsilon );
}

That epsilon is the difference between 1 and the least value strictly
greater than 1. If you take 1000000 and the least value strictly greater
than 1000000, then you will see that the difference is greater than
epsilon. I don't think you want those values to make isEqual return
false.

int main()
{
std::deque<double> pt ;
pt.push_back ( 2.3123 );
pt.push_back ( 4.3445 );
pt.push_back ( 1.343 );
pt.push_back ( 4.3445 );
pt.push_back ( 4.3445 );
pt.push_back ( 2.3123 );
std::deque<double> jt ;
jt.push_back ( 4.3445 );
jt.push_back ( 2.3123 );
if ( isEqual<double>( jt [ idx ], pt [ kdx ] ) )
[ ... ]
my own version ( with my own comparator - isEqual) works best.
Correct?

Almost, just don't abuse epsilon. Your input values seem to be rounded
to 4 digits after decimal point, so take 1e-4 instead of epsilon. You
will have to add epsilon as parameter to isEqual, either as template
argument via some bypass, or in the function argument, or hardcode
epsilon some other way.

Bernd Strieder
Apr 25 '06 #4

ma740988 wrote:
What makes me nervous though is the floating point comparison. After
all numeric_limits is not defined on one platform (using gcc 2.96).

You need to upgrade your compiler to gcc 3.x at least, or try to
find a patch for gcc 2.96 that fixes the problem.

Apr 26 '06 #5

Richard Herring wrote:
To specify fuzzy floating-point comparisons correctly you generally need
some knowledge of the domain being modelled, but here you're trying to
second-guess a compiler problem that probably doesn't even exist.

Why not? It is a fact that gcc 2.96 has a problem handling
"epsilon".

Apr 26 '06 #6

ma740988 wrote:

IMO no.

I can't imagine anyone deliberately setting out to produce a compiler on
which the same floating literal 2.3123 etc., wouldn't produce the same
double value internally in each place in the code where it's used, if
you use the same compiler flags. (If you're generating the numbers by
arithmetic it's a different matter, of course.)

So if I understand you correctly. Simply doing
if ( jt [ idx ] == pt [ kdx ] )
would suffice?

It doesn't work for floating point numbers.

Apr 26 '06 #7

Bernd Strieder wrote:
That epsilon is the difference of 1 and the least value strictly greater
than 1. If you take 1000000 and the least value strictly greater than
1000000, than you will see, that the difference is greater than
epsilon. I don't think you want those values to make isEqual return
false.
epsilon depends on data type.

Almost, just don't abuse epsilon. Your input values seem to be rounded
to 4 digits after decimal point, so take 1e-4 instead of epsilon. You
will have to add epsilon as parameter to isEqual, either as template
argument via some bypass, or in the function argument, or hardcode
epsilon some other way.

Hard-coding should work here, but it is not the best way to use
epsilon.
If the data type is not double but some kind of struct, how do you
deal with it?

struct new_type
{
char x;
int y;
long z;
double w;
};

Apr 26 '06 #8
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes

ma740988 wrote:
> >
> IMO no.
>
> I can't imagine anyone deliberately setting out to produce a compiler on
> which the same floating literal 2.3123 etc., wouldn't produce the same
> double value internally in each place in the code where it's used, if
> you use the same compiler flags. (If you're generating the numbers by
> arithmetic it's a different matter, of course.)
>

So if I understand you correctly. Simply doing
if ( jt [ idx ] == pt [ kdx ] )
would suffice?

It doesn't work for float point number.

What do you mean "it doesn't work"? Equality is perfectly well defined
for floating types. It just isn't always what you want to test.

--
Richard Herring
Apr 26 '06 #9
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes

Bernd Strieder wrote:
That epsilon is the difference of 1 and the least value strictly greater
than 1. If you take 1000000 and the least value strictly greater than
1000000, than you will see, that the difference is greater than
epsilon. I don't think you want those values to make isEqual return
false.
epsilon depends on data type.

Regardless, it's still the wrong thing to use, because it only
approximates "least detectable difference" for values near to 1.0.

> Almost, just don't abuse epsilon. Your input values seem to be rounded
> to 4 digits after decimal point, so take 1e-4 instead of epsilon. You
> will have to add epsilon as parameter to isEqual, either as template
> argument via some bypass, or in the function argument, or hardcode
> epsilon some other way.

the hard code should work here. But it is not the best way that uses
epsilon.
If the data type is not double but other kind of struct, how do you
deal with it?

struct new_type
{
char x;
int y;
long z;
double w;
}

Then you understand the problem domain it's supposed to be modelling,
and define your comparison accordingly.

--
Richard Herring
Apr 26 '06 #10
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes

Richard Herring wrote:
To specify fuzzy floating-point comparisons correctly you generally need
some knowledge of the domain being modelled, but here you're trying to
second-guess a compiler problem that probably doesn't even exist.
> why not ?

Why would it?

(And even if it does, why would the resulting error be related to
epsilon?)

> But it is a fact that gcc 2.96 has a problem in handling
> "epsilon" .

--
Richard Herring
Apr 26 '06 #11
In message <11**********************@y43g2000cwc.googlegroups .com>,
ma740988 <ma******@gmail.com> writes

>

IMO no.

I can't imagine anyone deliberately setting out to produce a compiler on
which the same floating literal 2.3123 etc., wouldn't produce the same
double value internally in each place in the code where it's used, if
you use the same compiler flags. (If you're generating the numbers by
arithmetic it's a different matter, of course.)

So if I understand you correctly. Simply doing
if ( jt [ idx ] == pt [ kdx ] )
would suffice?

Yes, *if* the numbers were floating literals inserted into jt and pt as
your previous code suggests.

If they are computed, you need a comparator named according to what it
does e.g. IsNearlyEqual() which compares with a tolerance you determine,
based on the nature of the domain you're trying to model.

--
Richard Herring
Apr 26 '06 #12

Richard Herring wrote:
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes

Bernd Strieder wrote:
That epsilon is the difference of 1 and the least value strictly greater
than 1. If you take 1000000 and the least value strictly greater than
1000000, than you will see, that the difference is greater than
epsilon. I don't think you want those values to make isEqual return
false.

epsilon depends on data type.

Regardless, it's still the wrong thing to use, because it only
approximates "least detectable difference" for values near to 1.0.

My frame of reference was the C++ FAQ. In that regard I tried to model
something akin to:

http://www.parashift.com/c++-faq-li....html#faq-29.17
In that regard I was just trying to get 'close' as the FAQ puts it.

Apr 26 '06 #13

In message <11**********************@j33g2000cwa.googlegroups .com>,
ma740988 <ma******@gmail.com> writes

Richard Herring wrote:
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes
>
>Bernd Strieder wrote:
>> That epsilon is the difference of 1 and the least value strictly greater
>> than 1. If you take 1000000 and the least value strictly greater than
>> 1000000, than you will see, that the difference is greater than
>> epsilon. I don't think you want those values to make isEqual return
>> false.
>
>epsilon depends on data type.

Regardless, it's still the wrong thing to use, because it only
approximates "least detectable difference" for values near to 1.0.

My frame of reference was the C++ FAQ. In that regard I tried to model
something akin to:

http://www.parashift.com/c++-faq-li....html#faq-29.17
In that regard I was just trying to get 'close' as the FAQ puts it.

Fair enough, but note that their "epsilon" is not the function from
numeric_limits, but "some small number such as 1e-5" and the onus is on
you to determine what is appropriate. Moreover, in the FAQ code, epsilon
represents a relative error, not an absolute one.

The worst thing about that FAQ is the name of the function. It doesn't
compute equality, so isEqual() is not an appropriate name for it - as
the author points out, it isn't even symmetric.
--
Richard Herring
Apr 26 '06 #15
Richard Herring <ju**@[127.0.0.1]> wrote:
Fair enough, but note that their "epsilon" is not the function from
numeric_limits, but "some small number such as 1e-5" and the onus is on
you to determine what is appropriate. Moreover, in the FAQ code, epsilon
represents a relative error, not an absolute one.

The worst thing about that FAQ is the name of the function. It doesn't
compute equality, so isEqual() is not an appropriate name for it - as
the author points out, it isn't even symmetric.

This whole discussion reminds me of an article I read a while ago called
"Comparing Floats - How To Determine if Floating Quantities Are Close
Enough Once a Tolerance Has Been Reached" in the March 2000 C++ Report.
It used to be online at http://www.adtmag.com/joop/crarticle.asp?ID=396
but not anymore. In the process of trying to find the above-cited
article, I found this, which may be helpful for the OP:

http://www.boost.org/libs/test/doc/c...omparison.html

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Apr 26 '06 #16
Richard Herring wrote:
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes
Regardless, it's still the wrong thing to use, because it only
approximates "least detectable difference" for values near to 1.0.

First, you can test the example from the original post and print out
epsilon like this:
template <class T> inline bool isEqual( const T& a, const T& b,
const T epsilon = std::numeric_limits<T>::epsilon() )
{
const T diff = a - b;

+ cout << epsilon <<endl;
return ( diff <= epsilon ) && ( diff >= -epsilon );
}

so you will see what value the "epsilon" is.
epsilon = 2.22045e-016 for IA32 architecture.

Second, "only approximates "least detectable difference" for values
near to 1.0" doesn't mean that epsilon is near to 1.0. You can test
the following code for the epsilon algorithm.

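#include <iostream>
int main()
{
    double base = 1.0;
    double test = base;
    double epsilon = test;

    // Halve test until adding it to base no longer changes base;
    // epsilon keeps the last value that still made a difference.
    while (true)
    {
        // volatile forces the sum to be rounded to double precision
        volatile double tmp = base + test;
        if (tmp == base)
            break;
        epsilon = test;
        test /= 2.0;
    }

    std::cout << "epsilon = " << epsilon << "\n";
    return 0;
}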
Third, the original post uses gcc 2.96 that cannot handle the epsilon
properly.
That is a defect with gcc 2.96.

Apr 26 '06 #17

Richard Herring wrote:
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes

What do you mean "it doesn't work"? Equality is perfectly well defined
for floating types. It just isn't always what you want to test.

Only if two floating point numbers have the same bit pattern in memory
can you say they are *perfectly* equal.

Apr 26 '06 #18
dan2online wrote:
What do you mean "it doesn't work"? Equality is perfectly well defined
for floating types. It just isn't always what you want to test.

Only if two floating point numbers have the same bit pattern in memory,
you can say they are *perfectly* equal.

RH said "perfectly well defined", not "perfectly equal".

I don't know if The Standard says two floats with the same bit pattern must
compare equal. I will not rely on that, and I can envision a CPU with an
arithmetic logic unit that optimizes such a comparison in some inscrutable
way that breaks equality. So I would prefer my compiler to produce fast
opcodes, not opcodes that hold my hand.

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
Apr 26 '06 #19
In message <11**********************@u72g2000cwu.googlegroups .com>,
dan2online <da********@gmail.com> writes
Richard Herring wrote:
In message <11*********************@u72g2000cwu.googlegroups. com>,
dan2online <da********@gmail.com> writes
Regardless, it's still the wrong thing to use, because it only
approximates "least detectable difference" for values near to 1.0.
>

First, you can test the example of the orignal post and print out
epsilon like:
template <class T> inline bool isEqual( const T& a, const T& b,
const T epsilon = std::numeric_limits<T>::epsilon() )
{
const T diff = a - b;

+ cout << epsilon <<endl;
return ( diff <= epsilon ) && ( diff >= -epsilon );
}

so you will see what value the "epsilon" is.
epsilon = 2.22045e-016 for IA32 architecture.

Yes. So what?

> Second, "only approximates "least detectable difference" for values
> near to 1.0" doesn't mean that epsilon is near to 1.0.

You appear to be reading things I never posted. Of course epsilon is
nowhere near 1.0; it's *the values being compared* that have to be near
1.0.

[...]
Third, the original post uses gcc 2.96 that cannot handle the epsilon
properly.
That is a defect with gcc 2.96.

Since numeric_limits::epsilon() is the wrong quantity to use here, why
does that matter?

--
Richard Herring
Apr 27 '06 #20
In message <l7******************@newssvr33.news.prodigy.com >, Phlip
<ph******@yahoo.com> writes
dan2online wrote:
What do you mean "it doesn't work"? Equality is perfectly well defined
for floating types. It just isn't always what you want to test.
Only if two floating point numbers have the same bit pattern in memory,
you can say they are *perfectly* equal.

RH said "perfectly well defined", not "perfectly equal".

> I don't know if The Standard says two floats with the same bit pattern must
> compare equal.

I think dan2online is the only poster who's raised the issue of bit
patterns.

> I will not rely on that, and I can envision a CPU with an
> arithmetic logic unit that optimizes such a comparison in some inscrutable
> way that breaks equality. So I would prefer my compiler to produce fast
> opcodes, not opcodes that hold my hand.

I would hope that we can rely on floating-point numbers being ordered
such that for any pair X, Y of "normal" floating values (i.e. excluding
things like NaN) exactly one of X<Y, X==Y, Y<X is true, and that these
relations are transitive, and that if X==Y, X-Y is zero.

--
Richard Herring
Apr 27 '06 #21
Richard Herring wrote:
I think dan2online is the only poster who's raised the issue of bit
patterns.

Here is an off-topic discussion, but perhaps it is helpful.

When using == to compare two floating point numbers, the FP unit
compares them based on their bit patterns (sign, exponent and fraction
bits, per the IEEE 754 standard):
S EEEEEEEEEEE FFFFFFFFFFFFFF....FFFF
0 1        11 12                  63
If any of the parts differs, the two floating point numbers are not equal.
Computer hardware supporting double precision floating point also employs
80 bit temporary registers for arithmetic/logic operations. In this
scenario, == is OK for comparing two floating point numbers.

But if some embedded hardware or an old computer has no hardware
floating point support, or has no 80 bit temporary registers,
arithmetic/logic operations will rely on software emulation. In this
scenario, == is questionable because of the potential precision loss.

That's why I mentioned the issue of bit pattern.

Practically, we can use the difference between two floating point
numbers. Strictly speaking, nearEqual is more precise than isEqual for
floating point numbers.

If two floating point bit patterns are the same except for the least
significant bit, the two numbers can be considered equal in most cases.
I guess the original poster wanted to try it for this reason. The method
was used in the old days.

So epsilon is the least detectable difference between two floating point
numbers near 1.0.
epsilon = 2^(-52) = 2.2204460492503130808472633361816e-16

If the two floating point numbers are not near 1, as in the example of
the original post, compute diff = a/b - 1; if diff < epsilon and
diff > -epsilon, then treat a and b as equal.

The example in the original post is not correct! Richard Herring pointed
out the problem.
I will not rely on that, and I can envision a CPU with an
arithmetic logic unit that optimizes such a comparison in some inscrutable
way that breaks equality. So I would prefer my compiler to produce fast
opcodes, not opcodes that hold my hand.

I would hope that we can rely on floating-point numbers being ordered
such that for any pair X, Y of "normal" floating values (i.e. excluding
things like NaN) exactly one of X<Y, X==Y, Y<X is true, and that these
relations are transitive, and that if X==Y, X-Y is zero.

Apr 30 '06 #22
In article <1145974247.296024.308190
says...

template <class T> inline bool isEqual( const T& a, const T& b,
const T epsilon = std::numeric_limits<T>::epsilon() )
{
const T diff = a - b;
return ( diff <= epsilon ) && ( diff >= -epsilon );
}
[ ... ]
What makes me nervous though is the floating point comparison. After
all numeric_limits is not defined on one platform (using gcc 2.96).
That said, I was opting to use iterators with - I think std::distance
but I'll end up doing floating point comparison anyways, in which case
my own version ( with my own comparator - isEqual) works best. Correct?

This question came up a while back in
comp.lang.c++.moderated. Here's what I had to say about
it there:

http://tinyurl.com/lvpl8

Here's my idea of a function that does a comparison in a
reasonable fashion:

http://tinyurl.com/qcmyd

Richard Herring does have a good point though -- it would
be better to rename the function to reflect the fact that
it tests for approximate equality rather than equality.

--
Later,
Jerry.

The universe is a figment of its own imagination.
May 6 '06 #23

Here's my idea of a function that does a comparison in a
reasonable fashion:

http://tinyurl.com/qcmyd

Jerry, I appreciate the link, and after playing with isEqual I think I've
got a handle on it.
I've got a follow-up question for you with regard to the comparison in
question.

Given two files with contents akin to:
file1.txt
65.3433
43.9999
// more

file2.txt
65.3433
43.9999
// more

Now with regard to your version of isEqual: how would I use said
version to compare the contents of the two files?

May 8 '06 #24
In article <1147102369.653191.324490
says...

[ ... ]
Give two files with contents akin to:
[ one floating point number per line ]
Now with regards to your version of isEqual. How would i use said
version to compare the contents of the two files?

Well, I suppose you'd read an item from each file,
compare them, and react appropriately based on whether
they're nearly equal or not.

--
Later,
Jerry.

The universe is a figment of its own imagination.
May 8 '06 #25

Jerry Coffin wrote:

Well, I suppose you'd read an item from each file,
compare them, and react appropriately based on whether
they're nearly equal or not.

Actually what I was after was a way to improve my source to take
advantage of the function object in isEqual with an algorithm. With my
current approach I read both files into a vector<double>, then compare
them with a for loop. It works, so I won't fuss with it.

Thanks.

May 8 '06 #26
ma740988 wrote:

Actually what I was after was a way to improve my source to take
advantage of the function object in isEqual with an algorithm. With my
current approach i read both files into a vector<double> then compare
them with a for loop. Works so I wont fuss with it.

Read each line as text and compare them as text. No conversions. Much
better than doing possibly lossy conversions and then trying to guess
whether there was a big enough loss of information to be concerned about.

--

Pete Becker
Roundhouse Consulting, Ltd.
May 8 '06 #27
Jerry Coffin <jc*****@taeus.com> wrote:
This question came up a while back in
comp.lang.c++.moderated. Here's what I had to say about
it there:

http://tinyurl.com/lvpl8

Here's my idea of a function that does a comparison in a
reasonable fashion:

http://tinyurl.com/qcmyd

Richard Herring does have a good point though -- it would
be better to rename the function to reflect the fact that
it tests for approximate equality rather than equality.

I was going through some old bookmarks and I found another article that
may be of interest:

Comparing Floating Point Numbers
by Bruce Dawson
http://www.cygnus-software.com/paper...ringfloats.htm

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 8 '06 #28
Read each line as text and compare them as text. No conversions. Much
better than doing possibly lossy conversions and then trying to guess
whether there was a big enough loss of information to be concerned about.

I think what you're alluding to is some getline approach?
Here's my approach in a nutshell.
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Reads v.size() raw binary values into v. Unused below because the
// input files are text, not binary.
template<typename T>
void read_any_t(std::ifstream& i, std::vector<T>& v) {
    if ( v.empty() )
        return;
    i.read( reinterpret_cast<char*>( &v[ 0 ] ),
            static_cast<std::streamsize>( v.size() * sizeof ( T ) ) );
}

unsigned int DEBUG ( 0 );
int main(int argc, char* argv[])
{
    std::vector< double > v1;
    std::vector< double > v2;
    // wild guess.. reserve enough to prevent relocation
    v1.reserve ( 0x100000 );
    v2.reserve ( 0x100000 );

    std::string file1 ( "file1.txt" );
    std::string file2 ( "file2.txt" );

    std::ifstream f1 ( file1.c_str() );
    std::ifstream f2 ( file2.c_str() );

    if ( !f1 || !f2 )
        return EXIT_FAILURE;
    // only problem i have with this approach
    // .. is determining how to check for failure .. if error here how
    // does one _know_?

    std::copy(std::istream_iterator<double>(f1),
              std::istream_iterator<double>(),
              std::back_inserter(v1));
    std::copy(std::istream_iterator<double>(f2),
              std::istream_iterator<double>(),
              std::back_inserter(v2));
    // doesn't fly .. i suspect binary expected.

    // read_any_t<double> ( f1, v1 );
    // read_any_t<double> ( f2, v2 );

    if ( DEBUG )
    {
        std::copy ( v1.begin(), v1.end(),
                    std::ostream_iterator< double > ( std::cout, "\n" ) );
        std::cout << std::endl;
        std::copy ( v2.begin(), v2.end(),
                    std::ostream_iterator< double > ( std::cout, "\n" ) );
    }
    typedef std::vector< double >::size_type size_type;
    size_type const sz1 = v1.size();
    size_type const sz2 = v2.size();

    if ( sz1 == sz2 )
    {
        typedef std::vector< double >::const_iterator const_iter;
        const_iter v1beg = v1.begin();
        const_iter v1end = v1.end();
        const_iter v2beg = v2.begin();

        // advance both iterators together so the pairs line up
        for ( ; v1beg != v1end; ++v1beg, ++v2beg )
        {
            // use Jerry's isEqual on *v1beg and *v2beg
        }
    }

    // finally can we replace that for loop with some algo of sorts..
    // more on this later
}

I'm using Jerry's isEqual - which is not shown in the above but you get
the point

May 8 '06 #29
In article <1147104844.160961.272690
says...

Jerry Coffin wrote:

Well, I suppose you'd read an item from each file,
compare them, and react appropriately based on whether
they're nearly equal or not.

Actually what I was after was a way to improve my source to take
advantage of the function object in isEqual with an algorithm. With my
current approach i read both files into a vector<double> then compare
them with a for loop. Works so I wont fuss with it.

Ah, I see. It'll still depend on what sort of comparison
you're doing. For example, if you want to know how many
of them are equal, you could use something like
std::count_if. If you only want to know whether the files
are the same or not, you can stop reading at the first
pair that compares non-equal -- and if the files are big,
that may save quite a bit of time.

As far as most algorithms care, you should be able to use
std::istream_iterators to work with the files directly,
rather than reading from the files into vectors, and then
applying the algorithm to those iterators.

--
Later,
Jerry.

The universe is a figment of its own imagination.
May 8 '06 #30
In article <Ip********************@giganews.com>,
pe********@acm.org says...

[ ... ]
Read each line as text and compare them as text. No conversions. Much
better than doing possibly lossy conversions and then trying to guess
whether there was a big enough loss of information to be concerned about.

That depends. If you want to know whether the values in
the files were precisely identical, textual comparison is
clearly the way to go -- and you'd usually be much better
off writing the values out in hexadecimal or binary rather
than converting them to decimal at all.

If, OTOH, you're doing something like regression testing,
and want to allow an optimized version of calculations,
as long as the results agree to (say) ten significant
decimal digits (with proper rounding), that's usually not
nearly as easy to do by textual comparison.

As an aside, are you honestly implying that if I read and
convert two identical pieces of text from the input file,
that the double values produced might not be identical?

Taking a value, converting to decimal, and then
converting back to a double (or float or long double) I'd
expect to usually produce a value slightly different from
the original before being written out. IOW, the process
has less than perfect accuracy. I'm a bit surprised,
however, at the implication that it might not be
repeatable...

--
Later,
Jerry.

The universe is a figment of its own imagination.
May 8 '06 #31
Jerry Coffin wrote:

That depends. If you want to know whether the values in
the files were precisely identical, textual comparison is
clearly the way to go

In the example files in the message that I replied to that was the case.

--

Pete Becker
Roundhouse Consulting, Ltd.
May 8 '06 #32
In article <1147107741.206933.32250
says...
Read each line as text and compare them as text. No conversions. Much
better than doing possibly lossy conversions and then trying to guess
whether there was a big enough loss of information to be concerned about.

I think what you're alluding to is some getline approach?
Here's my approach in a nutshell.

[ code elided ... ]

I'd consider using std::equal:

std::istream_iterator<double> input1(stream1), end;
std::istream_iterator<double> input2(stream2);

// The four-iterator overload (C++14) also returns false when one
// stream runs out before the other.
if (!std::equal(input1, end, input2, end, isApproxEqual))
{
std::cout << "The streams are different.\n";
}
else
{
std::cout << "The streams are equal.\n";
}

Basically, we consider the streams equal if and only if
each value we read from each compare approximately equal,
and we reach the end of both streams at the same time.

--
Later,
Jerry.

The universe is a figment of its own imagination.
May 8 '06 #33
