P: n/a

I wanted to write hash function float or double. Any suggestion would
be appreciated.  
Share this Question
P: n/a

shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?
Have you ever heard the advice "Never compare floating
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floatingpoint values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few loworder bits.
Consequently, if you are using floatingpoint values as
keys in a hash table  a data structure that relies on the
notion of exact equality  you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key  there's
an excellent chance you won't find the item.
Floatingpoint values make terrible hash table keys.
Any suggestion would
be appreciated.
Imperfect but easy to do:
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d  1].
*/
It's imperfect because there may be floatingpoint values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bitforbit identical values that still
aren't "==".
 Er*********@sun.com  
P: n/a

Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?
Have you ever heard the advice "Never compare floating
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floatingpoint values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few loworder bits.
Consequently, if you are using floatingpoint values as
keys in a hash table  a data structure that relies on the
notion of exact equality  you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key  there's
an excellent chance you won't find the item.
Floatingpoint values make terrible hash table keys.
Any suggestion would
be appreciated.
Imperfect but easy to do:
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d  1].
*/
It's imperfect because there may be floatingpoint values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bitforbit identical values that still
aren't "==".
 Er*********@sun.com
I have developed following macro for float comparisions based on
limit.h
#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
#define FLT_EQUAL(a, b) (fabs((a)(b)) <= FLOAT_EPSILON(a,b))
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?  
P: n/a

shaanxxx schrieb:
Eric Sosman wrote:
>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.
Why?
Have you ever heard the advice "Never compare floating point values for equality?" I'll admit that "never" is too strong, but it's still true that you should very seldom compare floatingpoint values for equality. The problem is not with the values themselves, but with their sources: two computations that "ought" to produce the same result may in fact produce results that differ in a few loworder bits.
Consequently, if you are using floatingpoint values as keys in a hash table  a data structure that relies on the notion of exact equality  you're in trouble. Calculate the square root of two as the key of some item and add it to the table. Later on, calculate the square root of eight and divide it by two (it "ought" to be sqrt(2), right?) and search the hash table for an item with that key  there's an excellent chance you won't find the item.
Floatingpoint values make terrible hash table keys.
>>>Any suggestion would be appreciated.
Imperfect but easy to do:
double d = sqrt(2.0); unsigned char *p = (unsigned char*)&d; /* Now compute a hash code for the array of * values p[0] through p[sizeof d  1]. */
It's imperfect because there may be floatingpoint values that compare equal but have different bit patterns. Many systems provide "plus zero" and "minus zero," which look different but are equal. Some systems support "unnormalized" numbers (not to be confused with "denormal" numbers), which allows some values to be represented in several forms (just as the square root of nine can be written as 3.0, 0.3e1, 0.03e2, ...). Finally, many systems provide "NaN" values which never compare equal to anything, not even themselves, so you may have two bitforbit identical values that still aren't "==".
I have developed following macro for float comparisions based on
limit.h
#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
Forgets to take care of the sign. Try a=0.0F, b=1e10...
#define FLT_EQUAL(a, b) (fabs((a)(b)) <= FLOAT_EPSILON(a,b))
Consider
#define FLT_APPROX_EQUAL(a,b, eps) \
(2 * fabs((a)  (b)) <= (eps)*(fabs(a)+fabs(b)))
instead, with
#define MY_FLT_EQUAL(a, b) \
FLT_APPROX_EQUAL(a,b,MY_EPSILON)
where you explicitly define MY_EPSILON.
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?
Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
not imply FLT_EQUAL(a, c); this can be detrimental for the
efficiency of hashing as you become dependent on the order
in which you enter a, b, c in the table as you can either have
separate entries for a and c or the same.
If you know more about how the floating point numbers you are
trying to hash, there may be reliable equivalence classes.
Cheers
Michael

EMail: Mine is an /at/ gmx /dot/ de address.  
P: n/a

shaanxxx wrote:
Eric Sosman wrote:
>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.
Why?
I have developed following macro for float comparisions based on
limit.h
[...]
You still haven't explained why you want to hash
floatingpoint values. Using "approximate equality"
instead of "strict equality" is *not* going to make
FP keys work satisfactorily with hash tables.
What are you trying to do?

Eric Sosman es*****@acmdotorg.invalid  
P: n/a

Eric Sosman wrote:
shaanxxx wrote:
>Eric Sosman wrote:
>>shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?
I have developed following macro for float comparisions based on limit.h [...]
You still haven't explained why you want to hash
floatingpoint values. Using "approximate equality"
instead of "strict equality" is *not* going to make
FP keys work satisfactorily with hash tables.
What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hashconsing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
Greetings,

Michel Bardiaux
R&D Director
T +32 [0] 2 790 29 41
F +32 [0] 2 790 29 02
E mailto:mb*******@mediaxim.be
Mediaxim NV/SA
Vorstlaan 191 Boulevard du Souverain
Brussel 1160 Bruxelles http://www.mediaxim.com/  
P: n/a

Michel Bardiaux wrote:
Eric Sosman wrote:
>> You still haven't explained why you want to hash floatingpoint values. Using "approximate equality" instead of "strict equality" is *not* going to make FP keys work satisfactorily with hash tables.
What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hashconsing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
The usual difficulty is that you may find all the lists
beginning with 1.33, but miss the lists beginning with 1.2+.13
and so on. That may be all right in some situations, but it's
usually a drawback  which is why I hope the O.P. will describe
his intentions more fully.

Eric Sosman es*****@acmdotorg.invalid  
P: n/a

You still haven't explained why you want to hash
floatingpoint values. Using "approximate equality"
instead of "strict equality" is *not* going to make
FP keys work satisfactorily with hash tables.
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
Thanks
Shaan.  
P: n/a

Eric Sosman wrote:
shaanxxx wrote:
Eric Sosman wrote:
>shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?
I have developed following macro for float comparisions based on
limit.h
[...]
You still haven't explained why you want to hash
floatingpoint values. Using "approximate equality"
instead of "strict equality" is *not* going to make
FP keys work satisfactorily with hash tables.
What are you trying to do?
You still haven't explained why you want to hash
floatingpoint values. Using "approximate equality"
instead of "strict equality" is *not* going to make
FP keys work satisfactorily with hash tables.
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant use hash join
for
that.
I was thinking, if we can have some good hash function, we can even use
hash join for floating point.
Thanks
Shaan.  
P: n/a

Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?
A hash is used to generate a map. Why can't the OP want a map from
floats?
Have you ever heard the advice "Never compare floating
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floatingpoint values for equality.
That's fine if you are computationally heavy and if you require certain
numerical properties. But that might easily not be the case.
Any suggestion would
be appreciated.
Imperfect but easy to do:
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d  1].
*/
It's imperfect because there may be floatingpoint values
that compare equal but have different bit patterns.
That's right  it can't do 0 so its wrong. Let's try this:
double d = ...;
int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.
Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
That solution is not very fast, but its meant to be somewhat portable.
If you are going to go with platform specific solutions, then you can
use the pointer hacking method suggested above, except you would
predetect all the aliased representations, and hash from those static
constants instead.

Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/  
P: n/a

In article <11*********************@m79g2000cwm.googlegroups. com>,
shaanxxx <sh******@yahoo.com(the original poster) wrote:
>you can implement logical join operator using Nested loop join Merge join Hash join.
>When you have floating point as join predicate, you cant hash join for that. I was thinking, if we have some good hash function, we can even use hash join for floating point.
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?

All is vanity.  Ecclesiastes  
P: n/a
 we******@gmail.com wrote On 08/24/06 11:15,:
>
That's right  it can't do 0 so its wrong. Let's try this:
double d = ...;
int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.
Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
Don't you need a special case for zero in the above?
Also, it makes me uneasy to use so many inexact calculations
in developing a hash code  both the transcendental functions
are necessarily inexact, and I don't think you can guarantee
that `INT64_MAX+1.0' will be accurate. (INT64_MAX probably has
no exact representation in double, so you need to rely on the
convertandadd being carried out at higher precision.)
But the overall approach seems promising. You could use
frexp() instead of all that horsing around with logarithms
and exponentials, and you could use ldexp() to scale the fraction
without worrying about inaccuracy in the multiplier. You'd get
something like
double d = ...;
int e;
double f = frexp(d, &e);
int64_t m = ldexp(f, DBL_MANT_DIG);
/* now hash e and m */
(This assumes FLT_RADIX==2; maybe replacing DBL_MANT_DIG with
63 would be a simple way to duck the issue.)
As before, NaNs and Infs need special handling  but
I think you've got a reasonably portable hash otherwise.
 Er*********@sun.com  
P: n/a

Walter Roberson wrote:
In article <11*********************@m79g2000cwm.googlegroups. com>,
shaanxxx <sh******@yahoo.com(the original poster) wrote:
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
join relational operator == selection(cross product)
That is not union operator.   This discussion thread is closed Replies have been disabled for this discussion.   Question stats  viewed: 9959
 replies: 12
 date asked: Aug 23 '06
