471,873 Members | 2,063 Online

# Hash function for float.

I wanted to write hash function float or double. Any suggestion would
be appreciated.

Aug 23 '06 #1
12 12025

shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
Any suggestion would
be appreciated.
Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

--
Er*********@sun.com

Aug 23 '06 #2

Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.

Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

--
Er*********@sun.com
I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))

Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?

Aug 24 '06 #3
shaanxxx schrieb:
Eric Sosman wrote:
>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.

Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
>>>Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
Forgets to take care of the sign. Try a=0.0F, b=-1e10...
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))
Consider
#define FLT_APPROX_EQUAL(a,b, eps) \
(2 * fabs((a) - (b)) <= (eps)*(fabs(a)+fabs(b)))
#define MY_FLT_EQUAL(a, b) \
FLT_APPROX_EQUAL(a,b,MY_EPSILON)
where you explicitly define MY_EPSILON.
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?
Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
not imply FLT_EQUAL(a, c); this can be detrimental for the
efficiency of hashing as you become dependent on the order
in which you enter a, b, c in the table as you can either have
separate entries for a and c or the same.
If you know more about how the floating point numbers you are
trying to hash, there may be reliable equivalence classes.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Aug 24 '06 #4
shaanxxx wrote:
Eric Sosman wrote:

>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.

Why?

I have developed following macro for float comparisions based on
limit.h
[...]
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?

--
Eric Sosman
es*****@acm-dot-org.invalid

Aug 24 '06 #5
Eric Sosman wrote:
shaanxxx wrote:
>Eric Sosman wrote:

>>shaanxxx wrote On 08/23/06 11:32,:

I wanted to write hash function float or double.

Why?

I have developed following macro for float comparisions based on
limit.h
[...]

You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.

Greetings,
--
Michel Bardiaux
R&D Director
T +32 [0] 2 790 29 41
F +32 [0] 2 790 29 02
E mailto:mb*******@mediaxim.be

Mediaxim NV/SA
Vorstlaan 191 Boulevard du Souverain
Brussel 1160 Bruxelles
http://www.mediaxim.com/
Aug 24 '06 #6
Michel Bardiaux wrote:
Eric Sosman wrote:
>>
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
The usual difficulty is that you may find all the lists
beginning with 1.33, but miss the lists beginning with 1.2+.13
and so on. That may be all right in some situations, but it's
usually a drawback -- which is why I hope the O.P. will describe
his intentions more fully.

--
Eric Sosman
es*****@acm-dot-org.invalid
Aug 24 '06 #7
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?

you can implement logical join operator using
Nested loop join
Merge join
Hash join.

When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
Thanks
Shaan.

Aug 24 '06 #8

Eric Sosman wrote:
shaanxxx wrote:
Eric Sosman wrote:

>shaanxxx wrote On 08/23/06 11:32,:

I wanted to write hash function float or double.

Why?
I have developed following macro for float comparisions based on
limit.h
[...]

You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.

When you have floating point as join predicate, you cant use hash join
for
that.
I was thinking, if we can have some good hash function, we can even use

hash join for floating point.

Thanks
Shaan.

Aug 24 '06 #9
Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.

Why?
A hash is used to generate a map. Why can't the OP want a map from
floats?
Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality.
That's fine if you are computationally heavy and if you require certain
numerical properties. But that might easily not be the case.
Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns.
That's right -- it can't do 0 so its wrong. Let's try this:

double d = ...;

int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.

Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.

That solution is not very fast, but its meant to be somewhat portable.
If you are going to go with platform specific solutions, then you can
use the pointer hacking method suggested above, except you would
predetect all the aliased representations, and hash from those static

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 24 '06 #10
shaanxxx <sh******@yahoo.com(the original poster) wrote:
>you can implement logical join operator using
Nested loop join
Merge join
Hash join.
>When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
--
All is vanity. -- Ecclesiastes
Aug 24 '06 #11

we******@gmail.com wrote On 08/24/06 11:15,:
>
That's right -- it can't do 0 so its wrong. Let's try this:

double d = ...;

int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.

Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
Don't you need a special case for zero in the above?

Also, it makes me uneasy to use so many inexact calculations
in developing a hash code -- both the transcendental functions
are necessarily inexact, and I don't think you can guarantee
that `INT64_MAX+1.0' will be accurate. (INT64_MAX probably has
no exact representation in double, so you need to rely on the
convert-and-add being carried out at higher precision.)

But the overall approach seems promising. You could use
frexp() instead of all that horsing around with logarithms
and exponentials, and you could use ldexp() to scale the fraction
without worrying about inaccuracy in the multiplier. You'd get
something like

double d = ...;
int e;
double f = frexp(d, &e);
int64_t m = ldexp(f, DBL_MANT_DIG);
/* now hash e and m */

(This assumes FLT_RADIX==2; maybe replacing DBL_MANT_DIG with
63 would be a simple way to duck the issue.)

As before, NaNs and Infs need special handling -- but
I think you've got a reasonably portable hash otherwise.

--
Er*********@sun.com

Aug 24 '06 #12

Walter Roberson wrote:
shaanxxx <sh******@yahoo.com(the original poster) wrote:
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.

I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
join relational operator == selection(cross product)
That is not union operator.

Aug 25 '06 #13

### This discussion thread is closed

Replies have been disabled for this discussion.