473,397 Members | 1,972 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,397 software developers and data experts.

Hash function for float.

I wanted to write hash function float or double. Any suggestion would
be appreciated.

Aug 23 '06 #1
12 12392


shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.
Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
Any suggestion would
be appreciated.
Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

--
Er*********@sun.com

Aug 23 '06 #2

Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.

Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

--
Er*********@sun.com
I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))

Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?

Aug 24 '06 #3
shaanxxx schrieb:
Eric Sosman wrote:
>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.

Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.

Floating-point values make terrible hash table keys.
>>>Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".

I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
Forgets to take care of the sign. Try a=0.0F, b=-1e10...
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))
Consider
#define FLT_APPROX_EQUAL(a,b, eps) \
(2 * fabs((a) - (b)) <= (eps)*(fabs(a)+fabs(b)))
instead, with
#define MY_FLT_EQUAL(a, b) \
FLT_APPROX_EQUAL(a,b,MY_EPSILON)
where you explicitly define MY_EPSILON.
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?
Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
not imply FLT_EQUAL(a, c); this can be detrimental for the
efficiency of hashing as you become dependent on the order
in which you enter a, b, c in the table as you can either have
separate entries for a and c or the same.
If you know more about how the floating point numbers you are
trying to hash, there may be reliable equivalence classes.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Aug 24 '06 #4
shaanxxx wrote:
Eric Sosman wrote:

>>shaanxxx wrote On 08/23/06 11:32,:
>>>I wanted to write hash function float or double.

Why?

I have developed following macro for float comparisions based on
limit.h
[...]
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?

--
Eric Sosman
es*****@acm-dot-org.invalid

Aug 24 '06 #5
Eric Sosman wrote:
shaanxxx wrote:
>Eric Sosman wrote:

>>shaanxxx wrote On 08/23/06 11:32,:

I wanted to write hash function float or double.

Why?

I have developed following macro for float comparisions based on
limit.h
[...]

You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.

Greetings,
--
Michel Bardiaux
R&D Director
T +32 [0] 2 790 29 41
F +32 [0] 2 790 29 02
E mailto:mb*******@mediaxim.be

Mediaxim NV/SA
Vorstlaan 191 Boulevard du Souverain
Brussel 1160 Bruxelles
http://www.mediaxim.com/
Aug 24 '06 #6
Michel Bardiaux wrote:
Eric Sosman wrote:
>>
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
The usual difficulty is that you may find all the lists
beginning with 1.33, but miss the lists beginning with 1.2+.13
and so on. That may be all right in some situations, but it's
usually a drawback -- which is why I hope the O.P. will describe
his intentions more fully.

--
Eric Sosman
es*****@acm-dot-org.invalid
Aug 24 '06 #7
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?

you can implement logical join operator using
Nested loop join
Merge join
Hash join.

When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
Thanks
Shaan.

Aug 24 '06 #8

Eric Sosman wrote:
shaanxxx wrote:
Eric Sosman wrote:

>shaanxxx wrote On 08/23/06 11:32,:

I wanted to write hash function float or double.

Why?
I have developed following macro for float comparisions based on
limit.h
[...]

You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.

What are you trying to do?
You still haven't explained why you want to hash
floating-point values. Using "approximate equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.

When you have floating point as join predicate, you cant use hash join
for
that.
I was thinking, if we can have some good hash function, we can even use

hash join for floating point.

Thanks
Shaan.

Aug 24 '06 #9
Eric Sosman wrote:
shaanxxx wrote On 08/23/06 11:32,:
I wanted to write hash function float or double.

Why?
A hash is used to generate a map. Why can't the OP want a map from
floats?
Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality.
That's fine if you are computationally heavy and if you require certain
numerical properties. But that might easily not be the case.
Any suggestion would
be appreciated.

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It's imperfect because there may be floating-point values
that compare equal but have different bit patterns.
That's right -- it can't do 0 so its wrong. Let's try this:

double d = ...;

int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.

Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.

That solution is not very fast, but its meant to be somewhat portable.
If you are going to go with platform specific solutions, then you can
use the pointer hacking method suggested above, except you would
predetect all the aliased representations, and hash from those static
constants instead.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 24 '06 #10
In article <11*********************@m79g2000cwm.googlegroups. com>,
shaanxxx <sh******@yahoo.com(the original poster) wrote:
>you can implement logical join operator using
Nested loop join
Merge join
Hash join.
>When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
--
All is vanity. -- Ecclesiastes
Aug 24 '06 #11


we******@gmail.com wrote On 08/24/06 11:15,:
>
That's right -- it can't do 0 so its wrong. Let's try this:

double d = ...;

int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.

Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
Don't you need a special case for zero in the above?

Also, it makes me uneasy to use so many inexact calculations
in developing a hash code -- both the transcendental functions
are necessarily inexact, and I don't think you can guarantee
that `INT64_MAX+1.0' will be accurate. (INT64_MAX probably has
no exact representation in double, so you need to rely on the
convert-and-add being carried out at higher precision.)

But the overall approach seems promising. You could use
frexp() instead of all that horsing around with logarithms
and exponentials, and you could use ldexp() to scale the fraction
without worrying about inaccuracy in the multiplier. You'd get
something like

double d = ...;
int e;
double f = frexp(d, &e);
int64_t m = ldexp(f, DBL_MANT_DIG);
/* now hash e and m */

(This assumes FLT_RADIX==2; maybe replacing DBL_MANT_DIG with
63 would be a simple way to duck the issue.)

As before, NaNs and Infs need special handling -- but
I think you've got a reasonably portable hash otherwise.

--
Er*********@sun.com

Aug 24 '06 #12

Walter Roberson wrote:
In article <11*********************@m79g2000cwm.googlegroups. com>,
shaanxxx <sh******@yahoo.com(the original poster) wrote:
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.

I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
join relational operator == selection(cross product)
That is not union operator.

Aug 25 '06 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Markus Dehmann | last post by:
I have a class "Data" and I store Data pointers in an STL set. But I have millions of inserts and many more lookups, and my profiler found that they cost a lot of runtime. Therefore, I want to...
2
by: Bryan Olson | last post by:
The current Python standard library provides two cryptographic hash functions: MD5 and SHA-1 . The authors of MD5 originally stated: It is conjectured that it is computationally infeasible to...
14
by: al | last post by:
Here's what I am trying to write a simple hash table: struct Employee { char name; int id; char department; float salary; };
19
by: anguo | last post by:
i find in many hash function use 5381,for exampla: static constmap_hash hash(char *pchData, int iLen) { unsigned char cBuf; constmap_hash ulHashId; ulHashId = 5381; while (iLen > 0) { cBuf =...
12
by: Arash Partow | last post by:
Hi all, I've ported various hash functions to python if anyone is interested: def RSHash(key): a = 378551 b = 63689 hash = 0
21
by: Johan Tibell | last post by:
I would be grateful if someone had a minute or two to review my hash table implementation. It's not yet commented but hopefully it's short and idiomatic enough to be readable. Some of the code...
21
by: Hallvard B Furuseth | last post by:
Is the code below valid? Generally a value must be accessed through the same type it was stored as, but there is an exception for data stored through a character type. I'm not sure if that...
139
by: ravi | last post by:
Hi can anybody tell me that which ds will be best suited to implement a hash table in C/C++ thanx. in advanced
5
by: abir | last post by:
Hi, I want a user defined key for tr1 unordered_map. My classes are, template<typename T> struct work{ int count; work(int count) : count(count){} }; template<typename W>
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.