471,350 Members | 1,951 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,350 software developers and data experts.

Generate hash code

Hi,

How do I write a GenerateHashcode function that will generate guaranteed
unique hashcodes for my classes?

cheers,
mortb
Jan 18 '06 #1
8 13096
If you mean unique per instance, then the only guaranteed way that I know of
is to have something unique in each instance - a simple static int counter
(along with an instance field) would suffice, where each instance (in the
instance ctor) gets the next value and increments the static counter (pref.
using thread-safe increment).

*however* Assuming your hash would be Int32, you are then limited to 2^32
objects - otherwise then can't be unique! (assuming no complex code to
recycle numbers)

You could implement this either as an abstract base class, but then your
*total* instances (of subtypes of this) would be limited to 2^32.

Is it actually required that they be unique? Although not unknown, this is
not (to my mind) recommended usage for a hash value?

Marc
Jan 18 '06 #2
As Marc alluded to, you might be better off looking at variations of
Guid.NewGuid
to ensure "Absolute" uniqeness. Unless of course you are travelling to Pluto
on the probe, in which case there's still a very remote possibilty there
could be a duplicate before you arrive there...
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"mortb" wrote:
Hi,

How do I write a GenerateHashcode function that will generate guaranteed
unique hashcodes for my classes?

cheers,
mortb

Jan 18 '06 #3
I've thought a bit about the issue with hashcodes.

I use hashtables and I suppose that they use the GetHashcode function to
determine the identity of the objects stored. Then I thought that the
GetHashcode would have to return an unique number for every possible object
if I am to use it as key in the hashtable.

I've seen a few implementations on the internet looking like this:

Class C
{
public int propertyA;
public int propertyB;

public override GetHashcode()
{
return propertyA.GetHashcode ^ propertyB.GetHashcode;
}
}

Whould the bitwize OR generate unique hashcodes for objects with all
possible combinations in propertyA and propertyB?

In my current application there are several types of objects that inherit
from a base class that we've written. They are stored in a database and have
a primary key combined from two values in the database. The two values are
object id and version number.Then they have sveral other properties such as
name, createdByUserId, createTimeStamp, updateTimeStamp.

My guess is that I could write a GetHashcode function in the base class
looking like this:

GetHashcode()
{
return (typeof(this)).GetHashcode() ^ this.objectId ^
this.versionNumber();
}

Would this be a good implementation?
Are there any resources which sumarize how to quickly write *good*
GetHashcode() functions?

thanks for the help!
/mortb

"Marc Gravell" <mg******@rm.com> wrote in message
news:%2****************@TK2MSFTNGP11.phx.gbl...
If you mean unique per instance, then the only guaranteed way that I know
of is to have something unique in each instance - a simple static int
counter (along with an instance field) would suffice, where each instance
(in the instance ctor) gets the next value and increments the static
counter (pref. using thread-safe increment).

*however* Assuming your hash would be Int32, you are then limited to 2^32
objects - otherwise then can't be unique! (assuming no complex code to
recycle numbers)

You could implement this either as an abstract base class, but then your
*total* instances (of subtypes of this) would be limited to 2^32.

Is it actually required that they be unique? Although not unknown, this is
not (to my mind) recommended usage for a hash value?

Marc

Jan 18 '06 #4

mortb wrote:
I've thought a bit about the issue with hashcodes.

I use hashtables and I suppose that they use the GetHashcode function to
determine the identity of the objects stored.
Well, to determine a hash key for the objects, rather than the unique
identity of the objects.
Then I thought that the
GetHashcode would have to return an unique number for every possible object
if I am to use it as key in the hashtable.
No; the point of a hash function is to provide a hash key that will
*usually* be different. Hash collisions aren't the end of the world.
Suppose I have a hashtable that stores books, using the 4th digit of
the ISBN as a hash key. If I get lucky, I will be able to store 10
books before getting a hash collision. But at some point, my hash
buckets are going to start having more than one book in. That's not an
error, it's part of the design of a hashtable. The point is that by
storing the books in ten different buckets, the list of books I have to
search to find a particular one is typically ten times shorter than if
I just kept them all in one list. That's the point of a hash table.

If hashcodes had to be unique, then no hashtable could ever store more
than (the number of possible hashcodes) items, which wouldn't be
useful; and furthermore every object that ever wanted to be stored in a
hashtable would somehow have to obtain a unique hashcode, which would
be extremely unpleasant.

I've seen a few implementations on the internet looking like this:

Class C
{
public int propertyA;
public int propertyB;

public override GetHashcode()
{
return propertyA.GetHashcode ^ propertyB.GetHashcode;
}
}

Whould the bitwize OR generate unique hashcodes for objects with all
possible combinations in propertyA and propertyB?
It wouldn't; the purpose is to create some number that partakes of the
hashcodes for both propertyA and propertyB.

In my current application there are several types of objects that inherit
from a base class that we've written. They are stored in a database and have
a primary key combined from two values in the database. The two values are
object id and version number.Then they have sveral other properties such as
name, createdByUserId, createTimeStamp, updateTimeStamp.

My guess is that I could write a GetHashcode function in the base class
looking like this:

GetHashcode()
{
return (typeof(this)).GetHashcode() ^ this.objectId ^
this.versionNumber();
}

Would this be a good implementation?
Sure.
Are there any resources which sumarize how to quickly write *good*
GetHashcode() functions?


Not that I've seen but I'm sure someone has, so watch this space.

--
Larry Lard
Replies to group please

Jan 18 '06 #5

"mortb" <mo****@hotmail.com> wrote in message
news:ex**************@TK2MSFTNGP10.phx.gbl...
I've thought a bit about the issue with hashcodes.
.... I've seen a few implementations on the internet looking like this:

Class C
{
public int propertyA;
public int propertyB;

public override GetHashcode()
{
return propertyA.GetHashcode ^ propertyB.GetHashcode;
}
}

Whould the bitwize OR generate unique hashcodes for objects with all
possible combinations in propertyA and propertyB?

....

No, it would not. Btw, it not bitwise or, it's XOR (exclusive or). For
example, 1 ^ 0 yields the same result as 0 ^ 1. If hashcode is 32 bit wide
int, there are 2^32 possible hashcodes while there are 2^32 * 2^32 (2 ^ 64)
possible combinations of 2 ints. So they must collide. But, hashcode does
not have to be unique. You can even do:

public override int GetHashcode()
{
return 0;
}

and HashTable should be working ok (with the performance about the same as
simple list). Though I don't know what for you need hash.

Regards,
Goran
Jan 18 '06 #6
The hashcode is not the same as the key; the hashcode is just used to
partition the items (fairly arbitrarily) - so that it can jump to a subset
of the larger list before doing any real work. This is why IEqualityComparer
requests both an equality method and a hash method.

I've never written a hashtable from scratch, but I *believe* it tends to
work something like the following:

* The hashtable has a number of buckets; lets say 20 (although this is often
dynamic based on the amount of data contained)
* When adding items, it gets the hashcode (lets say it return 342545), and
then takes the mod (remainder on division) of this with the bucket count to
choose a bucket (bucket 5 in this case)
* When you query the hashtable, it
1: gets the hashcode of your argument and does the same operation to
pick the bucket to look within
2: scans the contents of this bucket (a much shorter list), possibly
checking Equals, or possibly first checking the hash of each item and then
checking Equals if the hash matched (i.e. something in bucket 5 could have a
hash of 342545, or it could have a hash of 25)

This raises another implementation point:

If 2 instances share a hash-code, it is *not* necessary for them to count as
equal (via Equals)
HOWEVER: if two instances *should* count as equal, then the hash codes
should be identical

Of course, if you use any of the inbuilt primatives (int, string, etc) for
your key you will be pretty safe

Marc
Jan 18 '06 #7
Thanks Marc, Larry, Goran and Peter for explaining things to me!

I conclude:

* GetHashcode has nothing to do whith uniqueness except that two identical
objects should generate the same hashcode. Instead GetHashcode is used to
provide a good spreeding in the hashtable. It should just do that and do it
quick.

* Hash table only allows one unique key. The key's uniqueness in
Hashtable.Add(key, object) is not determined - as I thougth - by using the
GetHashcode function. I suppose it uses object.Equals(x) or its override.

What think I should do is to override both GetHashcode and Equals(x) :)

thanks!
mortb

"Peter Bromberg [C# MVP]" <pb*******@yahoo.nospammin.com> wrote in message
news:DF**********************************@microsof t.com...
As Marc alluded to, you might be better off looking at variations of
Guid.NewGuid
to ensure "Absolute" uniqeness. Unless of course you are travelling to
Pluto
on the probe, in which case there's still a very remote possibilty there
could be a duplicate before you arrive there...
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"mortb" wrote:
Hi,

How do I write a GenerateHashcode function that will generate guaranteed
unique hashcodes for my classes?

cheers,
mortb

Jan 18 '06 #8
To have guaranteed unique IDs, he needs a static counter and use this
(incrementing it for each new object, thread-safe, storing the "current"
value persistently if the objects are stored and need their ID stored also).

It's not a hash, but a hash is usually not guaranteed to be unique.

Unless for a counter (with sufficient range!), there's no uniqueness in the
world by any random value.

Ch.

"Peter Bromberg [C# MVP]" <pb*******@yahoo.nospammin.com> wrote in message
news:DF**********************************@microsof t.com...
As Marc alluded to, you might be better off looking at variations of
Guid.NewGuid
to ensure "Absolute" uniqeness. Unless of course you are travelling to
Pluto
on the probe, in which case there's still a very remote possibilty there
could be a duplicate before you arrive there...
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"mortb" wrote:
Hi,

How do I write a GenerateHashcode function that will generate guaranteed
unique hashcodes for my classes?

cheers,
mortb

Jan 18 '06 #9

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

16 posts views Thread by dutche | last post: by
12 posts views Thread by Jim Michaels | last post: by
reply views Thread by rayapati | last post: by
9 posts views Thread by Omatase | last post: by
15 posts views Thread by Ashish Khandelwal | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.