469,362 Members | 2,470 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,362 developers. It's quick & easy.

about Equals and GetHashCode

Hello!

I can't figure out what point it is to use GetHashCode. I know that this
GetHashCode is used for obtaining a unique integer
value.
Can somebody give me an example that prove the usefulness of this
GetHashCode or it it of no use at all?

A mean that if I get an integer from current time in some way what should I
use it for?

One more question should these Equals and GetHashCode be used in pair?

//Tony

Jun 27 '08 #1
28 3546
No; GetHashCode() does not guarantee uniqueness - for starters, it is
an Int32 - it would take a trivial example involving Int64 and i++ to
show that they aren't unique.

The point is that if a.GetHashCode() and b.GetHashCode() are
*different* then a nd b definitely aren't the same. If the hashcodes
are the same, then you need to check a and b. This provides an
efficient way of narrowing down the amount of things we actually need
to check, by first excluding everything that doesn't have the same
hashcode,
leaving a much smaller set.
Further, advanced structures such as hash-tables can use the hash-code
to create "buckets" - so, for example, based on the number of items,
it might decide to split it into 20 "buckets" by taking the modulo of
the hashcode - i.e. each item goes into the bucket with index
GetHashCode() % bucketCount.

Then, to find a given item, it can do:
* get the haschode of the search key
* calculate the modulo of that hashcode - we now know which bucket to
look in [down to 1/20th of the data already]
* look in that bucket for everything with the correct hashcode
[probably down to 1 or 2 items now]
* perform the actual Equals between the search key and the stored
items, to find the one we want

So: GetHashCode() is very rarely used without Equals() [although
Equals() can be used without GetHashCode()]; it is definitely very
important, and underpins all the fast lookups from Dictionary<,>, etc.
Linear search (testing each item) would be very, very slow on a long
list.

Marc
Jun 27 '08 #2
(for info, Dictionary<,and HashTable actually use a more
sophisticated algorithm - but that gives an overall picture of how a
hashcode might be used in theory)
Jun 27 '08 #3
If you override GetHashCode you should also override Equals, to ensure that
when two objects are equal their hashcodes are the same as well. As for an
example of its use, it is used in Hashtable and Dictionary<,>

--
Pete
=========================================
I use Enterprise Core Objects (Domain driven design)
http://www.capableobjects.com/
=========================================
Jun 27 '08 #4
Hello!

When I override the Equals method I should also overide the GetHashCode
according to others and all documentaion that I have read but why?
I mean in the Equals method I implement what I mean to say that these two
object are equals I can't see
any point what the GetHashCode can add to my implementation of the Equals
method?

I mean if I for example have a Person with an integer of age in this class
Person.
I just check if the ages in these two object are equal if so then return
true else false from the Equals method.
But here if I also override the GetHashCode what can this method add to this
checking if the two ages in my two
objects are equal or not?
//Tony

"Marc Gravell" <ma**********@gmail.comskrev i meddelandet
news:f7**********************************@s50g2000 hsb.googlegroups.com...
(for info, Dictionary<,and HashTable actually use a more
sophisticated algorithm - but that gives an overall picture of how a
hashcode might be used in theory)


Jun 27 '08 #5
On Jun 12, 4:23 pm, "Tony Johansson" <johansson.anders...@telia.com>
wrote:
When I override the Equals method I should also overide the GetHashCode
according to others and all documentaion that I have read but why?
I mean in the Equals method I implement what I mean to say that these two
object are equals I can't see
any point what the GetHashCode can add to my implementation of the Equals
method?
Two objects which are considered equal should return the same hash
code. If they do, that type can be used in a hashtable (such as
Dictionary). If they don't, a hash lookup will (potentially) fail for
equal keys, which violates the point of a hashtable.
I mean if I for example have a Person with an integer of age in this class
Person.
I just check if the ages in these two object are equal if so then return
true else false from the Equals method.
But here if I also override the GetHashCode what can this method add to this
checking if the two ages in my two
objects are equal or not?
If Equals is just checking the person's age, then returning the age as
the hash code is a reasonable start.

Generally you build a hash up from the values which are compared for
equality.

Jon
Jun 27 '08 #6
Very good answer Jon
short and consist exact what I wanted.

Many thanks!!

//Tony

"Jon Skeet [C# MVP]" <sk***@pobox.comskrev i meddelandet
news:bc**********************************@34g2000h sf.googlegroups.com...
On Jun 12, 4:23 pm, "Tony Johansson" <johansson.anders...@telia.com>
wrote:
>When I override the Equals method I should also overide the GetHashCode
according to others and all documentaion that I have read but why?
I mean in the Equals method I implement what I mean to say that these two
object are equals I can't see
any point what the GetHashCode can add to my implementation of the Equals
method?

Two objects which are considered equal should return the same hash
code. If they do, that type can be used in a hashtable (such as
Dictionary). If they don't, a hash lookup will (potentially) fail for
equal keys, which violates the point of a hashtable.
>I mean if I for example have a Person with an integer of age in this
class
Person.
I just check if the ages in these two object are equal if so then return
true else false from the Equals method.
But here if I also override the GetHashCode what can this method add to
this
checking if the two ages in my two
objects are equal or not?

If Equals is just checking the person's age, then returning the age as
the hash code is a reasonable start.

Generally you build a hash up from the values which are compared for
equality.

Jon

Jun 27 '08 #7
Hello!

One more thing how will this GetHashCode be called I assume that it's not
called explicitly like someObject.GetHasCode();

//Tony
"Jon Skeet [C# MVP]" <sk***@pobox.comskrev i meddelandet
news:bc**********************************@34g2000h sf.googlegroups.com...
On Jun 12, 4:23 pm, "Tony Johansson" <johansson.anders...@telia.com>
wrote:
>When I override the Equals method I should also overide the GetHashCode
according to others and all documentaion that I have read but why?
I mean in the Equals method I implement what I mean to say that these two
object are equals I can't see
any point what the GetHashCode can add to my implementation of the Equals
method?

Two objects which are considered equal should return the same hash
code. If they do, that type can be used in a hashtable (such as
Dictionary). If they don't, a hash lookup will (potentially) fail for
equal keys, which violates the point of a hashtable.
>I mean if I for example have a Person with an integer of age in this
class
Person.
I just check if the ages in these two object are equal if so then return
true else false from the Equals method.
But here if I also override the GetHashCode what can this method add to
this
checking if the two ages in my two
objects are equal or not?

If Equals is just checking the person's age, then returning the age as
the hash code is a reasonable start.

Generally you build a hash up from the values which are compared for
equality.

Jon

Jun 27 '08 #8
On Jun 12, 5:35 pm, "Tony Johansson" <johansson.anders...@telia.com>
wrote:
One more thing how will this GetHashCode be called I assume that it's not
called explicitly like someObject.GetHasCode();
It certainly could be. It's just a normal method. It will usually be
called by either:
a) an implementation of a hashtable, such as Dictionary
b) another object which contains an instance of your type, is using
that as part of equality comparison, and wants to use it to build its
own hashcode.

As an example of b) if your person had a name as well as an age, you
might do:

public override int GetHashCode()
{
int ret = 17;
ret = 37*ret + age;
ret = 37*ret + name.GetHashCode();
return ret;
}

Jon
Jun 27 '08 #9
On Jun 12, 11:48*am, "Jon Skeet [C# MVP]" <sk...@pobox.comwrote:
It certainly could be. It's just a normal method. It will usually be
called by either:
a) an implementation of a hashtable, such as Dictionary
b) another object which contains an instance of your type, is using
that as part of equality comparison, and wants to use it to build its
own hashcode.

As an example of b) if your person had a name as well as an age, you
might do:

public override int GetHashCode()
{
* * int ret = 17;
* * ret = 37*ret + age;
* * ret = 37*ret + name.GetHashCode();
* * return ret;

}

Jon
Maybe it's just a typo, but did you mean to use age as-is or did you
mean to call GetHashCode on it too? The reason I'm asking is because
the way it is written would cause people with the same name to cluster
together. The effect would be more severe had age been merged into
the hash code after name.

Also, in the interest of playing the Monday morning quarterback, I've
wondered whether Microsoft would have been better off forcing all
dictionary keys to implement a special interface, say IDictionaryKey,
that inherits IEquatable and provides the GetHashCode method. That
way the requirement for implementing GetHashCode and Equals in tandem
could be enforced declaratively instead of having a special case the
compiler must check for. Opinion?

Jun 27 '08 #10
On Thu, 12 Jun 2008 12:10:22 -0700, Brian Gideon <br*********@yahoo.com>
wrote:
>public override int GetHashCode()
{
* * int ret = 17;
* * ret = 37*ret + age;
* * ret = 37*ret + name.GetHashCode();
* * return ret;

}

Jon

Maybe it's just a typo, but did you mean to use age as-is or did you
mean to call GetHashCode on it too? The reason I'm asking is because
the way it is written would cause people with the same name to cluster
together. The effect would be more severe had age been merged into
the hash code after name.
Jon can answer for himself, of course. But that's never stopped me from
jumping in before. :)

I'm not sure what you mean about "cluster". The general rule is that a
type that's already effectively an int can just be used directly as its
own hash code. I'd be surprised if int.GetHashCode() does anything except
return the int value anyway, but regardless, there's no reason not to just
use the int directly as its own hash code.

I _do_ believe that pre-initializing the hashcode with 629 is pointless.
You could start at 0 just as easily, and the values would mathematically
be distributed the same, I believe. At least, the last time I brought
that up (in the Java newsgroup) no one provided any counter-proof, and the
only reponses were in agreement with my statement. It could be that just
means they are all as dumb as me, but I don't think there's an actual
mathematical difference.
Also, in the interest of playing the Monday morning quarterback, I've
wondered whether Microsoft would have been better off forcing all
dictionary keys to implement a special interface, say IDictionaryKey,
that inherits IEquatable and provides the GetHashCode method. That
way the requirement for implementing GetHashCode and Equals in tandem
could be enforced declaratively instead of having a special case the
compiler must check for. Opinion?
I'm also not sure what you mean here. Equals and GetHashCode are defined
in Object. All types have a default implementation (for reference types,
the default implementation is simply reference equality), so the compiler
doesn't need to check for them at all. The default implementation may or
may not be what's desired with respect to a dictionary key, but that's not
something the compiler can figure out anyway. There's no "special case
the compiler must check for".

So, no...I don't see the value in having an IDictionaryKey interface.

Pete
Jun 27 '08 #11
>Maybe it's just a typo, but did you mean to use age as-is or did you
>mean to call GetHashCode on it too?
GetHashCode returns an Int32. int == Int32. Therefore the value of an int
is already as unique as you can represent given 4 bytes. Therefore:

public struct Int32
{
...
public override int GetHashCode()
{
return this;
}
...
}


--
Pete
=========================================
I use Enterprise Core Objects (Domain driven design)
http://www.capableobjects.com/
=========================================
Jun 27 '08 #12
On Jun 12, 2:35*pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
I'm not sure what you mean about "cluster". *The general rule is that a *
type that's already effectively an int can just be used directly as its *
own hash code. *I'd be surprised if int.GetHashCode() does anything except *
return the int value anyway, but regardless, there's no reason not to just*
use the int directly as its own hash code.
You're right Int32 returns a hash code that is equal to the value so
it doesn't matter at all if GetHashCode is called. I'm using the term
clustering to describe the situation that arises when a set of objects
yield hash codes that tend to bunch together within a smaller subset
of the entire Int32 range because of similarities in there
attributes. Clustering causes inefficiency in a lot hash tables
implementations.

In the example people with the same name, but different ages would
yield hash codes that would cluster around around a value dominated by
the name and spread out over range of no more than 3700 (since 100 is
a reasonable life expectancy). The effect is that the hashtable may
become no better than a linked list if it contains people with the
same name.
>
I _do_ believe that pre-initializing the hashcode with 629 is pointless. *
You could start at 0 just as easily, and the values would mathematically *
be distributed the same, I believe. *At least, the last time I brought *
that up (in the Java newsgroup) no one provided any counter-proof, and the*
only reponses were in agreement with my statement. *It could be that just *
means they are all as dumb as me, but I don't think there's an actual *
mathematical difference.
Don't know.
I'm also not sure what you mean here. *Equals and GetHashCode are defined *
in Object. *All types have a default implementation (for reference types, *
the default implementation is simply reference equality), so the compiler *
doesn't need to check for them at all. *The default implementation may or *
may not be what's desired with respect to a dictionary key, but that's not*
something the compiler can figure out anyway. *There's no "special case *
the compiler must check for".

So, no...I don't see the value in having an IDictionaryKey interface.
The compiler has to be handling this as a special case otherwise you
wouldn't get the warning that "...overrides Object.Equals but does not
override Object.GetHashCode". An interface based approach would
eliminate this special case and be able to strictly enforce this
relationship between GetHashCode and Equals. Not mention that it
would encourage developers to put more thought into using custom
objects as dictionary keys (which they probably should be anyway).
Jun 27 '08 #13
>*Not mention that it
would encourage developers to put more thought into using custom
objects as dictionary keys (which they probably should be anyway).
My only problem with that is that people tend to forget that the
hashcode should be immutable, at least once it has been used as a key
- otherwise it can become unfindable...

Re the clustering - the most common mistake people make here would be
using xor - any matched pairs come out at zero, making for a great big
collision. I don't expect I'm telling anybody involved in this
dicsussion anything they don't know, but for the list's benefit ;-p

Marc
Jun 27 '08 #14
Brian Gideon <br*********@yahoo.comwrote:
public override int GetHashCode()
{
* * int ret = 17;
* * ret = 37*ret + age;
* * ret = 37*ret + name.GetHashCode();
* * return ret;

}
Maybe it's just a typo, but did you mean to use age as-is or did you
mean to call GetHashCode on it too? The reason I'm asking is because
the way it is written would cause people with the same name to cluster
together. The effect would be more severe had age been merged into
the hash code after name.
I'd probably just leave it at that. Calling GetHashCode on age wouldn't
have don't anything, as Int32's implementation of GetHashCode is just
to return the original value.

I'd hope that a good hashtable implementation would effectively ignore
the "closeness" of hashcodes to some extent, for instance by bucketing
with some appropriate mod function.
Also, in the interest of playing the Monday morning quarterback, I've
wondered whether Microsoft would have been better off forcing all
dictionary keys to implement a special interface, say IDictionaryKey,
that inherits IEquatable and provides the GetHashCode method. That
way the requirement for implementing GetHashCode and Equals in tandem
could be enforced declaratively instead of having a special case the
compiler must check for. Opinion?
Agreed. Ironically I was thinking about blogging precisely this point,
although for slightly different reasons. Many types - most types, even
- aren't really suitable as hash keys unless you're actually talking
about identity. That can still be useful, but it's probably worth
knowing what you're talking about.

So long as you made an implementation of IEqualityComparer available
which took the "native" implementation of the current
object.GetHashCode available to cope with the identity part, I think it
would be entirely reasonable to not have GetHashCode and Equals as part
of object itself. I think of it as a mistake that Java made and then
..NET copied.

(There may have been platforms before Java which had a single type
hierarchy where the uber-type had hashcode and equality defined - I
just don't know of it.)

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #15
Peter Duniho <Np*********@nnowslpianmk.comwrote:
I _do_ believe that pre-initializing the hashcode with 629 is pointless.
It's pointless for all but purposes of symmetry. It means that *every*
field, from first to last, is treated the same way: take "current
value", multiply by constant, add hash for field.

It means that while you're deciding equality / hashcode implementation,
you can add and remove fields at will. No particular reason beyond
that.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #16
Brian Gideon <br*********@yahoo.comwrote:
On Jun 12, 2:35*pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
I'm not sure what you mean about "cluster". *The general rule is thata *
type that's already effectively an int can just be used directly as its*
own hash code. *I'd be surprised if int.GetHashCode() does anything except *
return the int value anyway, but regardless, there's no reason not to just *
use the int directly as its own hash code.
You're right Int32 returns a hash code that is equal to the value so
it doesn't matter at all if GetHashCode is called. I'm using the term
clustering to describe the situation that arises when a set of objects
yield hash codes that tend to bunch together within a smaller subset
of the entire Int32 range because of similarities in there
attributes. Clustering causes inefficiency in a lot hash tables
implementations.
I *hope* that the .NET implementations don't suffer from that problem.
In the example people with the same name, but different ages would
yield hash codes that would cluster around around a value dominated by
the name and spread out over range of no more than 3700 (since 100 is
a reasonable life expectancy). The effect is that the hashtable may
become no better than a linked list if it contains people with the
same name.
Well, even if it has a linked list of people with the same name it can
still get rid of definite non-matches (same name, different age)
without doing a single comparison of actual people.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #17
On Thu, 12 Jun 2008 15:26:10 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
Peter Duniho <Np*********@nnowslpianmk.comwrote:
>I _do_ believe that pre-initializing the hashcode with 629 is pointless.

It's pointless for all but purposes of symmetry. It means that *every*
field, from first to last, is treated the same way: take "current
value", multiply by constant, add hash for field.
No, my point is that you could just initialize the local hash variable to
0, rather than 17.

You could still multiply the 0 by 37 in the first field incorporated,
preserving the symmetry aspect. I just don't see the point in starting
with 17 versus 0. It has the sense of "magic" about it, but unlike the
multiplication by 37, I don't see any practical advantage. A less
"magical" initializer of 0 would be just as good.

I think. :)

Pete
Jun 27 '08 #18
On Jun 12, 5:24*pm, Jon Skeet [C# MVP] <sk...@pobox.comwrote:
I'd probably just leave it at that. Calling GetHashCode on age wouldn't
have don't anything, as Int32's implementation of GetHashCode is just
to return the original value.
Yeah, after realizing Int32.GetHashCode just returns the same value
I'd leave it as-is as well. It's too much effort trying to think of a
better distribution especially considering that people with same name
(first and last anyway) doesn't occur frequently enough.

Ya know, my naive line of thinking was that Int32.GetHashCode would
return something other than the original value so that the
distribution would be completely random. As it is now the
distribution can definitely be considered good since the hash code has
the same entropy as the original value, but it's not at all random.
To be perfectly honest, I'm not sure how important it is for the hash
code to be absolutely random as opposed to just good.
>
I'd hope that a good hashtable implementation would effectively ignore
the "closeness" of hashcodes to some extent, for instance by bucketing
with some appropriate mod function.
The documentation does mention that for best performance the
distribution of GetHashCode should be random. But, I'm sure that
Hashtable and Dictionary both have good implementations and that the
clustering around the name in your example is probably negligible. I
brought it up because I thought it would stimulate some interesting
discussions at the very least.

Several years ago I looked at the shared source code for Hashtable and
I believe it was using the guadratic probing method. It's been awhile
since my data structures class so I'm a little rusty, but I believe
that method does offer some protection from the clustering problem.
Anyone know what method Dictionary uses? I thought it was different
from Hashtable.
Agreed. Ironically I was thinking about blogging precisely this point,
although for slightly different reasons. Many types - most types, even
- aren't really suitable as hash keys unless you're actually talking
about identity. That can still be useful, but it's probably worth
knowing what you're talking about.

So long as you made an implementation of IEqualityComparer available
which took the "native" implementation of the current
object.GetHashCode available to cope with the identity part, I think it
would be entirely reasonable to not have GetHashCode and Equals as part
of object itself. I think of it as a mistake that Java made and then
.NET copied.

(There may have been platforms before Java which had a single type
hierarchy where the uber-type had hashcode and equality defined - I
just don't know of it.)

--
I think that would be an interesting topic for a blog. I do try to
read your blog posts so I'll catch it if you do decide to post on the
topic.
Jun 27 '08 #19
On Jun 12, 3:21*pm, Marc Gravell <marc.grav...@gmail.comwrote:
My only problem with that is that people tend to forget that the
hashcode should be immutable, at least once it has been used as a key
- otherwise it can become unfindable...
That's seems so obvious, but I could actually see myself making that
very mistake. You've got to admit that such a simple concept could be
easily overlooked.
>
Re the clustering - the most common mistake people make here would be
using xor - any matched pairs come out at zero, making for a great big
collision. I don't expect I'm telling anybody involved in this
dicsussion anything they don't know, but for the list's benefit ;-p
Yep, I'm aware of the problems in using xor, but I bet most developers
aren't. I don't override GetHashCode very often, but when I do I
sometimes use xor because it doesn't require any thought, but I always
put a TODO comment in reminding myself that I need to come back and
change it.
>
Marc
Jun 27 '08 #20
On Jun 12, 6:51*pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
You could still multiply the 0 by 37 in the first field incorporated, *
preserving the symmetry aspect. *I just don't see the point in starting *
with 17 versus 0. *It has the sense of "magic" about it, but unlike the *
multiplication by 37, I don't see any practical advantage. *A less *
"magical" initializer of 0 would be just as good.

I think. *:)

Pete
It does seem like magic doesn't it? But, then again mathematics is
full intriguing quirks so it wouldn't surprise me if there was a valid
reason.

It reminds of me of the section in Knuth Vol 2 where he talks about
random number generators and warns that using one random number
generator to seed another generator may actually make the end result
*less* random! I know...completely off topic. My mind works in
mysterious ways :)

Jun 27 '08 #21
On Thu, 12 Jun 2008 19:41:32 -0700, Brian Gideon <br*********@yahoo.com>
wrote:
It does seem like magic doesn't it? But, then again mathematics is
full intriguing quirks so it wouldn't surprise me if there was a valid
reason.
Well, in this case it's pretty much just algebra. We're not in the realm
of "intriguing quirks of mathematics". It's proveable that the
initializer always winds up just affecting the hash code by a fixed
constant. It's hard to see how that fixed constant could be significant.
At best, it would just shift all of the hashed codes by the same amount
along the bins for the hash table. It won't change their relative
arrangement, and thus wouldn't affect the actual operation.

Pete
Jun 27 '08 #22
On Jun 13, 12:51 am, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
No, my point is that you could just initialize the local hash variable to
0, rather than 17.

You could still multiply the 0 by 37 in the first field incorporated,
preserving the symmetry aspect. I just don't see the point in starting
with 17 versus 0. It has the sense of "magic" about it, but unlike the
multiplication by 37, I don't see any practical advantage. A less
"magical" initializer of 0 would be just as good.

I think. :)
Okay, I'll have to consult Effective Java, which is where the advice
originally came from. I believe there's some benefit to having two
magic numbers which are coprime to each other. Wikipedia doesn't show
this as one of the example hash functions, unfortunately, so there's
no discussion there to help :(

Jon
Jun 27 '08 #23
On Thu, 12 Jun 2008 22:54:40 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
Okay, I'll have to consult Effective Java, which is where the advice
originally came from.
For what it's worth, the last discussion I was in when this came up,
Bloch's book was also quoted. So at least I'm seeing a consistent source
here. :)
I believe there's some benefit to having two
magic numbers which are coprime to each other.
As I mentioned, my inspection of the problem concluded that no matter what
number you pick for the initial value, it factors out to a single constant
added to the final result. Straight algebraic manipulation. And since
adding a single constant to a hash code isn't going to affect its
usefulness, selection of some specific initial value should not either.

I don't have a copy of "Effective Java", but if you do and can go back to
look to see if he offered any sort of justification for the initializer, I
suppose I'd be interested in hearing that. I've been wrong before, but
usually there's someone around who can explain how or why. Or, I suppose,
maybe that's not true and I've actually been wrong a lot more than I or
anyone else realized. :) But assuming the latter isn't actually the
case, I remain relatively confident that I've figured this one correctly.
Until proven otherwise, of course. :)

Pete
Jun 27 '08 #24
On Jun 13, 3:28 am, Brian Gideon <briangid...@yahoo.comwrote:
To be perfectly honest, I'm not sure how important it is for the hash
code to be absolutely random as opposed to just good.
I think to know that we'd have to know exactly what you mean by
"random". Completely evenly distributed?

I suspect it's beneficial because whatever the hashtable does to
process hashcodes, if you have some pattern/clustering to your data
you're more likely to get a poor distribution in terms of buckets.
However, I would hope that a good hashtable implementation would make
it *unlikely* that you'd see a problem. Then again, my understanding
of the *exact* implementation of hashtables is somewhat lacking.
I think that would be an interesting topic for a blog. I do try to
read your blog posts so I'll catch it if you do decide to post on the
topic.
I'll put it on my mental list of future posts :)

Jon
Jun 27 '08 #25
Re adding a seed; actually, it has one advantage... a lot of default
instances get created just with new Foo() or
Activator.CreateInstance()
Without a seed, this type of hash algorithm (using + and *) will
return 0 for every default instance. Which is fine in itself, since
every default instance is broadly comparable... but! Now imagine that
we are comparing tuples of values, or lists of values - where the
tuple's hashcode is a composite (again using + and *) of the contained
values' hashcodes.
It is nice (from a purely theoretical standpoint) to be able to
immediately distinguish a set of 2 (default instances) from a set of 8
(default instances) - or the sets {A,B} and {[default],A,B}.

OK; in reality, this is an unlikely scenario - but it does still seem
desirable to avoid any "obvious" causes of zero. Yes, even with a seed
you will still get zero from some values, but then as a genuine
exception, and so it is not an issue.

Re random - "uniform" would be a better term; Int32.GetHashCode() is
uniform but not random.

Marc
Jun 27 '08 #26
On Thu, 12 Jun 2008 23:36:38 -0700, Marc Gravell <ma**********@gmail.com>
wrote:
[...]
It is nice (from a purely theoretical standpoint) to be able to
immediately distinguish a set of 2 (default instances) from a set of 8
(default instances) - or the sets {A,B} and {[default],A,B}.
Um, okay. I'll buy that. Still, it seems like a pretty esoteric scenario
to drive a hash code implementation. Especially since it still doesn't
address sets of identical lengths. :)
[...]
Re random - "uniform" would be a better term; Int32.GetHashCode() is
uniform but not random.
Agreed. I was going to pick that nit myself, but then I didn't. :)

Pete
Jun 27 '08 #27
Um, okay. I'll buy that. Still, it seems like a pretty esoteric scenario
to drive a hash code implementation. Especially since it still doesn't
address sets of identical lengths. :)
Re identical lengths - if the lengths and contents match, I'm happy
for the hashcode to match under that approach, since they sound like
the same set (assuming the definition of equality ties with
GetHashCode(), which it should).

Marc
Jun 27 '08 #28
On Jun 13, 12:54*am, "Jon Skeet [C# MVP]" <sk...@pobox.comwrote:
Okay, I'll have to consult Effective Java, which is where the advice
originally came from. I believe there's some benefit to having two
magic numbers which are coprime to each other. Wikipedia doesn't show
this as one of the example hash functions, unfortunately, so there's
no discussion there to help :(

Jon
Okay, I'm looking through Knuth Vol 3 (those who have read the books
know that it isn't an easy read) on hashing and he talks about
choosing a prime number as the multiplier for the "scrambling
function" (aka GetHashCode). However, I don't see anything about the
initial seed being a prime number.

My best guess right now is definitely along the lines of what Marc
suggested. It has more to do with the end result you get from
combining the values of more than one call to GetHashCode (via
combinations of + and *).

From clues in Knuth this may be related to random number generation
theory. Since the scrambling function needs to be random it's easy to
see how that works in. Seeding the hash code with a number that is
coprime to the multiplier may be an attempt to prevent unwanted
periods from developing in the final hash code when combinging results
from multiple calls to GetHashCode.
Jun 27 '08 #29

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Bijay Kumar | last post: by
3 posts views Thread by cmrchs | last post: by
2 posts views Thread by Mark | last post: by
8 posts views Thread by Kenneth Baltrinic | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
1 post views Thread by Marylou17 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.