Bytes | Software Development & Data Engineering Community

Generic GetHashCode method

I'm trying to write a generic method to generate Hashcodes but am having some
problems with (generic) collections. Here is the code of my method:

public static int GetHashCode(object input)
{
    try
    {
        Type objectType = input.GetType();
        PropertyInfo[] properties = objectType.GetProperties();

        int totalHashCode = 7;

        foreach (PropertyInfo property in properties)
        {
            // Reset hashcode
            int hashcode = 0;

            // Get property value
            object value = property.GetValue(input, null);

            // Invoke GetHashCode method on property
            if (property.PropertyType.IsValueType || value != null)
            {
                int.TryParse(property.PropertyType.InvokeMember("GetHashCode",
                    BindingFlags.Default | BindingFlags.InvokeMethod, null, value, null,
                    CultureInfo.InvariantCulture).ToString(), out hashcode);
            }

            totalHashCode ^= hashcode;
        }
        return totalHashCode;
    }
    catch (Exception)
    {
        // Handle exception
        return 0;
    }
}

public class Customer
{
    public Customer() {}
    public string Name { get { return "Mr Blobby"; } }
    public string Telephone { get { return "999"; } }
}

public class CustomerCollection : List<Customer>
{
    public CustomerCollection() {}
}

The problem I am having is that this method doesn't really take into account
objects which are (generic) collections. E.g. (excusing my simplicity) the
Customer class would work but the CustomerCollection wouldn't. Can anyone
give me some pointers to improve this? Cheers in advance for any tips!
May 22 '06 #1
Metaman <ma****@nospam.nospam.com> wrote:
> I'm trying to write a generic method to generate Hashcodes but am having some
> problems with (generic) collections. Here is the code of my method:

Normally, one overrides the GetHashCode() method to provide the hashcode
for an object.

It would be clearer if you described your algorithm for generating this
hashcode. It appears that you are iterating through all the properties
of an object and getting the hashcode of each property, and xoring them
together, and using that as the hashcode.

Because GetHashCode() is virtual and all objects in .NET have it, you
can invoke it directly. As far as I can make out, you could rewrite your
method like this:

---8<---
public static int GetHashCode(object value)
{
    if (value == null)
        return 0;

    int result = 0;
    foreach (PropertyInfo prop in value.GetType().GetProperties(
        BindingFlags.Instance | BindingFlags.Public))
    {
        if (prop.GetIndexParameters().Length == 0)
        {
            object propValue = prop.GetValue(value, null);
            if (propValue != null)
                result ^= propValue.GetHashCode();
        }
    }
    return result;
}
--->8---

However, I don't think you should do this. I'll explain below.

> public class Customer{
> public Customer(){}
> public string Name{get{return "Mr Blobby";}}
> public string Telephone{get{return "999";}}
> }

Why don't you override the GetHashCode() method instead?

> The problem I am having is that this method doesn't really take into account
> objects which are (generic) collections. E.g. (excusing my simplicity) the
> Customer class would work but the CustomerCollection wouldn't. Can anyone
> give me some pointers to improve this? Cheers in advance for any tips!


What you are describing is a GetHashCode() which iterates through an
entire list, and enumerates all the properties of every object in the
list, all to produce a single hash code.

Hash codes are used to insert and retrieve values into and from data
structures which use hash codes, usually Dictionary<,> or Hashtable. One
of the most important concerns in writing a hash function is that it
should be fast.

With the deep definition of equality that you are using, these hash
tables will be expensive for inserts and expensive for lookups.
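
To make the cost concrete, here is a minimal sketch of the kind of deep hash being described. The names `DeepHashSketch` and `GetDeepHashCode` are hypothetical, and the code assumes simple, acyclic data objects (a cyclic object graph would recurse forever). Strings and value types are treated as leaves, sequences are walked element by element, and properties are sorted by name because reflection does not promise a particular order.

```csharp
using System;
using System.Collections;
using System.Reflection;

public static class DeepHashSketch
{
    // Hypothetical helper, not from the thread: hashes an object by its contents.
    public static int GetDeepHashCode(object value)
    {
        if (value == null) return 0;

        Type type = value.GetType();

        // Leaves: strings and value types already hash by value.
        if (value is string || type.IsValueType) return value.GetHashCode();

        int result = 17;

        // Collections: every element is visited, so the cost is O(n) per call.
        IEnumerable sequence = value as IEnumerable;
        if (sequence != null)
        {
            foreach (object item in sequence)
                result = unchecked(result * 31 + GetDeepHashCode(item));
            return result;
        }

        // Other objects: hash each public, non-indexed property in name order.
        PropertyInfo[] props = type.GetProperties(BindingFlags.Instance | BindingFlags.Public);
        Array.Sort(props, (a, b) => string.CompareOrdinal(a.Name, b.Name));
        foreach (PropertyInfo prop in props)
        {
            if (prop.GetIndexParameters().Length != 0) continue;
            result = unchecked(result * 31 + GetDeepHashCode(prop.GetValue(value, null)));
        }
        return result;
    }
}
```

Two separate collections holding equal elements in the same order get the same result here, but every dictionary insert or lookup keyed on such an object pays for a full traversal plus reflection, which is the expense noted above.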

-- Barry

--
http://barrkel.blogspot.com/
May 22 '06 #2
Thanks for your reply Barry. It slipped my mind that the GetHashCode method is on
the base object as well, so that makes things a little easier. The reason that
I am writing this function is that I have a number of objects that contain
quite a lot of properties. I need a reliable way of comparing two objects of
the same type to see if they are the same. Even if only one of their
properties differs slightly I need to be made aware of it. After doing some
testing the only way that seemed to work reliably was to XOR the hashcodes of
each property together. The problem with this is that for each object I then
need to override the GetHashCode method and write in all the properties
(which can be slow and unmaintainable if there are 30 properties or so).

e.g.

public override int GetHashCode()
{
    int result = Global.HashCodeMultiplier;
    result ^= ( Name.GetHashCode() ^
                Street1.GetHashCode() ^
                Telephone.GetHashCode() );
    return result;
}

so I thought that an alternative would be to write a generic function to
handle this and then just override the GetHashCode method for each object as
follows:

public override int GetHashCode()
{
    return ObjectHelper.GetHashCode(this);
}

I appreciate that this is not the most efficient way of checking whether
objects are the same, but I'm not sure what a better way of generating a
hashcode could be.

P.S. I actually override the Equals method first which checks the easy
things (makes sure that they are the same type etc..)
May 23 '06 #3


Metaman wrote:
> Thanks for your reply Barry. It slipped my mind that GetHashCode method is on
> the base object as well so that makes things a little easier. The reason that
> I am writing this function is that I have a number of objects that contain
> quite a lot of properties. I need a reliable way of comparing two objects of
> the same type to see if they are the same. Even if only one of their
> properties differs slightly I need to be made aware of it. After doing some

Then looking at the hash-codes won't be enough.

By convention:
x.Equals(y) => x.GetHashCode() == y.GetHashCode()
but
x.GetHashCode() == y.GetHashCode() =/> x.Equals(y)

If your hash-codes are intractable you may be able to argue equivalence
up to a certain acceptable error-margin, but really 2^32 is a very small
co-domain for an intractable hash.
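
A small sketch of the second line of that convention, using a built-in type. The helper name `CollideButDiffer` is hypothetical, and the specific pair of values mentioned afterwards relies on how current .NET runtimes compute the hash of a `long` (folding the upper and lower 32 bits together with XOR), which is an implementation detail rather than a documented guarantee.

```csharp
using System;

public static class HashCollisionDemo
{
    // True when two values share a hash code but are not equal.
    public static bool CollideButDiffer(long a, long b)
    {
        return a.GetHashCode() == b.GetHashCode() && !a.Equals(b);
    }
}
```

With that folding, `HashCollisionDemo.CollideButDiffer(0L, 4294967297L)` returns true: both values hash to 0, yet they are clearly different. Any type with more than 2^32 possible values must have such pairs.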

If you wish to "fit into .NET" you should write an implementation of
IEqualityComparer<T> that uses reflection on the properties and
(recursively) applies hashing/comparison to the objects.

Note that computing the hash-value and comparing those to "optimize"
comparison will probably be a lot slower than just doing the comparison
inline, and unless you rely on intractability of the hash you will need
to do the comparison anyway.
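
A minimal sketch of such a comparer follows. `PropertyEqualityComparer<T>` and `SampleContact` are hypothetical names introduced for illustration. To stay short, the sketch goes only one level deep: nested property values are compared with their own `Equals` and hashed with their own `GetHashCode` rather than recursively, and only the properties declared on `T` (not on a runtime subtype) are considered.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sample type with settable properties.
public class SampleContact
{
    public string Name { get; set; }
    public string Telephone { get; set; }
}

// Hypothetical comparer: equality and hashing over public, readable,
// non-indexed instance properties of T.
public sealed class PropertyEqualityComparer<T> : IEqualityComparer<T>
{
    private static readonly PropertyInfo[] Properties = LoadProperties();

    private static PropertyInfo[] LoadProperties()
    {
        List<PropertyInfo> list = new List<PropertyInfo>();
        foreach (PropertyInfo p in typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public))
        {
            if (p.CanRead && p.GetIndexParameters().Length == 0) list.Add(p);
        }
        // Fixed order so the hash does not depend on reflection order.
        list.Sort((a, b) => string.CompareOrdinal(a.Name, b.Name));
        return list.ToArray();
    }

    public bool Equals(T x, T y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        foreach (PropertyInfo p in Properties)
        {
            if (!object.Equals(p.GetValue(x, null), p.GetValue(y, null))) return false;
        }
        return true;
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        int hash = 17;
        foreach (PropertyInfo p in Properties)
        {
            object v = p.GetValue(obj, null);
            hash = unchecked(hash * 31 + (v == null ? 0 : v.GetHashCode()));
        }
        return hash;
    }
}
```

An instance can be passed to the `Dictionary<TKey, TValue>` or `HashSet<T>` constructors that accept an `IEqualityComparer<T>`, which keeps the reflection-based comparison out of the classes themselves.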

> I appreciate that this is not the most efficient way of checking whether
> objects are the same but not sure what a better way of generating a hashcode
> could be.

Hashing is *not* equality.

> P.S. I actually override the Equals method first which checks the easy
> things (makes sure that they are the same type etc..)

If x.Equals(y) then it should definitely be the case that x.GetHashCode()
== y.GetHashCode(), not the other way around.

--
Helge
May 23 '06 #4
Hey Helge,

Thanks for your reply. I appreciate that as Hashcodes determined by a
GetHashcode method are only integer values there is a finite number of
results with collision a possibility. Part of the reason why I am using
hashcodes is that I need to calculate whether objects have changed when
exporting from another system. I store hashcodes in a database and then
compare the hashcode of the exported object with the value that has
previously been stored in the database.

Is there a better way of creating a hashcode of an object that will have
fewer collisions than a GetHashCode integer?

I agree that for comparing two objects (rather than object vs hashcode -
above) implementing the IEqualityComparer<T> would be better. I'm also aware
that hashing is *not* equality (see my overridden Equals method below).
Perhaps my wording was not correct.

public override bool Equals(object obj)
{
    // Check that the parameter has value
    if (obj == null) return false;

    // Check that types are the same
    if (GetType() != obj.GetType()) return false;

    // safe because of the GetType check
    Customer customer = (Customer)obj;

    // Check if hash codes match (being the same does not guarantee equality
    // however if the hashcode are different then definitely not equal)
    if (! GetHashCode().Equals(customer.GetHashCode())) return false;

    return true;
}
May 24 '06 #5


Metaman wrote:
> Hey Helge,
>
> Thanks for your reply. I appreciate that as Hashcodes determined by a
> GetHashcode method are only integer values there is a finite number of
> results with collision a possibility. Part of the reason why I am using
> hashcodes is that I need to calculate whether objects have changed when
> exporting from another system. I store hashcodes in a database and then
> compare the hashcode of the exported object with the value that has
> previously been stored in the database.

So, basically you *are* using your hash-function as an equality
relation? That is a useful technique, since you don't need to store the
original object, only the hash, to decide if anything has changed.

Unfortunately, it comes at a cost: the chance that the object is
different but has the same hash.

You may be willing to accept that risk, which is determined by the
quality of your hash-function and the size of the co-domain.

If your hash-function is perfect the chance of invalidly classifying an
object as unchanged will be 1/(2^32), which may be acceptable to you.

If your hash-function is worse than that, it may be *much* worse. If you
compare, not only to an "old" hash-value, but to a set of old values,
the chance of failure also increases dramatically.

You were using xor to combine hashes, which means that pairs of equal
values cancel out (only the number of occurrences modulo 2 matters). As an
example, the objects { x=1; y=1 } and { x=2; y=2 } will collide. This may be a
*real* problem for you. Good hash-functions have a complicated structure
to try and prevent structural equivalence from generating hash-equivalence.
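
The cancellation is easy to reproduce. The sketch below compares the XOR combination with an order-sensitive multiply-and-add combination; the class and method names are hypothetical, and the constants 17 and 31 are just a common convention, not something taken from the thread.

```csharp
using System;

public static class XorCollisionDemo
{
    // XOR combination, as in the method under discussion.
    public static int XorCombine(int x, int y)
    {
        return x.GetHashCode() ^ y.GetHashCode();
    }

    // Order-sensitive alternative: multiply the running hash, then add.
    public static int OrderedCombine(int x, int y)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + x.GetHashCode();
            hash = hash * 31 + y.GetHashCode();
            return hash;
        }
    }
}
```

`XorCombine(1, 1)` and `XorCombine(2, 2)` both give 0, and `XorCombine(1, 2)` equals `XorCombine(2, 1)`, so swapped values are also indistinguishable. `OrderedCombine` separates all four of these cases, although with only 32 bits of output it still collides on other inputs.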

> Is there a better way of creating a hashcode of an object that will have
> less collision than a GetHashCode integer?

You expect to rely on GetHashCode() of each member, right? And then
combine them in some way for a combined hash. I would suggest ordering
the traversal and applying a cryptographic hash (for example one of the
SHA variants) to the concatenation of the GetHashCode() results.
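
A minimal sketch of that idea, with one deliberate change that should be read as an assumption of this example: it feeds the text of each property value into SHA-256 instead of the `GetHashCode()` results. The .NET documentation cautions against persisting `GetHashCode()` values, since they may differ between runtimes and processes, and the stated goal is to store the result in a database. `FingerprintSketch` and `ComputeFingerprint` are hypothetical names, and the sketch assumes each property's `ToString()` output is a stable, faithful representation of its value.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class FingerprintSketch
{
    // Hypothetical helper: returns a 64-character hex SHA-256 digest computed
    // over the public, non-indexed properties of an object, in name order.
    public static string ComputeFingerprint(object value)
    {
        if (value == null) throw new ArgumentNullException("value");

        PropertyInfo[] props = value.GetType().GetProperties(BindingFlags.Instance | BindingFlags.Public);
        Array.Sort(props, (a, b) => string.CompareOrdinal(a.Name, b.Name));

        StringBuilder text = new StringBuilder();
        foreach (PropertyInfo prop in props)
        {
            if (prop.GetIndexParameters().Length != 0) continue;

            object propValue = prop.GetValue(value, null);
            if (propValue == null)
            {
                text.Append(prop.Name).Append("=null;");
                continue;
            }

            string s = Convert.ToString(propValue, CultureInfo.InvariantCulture);
            // Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            text.Append(prop.Name).Append('=').Append(s.Length).Append(':').Append(s).Append(';');
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(text.ToString()));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

The stored value is then a 256-bit digest rather than a 32-bit integer, which makes an accidental "unchanged" verdict far less likely, at the cost of a wider database column.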

> public override bool Equals(object obj)
> {
> // Check that the parameter has value
> if (obj == null) return false;

You may wish to:

    if ( Object.ReferenceEquals(this,obj) )
        return true;

depending on the expected self vs. non-self comparison ratio.

> // Check that types are the same
> if (GetType() != obj.GetType()) return false;

So inherited implementations, without member-variables are considered
non-equal, even if all members are equal?

Test-implementations also? :)

> // safe because of the GetType check
> Customer customer = (Customer)obj;
>
> // Check if hash codes match (being the same does not guarantee equality
> // however if the hashcode are different then definitely not equal)
> if (! GetHashCode().Equals(customer.GetHashCode())) return false;

So, you really need:

    // Check every part sequentially
    return this.x == customer.x && this.y == customer.y && ...;

or you are in fact relying on the hash-function for equality.

> return true;
> }
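
Putting those remarks together, a sketch of an `Equals` that does not lean on the hash code might look like the following. `CustomerRecord` is a hypothetical stand-in for the thread's `Customer` with settable properties, and the exact-type check is kept as in the quoted code.

```csharp
using System;

// Hypothetical variant of Customer, used only for this illustration.
public class CustomerRecord
{
    public string Name { get; set; }
    public string Telephone { get; set; }

    public override bool Equals(object obj)
    {
        // Cheap shortcut for self-comparison.
        if (ReferenceEquals(this, obj)) return true;

        // Null or a different runtime type is never equal.
        if (obj == null || GetType() != obj.GetType()) return false;

        CustomerRecord other = (CustomerRecord)obj;

        // Compare every member directly instead of comparing hash codes.
        return string.Equals(Name, other.Name, StringComparison.Ordinal)
            && string.Equals(Telephone, other.Telephone, StringComparison.Ordinal);
    }

    public override int GetHashCode()
    {
        // Built from the same members as Equals, so equal objects hash equally.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 31 + (Telephone == null ? 0 : Telephone.GetHashCode());
            return hash;
        }
    }
}
```

Here the hash code is only a consequence of equality, never evidence for it.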


--
Helge
May 24 '06 #6
