By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,750 Members | 1,221 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,750 IT Pros & Developers. It's quick & easy.

hash function

P: 9
i didn't understand what does it mean for a hash function to produce well-dispersed output, and why it is important?

thank u for ur help
Dec 20 '06 #1
Share this Question
Share on Google+
3 Replies


b1randon
Expert 100+
P: 171
i didn't understand what does it mean for a hash function to produce well-dispersed output, and why it is important?

thank u for ur help
I am assuming you are talking about using a hash for security. Usually you take a hash of some data for encryption. It is important for your has output to be well-dispersed so that your encryption is not easily cracked. If your output is predictable someone can tell what kind of input was used when the hash was created.
Dec 20 '06 #2

Ganon11
Expert 2.5K+
P: 3,652
When I've used hash, it has been for sorting and searching. A hash code takes the attributes of an object or struct and generates a semi-random number from it - each object of that type with those attributes will have that hash code, but the hash code itself might be nonsense.

If you were storing a list of objects and wanted to search for them on an exact basis (rather than searching by name, or amount, or any search based on one attribute), hash would be very useful. Many objects may have the same name, or amount, or the same of one attribute - thus, searching by name would produce multiple results. Hash code, though, produces a number unique to that object with those specific qualities. By searching by hash code, you ensure that whatever results you get will be exactly what you need.

For hash code to work, though, you must have a good method that will create unique hash for each object. Usually, this involves taking some integer representation of all object variables and multiplying them by something, or some other method of generating a number from the variables.

A bad hash method would be, for example, adding the character values of each string - since "abc" and "cab" would both produce the same hash (198). In this case, "abc", "cab", "acb", etc would all have the same hash code, and searching for all strings with hash value 198 would produce many results.
Dec 20 '06 #3

Banfa
Expert Mod 5K+
P: 8,916
When you create a hash code and hash algorithm you would often expect to get multiple hits on the same code.

This is because you are representing an object of some sort as either an 8, 16 or 32 bit integer value (normally) and generally you would be expecting to represent many objects. Certainly enough that for a 8 or 16 bit hash code you would easily have more objects than unique codes.

You create you hash code from the data of the object.

You would then have a hash table which would be a table that easily allows you to look up a given code and jump straight to the objects that have that code then using specific object data to pick from the objects with that hash code the specific object you are interested in.

Well dispersed output is required in a good hash table because it provides more efficient lookup of the object.

Taking an example lets assume that we have 2000 objects and we are using a 8 bit hash. Well dispersed distribution would result in 7 to 8 objects with each hash code. Looking for a single object requires 1 hash table lookup plus a search of 7-8 objects.

On the other hand if we have 1 object each with hash values 0-250, no objects with hash values 251-254 and 1750 objects with a hash value of 255, i.e. very poor distribution then you do not get efficient lookup of objects.

Each lookup will require 1 hash table lookup, then 12.5% of the time you will have a hash value that leads directly to a single object but 87.5% of the time you will then have to search 1750 objects to find the required object. This is little better than just having a list of objects and looking up directly in the list.


So you want even distribution to provide an efficient lookup, even distribution is dependent on the hash algorithm, so this algorithm needs careful design.

Note also I mention having a 32 bit hash value but for a hash algorithm to improve efficiency then you need to be able to quickly lookup in a hash table the location of the objects with that hash value. A 32 bit hash table would require 2^32 entries. Few computers have this capacity.
Dec 21 '06 #4

Post your reply

Sign in to post your reply or Sign up for a free account.