
hash

Douglas Dude: How much time does it take to look for a value-key in a hash_map? Thanks Dec 27 '06 #1
11 Replies

Douglas Dude wrote:
> How much time does it take to look for a value-key in a hash_map? Thanks
I mean the O-notation. Please help me because I think it is O(1). Is this correct? Dec 27 '06 #2

Douglas Dude wrote:
>> How much time does it take to look for a value-key in a hash_map? Thanks
> I mean the O-notation. Please help me because I think it is O(1). Is this correct?
I don't see how an O(1) runtime is possible for a lookup in a _general purpose_ hash table, which is what I assume hash_map to be. If it uses chaining, one would expect that with N items and M buckets, lookups would, on average and assuming that the hash function is not biased, take O(N/M), without taking into account the algorithm that generates the hash from the key, if such an operation is required.

-n Dec 27 '06 #3

Douglas Dude wrote:
>> How much time does it take to look for a value-key in a hash_map? Thanks
> I mean the O-notation. Please help me because I think it is O(1). Is this correct?
Lookups are O(1) on average; the rare worst case could be as bad as O(n). For constant average-time performance, a well-designed hash function is necessary, to produce a uniform distribution of elements and reduce the number of collisions within the hash table. Dec 27 '06 #4

bjeremy wrote:
> Lookups are O(1) on average
Assuming that the size of the hash table is fixed and that the hash function equally distributes the data, then, on average, chains would be N/M items long. That suggests that lookup is an O(N/M) operation. Two cases can be said to be O(1): if M is a linear function of N, which means that N/M = C, or if N is small and M is large. But it's really not accurate to suggest that a hash table can, in general, perform lookups in constant time. For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick or volume 3 of "The Art of Computer Programming" by Donald Knuth.

-n Dec 27 '06 #5

Nikolaos D. Bougalis wrote:
> But it's really not accurate to suggest that a hash table can, in general, perform lookups in constant time.
The hash buckets themselves can be accessed in constant time; the length of each chain depends on the frequency of your collisions. If you handle collisions through chaining, your hash may degrade in performance to a linear search over all the elements that mapped to a particular bucket. But in practice, in the average case, you will experience constant-time lookup. Or if you like it better: in my practice I have experienced constant-time lookup before degradation. Dec 28 '06 #6

> Of course, you could say "well, all my hash tables have 500 buckets and 1000 items, and every bucket has exactly 2 items, so according to your O(N/M) logic that works out to O(2) which is a constant, so it's constant time! HA! HA!"
Yes, that's exactly what I was going to say... wow, it's like you don't even need anyone else around to carry on a conversation! So explain to lowly me the purpose of having a hash map if you are saying its complexity is equal to that of a linked list. Does your explanation involve something like: hash maps are designed to have an O(1) lookup on average but could exhibit an O(n) lookup depending on your hash algorithm?

p.s. The multiplicative hash function proposed by Knuth in "The Art of Computer Programming" exhibits poor clustering behavior... but I guess there is no universal consensus on a "good" hash. Dec 28 '06 #8

bjeremy wrote:
> So explain to lowly me the purpose of having a hash map if you are saying its complexity is equal to that of a linked list.
That is not what I am saying at all. I'm saying that the complexity of lookup in a hash table, excluding the cost of calculating the hash from the key, is on average O(N/M). This applies whether you have more items than buckets or more buckets than items, provided you have a decent hash function. Do you disagree that if a hash bucket has P items inside, you still have to perform a linear-time scan of those items to find the particular one you're interested in, and that the best possible time complexity for that linear search is O(P)? Do you agree that in the best case P will be one, in the worst P will be N, and ideally it should be close to N/M?

> Does your explanation involve something like: hash maps are designed to have an O(1) lookup on average but could exhibit an O(n) lookup depending on your hash algorithm?
No, although hash tables can exhibit horrible worst case behavior, as you said, with chained ones degenerating to a linked list and others (e.g. open probing) exhibiting similar behavior.

-n Dec 28 '06 #9

Nikolaos D. Bougalis wrote:
> No, although hash tables can exhibit horrible worst case behavior, as you said, with chained ones degenerating to a linked list and others (e.g. open probing) exhibiting similar behavior.
I don't disagree, but what I'm inferring (which may be different from what you're implying) is that you are computing the complexity based upon frequent and expected collisions. A good hash table (with a "good" hash function) should try to limit the number of collisions. Now, a nonzero probability of collisions is inevitable in any implementation of a hash table, but a hash table should try to minimize the number of collisions. For example, if all the keys are known ahead of time, and there are no more keys than can fit in the hash table, then perfect hashing can be used, which leaves no collisions, and the lookup time would be O(1).

That was not intended to be a "gotcha" scenario, but my argument is that even though you probably will not do perfect hashing, my intent when using a hash map is to try to make collisions a rare or infrequent occurrence. Otherwise, I wouldn't so much be creating a hash map as I would be creating a two-dimensional lookup table with an expensive hash function. Dec 28 '06 #10

bjeremy wrote:
> But in practice, in the average case, you will experience constant-time lookup.
Only in the algorithmic sense. In the real world, hash maps slow down as you fall out of the L1 and L2 caches (and a long time later, when you start swapping ;) In fact, falling out of L1/L2 can have a bigger impact than collisions IME.

Mirek Dec 28 '06 #11

Nikolaos D. Bougalis wrote:
> It's common sense, really. Once you have more items than buckets, you invariably end up having to perform some operations over N items, where N is the number of items that have the same hash value. And no matter how one cuts, slices or dices the operations in question, they will, at best, end up taking linear time. And linear time is not constant time.
Who says you ever have to have more items than buckets? One can trivially design a hash table so that the number of buckets is equal to the number of items. While it's not particularly efficient to do that, its problems are not in the scalability department. (It just wastes storage space.)

You can create a hash table that has O(1) insert, delete, and find times. However, you have to meet all of the following requirements:

1) The size of the hash value must be much larger than the number of items in the hash. The choice of hash function will always place an upper limit on scalability.
2) The number of buckets must adjust with the number of elements, such that the average number of items per bucket stays constant. (Thus M is a linear function of N, and N/M becomes a constant.)
3) The cost of adjusting the number of buckets must not depend on the number of buckets. (Lawson's algorithm meets this requirement.)
4) The hash function must produce values for the objects that are effectively random.

If you're sufficiently clever, you can design hash functions so that the speed of an insertion with 100 objects, 50,000 objects, or 50,000,000 objects is nearly the same. All you see are cache effects.

DS Jan 4 '07 #12