
hash

how much time does it take to look up a value by key in a hash_map?
Thanks

Dec 27 '06 #1
11 Replies



Douglas Dude wrote:
how much time does it take to look up a value by key in a hash_map?
Thanks
I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?

Dec 27 '06 #2

Douglas Dude wrote:
Douglas Dude wrote:
>how much time does it take to look up a value by key in a hash_map?
Thanks

I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?
I don't see how an O(1) runtime is possible for a lookup in a _general
purpose_ hash table, which is what I assume hash_map to be.

If it uses chaining, one would expect that with N items and M buckets,
lookups would, on average and assuming that the hash function is not biased,
take O(N/M), not counting the cost of the algorithm that generates the hash
from the key, if such an operation is required.
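
For illustration, here's a rough sketch of what a chained lookup has to do
(not the actual hash_map source, just the general shape of it; the hash
function and the types are made up):

#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Rough sketch of lookup in a chained hash table. Finding the bucket is O(1),
// but the chain still has to be walked, which is why the average cost comes
// out to O(N/M) rather than a true constant.
struct ChainedTable {
    typedef std::pair<std::string, int> Entry;
    std::vector<std::list<Entry> > buckets;

    explicit ChainedTable(std::size_t m) : buckets(m) {}

    static std::size_t hash(const std::string& key) {
        std::size_t h = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = h * 31 + key[i];                     // simple multiplicative string hash
        return h;
    }

    void insert(const std::string& key, int value) {
        buckets[hash(key) % buckets.size()].push_back(Entry(key, value));
    }

    const int* find(const std::string& key) const {
        const std::list<Entry>& chain = buckets[hash(key) % buckets.size()]; // O(1)
        for (std::list<Entry>::const_iterator it = chain.begin();
             it != chain.end(); ++it)                // O(chain length), ~N/M on average
            if (it->first == key)
                return &it->second;
        return 0;                                    // not found
    }
};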

-n
Dec 27 '06 #3


Douglas Dude wrote:
Douglas Dude wrote:
how much time does it take to look up a value by key in a hash_map?
Thanks

I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?
Lookups are O(1) on average; the rare worst case could be up to
O(n). For constant average-time performance, the use of a
well-designed hash function is necessary to produce a uniform
distribution of elements and reduce the number of collisions within
the hash table.
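
For example (a throwaway sketch; the bucket count and key set are made up),
compare how a poor hash function and a reasonable one spread the same keys
over the buckets:

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Count how two hash functions distribute the same keys over M buckets.
std::size_t bad_hash(const std::string& key) { return key.size(); } // piles keys up
std::size_t ok_hash(const std::string& key) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < key.size(); ++i) h = h * 31 + key[i];
    return h;
}

int main() {
    const std::size_t M = 16;
    std::vector<std::string> keys;
    for (int i = 0; i < 1000; ++i) {
        char buf[32];
        std::sprintf(buf, "key%d", i);
        keys.push_back(buf);
    }

    std::vector<int> bad(M, 0), ok(M, 0);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ++bad[bad_hash(keys[i]) % M];
        ++ok[ok_hash(keys[i]) % M];
    }
    for (std::size_t b = 0; b < M; ++b)
        std::printf("bucket %2u: bad=%4d ok=%4d\n", (unsigned)b, bad[b], ok[b]);
    // bad_hash dumps nearly everything into a few buckets (long chains, so
    // lookups degrade toward O(n)); ok_hash spreads the keys out, so chains
    // stay near N/M on average.
}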

Dec 27 '06 #4

bjeremy wrote:
Douglas Dude wrote:
>Douglas Dude wrote:
>>how much time does it take to look up a value by key in a hash_map?
Thanks
I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?

Lookups are O(1) on average
Assuming that the size of the hash table is fixed and that the hash function
equally distributes the data, then, on average, chains would be N/M items
long. That suggests that the order of lookups is an O(N/M) operation.

Two cases can be said to be O(1): if M is a linear function of N which means
that N/M = C, or if N is small and M is large. But it's really not accurate to
suggest that a hash table can, in general, perform lookups in constant time.

For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
or volume 3 of "The Art of Computer Programming" by Donald Knuth.

-n
Dec 27 '06 #5


Nikolaos D. Bougalis wrote:
bjeremy wrote:
Douglas Dude wrote:
Douglas Dude wrote:
how much time does it take to look up a value by key in a hash_map?
Thanks
I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?
Lookups are O(1) on average

Assuming that the size of the hash table is fixed and that the hash function
equally distributes the data, then, on average, chains would be N/M items
long. That suggests that the order of lookups is an O(N/M) operation.

Two cases can be said to be O(1): if M is a linear function of N which means
that N/M = C, or if N is small and M is large. But it's really not accurate to
suggest that a hash table can, in general, perform lookups in constant time.

For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
or volume 3 of "The Art of Computer Programming" by Donald Knuth.

-n
The hash buckets can be accessed in constant time... the length of a
chain depends on the frequency of your collisions... If you handle
collisions through chaining, your hash may degrade in performance
to a linear search over the elements that mapped to that particular
bucket... But in practice, in the average case, you will
experience constant-time lookup. Or if you like it better: in my
experience, lookups have been constant time before degradation.

Dec 28 '06 #6

bjeremy wrote:
Nikolaos D. Bougalis wrote:
>bjeremy wrote:
>>Douglas Dude wrote:
Douglas Dude wrote:
how much time does it take to look up a value by key in a hash_map?
Thanks
I mean the O-notation. Please help me because I think it is O(1). Is
this correct ?
Lookups are O(1) on average
Assuming that the size of the hash table is fixed and that the hash function
equally distributes the data, then, on average, chains would be N/M items
long. That suggests that the order of lookups is an O(N/M) operation.

Two cases can be said to be O(1): if M is a linear function of N which means
that N/M = C, or if N is small and M is large. But it's really not accurate to
suggest that a hash table can, in general, perform lookups in constant time.

For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
or volume 3 of "The Art of Computer Programming" by Donald Knuth.

-n

The hash buckets can be accessed in constant time...
You can locate a bucket in constant time -- but, that in itself, is
meaningless. When dealing with chaining (although exactly the same argument
also applies to the other collision resolution strategies, and yields the same
conclusion) you end up having to iterate over the items that collided. And no
matter how you cut, slice or dice it, iterating a list of N items is an O(N)
operation. And such an operation cannot be constant.

the length of a chain depends on the frequency of your collisions... If you
handle collisions through chaining, your hash may degrade in performance
to a linear search over the elements that mapped to that particular
bucket...
Degradation will happen regardless of your collision resolution strategy as
the load factor of the hash table goes up.

But in practice, in the average case, you will
experience constant-time lookup. Or if you like it better: in my
experience, lookups have been constant time before degradation.
The thing is this: a hash table cuts search times by orders of magnitude
because it dramatically reduces the number of items that must be inspected.
That can make it "feel" like a constant time operation. But that doesn't make
it a constant time operation.

It's common sense, really. Once you have more items than buckets, you
invariably end up having to perform some operations over N items, where N is
the number of items that have the same hash value. And no matter how one cuts,
slices or dices the operations in question, they will, at best, end up taking
linear time. And linear time is not constant time.

Of course, you could say "well, all my hash tables have 500 buckets and 1000
items, and every bucket has exactly 2 items, so according to your O(N/M) logic
that works out to O(2) which is a constant, so it's constant time! HA! HA!"
But really, you might as well say "my linked list always has a million items,
so iterating the list takes O(1000000) which means I can iterate my million
item list from start to finish in constant time." You might as well say that
because all pennies are coins and dimes are coins, 100 pennies and 100 dimes
are the same thing.

-n
Dec 28 '06 #7

Of course, you could say "well, all my hash tables have 500 buckets and 1000
items, and every bucket has exactly 2 items, so according to your O(N/M) logic
that works out to O(2) which is a constant, so it's constant time! HA! HA!"
Yes.. that's exactly what I was going to say... wow, it's like you don't
even need anyone else around to carry on a conversation!

So explain to lowly me the purpose of having a hash map if you are
saying its complexity is equal to that of a linked list.

Does your explanation involve something like: hash maps are designed to
have an O(1) lookup on average but could exhibit an O(n) lookup depending
on your hash algorithm?

p.s. The multiplicative hash function proposed by Knuth in "The Art of
Computer Programming" exhibits poor clustering behavior... but I
guess there is no universal consensus on a "good" hash function.

Dec 28 '06 #8

bjeremy wrote:
So explain to lowly me the purpose of having a hash map if you are
saying its complexity is equal to that of a linked list.
That is not what I am saying at all. I'm saying that the complexity of lookup
in a hash table, excluding the cost of calculating the hash from the key, is
on average O(N/M). This applies whether you have more items than buckets or
more buckets than items, provided you have a decent hash function.

Do you disagree that if a hash bucket has P items inside, you still have to
perform a linear-time scan of those items to find the particular one you're
interested in, and that the best possible time complexity for that linear
search is O(P)?

Do you agree that in the best case, P will be one, in the worst, P will be N,
and ideally it should be close to N/M?
Does your explanation involve something like: hash maps are designed to
have an O(1) lookup on average but could exhibit an O(n) lookup depending
on your hash algorithm?
No, although hash tables can exhibit horrible worst case behavior, as you
said, with chained ones degenerating to a linked list and others (e.g. open
addressing) exhibiting similar behavior.

-n

Dec 28 '06 #9


Nikolaos D. Bougalis wrote:
bjeremy wrote:
So explain to lowly me the purpose of having a hash map if you are
saying its complexity is equal to that of a linked list.

That is not what I am saying at all. I'm saying that the complexity of lookup
in a hash table, excluding the cost of calculating the hash from the key, is
on average O(N/M). This applies whether you have more items than buckets or
more buckets than items, provided you have a decent hash function.

Do you disagree that if a hash bucket has P items inside, you still have to
perform a linear-time scan of those items to find the particular one you're
interested in, and that the best possible time complexity for that linear
search is O(P)?

Do you agree that in the best case, P will be one, in the worst, P will be N,
and ideally it should be close to N/M?
Does your explanation involve something like: hash maps are designed to
have an O(1) lookup on average but could exhibit an O(n) lookup depending
on your hash algorithm?

No, although hash tables can exhibit horrible worst case behavior, as you
said, with chained ones degenerating to a linked list and others (e.g. open
addressing) exhibiting similar behavior.

-n
I don't disagree, but what I'm inferring (which may be different from
what you're implying) is that you are computing the complexity based
upon frequent and expected collisions. A good hash table (with a "good"
hash function) should try to limit the number of collisions. Now, a
nonzero probability of collisions is inevitable in any implementation
of a hash table, but a hash table should try to minimize the number of
collisions. e.g. if all the keys are known ahead of time, and there are
no more keys than can fit in the hash table, then perfect hashing can
be used, which leaves no collisions, and the lookup time would be O(1).

That was not intended to be a "gotcha" scenario, but my argument is
that even though you probably will not do perfect hashing, my intent
when using a hash map is to try and make collisions a rare or
infrequent occurrence. Otherwise, I wouldn't so much be creating a hash
map as I would be creating a two-dimensional lookup table with an
expensive hash function.
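
To illustrate the perfect-hashing idea with a toy sketch (the key set, table
size, and hash family here are all made up; real schemes such as FKS hashing
are far more careful): with the keys fixed up front, you can search for a
hash that is collision-free on exactly that set, after which every lookup is
a single probe.

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Brute-force a multiplier until h(key) % M is collision-free for the known
// key set; from then on, every lookup touches exactly one slot.
static std::size_t hash_with(const std::string& key, std::size_t mult) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < key.size(); ++i) h = h * mult + key[i];
    return h;
}

int main() {
    const char* raw[] = { "if", "else", "while", "for", "return", "switch", "case" };
    std::vector<std::string> keys(raw, raw + 7);
    const std::size_t M = 16;                     // a few spare slots make the search easy

    std::size_t mult = 0;
    for (std::size_t m = 2; m < 100000; ++m) {
        std::vector<int> used(M, 0);
        bool ok = true;
        for (std::size_t i = 0; i < keys.size() && ok; ++i)
            if (used[hash_with(keys[i], m) % M]++) ok = false; // slot already taken
        if (ok) { mult = m; break; }
    }
    if (!mult) { std::printf("no perfect multiplier found\n"); return 1; }

    std::printf("multiplier %lu is collision-free for this key set:\n",
                (unsigned long)mult);
    for (std::size_t i = 0; i < keys.size(); ++i)  // each key lands in its own slot
        std::printf("  %-7s -> slot %lu\n", keys[i].c_str(),
                    (unsigned long)(hash_with(keys[i], mult) % M));
}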

Dec 28 '06 #10


bjeremy wrote:
Nikolaos D. Bougalis wrote:
particular bucket... But in practice, in the average case, you will
experience constant-time lookup.
Only in the algorithmic sense. In the real world, hash maps slow down as you
fall out of the L1 and L2 caches (and a long time later, when you start
swapping ;) In fact, falling out of L1/L2 can have a bigger impact than
collisions IME.
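
A crude way to see the effect (sizes and access pattern are made up here;
a real measurement needs more care than this): do the same number of
pseudo-random probes against an array that fits in cache and one that does
not. The asymptotic work is identical, the wall-clock time is not.

#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

int main() {
    const std::size_t small_n = 1 << 10;       // ~4 KB of ints, comfortably in L1
    const std::size_t big_n   = 1 << 24;       // ~64 MB of ints, far outside L2
    const std::size_t lookups = 20 * 1000 * 1000;

    std::vector<int> small_t(small_n, 1), big_t(big_n, 1);
    volatile long sink = 0;                     // keep the compiler from optimizing away

    std::clock_t t0 = std::clock();
    for (std::size_t i = 0, h = 0; i < lookups; ++i, h = h * 2654435761u + 1)
        sink += small_t[h % small_n];           // pseudo-random probes, cache-resident
    std::clock_t t1 = std::clock();
    for (std::size_t i = 0, h = 0; i < lookups; ++i, h = h * 2654435761u + 1)
        sink += big_t[h % big_n];               // same probes, now mostly cache misses
    std::clock_t t2 = std::clock();

    std::printf("small table: %.2fs   big table: %.2fs\n",
                (t1 - t0) / (double)CLOCKS_PER_SEC,
                (t2 - t1) / (double)CLOCKS_PER_SEC);
}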

Mirek

Dec 28 '06 #11


Nikolaos D. Bougalis wrote:
It's common sense, really. Once you have more items than buckets, you
invariably end up having to perform some operations over N items, where N is
the number of items that have the same hash value. And no matter how one cuts,
slices or dices the operations in question, they will, at best, end up taking
linear time. And linear time is not constant time.
Who says you ever have to have more items than buckets? One can
trivially design a hash table so that the number of buckets is equal to
the number of items. While it's not particularly efficient to do that,
its problems are not in the scalability department. (It just wastes
storage space.)

You can create a hash table that has O(1) insert, delete, and find
times. However, you have to meet all of the following requirements:

1) The size of the hash value must be much larger than the number of
items in the hash. The choice of hash function will always place an
upper limit on scalability.

2) The number of buckets must adjust with the number of elements, such
that the average number of items per bucket stays constant. (Thus M is
a linear function of N, and N/M becomes a constant.)

3) The cost of adjusting the number of buckets must not be dependent on
the number of buckets. (Lawson's algorithm meets this requirement.)

4) The hash function must produce values for the objects that are
effectively random.

If you're sufficiently clever, you can design hash functions so that
the speed of an insertion with 100 objects, 50,000 objects, or
50,000,000 objects is nearly the same. All you see are cache effects.
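
A minimal sketch of requirement (2), assuming a plain chained table (the
names and the growth threshold are made up): grow the bucket array as
elements arrive so that the load factor N/M stays bounded. Note that this
version rehashes everything in one go, so its O(1) is only amortized;
requirement (3), e.g. an incremental rehashing scheme, is what removes that
occasional spike.

#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class GrowingTable {
    typedef std::pair<std::string, int> Entry;
    std::vector<std::list<Entry> > buckets_;
    std::size_t size_;

    static std::size_t hash(const std::string& k) {
        std::size_t h = 0;
        for (std::size_t i = 0; i < k.size(); ++i) h = h * 31 + k[i];
        return h;
    }

    void grow() {                               // double M, re-bucket every entry
        std::vector<std::list<Entry> > bigger(buckets_.size() * 2);
        for (std::size_t b = 0; b < buckets_.size(); ++b)
            for (std::list<Entry>::iterator it = buckets_[b].begin();
                 it != buckets_[b].end(); ++it)
                bigger[hash(it->first) % bigger.size()].push_back(*it);
        buckets_.swap(bigger);
    }

public:
    GrowingTable() : buckets_(8), size_(0) {}

    void insert(const std::string& key, int value) {
        if (size_ + 1 > buckets_.size())        // keep N/M <= 1, so chains stay short
            grow();
        buckets_[hash(key) % buckets_.size()].push_back(Entry(key, value));
        ++size_;
    }

    const int* find(const std::string& key) const {
        const std::list<Entry>& chain = buckets_[hash(key) % buckets_.size()];
        for (std::list<Entry>::const_iterator it = chain.begin();
             it != chain.end(); ++it)
            if (it->first == key) return &it->second;
        return 0;
    }
};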

DS

Jan 4 '07 #12
