
hash

How much time does it take to look up a value by key in a hash_map?
Thanks

Dec 27 '06 #1

Douglas Dude wrote:
>How much time does it take to look up a value by key in a hash_map?
>Thanks
I mean the O-notation. Please help me because I think it is O(1). Is this correct?

Dec 27 '06 #2
Douglas Dude wrote:
>Douglas Dude wrote:
>>How much time does it take to look up a value by key in a hash_map?
>>Thanks
>I mean the O-notation. Please help me because I think it is O(1). Is this correct?
I don't see how an O(1) runtime is possible for a lookup in a _general
purpose_ hash table, which is what I assume hash_map to be.

If it uses chaining, one would expect that with N items and M buckets,
lookups would, on average and assuming that the hash function is not biased,
take O(N/M), not counting the cost of the algorithm that generates the hash
out of the key, if such an operation is required.
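
For illustration, a minimal sketch of lookup with separate chaining (the Node/Table types, hash_key, and lookup are hypothetical names, not actual hash_map internals):

#include <cstddef>
#include <functional>
#include <string>

struct Node {
    std::string key;
    int value;
    Node* next;  // next item in the same bucket's chain
};

struct Table {
    Node** buckets;  // array of M chain heads
    std::size_t m;   // number of buckets (M)
};

std::size_t hash_key(const std::string& key) {
    return std::hash<std::string>{}(key);  // delegate to the standard hasher
}

int* lookup(Table& t, const std::string& key) {
    // Finding the bucket is O(1)...
    Node* n = t.buckets[hash_key(key) % t.m];
    // ...but scanning its chain visits N/M items on average.
    for (; n != nullptr; n = n->next)
        if (n->key == key)
            return &n->value;
    return nullptr;  // key not present
}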

-n
Dec 27 '06 #3

Douglas Dude wrote:
>Douglas Dude wrote:
>>How much time does it take to look up a value by key in a hash_map?
>>Thanks
>I mean the O-notation. Please help me because I think it is O(1). Is this correct?
Lookups are O(1) on average; the rare worst case can be as bad as O(n).
For constant average-time performance, a well-designed hash function is
necessary to produce a uniform distribution of elements and reduce the
number of collisions within the hash table.
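
For what it's worth, a minimal usage sketch with std::unordered_map, the standardized successor of the pre-standard hash_map (the interface is analogous):

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages;
    ages["alice"] = 30;  // insert: O(1) on average
    ages["bob"] = 25;

    // Lookup: O(1) on average, O(n) in the (rare) worst case.
    auto it = ages.find("alice");
    if (it != ages.end())
        std::cout << it->first << " is " << it->second << '\n';
    return 0;
}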

Dec 27 '06 #4
bjeremy wrote:
>Douglas Dude wrote:
>>Douglas Dude wrote:
>>>How much time does it take to look up a value by key in a hash_map?
>>>Thanks
>>I mean the O-notation. Please help me because I think it is O(1). Is this correct?
>
>Lookups are O(1) on average
Assuming that the size of the hash table is fixed and that the hash function
distributes the data evenly, then, on average, chains will be N/M items
long. That suggests that a lookup is an O(N/M) operation.

Two cases can be said to be O(1): if M is a linear function of N, so that
N/M is a constant, or if N is small and M is large. But it's really not accurate to
suggest that a hash table can, in general, perform lookups in constant time.

For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
or volume 3 of "The Art of Computer Programming" by Donald Knuth.

-n
Dec 27 '06 #5

Nikolaos D. Bougalis wrote:
>bjeremy wrote:
>>Douglas Dude wrote:
>>>Douglas Dude wrote:
>>>>How much time does it take to look up a value by key in a hash_map?
>>>>Thanks
>>>I mean the O-notation. Please help me because I think it is O(1). Is this correct?
>>Lookups are O(1) on average
>
>Assuming that the size of the hash table is fixed and that the hash function
>distributes the data evenly, then, on average, chains will be N/M items
>long. That suggests that a lookup is an O(N/M) operation.
>
>Two cases can be said to be O(1): if M is a linear function of N, so that
>N/M is a constant, or if N is small and M is large. But it's really not accurate to
>suggest that a hash table can, in general, perform lookups in constant time.
>
>For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
>or volume 3 of "The Art of Computer Programming" by Donald Knuth.
>
>-n
A hash bucket can be accessed in constant time.. the length of a chain
depends on the frequency of your collisions... If you handle collisions
through chaining, your hash may degrade in performance to a linear search
over all the elements that mapped to the particular bucket... But in
practice, in the average case you will experience constant time lookup. Or
if you like it better... in my practice I have experienced constant time
lookup before degradation.

Dec 28 '06 #6
bjeremy wrote:
>Nikolaos D. Bougalis wrote:
>>bjeremy wrote:
>>>Douglas Dude wrote:
>>>>Douglas Dude wrote:
>>>>>How much time does it take to look up a value by key in a hash_map?
>>>>>Thanks
>>>>I mean the O-notation. Please help me because I think it is O(1). Is this correct?
>>>Lookups are O(1) on average
>>Assuming that the size of the hash table is fixed and that the hash function
>>distributes the data evenly, then, on average, chains will be N/M items
>>long. That suggests that a lookup is an O(N/M) operation.
>>
>>Two cases can be said to be O(1): if M is a linear function of N, so that
>>N/M is a constant, or if N is small and M is large. But it's really not accurate to
>>suggest that a hash table can, in general, perform lookups in constant time.
>>
>>For more detailed analysis, look up "Algorithms in C++" by Robert Sedgewick
>>or volume 3 of "The Art of Computer Programming" by Donald Knuth.
>>
>>-n
>
>A hash bucket can be accessed in constant time..
You can locate a bucket in constant time -- but that, in itself, is
meaningless. When dealing with chaining (although exactly the same argument
also applies to the other collision resolution strategies, and yields the same
conclusion) you end up having to iterate over the items that collided. And no
matter how you cut, slice or dice it, iterating a list of N items is an O(N)
operation. And such an operation cannot be constant.

>the length of a chain depends on the frequency of your collisions... If you
>handle collisions through chaining, your hash may degrade in performance
>to a linear search over all the elements that mapped to the particular
>bucket...
Degradation will happen regardless of your collision resolution strategy as
the load factor of the hash table goes up.

>But in practice, in the average case you will experience constant time
>lookup. Or if you like it better... in my practice I have experienced
>constant time lookup before degradation.
The thing is this: a hash table cuts search times by orders of magnitude
because it dramatically reduces the number of items that must be inspected.
That can make it "feel" like a constant time operation. But that doesn't make
it a constant time operation.

It's common sense, really. Once you have more items than buckets, you
invariably end up having to perform some operations over N items, where N is
the number of items that have the same hash value. And no matter how one cuts,
slices or dices the operations in question, they will, at best, end up taking
linear time. And linear time is not constant time.

Of course, you could say "well, all my hash tables have 500 buckets and 1000
items, and every bucket has exactly 2 items, so according to your O(N/M) logic
that works out to O(2) which is a constant, so it's constant time! HA! HA!"
But really, you might as well say "my linked list always has a million items,
so iterating the list takes O(1000000) which means I can iterate my million
item list from start to finish in constant time." You might as well say that
because all pennies are coins and dimes are coins, 100 pennies and 100 dimes
are the same thing.
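
To make the chain-scanning point concrete, here is a small sketch (illustrative only, with a hypothetical BadHash functor) where a deliberately terrible hash function sends every key to the same bucket, so every lookup degenerates into a linear scan:

#include <cstddef>
#include <string>
#include <unordered_map>

// A worst-case hash: every key collides into one bucket, so the table's
// single chain holds all N items and find() degrades to O(n).
struct BadHash {
    std::size_t operator()(const std::string&) const { return 42; }
};

int main() {
    std::unordered_map<std::string, int, BadHash> m;
    for (int i = 0; i < 10000; ++i)
        m[std::to_string(i)] = i;  // all 10000 items share one chain
    // Locating the bucket is still O(1), but this walks up to 10000 nodes:
    bool found = (m.find("9999") != m.end());
    return found ? 0 : 1;
}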

-n
Dec 28 '06 #7
Nikolaos D. Bougalis wrote:
>Of course, you could say "well, all my hash tables have 500 buckets and 1000
>items, and every bucket has exactly 2 items, so according to your O(N/M) logic
>that works out to O(2) which is a constant, so it's constant time! HA! HA!"
Yes.. that's exactly what I was going to say... wow, it's like you don't
even need anyone else around to carry on a conversation!

So explain to lowly me the purpose of having a hash map if you are
saying its complexity is equal to that of a linked list.

Does your explanation involve something like: hash maps are designed to
have an O(1) lookup on average but could exhibit an O(n) lookup depending
on your hash algorithm?

p.s. The multiplicative hash function proposed by Knuth in "The Art of
Computer Programming" exhibits poor clustering behavior... but I guess
there is no universal consensus on a "good" hash.
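
For reference, a sketch of Knuth's multiplicative method as described in the textbooks (not any particular hash_map's implementation): multiply the key by 2^32 times the golden ratio conjugate and keep the high bits as the bucket index:

#include <cstddef>
#include <cstdint>

// Knuth's multiplicative hashing (The Art of Computer Programming, vol. 3):
// multiply by A = floor(2^32 * (sqrt(5) - 1) / 2) = 2654435769; the
// multiplication wraps mod 2^32 and the top bits index a table of
// 2^table_bits buckets.
std::size_t multiplicative_hash(std::uint32_t key, unsigned table_bits) {
    const std::uint32_t A = 2654435769u;     // 2^32 * 0.6180339887...
    return (key * A) >> (32u - table_bits);  // assumes 1 <= table_bits <= 32
}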

Dec 28 '06 #8
bjeremy wrote:
>So explain to lowly me the purpose of having a hash map if you are
>saying its complexity is equal to that of a linked list.
That is not what I am saying at all. I'm saying that the complexity of lookup
in a hash table, excluding the cost of calculating the hash from the key, is
on average O(N/M). This applies whether you have more items than buckets or
more buckets than items, provided you have a decent hash function.

Do you disagree that if a hash bucket has P items inside, you still have to
perform a linear-time scan of those items to find the particular one you're
interested in, and that the best possible time complexity for that linear
search is O(P)?

Do you agree that in the best case, P will be one, in the worst, P will be N,
and ideally it should be close to N/M?
>Does your explanation involve something like: hash maps are designed to
>have an O(1) lookup on average but could exhibit an O(n) lookup depending
>on your hash algorithm?
No, although hash tables can exhibit horrible worst case behavior, as you
said, with chained ones degenerating to a linked list and others (e.g. open
probing) exhibiting similar behavior.

-n

Dec 28 '06 #9

Nikolaos D. Bougalis wrote:
>bjeremy wrote:
>>So explain to lowly me the purpose of having a hash map if you are
>>saying its complexity is equal to that of a linked list.
>
>That is not what I am saying at all. I'm saying that the complexity of lookup
>in a hash table, excluding the cost of calculating the hash from the key, is
>on average O(N/M). This applies whether you have more items than buckets or
>more buckets than items, provided you have a decent hash function.
>
>Do you disagree that if a hash bucket has P items inside, you still have to
>perform a linear-time scan of those items to find the particular one you're
>interested in, and that the best possible time complexity for that linear
>search is O(P)?
>
>Do you agree that in the best case, P will be one, in the worst, P will be N,
>and ideally it should be close to N/M?
>
>>Does your explanation involve something like: hash maps are designed to
>>have an O(1) lookup on average but could exhibit an O(n) lookup depending
>>on your hash algorithm?
>
>No, although hash tables can exhibit horrible worst case behavior, as you
>said, with chained ones degenerating to a linked list and others (e.g. open
>probing) exhibiting similar behavior.
>
>-n
I don't disagree, but what I'm inferring (which may be different from
what you're implying) is that you are computing the complexity based
upon frequent and expected collisions. A good hash table (with a "good"
hash function) should try to limit the number of collisions. Now, a
nonzero probability of collisions is inevitable in any general-purpose
implementation of a hash table, but a hash table should try to minimize
the number of collisions. E.g., if all the keys are known ahead of time,
and there are no more keys than can fit in the hash table, then perfect
hashing can be used, which produces no collisions, and the lookup time
would be O(1).

That was not intended to be a "gotcha" scenario, but my argument is
that even though you probably will not do perfect hashing, my intent
when using a hash map is to try and make collisions a rare or
infrequent occurrence. Otherwise, I wouldn't so much be creating a hash
map as I would be creating a two-dimensional lookup table with an
expensive hash function.
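
A crude sketch of the "keys known ahead of time" idea (find_perfect_seed and seeded_hash are hypothetical helpers, not a real perfect-hashing library such as gperf, which does this far more cleverly): search for a seed that makes an ordinary hash collision-free over the fixed key set, after which every lookup is a single probe:

#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Mix a seed into the standard hash and reduce to a bucket index.
static std::size_t seeded_hash(const std::string& key, std::size_t seed,
                               std::size_t buckets) {
    return (std::hash<std::string>{}(key) ^ (seed * 0x9e3779b97f4a7c15ull))
           % buckets;
}

// Brute-force search for a seed with zero collisions over a fixed key set.
std::optional<std::size_t> find_perfect_seed(const std::vector<std::string>& keys,
                                             std::size_t buckets,
                                             std::size_t max_tries = 100000) {
    for (std::size_t seed = 0; seed < max_tries; ++seed) {
        std::vector<bool> used(buckets, false);
        bool ok = true;
        for (const auto& k : keys) {
            std::size_t b = seeded_hash(k, seed, buckets);
            if (used[b]) { ok = false; break; }  // collision: try next seed
            used[b] = true;
        }
        if (ok) return seed;  // no collisions: every lookup is one probe, O(1)
    }
    return std::nullopt;      // give up; the caller should enlarge `buckets`
}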

Dec 28 '06 #10

bjeremy wrote:
>...your hash may degrade in performance to a linear search over all the
>elements that mapped to the particular bucket... But in practice, in the
>average case you will experience constant time lookup.
Only in the algorithmic sense. In the real world, hash maps slow down as you
fall out of the L1 and L2 caches (and, a long time later, when you start
swapping ;) In fact, falling out of L1/L2 can have a bigger impact than
collisions, in my experience.
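
A rough way to see this effect (a sketch with arbitrary sizes and iteration counts; a proper benchmark needs warm-up, repetition, and optimizer control): time the same number of random lookups at table sizes that fit in cache versus sizes that don't:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>

int main() {
    std::mt19937 rng(12345);
    for (std::size_t size : {1u << 12, 1u << 16, 1u << 20, 1u << 22}) {
        std::unordered_map<std::uint32_t, std::uint32_t> m;
        for (std::uint32_t i = 0; i < size; ++i) m[i] = i;

        std::uniform_int_distribution<std::uint32_t>
            dist(0, static_cast<std::uint32_t>(size - 1));
        std::uint64_t sink = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 1000000; ++i)
            sink += m.find(dist(rng))->second;  // every lookup hits
        auto t1 = std::chrono::steady_clock::now();

        // Per-lookup cost typically jumps once the table outgrows L1/L2,
        // even though the algorithmic complexity is unchanged.
        std::cout << size << " items: "
                  << std::chrono::duration<double, std::nano>(t1 - t0).count() / 1e6
                  << " ns/lookup (sink " << sink << ")\n";
    }
    return 0;
}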

Mirek

Dec 28 '06 #11

Nikolaos D. Bougalis wrote:
>It's common sense, really. Once you have more items than buckets, you
>invariably end up having to perform some operations over N items, where N is
>the number of items that have the same hash value. And no matter how one cuts,
>slices or dices the operations in question, they will, at best, end up taking
>linear time. And linear time is not constant time.
Who says you ever have to have more items than buckets? One can
trivially design a hash table so that the number of buckets is equal to
the number of items. While it's not particularly efficient to do that,
its problems are not in the scalability department. (It just wastes
storage space.)

You can create a hash table that has O(1) insert, delete, and find
times. However, you have to meet all of the following requirements:

1) The size of the hash value must be much larger than the number of
items in the hash. The choice of hash function will always place an
upper limit on scalability.

2) The number of buckets must adjust with the number of elements, such
that the average number of items per bucket stays constant. (Thus M is
a linear function of N, and N/M becomes a constant; see the sketch below.)

3) The cost of adjusting the number of buckets must not be dependent on
the number of buckets. (Lawson's algorithm meets this requirement.)

4) The hash function must produce values for the objects that are
effectively random.

If you're sufficiently clever, you can design hash functions so that
the speed of an insertion with 100 objects, 50,000 objects, or
50,000,000 objects is nearly the same. All you see are cache effects.
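
Requirement 2 is the part common implementations approximate; here is a simplified sketch of the growth policy (the ChainedTable class is hypothetical). It rehashes everything at once, so inserts are amortized O(1) rather than worst-case O(1); an incremental scheme, as requirement 3 calls for, would migrate a few entries per operation instead:

#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedTable {
    using Chain = std::list<std::pair<std::string, int>>;
    std::vector<Chain> buckets_ = std::vector<Chain>(8);
    std::size_t size_ = 0;

    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    void grow() {  // double M so that N/M stays bounded by a constant
        std::vector<Chain> bigger(buckets_.size() * 2);
        for (auto& chain : buckets_)
            for (auto& kv : chain)
                bigger[std::hash<std::string>{}(kv.first) % bigger.size()]
                    .push_back(std::move(kv));
        buckets_ = std::move(bigger);
    }

public:
    // Sketch only: assumes distinct keys (no duplicate check is done).
    void insert(const std::string& key, int value) {
        if (size_ + 1 > buckets_.size()) grow();  // keep load factor <= 1
        buckets_[index(key)].emplace_back(key, value);
        ++size_;
    }

    const int* find(const std::string& key) const {
        for (const auto& kv : buckets_[index(key)])  // chain of ~N/M items
            if (kv.first == key) return &kv.second;
        return nullptr;
    }
};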

DS

Jan 4 '07 #12

This thread has been closed and replies have been disabled. Please start a new discussion.
