"Ivan Vecerina" <NO**********************************@vecerina.com> wrote
in message news:d1**********@news.hispeed.ch...
> "Christian Meier" <ch***@gmx.ch> wrote in message
> news:d1**********@news.hispeed.ch...
>> Yes, I know that hash maps aren't in the standard. But I couldn't find
>> any better newsgroup for this post. (Or is there an SGI library
>> newsgroup?) I am currently testing the hash_map implementation of SGI.
> IIRC, hash_map and hash_set (etc.) are expected to enter the next C++
> standard library as unordered_map and unordered_set.
>> [...] As you can see, my hash function always returns 1. So all the
>> values go into the same bucket. When I run this program, I get the
>> following output:
>> bucket_count: 769
>> bucket_count: 1543
>> My question is why???
> Your hash function is obviously malformed, and the container does not
> expect it. With a uniform hash function, increasing the number of
> buckets will uniformly decrease the number of items per bucket.
> Even if all items fall into the same bucket for a given bucket count,
> the algorithm can legitimately expect that increasing the bucket count
> will lead to a somewhat more uniform distribution of elements.
>> After inserting the 769th element, the number of buckets is doubled. I
>> could understand this behaviour if each element went into its own
>> bucket and all buckets were used. But I use only one bucket because of
>> my hash function. The hash map never has fewer buckets than elements....
>> When I set C_INSERTIONS to (1543 + 1), then bucket_count returns
>> 3079...
>> Now, what am I doing wrong?
> A good hash function is essential for these containers to work correctly.
Yes, that's the reason why I didn't delete my original hash function
source code:
size_t operator()(const uint64_t& ujHash) const
{ return 1; /* return static_cast<size_t>(ujHash); */ }
Returning 1 was just for testing purposes.... to be sure that all elements
go into the same bucket.
> It is essential that the function returns a relatively uniformly
> distributed value. A minimalistic way to achieve this is to multiply an
> input value by some large prime number; better is to use one of the
> many well-studied hash functions you'll find on the web.
> For an intro, see for example:
> http://www.concentric.net/~Ttwang/tech/inthash.htm
> I hope this helps,
> Ivan
> --
> http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Because there is no hash<uint64_t> function, I wrote my own. Since the
hash<int> value for an int of value 5435438 is 5435438, and for 123456 it
is 123456, I just return the uint64_t value:
return static_cast<size_t>(ujHash);
My values do not need to be multiplied by a prime number, because they
differ from each other only slightly (they are not, say, 1000000, 2000000
and 3000000).
And before inserting into the map, each hash value is reduced with:
hash_val %= bucket_count();
And the number of buckets is always a prime number in the SGI
implementation.
In the meantime I looked at the source code of the SGI library. There is
a function insert_unique which is called by the hash map's ::insert():

pair<iterator, bool> insert_unique(const value_type& __obj)
{
    resize(_M_num_elements + 1);
    return insert_unique_noresize(__obj);
}
This means: each time an element is inserted into the hash map, a resize
check is done based on _M_num_elements, which is the number of ALL
elements in the map. If I have all elements in the same bucket, the map
will still be resized once the element count reaches the number of
buckets, although they are all in the same bucket...
I don't know why it is written like this. This implementation is written
for hash codes which are unique. Well, that is no problem for numeric
data types smaller than std::size_t. But this implementation of the hash
map would be quite ugly if I wanted to insert large strings, for
example.....
Well, I could answer my question by myself. But I still do not really
understand why the SGI people want to have at least as many buckets as
elements in every case....
But thanks for your help anyway!
Greets Chris