
SGI hash_map

Hello,

Yes, I know that hash maps aren't in the standard, but I couldn't find a
better newsgroup for this post. (Or is there an SGI library newsgroup?)

I am currently testing SGI's hash_map implementation, and I am not sure
whether what I discovered is really intended....
Here is my code:
#include <stdint.h>   // for uint64_t
#include <iostream>
using namespace std;

#include <ext/hash_map>
using namespace __gnu_cxx;

const int C_BUCKETS    = 700;
const int C_INSERTIONS = 800;

struct hashFunc {
    size_t operator()(const uint64_t& ujHash) const {
        return 1; /* return static_cast<size_t>(ujHash); */
    }
};

typedef hash_map<uint64_t, uint64_t, hashFunc> MyHashMap;

int main()
{
    MyHashMap myHashMap(C_BUCKETS);

    cout << "bucket_count: " << myHashMap.bucket_count() << endl;
    for (uint64_t uj = 0; uj < C_INSERTIONS; ++uj) {
        myHashMap.insert(make_pair(uj, uj));
    } // for
    cout << "bucket_count: " << myHashMap.bucket_count() << endl;
} // main()
As you can see, my hash function always returns 1. So all the values go into
the same bucket. When I run this program, I get the following output:
bucket_count: 769
bucket_count: 1543

My question is: why??? After inserting the 769th element, the number of
buckets is doubled. I could understand this behaviour if each element went
into its own bucket and all buckets were used. But I use only one bucket
because of my hash function. The hash map never has fewer buckets than
elements.... When I set C_INSERTIONS to (1543 + 1), bucket_count returns
3079...
Now, what am I doing wrong?
Or is this really the intended behaviour of the SGI hash_map implementation?
If so, why is it done like that?

Thanks for your answers!

Chris
Jul 23 '05 #1
"Christian Meier" <ch***@gmx.ch> wrote in message
news:d1**********@news.hispeed.ch...
> Yes, I know that hash maps aren't in the standard, but I couldn't find a
> better newsgroup for this post. (Or is there an SGI library newsgroup?)
>
> I am currently testing SGI's hash_map implementation.

IIRC, hash_map and hash_set (etc.) are expected to enter the next C++
standard library as unordered_map and unordered_set.

> [...] As you can see, my hash function always returns 1. So all the values
> go into the same bucket. When I run this program, I get the following
> output:
> bucket_count: 769
> bucket_count: 1543
>
> My question is: why???

Your hash function is obviously malformed, and the container does not
expect it. With a uniform hash function, increasing the number of buckets
will uniformly decrease the number of items per bucket.
Even if all items fall into the same bucket for a given bucket count,
the algorithm can legitimately expect that increasing the bucket count
will lead to a somewhat more uniform distribution of elements.

> After inserting the 769th element, the number of buckets is doubled. I
> could understand this behaviour if each element went into its own bucket
> and all buckets were used. But I use only one bucket because of my hash
> function. The hash map never has fewer buckets than elements.... When I
> set C_INSERTIONS to (1543 + 1), bucket_count returns 3079...
> Now, what am I doing wrong?

A good hash function is essential for these containers to work correctly.
It is essential that the function returns a relatively uniformly distributed
(random-looking) value. A minimalistic way to achieve this is to multiply
the input value by some large prime number; better still is to use one of
the many well-studied hash functions you'll find on the web.
For an intro, see for example:
http://www.concentric.net/~Ttwang/tech/inthash.htm
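
For illustration, here is a rough sketch of such a functor for uint64_t keys
(just my own example of the multiply-by-a-large-constant idea, using the
usual 64-bit golden-ratio multiplier rather than a prime; it is not taken
from the page above):

// (assumes <stdint.h> and <ext/hash_map> are included, as in your program)
struct UInt64Hash {
    size_t operator()(const uint64_t& ujKey) const {
        // Multiply by a large odd constant (~2^64 / golden ratio) to spread
        // the bits, then fold the high half into the low half before
        // truncating to size_t.
        uint64_t h = ujKey * 0x9E3779B97F4A7C15ULL;
        h ^= (h >> 32);
        return static_cast<size_t>(h);
    }
};

// e.g.: typedef hash_map<uint64_t, uint64_t, UInt64Hash> MyHashMap;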
I hope this helps,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #2

"Ivan Vecerina" <NO**********************************@vecerina.com > schrieb
im Newsbeitrag news:d1**********@news.hispeed.ch...
"Christian Meier" <ch***@gmx.ch> wrote in message
news:d1**********@news.hispeed.ch...
Yes, I know that hash maps aren't in the standard. But I couldn't find any better newsgroup for this post. (or is there an SGI library newsgroup?)

I am currently testing the hash_map implementation of SGI. IIRC, hash_map and hash_set (etc) are expected to enter the next C++
standard
library as unordered_map and unordered_set.

[...]
As you can see, my hash function always returns 1. So all the values go
into
the same bucket. When I run this program, I get the following output:
bucket_count: 769
bucket_count: 1543

My question is why???

Your hash function is obviously malformed, and the container does not
expect it. With a uniform hash function, increasing the number of buckets
will uniformly decrease the number of items per bucket.
Even if all items fall into the same bucket for a given bucket count,
the algorithm can legitimately expect that increasing the bucket count
will lead to a somewhat more uniform distribution of elements.
After inserting the 769th element, the number of
buckets is doubled. I could understand this behaviour if each element went into a own bucket and all buckets were used. But I use only one bucket
because of my hash function. The hash map never has less buckets than
number
of elements.... When I set C_INSERTIONS to (1543 + 1) then the
bucket_count
returns 3079...
Now, what am I doing wrong?

A good hash function is essential for these containers to work correcly.


Yes, that's the reason why I didn't delete my origin hash function source
code:
size_t operator() (const uint64_t& ujHash) const { return 1; /* return
static_cast<size_t>(ujHash); */ }

Returning 1 was just for testing purposes.... to be sure that all elements
go into the same bucket.
It is essential that the function returns a relatively uniformly distributed random value. A minimalistic way to achieve this is to multiply an input
value by some large prime number, better is to use one of the many well-
studied hash functions you'll find on the web.
For an intro, see for example:
http://www.concentric.net/~Ttwang/tech/inthash.htm
I hope this helps,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form


Because there is no hash<uint64_t> specialization, I wrote my own. Since the
hash<int> value of an int with the value 5435438 is 5435438, and for 123456
it is 123456, I simply return the uint64_t value:
return static_cast<size_t>(ujHash);

My values do not need to be multiplied by a prime number because I get
distinct values that differ only slightly (not 1000000, 2000000 and
3000000). And before inserting into the map, each hash value is reduced
with:
hash_val %= bucket_count();
And the number of buckets is always a prime number in the SGI
implementation.

In the meantime I have looked at the source code of the SGI library. There
is the function insert_unique, which is called by the hash map's ::insert():

pair<iterator, bool> insert_unique(const value_type& __obj)
{
    resize(_M_num_elements + 1);
    return insert_unique_noresize(__obj);
}

This means: each time an element is inserted into the hash map, a resize
check is done based on _M_num_elements, which is the number of ALL elements
in the map. If all my elements are in the same bucket, the map will still be
resized once the element count reaches the number of buckets, although they
are all in the same bucket...
I don't know why it is written like this. This implementation is written
for hash codes which are unique. Well, that is no problem for numeric data
types smaller than std::size_t. But this implementation of the hash map
would be quite ugly if I wanted to insert large strings, for example.....
Well, I could answer my question by myself. But I do not really understand
why the SGI people want to have at least as many buckets as elements in
every case....
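
For what it's worth, here is a small experiment (my own sketch, using only
members that already appear in this thread: the bucket-hint constructor,
insert() and bucket_count()). It shows that the policy depends only on the
total element count: give the constructor a bucket hint at least as large as
the number of insertions, and no rehash happens, even with the degenerate
hash function:

#include <stdint.h>   // for uint64_t
#include <iostream>
#include <ext/hash_map>
using namespace std;
using namespace __gnu_cxx;

struct hashFunc {
    size_t operator()(const uint64_t& ujHash) const { return 1; }
};

int main()
{
    // Bucket hint >= planned number of insertions.
    hash_map<uint64_t, uint64_t, hashFunc> m(2000);

    cout << "bucket_count before: " << m.bucket_count() << endl;
    for (uint64_t uj = 0; uj < 1500; ++uj) {
        m.insert(make_pair(uj, uj));
    }
    // 1500 elements never exceed the initial bucket count (the next prime
    // in the implementation's list that is >= 2000), so no rehash occurs,
    // even though every element sits in the same bucket.
    cout << "bucket_count after:  " << m.bucket_count() << endl;
}

Both printed values should be identical here.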

But thanks for your help anyway!

Greets Chris
Jul 23 '05 #3
It's my understanding that the hash_map is working correctly. For the
rehashing (and thus the increase in the number of buckets) it only takes
into account the global usage of the table, not the usage of each individual
bucket. As soon as that usage rises above a certain point, the table is
rehashed, regardless of whether it is a degenerate case (all elements in the
same bucket) or not.
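
In pseudo-code terms, the trigger looks roughly like this (just a sketch of
the policy based on the insert_unique snippet quoted above, not the actual
SGI source; next_prime and rehash_to stand in for the implementation's
internals):

// Called with _M_num_elements + 1 on every insertion.
void resize(size_t num_elements_hint)
{
    if (num_elements_hint > bucket_count()) {
        // Grow to the next prime >= the hint and redistribute all elements,
        // regardless of how full the individual buckets are.
        rehash_to(next_prime(num_elements_hint));
    }
}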

-- Javier

Christian Meier wrote:
> [...]

Jul 23 '05 #4
"Christian Meier" <ch***@gmx.ch> wrote in message
news:d1**********@news.hispeed.ch...
"Ivan Vecerina" <NO**********************************@vecerina.com >
schrieb
It is essential that the function returns a relatively uniformly
distributed
random value. A minimalistic way to achieve this is to multiply an input
value by some large prime number, better is to use one of the many well-
studied hash functions you'll find on the web.
For an intro, see for example:
http://www.concentric.net/~Ttwang/tech/inthash.htm
.... Because there is no hash<uint64_t> function, I wrote my own. As the
hash<int> value for an int of the value 5435438 is 5435438 and for 123456
is
123456, I just return the uint64_t value:
return static_cast<size_t>(ujHash);

My values do not have to be multiplied by a prime number because I get
different values with little difference (not 1000000, 2000000 and
3000000).
And before inserting into the map, each hash value is calculated with:
hash_val %= bucket_count();
And the number of buckets is always a prime number in the SGI
implementation. Depending on how the values are distributed, you may or may not have
a uniform distribution. If you care to check, you could probably write
a program to count the number of buckets that contain multiple items.
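
For example, a rough sketch of such a check (my own example; it re-applies
your hash modulo a chosen bucket count outside the container, rather than
inspecting the container's internal chains):

#include <stdint.h>   // for uint64_t
#include <iostream>
#include <map>
using namespace std;

// The identity hash you describe above.
struct hashFunc {
    size_t operator()(const uint64_t& ujHash) const {
        return static_cast<size_t>(ujHash);
    }
};

int main()
{
    const size_t   bucketCount = 769;   // a prime, like SGI's bucket sizes
    const uint64_t numKeys     = 800;

    hashFunc hasher;
    map<size_t, size_t> occupancy;      // bucket index -> number of keys
    for (uint64_t uj = 0; uj < numKeys; ++uj) {
        ++occupancy[hasher(uj) % bucketCount];
    }

    size_t crowded = 0;                 // buckets holding more than one key
    for (map<size_t, size_t>::const_iterator it = occupancy.begin();
         it != occupancy.end(); ++it) {
        if (it->second > 1) ++crowded;
    }
    cout << "buckets used: " << occupancy.size()
         << ", buckets with collisions: " << crowded << endl;
}

With sequential keys the distribution is of course trivially uniform; the
check only becomes interesting with your real key values.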
> In the meantime I have looked at the source code of the SGI library. There
> is the function insert_unique, which is called by the hash map's
> ::insert():
>
> pair<iterator, bool> insert_unique(const value_type& __obj)
> {
>     resize(_M_num_elements + 1);
>     return insert_unique_noresize(__obj);
> }
>
> This means: each time an element is inserted into the hash map, a resize
> check is done based on _M_num_elements, which is the number of ALL
> elements in the map. If all my elements are in the same bucket, the map
> will still be resized once the element count reaches the number of
> buckets, although they are all in the same bucket...
> I don't know why it is written like this.

This ensures that item search is always as efficient as possible (if this
doesn't matter to a program, then std::map may be a better candidate).
As with the resizing of std::vector, the number of 'rehashings' in hash_map
is amortized constant relative to the number of contained items. So
this is normally not a problem. (NB: there are some sophisticated hash
table algorithms that dynamically 'redistribute' items, but they only make
sense in specific implementations.)

> This implementation is written for hash codes which are unique.

Yes, this is what they are supposed to be!

> Well, that is no problem for numeric data types smaller than std::size_t.
> But this implementation of the hash map would be quite ugly if I wanted to
> insert large strings, for example.....

Again, not really a problem, because the number of hash code computations
is amortized constant (~2) per item inserted.

> Well, I could answer my question by myself. But I do not really understand
> why the SGI people want to have at least as many buckets as elements in
> every case....

In non-pathological cases (proper hashing) this is what allows hash_map
to perform queries at optimal speed - this is the only benefit of
hash_map. Searching (linearly) through multiple items in the same
bucket can be quite expensive.
Cheers,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #5
