SGI hash_map

Hello,

Yes, I know that hash maps aren't in the standard. But I couldn't find any
better newsgroup for this post. (or is there an SGI library newsgroup?)

I am currently testing the hash_map implementation of SGI. And now I am not
sure if it is really true, what I discovered....
Here is my code:
#include <stdint.h> // for uint64_t
#include <iostream>
using namespace std;

#include <ext/hash_map>
using namespace __gnu_cxx;

const int C_BUCKETS = 700;
const int C_INSERTIONS = 800;

struct hashFunc {
    size_t operator() (const uint64_t& ujHash) const { return 1; /* return static_cast<size_t>(ujHash); */ }
};

typedef hash_map<uint64_t, uint64_t, hashFunc> MyHashMap;

int main()
{
    MyHashMap myHashMap(C_BUCKETS);

    cout << "bucket_count: " << myHashMap.bucket_count() << endl;
    for (uint64_t uj = 0; uj < C_INSERTIONS; ++uj) {
        myHashMap.insert(make_pair(uj, uj));
    } // for
    cout << "bucket_count: " << myHashMap.bucket_count() << endl;

} // main()

As you can see, my hash function always returns 1. So all the values go into
the same bucket. When I run this program, I get the following output:
bucket_count: 769
bucket_count: 1543

My question is: why? After inserting the 769th element, the number of
buckets is doubled. I could understand this behaviour if each element went
into its own bucket and all buckets were used. But I use only one bucket
because of my hash function. The hash map never has fewer buckets than
elements. When I set C_INSERTIONS to (1543 + 1), bucket_count() returns
3079.
Now, what am I doing wrong?
Or is this really the intended behaviour of the SGI hash_map implementation?
If so, why is it done like that?

Thanks for your answers!

Chris
Jul 23 '05 #1
"Christian Meier" <ch***@gmx.ch> wrote in message
news:d1**********@news.hispeed.ch...
> Yes, I know that hash maps aren't in the standard. But I couldn't find any
> better newsgroup for this post. (or is there an SGI library newsgroup?)
>
> I am currently testing the hash_map implementation of SGI.

IIRC, hash_map and hash_set (etc.) are expected to enter the next C++
standard library as unordered_map and unordered_set.

> [...] As you can see, my hash function always returns 1. So all the values
> go into the same bucket. When I run this program, I get the following output:
> bucket_count: 769
> bucket_count: 1543
>
> My question is why???

Your hash function is obviously malformed, and the container does not
expect it. With a uniform hash function, increasing the number of buckets
will uniformly decrease the number of items per bucket.
Even if all items fall into the same bucket for a given bucket count,
the algorithm can legitimately expect that increasing the bucket count
will lead to a somewhat more uniform distribution of elements.

> After inserting the 769th element, the number of
> buckets is doubled. I could understand this behaviour if each element went
> into a own bucket and all buckets were used. But I use only one bucket
> because of my hash function. The hash map never has less buckets than
> number of elements.... When I set C_INSERTIONS to (1543 + 1) then the
> bucket_count returns 3079...
> Now, what am I doing wrong?

A good hash function is essential for these containers to work correctly.
It is essential that the function returns a relatively uniformly distributed
random value. A minimalistic way to achieve this is to multiply an input
value by some large prime number; better is to use one of the many
well-studied hash functions you'll find on the web.
For an intro, see for example:
http://www.concentric.net/~Ttwang/tech/inthash.htm
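
As one illustration of what a "well-studied" integer mixer can look like, here is a minimal sketch built on the finalizer step of the splitmix64 generator. It is not taken from the page linked above, and the names mix64 and MixHash are made up for this example. Each step (xor with a right shift, multiplication by an odd constant) is invertible on 64-bit values, so distinct keys stay distinct until the result is reduced to a bucket index.

```cpp
#include <cstddef>
#include <cstdint>

// Finalizer step of the splitmix64 generator (helper name is hypothetical).
// Every operation is a bijection on 64-bit values, so the whole function is.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Possible replacement for hashFunc from the original post (hypothetical name).
// On platforms with a 32-bit size_t the result is truncated.
struct MixHash {
    std::size_t operator()(const std::uint64_t& key) const {
        return static_cast<std::size_t>(mix64(key));
    }
};
```

It would be plugged in the same way as hashFunc, for example hash_map<uint64_t, uint64_t, MixHash>.
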
I hope this helps,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #2

"Ivan Vecerina" <NO**********************************@vecerina.com> wrote in message
news:d1**********@news.hispeed.ch...
> A good hash function is essential for these containers to work correctly.

Yes, that's the reason why I didn't delete my original hash function from the
source code:
    size_t operator() (const uint64_t& ujHash) const { return 1; /* return static_cast<size_t>(ujHash); */ }

Returning 1 was just for testing purposes, to be sure that all elements
go into the same bucket.

> It is essential that the function returns a relatively uniformly distributed
> random value. A minimalistic way to achieve this is to multiply an input
> value by some large prime number, better is to use one of the many
> well-studied hash functions you'll find on the web.
> For an intro, see for example:
> http://www.concentric.net/~Ttwang/tech/inthash.htm

Because there is no hash<uint64_t> function, I wrote my own. Since the
hash<int> value for an int with the value 5435438 is 5435438, and for 123456
it is 123456, I just return the uint64_t value:
    return static_cast<size_t>(ujHash);

My values do not have to be multiplied by a prime number because they are
distinct values that lie close together (not 1000000, 2000000 and 3000000).
And before an element is inserted into the map, its hash value is reduced with:
    hash_val %= bucket_count();
And the number of buckets is always a prime number in the SGI
implementation.

In the meantime I looked up the source code of the SGI library. And there is
the function insert_unique which is called by the hash map function
::insert():

    pair<iterator, bool> insert_unique(const value_type& __obj)
    {
        resize(_M_num_elements + 1);
        return insert_unique_noresize(__obj);
    }

This means: each time an element is inserted into the hash map, a resize
check is made based on _M_num_elements. _M_num_elements is the
number of ALL elements in the map. If I have all elements in the same
bucket, the map will still be resized once the element count reaches the
number of buckets, although they are all in the same bucket...
I don't know why this is written like this. This implementation is written
for hash codes which are unique. Well, this is no problem for numeric data
types that are no larger than std::size_t. But this implementation of the hash
map would be quite ugly if I wanted to insert large strings, for example.
Well, I could answer my question by myself. But I do not really understand
why the SGI people want to have as many buckets as elements in every
case....
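
The behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not the SGI code itself: the growth condition (grow when the hint exceeds the current bucket count) and the helper names are assumptions, and of the prime table only 769, 1543 and 3079 are confirmed by the output in this thread; the remaining entries are recalled from the SGI sources and should be checked against them.

```cpp
#include <cstddef>

// Part of the bucket-count table. 769, 1543 and 3079 appear in this thread;
// the other entries are an assumption.
static const std::size_t kPrimes[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289};
static const std::size_t kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

// Smallest table entry >= n (the last entry if n exceeds all of them).
inline std::size_t next_prime(std::size_t n) {
    for (std::size_t i = 0; i < kNumPrimes; ++i) {
        if (kPrimes[i] >= n) return kPrimes[i];
    }
    return kPrimes[kNumPrimes - 1];
}

// Bucket count after `insertions` distinct keys have been inserted into a
// table constructed with `initial_hint` buckets. Note that the hash function
// does not appear anywhere: only the total element count matters.
inline std::size_t simulate_bucket_count(std::size_t initial_hint,
                                         std::size_t insertions) {
    std::size_t buckets = next_prime(initial_hint);
    std::size_t num_elements = 0;
    for (std::size_t i = 0; i < insertions; ++i) {
        const std::size_t hint = num_elements + 1;  // resize(_M_num_elements + 1)
        if (hint > buckets) buckets = next_prime(hint);
        ++num_elements;                             // insert_unique_noresize(__obj)
    }
    return buckets;
}
```

With an initial hint of 700, this model gives 769 buckets before any insertion, 1543 after 800 insertions and 3079 after 1544 insertions, which matches the output reported in the first post.
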

But thanks for your help anyway!

Greets Chris
Jul 23 '05 #3
It's my understanding the hash_map is working correctly. For the
rehashing (thus increasing the number of buckets) it only takes into
account the global usage of the table, not the usage of each bucket or
anything like that. As soon as that usage rises over a certain point,
the table is rehashed, regardless of it being a degenerate case (all
elements into the same bucket) or not.

-- Javier
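
The same effect can be reproduced with the standard container that later replaced hash_map. The following is a minimal sketch using std::unordered_map (C++11) rather than the SGI class; ConstantHash, DegenerateMap and fill are names invented for this example. The container keeps load_factor(), which is size() divided by bucket_count(), at or below max_load_factor(), and that bound defaults to 1.0, so the bucket count follows the element count even though a single bucket holds everything.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Degenerate hash from the original post: every key gets the same hash value.
struct ConstantHash {
    std::size_t operator()(std::uint64_t) const { return 1; }
};

typedef std::unordered_map<std::uint64_t, std::uint64_t, ConstantHash> DegenerateMap;

// Builds a map with at least `initial_buckets` buckets and inserts keys 0..n-1.
inline DegenerateMap fill(std::size_t initial_buckets, std::uint64_t n) {
    DegenerateMap m(initial_buckets);
    for (std::uint64_t k = 0; k < n; ++k) {
        m.insert(std::make_pair(k, k));
    }
    return m;
}
```

After fill(700, 800), bucket_count() is at least 800 while bucket_size(m.bucket(0)) reports all 800 elements in one bucket. The exact bucket counts are implementation-specific.
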

Jul 23 '05 #4
"Christian Meier" <ch***@gmx.ch> wrote in message
news:d1**********@news.hispeed.ch...
> "Ivan Vecerina" <NO**********************************@vecerina.com> wrote:
>> It is essential that the function returns a relatively uniformly
>> distributed random value. A minimalistic way to achieve this is to
>> multiply an input value by some large prime number, better is to use
>> one of the many well-studied hash functions you'll find on the web.
>> For an intro, see for example:
>> http://www.concentric.net/~Ttwang/tech/inthash.htm
>
> .... Because there is no hash<uint64_t> function, I wrote my own. As the
> hash<int> value for an int of the value 5435438 is 5435438 and for 123456
> is 123456, I just return the uint64_t value:
> return static_cast<size_t>(ujHash);
>
> My values do not have to be multiplied by a prime number because I get
> different values with little difference (not 1000000, 2000000 and
> 3000000).
> And before inserting into the map, each hash value is calculated with:
> hash_val %= bucket_count();
> And the number of buckets is always a prime number in the SGI
> implementation.

Depending on how the values are distributed, you may or may not have
a uniform distribution. If you care to check, you could probably write
a program to count the number of buckets that contain multiple items.
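
Such a check could be sketched as follows. The example assumes std::unordered_map and its bucket interface (bucket_count() and bucket_size()) instead of the SGI hash_map; count_crowded_buckets, IdentityHash and ConstantHash are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Number of buckets that hold more than one element.
template <typename Map>
std::size_t count_crowded_buckets(const Map& m) {
    std::size_t crowded = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        if (m.bucket_size(b) > 1) ++crowded;
    }
    return crowded;
}

// Hash discussed in the thread: the key value itself.
struct IdentityHash {
    std::size_t operator()(std::uint64_t k) const { return static_cast<std::size_t>(k); }
};

// Degenerate test hash from the original post.
struct ConstantHash {
    std::size_t operator()(std::uint64_t) const { return 1; }
};

// Fills a map that uses hash functor H with keys 0..n-1 and reports how many
// buckets ended up with more than one element.
template <typename H>
std::size_t crowded_after_fill(std::uint64_t n) {
    std::unordered_map<std::uint64_t, std::uint64_t, H> m;
    for (std::uint64_t k = 0; k < n; ++k) {
        m.insert(std::make_pair(k, k));
    }
    return count_crowded_buckets(m);
}
```

Comparing crowded_after_fill<IdentityHash>(n) with crowded_after_fill<ConstantHash>(n) on keys that resemble the real data shows how evenly a given hash spreads them. The constant hash always yields exactly one crowded bucket once there are two or more elements.
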
> In the meantime I looked up the source code of the SGI library. And there
> is the function insert_unique which is called by the hash map function
> ::insert():
>
>     pair<iterator, bool> insert_unique(const value_type& __obj)
>     {
>         resize(_M_num_elements + 1);
>         return insert_unique_noresize(__obj);
>     }
>
> This means: Each time an element is inserted into the hash map, it will be
> checked for resizing depending on _M_num_elements. _M_num_elements is the
> number of ALL elements in the map. If I have all elements in the same
> bucket, the map will be resized after reaching the number of buckets
> although they are all in the same bucket...
> I don't know why this is written like this.

This ensures that item search is always as efficient as possible (if
this doesn't matter to a program, then std::map may be a better candidate).
As with the resizing of std::vector, the number of rehashings in hash_map
is amortized constant relative to the number of contained items. So
this is normally not a problem. (NB: there are some sophisticated hash
table algorithms to dynamically 'redistribute' items, but they only make
sense in specific implementations).
> This implementation is written for a hash codes which are unique.

Yes, this is what they are supposed to be!

> Well, this is no problem for numeric data
> types of smaller size than std::size_t. But this implementation of the
> hash map would be quite ugly if I wanted to insert large strings for
> example.....

Again, not really a problem because the number of hash code computations
is amortized constant (~2) per item inserted.
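
The amortized bound can be checked with a small counting model. This is a sketch under assumptions that are not from the thread: the table grows to 2 * buckets + 1 whenever the element count would exceed the bucket count (roughly what a doubling prime table does), hash codes are not cached so every stored element is hashed again on each rehash, and the function name is made up.

```cpp
#include <cstddef>

// Total number of hash computations needed to insert n distinct keys into a
// table that starts with `buckets` buckets.
inline std::size_t count_hash_computations(std::size_t n, std::size_t buckets = 53) {
    std::size_t size = 0;
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size + 1 > buckets) {
            total += size;              // rehash: every stored element is hashed again
            buckets = 2 * buckets + 1;  // assumed growth rule
        }
        ++total;                        // hash of the element being inserted
        ++size;
    }
    return total;
}
```

Because the bucket counts roughly double, the rehash costs form a geometric series whose sum stays below 2n, so the total stays below 3n hash computations for n insertions. That is a small constant number per item, in line with the estimate above.
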
> Well, I could answer my question by myself. But I do not really understand
> why the SGI people want to have as many buckets as elements in every
> case....

In non-pathological cases (proper hashing) this is what allows hash_map
to perform queries at optimal speed - this is the only benefit of
hash_map. Searching (linearly) through multiple items in the same
bucket can be quite expensive.
Cheers,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #5
