SGI hash_map

Hello,

Yes, I know that hash maps aren't in the standard. But I couldn't find any
better newsgroup for this post. (or is there an SGI library newsgroup?)

I am currently testing the hash_map implementation of SGI. And now I am not
sure if it is really true, what I discovered....
Here is my code:
#include <stdint.h> // for uint64_t
#include <iostream>
using namespace std;

#include <ext/hash_map>
using namespace __gnu_cxx;

const int C_BUCKETS = 700;
const int C_INSERTIONS = 800;

struct hashFunc {
    size_t operator() (const uint64_t& ujHash) const { return 1; /* return static_cast<size_t>(ujHash); */ }
};

typedef hash_map<uint64_t, uint64_t, hashFunc> MyHashMap;

int main()
{
    MyHashMap myHashMap(C_BUCKETS);

    cout << "bucket_count: " << myHashMap.bucket_count() << endl;
    for (uint64_t uj = 0; uj < C_INSERTIONS; ++uj) {
        myHashMap.insert(make_pair(uj, uj));
    } // for
    cout << "bucket_count: " << myHashMap.bucket_count() << endl;

} // main()

As you can see, my hash function always returns 1. So all the values go into
the same bucket. When I run this program, I get the following output:
bucket_count: 769
bucket_count: 1543

My question is why??? After inserting the 769th element, the number of
buckets is doubled. I could understand this behaviour if each element went
into its own bucket and all buckets were used. But I use only one bucket
because of my hash function. The hash map never has fewer buckets than
elements.... When I set C_INSERTIONS to (1543 + 1), bucket_count()
returns 3079...
Now, what am I doing wrong?
Or is this really the intended behaviour of the SGI hash_map implementation? If
so, why is it done like that?

Thanks for your answers!

Chris
Jul 23 '05 #1
"Christian Meier" <ch***@gmx.ch > wrote in message
news:d1**********@news.hispeed.ch...
> Yes, I know that hash maps aren't in the standard. But I couldn't find any
> better newsgroup for this post. (or is there an SGI library newsgroup?)
>
> I am currently testing the hash_map implementation of SGI.

IIRC, hash_map and hash_set (etc.) are expected to enter the next C++
standard library as unordered_map and unordered_set.

> [...] As you can see, my hash function always returns 1. So all the values go
> into the same bucket. When I run this program, I get the following output:
> bucket_count: 769
> bucket_count: 1543
>
> My question is why???

Your hash function is obviously malformed, and the container does not
expect it. With a uniform hash function, increasing the number of buckets
will uniformly decrease the number of items per bucket.
Even if all items fall into the same bucket for a given bucket count,
the algorithm can legitimately expect that increasing the bucket count
will lead to a somewhat more uniform distribution of elements.

> After inserting the 769th element, the number of
> buckets is doubled. I could understand this behaviour if each element went
> into its own bucket and all buckets were used. But I use only one bucket
> because of my hash function. The hash map never has fewer buckets than
> elements.... When I set C_INSERTIONS to (1543 + 1), bucket_count()
> returns 3079...
> Now, what am I doing wrong?

A good hash function is essential for these containers to work correctly.
It is essential that the function returns a relatively uniformly distributed,
random-looking value. A minimalistic way to achieve this is to multiply the
input value by some large prime number; a better way is to use one of the many
well-studied hash functions you'll find on the web.
For an intro, see for example:
http://www.concentric.net/~Ttwang/tech/inthash.htm

I hope this helps,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #2
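
Ivan's advice about a well-distributed integer hash can be illustrated with a small sketch. It does not use the functions from the linked page; it swaps in the widely used SplitMix64 finalizer constants instead, and the names mix64, MixedHash and bucketOf are hypothetical. It assumes a compiler that provides <cstdint>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scrambles a 64-bit key so that every input bit influences the result.
// The shift/multiply constants are those of the SplitMix64 finalizer.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hash functor with the same call signature as hashFunc in the thread.
struct MixedHash {
    std::size_t operator()(const std::uint64_t& key) const {
        return static_cast<std::size_t>(mix64(key));
    }
};

// The bucket a key would land in if the table reduces the hash modulo
// the bucket count, as described later in the thread.
inline std::size_t bucketOf(std::uint64_t key, std::size_t bucketCount) {
    assert(bucketCount > 0);
    return MixedHash()(key) % bucketCount;
}
```

MixedHash could be passed as the third template argument in place of hashFunc, for example hash_map<uint64_t, uint64_t, MixedHash>. Whether the extra mixing pays off depends on the key distribution; for small consecutive keys and a prime bucket count, the plain cast may already spread the keys well.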

"Ivan Vecerina" <NO************ *************** *******@vecerin a.com> schrieb
im Newsbeitrag news:d1**********@news.hispeed.ch...
> "Christian Meier" <ch***@gmx.ch> wrote in message
> news:d1**********@news.hispeed.ch...
>> Yes, I know that hash maps aren't in the standard. But I couldn't find any
>> better newsgroup for this post. (or is there an SGI library newsgroup?)
>>
>> I am currently testing the hash_map implementation of SGI.
>
> IIRC, hash_map and hash_set (etc.) are expected to enter the next C++
> standard library as unordered_map and unordered_set.
>
>> [...]
>> As you can see, my hash function always returns 1. So all the values go
>> into the same bucket. When I run this program, I get the following output:
>> bucket_count: 769
>> bucket_count: 1543
>>
>> My question is why???
>
> Your hash function is obviously malformed, and the container does not
> expect it. With a uniform hash function, increasing the number of buckets
> will uniformly decrease the number of items per bucket.
> Even if all items fall into the same bucket for a given bucket count,
> the algorithm can legitimately expect that increasing the bucket count
> will lead to a somewhat more uniform distribution of elements.
>
>> After inserting the 769th element, the number of
>> buckets is doubled. I could understand this behaviour if each element went
>> into its own bucket and all buckets were used. But I use only one bucket
>> because of my hash function. The hash map never has fewer buckets than
>> elements.... When I set C_INSERTIONS to (1543 + 1), bucket_count()
>> returns 3079...
>> Now, what am I doing wrong?
>
> A good hash function is essential for these containers to work correctly.

Yes, that's the reason why I didn't delete my original hash function source
code:
size_t operator() (const uint64_t& ujHash) const { return 1; /* return static_cast<size_t>(ujHash); */ }

Returning 1 was just for testing purposes.... to be sure that all elements
go into the same bucket.

> It is essential that the function returns a relatively uniformly distributed
> random value. A minimalistic way to achieve this is to multiply an input
> value by some large prime number, better is to use one of the many well-
> studied hash functions you'll find on the web.
> For an intro, see for example:
> http://www.concentric.net/~Ttwang/tech/inthash.htm
> I hope this helps,
> Ivan
> --
> http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Because there is no hash<uint64_t> function, I wrote my own. As the
hash<int> value for an int of the value 5435438 is 5435438 and for 123456 is
123456, I just return the uint64_t value:
return static_cast<size_t>(ujHash);

My values do not have to be multiplied by a prime number because my keys are
distinct values that lie close together (not 1000000, 2000000 and 3000000).
And before inserting into the map, each hash value is reduced with:
hash_val %= bucket_count();
And the number of buckets is always a prime number in the SGI
implementation.

In the meantime I looked up the source code of the SGI library. And there is
the function insert_unique, which is called by the hash map function
::insert():

pair<iterator, bool> insert_unique(const value_type& __obj)
{
    resize(_M_num_elements + 1);
    return insert_unique_noresize(__obj);
}

This means: each time an element is inserted into the hash map, it is
checked for resizing depending on _M_num_elements. _M_num_elements is the
number of ALL elements in the map. If I have all elements in the same
bucket, the map will be resized once the element count reaches the number of
buckets, although the elements are all in the same bucket...
I don't know why this is written like this. This implementation is written
for hash codes that are unique. Well, this is no problem for numeric data
types of smaller size than std::size_t. But this implementation of the hash
map would be quite ugly if I wanted to insert large strings, for example.....
Well, I could answer my question by myself. But I do not really understand
why the SGI people want to have as many buckets as elements in every
case....

But thanks for your help anyway!

Greets Chris
Jul 23 '05 #3
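
The behaviour Chris derives from insert_unique can be reproduced with a simplified model that tracks only the two counters. This is a sketch under assumptions, not the SGI source: it assumes resize() grows only when the hint exceeds the current bucket count and then picks the next entry of a prime table. The values 769, 1543 and 3079 come from the thread; the remaining table entries and all names (kPrimes, nextSize, ResizeModel, bucketsAfter) are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Candidate bucket counts. 769, 1543 and 3079 appear in the thread's output;
// the other entries are assumed for illustration.
static const std::size_t kPrimes[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289};

// Smallest table entry >= n (the largest entry if n exceeds the table).
inline std::size_t nextSize(std::size_t n) {
    const std::size_t* first = std::begin(kPrimes);
    const std::size_t* last = std::end(kPrimes);
    const std::size_t* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

// Tracks only the element and bucket counters; no elements are stored,
// so the hash function plays no role at all in this model.
struct ResizeModel {
    std::size_t buckets;
    std::size_t elements;

    explicit ResizeModel(std::size_t requested)
        : buckets(nextSize(requested)), elements(0) {}

    // Mirrors resize(_M_num_elements + 1) followed by the insertion itself.
    void insertOne() {
        const std::size_t hint = elements + 1;
        if (hint > buckets) buckets = nextSize(hint);
        ++elements;
    }
};

// Bucket count after constructing with `requested` buckets and inserting
// `insertions` distinct keys.
inline std::size_t bucketsAfter(std::size_t requested, std::size_t insertions) {
    ResizeModel m(requested);
    for (std::size_t i = 0; i < insertions; ++i) m.insertOne();
    // Holds until the illustrative table is exhausted.
    assert(m.elements <= m.buckets || m.buckets == 12289);
    return m.buckets;
}
```

Under these assumptions bucketsAfter(700, 800) yields 1543 and bucketsAfter(700, 1544) yields 3079, matching the output reported in the first post. Because the model never looks at a hash value, it also shows why a constant hash function makes no difference to the bucket count.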
It's my understanding the hash_map is working correctly. For the
rehashing (thus increasing the number of buckets) it only takes into
account the global usage of the table, not the usage of each bucket or
anything like that. As soon as that usage rises over a certain point,
the table is rehashed, regardless of whether it is a degenerate case (all
elements in the same bucket) or not.

-- Javier

Jul 23 '05 #4
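
Javier's point, that rehashing depends on the overall load and not on how full individual buckets are, can also be observed with the container that later replaced hash_map. The sketch below uses std::unordered_map (C++11) as a stand-in for the SGI class, which is an assumption about equivalence of behaviour, not a statement about the SGI code. ConstantHash, TableShape and fillDegenerate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Deliberately degenerate hash: every key gets the same hash value.
struct ConstantHash {
    std::size_t operator()(const std::uint64_t&) const { return 1; }
};

typedef std::unordered_map<std::uint64_t, std::uint64_t, ConstantHash> DegenerateMap;

struct TableShape {
    std::size_t elements;       // number of stored elements
    std::size_t buckets;        // bucket_count() after all insertions
    std::size_t largestBucket;  // size of the fullest bucket
};

// Inserts the keys 0 .. insertions-1 and reports the resulting table shape.
inline TableShape fillDegenerate(std::size_t insertions) {
    DegenerateMap m;
    for (std::uint64_t k = 0; k < insertions; ++k) m.insert(std::make_pair(k, k));

    TableShape s = {m.size(), m.bucket_count(), 0};
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        if (m.bucket_size(b) > s.largestBucket) s.largestBucket = m.bucket_size(b);
    }
    // Equal hash values always share a bucket, so one bucket holds everything.
    assert(s.largestBucket == s.elements);
    return s;
}
```

With the default max_load_factor() of 1.0, fillDegenerate(800) reports at least 800 buckets even though a single bucket holds all 800 elements. The exact bucket count is implementation-specific.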
"Christian Meier" <ch***@gmx.ch > wrote in message
news:d1**********@news.hispeed.ch...
> "Ivan Vecerina" <NO**********************************@vecerina.com> wrote:
>> It is essential that the function returns a relatively uniformly
>> distributed random value. A minimalistic way to achieve this is to
>> multiply an input value by some large prime number, better is to use one
>> of the many well-studied hash functions you'll find on the web.
>> For an intro, see for example:
>> http://www.concentric.net/~Ttwang/tech/inthash.htm
> ....
> Because there is no hash<uint64_t> function, I wrote my own. As the
> hash<int> value for an int of the value 5435438 is 5435438 and for 123456
> is 123456, I just return the uint64_t value:
> return static_cast<size_t>(ujHash);
>
> My values do not have to be multiplied by a prime number because I get
> different values with little difference (not 1000000, 2000000 and
> 3000000).
> And before inserting into the map, each hash value is calculated with:
> hash_val %= bucket_count();
> And the number of buckets is always a prime number in the SGI
> implementation.

Depending on how the values are distributed, you may or may not have
a uniform distribution. If you care to check, you could probably write
a program to count the number of buckets that contain multiple items.

> In the meantime I looked up the source code of the SGI library. And there
> is the function insert_unique which is called by the hash map function
> ::insert():
>
> pair<iterator, bool> insert_unique(const value_type& __obj)
> {
>     resize(_M_num_elements + 1);
>     return insert_unique_noresize(__obj);
> }
>
> This means: Each time an element is inserted into the hash map, it will be
> checked for resizing depending on _M_num_elements. _M_num_elements is the
> number of ALL elements in the map. If I have all elements in the same
> bucket, the map will be resized after reaching the number of buckets
> although they are all in the same bucket...
> I don't know why this is written like this.

This ensures that item search is always as efficient as possible (if
this doesn't matter to a program, then std::map may be a better candidate).
As with the resizing of std::vector, the rehashing work in hash_map
is amortized constant per contained item. So
this is normally not a problem. (NB: there are some sophisticated hash
table algorithms that dynamically 'redistribute' items, but they only make
sense in specific implementations.)

> This implementation is written for a hash codes which are unique.

Yes, this is what they are supposed to be!

> Well, this is no problem for numeric data
> types of smaller size than std::size_t. But this implementation of the
> hash map would be quite ugly if I wanted to insert large strings for
> example.....

Again, not really a problem because the number of hash code computations
is amortized constant (~2) per item inserted.

> Well, I could answer my question by myself. But I do not really understand
> why the SGI people want to have as many buckets as elements in every
> case....

In non-pathological cases (proper hashing) this is what allows hash_map
to perform queries at optimal speed - this is the only benefit of
hash_map. Searching (linearly) through multiple items in the same
bucket can be quite expensive.

Cheers,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #5
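
The check Ivan suggests, counting the buckets that hold more than one item, might look like the sketch below. It again uses std::unordered_map (C++11) as a stand-in for the SGI hash_map, with an identity hash like the static_cast version from the thread, and it also counts how often the bucket count changes during the insertions. IdentityHash, BucketReport and insertWithStride are hypothetical names; the concrete numbers depend on the library's bucket sizing policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Identity hash, like the static_cast<size_t>(ujHash) version in the thread.
struct IdentityHash {
    std::size_t operator()(const std::uint64_t& key) const {
        return static_cast<std::size_t>(key);
    }
};

typedef std::unordered_map<std::uint64_t, std::uint64_t, IdentityHash> IdMap;

struct BucketReport {
    std::size_t elements;       // elements stored
    std::size_t sharedBuckets;  // buckets holding two or more elements
    std::size_t rehashes;       // how often bucket_count() changed
};

// Inserts the keys 0, stride, 2*stride, ... and reports the distribution.
inline BucketReport insertWithStride(std::size_t count, std::uint64_t stride) {
    IdMap m;
    BucketReport r = {0, 0, 0};
    std::size_t lastBuckets = m.bucket_count();

    for (std::size_t i = 0; i < count; ++i) {
        const std::uint64_t key = static_cast<std::uint64_t>(i) * stride;
        m.insert(std::make_pair(key, key));
        if (m.bucket_count() != lastBuckets) {  // the table was rehashed
            ++r.rehashes;
            lastBuckets = m.bucket_count();
        }
    }

    r.elements = m.size();
    std::size_t total = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        total += m.bucket_size(b);
        if (m.bucket_size(b) > 1) ++r.sharedBuckets;
    }
    assert(total == r.elements);  // every element sits in exactly one bucket
    return r;
}
```

Comparing insertWithStride(800, 1) with insertWithStride(800, 1000000) shows whether widely spaced keys collide more often than consecutive ones on a given implementation, and the rehashes field gives a feel for how rarely the table is rebuilt relative to the number of insertions.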
