473,385 Members | 1,610 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

hash_set: how to handle collisions?

Do I have to handle hash collisions in a hash_set myself?

I did a test in which I use find() to look for objects in a hash_set.
These objects are definitely not contained, but find() sometimes finds
them anyway. See this code:

<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
hash_set

class MyContainer {
std::vector<intv;
public:
MyContainer(){}
std::size_t hashcode() const {
std::size_t hash = 0;
for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
hash = v[i] + (hash << 6) + (hash << 16) - hash;
}
return hash;
}
void add(int i){v.push_back(i);}
};

struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
}
};

namespace gnu_namespace {
template<struct hash<const MyContainer*{
inline size_t operator()(const MyContainer* d) const {
return d->hashcode();
}
};
}

int getRand(int min, int max){
return ((rand() % (max-min+1)) + min);
}

int main(int argc, char** argv){
srand(time(0));
typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrsMyMap;
MyMap myMap;
int repeat = 100000;
int size = 10;
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(0, 1000));
}
myMap.insert(h);
}
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(2000, 3000));
}
MyMap::const_iterator found = myMap.find(h);
assert(found == myMap.end()); // aborts!
}
// TODO: finally delete elements in myMap
return EXIT_SUCCESS;
}
</code>

The solution seems to be to adapt the equality condition in eqPtrs to
also test for actual equality of the members, not just equality of the
hash codes:

struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
}
};

Is that the solution, or am I doing something wrong in general?
General comments on my code are welcome!

Markus

Jul 11 '08 #1
5 2497
On Jul 10, 9:20*pm, Markus Dehmann <markus.dehm...@gmail.comwrote:
Do I have to handle hash collisions in a hash_set myself?

I did a test in which I use find() to look for objects in a hash_set.
These objects are definitely not contained, but find() sometimes finds
them anyway. See this code:

<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
hash_set

class MyContainer {
* std::vector<intv;
public:
* MyContainer(){}
* std::size_t hashcode() const {
* * std::size_t hash = 0;
* * for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
* * * hash = v[i] + (hash << 6) + (hash << 16) - hash;
* * }
* * return hash;
* }
* void add(int i){v.push_back(i);}

};

struct eqPtrs {
* bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
* * return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
* }

};

namespace gnu_namespace {
* template<struct hash<const MyContainer*{
* * inline size_t operator()(const MyContainer* d) const {
* * * return d->hashcode();
* * }
* };

}

int getRand(int min, int max){
* return ((rand() % (max-min+1)) + min);

}

int main(int argc, char** argv){
* srand(time(0));
* typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrsMyMap;
* MyMap myMap;
* int repeat = 100000;
* int size = 10;
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(0, 1000));
* * }
* * myMap.insert(h);
* }
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(2000, 3000));
* * }
* * MyMap::const_iterator found = myMap.find(h);
* * assert(found == myMap.end()); // aborts!
* }
* // TODO: finally delete elements in myMap
* return EXIT_SUCCESS;}

</code>

The solution seems to be to adapt the equality condition in eqPtrs to
also test for actual equality of the members, not just equality of the
hash codes:

struct eqPtrs {
* bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
* * return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
* }

};

Is that the solution, or am I doing something wrong in general?
General comments on my code are welcome!

Markus
You actually have already figured out. eqPrts should return true only
when the two myContainer's have the same elements.
Buy the way, doesn't having the same elements imply having the same
hashcode? Comparing hashcode in eqPrts is redundant.
On a different note, try use Boost.Pointer_container library, it's a
lot safer that what you have posted.
Jul 11 '08 #2
On Jul 10, 9:44*pm, huil...@gmail.com wrote:
On Jul 10, 9:20*pm, Markus Dehmann <markus.dehm...@gmail.comwrote:
Do I have to handle hash collisions in a hash_set myself?
I did a test in which I use find() to look for objects in a hash_set.
These objects are definitely not contained, but find() sometimes finds
them anyway. See this code:
<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
hash_set
class MyContainer {
* std::vector<intv;
public:
* MyContainer(){}
* std::size_t hashcode() const {
* * std::size_t hash = 0;
* * for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
* * * hash = v[i] + (hash << 6) + (hash << 16) - hash;
* * }
* * return hash;
* }
* void add(int i){v.push_back(i);}
};
struct eqPtrs {
* bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
* * return h1->hashcode() == h2->hashcode(); // problematic b/cof
collisions?
* }
};
namespace gnu_namespace {
* template<struct hash<const MyContainer*{
* * inline size_t operator()(const MyContainer* d) const {
* * * return d->hashcode();
* * }
* };
}
int getRand(int min, int max){
* return ((rand() % (max-min+1)) + min);
}
int main(int argc, char** argv){
* srand(time(0));
* typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrsMyMap;
* MyMap myMap;
* int repeat = 100000;
* int size = 10;
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(0, 1000));
* * }
* * myMap.insert(h);
* }
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(2000, 3000));
* * }
* * MyMap::const_iterator found = myMap.find(h);
* * assert(found == myMap.end()); // aborts!
* }
* // TODO: finally delete elements in myMap
* return EXIT_SUCCESS;}
</code>
The solution seems to be to adapt the equality condition in eqPtrs to
also test for actual equality of the members, not just equality of the
hash codes:
struct eqPtrs {
* bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
* * return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
* }
};
Is that the solution, or am I doing something wrong in general?
General comments on my code are welcome!
Markus

You actually have already figured out. eqPrts should return true only
when the two myContainer's have the same elements.
Buy the way, doesn't having the same elements imply having the same
hashcode? Comparing hashcode in eqPrts is redundant.
On a different note, try use Boost.Pointer_container library, it's a
lot safer that what you have posted.
Oops... Boost.Pointer_container doesn't seem to have hash_set or
hash_map ...
Jul 11 '08 #3
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.comwrote:
Do I have to handle hash collisions in a hash_set myself?
Which hash_set? There is not, and almost certainly never will
be, a hash_set in C++.

Which is, of course, a technicality. There will be an
unordered_set, and most existing hash_set are only slightly
different (but in different and incompatible ways).
I did a test in which I use find() to look for objects in a
hash_set. These objects are definitely not contained, but
find() sometimes finds them anyway. See this code:
<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
// hash_set
If you're asking specifics about the GNU implementation, you
should probably ask an implementation specific group. But
supposing that the issue is more general, and concerns what
unordered_set will guarantee in the end.
class MyContainer {
std::vector<intv;
public:
MyContainer(){}
std::size_t hashcode() const {
std::size_t hash = 0;
for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
hash = v[i] + (hash << 6) + (hash << 16) - hash;
}
return hash;
}
void add(int i){v.push_back(i);}
};
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
}
};
You've defined two objects to be equal if their hash codes are
equal. Is that really what you want?
namespace gnu_namespace {
template<struct hash<const MyContainer*{
inline size_t operator()(const MyContainer* d) const {
return d->hashcode();
}
};
}
int getRand(int min, int max){
return ((rand() % (max-min+1)) + min);
}
int main(int argc, char** argv){
srand(time(0));
typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrsMyMap;
MyMap myMap;
int repeat = 100000;
int size = 10;
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(0, 1000));
}
myMap.insert(h);
}
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(2000, 3000));
}
MyMap::const_iterator found = myMap.find(h);
assert(found == myMap.end()); // aborts!
I'm not sure I understand. You create a random object, and
assert that it is in the container. Why should you expect it to
be in the container?
}
// TODO: finally delete elements in myMap
return EXIT_SUCCESS;}
</code>
The solution seems to be to adapt the equality condition in
eqPtrs to also test for actual equality of the members, not
just equality of the hash codes:
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
}
};
Is that the solution, or am I doing something wrong in general?
I don't know. It's up to you to define what equality should
mean. The only requirement is that if two elements are equal,
they have the same hash code.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jul 11 '08 #4
On Jul 11, 5:54*am, James Kanze <james.ka...@gmail.comwrote:
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.comwrote:
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(0, 1000));
* * }
* * myMap.insert(h);
* }
* for(int i=0; i<repeat; ++i){
* * MyContainer* h = new MyContainer();
* * for(int j=0; j<size; ++j){
* * * h->add(getRand(2000, 3000));
* * }
* * MyMap::const_iterator found = myMap.find(h);
* * assert(found == myMap.end()); // aborts!

I'm not sure I understand. *You create a random object, and
assert that it is in the container. *Why should you expect it to
be in the container?
No, I define an object that is guaranteed *not* to be in the container
and ask the container to find it. The assertion says that the object
should not be found.

Markus
Jul 12 '08 #5
On Jul 12, 7:47 pm, Markus Dehmann <markus.dehm...@gmail.comwrote:
On Jul 11, 5:54 am, James Kanze <james.ka...@gmail.comwrote:
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.comwrote:
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(0, 1000));
}
myMap.insert(h);
}
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(2000, 3000));
}
MyMap::const_iterator found = myMap.find(h);
assert(found == myMap.end()); // aborts!
I'm not sure I understand. You create a random object, and
assert that it is in the container. Why should you expect it to
be in the container?
No, I define an object that is guaranteed *not* to be in the
container and ask the container to find it. The assertion says
that the object should not be found.
Yes, I misread this part. Except that given the way you defined
equality, you didn't guarantee that an equal object wouldn't be
in the container.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jul 12 '08 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Abhijit Ray | last post by:
I am using hash_set which is available from gcc ( and which i presume is not part of the C++ standard yet ) okay , In hash tables the key is used by a hash function to calculate a index and the...
10
by: Alex Gerdemann | last post by:
Hello, I have spent a bunch of time converting a Java program I wrote to C++ in order to improve performance, and have found that it is not necessarily faster. Specifically, I'm writing a...
5
by: Bart Blommerde | last post by:
Hi, My question is about the STL extensions hash_set and hash_map, especially the SGI versions of these templates. When defining a class like this : #include <hash_set> class MyClass : public...
1
by: Timo Qvist | last post by:
Hi, I'm a bit new to STL and really new to SGI's hash_set implementation and I've having problem instantiating a hash_set with a custom hash function, I could really use some help sifting through...
3
by: Markus Dehmann | last post by:
I have a class "Data" and I store Data pointers in an STL set. But I have millions of inserts and many more lookups, and my profiler found that they cost a lot of runtime. Therefore, I want to...
5
by: jian | last post by:
I am trying to use the hash_set, but cannot get it work. Here's my code: // file name: hash1.cpp #include <hash_set> int main() { hash_set<int> h; } When I type:
1
by: zs | last post by:
Hi! I get warning message shown below in VS.NET 2k3. Is this deprecated by microsoft or by standard? I need hash_set to store and search small strings (<20 chars long). I'll have less then 300...
8
by: Rakesh | last post by:
Hi - What is wrong this implementation? I get a core dump at the free() statement? Thanks Rakesh #include <ext/hash_map> #include <iostream.h> #include <ext/hash_set>
2
by: Pierre Couderc | last post by:
Generally, is there somewhere a good tutorial and examplefor the use of SGI STL hash_set? I am lost in SGI documentation. More specifically, i am trying to use hat I need that a hash_set : ...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.