473,320 Members | 1,722 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

hash_set: problem with hash function?

I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*>:

typedef hash_set<const Data*, hash<const Data*>, eqData> DataPointerHashSet;
// typedef set<Data*> DataPointerHashSet; // (see complete code below)

But it doesn't work! Everything is fine when I use set<Data*>, but when I
plug in hash_set<...> the program gives a wrong answer and then crashes.

I tried to make a mini example, see code below. But the problem is, the
mini example seems to work fine. Just my original example crashes.

Can anyone see problems with my hash_set usage in the mini example below?
Problems that might occur only in certain situations (collisions)? I don't
really know how to write the hash function, so I adapted it from an example
that I found somewhere.
#include <iostream>

#if __GNUC__ < 3 && __GNUC__ >= 2 && __GNUC_MINOR__ >= 95
# include <hash_set>
# include <functional>
# define gnu_namespace std
#elif __GNUC__ >= 3
# include <ext/hash_set>
# if __GNUC_MINOR__ == 0
# include <functional>
# define gnu_namespace std
# else
# include <ext/functional>
# define gnu_namespace __gnu_cxx
# endif
#else
# include <hash_set.h>
# include <functional.h>
# define gnu_namespace std
#endif
using namespace gnu_namespace;
#include <set>
using namespace std;

class Data {
private:
string someData;
unsigned index;
public:
const unsigned getIndex() const { return index; }
Data(const string& s) : someData(s) {
static unsigned index; // a unique index for each Data obj
this->index = index++;
}
};
bool operator == (const Data& n1, const Data& n2 ){
return n1.getIndex() == n2.getIndex();
}

namespace gnu_namespace {
template<>
struct hash<const Data*> {
hash<unsigned> hasher;
size_t operator()(const Data* n) const {
return hasher(n->getIndex()); // wrong hash function??
}
};
}
struct eqData{
bool operator()(const Data* s1, const Data* s2) const{
return *s1 == *s2;
}
};

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHashSet;
// typedef set<Data*> DataPointerHashSet;

int main(){
DataPointerHashSet s;
const unsigned size = 10000;

Data* pointers[size];
for(int i = 0; i < size; ++i){
pointers[i] = new Data("bla");
s.insert(pointers[i]);
}
for(int i = 0; i < size; ++i){
if(s.find(pointers[i]) == s.end()){
cerr << "ERROR" << endl;
exit(EXIT_FAILURE);
}
}
for(int i = 0; i < size; ++i){
delete pointers[i];
}
}

Jul 23 '05 #1
3 3289
Markus Dehmann wrote:
I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*>:

typedef hash_set<const Data*, hash<const Data*>, eqData> DataPointerHashSet;
// typedef set<Data*> DataPointerHashSet; // (see complete code below)

But it doesn't work! Everything is fine when I use set<Data*>, but when I
plug in hash_set<...> the program gives a wrong answer and then crashes.

I tried to make a mini example, see code below. But the problem is, the
mini example seems to work fine. Just my original example crashes.

Can anyone see problems with my hash_set usage in the mini example below?
Problems that might occur only in certain situations (collisions)? I don't
really know how to write the hash function, so I adapted it from an example
that I found somewhere.
#include <iostream>

#if __GNUC__ < 3 && __GNUC__ >= 2 && __GNUC_MINOR__ >= 95
# include <hash_set>
# include <functional>
# define gnu_namespace std
#elif __GNUC__ >= 3
# include <ext/hash_set>
# if __GNUC_MINOR__ == 0
# include <functional>
# define gnu_namespace std
# else
# include <ext/functional>
# define gnu_namespace __gnu_cxx
# endif
#else
# include <hash_set.h>
# include <functional.h>
# define gnu_namespace std
#endif
using namespace gnu_namespace;
#include <set>
using namespace std;

class Data {
private:
string someData;
unsigned index;
public:
const unsigned getIndex() const { return index; }
Data(const string& s) : someData(s) {
static unsigned index; // a unique index for each Data obj
this->index = index++;
}
};
bool operator == (const Data& n1, const Data& n2 ){
return n1.getIndex() == n2.getIndex();
}

namespace gnu_namespace {
template<>
struct hash<const Data*> {
hash<unsigned> hasher;
size_t operator()(const Data* n) const {
return hasher(n->getIndex()); // wrong hash function??
}
};
}
struct eqData{
bool operator()(const Data* s1, const Data* s2) const{
return *s1 == *s2;
}
};

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHashSet;
// typedef set<Data*> DataPointerHashSet;

int main(){
DataPointerHashSet s;
const unsigned size = 10000;

Data* pointers[size];
for(int i = 0; i < size; ++i){
pointers[i] = new Data("bla");
s.insert(pointers[i]);
}
for(int i = 0; i < size; ++i){
if(s.find(pointers[i]) == s.end()){
cerr << "ERROR" << endl;
exit(EXIT_FAILURE);
}
}
for(int i = 0; i < size; ++i){
delete pointers[i];
}
}


I don't entirely understand what you are trying to do with the hash
function here. The point of a hash function is to basically to turn
some block of data into an integer. That integer does not have to be
unique, but it does have to be deterministic (that is, the if you hash
the same thing twice, you get the same value). The quality of a hash
function can be measured by how many collisions it produces. A
collision occurs when two or more objects hash to the same value.

In your case, you are storing a set of pointers. I am curious, what is
it that you want to be unique, the pointers, or the things to which they
are pointing? For example, if I did the following:
Data *p1 = new Data("blah") ;
Data *p2 = new Data("blah") ;

Should I be able to insert both p1 and p2 into the set? The pointers
are certainly different (they point to different memory locations), but
the objects to which they point are equivalent. If it is the pointers
themselves you wish to be unique (i.e., both p1 and p2 WOULD be allowed
in the set) then I suggest simply using the pointer value as your hash.
For example:

class data_pointer_hash
{
public :
size_t operator()(const Data *p)
{
return reinterpret_cast<size_t>(p) ;
}
} ;

If it is the objects themselves that are important to you, then you need
to come up with some way to turn your object into an integer. If your
data is a string (as in the mini-example you gave) you can use the
templated hash function already provided by your STL implementation.
For example (note that for this example to work you will have to make
this class a friend of Data):

class data_hash
{
private :
hash<const char *> h ;
public :
size_t operator()(const Data *p)
{
return h(p->someData.c_str()) ;
}
} ;

Given that you are using, in your example, an equality function object
that compares the actual objects (instead of just the pointers), the
latter is likely the more appropriate hash function for you.
Jul 23 '05 #2
Alan Johnson wrote:
the objects to which they point are equivalent. If it is the pointers
themselves you wish to be unique (i.e., both p1 and p2 WOULD be allowed
in the set) then I suggest simply using the pointer value as your hash.
For example:

class data_pointer_hash
{
public :
size_t operator()(const Data *p)
{
return reinterpret_cast<size_t>(p) ;
}
} ;


Thanks! That's what I wanted. It turned out there was a bug in the original
code that assigned duplicate indexes. Now I am getting rid of the indexes
at all and use your solution with directly hashing the pointers.

Markus

Jul 23 '05 #3
Markus Dehmann wrote:
I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*>:

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHashSet;
// typedef set<Data*> DataPointerHashSet; // (see complete code below)


Actually, there are only a few hundred insertions and millions of lookups.
Is hash_set the best structure for that? After I fixed my bug I didn't
find evidence that hash_set worked faster than set. Would a vector and
stl::find be the better solution (since there are only a few hundred
elements)?

Markus

Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Abhijit Ray | last post by:
I am using hash_set which is available from gcc ( and which i presume is not part of the C++ standard yet ) okay , In hash tables the key is used by a hash function to calculate a index and the...
10
by: Alex Gerdemann | last post by:
Hello, I have spent a bunch of time converting a Java program I wrote to C++ in order to improve performance, and have found that it is not necessarily faster. Specifically, I'm writing a...
1
by: Timo Qvist | last post by:
Hi, I'm a bit new to STL and really new to SGI's hash_set implementation and I've having problem instantiating a hash_set with a custom hash function, I could really use some help sifting through...
4
by: yuyang08 | last post by:
Hello, everyone, I am wondering what is the hash function that is used in hash_map/hash_set. Can I replace it with my own hash function? Any comments on this? Thanks! -Andy
8
by: Rakesh | last post by:
Hi - What is wrong this implementation? I get a core dump at the free() statement? Thanks Rakesh #include <ext/hash_map> #include <iostream.h> #include <ext/hash_set>
2
by: Pierre Couderc | last post by:
Generally, is there somewhere a good tutorial and examplefor the use of SGI STL hash_set? I am lost in SGI documentation. More specifically, i am trying to use hat I need that a hash_set : ...
5
by: Amit Bhatia | last post by:
Hi, I was wondering if I have a hash_set, can I modify its elements using an iterator: given the fact that the changes I make will not change the position or the key of the object that the...
3
by: misu101 | last post by:
Hi, I have an application that defines a hash_set as: typedef hash_set<const char*, hash<const char*>, chunker_eqstr> StringSet; This code compiles OK with VS6 and SGI STL library. It is...
5
by: Markus Dehmann | last post by:
Do I have to handle hash collisions in a hash_set myself? I did a test in which I use find() to look for objects in a hash_set. These objects are definitely not contained, but find() sometimes...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.