473,765 Members | 2,053 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

hash_set: problem with hash function?

I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*> :

typedef hash_set<const Data*, hash<const Data*>, eqData> DataPointerHash Set;
// typedef set<Data*> DataPointerHash Set; // (see complete code below)

But it doesn't work! Everything is fine when I use set<Data*>, but when I
plug in hash_set<...> the program gives a wrong answer and then crashes.

I tried to make a mini example, see code below. But the problem is, the
mini example seems to work fine. Just my original example crashes.

Can anyone see problems with my hash_set usage in the mini example below?
Problems that might occur only in certain situations (collisions)? I don't
really know how to write the hash function, so I adapted it from an example
that I found somewhere.
#include <iostream>

#if __GNUC__ < 3 && __GNUC__ >= 2 && __GNUC_MINOR__ >= 95
# include <hash_set>
# include <functional>
# define gnu_namespace std
#elif __GNUC__ >= 3
# include <ext/hash_set>
# if __GNUC_MINOR__ == 0
# include <functional>
# define gnu_namespace std
# else
# include <ext/functional>
# define gnu_namespace __gnu_cxx
# endif
#else
# include <hash_set.h>
# include <functional.h >
# define gnu_namespace std
#endif
using namespace gnu_namespace;
#include <set>
using namespace std;

class Data {
private:
string someData;
unsigned index;
public:
const unsigned getIndex() const { return index; }
Data(const string& s) : someData(s) {
static unsigned index; // a unique index for each Data obj
this->index = index++;
}
};
bool operator == (const Data& n1, const Data& n2 ){
return n1.getIndex() == n2.getIndex();
}

namespace gnu_namespace {
template<>
struct hash<const Data*> {
hash<unsigned> hasher;
size_t operator()(cons t Data* n) const {
return hasher(n->getIndex()); // wrong hash function??
}
};
}
struct eqData{
bool operator()(cons t Data* s1, const Data* s2) const{
return *s1 == *s2;
}
};

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHash Set;
// typedef set<Data*> DataPointerHash Set;

int main(){
DataPointerHash Set s;
const unsigned size = 10000;

Data* pointers[size];
for(int i = 0; i < size; ++i){
pointers[i] = new Data("bla");
s.insert(pointe rs[i]);
}
for(int i = 0; i < size; ++i){
if(s.find(point ers[i]) == s.end()){
cerr << "ERROR" << endl;
exit(EXIT_FAILU RE);
}
}
for(int i = 0; i < size; ++i){
delete pointers[i];
}
}

Jul 23 '05 #1
3 3339
Markus Dehmann wrote:
I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*> :

typedef hash_set<const Data*, hash<const Data*>, eqData> DataPointerHash Set;
// typedef set<Data*> DataPointerHash Set; // (see complete code below)

But it doesn't work! Everything is fine when I use set<Data*>, but when I
plug in hash_set<...> the program gives a wrong answer and then crashes.

I tried to make a mini example, see code below. But the problem is, the
mini example seems to work fine. Just my original example crashes.

Can anyone see problems with my hash_set usage in the mini example below?
Problems that might occur only in certain situations (collisions)? I don't
really know how to write the hash function, so I adapted it from an example
that I found somewhere.
#include <iostream>

#if __GNUC__ < 3 && __GNUC__ >= 2 && __GNUC_MINOR__ >= 95
# include <hash_set>
# include <functional>
# define gnu_namespace std
#elif __GNUC__ >= 3
# include <ext/hash_set>
# if __GNUC_MINOR__ == 0
# include <functional>
# define gnu_namespace std
# else
# include <ext/functional>
# define gnu_namespace __gnu_cxx
# endif
#else
# include <hash_set.h>
# include <functional.h >
# define gnu_namespace std
#endif
using namespace gnu_namespace;
#include <set>
using namespace std;

class Data {
private:
string someData;
unsigned index;
public:
const unsigned getIndex() const { return index; }
Data(const string& s) : someData(s) {
static unsigned index; // a unique index for each Data obj
this->index = index++;
}
};
bool operator == (const Data& n1, const Data& n2 ){
return n1.getIndex() == n2.getIndex();
}

namespace gnu_namespace {
template<>
struct hash<const Data*> {
hash<unsigned> hasher;
size_t operator()(cons t Data* n) const {
return hasher(n->getIndex()); // wrong hash function??
}
};
}
struct eqData{
bool operator()(cons t Data* s1, const Data* s2) const{
return *s1 == *s2;
}
};

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHash Set;
// typedef set<Data*> DataPointerHash Set;

int main(){
DataPointerHash Set s;
const unsigned size = 10000;

Data* pointers[size];
for(int i = 0; i < size; ++i){
pointers[i] = new Data("bla");
s.insert(pointe rs[i]);
}
for(int i = 0; i < size; ++i){
if(s.find(point ers[i]) == s.end()){
cerr << "ERROR" << endl;
exit(EXIT_FAILU RE);
}
}
for(int i = 0; i < size; ++i){
delete pointers[i];
}
}


I don't entirely understand what you are trying to do with the hash
function here. The point of a hash function is to basically to turn
some block of data into an integer. That integer does not have to be
unique, but it does have to be deterministic (that is, the if you hash
the same thing twice, you get the same value). The quality of a hash
function can be measured by how many collisions it produces. A
collision occurs when two or more objects hash to the same value.

In your case, you are storing a set of pointers. I am curious, what is
it that you want to be unique, the pointers, or the things to which they
are pointing? For example, if I did the following:
Data *p1 = new Data("blah") ;
Data *p2 = new Data("blah") ;

Should I be able to insert both p1 and p2 into the set? The pointers
are certainly different (they point to different memory locations), but
the objects to which they point are equivalent. If it is the pointers
themselves you wish to be unique (i.e., both p1 and p2 WOULD be allowed
in the set) then I suggest simply using the pointer value as your hash.
For example:

class data_pointer_ha sh
{
public :
size_t operator()(cons t Data *p)
{
return reinterpret_cas t<size_t>(p) ;
}
} ;

If it is the objects themselves that are important to you, then you need
to come up with some way to turn your object into an integer. If your
data is a string (as in the mini-example you gave) you can use the
templated hash function already provided by your STL implementation.
For example (note that for this example to work you will have to make
this class a friend of Data):

class data_hash
{
private :
hash<const char *> h ;
public :
size_t operator()(cons t Data *p)
{
return h(p->someData.c_str ()) ;
}
} ;

Given that you are using, in your example, an equality function object
that compares the actual objects (instead of just the pointers), the
latter is likely the more appropriate hash function for you.
Jul 23 '05 #2
Alan Johnson wrote:
the objects to which they point are equivalent. If it is the pointers
themselves you wish to be unique (i.e., both p1 and p2 WOULD be allowed
in the set) then I suggest simply using the pointer value as your hash.
For example:

class data_pointer_ha sh
{
public :
size_t operator()(cons t Data *p)
{
return reinterpret_cas t<size_t>(p) ;
}
} ;


Thanks! That's what I wanted. It turned out there was a bug in the original
code that assigned duplicate indexes. Now I am getting rid of the indexes
at all and use your solution with directly hashing the pointers.

Markus

Jul 23 '05 #3
Markus Dehmann wrote:
I have a class "Data" and I store Data pointers in an STL set. But I have
millions of inserts and many more lookups, and my profiler found that they
cost a lot of runtime.

Therefore, I want to substitute the set<Data*> with a hash_set<Data*> :

typedef hash_set<const Data*, hash<const Data*>, eqData>
DataPointerHash Set;
// typedef set<Data*> DataPointerHash Set; // (see complete code below)


Actually, there are only a few hundred insertions and millions of lookups.
Is hash_set the best structure for that? After I fixed my bug I didn't
find evidence that hash_set worked faster than set. Would a vector and
stl::find be the better solution (since there are only a few hundred
elements)?

Markus

Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
1983
by: Abhijit Ray | last post by:
I am using hash_set which is available from gcc ( and which i presume is not part of the C++ standard yet ) okay , In hash tables the key is used by a hash function to calculate a index and the object is stored under that key. I would like to have the index value. I want to iterate over all the objects stored in the hash_set and get all the objects and their index. ) The aim is to display the hash_set graphically, so that we can...
10
4271
by: Alex Gerdemann | last post by:
Hello, I have spent a bunch of time converting a Java program I wrote to C++ in order to improve performance, and have found that it is not necessarily faster. Specifically, I'm writing a circuit simulator, so I must first parse the netlist file. The program goes through an input file, and makes a hash_set of unique nodes (std::string's). The words are then sorted and numbered by copying the hash_set into a vector and sorting. ...
1
3947
by: Timo Qvist | last post by:
Hi, I'm a bit new to STL and really new to SGI's hash_set implementation and I've having problem instantiating a hash_set with a custom hash function, I could really use some help sifting through g++ (v3.3.3) error message on this particular problem. I want a hash_set with class pointers as the hashed type, my class is simply called Transition, the following code doesn't work;
4
4910
by: yuyang08 | last post by:
Hello, everyone, I am wondering what is the hash function that is used in hash_map/hash_set. Can I replace it with my own hash function? Any comments on this? Thanks! -Andy
8
3116
by: Rakesh | last post by:
Hi - What is wrong this implementation? I get a core dump at the free() statement? Thanks Rakesh #include <ext/hash_map> #include <iostream.h> #include <ext/hash_set>
2
7622
by: Pierre Couderc | last post by:
Generally, is there somewhere a good tutorial and examplefor the use of SGI STL hash_set? I am lost in SGI documentation. More specifically, i am trying to use hat I need that a hash_set : hash_set<ch; and logically the h function for my c class is missing and I get a compile error.
5
2900
by: Amit Bhatia | last post by:
Hi, I was wondering if I have a hash_set, can I modify its elements using an iterator: given the fact that the changes I make will not change the position or the key of the object that the iterator points to? Or do I need to necessarily use hash_map ? Thanks, --a.
3
3480
by: misu101 | last post by:
Hi, I have an application that defines a hash_set as: typedef hash_set<const char*, hash<const char*>, chunker_eqstr> StringSet; This code compiles OK with VS6 and SGI STL library. It is patterned after the SGI STL documentations. We are now moving to VS 2005 and the SGI STL header file is in conflict with the VS 2005 system header files. But when I try to use VS 2005 and the VS8 STL library instead of SGI STL, I got these error...
5
2536
by: Markus Dehmann | last post by:
Do I have to handle hash collisions in a hash_set myself? I did a test in which I use find() to look for objects in a hash_set. These objects are definitely not contained, but find() sometimes finds them anyway. See this code: <code> #include <iostream> #include <stdexcept> #include <ctime>
0
10156
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10007
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9951
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8831
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7375
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5419
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3924
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3531
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2805
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.