473,396 Members | 1,676 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Hash map implementation problem

My problem is to find a good implementation that will allow me, once transformed strings in hash keys, to do, if I had at the beginning this:
Group 1: string12, string21
Group 2: string364 string128
and after i do this:
move string128 string21
the result will be:
Group 1: string12, string21, string128
Group 2: string364
I know, nothing so complicated but my implementation is still slow, it should be faster.
I'm using a vector<vector<int>>, where i save the hash_key.
What should i use?(i can't use unordered_map of c++11!)
I have to save the hash keys in a map?in a vector? or i have to use a vector<vector> or vector<map>??? I'm confused and I hope someone can help me because it's very important for me.
Jun 1 '14 #1
26 2830
weaknessforcats
9,208 Expert Mod 8TB
If your hash keys are in vector<vector<int> then you need t iterate the vector to find the vector<int> for the hash key you are looking up.

Have you tried a map<pair <int, vector<int> >?

You should be able to quickly look up the key and then iterate vector<int> to fetch up all the strings that hashed to the key value.
Jun 1 '14 #2
I have not explained the problem well because i can have also this case:

Group 1: string12, string21, string128
Group 2: string364
and after i do this:
move string128 string364
the result will be:
Group 1: string12, string21
Group 2: string364 string128
So i have to iterate over all vectors to search for the key??
Jun 2 '14 #3
weaknessforcats
9,208 Expert Mod 8TB
The idea is to use a chain-link table. One implementation is an array and a linked list. In this scenario let's assume your hash vales are between 0 and 1000. You create an array where the element index is the hash value. Then when you hash to, say,275 you go to element 275 of the array and there is the address of a linked list of all items that have hashed to 275. Then you iterate the linked list to find the one you want.

My suggestion was to use a map<> instead of an array and to use a vector<> (or perhaps a list<>) for the linked list.

In my suggestion when you hash to 275 you locate the pair in the map<> and then iterate the vector<> in that pair<>. You would need to iterate only this one vector<>. Depending upon how many keys are in the map<> and the efficiency of your hash algorithm, this vector<> may have very few elements.

There are new hash templates for C++ that might help you.
Jun 2 '14 #4
lu8890
1
can you do something like this (in c#):

dictionary<string, List<string>>?

or, if input strings needs to be unique, then
dictionary<string, dictionary<string, string>>?
Jun 2 '14 #5
weaknessforcats
9,208 Expert Mod 8TB
We are trying to do this in C++.

Start a thread in C# forum to ask C# questions.
Jun 3 '14 #6
I've tried using many data structures but is still slow, nothing has increased the speed. Other operation is also(thought it was not necessary but maybe it is), merge group of string1 with group of string2. Example:
Group 1: string12, string21, string128
Group 2: string364
merge string128 string364
Group 1:
Group 2: string364,string128,string12,string21
I've used(most successful):
- vector< vector<int>> but is slow! (linear search N*N)
- vector<map<hash_key,int>> but is also slow, more than vector?!(maybe operations like insert or erase are not costant like vector, but logN)
Weaknessforcats what i have to save in the vector, the hask_key right?
Jun 3 '14 #7
weaknessforcats
9,208 Expert Mod 8TB
Did you try map<pair<hash_key, vector<string> >?

vector<string> will contain all strings hashed to the hash_key value.
Jun 3 '14 #8
Yes i did , but if i have:
string1 = hashKey1
string2 = hashKey2
...
stringN = hashKeyN

and this case:
move string2 in string 1
move string3 in string 1
move string4 in string 1
move string5 in string 2
move string6 in string 2
hashKey1,vector<string>{string2,string3,string4}
hashKey2,vector<string>{string5,string6}
hashKey3,vector<string>{}
hashKey4,vector<string>{}
hashKey5,vector<string>{}
hashKey6,vector<string>{}

now if i want to merge string4 with stringN or print with many elements string4 is, first i have to iterate over all vectors to find string4, is this what you mean? I thought there was a way to accelerate all. anyway thanks
Jun 3 '14 #9
weaknessforcats
9,208 Expert Mod 8TB
How would you know there is a string4? All that's in the vector is the string contents. If you already know the string contents, then you don't need to look the data up.

All the data has been hashed. Presumably all you have are the hash keys so you can look up the data. There is no string4.

string4 is a variable in your program. All that's in your executable is binary content. The "string4" is for the compiler.

For lookup you start with the hash key. Presumably, the hash keys have been saved and not the strings. You locate the has key in the map and if more than one string has hashed to that hash value, there will be a vector of strings that have hashed to the same value.

This is not a good setup to go look for data without hash key.
Jun 4 '14 #10
I'm confused, you said to use map<pair<hash_key, vector<string> >, so I figured to put inside the vector, the string and not the hash value, btw was an example. I believe that my solution is more like an hash table, that i'm studying , because i can find an element in O(1). But if i give you this case, the simplest one, so you can understand if i'm thinking wrong,if i have: move string1 in string2
1) where this string must be saved, in the vector? and vector of which hashkey?
2) if i have: print element of string1, it must print 2 and string2 too, because they are in the same ipotetic "group", this means to insert string1 in string2 and vieversa?
thank very much for your help, sorry my bad english, but i'm having a little bit of trouble with this argouments
Jun 5 '14 #11
weaknessforcats
9,208 Expert Mod 8TB
You need to make a pair<>. The first member is the hash key and the second member is the vector<string>. You put the string in the vector.

The pair<> is inserted in the map<>. Next time you hash a string you access the map to see if the hash key already exists. If it does, you add the new string to the vector. Otherwise you create a new pair<> and add it to the map<>.

So the structure is the pair<> which associates a vector of all strings that hash to the same value. The pairs are inserted in the map so you can access based on hash key.

BTW: Your English is fine.
Jun 5 '14 #12
What you are saying is for when i have two strings with the same hashkey, so i have to save in the same hashkey a vector with all the strings that are referred to that hashkey. This is not a real problem for me because i use a function(and tested and it works) that returns me a unique hashkey for the all strings that i have in input. The problem is when i have to move, merge the strings. I need a fast looukup that knows where the string is, or that it may know through a link.
Jun 5 '14 #13
weaknessforcats
9,208 Expert Mod 8TB
If your keys are unique then don't use a vector in the pair<>. Instead use the string itself:

pair<hash_key, string>

Insert this in the map and you can access the string by using the unique hash key.

Remember, pair<> has two members 1)first and 2)second. So you So if your map is now:

Expand|Select|Wrap|Line Numbers
  1. map<pair<hash_key, string> >  database;
then database[251] or database.find(251) will access the pair<> with that hash_key. Therefore, database[251].first is your hash_key and database[251].second is your string.
Jun 5 '14 #14
Ok perfect, this is what i did, a map<pair<hash_key, string> > with all the hashkey and the strings. But now, i should do a vector(that i figured that will be my groups) with the string that i move from a group to another, that i merge ecc..
so in vector[1] i put hashkey1 and in vector[2] hashkey2.., when i have to move string1 with string2 i move hashkey2 from vector[2] into vector[1], idem with the merge. But problem is that i have to scan all the vectors to find the hashkeys in input in which vector(figured how groups) are, and this is very very slow.
Jun 5 '14 #15
weaknessforcats
9,208 Expert Mod 8TB
This looks like you need a second map. A map of groups:

Expand|Select|Wrap|Line Numbers
  1. map<pair<group_id, vector<hash_key> > groups
;

Now if there is a group, say group 34, you access the groups map using the group_id. There, you find the vector of hash_keys that represent the strings in the group. You iterate that vector and find 26,73,450. Using those numbers in that order you access the string database to fetch the strings for those hash keys. You create a new string and append the string for hash 25, then the string for hash 73 and finally the string for hash 450. And there is your group string.
Jun 5 '14 #16
Then if in group[1] i have 26,73,450 and in group[2] i have 15,28,3, and i have to move hashkey 3 into the group of hashkey 73, I should not search in which group hashkey 3 and 73 are?this means linear search in group[1]..ecc until group[n], this is what i want to avoid, becuase i very slow, but maybe it's the only way to do it
Jun 5 '14 #17
weaknessforcats
9,208 Expert Mod 8TB
OK. Here is where you use an association index.

One side of the association index shows which hash_keys belong to which groups. This is

Expand|Select|Wrap|Line Numbers
  1. map<pair<hash_key, vector<group_id> >
  2. The data in your example is:
  3. 3 <2>
  4. 15 <2>
  5. 26 <1>
  6. 28 <2>
  7. 73 <1>
  8. 450 <1>
  9.  
The second side of the index associates group_id with hash_key.

Expand|Select|Wrap|Line Numbers
  1. map<pair<group_id, vector<hash_key> >
Expand|Select|Wrap|Line Numbers
  1. 1 <26, 73, 450>
  2. 2 <15,28, 3>
  3.  
So to move a string from group 2 to group 1 you:
1) Access the map<<pair<hash_key, vector<group_id> > using 3 as the hash_key and remove 2 from the vector<2>. Then add 1 to this vector to get 3 <1>
2) Access the other index using the destination group_id, which is 1 and add hash_key 3 to the vector<26, 73, 450>
in the correct position. Maybe in the middle to get <26, 73,3, 450>
3) Access the other index again using the source group_id, which is 2 and remove 3 from the vector<15,28,3> to get vector<15,28>.

and your string has moved between groups. If you keep hash_key 3 in both groups, then you have done a copy just by having the hash_key appear in the group_id vector of 2 groups.

Notice you never did have to actually find the string itself.
Jun 5 '14 #18
weaknessforcats
9,208 Expert Mod 8TB
Check this out:

http://en.wikipedia.org/wiki/Junction_table
Jun 5 '14 #19
Ok i tried in this way, but with a vector as a groups instead of a map, so when i have the group_id, i can access to it directly, vector[group_id], and in each element of the vector i saved a map instead of a vector, with all the hashkeys, because now i have to search the element and the map's find is faster. I'll do some test to see if there is an improvement in the speed
Jun 6 '14 #20
weaknessforcats
9,208 Expert Mod 8TB
There should be a big improvement. A map<> uses a red/black tree based on a B-tree. The number of accesses using the tree should be much smaller than iterating a vector sequentially.

Let me know what happens.
Jun 6 '14 #21
I tryed with
vector< map <int,int> > groups; // vector[group_id]= map<hash_key, string.size>
map<int,int> keyGroup; // map<hash_key, group_id>
It returns me correct values but is not faster than a vector<vector> that have a list of hashkey inside each position, where i have to do a linear search to find the hashkeys in input.
That's strange, maybe there is a waste of time with the indexing after merge.
Jun 6 '14 #22
weaknessforcats
9,208 Expert Mod 8TB
That's probably because there is no map method to advance to the next element. With a map each seek starts with the root of the three. A tree does not have a "next in the list" item.

Advancing a vector is just pointer arithmetic from the current position.

I would think a map would be faster if you are trying to locate a single key whereas a vector would be faster when iterating the entire vector.

BTW: A vector must be implemented as an array. That means overhead in expanding the array by adding elements. You might try a list<> instead of a vector<> since list<> is just a double linked list. list<> might be faster to add but slower to iterate. vector<> might be slower to add but faster to iterate.

Of course, you could create your vector reserving the maximum elements you expect. That would remove the slower to add but would add bloat to your app.

I am pleased you are getting correct values.
Jun 7 '14 #23
what i'm not understanding is if i can use for my problem an hashtable instead of a map. Now i'm using two maps that have O(log(n)) for insert, find and remove because map is a RB-tree, while an hash table, that use buckets, for the same functions have O(1).
I think you have suggested this to me already, but is not clear to me the implementation of it for my problem. You think i can have improvments with this one and it's applicable to my problem? Considering also that my hashfunctions returns me high int value, and i don't use a modulus.
Jun 7 '14 #24
weaknessforcats
9,208 Expert Mod 8TB
You have a many-to-many situation. A string can belong to many groups and a group can have many strings. This makes the association index the solution for the situation.

First question: Does your program work?
If not, then get it working.

Here I would be writing a class with these containers as member variables which are accessed only through member functions of this class. This is basic encapsulation.

I would no code whatsoever outside this class member functions the used any of the maps, vectors, etc..

Once this is done, then you can alter the implementation of the class (but not the public interface) and experiment with various data access schemes.

One that occurs to me is to use something like Pro*C for this class and implement Oracle. Or MySQL, etc... These are all set up for foreign keys, which is what you are using.
Jun 7 '14 #25
OK i'm here again, for the happiness of weaknessforcats. Yes it works, but it's slow. I made some modifications, that have speeded up my program of about 20 seconds, now it take about 1min to finish the test, but it have to take 20sec or less to finish.
I have used a vector<vector<int>>group for insert, search and merge the hashkeys and a map<int,int> for indexing the key with the hashkeys and the value with the groups id. Unless I change it, I think that, less than this, I can't do with this structure.
Now i'm asking what can i use, maybe some pointer, vector of class pointer. I'm sorry but I do not understand well what you said here:
"Here I would be writing a class with these containers as member variables which are accessed only through member functions of this class. This is basic encapsulation.
I would no code whatsoever outside this class member functions the used any of the maps, vectors, etc..".
You're suggesting to use some class method? Why would I benefit?
Jun 8 '14 #26
weaknessforcats
9,208 Expert Mod 8TB
You should hide the implementation of your vectors and maps.

For example:

obj.insert("weaknessforcats", "petshop");

Let's say that this method:
1)hashes the string and inserts it the string database
2)adds the string to the petshop group
3)adds the petshop group to the string groups. So how is the method implemented?

Answer: You don't care.

There are no hash keys and group ids. These have all been buried. The application uses strings.

If your code looks like this you can change the structure and implementation of obj without impacting any code that uses methods on this object.

You might even decide to assign specific virtual processors to handle the various databases and indexes instead of having the program thread do everything sequentially.

If you make a change in obj all you do is rebuild your code with the new obj code and your change is implemented in your program.

On the other hand, if you expose all these vector and map calls in 500 places in your program , now it's not so easy to make a change.

Part of using C++ is reuse. How can I reuse this already written and debugged code?

You are now branching off to object-oriented programming, which is beyond the subject of this thread. If you are not familiar with these concepts, this is a golden opportunity to research some of them and use them in the solution of your problem.
Jun 8 '14 #27

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: andreas gammel | last post by:
I want to use Perl hash functionality (hashes of hashes, arrays of hashes, hashes of arrays) in my C programs. Other Perl functions (split, chomp, regular expressions) would be welcome too. ...
4
by: Me | last post by:
I am not understanding an aspect of how to implement a class template in a ..cpp file. What I do know is that there are at least two problems with my understanding of how to accomplish the...
2
by: Kevin Grigorenko | last post by:
Basically, I have two classes declared in a header file: class Version { ... }; class BasicRegex { ...
2
by: Brian Genisio | last post by:
Hello all, I need to use a hash table for quick lookup... What is the best way to do it? My current solution uses stl::map, where I essentially do the following (not compilable code): ...
2
by: Dan | last post by:
I've installed passport sdk on to windows 2003 and I have received a key from netservicesmanager.com. I have set securelevel in the registry to 0. When I click on the link generated by logotag2 it...
1
by: ProJee | last post by:
Hi, I've got the following problems while trying to finish my NewsReader class: #1 There's a NEWNEWS command in the RFC specs, which should retrieve the list of articles posted after the...
2
by: Shaun Merrill | last post by:
This function doesn't seem to be giving me the appearance I expect. This should appear more like "1ba442ab45c8002a86d068d1c1748e44" but it looks really cryptic and screwy, like "­óìØLÜzrVÊ%'...
5
by: weidongtom | last post by:
Hi, I tried to implement the Universal Machine as described in http://www.boundvariable.org/task.shtml, and I managed to get one implemented (After looking at what other's have done.) But when I...
7
by: bcpkh | last post by:
Hello All Received a header file from a supplier that defines an interface to implement but it's giving me a problem, I reproduce the general structure of the header file below; #ifndef XYZ_H...
5
by: NoOneCanHelpMe | last post by:
Hello everyone, could someone help me please, I want to print out the frequency of each vowel in the string and I think I will need to use the hash table I already have but im not sure how to do it...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.