473,320 Members | 1,881 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Hash map to categorize users.

Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
......

Operators
----------
George
Ray
Gavin
Garry
......

Users
---------
Dylan
Darragh
Clive
.....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?

Sep 12 '07 #1
8 1542
On 2007-09-12 19:12, DaveJ wrote:
Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
.....

Operators
----------
George
Ray
Gavin
Garry
.....

Users
---------
Dylan
Darragh
Clive
....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?
Since I don't have access to a good TR1 implementation I'm going to use
a normal map in the examples below, but it should be easily replaceable
with a hashmap if you have one.

The simplest implementation is to simply use

std::map<std::string, std::stringuserToGroup;

If that is not enough please explain in more detail your needs.

--
Erik Wikström
Sep 12 '07 #2
On Sep 12, 1:12 pm, DaveJ <davej2...@gmail.comwrote:
Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
.....

Operators
----------
George
Ray
Gavin
Garry
.....

Users
---------
Dylan
Darragh
Clive
....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?
First cut, just use a std::map <std::string, std::stringwhere the
key is the name and the value is the group.

Second cut, if that's too slow, look into using hash_map (e.g. from
boost tr1).

Third cut, consider placing the groups in another container (e.g.
vector) and
having the map's value be an pointer to an element of that container.
This
should save on space since if many names will use the same group.

Hope that helps.

Sep 12 '07 #3
On Sep 12, 7:53 pm, "AnonMail2...@gmail.com" <AnonMail2...@gmail.com>
wrote:
On Sep 12, 1:12 pm, DaveJ <davej2...@gmail.comwrote:
I have a small problem I'm trying to solve in C++
I have a file containing a number of usernames under a serious of
groups:
Admin
---------
Dave834
Bob
John
.....
Operators
----------
George
Ray
Gavin
Garry
.....
Users
---------
Dylan
Darragh
Clive
....
Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.
I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?
First cut, just use a std::map <std::string, std::stringwhere the
key is the name and the value is the group.
Second cut, if that's too slow, look into using hash_map (e.g. from
boost tr1).
Third cut, consider placing the groups in another container
(e.g. vector) and having the map's value be an pointer to an
element of that container. This should save on space since if
many names will use the same group.
And saving space means less cache misses and page faults:-).

He doesn't say how many names he is dealing with. I regularly
use std::find (linear search) for up to about 20 entries, even
in time critical code (and in non-critical code, of course, even
over a 100 entries isn't a problem). My own experiments suggest
that the O(lg n) performance of std::map doesn't really start
having an effect below around a couple of thousand entries: with
optimal hashing, the difference starts being measurable around
175 entries, but just barely, and hashing is rarely optimal.
Above a thousand or so entries, on the other hand, the
difference does begin to have a noticeable impact.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 13 '07 #4
Thanks guys,

Thats food for thought.
I wanted to avoid just having a map<string, stringimplementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.

As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks

Sep 13 '07 #5
LR
DaveJ wrote:
Thanks guys,

Thats food for thought.
I wanted to avoid just having a map<string, stringimplementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.
Sorry, but why?

Did I misread your original request?

You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.

std::map<std::string,std::stringwill work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?

As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks
Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.

I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.
LR

Sep 13 '07 #6
On 2007-09-13 14:43, LR wrote:
DaveJ wrote:
>Thanks guys,

Thats food for thought.
I wanted to avoid just having a map<string, stringimplementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.

Sorry, but why?

Did I misread your original request?

You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.

std::map<std::string,std::stringwill work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?
Memory savings mostly I would suspect, if you have many groups with long
names you can save quite a few bytes by only storing them once.
>As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks

Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.

I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.
I don't think that the file will be read very often, but lookups might
be performed very often. Never the less, unless this is a very time-
critical app like a like a busy webserver, or very many lookups are
performed in some loop, or some kind of realtime app I don't think there
will be any noticeable difference between std::map and a hashmap on
modern PCs.

--
Erik Wikström
Sep 13 '07 #7
The plan is to have 1000 or so names per group. There will actually
be more groups that what I specified that was just test data.
In terms of lookups, it will be a busy system with 1000+ transactions
per minute.

On Sep 13, 3:14 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-09-13 14:43, LR wrote:
DaveJ wrote:
Thanks guys,
Thats food for thought.
I wanted to avoid just having a map<string, stringimplementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.
Sorry, but why?
Did I misread your original request?
You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.
std::map<std::string,std::stringwill work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?

Memory savings mostly I would suspect, if you have many groups with long
names you can save quite a few bytes by only storing them once.
As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks
Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.
I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.

I don't think that the file will be read very often, but lookups might
be performed very often. Never the less, unless this is a very time-
critical app like a like a busy webserver, or very many lookups are
performed in some loop, or some kind of realtime app I don't think there
will be any noticeable difference between std::map and a hashmap on
modern PCs.

--
Erik Wikström

Sep 13 '07 #8
LR
DaveJ wrote:

I find top posting to be a little confusing, so I moved what you wrote
to the end.
On Sep 13, 3:14 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
>On 2007-09-13 14:43, LR wrote:
>>DaveJ wrote:
Thanks guys,
Thats food for thought.
I wanted to avoid just having a map<string, stringimplementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.
Sorry, but why?
Did I misread your original request?
You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.
std::map<std::string,std::stringwill work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?
Memory savings mostly I would suspect, if you have many groups with long
names you can save quite a few bytes by only storing them once.
Ok, I guess the utility of that will depend on the types of transactions
that are being done. If there are groups that are added or deleted with
some frequency then something other than a vector, some type of map,
might be better.

>>
>>>As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks
Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.
I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.
I don't think that the file will be read very often, but lookups might
be performed very often. Never the less, unless this is a very time-
critical app like a like a busy webserver, or very many lookups are
performed in some loop, or some kind of realtime app I don't think there
will be any noticeable difference between std::map and a hashmap on
modern PCs.
The plan is to have 1000 or so names per group. There will actually
be more groups that what I specified that was just test data.
In terms of lookups, it will be a busy system with 1000+ transactions
per minute.
I guess the kinds of transactions you do will influence the kinds of
data structures you choose, but I suspect that for 1000+ per minute you
may not find std::map to be the kind of problem you think it will be.

It may be very worthwhile to profile what ever you write and then see
how you can speed up what ever problems you run into.

LR
Sep 14 '07 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: rolivawdaneel | last post by:
Hello, I am using SGI's hash_map in such a way that there is an inherent memory leak. Let me explain: 1- My hash_map takes char* as keys and one of my class, say myClass, as data. I give the...
22
by: VK | last post by:
A while ago I proposed to update info in the group FAQ section, but I dropped the discussion using the approach "No matter what color the cat is as long as it still hounts the mice". Over the last...
7
by: Zhiqiang Ye | last post by:
Hi,dear all, I know there is a hash library in glibc, it's head file is search.h. One can use hash map based on this.But its hash is global , there could be only one hash structrue at one time. Is...
1
by: bb | last post by:
I have a requirement to create and store in our database the users password in a couple of additional hashes (we currently store an MD5 hash) the spec is pretty brief... Spec: Store the NT...
8
by: Adam Carpenter | last post by:
Hello, I have my users passwords stored to my DB hashs created using SHA1CryptoServiceProvider, here is the function: Public Shared Function EncryptPassword(ByVal password As String) As Byte()...
10
by: Johhny | last post by:
Hello, I am currently trying to write some scripts to get information from the xmlrpc for redhat network. One of the issues I am having is trying to strip off the special characters in the hash...
12
by: Arash Partow | last post by:
Hi all, I've ported various hash functions to python if anyone is interested: def RSHash(key): a = 378551 b = 63689 hash = 0
1
by: Frank | last post by:
Hello All, I am exploring and developing a plan for a C# web app or windows app to handle bounced emails. Basically, I need to develop a system where I weed out bad addresses in our db...but...
139
by: ravi | last post by:
Hi can anybody tell me that which ds will be best suited to implement a hash table in C/C++ thanx. in advanced
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.