473,320 Members | 1,865 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Merging a list of similar items

Hello,

I have a dataset which consist of a string username and string hostname as a
key and then an integer representing a count as the matching "second" value
in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has datasets
in the range of 12000 records and my list seems to be taking like forever to
complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername, const string &strHostname)
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompare(it->m_strUsername, strUsername) &&
StrNoCaseCompare(it->m_strHostname, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername, const string &strHostname, int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsername, strHostname);

if (it == end())
{
DU.m_strUsername = strUsername;
DU.m_strHostname = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<username,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?

Thanks.

-- Henrik
Jun 13 '07 #1
6 1787
hi:

stl::map is much more suitable for your requirement.

As you mentioned the total size of dataset is around 12000,
it is a very big number and usually the stl::list is implemented in
linkedlink.
It will travers from the begin to the end, this means the searching
time complexity is O(n).

But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger
On Jun 13, 3:03 pm, "Henrik Goldman" <henrik_gold...@mail.tele.dk>
wrote:
Hello,

I have a dataset which consist of a string username and string hostname as a
key and then an integer representing a count as the matching "second" value
in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has datasets
in the range of 12000 records and my list seems to be taking like forever to
complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername, const string &strHostname)
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompare(it->m_strUsername, strUsername) &&
StrNoCaseCompare(it->m_strHostname, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername, const string &strHostname, int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsername, strHostname);

if (it == end())
{
DU.m_strUsername = strUsername;
DU.m_strHostname = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<username,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?

Thanks.

-- Henrik

Jun 13 '07 #2
But map is always implemented in RB tree, the complexity is O(lg(n))
>
It is much fast if the size is bigger
Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?

-- Henrik
Jun 13 '07 #3
Henrik Goldman wrote:
>But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger

Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?
No, it hasn't. Actually, you could write one for your problem, but at
this point it's better to wrap the information in a small class (or
struct, if you do not really need a class for this). For the map to
work, the operator< is needed. The simplest way is:

struct Account
{
std::string username;
std::string hostname;
bool operator<(const Account& a) const
{
return (username == a.username) ?
(hostname < hostname) :
(username < username);
}
};

and then...

std::map<Account, std::stringpasswords;

will work fine.

Regards,

Zeppe
Jun 13 '07 #4
"Henrik Goldman" <he************@mail.tele.dkwrote in message
news:46***********************@dread11.news.tele.d k...
Hello,

I have a dataset which consist of a string username and string hostname as
a key and then an integer representing a count as the matching "second"
value in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has
datasets in the range of 12000 records and my list seems to be taking like
forever to complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername, const string &strHostname)
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompare(it->m_strUsername, strUsername) &&
StrNoCaseCompare(it->m_strHostname, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername, const string &strHostname, int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsername, strHostname);

if (it == end())
{
DU.m_strUsername = strUsername;
DU.m_strHostname = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<username,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?
A std::map should give you a whole lot. A list is unindexed and for every
search you are iteratirng though every possible record. A map is indexed (I
believe they use binary trees or better) and searches are a lot faster, a
WHOLE lot faster. The difficulty comes in your strNoCase compare, but you
can get around that by making the key a class with an operator<..

The way you are using this you probably want a set anyway since your data is
in your class/structure.

Output of following program showing your type of functions for List, Map and
Set is:

*** List ***
Inserting group 0 100 unique entries to list: 94ns
Inserting group 1 100 unique entries to list: 266ns
Inserting group 2 100 unique entries to list: 343ns
Inserting group 3 100 unique entries to list: 469ns
Inserting group 4 100 unique entries to list: 609ns
Inserting group 5 100 unique entries to list: 719ns
Inserting group 6 100 unique entries to list: 860ns
Inserting group 7 100 unique entries to list: 1000ns
Inserting group 8 100 unique entries to list: 1171ns
Inserting group 9 100 unique entries to list: 1250ns
Search for non existant entry in List: 16ns
List size:1000

*** Map ***
Inserting group 0 100 unique entries to map: 47ns
Inserting group 1 100 unique entries to map: 62ns
Inserting group 2 100 unique entries to map: 63ns
Inserting group 3 100 unique entries to map: 78ns
Inserting group 4 100 unique entries to map: 78ns
Inserting group 5 100 unique entries to map: 94ns
Inserting group 6 100 unique entries to map: 94ns
Inserting group 7 100 unique entries to map: 94ns
Inserting group 8 100 unique entries to map: 78ns
Inserting group 9 100 unique entries to map: 94ns
Search for non existant entry in Map: 0.485ns
Map size:1000

*** Set ***
Inserting group 0 100 unique entries to set: 46ns
Inserting group 1 100 unique entries to set: 79ns
Inserting group 2 100 unique entries to set: 78ns
Inserting group 3 100 unique entries to set: 78ns
Inserting group 4 100 unique entries to set: 78ns
Inserting group 5 100 unique entries to set: 78ns
Inserting group 6 100 unique entries to set: 93ns
Inserting group 7 100 unique entries to set: 94ns
Inserting group 8 100 unique entries to set: 94ns
Inserting group 9 100 unique entries to set: 94ns
Search for non existant entry in Set: 0.531ns
Set size:1000

The main thing to notice is the search times. For your list it's a linear
search, takes longer and longer the more entries you have. For map and set,
they use indexes, so QUITE a bit faster. Here is program: (Please note
that followoing program is not optmized, I just tried to follow the way you
were doing things with maps and sets also).

#include <iostream>
#include <string>
#include <ctime>
#include <cstring>
#include <utility>
#include <sstream>

#include <list>
#include <map>
#include <set>

std::string Lowercase( const std::string& MixedString )
{
std::string ReturnValue = "";
for ( std::string::size_type i = 0; i < MixedString.size(); ++i )
ReturnValue += static_cast<char>( tolower( MixedString[i] ) );
return ReturnValue;
}

bool StrNoCaseCompare( const std::string& Str1, const std::string& Str2 )
{
if ( Lowercase( Str1 ) == Lowercase( Str2 ) )
return true;

return false;
}
class UserInfo
{
public:
UserInfo( const std::string& Username, const std::string& Hostname ):
Username( Username ), Hostname( Hostname ) {}
std::string Username;
std::string Hostname;
int count; // Only used in set
};

bool operator<( const UserInfo& lhs, const UserInfo& rhs )
{
// case insensitve compare
// use your fastest method, this one just thrown together
std::string Userlhs = Lowercase( lhs.Username );
std::string Hostlhs = Lowercase( lhs.Hostname );
std::string Userrhs = Lowercase( rhs.Username );
std::string Hostrhs = Lowercase( rhs.Hostname );

if ( Userlhs == Userrhs )
if ( Hostlhs < Hostrhs )
return true;
else
return false;
else if ( Userlhs < Userrhs )
return true;
else
return false;
}
typedef std::list<std::pair<UserInfo, int DataList;
typedef std::map<UserInfo, int DataMap;
typedef std::set<UserInfoDataSet;

DataList::iterator FindInList(const std::string &strUsername, const
std::string &strHostname, DataList& Data)
{
DataList::iterator it;

for (DataList::iterator it = Data.begin(); it != Data.end(); ++it)
{
if (StrNoCaseCompare(it->first.Username, strUsername) &&
StrNoCaseCompare(it->first.Hostname, strHostname))
return it;
}

return Data.end();
}

void InsertToList(const std::string &strUsername, const std::string
&strHostname, int nCount, DataList& Data)
{
DataList::iterator it = FindInList(strUsername, strHostname, Data);

if (it == Data.end())
{
Data.push_back(std::make_pair( UserInfo( strUsername, strHostname ),
nCount ));
}
else
{
it->second += nCount;
}
}

DataMap::iterator FindInMap(const std::string &strUsername, const
std::string &strHostname, DataMap& Data)
{
return Data.find( UserInfo( strUsername, strHostname ));
}
void InsertToMap(const std::string &strUsername, const std::string
&strHostname, int nCount, DataMap& Data)
{
UserInfo Entry( strUsername, strHostname );
DataMap::iterator it = Data.find( Entry );

if (it == Data.end())
{
Data.insert(std::make_pair( Entry, nCount ));
}
else
{
it->second += nCount;
}
}

DataSet::iterator FindInSet(const std::string &strUsername, const
std::string &strHostname, DataSet& Data)
{
return Data.find( UserInfo( strUsername, strHostname ));
}

void InsertToSet(const std::string &strUsername, const std::string
&strHostname, int nCount, DataSet& Data)
{
UserInfo Entry( strUsername, strHostname );
Entry.count = nCount;

DataSet::iterator it = Data.find( Entry );

if (it == Data.end())
{
Data.insert(Entry);
}
else
{
it->count += nCount;
}
}

std::string StrmConvert( int Number )
{
std::stringstream Temp;
Temp << Number;
std::string Output;
Temp >Output;
return Output;
}
int main ()
{
const int Entries = 100;

DataList ListData;
DataMap MapData;
DataSet SetData;

clock_t Start;
clock_t End;

// Insert to list
std::cout << "*** List ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToList( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, ListData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to list: " << End - Start << "ns" << "\n";
}
Start = clock();
FindInList( "Bogus", "Bogus", ListData );
End = clock();
std::cout << "Search for non existant entry in List: " << End - Start <<
"ns\n";

std::cout << "List size:" << ListData.size() << "\n";

// Insert to map
std::cout << "\n*** Map ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToMap( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, MapData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to map: " << End - Start << "ns" << "\n";
}
Start = clock();
// Map find is so fast one search reports 0ns
for ( int i = 0; i < 1000; ++i )
FindInMap( "Bogus", "Bogus", MapData );
End = clock();
std::cout << "Search for non existant entry in Map: " <<
static_cast<double>( End - Start ) / 1000.0 << "ns\n";
std::cout << "Map size:" << MapData.size() << "\n";

// Insert to set
std::cout << "\n*** Set ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToSet( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, SetData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to set: " << End - Start << "ns" << "\n";
}
Start = clock();
// Set find is so fast one search reports 0ns
for ( int i = 0; i < 1000; ++i )
FindInSet( "Bogus", "Bogus", SetData );
End = clock();
std::cout << "Search for non existant entry in Set: " <<
static_cast<double>( End - Start ) / 1000.0 << "ns\n";

std::cout << "Set size:" << SetData.size() << "\n";

}
Jun 13 '07 #5
and then...

std::map<Account, std::stringpasswords;

will work fine.
Thanks alot. After this change it seems only to take less than a second now.
So this certainly made a great impact.

-- Henrik
Jun 13 '07 #6
Zeppe wrote:
Henrik Goldman wrote:
>>But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger

Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?

No, it hasn't. Actually, you could write one for your problem, but at
this point it's better to wrap the information in a small class (or
struct, if you do not really need a class for this). For the map to
work, the operator< is needed. The simplest way is:

struct Account
{
std::string username;
std::string hostname;
bool operator<(const Account& a) const
{
return (username == a.username) ?
(hostname < hostname) :
(username < username);
}
};

and then...

std::map<Account, std::stringpasswords;

will work fine.

Regards,

Zeppe
Or you could also use boost::tuples, then you wouldn't need to define any
struct Account to define the operator< The tuples come with their own
comparisons if you include boost/tuple/tuple_comparison.hpp, then the
map will be able to insert with its operator< like this

#include <iostream>
#include <map>
#include "boost/tuple/tuple.hpp"
#include "boost/tuple/tuple_comparison.hpp"

using boost::tuple;
using boost::make_tuple;
using std::map;
using std::make_pair;
using std::string;
using std::cout;
using std::endl;
typedef map< tuple<string, string>, string PASSMAP;
typedef PASSMAP::iterator PASSMAP_ITER;

int main()
{
map< tuple<string, string>, string passwords;

passwords.insert(make_pair(make_tuple("host1", "user1"), "user3_password"));
passwords.insert(make_pair(make_tuple("host3", "userz"), "user2_password"));
passwords.insert(make_pair(make_tuple("host3", "usera"), "user3_password"));

for (PASSMAP_ITER pmi = passwords.begin();
pmi != passwords.end(); ++pmi)
{
cout << "host " << boost::tuples::get<0>((*pmi).first) << endl;
cout << "user " << boost::tuples::get<1>((*pmi).first) << endl;
cout << "passwd " << (*pmi).second << endl;
}

/* output is sorted by first tuple, then second tuple
host host1
user user1
passwd user3_password
host host3
user usera
passwd user3_password
host host3
user userz
passwd user2_password
*/
return 0;
}

Jun 14 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: rottyguy70 | last post by:
hello, am trying to do the following with little success. any help is appreciated. 1) assume i have a static mapping: a -> 1 b -> 2 c -> 3 2) i need to read through some data, let's say...
3
by: Steve Barnett | last post by:
I developed an app using VC# Express Beta that included an mdi parent and an mdi child form. Both forms contain a ToolStrip and, while using the Beta, these ToolStrips merged together as I expected...
2
by: Dave Taylor | last post by:
Is there a decent explanation of how menu merging with MDI forms work in VB.NET? I've read through the online help and it still seems that whenever I change menus around or whatever, it breaks...
5
by: Joe Fallon | last post by:
I have a list box with 7 text values in it. I have a pair of buttons to Move Up or Move Down the selected item one position. What is the simplest way to code the buttons so the item moves one...
2
by: Tom Costanza | last post by:
For the life of me, I can't seem to merge main menu items from child to parent with MDI forms. I can merge sub-menus, but not main menu items. So I have a parent form with: File Windows Help...
1
by: keveen | last post by:
Can someone tell me how I can import tables from another non-Joomla mysql file into Joomla? Basically it is just from one mySQL database into another. I use phpMyAdmin to import and export the entire...
6
by: dannylam4 | last post by:
Hello, I've got a question about merging/concatenating rows. There's a similar topic here: Combining Multiple Rows of one Field into One Result but I didn't know if I should hijack it. Basically, I...
14
by: etal | last post by:
Here's an algorithm question: How should I efficiently merge a collection of mostly similar lists, with different lengths and arbitrary contents, while eliminating duplicates and preserving order...
13
by: Joel Koltner | last post by:
Is there an easy way to get a list comprehension to produce a flat list of, say, for each input argument? E.g., I'd like to do something like: for x in range(4) ] ....and receive
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.