473,609 Members | 2,187 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Merging a list of similar items

Hello,

I have a dataset which consist of a string username and string hostname as a
key and then an integer representing a count as the matching "second" value
in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has datasets
in the range of 12000 records and my list seems to be taking like forever to
complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername , const string &strHostname )
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompa re(it->m_strUsernam e, strUsername) &&
StrNoCaseCompar e(it->m_strHostnam e, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername , const string &strHostname , int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsernam e, strHostname);

if (it == end())
{
DU.m_strUsernam e = strUsername;
DU.m_strHostnam e = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<userna me,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?

Thanks.

-- Henrik
Jun 13 '07 #1
6 1803
hi:

stl::map is much more suitable for your requirement.

As you mentioned the total size of dataset is around 12000,
it is a very big number and usually the stl::list is implemented in
linkedlink.
It will travers from the begin to the end, this means the searching
time complexity is O(n).

But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger
On Jun 13, 3:03 pm, "Henrik Goldman" <henrik_gold... @mail.tele.dk>
wrote:
Hello,

I have a dataset which consist of a string username and string hostname as a
key and then an integer representing a count as the matching "second" value
in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has datasets
in the range of 12000 records and my list seems to be taking like forever to
complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername , const string &strHostname )
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompa re(it->m_strUsernam e, strUsername) &&
StrNoCaseCompar e(it->m_strHostnam e, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername , const string &strHostname , int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsernam e, strHostname);

if (it == end())
{
DU.m_strUsernam e = strUsername;
DU.m_strHostnam e = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<userna me,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?

Thanks.

-- Henrik

Jun 13 '07 #2
But map is always implemented in RB tree, the complexity is O(lg(n))
>
It is much fast if the size is bigger
Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?

-- Henrik
Jun 13 '07 #3
Henrik Goldman wrote:
>But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger

Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?
No, it hasn't. Actually, you could write one for your problem, but at
this point it's better to wrap the information in a small class (or
struct, if you do not really need a class for this). For the map to
work, the operator< is needed. The simplest way is:

struct Account
{
std::string username;
std::string hostname;
bool operator<(const Account& a) const
{
return (username == a.username) ?
(hostname < hostname) :
(username < username);
}
};

and then...

std::map<Accoun t, std::stringpass words;

will work fine.

Regards,

Zeppe
Jun 13 '07 #4
"Henrik Goldman" <he************ @mail.tele.dkwr ote in message
news:46******** *************** @dread11.news.t ele.dk...
Hello,

I have a dataset which consist of a string username and string hostname as
a key and then an integer representing a count as the matching "second"
value in a pair.

So far I've used std::list to store this information by inserting objects
containing all 3 pieces of information. However now some users has
datasets in the range of 12000 records and my list seems to be taking like
forever to complete inserting and finding.
For this reason I'm looking for ways to speed this up.

Here is a bit of code from the existing solution (which works but is too
slow):

iterator Find(const string &strUsername , const string &strHostname )
{
iterator it;

for (it = begin(); it != end(); it++)
{
if (StrNoCaseCompa re(it->m_strUsernam e, strUsername) &&
StrNoCaseCompar e(it->m_strHostnam e, strHostname))
return it;
}

return end();
}

void Insert(const string &strUsername , const string &strHostname , int
nCount)
{
CDailyUsage DU;

iterator it = Find(strUsernam e, strHostname);

if (it == end())
{
DU.m_strUsernam e = strUsername;
DU.m_strHostnam e = strHostname;
DU.m_nCount = nCount;

PushBack(DU);
}
else
{
it->m_nCount += nCount;
}
}

I'm thinking about replacing it with map<pair<userna me,hostname>, ncount>
but I don't know how much this will give me.
Are anyone up for good ideas?
A std::map should give you a whole lot. A list is unindexed and for every
search you are iteratirng though every possible record. A map is indexed (I
believe they use binary trees or better) and searches are a lot faster, a
WHOLE lot faster. The difficulty comes in your strNoCase compare, but you
can get around that by making the key a class with an operator<..

The way you are using this you probably want a set anyway since your data is
in your class/structure.

Output of following program showing your type of functions for List, Map and
Set is:

*** List ***
Inserting group 0 100 unique entries to list: 94ns
Inserting group 1 100 unique entries to list: 266ns
Inserting group 2 100 unique entries to list: 343ns
Inserting group 3 100 unique entries to list: 469ns
Inserting group 4 100 unique entries to list: 609ns
Inserting group 5 100 unique entries to list: 719ns
Inserting group 6 100 unique entries to list: 860ns
Inserting group 7 100 unique entries to list: 1000ns
Inserting group 8 100 unique entries to list: 1171ns
Inserting group 9 100 unique entries to list: 1250ns
Search for non existant entry in List: 16ns
List size:1000

*** Map ***
Inserting group 0 100 unique entries to map: 47ns
Inserting group 1 100 unique entries to map: 62ns
Inserting group 2 100 unique entries to map: 63ns
Inserting group 3 100 unique entries to map: 78ns
Inserting group 4 100 unique entries to map: 78ns
Inserting group 5 100 unique entries to map: 94ns
Inserting group 6 100 unique entries to map: 94ns
Inserting group 7 100 unique entries to map: 94ns
Inserting group 8 100 unique entries to map: 78ns
Inserting group 9 100 unique entries to map: 94ns
Search for non existant entry in Map: 0.485ns
Map size:1000

*** Set ***
Inserting group 0 100 unique entries to set: 46ns
Inserting group 1 100 unique entries to set: 79ns
Inserting group 2 100 unique entries to set: 78ns
Inserting group 3 100 unique entries to set: 78ns
Inserting group 4 100 unique entries to set: 78ns
Inserting group 5 100 unique entries to set: 78ns
Inserting group 6 100 unique entries to set: 93ns
Inserting group 7 100 unique entries to set: 94ns
Inserting group 8 100 unique entries to set: 94ns
Inserting group 9 100 unique entries to set: 94ns
Search for non existant entry in Set: 0.531ns
Set size:1000

The main thing to notice is the search times. For your list it's a linear
search, takes longer and longer the more entries you have. For map and set,
they use indexes, so QUITE a bit faster. Here is program: (Please note
that followoing program is not optmized, I just tried to follow the way you
were doing things with maps and sets also).

#include <iostream>
#include <string>
#include <ctime>
#include <cstring>
#include <utility>
#include <sstream>

#include <list>
#include <map>
#include <set>

std::string Lowercase( const std::string& MixedString )
{
std::string ReturnValue = "";
for ( std::string::si ze_type i = 0; i < MixedString.siz e(); ++i )
ReturnValue += static_cast<cha r>( tolower( MixedString[i] ) );
return ReturnValue;
}

bool StrNoCaseCompar e( const std::string& Str1, const std::string& Str2 )
{
if ( Lowercase( Str1 ) == Lowercase( Str2 ) )
return true;

return false;
}
class UserInfo
{
public:
UserInfo( const std::string& Username, const std::string& Hostname ):
Username( Username ), Hostname( Hostname ) {}
std::string Username;
std::string Hostname;
int count; // Only used in set
};

bool operator<( const UserInfo& lhs, const UserInfo& rhs )
{
// case insensitve compare
// use your fastest method, this one just thrown together
std::string Userlhs = Lowercase( lhs.Username );
std::string Hostlhs = Lowercase( lhs.Hostname );
std::string Userrhs = Lowercase( rhs.Username );
std::string Hostrhs = Lowercase( rhs.Hostname );

if ( Userlhs == Userrhs )
if ( Hostlhs < Hostrhs )
return true;
else
return false;
else if ( Userlhs < Userrhs )
return true;
else
return false;
}
typedef std::list<std:: pair<UserInfo, int DataList;
typedef std::map<UserIn fo, int DataMap;
typedef std::set<UserIn foDataSet;

DataList::itera tor FindInList(cons t std::string &strUsername , const
std::string &strHostname , DataList& Data)
{
DataList::itera tor it;

for (DataList::iter ator it = Data.begin(); it != Data.end(); ++it)
{
if (StrNoCaseCompa re(it->first.Username , strUsername) &&
StrNoCaseCompar e(it->first.Hostname , strHostname))
return it;
}

return Data.end();
}

void InsertToList(co nst std::string &strUsername , const std::string
&strHostname , int nCount, DataList& Data)
{
DataList::itera tor it = FindInList(strU sername, strHostname, Data);

if (it == Data.end())
{
Data.push_back( std::make_pair( UserInfo( strUsername, strHostname ),
nCount ));
}
else
{
it->second += nCount;
}
}

DataMap::iterat or FindInMap(const std::string &strUsername , const
std::string &strHostname , DataMap& Data)
{
return Data.find( UserInfo( strUsername, strHostname ));
}
void InsertToMap(con st std::string &strUsername , const std::string
&strHostname , int nCount, DataMap& Data)
{
UserInfo Entry( strUsername, strHostname );
DataMap::iterat or it = Data.find( Entry );

if (it == Data.end())
{
Data.insert(std ::make_pair( Entry, nCount ));
}
else
{
it->second += nCount;
}
}

DataSet::iterat or FindInSet(const std::string &strUsername , const
std::string &strHostname , DataSet& Data)
{
return Data.find( UserInfo( strUsername, strHostname ));
}

void InsertToSet(con st std::string &strUsername , const std::string
&strHostname , int nCount, DataSet& Data)
{
UserInfo Entry( strUsername, strHostname );
Entry.count = nCount;

DataSet::iterat or it = Data.find( Entry );

if (it == Data.end())
{
Data.insert(Ent ry);
}
else
{
it->count += nCount;
}
}

std::string StrmConvert( int Number )
{
std::stringstre am Temp;
Temp << Number;
std::string Output;
Temp >Output;
return Output;
}
int main ()
{
const int Entries = 100;

DataList ListData;
DataMap MapData;
DataSet SetData;

clock_t Start;
clock_t End;

// Insert to list
std::cout << "*** List ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToList( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, ListData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to list: " << End - Start << "ns" << "\n";
}
Start = clock();
FindInList( "Bogus", "Bogus", ListData );
End = clock();
std::cout << "Search for non existant entry in List: " << End - Start <<
"ns\n";

std::cout << "List size:" << ListData.size() << "\n";

// Insert to map
std::cout << "\n*** Map ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToMap( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, MapData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to map: " << End - Start << "ns" << "\n";
}
Start = clock();
// Map find is so fast one search reports 0ns
for ( int i = 0; i < 1000; ++i )
FindInMap( "Bogus", "Bogus", MapData );
End = clock();
std::cout << "Search for non existant entry in Map: " <<
static_cast<dou ble>( End - Start ) / 1000.0 << "ns\n";
std::cout << "Map size:" << MapData.size() << "\n";

// Insert to set
std::cout << "\n*** Set ***\n";
for ( int Group = 0; Group < 10; ++Group )
{
std::string GroupPrefix = StrmConvert( Group );
Start = clock();
for ( int i = 0; i < Entries; ++i )
{
InsertToSet( GroupPrefix + StrmConvert( i ), GroupPrefix +
StrmConvert( i ), i, SetData );
}
End = clock();
std::cout << "Inserting group " << Group << " " << Entries << "
unique entries to set: " << End - Start << "ns" << "\n";
}
Start = clock();
// Set find is so fast one search reports 0ns
for ( int i = 0; i < 1000; ++i )
FindInSet( "Bogus", "Bogus", SetData );
End = clock();
std::cout << "Search for non existant entry in Set: " <<
static_cast<dou ble>( End - Start ) / 1000.0 << "ns\n";

std::cout << "Set size:" << SetData.size() << "\n";

}
Jun 13 '07 #5
and then...

std::map<Accoun t, std::stringpass words;

will work fine.
Thanks alot. After this change it seems only to take less than a second now.
So this certainly made a great impact.

-- Henrik
Jun 13 '07 #6
Zeppe wrote:
Henrik Goldman wrote:
>>But map is always implemented in RB tree, the complexity is O(lg(n))

It is much fast if the size is bigger

Do you know if the construction I wrote is valid?

E.g. how would I do inserts and lookups in a map with a pair as 'first'.
Does pair<have a proper compare function?

No, it hasn't. Actually, you could write one for your problem, but at
this point it's better to wrap the information in a small class (or
struct, if you do not really need a class for this). For the map to
work, the operator< is needed. The simplest way is:

struct Account
{
std::string username;
std::string hostname;
bool operator<(const Account& a) const
{
return (username == a.username) ?
(hostname < hostname) :
(username < username);
}
};

and then...

std::map<Accoun t, std::stringpass words;

will work fine.

Regards,

Zeppe
Or you could also use boost::tuples, then you wouldn't need to define any
struct Account to define the operator< The tuples come with their own
comparisons if you include boost/tuple/tuple_compariso n.hpp, then the
map will be able to insert with its operator< like this

#include <iostream>
#include <map>
#include "boost/tuple/tuple.hpp"
#include "boost/tuple/tuple_compariso n.hpp"

using boost::tuple;
using boost::make_tup le;
using std::map;
using std::make_pair;
using std::string;
using std::cout;
using std::endl;
typedef map< tuple<string, string>, string PASSMAP;
typedef PASSMAP::iterat or PASSMAP_ITER;

int main()
{
map< tuple<string, string>, string passwords;

passwords.inser t(make_pair(mak e_tuple("host1" , "user1"), "user3_password "));
passwords.inser t(make_pair(mak e_tuple("host3" , "userz"), "user2_password "));
passwords.inser t(make_pair(mak e_tuple("host3" , "usera"), "user3_password "));

for (PASSMAP_ITER pmi = passwords.begin ();
pmi != passwords.end() ; ++pmi)
{
cout << "host " << boost::tuples:: get<0>((*pmi).f irst) << endl;
cout << "user " << boost::tuples:: get<1>((*pmi).f irst) << endl;
cout << "passwd " << (*pmi).second << endl;
}

/* output is sorted by first tuple, then second tuple
host host1
user user1
passwd user3_password
host host3
user usera
passwd user3_password
host host3
user userz
passwd user2_password
*/
return 0;
}

Jun 14 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
1474
by: rottyguy70 | last post by:
hello, am trying to do the following with little success. any help is appreciated. 1) assume i have a static mapping: a -> 1 b -> 2 c -> 3 2) i need to read through some data, let's say set and create
3
7376
by: Steve Barnett | last post by:
I developed an app using VC# Express Beta that included an mdi parent and an mdi child form. Both forms contain a ToolStrip and, while using the Beta, these ToolStrips merged together as I expected them to. Now, I've updated to the final release of VC# Express and my ToolStrips no longer merge. What happens now is that the ToolStrip on the mdiparent displays and the ToolStrip from the mdichild never gets displayed at all. Does anyone...
2
2420
by: Dave Taylor | last post by:
Is there a decent explanation of how menu merging with MDI forms work in VB.NET? I've read through the online help and it still seems that whenever I change menus around or whatever, it breaks everything. VB6, as repetitive as it was to retype menus, was at least consistent. It seems that VB.NET throws menu items wherever it feels like.
5
24765
by: Joe Fallon | last post by:
I have a list box with 7 text values in it. I have a pair of buttons to Move Up or Move Down the selected item one position. What is the simplest way to code the buttons so the item moves one position? -- Joe Fallon
2
1355
by: Tom Costanza | last post by:
For the life of me, I can't seem to merge main menu items from child to parent with MDI forms. I can merge sub-menus, but not main menu items. So I have a parent form with: File Windows Help And a child form with: Edit View And when the child form is displayed, I want the menu to be:
1
1577
by: keveen | last post by:
Can someone tell me how I can import tables from another non-Joomla mysql file into Joomla? Basically it is just from one mySQL database into another. I use phpMyAdmin to import and export the entire file but I don't know how to do queries. I tried exporting the source database and then renaming all the database names to match the ones I want to merge into but all that happened was a new table was created - no merging. I really just want to...
6
2121
by: dannylam4 | last post by:
Hello, I've got a question about merging/concatenating rows. There's a similar topic here: Combining Multiple Rows of one Field into One Result but I didn't know if I should hijack it. Basically, I have a single table in Access that looks like: Name - - - Address - -Email - - - - -Comments John Doe - 11211 - - - j2@g.com - lad John Doe - 41541 - - - q3@g.com -asd John Doe - 12345 - - - w2@g.com -ask And what I basically want it to look...
14
4004
by: etal | last post by:
Here's an algorithm question: How should I efficiently merge a collection of mostly similar lists, with different lengths and arbitrary contents, while eliminating duplicates and preserving order as much as possible? My code: def merge_to_unique(sources): """Merge the unique elements from each list in sources into new list.
13
1508
by: Joel Koltner | last post by:
Is there an easy way to get a list comprehension to produce a flat list of, say, for each input argument? E.g., I'd like to do something like: for x in range(4) ] ....and receive
0
8109
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8035
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
1
8188
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8374
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
6969
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5502
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4002
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
2502
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
0
1366
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.