473,810 Members | 2,935 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

STL Map: Sorting options?

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

Thanks

Steve
Feb 21 '06 #1
10 7571

Steve Edwards wrote:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
results. Functions have results=return values.
Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
That's one solution, if you often need that representation too, and the
keys
and values don't change. (basic caching pattern).
Or am I just using inappropriate container types to start with?
It's a good container to use when determining the int values, but it
may not
be the best container to keep those int values in.
As I'm dealing with very large text files, speed is a concern too.


Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)

HTH,
Michiel Salters

Feb 21 '06 #2
In article <gf************ ***********@new s.btinternet.co m>,
Steve Edwards <gf*@lineone.ne t> wrote:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.


I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
Feb 21 '06 #3
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify
less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
My mistake... I meant contents.
As I'm dealing with very large text files, speed is a concern too.

Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)


I've already loaded my large text file in to memory as a simple array
of strings (one per word/token). I'm doing various lexical analyses on
these data. For some of these functions I need to quickly look up the
word as the key (hence my choice of <map>) to get the associated data;
other functions then require that the value mapped to the key can be
retrieved rapidly in order (hence my original question).

Since reading your reply I've built a copy as a multimap with the
key/value swapped, and using both containers together seems to be
working quite well in retrieving by either key or value rapidly.
(Though I don't have an alternative strategy to compare speed with, so
who knows.)

Building the structures is naturally slower, now, but it is an
acceptable tradeoff.

------------------------

In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?

Thanks for your help.
Feb 21 '06 #4
In article <gf************ ***********@new s.btinternet.co m>,
Steve Edwards <gf*@lineone.ne t> wrote:
In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?


++myConcordance[theWord];

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
Feb 21 '06 #5
In article <po************ *************** ***@news.east.e arthlink.net>,
"Daniel T." <po********@ear thlink.net> wrote:
In article <gf************ ***********@new s.btinternet.co m>,
Steve Edwards <gf*@lineone.ne t> wrote:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.


I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.


Thanks, I'll try that, see if it's quicker for my lookup needs.
Feb 21 '06 #6
In article <gf************ ***********@new s.btinternet.co m>,
Steve Edwards <gf*@lineone.ne t> wrote:
In article <po************ *************** ***@news.east.e arthlink.net>,
"Daniel T." <po********@ear thlink.net> wrote:
I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.


Thanks, I'll try that, see if it's quicker for my lookup needs.


If nothing else, it will make it faster to determine how many words have
the same occurrence count.

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
Feb 21 '06 #7
In article <po************ *************** ***@news.east.e arthlink.net>,
"Daniel T." <po********@ear thlink.net> wrote:

myConcordance[theWord] = myConcordance [theWord]+1;
++myConcordance[theWord];


Over 100 million insertions it dropped from 20s to 12.5s. Thanks.

"map<int, set<string> >"

If nothing else, it will make it faster to determine how many words
have
the same occurrence count.


That's a benefit, and it's cleaner to iterate through, too.
Feb 21 '06 #8
On 2006-02-21, Steve Edwards <gf*@lineone.ne t> wrote:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see
a way to sort the results by value. (Maybe there's a way to modify
less<>?) > Or... Should I create a second multimap (a copy) and
have the int as the key, and the string as the value, and then
trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.


Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair <int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int , string> > mirror_map(map< string, int> const& m)
{
vector<pair<int , string> > mirror;
for (map<string, int>::const_ite rator i = m.begin(); i != m.end(); ++i)
{
mirror.push_bac k(pair<int, string>(i->second, i->first));
}
std::sort(mirro r.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int , string> >::iterator viter;
pair<viter, viter> word_range(vect or<pair<int, string> >& v, int n)
{
return std::equal_rang e(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int , string> > mirror = mirror_map(word s);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter , viter> range = word_range(mirr or, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}
It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.

--
Neil Cerutti
Feb 21 '06 #9
In article <sl************ *********@FIAD0 6.norwich.edu>,
Neil Cerutti <le*******@emai l.com> wrote:
Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair <int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int , string> > mirror_map(map< string, int> const& m)
{
vector<pair<int , string> > mirror;
for (map<string, int>::const_ite rator i = m.begin(); i != m.end(); ++i)
{
mirror.push_bac k(pair<int, string>(i->second, i->first));
}
std::sort(mirro r.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int , string> >::iterator viter;
pair<viter, viter> word_range(vect or<pair<int, string> >& v, int n)
{
return std::equal_rang e(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int , string> > mirror = mirror_map(word s);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter , viter> range = word_range(mirr or, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}
It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.

Whoa... thanks for this... I've been staring at it for 20 minutes trying
to get my head around it (I'm new to the stl.) Sorry I can't say
anything more constructive at the moment, until I can understand whether
doing it this way is going to be beneficial. Not having any STL text
book, I'd only found <map> code snippets on the web that seemed to suit
my needs. Having read this and googled for 'pair', I'm amazed I got as
far as I did without understanding them. (Up until now I'd been using
Objective-C which has NSDictionary and similar structures, that hide the
guts of this kind of stuff behind a lot of really intuitive convenience
functions... but I prefer C++ so here I am!)
Feb 22 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
17680
by: Diego Barros | last post by:
Hello, I was wondering if it was posibble to sort a map using the values and not the keys. Sorting on the keys (if they were strings, for example) is currently possible. What about, for example, if I have a map with string keys and object values, can I have custom sorting of the items in the map based on member variables of said object? Any help would be appreciated. Thanks.
4
3869
by: Aaron Walker | last post by:
I want a std::map<std::string, std::string> but I don't want it sorted by keys. I've been able to simulate this with a vector of pairs and operator. class umap : public std::vector<std::pair<std::string, std::string > > { public: std::string &operator (std::string s) { for (iterator i = begin() ; i != end() ; ++i) if (i->first == s)
1
1654
by: Ross | last post by:
The following codes originally provided by a kind responder Toby A Inkster in another newsgroup for working on displaying a table on web capable of sorting are modified by me (a newbie). How to disable sorting by say, length and chart (last few codes at the end)? I find the order is important to display so cannot just single them out. <?php function insert_datatable_cmp ($a, $b) { return ($a]<$b]) ? -1 : 1; }
4
2432
by: Tim Henderson | last post by:
Hi The question why are there no sorted dictionaries in python, seems to pop up with unseeming regularity. That question in itself in nonsensical sense dictionaries are hash-maps, however should python have a sorted map type object is a good question. clearly many people like have a sorted map, and sorting the keys every time seems rather wasteful, as does keeping a separate sorted list of the keys.
2
1918
by: Gernot Frisch | last post by:
Hi, I know I can give a std::map a sorting function in the c'tor. However, I would need to _change_ the sorting behaviour of an existing instance. How would I do that? Thank you, --
13
20166
by: zaineb | last post by:
Hi, This is a follow-up of sort of this thread: http://groups.google.com/group/comp.lang.c++/browse_thread/thread/de12af6539e0d645/662cab752779f1fa?lnk=st&q=c%2B%2B+vector+vs+map&rnum=1&hl=en#662cab752779f1fa I've been chewing on this problem for a while. When I came across the above thread, I figured that maybe other members have a better idea on how to approach this. I'm currently trying to decide between which one of maps or...
10
2797
by: Sjaakie | last post by:
Hi, I'm, what it turns out to be, fooling around with 3-tier design. At several websites people get really enthusiastic about using custom dataobjects instead of datasets/-tables. While trying to write such layers myself I got stuck on how to get filtered or sorted data from the data-layer. This is what I got: Objects
4
2928
by: tomek milewski | last post by:
Hello, I have a map with keys that can be compared each other. I need a method that returns the lowest and the greatest key from that map. Now I'm using begin() and rbegin() which gives iterators to first and last element of the map. My question is: Does begin() and rbegin() guarantee that returned iterators point on the pair with lowest and gratest value?
2
2791
by: jabbah | last post by:
I have some data in a map and I want to sort it. Currently I have implemented it like this: #include <iostream> #include <map> #include <string> using namespace std; int main(){
0
9722
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10644
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10379
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10124
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9200
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6882
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5690
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4334
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
3015
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.