STL Map: Sorting options?

Steve Edwards

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

Thanks

Steve

Feb 21 '06 #1

Subscribe Post Reply

7535

Michiel.Salters

Steve Edwards wrote:

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
results. Functions have results=return values.
Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
That's one solution, if you often need that representation too, and the
keys
and values don't change. (basic caching pattern).
Or am I just using inappropriate container types to start with?
It's a good container to use when determining the int values, but it
may not
be the best container to keep those int values in.
As I'm dealing with very large text files, speed is a concern too.

Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)

HTH,
Michiel Salters

Feb 21 '06 #2

Daniel T.

In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote:

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.

Feb 21 '06 #3

Steve Edwards

I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify
less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
My mistake... I meant contents.
As I'm dealing with very large text files, speed is a concern too.

Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)

I've already loaded my large text file in to memory as a simple array
of strings (one per word/token). I'm doing various lexical analyses on
these data. For some of these functions I need to quickly look up the
word as the key (hence my choice of <map>) to get the associated data;
other functions then require that the value mapped to the key can be
retrieved rapidly in order (hence my original question).

Since reading your reply I've built a copy as a multimap with the
key/value swapped, and using both containers together seems to be
working quite well in retrieving by either key or value rapidly.
(Though I don't have an alternative strategy to compare speed with, so
who knows.)

Building the structures is naturally slower, now, but it is an
acceptable tradeoff.

------------------------

In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?

Thanks for your help.

Feb 21 '06 #4

Daniel T.

In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote:

In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?

++myConcordance[theWord];

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.

Feb 21 '06 #5

Steve Edwards

In article <po******************************@news.east.earthl ink.net>,
"Daniel T." <po********@earthlink.net> wrote:

In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.

Thanks, I'll try that, see if it's quicker for my lookup needs.

Feb 21 '06 #6

Daniel T.

In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote:

In article <po******************************@news.east.earthl ink.net>,
"Daniel T." <po********@earthlink.net> wrote:
I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.

Thanks, I'll try that, see if it's quicker for my lookup needs.

If nothing else, it will make it faster to determine how many words have
the same occurrence count.

--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.

Feb 21 '06 #7

Steve Edwards

In article <po******************************@news.east.earthl ink.net>,
"Daniel T." <po********@earthlink.net> wrote:

myConcordance[theWord] = myConcordance [theWord]+1;
++myConcordance[theWord];

Over 100 million insertions it dropped from 20s to 12.5s. Thanks.

"map<int, set<string> >"

If nothing else, it will make it faster to determine how many words
have
the same occurrence count.

That's a benefit, and it's cleaner to iterate through, too.

Feb 21 '06 #8

Neil Cerutti

On 2006-02-21, Steve Edwards <gf*@lineone.net> wrote:

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see
a way to sort the results by value. (Maybe there's a way to modify
less<>?) > Or... Should I create a second multimap (a copy) and
have the int as the key, and the string as the value, and then
trivaially keep that sorted.
Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int, string> > mirror_map(map<string, int> const& m)
{
vector<pair<int, string> > mirror;
for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i)
{
mirror.push_back(pair<int, string>(i->second, i->first));
}
std::sort(mirror.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int, string> >::iterator viter;
pair<viter, viter> word_range(vector<pair<int, string> >& v, int n)
{
return std::equal_range(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int, string> > mirror = mirror_map(words);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter, viter> range = word_range(mirror, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}
It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.

--
Neil Cerutti

Feb 21 '06 #9

Steve Edwards

In article <sl*********************@FIAD06.norwich.edu>,
Neil Cerutti <le*******@email.com> wrote:

Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int, string> > mirror_map(map<string, int> const& m)
{
vector<pair<int, string> > mirror;
for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i)
{
mirror.push_back(pair<int, string>(i->second, i->first));
}
std::sort(mirror.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int, string> >::iterator viter;
pair<viter, viter> word_range(vector<pair<int, string> >& v, int n)
{
return std::equal_range(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int, string> > mirror = mirror_map(words);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter, viter> range = word_range(mirror, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}
It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.

Whoa... thanks for this... I've been staring at it for 20 minutes trying
to get my head around it (I'm new to the stl.) Sorry I can't say
anything more constructive at the moment, until I can understand whether
doing it this way is going to be beneficial. Not having any STL text
book, I'd only found <map> code snippets on the web that seemed to suit
my needs. Having read this and googled for 'pair', I'm amazed I got as
far as I did without understanding them. (Up until now I'd been using
Objective-C which has NSDictionary and similar structures, that hide the
guts of this kind of stuff behind a lot of really intuitive convenience
functions... but I prefer C++ so here I am!)

Feb 22 '06 #10

Neil Cerutti

On 2006-02-22, Steve Edwards <gf*@lineone.net> wrote:

In article <sl*********************@FIAD06.norwich.edu>,
Neil Cerutti <le*******@email.com> wrote:

It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.

Whoa... thanks for this... I've been staring at it for 20 minutes
trying to get my head around it (I'm new to the stl.)

It basically copies the map of string->int into a vector, sorted by
int, of int->string.

I recall now that I used something like this in a small project a few
years ago, but I used a vector of *pointers* to map iterators. In
practice, the above solution is a lot more robust.
Sorry I can't
say anything more constructive at the moment, until I can understand
whether doing it this way is going to be beneficial. Not having any
STL text book, I'd only found <map> code snippets on the web that
seemed to suit my needs. Having read this and googled for 'pair',
I'm amazed I got as far as I did without understanding them. (Up
until now I'd been using Objective-C which has NSDictionary and
similar structures, that hide the guts of this kind of stuff behind
a lot of really intuitive convenience functions... but I prefer C++
so here I am!)

The way you're doing it now is clearer and easier. The vector mirror
could be considered a lower-level optimisation, which it probably
turns out you do not need.

--
Neil Cerutti
If you throw at someone's head, it's very dangerous, because in
the head is the brain. --Pudge Rodriguez

Feb 22 '06 #11

by: Diego Barros | last post by:

Hello, I was wondering if it was posibble to sort a map using the values and not the keys. Sorting on the keys (if they were strings, for example) is currently possible. What about, for example,...

C / C++

non-sorted map?

by: Aaron Walker | last post by:

I want a std::map<std::string, std::string> but I don't want it sorted by keys. I've been able to simulate this with a vector of pairs and operator. class umap : public...

C / C++

Problem of sorting table

by: Ross | last post by:

The following codes originally provided by a kind responder Toby A Inkster in another newsgroup for working on displaying a table on web capable of sorting are modified by me (a newbie). How to...

PHP

should python have a sort list-map object in the std-lib?

by: Tim Henderson | last post by:

Hi The question why are there no sorted dictionaries in python, seems to pop up with unseeming regularity. That question in itself in nonsensical sense dictionaries are hash-maps, however should...

Python

changing the sorting behaviour of std::map

by: Gernot Frisch | last post by:

Hi, I know I can give a std::map a sorting function in the c'tor. However, I would need to _change_ the sorting behaviour of an existing instance. How would I do that? Thank you, --

C / C++

STL: map vs vector

by: zaineb | last post by:

Hi, This is a follow-up of sort of this thread:...

C / C++

3-tier design; sorting and filtering object-based data

by: Sjaakie | last post by:

Hi, I'm, what it turns out to be, fooling around with 3-tier design. At several websites people get really enthusiastic about using custom dataobjects instead of datasets/-tables. While trying to...

ASP.NET

stl map - the lowest and greatest key

by: tomek milewski | last post by:

Hello, I have a map with keys that can be compared each other. I need a method that returns the lowest and the greatest key from that map. Now I'm using begin() and rbegin() which gives...

C / C++

can I rely on the sorting of std::map<>

by: jabbah | last post by:

I have some data in a map and I want to sort it. Currently I have implemented it like this: #include <iostream> #include <map> #include <string> using namespace std; int main(){

C / C++

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

STL Map: Sorting options?

Similar topics