Hi,
I'm using a map with the key of type string, and value of type int.
typedef map<string, int, less<string> >concordance;
I'm finding words within some text and keeping a count of their
frequency.
I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)
Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?
As I'm dealing with very large text files, speed is a concern too.
Thanks
Steve 10 7535
Steve Edwards wrote: Hi,
I'm using a map with the key of type string, and value of type int.
typedef map<string, int, less<string> >concordance;
I'm finding words within some text and keeping a count of their frequency.
I'm new to STL, but I've just read through the headers and can't see a way to sort the results by value. (Maybe there's a way to modify less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
results. Functions have results=return values.
Or... Should I create a second multimap (a copy) and have the int as the key, and the string as the value, and then trivaially keep that sorted.
That's one solution, if you often need that representation too, and the
keys
and values don't change. (basic caching pattern).
Or am I just using inappropriate container types to start with?
It's a good container to use when determining the int values, but it
may not
be the best container to keep those int values in.
As I'm dealing with very large text files, speed is a concern too.
Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)
HTH,
Michiel Salters
In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote: Hi,
I'm using a map with the key of type string, and value of type int.
typedef map<string, int, less<string> >concordance;
I'm finding words within some text and keeping a count of their frequency.
I'm new to STL, but I've just read through the headers and can't see a way to sort the results by value. (Maybe there's a way to modify less<>?)
Or... Should I create a second multimap (a copy) and have the int as the key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?
As I'm dealing with very large text files, speed is a concern too.
I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.
--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment. I'm new to STL, but I've just read through the headers and can't see a way to sort the results by value. (Maybe there's a way to modify less<>?)
"Results" is a misleading word here. std::map<Key, Value> is used to keep one (composite) value per key. It's a container, and containers do not have
My mistake... I meant contents.
As I'm dealing with very large text files, speed is a concern too.
Speed of what? Building the collection, or using it? And if the latter, how? (And if you talk files, you often are I/O limited anyway)
I've already loaded my large text file in to memory as a simple array
of strings (one per word/token). I'm doing various lexical analyses on
these data. For some of these functions I need to quickly look up the
word as the key (hence my choice of <map>) to get the associated data;
other functions then require that the value mapped to the key can be
retrieved rapidly in order (hence my original question).
Since reading your reply I've built a copy as a multimap with the
key/value swapped, and using both containers together seems to be
working quite well in retrieving by either key or value rapidly.
(Though I don't have an alternative strategy to compare speed with, so
who knows.)
Building the structures is naturally slower, now, but it is an
acceptable tradeoff.
------------------------
In my original map
typedef map<string, int, less<string> >concordance;
every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:
myConcordance[theWord] = myConcordance [theWord]+1;
It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?
Thanks for your help.
In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote: In my original map
typedef map<string, int, less<string> >concordance;
every time I find another occurrence of the string key, is there a quicker way to increment it's value count, than:
myConcordance[theWord] = myConcordance [theWord]+1;
It seems I'm doing 2 lookups of [theWord], can I change the value in-place instead?
++myConcordance[theWord];
--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
In article <po******************************@news.east.earthl ink.net>,
"Daniel T." <po********@earthlink.net> wrote: In article <gf***********************@news.btinternet.com>, Steve Edwards <gf*@lineone.net> wrote:
Hi,
I'm using a map with the key of type string, and value of type int.
typedef map<string, int, less<string> >concordance;
I'm finding words within some text and keeping a count of their frequency.
I'm new to STL, but I've just read through the headers and can't see a way to sort the results by value. (Maybe there's a way to modify less<>?)
Or... Should I create a second multimap (a copy) and have the int as the key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?
As I'm dealing with very large text files, speed is a concern too.
I'm think rather than a multimap, one could use "map<int, set<string> >" That way, the strings will stay alphabetized.
Thanks, I'll try that, see if it's quicker for my lookup needs.
In article <gf***********************@news.btinternet.com>,
Steve Edwards <gf*@lineone.net> wrote: In article <po******************************@news.east.earthl ink.net>, "Daniel T." <po********@earthlink.net> wrote:
I'm think rather than a multimap, one could use "map<int, set<string> >" That way, the strings will stay alphabetized.
Thanks, I'll try that, see if it's quicker for my lookup needs.
If nothing else, it will make it faster to determine how many words have
the same occurrence count.
--
Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.
In article <po******************************@news.east.earthl ink.net>,
"Daniel T." <po********@earthlink.net> wrote: myConcordance[theWord] = myConcordance [theWord]+1;
++myConcordance[theWord];
Over 100 million insertions it dropped from 20s to 12.5s. Thanks.
"map<int, set<string> >"
If nothing else, it will make it faster to determine how many words have the same occurrence count.
That's a benefit, and it's cleaner to iterate through, too.
On 2006-02-21, Steve Edwards <gf*@lineone.net> wrote: Hi,
I'm using a map with the key of type string, and value of type int.
typedef map<string, int, less<string> >concordance;
I'm finding words within some text and keeping a count of their frequency.
I'm new to STL, but I've just read through the headers and can't see a way to sort the results by value. (Maybe there's a way to modify less<>?) > Or... Should I create a second multimap (a copy) and have the int as the key, and the string as the value, and then trivaially keep that sorted.
Or am I just using inappropriate container types to start with?
As I'm dealing with very large text files, speed is a concern too.
Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.
#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>
using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;
bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}
vector<pair<int, string> > mirror_map(map<string, int> const& m)
{
vector<pair<int, string> > mirror;
for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i)
{
mirror.push_back(pair<int, string>(i->second, i->first));
}
std::sort(mirror.begin(), mirror.end(), less_first);
return mirror;
}
typedef vector<pair<int, string> >::iterator viter;
pair<viter, viter> word_range(vector<pair<int, string> >& v, int n)
{
return std::equal_range(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}
int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;
vector<pair<int, string> > mirror = mirror_map(words);
// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";
// The part of the mirror with value 4.
std::pair<viter, viter> range = word_range(mirror, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}
It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.
--
Neil Cerutti
In article <sl*********************@FIAD06.norwich.edu>,
Neil Cerutti <le*******@email.com> wrote: Here's another idea, which turned out to be more complicated than I at first thought. The original plan was to store actual map iterators in the vector, but then calling equal range became problematic, because I had no iterator to pass in as the value to search for.
#include <iostream> #include <iterator> #include <vector> #include <map> #include <string> #include <algorithm>
using std::map; using std::string; using std::vector; using std::cout; using std::pair;
bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs) { return lhs.first < rhs.first; }
vector<pair<int, string> > mirror_map(map<string, int> const& m) { vector<pair<int, string> > mirror; for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i) { mirror.push_back(pair<int, string>(i->second, i->first)); } std::sort(mirror.begin(), mirror.end(), less_first); return mirror; }
typedef vector<pair<int, string> >::iterator viter; pair<viter, viter> word_range(vector<pair<int, string> >& v, int n) { return std::equal_range(v.begin(), v.end(), pair<int, string>(n, ""), less_first); }
int main() { map<string, int> words; words["love"] = 4; words["like"] = 3; words["the"] = 10; words["hate"] = 4; words["toodles"] = 1;
vector<pair<int, string> > mirror = mirror_map(words);
// The whole mirror for (viter i = mirror.begin() ; i != mirror.end() ; ++i) { cout << i->first << ": " << i->second << "\n"; } std::cout << "----\n";
// The part of the mirror with value 4. std::pair<viter, viter> range = word_range(mirror, 4); for (viter i = range.first ; i != range.second ; ++i) { cout << i->first << ": " << i->second << "\n"; } return 0; }
It is more lightweight than a multimap<int, string>, but not by as much as I had originally hoped. I assumed that the map would not change after it had been built up, i.e., you don't need the mirror while building the map.
Whoa... thanks for this... I've been staring at it for 20 minutes trying
to get my head around it (I'm new to the stl.) Sorry I can't say
anything more constructive at the moment, until I can understand whether
doing it this way is going to be beneficial. Not having any STL text
book, I'd only found <map> code snippets on the web that seemed to suit
my needs. Having read this and googled for 'pair', I'm amazed I got as
far as I did without understanding them. (Up until now I'd been using
Objective-C which has NSDictionary and similar structures, that hide the
guts of this kind of stuff behind a lot of really intuitive convenience
functions... but I prefer C++ so here I am!)
On 2006-02-22, Steve Edwards <gf*@lineone.net> wrote: In article <sl*********************@FIAD06.norwich.edu>, Neil Cerutti <le*******@email.com> wrote: It is more lightweight than a multimap<int, string>, but not by as much as I had originally hoped. I assumed that the map would not change after it had been built up, i.e., you don't need the mirror while building the map.
Whoa... thanks for this... I've been staring at it for 20 minutes trying to get my head around it (I'm new to the stl.)
It basically copies the map of string->int into a vector, sorted by
int, of int->string.
I recall now that I used something like this in a small project a few
years ago, but I used a vector of *pointers* to map iterators. In
practice, the above solution is a lot more robust.
Sorry I can't say anything more constructive at the moment, until I can understand whether doing it this way is going to be beneficial. Not having any STL text book, I'd only found <map> code snippets on the web that seemed to suit my needs. Having read this and googled for 'pair', I'm amazed I got as far as I did without understanding them. (Up until now I'd been using Objective-C which has NSDictionary and similar structures, that hide the guts of this kind of stuff behind a lot of really intuitive convenience functions... but I prefer C++ so here I am!)
The way you're doing it now is clearer and easier. The vector mirror
could be considered a lower-level optimisation, which it probably
turns out you do not need.
--
Neil Cerutti
If you throw at someone's head, it's very dangerous, because in
the head is the brain. --Pudge Rodriguez This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Diego Barros |
last post by:
Hello,
I was wondering if it was posibble to sort a map using the values and not
the keys. Sorting on the keys (if they were strings, for example) is
currently possible. What about, for example,...
|
by: Aaron Walker |
last post by:
I want a std::map<std::string, std::string> but I don't want it sorted by keys.
I've been able to simulate this with a vector of pairs and operator.
class umap : public...
|
by: Ross |
last post by:
The following codes originally provided by a kind responder Toby A Inkster
in another newsgroup for working on displaying a table on web capable of
sorting are modified by me (a newbie). How to...
|
by: Tim Henderson |
last post by:
Hi
The question why are there no sorted dictionaries in python, seems to
pop up with unseeming regularity. That question in itself in
nonsensical sense dictionaries are hash-maps, however should...
|
by: Gernot Frisch |
last post by:
Hi,
I know I can give a std::map a sorting function in the c'tor. However,
I would need to _change_ the sorting behaviour of an existing
instance. How would I do that?
Thank you,
--
|
by: zaineb |
last post by:
Hi,
This is a follow-up of sort of this thread:...
|
by: Sjaakie |
last post by:
Hi,
I'm, what it turns out to be, fooling around with 3-tier design.
At several websites people get really enthusiastic about using custom
dataobjects instead of datasets/-tables.
While trying to...
|
by: tomek milewski |
last post by:
Hello,
I have a map with keys that can be compared each other. I need a
method that returns the lowest and the greatest key from that map.
Now I'm using begin() and rbegin() which gives...
|
by: jabbah |
last post by:
I have some data in a map and I want to sort it. Currently I have implemented it like this:
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(){
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
| |