By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,190 Members | 833 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,190 IT Pros & Developers. It's quick & easy.

Which standard container to use?

P: n/a
I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -
value pair. However, I also need to sort by frequency (the value)
once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.

Thanks,
Bill

Mar 17 '07 #1
Share this Question
Share on Google+
3 Replies


P: n/a
ha****@haydentech.com wrote:
I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -
>value pair. However, I also need to sort by frequency (the value)
once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.
I did somthing like this once for a compression algorithm.

A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

So a little more information on what you want to do.

G
Mar 17 '07 #2

P: n/a
On Mar 17, 3:59 am, Gianni Mariani <gi3nos...@mariani.wswrote:
A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.
At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.
B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.
I only need the top dozen or so.

The main analysis I need to do is to compare the index of certain
letters or letter combinations, and how closely that compares to the
index of that combination in a "reference" passage of text. E.g. the
combination 'ed' was the 3rd most common in the current text, but the
5th most common in the reference text, so it has a delta of 2. I then
use all the deltas between the current and reference frequencies to
calculate a similarity factor.

Thanks,
Bill
Mar 17 '07 #3

P: n/a
nw
A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.
There is a discussion on working round the sorting map by value
problem here:

http://groups.google.com/group/comp....29739698f7ad73
B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

I only need the top dozen or so.
A map probably isn't the most algorithmically efficient solution to
your problem. I don't believe (someone correct me if I'm wrong) it
will result in a linear time solution. But it may do for your
requirements.

As you are only looking at 3 characters a vector indexed by the
substring converted to an integer is probably an easy and efficient
solution (though probably messier than using a map). You can then find
the top X occurring substrings in a single pass over the vector. If
you were dealing with longer substrings then probably a solution based
around suffix arrays/trees would be the answer... but we are probably
getting outside the scope of this group now.

Mar 17 '07 #4

This discussion thread is closed

Replies have been disabled for this discussion.