473,386 Members | 1,699 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Which standard container to use?

I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -
value pair. However, I also need to sort by frequency (the value)
once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.

Thanks,
Bill

Mar 17 '07 #1
3 1594
ha****@haydentech.com wrote:
I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -
>value pair. However, I also need to sort by frequency (the value)
once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.
I did somthing like this once for a compression algorithm.

A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

So a little more information on what you want to do.

G
Mar 17 '07 #2
On Mar 17, 3:59 am, Gianni Mariani <gi3nos...@mariani.wswrote:
A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.
At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.
B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.
I only need the top dozen or so.

The main analysis I need to do is to compare the index of certain
letters or letter combinations, and how closely that compares to the
index of that combination in a "reference" passage of text. E.g. the
combination 'ed' was the 3rd most common in the current text, but the
5th most common in the reference text, so it has a delta of 2. I then
use all the deltas between the current and reference frequencies to
calculate a similarity factor.

Thanks,
Bill
Mar 17 '07 #3
nw
A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.
There is a discussion on working round the sorting map by value
problem here:

http://groups.google.com/group/comp....29739698f7ad73
B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

I only need the top dozen or so.
A map probably isn't the most algorithmically efficient solution to
your problem. I don't believe (someone correct me if I'm wrong) it
will result in a linear time solution. But it may do for your
requirements.

As you are only looking at 3 characters a vector indexed by the
substring converted to an integer is probably an easy and efficient
solution (though probably messier than using a map). You can then find
the top X occurring substrings in a single pass over the vector. If
you were dealing with longer substrings then probably a solution based
around suffix arrays/trees would be the answer... but we are probably
getting outside the scope of this group now.

Mar 17 '07 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Gandalf | last post by:
Hello. I have some questions about the standard containers. How does the standard containers behave if I do queue<Foo> myQ; queue<Foo> myQ2; .... insert into myQ... myQ = myQ2;
1
by: Aaron Walker | last post by:
Hey folks, I've got some code for an options class that stores user run time options (as static members) so that they are availably any time an instance is created; currently the code has a...
25
by: CodeCracker | last post by:
Problem details below: I have few items(simple values or objects) to be put into an array and I implement it through a set rather than just an array of items. This is because every time I get a...
10
by: tjgable | last post by:
I have some code that builds fine on .Net 2001(?).. VC++ 7.0. But with gcc 3.42 in MinGW and MS VC++ 6.0 it does not. I can understand VC++ not working, but isn't gcc standard yet? Here is the...
43
by: Steven T. Hatton | last post by:
Now that I have a better grasp of the scope and capabilities of the C++ Standard Library, I understand that products such as Qt actually provide much of the same functionality through their own...
9
by: Richard Brown | last post by:
Can anyone give me a good argument one way or another? I have an 'address' set of fields that are used in various situations (a client has an address, a destination has an address, etc). These...
7
by: toton | last post by:
Hi, I want a circular buffer or queue like container (queue with array implementation). Moreover I want random access over the elements. And addition at tail and remove from head need to be low...
7
by: desktop | last post by:
I the C++ standard page 472 it says that an associative container can be constructed like X(i,j,c) where i and j are input iterators to elements. But in the implementation there is no constructor...
38
by: Chris Thomasson | last post by:
Here is some example code for a container API for this group to use: http://appcore.home.comcast.net/web/ as-per the following posts: ...
3
by: massysett | last post by:
I'm puzzled about part of the standard. 23.1 states that items stored in a container must be assignable. Therefore, the items in a map--that is, std::pair<const Key, valuemust be assignable....
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.