Which standard container to use?

hayden

I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -

value pair. However, I also need to sort by frequency (the value)

once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.

Thanks,
Bill

Mar 17 '07 #1

Subscribe Post Reply

1594

Gianni Mariani

ha****@haydentech.com wrote:

I'm new to STL, so bear with me...

I'm examining a selection of text and storing attributes about the
frequency of letter combinations in it. Initially, I assumed I would
use a std::map to record letter-combinations -frequency as the key -
>value pair. However, I also need to sort by frequency (the value)
once the analysis is done, so std::map seems to be out since map can't
sort by value. (Grrr!) Given that I want to store by key, but later
sort by value, what method should I use?

BTW, speed is critical. I'll literally be doing this operation full-
bore for hours at a time, so my thought was that the obvious solution
of creating a key-value-swapped multimap from the map once I'm done
seems kinda expensive. If this is not as bad as I'm thinking it will
be, or if there's no better way, please let me know that too.

I did somthing like this once for a compression algorithm.

A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

So a little more information on what you want to do.

G

Mar 17 '07 #2

hayden

On Mar 17, 3:59 am, Gianni Mariani <gi3nos...@mariani.wswrote:

A) how many characters in your letter combinations ? If you limit the
number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.

B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

I only need the top dozen or so.

The main analysis I need to do is to compare the index of certain
letters or letter combinations, and how closely that compares to the
index of that combination in a "reference" passage of text. E.g. the
combination 'ed' was the 3rd most common in the current text, but the
5th most common in the reference text, so it has a delta of 2. I then
use all the deltas between the current and reference frequencies to
calculate a similarity factor.

Thanks,
Bill

Mar 17 '07 #3

A) how many characters in your letter combinations ? If you limit the

number of characters, then you can use a direct index into an array. On
64 bit machines you can index very large arrays so you can get quite a
few characters.

At most three characters. I'm just doing unigraph, digraph, and
trigraph frequencies.

There is a discussion on working round the sorting map by value
problem here:

http://groups.google.com/group/comp....29739698f7ad73

B) How many entries are you looking for in your sorted list ? If you
only need the top X entries (where X is small fraction of the number of
entries), you can do much less work than sort the array.

I only need the top dozen or so.

A map probably isn't the most algorithmically efficient solution to
your problem. I don't believe (someone correct me if I'm wrong) it
will result in a linear time solution. But it may do for your
requirements.

As you are only looking at 3 characters a vector indexed by the
substring converted to an integer is probably an easy and efficient
solution (though probably messier than using a map). You can then find
the top X occurring substrings in a single pass over the vector. If
you were dealing with longer substrings then probably a solution based
around suffix arrays/trees would be the answer... but we are probably
getting outside the scope of this group now.

Mar 17 '07 #4

Similar topics

Standard containers, delete, copy

by: Gandalf | last post by:

Hello. I have some questions about the standard containers. How does the standard containers behave if I do queue<Foo> myQ; queue<Foo> myQ2; .... insert into myQ... myQ = myQ2;

C / C++

Creating a standard container with a template class as a parameter

by: Aaron Walker | last post by:

Hey folks, I've got some code for an options class that stores user run time options (as static members) so that they are availably any time an instance is created; currently the code has a...

C / C++

Which is more efficient:STL:Set or array?

by: CodeCracker | last post by:

Problem details below: I have few items(simple values or objects) to be put into an array and I implement it through a set rather than just an array of items. This is because every time I get a...

C / C++

STL conatainer with a comparison function, what is the standard?

by: tjgable | last post by:

I have some code that builds fine on .Net 2001(?).. VC++ 7.0. But with gcc 3.42 in MinGW and MS VC++ 6.0 it does not. I can understand VC++ not working, but isn't gcc standard yet? Here is the...

C / C++

Alternatives to the C++ Standard Library?

by: Steven T. Hatton | last post by:

Now that I have a better grasp of the scope and capabilities of the C++ Standard Library, I understand that products such as Qt actually provide much of the same functionality through their own...

C / C++

UserControl vs. Standard Class

by: Richard Brown | last post by:

Can anyone give me a good argument one way or another? I have an 'address' set of fields that are used in various situations (a client has an address, a destination has an address, etc). These...

Visual Basic .NET

Which container to use for circular buffer?

by: toton | last post by:

Hi, I want a circular buffer or queue like container (queue with array implementation). Moreover I want random access over the elements. And addition at tail and remove from head need to be low...

C / C++

C++ standard and current implementation

by: desktop | last post by:

I the C++ standard page 472 it says that an associative container can be constructed like X(i,j,c) where i and j are input iterators to elements. But in the implementation there is no constructor...

C / C++

Pre-Alpha CLC Standard Container API Proposal...

by: Chris Thomasson | last post by:

Here is some example code for a container API for this group to use: http://appcore.home.comcast.net/web/ as-per the following posts: ...

C / C++

Standard 23.1 - std::map::value_type must be assignable?

by: massysett | last post by:

I'm puzzled about part of the standard. 23.1 states that items stored in a container must be assignable. Therefore, the items in a map--that is, std::pair<const Key, valuemust be assignable....

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++