Bytes | Software Development & Data Engineering Community

Meyers's preference for vectors over maps

Hello,

I was looking at Meyers's "Effective STL", item 23
about choosing between vectors and maps (at least
that's the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 pointers, and
(2) contiguity of vector elements in memory ensures
fewer page faults.

For me, I will have a container of objects that are much
bigger than pointers, so reason (1) doesn't apply.

Reason (2) is what I'm trying to understand better.
Let me describe my application.

In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.

Since I can't get a clear answer from that line of
reasoning, I looked more carefully at Meyers's
reason (2). I will be using binary search for
lookup, and he claims that contiguity of elements
is good for binary search in particular. More
specifically, I understood him as meaning that
closeness of elements in memory should match
closeness of elements in the traversal of
container elements. However, my understanding
of binary search is that it doesn't traverse
elements in sequential order. So why would the
contiguity of elements in a vector be of any
advantage?

Thanks for clarifying. And if there are any
other considerations that would make it easier
to decide upon a proper container, I'd appreciate
those too. Right now, I'm tempted to go for a
vector because I haven't used a map before. But
for that reason, I'm also thinking it's not a bad
reason to dabble in maps, too.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
Jul 22 '05 #1
Fred Ma wrote:
In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.


That looks like a good candidate for a priority queue.
Removing items from both a vector and a map is slow, and since you need
a sorting criterion, a plain deque will not suffice.

--
To get my real email address, remove the two onkas
--
Dipl.-Inform. Hendrik Belitz
Central Institute of Electronics
Research Center Juelich
Jul 22 '05 #2
Hendrik Belitz wrote:

Fred Ma wrote:
In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.


That looks like a good candidate for a priority queue.
Removing items from both a vector and a map is slow, and since you need
a sorting criterion, a plain deque will not suffice.

Hi, Hendrik,

I looked up priority queues in Josuttis's "The C++
Standard Library". It's built on top of the standard
iterator-supporting containers, with the vector as a
default. Wouldn't I still be faced with the comparison
of issues between vectors and maps for my usage? My
impression is that the special container adapters
simplify the interface rather than changing the
underlying mechanism of data storage and retrieval.
Admittedly, this could be a misimpression on my part.

Fred
Jul 22 '05 #3
Fred Ma wrote:
Hi, Hendrik,

I looked up priority queues in Josuttis's "The C++
Standard Library". It's built on top of the standard
iterator-supporting containers, with the vector as a
default. Wouldn't I still be faced with the comparison
of issues between vectors and maps for my usage? My
impression is that the special container adapters
simplify the interface rather than changing the
underlying mechanism of data storage and retrieval.
Admittedly, this could be a misimpression on my part.


Uh well, I didn't even realize that priority queues are available as an
STL adapter. Normally priority queues are implemented with Fibonacci heaps
and are very efficient. Maybe you should look for an implementation which
does it that way instead of using an STL container/adapter.

--
To get my real email address, remove the two onkas
--
Dipl.-Inform. Hendrik Belitz
Central Institute of Electronics
Research Center Juelich
Jul 22 '05 #4
On 28 Jan 2004 08:13:37 GMT, Fred Ma <fm*@doe.carleton.ca> wrote:
Hello,

I was looking at Meyers's "Effective STL", item 23
about choosing between vectors and maps (at least
that's the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 pointers, and
(2) contiguity of vector elements in memory ensures
fewer page faults.

For me, I will have a container of objects that are much
bigger than pointers, so reason (1) doesn't apply.

Reason (2) is what I'm trying to understand better.
Let me describe my application.

In my case, I will have N items in the container,
Is N fixed at runtime? IOW, is the size of the container constant?
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.
I would recommend a vector. Keep another empty vector handy. Start by
sorting your main vector. Calculate your new range of n to add and
sort it (in O(n)), then merge that with your old vector into your same
sized vector.

tempvec.reserve(v.size());
std::sort(newrange.begin(), newrange.end());
// merge the N-n remaining elements from v with the new range:
std::merge(v.begin() + n, v.end(), newrange.begin(), newrange.end(),
           std::back_inserter(tempvec));

// swap
tempvec.swap(v);
tempvec.clear();

etc. That way the operation is only n log n, not N log N.

Since I can't get a clear answer from that line of
reasoning, I looked more carefully at Meyers's
reason (2). I will be using binary search for
lookup, and he claims that contiguity of elements
is good for binary search in particular. More
specifically, I understood him as meaning that
closeness of elements in memory should match
closeness of elements in the traversal of
container elements. However, my understanding
of binary search is that it doesn't traverse
elements in sequential order. So why would the
contiguity of elements in a vector be of any
advantage?
Because they are all close to each other in memory. If a 4k chunk of
the vector is pulled into cache memory, you'll get a lot of cache
hits.
Thanks for clarifying. And if there are any
other considerations that would make it easier
to decide upon a proper container, I'd appreciate
those too. Right now, I'm tempted to go for a
vector because I haven't used a map before. But
for that reason, I'm also thinking it's not a bad
reason to dabble in maps, too.


Given the fixed size, vector seems a good choice. If your elements are
expensive to copy, I'd go for a vector of smart pointers, or just
pointers (with manual memory management).

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Jul 22 '05 #5
Fred Ma <fm*@doe.carleton.ca> wrote:
I was looking at Meyers's "Effective STL", item 23
about choosing between vectors and maps (at least
that's the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 pointers, and
(2) contiguity of vector elements in memory ensures
fewer page faults.


I guess the obvious answer is to create an adaptor that will allow you
to easily switch between the two. Then profile both and pick the most
efficient one.
Jul 22 '05 #6
On Wed, 28 Jan 2004 12:49:02 +0000, tom_usenet
<to********@hotmail.com> wrote:
I would recommend a vector. Keep another empty vector handy. Start by
sorting your main vector. Calculate your new range of n to add and
sort it (in O(n)),


That's meant to say O(n log n)!

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Jul 22 '05 #7
Fred Ma <fm*@doe.carleton.ca> wrote:
I looked up priority queues in Josuttis's "The C++
Standard Library". It's built on top of the standard
iterator-supporting containers, with the vector as a
default. Wouldn't I still be faced with the comparison
of issues between vectors and maps for my usage?
If the operations provided for a priority queue fit your needs, it is almost
certainly the better choice: this data structure is specialized for finding
the smallest (or biggest; depends on the direction of your predicate)
element as fast as possible. That is, insertion into a priority queue just
does the necessary work to provide fast access to the smallest element.
Likewise for removal.

If you need to access or change other elements than the smallest one, things
change quite a lot, however. The priority queue in the standard library is
not at all suited for this but there are priority queues which are helpful
here, too. However, the difference to other data structures like eg. maps
is reduced.
My
impression is that the special container adapters
simplify the interface rather than changing the
underlying mechanism of data storage and retrieval.


This is true for stack and queue but not for priority_queue. The latter
implements the logic of a 2-heap (a special case of a d-heap which is just
a balanced tree with d children for each inner node). This is quite different
from doing eg. a binary search etc. Also, the number of elements changed
during inserts or removals is not linear in the size of the vector but
logarithmic.

Concerning Scott's comment about the advantages of vector over map: at first
sight, a map looks superior to a vector. The first insight into why that is not so
in all situations is the amount of work necessary to traverse the map's
internal tree compared to a binary search within the vector, which just
effectively uses an "ad-hoc" binary tree. To move from one node to its child,
the map has to dereference a pointer whereas the vector simply does a clever
addition or subtraction. This is often much faster. Towards the end of the
search, where most of the work is going on and the steps become smaller,
the "nodes" (in the case of the map the explicit tree nodes, in the case of the vector
the implicit current location) are more likely to sit on one page for the
vector because they are [nearly] adjacent to each other. A page miss is
pretty expensive. Also, the vector's nodes are much smaller (just the data
you store there) compared to the map's nodes (which typically contain at least
three pointers for the map's maintenance in addition to the data you store),
increasing the chances of being on one page even more. Especially for small user
data, the vector's movements are likely to eventually stay within a single
cache line of the CPU while the map is still causing page misses or at least
cache misses.

For searching, it is pretty obvious that you want to use a vector in favour of
a map. If you can go with an unordered structure, a hash is an even better
choice, BTW.

Things become less obvious if you need to change the data: if the number of
searches compared to the number of modifications (insertions and removals;
a change is effectively a removal followed by an insertion for both a map and
a sorted vector, although it can be improved a little bit) becomes small, i.e.
if you have only a few searches between modifications, things start to change:
a huge collection of data will cause a lot of moving for vectors while it will
cause only a few pointer adjustments for maps. On the other hand, the pointer
adjustments are more involved than simply moving a set of small PODs in a
fairly small vector. The tricky part is figuring out where one approach is
better than the other in such a situation because apart from the CPU cycles,
things like the memory footprint also have an effect. This normally results
in vectors with even a few hundred elements being faster than maps.

The details depend on the exact setting, including the size of elements, the
size of the container, the total amount of memory used, the access patterns,
etc. Thus, it is normally reasonable to implement a working solution using an
appropriate class and later profile the whole application to find out whether
this part is critical at all (normally the critical parts are others than one
suspects...) and if so, replace the solution by alternatives to measure them,
too. It may be reasonable to use a class with an appropriate interface which
is internally implemented by one of the choices and whose implementation is
later changed as needed.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
Jul 22 '05 #8
Hendrik Belitz wrote:
Uh well, I didn't even realize that priority queues are available as an
STL adapter. Normally priority queues are implemented with Fibonacci heaps
and are very efficient.
Are they, now? Well, the 'std::priority_queue' adapter is an order
of magnitude faster than the best Fibonacci heap implementation I could
do. Of course, this alone does not say much, but I could also improve on
the adapter implementation from SGI-STL, although only by something like
5%.

The advantage of Fibonacci heaps over the 2-heap normally used by the
priority queue adapter is when it comes down to modifying elements:
this operation is not supported by the adapter at all, while it is the operation
Fibonacci heaps are good at. If this feature is needed, Fibonacci heaps
are on par with d-heaps (d == 2 is actually not the best choice here;
I think d == 5 was the optimum for my tests) extended to provide this
feature, for all numbers of elements I tested it with (which was up to
10 million if I remember correctly).
Maybe you should look for an implementation
which does it that way instead of using an STL container/adapter.


Have a look at <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>. This
is a collection of priority queues I have implemented. It includes
quite a few different ones, including Fibonacci heaps.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
Jul 22 '05 #9
In article <40***************@doe.carleton.ca>, fm*@doe.carleton.ca
says...

[ ... ]
In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.


Finding the n smallest (or largest) items in a container can be done
considerably more quickly than simply sorting the container and then
taking the first (or last) n items. The latter requires a sort, which
is O(N lg N), whereas the former can be done in approximately O(n lg N)
time -- a big advantage if n is quite a bit smaller than N, and a
progressively smaller one as n approaches N.

The usual way of doing this is basically a modification of a Quick Sort.
In a Quick Sort, you pick a pivot, partition your elements, and then
recursively do the same with each partition.

To pick n elements instead, you simply pick the partition those elements
will fall into, and only recurse into that partition, ignoring the other.
For example, assume you want the 10 smallest elements out of 100. For
the moment I'll assume you pick perfect pivot elements every time, so
the first time you divide it into the 50 smallest and the 50 largest.
Clearly the 10 smallest aren't in the 50 largest, so you simply ignore
the right-hand partition and continue with the left partition. The
next time, you divide it into two groups of 25 elements apiece. Again,
you ignore the larger elements and concentrate only on the smaller
ones. The next time around, you're down to 12 elements in the left
partition -- from there, it's probably easiest to just search linearly
to find the two largest of what's left and discard them, leaving the 10
that you care about.

IIRC, whether this is better or worse than a heap depends on the
relative frequency of updating elements vs. inserting entirely new
elements. If (for example) you start with 100 elements, and every time
around, you insert 50 entirely new elements, find the 10 worst out of
the entire array, and so on iteratively, then chances are that this will
work better than a heap. At the opposite extreme, if the only updating
on the data is due to the processing you do after finding your n
elements, then chances are much better that a heap will work better.

A heap puts in a bit more work up-front, and does a partial sorting on
the entire array, so if you can put that to use, it will usually be a
win. OTOH, if you just search for n elements, you put less work into
arranging the data (so it's faster the first time) but you're leaving
the rest of the array relatively random, so you haven't gained much for
the next iteration.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 22 '05 #10

This thread has been closed and replies have been disabled.