Meyers's preference for vectors over maps - Page 2

Fred Ma

Hello,

I was looking at Meyers's "Effective STL", item 23
about choosing between vectors and maps (at least
that the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 pointers, and
(2) contiguity of vector elements in memory ensure
fewer page faults.

For me, I will a container of objects that are much
bigger than pointers, so reason (1) doesn't apply.

Reason (2) is what I'm trying to understand better.
Let me describe my application.

In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.

Since I can't get a clear answer from that line of
reasoning, I looked more carefully at Meyers's
reason (2). I will be using binary search for
lookup, and he claims that contiguity of elements
is good for binary search in particular. More
specifically, I understood him as meaning that
closeness of elements in memory should match
closeness of elements in the traversal of
container elements. However, my understanding
of binary search is that it doesn't traverse
elements in sequential order. So why the
contiguity of elements in a vector be of any
advantage?

Thanks for clarifying. And if there are any
other considerations that would make it easier
to decide upon a proper container, I'd appreciate
those too. Right now, I'm tempted to go for a
vector because I haven't used a map before. But
for that reason, I'm also thinking it's not a bad
reason to dabble in maps, too.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6

Jul 22 '05

Subscribe Reply

1667

Daniel T.

Fred Ma <fm*@doe.carlet on.ca> wrote:

"Daniel T." wrote:

I guess the obvious answer is to create an adaptor that will allow you
to easily switch between the two. Then profile both and pick the most
efficient one.

I agree. Implementing both and profiling will certainly
determine the answer. And I would certainly like to
learn the profiler some time (it always seems like such
a luxury, time-wise, but it would immediately answer
lots of questions that can't be figured out otherwise).

For starters, I will hammer out a vector implementation.

For starters, you should design an interface that will make this special
queue of yours easy to use for the classes that use it. Then make it so
that you can switch out which kind of queue gets used by simply changing
one or two lines of code. *Then* hammer out a vector implementation.

This way, the endeavor isn't such a time waste. Instead it is a
opportunity to improve your design.

Jul 22 '05 #11

Fred Ma

"Daniel T." wrote:

Fred Ma <fm*@doe.carlet on.ca> wrote:
"Daniel T." wrote:

I guess the obvious answer is to create an adaptor that will allow you
to easily switch between the two. Then profile both and pick the most
efficient one.

I agree. Implementing both and profiling will certainly
determine the answer. And I would certainly like to
learn the profiler some time (it always seems like such
a luxury, time-wise, but it would immediately answer
lots of questions that can't be figured out otherwise).

For starters, I will hammer out a vector implementation.

For starters, you should design an interface that will make this special
queue of yours easy to use for the classes that use it. Then make it so
that you can switch out which kind of queue gets used by simply changing
one or two lines of code. *Then* hammer out a vector implementation.

This way, the endeavor isn't such a time waste. Instead it is a
opportunity to improve your design.

I believe it is not a waste of time either.
However, I have to prioritize the things I investigate.
I also have a deadline which I will miss if I try to
pursue too much in terms of expanding my C++ abilities
per se. I appreciate all the response, and my question
was posed more for a better understanding of the relevant
issues rather than solely determining the implementation
path I will choose. In fact, after some more reflection,
I think it's better to not even use a binary search (be it via
a map or binary searching a vector). I will first sort
the n elements I am searching for, then scan the N-element
lookup table for those elements (n<N). I after finding
the first sought-for element in the lookup table, I will
scan for the next sought-for element starting from that
point in the lookup table. On average, n=N/4, so that
means I scan an average of 4 consecutive elements before
encountering the next sought-for element. I can live with
that.

I'm sure there are more sophisticated ways that change
depending on the size of n and N, but again, the sort
and lookup is a tiny part of a much bigger problem, all
of whose aspects I will have to shortly and entirely
address. Thus the compromise in thoroughness of
investigating the sort/lookup, though I do appreciate
the thoughts on the things that matter for that problem.
There will certainly be situations in future in which I
will have to rely on the ideas that I've picked up in
this thread.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6

Jul 22 '05 #12

Fred Ma

Jerry Coffin wrote:

In article <40************ ***@doe.carleto n.ca>, fm*@doe.carleto n.ca
says...

[ ... ]
In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.

Finding n smallest (or largest) items in a container can be done
considerably more quickly than simply sorting the container and then
taking the first (or last) n items. The latter requires a sort, which
is O(N lg N), where the former can be done in approximately O(n lg N)
time -- a big advantage if n is quite a bit smaller than N, and a
progressively smaller one as n approaches N.

The usual way of doing this is basically a modification of a Quick Sort.
In a Quick Sort, you pick a pivot, partition your elements, and then
recursively do the same with each partition.

To pick n elements instead, you simply pick the partition those elements
will fall into, and only recurse into the partition, ignoring the other.
For example, assume you want the 10 smallest elements out of 100. For
the moment I'll assume you pick perfect pivot elements every time, so
the first time you divide it into the 50 smallest and the 50 largest.
Clearly the 10 smallest aren't in the 50 largest, so you simply ignore
the right hand partition, and continue with the left partition. The
next time, you divide it into two groups of 25 elements apiece. Again,
you ignore the larger elements, and concentrate only on the smaller
ones. The next time around, you're down to 12 elements in the left
partition -- from there, it's probably easiest to just search linearly
to find the to largest of what's left, and discard them, leaving the 10
that you care about.

IIRC, whether this is better or worse than a heap depends on the
relative frequency of updating elements vs. inserting entirely new
elements. If (for example) you start with 100 elements, and every time
around, you insert 50 entirely new elements, find the 10 worst out of
the entire array, and so on iteratively, then chances are that this will
work better than a heap. At the opposite extreme, if the only updating
on the data is due to the processing you do after finding your n
elements, then chances are much better that a heap will work better.

A heap puts in a bit more work up-front, and does a partial sorting on
the entire array, so if you can put that to use, it will usually be a
win. OTOH, if you just search for n elements, you put less work into
arranging the data (so it's faster the first time) but you're leaving
the rest of the array relatively random, so you haven't gained much for
the next iteration.

Jerry, I am replacing the worst n of N elements (n<N) with new elements,
and I realize that partitioning is the way to go for that step (I am
going to use the partition() in STL for convenience). But prior to
this step, I lookup n out of N elements, and these are not the same
as the n worst elements. They can occur anywhere. I also neglected
to mention details which ultimately decided that binary searching (either
by a tree like a map, or binary search) is not the best way to perform
those lookups. That is, n will average N/4. What I will do is sort
the n elements to be looked up, then look for each one in turn by linearly
scanning the N elements of the lookup table. When one of the n elements
is found, the next one will be scanned for from that position in the lookup
table. So there will be an average of 3 skipped elements in the lookup
table for every one found. I take advantage of locality in the cache
more so than using a binary search, since the lookup table is traversed
sequentially, and the overhead of 3 skipped elements is acceptable because
of fast incrementing between adjacent vector elements.

I guess I'm exploiting a situation specific detail to get around the
larger question of profiling vector/map implementations for the time
being. The heap was a bit of a step which I prefer to find out more
about, but in the future. Reason being, my background is not CS, and
I posed my original choice as a vector/map decision because it's
readily available in standardized form via the STL, and I've got a
nice explanatory book about the formalities of their use in Josuttis.
So restricting my choices is a tradeoff; certainly, I won't have the
best solution, but I can choose the best one for which I can allot
the time.

Thanks for explanation of partitioning, and examples comparing it to
the heap (though I have to admit, I need to follow it up with more
textbook reading--I've briefly read about heaps in the past when
pondering the sort/lookup problem).

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6

Jul 22 '05 #13

Similar topics

10836

Map of vectors how? <Long>

by: Juicer_X | last post by:

Hello everyone, I've been working with the STL Containers for a little while now, but in the middle of working on a small "Markov Chain" class I realized that I wanted to modify my frequency tables, so far I have been using maps like: std::map<std::string, int> frequency; A string for the letter combinations and an int for the frequency, but now I want to add three more frequencies:

C / C++

1338

some help required with maps , stucts and vectors.

by: LeTubs | last post by:

Hi I'm having a bit of trouble with maps (I'm a c programmer making the transistion). Now I want to do it this way as I won't know how many files/names untill runtime, not bothered by access (ie logrithimic/constant/linear) time unless the program goes way slow!.so trying to use STL that can grow grow and shrink as required.

C / C++

1554

types containing vectors or maps

by: roberts.noah | last post by:

It is my understanding that if you contain vectors or maps you don't need to create copy constructors because the default calls that constructor. It is my understanding that if you use these types you don't have to worry about them at all...even if the contents of your map or vector may contain other maps or vectors. Am I mistaken in any way? We are running into a very strange bug that is avoiding detection by hiding in the default...

C / C++

15909

vector of vectors push_back question

by: Jeff | last post by:

Hello- Ive never used a vector or vectors in C++ so I have a question for you all. I know I can dynamically create the size I need upfront, but is it possible to create them on the fly dynamically? That is, for a 2 dim array, I want to create the first element and then push_back the columns after that:

C / C++

1709

A map of varying vectors

by: Dekker | last post by:

Hi I would like to transform the result of a csv-string (eg.: "name,age\nstring,int\nMac,23\nMax,24\nMike,78") into a map of vectors map<string, vector<???> >. The key of the map will be the fieldname (name or age) and the values are stored as vectors in the indicated type (string, int). Like this i can call: result (= a string "Mac") or result (= an int of val 23)

C / C++

1785

Mixing STL maps/vectors with templates

by: DrLex | last post by:

This is a really annoying thing to look up in Google because all pages that mention STL maps or vectors will most likely also contain the word "template". So maybe this question has been asked before, but it's nearly impossible to find. I'm having trouble using STL vectors, maps, ... in templated functions, when the templated class is a template parameter of the map too. Below is a (cannibalized) example of a class 'Model' which contains...

C / C++

1892

Copying from maps to lists or vectors

by: pallav | last post by:

I have a map like this: typedef boost::shared_ptr<NodeNodePtr; typedef std::vector<NodePtrNodeVecPtr; typedef std::map<std::string, NodePtrNodeMap; typedef std::map<std:string, NodePtr>::iterator NodeMapItr; NodeMap nmap; In my classes, I often find myself copying the second arguments of

C / C++

2333

doubt in using 2D vectors

by: amscompit | last post by:

I have a written the following code. #include<iomanip> #include<fstream> #include<vector> #include<cctype> using namespace std;

C / C++

2067

How get data from vectors of templates?

by: Rob | last post by:

How would I do this? I want to be able to handle vectors of many different types of data and vectors that can contain any number of other vectors of data. Currently, I have a templated function that handles vectors of vectors of <typename T(which could be anything from int to vectors of something else). As well, I have specialized/overloaded functions to handle the single-level vectors of data (e.g. vector<string>). So the templated...

C / C++

9721

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9600

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10374

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

10374

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

10114

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

6880

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5548

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

3859

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

3011

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General