Hello,
I was looking at Meyers's "Effective STL", item 23
about choosing between vectors and maps (at least
that the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 pointers, and
(2) contiguity of vector elements in memory ensure
fewer page faults.
For me, I will a container of objects that are much
bigger than pointers, so reason (1) doesn't apply.
Reason (2) is what I'm trying to understand better.
Let me describe my application.
In my case, I will have N items in the container,
will look up n (n<N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, worse
is defined by the sorting function. The problem
is that n can vary from 1 to N, making the advantages
of vector vs. map very unclear. For small n,
map is better because I'm basically interspersing
1 modification of the database with 1 lookup.
For large n, vector is better because it's faster
to sort a whole bunch of objects at once rather
than building a map in a piecemeal fashion.
Since I can't get a clear answer from that line of
reasoning, I looked more carefully at Meyers's
reason (2). I will be using binary search for
lookup, and he claims that contiguity of elements
is good for binary search in particular. More
specifically, I understood him as meaning that
closeness of elements in memory should match
closeness of elements in the traversal of
container elements. However, my understanding
of binary search is that it doesn't traverse
elements in sequential order. So why the
contiguity of elements in a vector be of any
advantage?
Thanks for clarifying. And if there are any
other considerations that would make it easier
to decide upon a proper container, I'd appreciate
those too. Right now, I'm tempted to go for a
vector because I haven't used a map before. But
for that reason, I'm also thinking it's not a bad
reason to dabble in maps, too.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
Jul 22 '05
12 1667
Fred Ma <fm*@doe.carlet on.ca> wrote: "Daniel T." wrote: I guess the obvious answer is to create an adaptor that will allow you to easily switch between the two. Then profile both and pick the most efficient one.
I agree. Implementing both and profiling will certainly determine the answer. And I would certainly like to learn the profiler some time (it always seems like such a luxury, time-wise, but it would immediately answer lots of questions that can't be figured out otherwise).
For starters, I will hammer out a vector implementation.
For starters, you should design an interface that will make this special
queue of yours easy to use for the classes that use it. Then make it so
that you can switch out which kind of queue gets used by simply changing
one or two lines of code. *Then* hammer out a vector implementation.
This way, the endeavor isn't such a time waste. Instead it is a
opportunity to improve your design.
"Daniel T." wrote: Fred Ma <fm*@doe.carlet on.ca> wrote:
"Daniel T." wrote: I guess the obvious answer is to create an adaptor that will allow you to easily switch between the two. Then profile both and pick the most efficient one.
I agree. Implementing both and profiling will certainly determine the answer. And I would certainly like to learn the profiler some time (it always seems like such a luxury, time-wise, but it would immediately answer lots of questions that can't be figured out otherwise).
For starters, I will hammer out a vector implementation.
For starters, you should design an interface that will make this special queue of yours easy to use for the classes that use it. Then make it so that you can switch out which kind of queue gets used by simply changing one or two lines of code. *Then* hammer out a vector implementation.
This way, the endeavor isn't such a time waste. Instead it is a opportunity to improve your design.
I believe it is not a waste of time either.
However, I have to prioritize the things I investigate.
I also have a deadline which I will miss if I try to
pursue too much in terms of expanding my C++ abilities
per se. I appreciate all the response, and my question
was posed more for a better understanding of the relevant
issues rather than solely determining the implementation
path I will choose. In fact, after some more reflection,
I think it's better to not even use a binary search (be it via
a map or binary searching a vector). I will first sort
the n elements I am searching for, then scan the N-element
lookup table for those elements (n<N). I after finding
the first sought-for element in the lookup table, I will
scan for the next sought-for element starting from that
point in the lookup table. On average, n=N/4, so that
means I scan an average of 4 consecutive elements before
encountering the next sought-for element. I can live with
that.
I'm sure there are more sophisticated ways that change
depending on the size of n and N, but again, the sort
and lookup is a tiny part of a much bigger problem, all
of whose aspects I will have to shortly and entirely
address. Thus the compromise in thoroughness of
investigating the sort/lookup, though I do appreciate
the thoughts on the things that matter for that problem.
There will certainly be situations in future in which I
will have to rely on the ideas that I've picked up in
this thread.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
Jerry Coffin wrote: In article <40************ ***@doe.carleto n.ca>, fm*@doe.carleto n.ca says...
[ ... ]
In my case, I will have N items in the container, will look up n (n<N) items, do some processing, then replace the n worst items in the container with the n new items from processing. Here, worse is defined by the sorting function. The problem is that n can vary from 1 to N, making the advantages of vector vs. map very unclear. For small n, map is better because I'm basically interspersing 1 modification of the database with 1 lookup. For large n, vector is better because it's faster to sort a whole bunch of objects at once rather than building a map in a piecemeal fashion.
Finding n smallest (or largest) items in a container can be done considerably more quickly than simply sorting the container and then taking the first (or last) n items. The latter requires a sort, which is O(N lg N), where the former can be done in approximately O(n lg N) time -- a big advantage if n is quite a bit smaller than N, and a progressively smaller one as n approaches N.
The usual way of doing this is basically a modification of a Quick Sort. In a Quick Sort, you pick a pivot, partition your elements, and then recursively do the same with each partition.
To pick n elements instead, you simply pick the partition those elements will fall into, and only recurse into the partition, ignoring the other. For example, assume you want the 10 smallest elements out of 100. For the moment I'll assume you pick perfect pivot elements every time, so the first time you divide it into the 50 smallest and the 50 largest. Clearly the 10 smallest aren't in the 50 largest, so you simply ignore the right hand partition, and continue with the left partition. The next time, you divide it into two groups of 25 elements apiece. Again, you ignore the larger elements, and concentrate only on the smaller ones. The next time around, you're down to 12 elements in the left partition -- from there, it's probably easiest to just search linearly to find the to largest of what's left, and discard them, leaving the 10 that you care about.
IIRC, whether this is better or worse than a heap depends on the relative frequency of updating elements vs. inserting entirely new elements. If (for example) you start with 100 elements, and every time around, you insert 50 entirely new elements, find the 10 worst out of the entire array, and so on iteratively, then chances are that this will work better than a heap. At the opposite extreme, if the only updating on the data is due to the processing you do after finding your n elements, then chances are much better that a heap will work better.
A heap puts in a bit more work up-front, and does a partial sorting on the entire array, so if you can put that to use, it will usually be a win. OTOH, if you just search for n elements, you put less work into arranging the data (so it's faster the first time) but you're leaving the rest of the array relatively random, so you haven't gained much for the next iteration.
Jerry, I am replacing the worst n of N elements (n<N) with new elements,
and I realize that partitioning is the way to go for that step (I am
going to use the partition() in STL for convenience). But prior to
this step, I lookup n out of N elements, and these are not the same
as the n worst elements. They can occur anywhere. I also neglected
to mention details which ultimately decided that binary searching (either
by a tree like a map, or binary search) is not the best way to perform
those lookups. That is, n will average N/4. What I will do is sort
the n elements to be looked up, then look for each one in turn by linearly
scanning the N elements of the lookup table. When one of the n elements
is found, the next one will be scanned for from that position in the lookup
table. So there will be an average of 3 skipped elements in the lookup
table for every one found. I take advantage of locality in the cache
more so than using a binary search, since the lookup table is traversed
sequentially, and the overhead of 3 skipped elements is acceptable because
of fast incrementing between adjacent vector elements.
I guess I'm exploiting a situation specific detail to get around the
larger question of profiling vector/map implementations for the time
being. The heap was a bit of a step which I prefer to find out more
about, but in the future. Reason being, my background is not CS, and
I posed my original choice as a vector/map decision because it's
readily available in standardized form via the STL, and I've got a
nice explanatory book about the formalities of their use in Josuttis.
So restricting my choices is a tradeoff; certainly, I won't have the
best solution, but I can choose the best one for which I can allot
the time.
Thanks for explanation of partitioning, and examples comparing it to
the heap (though I have to admit, I need to follow it up with more
textbook reading--I've briefly read about heaps in the past when
pondering the sort/lookup problem).
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6 This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Juicer_X |
last post by:
Hello everyone,
I've been working with the STL Containers for a little while now, but in
the middle of working on a small "Markov Chain" class I realized that I
wanted to modify my frequency tables, so far I have been using maps like:
std::map<std::string, int> frequency;
A string for the letter combinations and an int for the frequency, but
now I want to add three more frequencies:
|
by: LeTubs |
last post by:
Hi
I'm having a bit of trouble with maps (I'm a c programmer making the
transistion).
Now I want to do it this way as I won't know how many files/names untill
runtime, not bothered
by access (ie logrithimic/constant/linear) time unless the program goes way
slow!.so trying to
use STL that can grow grow and shrink as required.
|
by: roberts.noah |
last post by:
It is my understanding that if you contain vectors or maps you don't
need to create copy constructors because the default calls that
constructor. It is my understanding that if you use these types you
don't have to worry about them at all...even if the contents of your
map or vector may contain other maps or vectors. Am I mistaken in any
way?
We are running into a very strange bug that is avoiding detection by
hiding in the default...
|
by: Jeff |
last post by:
Hello-
Ive never used a vector or vectors in C++ so I have a question for you
all.
I know I can dynamically create the size I need upfront, but is it
possible to
create them on the fly dynamically? That is, for a 2 dim array, I want
to create
the first element and then push_back the columns after that:
|
by: Dekker |
last post by:
Hi
I would like to transform the result of a csv-string (eg.:
"name,age\nstring,int\nMac,23\nMax,24\nMike,78") into a map of vectors
map<string, vector<???> >.
The key of the map will be the fieldname (name or age) and the values
are stored as vectors in the indicated type (string, int). Like this i
can call: result (= a string "Mac") or result (=
an int of val 23)
| |
by: DrLex |
last post by:
This is a really annoying thing to look up in Google because all pages that mention STL maps or vectors will most likely also contain the word "template". So maybe this question has been asked before, but it's nearly impossible to find.
I'm having trouble using STL vectors, maps, ... in templated functions, when the templated class is a template parameter of the map too.
Below is a (cannibalized) example of a class 'Model' which contains...
|
by: pallav |
last post by:
I have a map like this:
typedef boost::shared_ptr<NodeNodePtr;
typedef std::vector<NodePtrNodeVecPtr;
typedef std::map<std::string, NodePtrNodeMap;
typedef std::map<std:string, NodePtr>::iterator NodeMapItr;
NodeMap nmap;
In my classes, I often find myself copying the second arguments of
|
by: amscompit |
last post by:
I have a written the following code.
#include<iomanip>
#include<fstream>
#include<vector>
#include<cctype>
using namespace std;
|
by: Rob |
last post by:
How would I do this?
I want to be able to handle vectors of many different types of data
and vectors that can contain any number of other vectors of data.
Currently, I have a templated function that handles vectors of vectors
of <typename T(which could be anything from int to vectors of
something else). As well, I have specialized/overloaded functions to
handle the single-level vectors of data (e.g. vector<string>). So the
templated...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |