Bytes IT Community

STL: map vs vector

Hi,

This is a follow-up of sorts to this thread:
http://groups.google.com/group/comp....2cab752779f1fa

I've been chewing on this problem for a while. When I came across the
above thread, I figured that maybe other members have a better idea on
how to approach this.

I'm currently trying to decide between a map and a vector. I'm trying to
come up with a design that lets me perform fast insertions and deletions
(without duplicates) in a sorted "list" and then look up every 25th or
so element (and then iterate sequentially over the next 25 values, so
sequential lookup). The indexed lookup happens slightly more frequently
than insertions and deletions, so it makes sense to make lookup
efficient. Because the insertions and deletions also happen often, those
need to be fast/efficient as well. Memory isn't as much of an issue as
speed is.

I've looked around a bit, so if this is something that has been
discussed before, please let me know.

Right now I have 2 "solutions".

One is a sorted vector, where I use 'lower_bound' to find the position
where I should insert the value and insert it there, and likewise to
find the value to remove. Lookups are just normal subscript lookups
(through my_vector.begin() + indexSubscript). Most inserts and removals
are in the middle of the list, so whenever there is an insertion or
removal, I'm guessing all the elements after that point get copied up or
down. The values are user-defined objects, so that's a lot of copy
calls.
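A minimal sketch of that sorted-vector approach (using int as a stand-in for the user-defined type; sorted_insert and sorted_erase are illustrative names, not the poster's code):

```cpp
#include <algorithm>
#include <vector>

// Insert v into the sorted vector, rejecting duplicates; everything
// after the insertion point gets shifted (copied) up one slot.
bool sorted_insert(std::vector<int>& vec, int v) {
    std::vector<int>::iterator it =
        std::lower_bound(vec.begin(), vec.end(), v);
    if (it != vec.end() && *it == v) return false;  // duplicate
    vec.insert(it, v);
    return true;
}

// Erase v if present; everything after it gets shifted (copied) down.
bool sorted_erase(std::vector<int>& vec, int v) {
    std::vector<int>::iterator it =
        std::lower_bound(vec.begin(), vec.end(), v);
    if (it == vec.end() || *it != v) return false;
    vec.erase(it);
    return true;
}
```

Indexed reads (vec[i]) are constant time here; it's the middle insert/erase shifts that cost.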

The other design uses a map and a vector. The map contains the actual
sorted values. The vector is an index to every 25th key which is reset
at every insertion and removal (from the closest previous key in the
map onwards). To access the 25th*x value, you would get the
corresponding key in the index and find it in the map, and then iterate
sequentially.
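A sketch of that map-plus-index design (the int key/value types, the stride of 25, and the rebuild-from-scratch strategy are simplifications of what the poster describes, which only rebuilds from the closest previous key onward):

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef std::map<int, int> Table;  // stand-in for the poster's key/value types

// Rebuild the every-25th-key index; O(n) walk over the map.
void rebuild_index(const Table& t, std::vector<int>& index,
                   std::size_t stride = 25) {
    index.clear();
    std::size_t i = 0;
    for (Table::const_iterator it = t.begin(); it != t.end(); ++it, ++i)
        if (i % stride == 0)
            index.push_back(it->first);
}

// Start of chunk x: one O(log n) find instead of a long walk from begin().
Table::const_iterator chunk_start(const Table& t,
                                  const std::vector<int>& index,
                                  std::size_t x) {
    return t.find(index[x]);
}
```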

After implementing both of these, the performance difference is really
minimal for me, but I'm not sure if I'm missing a better way of doing
this. I'm guessing this is something that people have come across
often. I'm a new C++/STL user, so I might be missing a good data
structure to use or a better solution to this problem. Any help would
be appreciated.

Thanks much.
-Z

May 18 '06 #1
13 Replies


I'd suggest a map is generally the better idea, as you intend to do
insertion and deletion operations often. My justification is that a
vector is generally implemented as an allocated array, and a map as a
red-black tree. If insertions and deletions are required often, the
red-black tree will be faster, without a significant loss of speed for
searching or spanning, whereas insertions and deletions in an array are
significantly slower.
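The two insertion costs can be illustrated like this (a rough sketch, not the poster's code; the multiplier is just an arbitrary way to scatter values so inserts land mid-container):

```cpp
#include <algorithm>
#include <functional>
#include <set>
#include <vector>

// Insert n scattered values into both containers. Each vector insert
// pays an O(n) shift of everything after the insertion point; each
// red-black-tree insert pays O(log n) for the descent and rebalance.
std::vector<int> fill_both(int n, std::set<int>& tree) {
    std::vector<int> vec;
    for (int i = 0; i < n; ++i) {
        int v = static_cast<int>((i * 2654435761u) % 1000000u);  // scattered
        vec.insert(std::lower_bound(vec.begin(), vec.end(), v), v);  // O(n)
        tree.insert(v);                                              // O(log n)
    }
    return vec;
}
```

Timing the two loops separately for large n makes the asymptotic difference visible in practice.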

JB
May 18 '06 #2

I might be missing the obvious, but since you are choosing between a
map and a *vector*, it looks like std::set might do the job better than
a map. The same fundamental problem would still persist, though.

The indexing is the problem. What you could do is keep an indexing
vector that stores only pointers to the objects in the set. You could
try lazy evaluation for the indexing "service": sort the indexing
vector, using the object's address as the predicate, when the first
indexing query reaches your higher-level container (wrapper).

You've got a nice problem there: sorting, fast insertion and removal
-and- linear indexing into the container. The indexing strongly
suggests that you want vector-like storage, which is a big no if the
objects you are storing are expensive to copy. Pointers are cheaper to
copy, so that will reduce the overhead. That's the rationalization I
have for the suggestion.

What works best in a situation where the ideal solution is not obvious
depends to a degree on the usage pattern, and that pattern isn't
obvious to me from your post (I didn't follow the link to the previous
thread).

Important: the lazy evaluation is essential here; you *want* to have a
flag indicating when the pointer vector is not sorted. Of course, doing
"random" removals from a vector will still cost a bundle. It's really
ugly; if possible I'd try to adjust the client code not to require
indexing into the container (most probably highly impractical, but just
a thought). Note that removal won't invalidate the order, only
insertion will. I'd postpone the sorting until a query is made; then
you can use a trivial std::sort.

May 19 '06 #3

Hi,

I figured there was no real difference between a set and a map, except
that with a map I'll at least be able to map keys to values, which
might be easier for me. I was a little confused about pointers to
objects: if I add a new object to, or remove an object from, the
set/map, is the pointer invalidated?

One thing I can say about usage is that reading is periodic. Say every
30 ms you read chunks of 25 objects from the "list". You only read
things in chunks, so you read the first 25 elements, and then somewhere
else you might read the 75th-100th elements.

Since querying is performed so often, I'm not sure how helpful lazy
evaluation will be, though; or maybe I'm confused about this.

Thanks!
Mehwish

persenaama wrote:
> [full quote of previous reply snipped]


May 19 '06 #4


za****@gmail.com wrote:
> I'm currently trying to decide between which one of maps or vectors to
> use. I'm trying to come up with a design that will let me perform fast
> insertions and deletions (w/o duplicates) in a sorted "list" and then
> look up every 25th or so element (and then iterate sequentially for the
> next 25 values, so sequential look up). The indexed lookup happens
> slightly more frequently than insertions and deletions, so it makes
> sense to make lookup efficient. Because the insertions and deletions
> also happen often, fast/efficient lookups are helpful as well. Memory
> isn't as much of an issue as speed is.
Do you iterate across all (or most) of the collection when you do the
25-at-a-time thing, without modifying the container (hence invalidating
iterators and such) in the middle of said iteration? That may be a
significant consideration. Note that map (and set) do *not* invalidate
iterators on insert and delete (except for iterators pointing to a
deleted element).
> One is a sorted vector, where I use 'lower_bound' to get where I should
> insert the value and insert it there, and use it to find the value to
> remove. Lookups are just normal lookups (through
> my_vector.begin() + indexSubscript). Most inserts and removals are from
> the center of the list, so whenever there is an insertion or removal,
> I'm guessing that all the elements are copied up or down. The values
> are user defined objects so that's a lot of copying calls.
Yeah, that's not a very good idea. You'll have to talk all day if you
want to try and convince me you've got a problem where the best
solution is to use a vector and insert into the middle of it over and
over.
> The other design uses a map and a vector. The map contains the actual
> sorted values. The vector is an index to every 25th key which is reset
> at every insertion and removal (from the closest previous key in the
> map onwards). To access the 25th*x value, you would get the
> corresponding key in the index and find it in the map, and then iterate
> sequentially.
....guh? Do you mean that you've got a vector<map<T>::iterator>? So
basically, you're pessimizing by caching a whole lot of iterators you
usually won't need? If that's not what you mean, please show some code
to be more clear. Or use a simpler scheme...
> After implementing both of these, the performance difference is really
> minimal for me, but I'm not sure if I'm missing a better way of doing
> this. I'm guessing this is something that people have come across
> often. I'm a new C++/STL user, so I might be missing a good data
> structure to use or a better solution to this problem. Any help would
> be appreciated.


What's wrong with just using std::set, and simply doing (iter += 25)?
You get logarithmic insert and delete, a constant-time operation to
jump ahead 25 elements, few iterator invalidations, and the sorting is
done for you. The only thing you don't get is constant-time random
access, but log time should be plenty good, I'd think. Implement it
and see how the perf numbers come out.

If that doesn't suit your needs, std::list is another reasonable
option. Linear random access, but constant-time insert/delete, and
again very few iterator invalidations to worry about. I recommend
getting O'Reilly's _STL Pocket Guide_ and paging through it when you
have trouble choosing the right STL template.
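Since set iterators are bidirectional rather than random-access, the 25-element jump would in practice be spelled with std::advance, which walks the iterator forward. A sketch of the chunked-read idea, with illustrative names:

```cpp
#include <iterator>
#include <set>

// Jump to position pos in a set of 0..n-1, then read a chunk of
// `chunk` elements sequentially; returns the last value of the chunk.
int last_of_chunk(int n, int pos, int chunk) {
    std::set<int> s;
    for (int i = 0; i < n; ++i)
        s.insert(i);               // log-time insert, kept sorted for free

    std::set<int>::iterator it = s.begin();
    std::advance(it, pos);         // bidirectional iterator: walks pos steps
    for (int c = 1; c < chunk; ++c)
        ++it;                      // the sequential part of the read
    return *it;
}
```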

Luke

May 19 '06 #6

Luke Meyers wrote:
> Do you iterate across all (or most) of the collection when you do the
> 25-at-a-time thing, without modifying the container (hence invalidating
> iterators and such) in the middle of said iteration? That may be a
> significant consideration. Note that map (and set) do *not* invalidate
> iterators on insert and delete (except for iterators pointing to a
> deleted element).
I iterate through the collection after the position where the element
has been inserted or removed. So I'm basically resetting the index
(vector) after the position where the element was inserted. I'm not
storing iterators but keys (for the map), so I do a find in the map for
that key (but it's logarithmic, which isn't bad).
> Yeah, that's not a very good idea. You'll have to talk all day if you
> want to try and convince me you've got a problem where the best
> solution is to use a vector and insert into the middle of it over and
> over.
:)
> ...guh? Do you mean that you've got a vector<map<T>::iterator>? So
> basically, you're pessimizing by caching a whole lot of iterators you
> usually won't need? If that's not what you mean, please show some code
> to be more clear. Or use a simpler scheme...
No, I have a vector<key> for a map<key, value>. Also, I actually do use
pretty much every one of the indices, since they're "played out"
elsewhere.
> What's wrong with just using std::set, and simply doing (iter += 25)?
> You get logarithmic insert and delete, a constant-time operation to
> jump ahead 25 elements, few iterator invalidations, and the sorting is
> done for you. The only thing you don't get is constant-time random
> access, but log time should be plenty good, I'd think. Implement it
> and see how the perf numbers come out.


I wasn't aware I could do an iter += 25 on a set iterator. Are you sure
we can? I'm currently using Nicolai M. Josuttis's _The C++ Standard
Library_, and I wasn't aware that sets have random-access iterators (I
thought they were bidirectional only). If I can do an efficient += 25,
then that would be great!!

Also, I tested the += n with maps; it didn't work, and I'm guessing
sets are the same inside. So if there's a way, let me know!!

I also tried it with sets -- are you sure this is defined?

Sorry, I'm a relative newbie, so let me know.

Thanks!

May 19 '06 #7

za****@gmail.com wrote:
> Luke Meyers wrote:
>> Do you iterate across all (or most) of the collection when you do the
>> 25-at-a-time thing, without modifying the container (hence invalidating
>> iterators and such) in the middle of said iteration? That may be a
>> significant consideration. Note that map (and set) do *not* invalidate
>> iterators on insert and delete (except for iterators pointing to a
>> deleted element).
> I iterate through the collection after the position where the element
> has been inserted or removed. So I'm basically resetting the index
> (vector) after the position where the element was inserted. I'm not
> storing iterators but keys (for the map), so I do a find in the map for
> that key (but its logarithmic which isnt bad).


Assuming your insertions occur with similar frequency at any point in
the list (as ISTR was stated or implied), this is no different in terms
of asymptotic complexity from resetting all the indices.
>> ...guh? Do you mean that you've got a vector<map<T>::iterator>? So
>> basically, you're pessimizing by caching a whole lot of iterators you
>> usually won't need? If that's not what you mean, please show some code
>> to be more clear. Or use a simpler scheme...


> No I have a vector<key> for a map<key, value>. Also, I actually do use
> pretty much each one of the index since they're "played out"
> elsewhere.


I think that, under this scheme, it's better to use a vector of
iterators, since they permit constant-time element access rather than
log-time access. Sounds like you're re-caching them in all cases where
iterators would be invalidated (and then some) anyway. Of course, if
your constraint about the 25 thing was loose, you could save on a lot
of this reevaluation (which is of linear complexity each time you do
it).

But regardless, what does this buy you? In order to populate the
vector, don't you have to do all the work ahead of time that you're
supposedly saving by caching those keys/iterators in the vector?
Measure caching vs. non-caching and see if your "optimization" really
bought you anything.
> I wasn't aware I could do a iter+=25 on a set iterator. Are you sure we
> can? I'm currently using Nicolai M. Josuttis' _C++ Standard Library_
> and I wasn't aware that sets have random access iterators (I thought it
> was bidirectional only). If I can do an efficient +=25, then that would
> be great!!
>
> Also, I tested the +=n with maps .. it didnt work, Im guessing sets are
> the same inside .. so if theres a way, lemme know!!
>
> Also I tried using it with sets, are you sure this is defined.


Sorry, I didn't try this myself prior to posting. Whether += is
defined is (perhaps surprisingly at first) kind of irrelevant. You've
got a forward-iterator, which means you've got operator++, so you can
always implement operator+= in terms of that. If the standard library
doesn't provide this, it's probably to avoid implying that the
operation is more efficient than it really is.

Here's a working implementation:

namespace std {
    template <class FwIter>
    FwIter &
    operator+=(FwIter & iter, size_t n)
    {
        for (int i = 0; i < n; ++i, ++iter); // no body
        return iter;
    }
}

Luke

May 22 '06 #8

The iter += 25 will in this case most likely just do ++iter 25 times.
By the way, if you want to iterate from the object you just inserted,
you can use the iterator to that object and start from there (insert
returns an iterator!).
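For instance (a small sketch with std::set<int>; next_after_insert is an illustrative name, and the caller must ensure a successor element exists):

```cpp
#include <set>
#include <utility>

// Insert v, then use the iterator that insert() returns to read the
// next element in sorted order -- no separate find() needed.
int next_after_insert(std::set<int>& s, int v) {
    std::pair<std::set<int>::iterator, bool> r = s.insert(v);
    // r.second says whether v was actually new; r.first points at v
    // either way, so iteration can start right here.
    std::set<int>::iterator it = r.first;
    ++it;
    return *it;
}
```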

May 22 '06 #9

Luke Meyers wrote:
> za****@gmail.com wrote:
>> If I can do an efficient +=25, then that would be great!!


Forgot to mention in my other post, but while the implementation I gave
is of course linear w.r.t. 25, 25 is a constant (and a small one), so
technically the operation is constant time. I assume you're clever
enough not to let this fool you into a false sense of efficiency,
though -- but the problem is a result of the data structure you chose,
not this operation.

Luke

May 22 '06 #10

The whole point of doing this was not having to iterate all the way
down the list.
The way this is used is that I have a bunch of records. Depending on
the record x used, you would get the 25x-th to 25(x+1)-th elements. So
if multiple records are requested, then for each record you would have
to iterate all the way down the list to get the values. For lists in
the thousands, that is very expensive: imagine this for x = 1000 or
x = 1500, and then imagine doing it for 100 or so records, all with
different x values. There's a lot of unnecessary iteration going on.
Since insertions or removals might happen 6-7 times a second, it makes
more sense to have an index instead of using +=25 to iterate linearly.
The gains in doing that are substantial.

May 22 '06 #11

In message <11**********************@j73g2000cwa.googlegroups.com>, Luke
Meyers <n.***********@gmail.com> writes

> Sorry, I didn't try this myself prior to posting. Whether += is
> defined is (perhaps surprisingly at first) kind of irrelevant. You've
> got a forward-iterator, which means you've got operator++, so you can
> always implement operator+= in terms of that. If the standard library
> doesn't provide this, it's probably to avoid implying that the
> operation is more efficient than it really is.
>
> Here's a working implementation:

(apart from being UB :-( )

> namespace std {

17.4.3.1: It is undefined for a C++ program to add declarations or
definitions to namespace std or namespaces within namespace std unless
otherwise specified. A program may add template specializations for any
standard library template to namespace std. Such a specialization
(complete or partial) of a standard library template results in
undefined behavior unless the declaration depends on a user-defined name
of external linkage and unless the specialization meets the standard
library requirements for the original template.

>     template <class FwIter>
>     FwIter &
>     operator+=(FwIter & iter, size_t n)

which will attempt to match just about any type for its left operand,
not just set iterators.

>     {
>         for (int i = 0; i < n; ++i, ++iter); // no body
>         return iter;
>     }
> }


The C++ way of doing this is called std::advance().
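For example (a sketch with an illustrative helper; std::advance dispatches on the iterator category via iterator_traits, so it falls back to repeated ++ for a set's bidirectional iterators and uses a single += when given a random-access iterator):

```cpp
#include <iterator>
#include <set>

// Return the element at position n of the set, stepping the iterator
// forward with std::advance instead of a hand-rolled operator+=.
int nth_of_set(const std::set<int>& s, int n) {
    std::set<int>::const_iterator it = s.begin();
    std::advance(it, n);  // linear here, constant for random access
    return *it;
}
```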

--
Richard Herring
May 22 '06 #12

za****@gmail.com wrote:
> The whole point of doing this was not having to iterate all the way
> down the list.
> So the way this is used is that I have a bunch of records. Depending on
> the record x used, you would get 25xth to 25(x+1)th. So if there are
> multiple records called, for each record you would have to iterate all
> the way down the list to get the values. For lists that are in the
> thousands, this is very hard. So imagine this for x = 1000 or x =
> 1500. Now imagine doing this for 100 or so records all with different x
> values. There's a lot of unnecessary iteration going on.
> Since insertions or removals might happen 6-7 times a second, it makes
> more sense to have an index instead of using +=25 to iterate linearly.
> The gains in doing that are substantial.


Unless the creation of the indices introduces overhead equivalent to
the linear-search work you're trying to avoid by creating them. Which
is what I stated. I'm well aware, and said explicitly, that the +=
operation doesn't magically introduce constant-time random access to
std::map. Neither does your scheme. If you've got 6-7
insertions/removals per second, and calculate the set of indices (or a
significant fraction thereof -- half, on average, if they're evenly
distributed) each time, you're doing 6-7 operations with time
complexity that's linear in the size of the map per second. If you're
willing to recalculate the set of indices less often, at the expense of
not spanning exactly 25 elements each time, you can incrementally
reduce this. You didn't respond to my suggestion along these lines,
though.

The asymptotic time-complexities of your current scheme, from my best
understanding of your descriptions, are:

Insert/remove element: O(n*log(n)) -- log(n) for the std::map
insertion, plus O(n) work re-inserting the recalculated indices.

Access element number x*25, given the cached index set: O(log(n)) --
can be improved to O(1) if you use vector<map::iterator>.

Iterate across all elements, 25 at a time, assuming no insert/delete
interruptions: O(n*log(n)) -- O(n) elements accessed, with a log(n)
lookup each time. Can be improved to O(n) if you use
vector<map::iterator>.

There's no way you'll get better than O(n) for accessing every 25th
element, since the number of elements you want to actually look at has
cardinality of O(n). The problem is the O(n*log(n)) step on
insert/remove. That's as bad as doing push_back on a vector and
calling std::sort each time you insert. Do you still like this scheme?
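A sketch of the vector<map::iterator> variant mentioned above (illustrative names; the key property is that map iterators survive unrelated inserts and erases, which is what makes caching them safe at all):

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef std::map<int, int> Tbl;  // stand-in key/value types

// Cache an iterator to every stride-th element: O(n) to rebuild, but
// each entry then gives O(1) access to the start of its chunk.
std::vector<Tbl::iterator> build_iter_index(Tbl& t, std::size_t stride) {
    std::vector<Tbl::iterator> index;
    std::size_t i = 0;
    for (Tbl::iterator it = t.begin(); it != t.end(); ++it, ++i)
        if (i % stride == 0)
            index.push_back(it);
    return index;
}
```

The O(n) rebuild on every insert/remove is still the dominant cost, exactly as analyzed above; caching iterators only removes the per-access log(n) find.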

Finally, getting back to the original point, it's not accurate to
characterize the += step as "iterating all the way down the list." 25
is a constant. You've got an iterator to one point in the list, and
+=25 involves a constant amount of work to get to the next point you're
interested in. If you want to do this n times (or n/25 times, as it
happens), you're going to have linear-time complexity regardless of
whether your constant-time operation is an efficient random access
(which is not available) or the one I described involving looping over
operator++.

I hope this is more clear. I feel like your response was a little
impolite, incidentally -- I feel I've demonstrated adequate
understanding of asymptotic complexity in this thread not to merit a
remedial explanation of why linear search is inefficient.

Unless you've got some better-than-O(n) way of caching those indices
each time you insert/remove, perform fewer insertions/removals, or
recache your indices far less frequently, you're not going to improve
on O(n*log(n)) here. Given that fact, you might as well give up on
this scheme for a more straightforward O(n*log(n)) scheme (one with a
better constant factor, no doubt), or try approaching the problem from
a different perspective entirely.

According to what you've said, it sounds like perhaps you're
over-valuing random access. You said that, after doing one of these
random accesses, you proceed to iterate over the entire collection
starting at that point (on average, half of the elements), 25 at a
time. That's inherently an O(n) operation, so you wouldn't increase
your big-O even by doing linear search for the starting point. Of
course, the constant factor in that lookup is greater by a factor of
25. But, well, food for thought.

Luke

May 22 '06 #13

Richard Herring wrote:
> In message <11**********************@j73g2000cwa.googlegroups.com>, Luke
> Meyers <n.***********@gmail.com> writes
>> Here's a working implementation:


> (apart from being UB :-( )


Okay, maybe "working implementation" was too strong a phrase. ;)

What I meant is, I coded it up and compiled it, and it worked in my
case -- to be fair, that's more than you get a lot of the time with ng
code!
>> namespace std {


> 17.4.3.1 It is undefined for a C++ program to add declarations or
> definitions to namespace std or namespaces within namespace std unless
> otherwise specified. A program may add template specializations for any
> standard library template to namespace std. Such a specialization
> (complete or partial) of a standard library template results in
> undefined behavior unless the declaration depends on a user-defined name
> of external linkage and unless the specialization meets the standard
> library requirements for the original template.


Yeah yeah, ya got me. Thanks for pointing this out. I was aware of
this section of the standard, but had chosen following the example of
an old C++ mentor of mine over adhering to the letter of the standard
in this particular case -- not my usual practice, and I'm now glad to
have considered this and found sufficient reason not to make the
exception. His reasoning for adding things to namespace std had to do
with ADL, and putting operations in the same namespace as the entities
to whose interface they belong. Valid in general, but as I considered
your point I realized that, by introducing a template such as this, I
could interfere with lookups happening internal to my standard library
implementation, which may be part of why this section of the standard
exists.
>>     template <class FwIter>
>>     FwIter &
>>     operator+=(FwIter & iter, size_t n)


> which will attempt to match just about any type for its left operand,
> not just set iterators.


Right, and I knew this. Just as you know that writing correct generic
code is full of unique challenges -- my point was not to say, here's
some perfect code you should copy and paste and use everywhere, but
just to point out that based on the ability to increment, incrementing
by 25 is an obvious and trivial extension. How it's accomplished
syntactically, while perhaps interesting in another context, is not
important to the question at hand.
>>     for (int i = 0; i < n; ++i, ++iter); // no body


> The C++ way of doing this is called std::advance().


Yes, that's exactly what I wish I had thought of.

Thanks,
Luke

May 22 '06 #14
