By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,750 Members | 1,214 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,750 IT Pros & Developers. It's quick & easy.

Efficiency of map and vector iterators

P: n/a


I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.
This is a computational electromagnetics code where I have to
separate points in space that arrive randomly into groupings
based on (x, y, z) dimensions, so I use std::map to do that.
Once the sorting is done I need to find the interactions between
groups, so I've been using iterators to step through the map.
As a test, I then modified the code a bit to first copy the
groups from the map into a vector so I could use a vector
iterator. I didn't expect any significant performance difference
since there are heavy floating point calculations going on at
each step. However, with Intel C++ 7.1 the compuation time to do
the nested loops below decreased from 78.5 seconds to 60 seconds
when I switched to the vector. The difference with GCC was less
severe, but it still decreased from 81.5 seconds with the map to
73.1 seconds with the vector.

My question is why there is such a dramatic difference in
these two cases. With all the floating point calculations going
on, I can't imagine that the actual time to increment the
iterators is having an effect. Could the map template be
preventing optimizations that the vector template alone
permits? (I used -O3 optimization in both cases.) Is this
a typical penalty paid when iterating through a map instead
of a vector?
Code sample with map or vector implementation chosen using
#define statements. Note that I added the pointers to the map
elements to allow the same code to be used below with either
iterator. This had no effect on the execution time.

using namespace std;

typedef map<DIMS<int> , GROUP> GROUP_MAP; // DIMS has x, y, z indices
vector<GROUP_MAP> groups_map;

#ifdef USE_GROUPS_VECTOR
typedef vector<GROUP> GROUP_VECTOR;
vector<GROUP_VECTOR> groups_vector;

.... // Copy groups from map to vector

#endif

for (int level = 0; level < LEVELS; ++level) {
#ifdef USE_GROUPS_MAP // Slower
for (GROUP_MAP::iterator M_it = groups_map[level].begin();
M_it != groups_map[level].end(); ++M_it) {
GROUP *m_group = &(M_it->second); // For compatibility below
for (GROUP_MAP::iterator N_it = groups_map[level].begin();
N_it != groups_map[level].end(); ++N_it) {
GROUP *n_group = &(N_it->second); // For compatibility below
#endif
#ifdef USE_GROUPS_VECTOR // Faster
for (GROUP_VECTOR::iterator m_group = groups_vector[level].begin();
m_group != groups_vector[level].end(); ++m_group) {
for (GROUP_VECTOR::iterator n_group = groups_vector[level].begin();
n_group != groups_vector[level].end(); ++n_group) {
#endif
... // Heavy floating-point calculations, calling Fortran/C libraries
}
}
}
Jul 22 '05 #1
Share this Question
Share on Google+
14 Replies


P: n/a
Jim West wrote:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.


Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.

No surprises here.
Jul 22 '05 #2

P: n/a
In article <br********@dispatch.concentric.net>, Gianni Mariani wrote:
Jim West wrote:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.


Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.


I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).
Jul 22 '05 #3

P: n/a
"Jim West" <eg***********@yahoo.com> wrote in message
news:sl********************@detrick.westlan...
In article <br********@dispatch.concentric.net>, Gianni Mariani wrote:
Jim West wrote:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.


Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.


I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).


Yes, that sounds fishy to me. I wouldn't be surprised if, somehow, the
number of calculations you are running is actually different in the two
cases.

--
Cy
http://home.rochester.rr.com/cyhome/
Jul 22 '05 #4

P: n/a
Jim West wrote:

I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).

Can your run a profiler ?

Jul 22 '05 #5

P: n/a
Jim West wrote:

I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).


Well, you can always take a look into the actual implementation of
'std::map<>::iterator's increment operator, see how "heavy" it really is
and decide whether the observed 30% reduction in execution speed is
justified.

Also, some other factors could be at play here. For example, adjacent
elements of 'std::vector' are stored in adjacent regions of memory,
which is not necessarily the case with 'std::map'. This can have
noticeable impact on the efficiency of hardware memory caches, which in
turn might have noticeable impact on the overall performance.

--
Best regards,
Andrey Tarasevich

Jul 22 '05 #6

P: n/a

"Jim West" <eg***********@yahoo.com> wrote in message
news:sl********************@detrick.westlan...
In article <br********@dispatch.concentric.net>, Gianni Mariani wrote:
Jim West wrote:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.


Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.


I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).


Looking through the headers on my system (VC6), there is a major difference
in the way end() is implemented.

In <vector> we have:
iterator end() {return (_Last); }
where the 'iterator' type is a typedef for a pointer

In <map> we have:
iterator end() {return (_Tr.end()); }
which leads to (in xtree):
iterator end() {return (iterator(_Head)); }

which leads to:
iterator(_Nodeptr _P) : const_iterator(_P) {}
and:
const_iterator(_Nodeptr _P) : _Ptr(_P) {}

So, for vector, end() just returns a pointer but, for map, each call results
in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?

Tom
Jul 22 '05 #7

P: n/a
Thomas Wintschel wrote in news:s9SBb.44169$d35.14903@edtnps84:

Looking through the headers on my system (VC6), there is a major
difference in the way end() is implemented.

In <vector> we have:
iterator end() {return (_Last); }
where the 'iterator' type is a typedef for a pointer

In <map> we have:
iterator end() {return (_Tr.end()); }
which leads to (in xtree):
iterator end() {return (iterator(_Head)); }

which leads to:
iterator(_Nodeptr _P) : const_iterator(_P) {}
and:
const_iterator(_Nodeptr _P) : _Ptr(_P) {}

So, for vector, end() just returns a pointer but, for map, each call
results in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?


A std::map (and std::set and std::multi...) end() iterator is fixed.

Insert's don't invalidate iterator's, and erase only invalidates the
iterators to the elements('s) being erased, and you can't erase end().

So this shouldn't be a problem.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Jul 22 '05 #8

P: n/a
In article <s9SBb.44169$d35.14903@edtnps84>, Thomas Wintschel wrote:


So, for vector, end() just returns a pointer but, for map, each call results
in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?
That is something that would never have occured to me. I will
check it in the morning.

Following Gianni's advice, I ran the profiler (why I didn't do it in the
first place is beyond me!) and two things jumped out at me:

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00
std::_Tree<std::_Tmap_traits<DIMS<int>, GROUP, std::less<DIMS<int> >,
std::allocator<std::pair<DIMS<int> const, GROUP> >, false>::const_iterator::_Inc()


which would account for about half the time difference (the 9.36 s
is the actual time). I found that quite surprising.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)

I'll tweak the stopping test tomrrow after I have had some time to
sleep.

Thanks for the help, everyone.
Jul 22 '05 #9

P: n/a

"Rob Williscroft" <rt*@freenet.REMOVE.co.uk> wrote in message
news:Xn**********************************@195.129. 110.200...
Thomas Wintschel wrote in news:s9SBb.44169$d35.14903@edtnps84:

Looking through the headers on my system (VC6), there is a major
difference in the way end() is implemented.

In <vector> we have:
iterator end() {return (_Last); }
where the 'iterator' type is a typedef for a pointer

In <map> we have:
iterator end() {return (_Tr.end()); }
which leads to (in xtree):
iterator end() {return (iterator(_Head)); }

which leads to:
iterator(_Nodeptr _P) : const_iterator(_P) {}
and:
const_iterator(_Nodeptr _P) : _Ptr(_P) {}

So, for vector, end() just returns a pointer but, for map, each call
results in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?


A std::map (and std::set and std::multi...) end() iterator is fixed.

Insert's don't invalidate iterator's, and erase only invalidates the
iterators to the elements('s) being erased, and you can't erase end().

So this shouldn't be a problem.

Rob.
--
http://www.victim-prime.dsl.pipex.com/


Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty
vector and called end() on each a hundred million times. The map clocked in
at 34 seconds and the vector at 9.

I guess this only really matters to those who must iterate over grains of
sand in a desert.

Tom
Jul 22 '05 #10

P: n/a
<snip>
Gianni's advice, I ran the profiler (why I didn't do it in the
first place is beyond me!) and two things jumped out at me:

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00
std::_Tree<std::_Tmap_traits<DIMS<int>, GROUP, std::less<DIMS<int> >,
std::allocator<std::pair<DIMS<int> const, GROUP> >, false>
::const_iterator::_Inc()


which would account for about half the time difference (the 9.36 s
is the actual time). I found that quite surprising.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)


As far as the second point is concerned, it is difficult to to tell why
this happens without seeing the code. But as Andrey suggested the memory
access patterns may be an issue. Cache misses are very expensive on modern
processors.

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl


Jul 22 '05 #11

P: n/a
Thomas Wintschel wrote:

Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty
vector and called end() on each a hundred million times. The map clocked in
at 34 seconds and the vector at 9.

I guess this only really matters to those who must iterate over grains of
sand in a desert.


Carefull how you interpret this.

9 = time(vector.end()) + loop overhead
34 = time(map.end()) + loop overhead

if:

9 = loop overhead

then time(map.end()) is infinitly more expensive than time(map.end()) ...

BTW - it is likely that the loop overhead is more expensive than calling
vector.end() because vector.end() is more than likely inlined. In a
good optimizing compiler, it may not call it at all if it does nothing.

However, my tests with GCC 3.3.1 show that map operations are ~3 times
slower.

Jul 22 '05 #12

P: n/a
In article <sl********************@detrick.westlan>, Jim West wrote:

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00
std::_Tree<std::_Tmap_traits<DIMS<int>, GROUP, std::less<DIMS<int> >,
std::allocator<std::pair<DIMS<int> const, GROUP> >, false>
::const_iterator::_Inc()

Well, I figured out this part. I had a direct lookup of the map
elements using a DIMS<int> as the index buried in the inner loop
that didn't do anything (left over from a developmental version of
the code). Apparently it was optimized when switching to the vector
but not when iterating over the same map the lookup refered to. Even I
know that is an extremely slow operation! When I got rid of it the time
reduced to 66 seconds, versus 60 seconds for the vector version.

I feel a bit like an idiot for this one. Sorry to waste everyone's
time.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)


Still wondering about this one, though. My best guess is that it has
to do with cache hits (as suggested) since copying the groups into
a vector puts them into contiguous memory. Although right now I must
admit it is probably more likely that I messed up somewhere else.

Thanks again for the help, everyone.
Jul 22 '05 #13

P: n/a
In article <sl**************************@jwest.ecen.okstate.e du>, Jim West wrote:
2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)


Still wondering about this one, though. My best guess is that it has
to do with cache hits (as suggested) since copying the groups into
a vector puts them into contiguous memory. Although right now I must
admit it is probably more likely that I messed up somewhere else.


If anyone is still interested, this one went away when I upgraded to
Intel C++ version 8.0 (from 7.1). Apparently it was an optimizing
problem afterall.
Jul 22 '05 #14

P: n/a

"Jim West" <eg***********@yahoo.com> wrote in message news:sl**************************@jwest.ecen.oksta te.edu...


I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.

[snip]
Comparative performance tests "vector iterator vs. map iterator" can be seen at :
http://article.gmane.org/gmane.comp.....perfometer/24

We can see that a map iterator uses more time than a vector iterator.
=====================================
Alex Vinokur
mailto:al****@connect.to
http://mathforum.org/library/view/10978.html
news://news.gmane.org/gmane.comp.lang.c++.perfometer
=====================================

Jul 22 '05 #15

This discussion thread is closed

Replies have been disabled for this discussion.