445,750 Members | 1,214 Online
Need help? Post your question and get tips & solutions from a community of 445,750 IT Pros & Developers. It's quick & easy.

Efficiency of map and vector iterators

 P: n/a I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. This is a computational electromagnetics code where I have to separate points in space that arrive randomly into groupings based on (x, y, z) dimensions, so I use std::map to do that. Once the sorting is done I need to find the interactions between groups, so I've been using iterators to step through the map. As a test, I then modified the code a bit to first copy the groups from the map into a vector so I could use a vector iterator. I didn't expect any significant performance difference since there are heavy floating point calculations going on at each step. However, with Intel C++ 7.1 the compuation time to do the nested loops below decreased from 78.5 seconds to 60 seconds when I switched to the vector. The difference with GCC was less severe, but it still decreased from 81.5 seconds with the map to 73.1 seconds with the vector. My question is why there is such a dramatic difference in these two cases. With all the floating point calculations going on, I can't imagine that the actual time to increment the iterators is having an effect. Could the map template be preventing optimizations that the vector template alone permits? (I used -O3 optimization in both cases.) Is this a typical penalty paid when iterating through a map instead of a vector? Code sample with map or vector implementation chosen using #define statements. Note that I added the pointers to the map elements to allow the same code to be used below with either iterator. This had no effect on the execution time. using namespace std; typedef map , GROUP> GROUP_MAP; // DIMS has x, y, z indices vector groups_map; #ifdef USE_GROUPS_VECTOR typedef vector GROUP_VECTOR; vector groups_vector; .... // Copy groups from map to vector #endif for (int level = 0; level < LEVELS; ++level) { #ifdef USE_GROUPS_MAP // Slower for (GROUP_MAP::iterator M_it = groups_map[level].begin(); M_it != groups_map[level].end(); ++M_it) { GROUP *m_group = &(M_it->second); // For compatibility below for (GROUP_MAP::iterator N_it = groups_map[level].begin(); N_it != groups_map[level].end(); ++N_it) { GROUP *n_group = &(N_it->second); // For compatibility below #endif #ifdef USE_GROUPS_VECTOR // Faster for (GROUP_VECTOR::iterator m_group = groups_vector[level].begin(); m_group != groups_vector[level].end(); ++m_group) { for (GROUP_VECTOR::iterator n_group = groups_vector[level].begin(); n_group != groups_vector[level].end(); ++n_group) { #endif ... // Heavy floating-point calculations, calling Fortran/C libraries } } } Jul 22 '05 #1
14 Replies

 P: n/a Jim West wrote: I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. Incrementing a vector iterator is usually a single instruction (add offset) while a map iterator is usually a memory lookup (or many memory lookups) and hence more expensive computationally. No surprises here. Jul 22 '05 #2

 P: n/a In article , Gianni Mariani wrote: Jim West wrote: I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. Incrementing a vector iterator is usually a single instruction (add offset) while a map iterator is usually a memory lookup (or many memory lookups) and hence more expensive computationally. I understand that. However, I do six fast Fourier transforms (lengths up to 100s) and several matrix-vector multiplies within each loop. I can't imagine that the time needed to increment the iterator could be significant compared to the time needed to do the floating-point computations. And yet I'm getting a 30% decrease in execution speed when using a map with Intel C++. Hence my concern that there is something else fundamentally different between the map and vector templates that is having an effect that I am not aware of (and should be!). Jul 22 '05 #3

 P: n/a "Jim West" wrote in message news:sl********************@detrick.westlan... In article , Gianni Mariani wrote: Jim West wrote: I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. Incrementing a vector iterator is usually a single instruction (add offset) while a map iterator is usually a memory lookup (or many memory lookups) and hence more expensive computationally. I understand that. However, I do six fast Fourier transforms (lengths up to 100s) and several matrix-vector multiplies within each loop. I can't imagine that the time needed to increment the iterator could be significant compared to the time needed to do the floating-point computations. And yet I'm getting a 30% decrease in execution speed when using a map with Intel C++. Hence my concern that there is something else fundamentally different between the map and vector templates that is having an effect that I am not aware of (and should be!). Yes, that sounds fishy to me. I wouldn't be surprised if, somehow, the number of calculations you are running is actually different in the two cases. -- Cy http://home.rochester.rr.com/cyhome/ Jul 22 '05 #4

 P: n/a Jim West wrote: I understand that. However, I do six fast Fourier transforms (lengths up to 100s) and several matrix-vector multiplies within each loop. I can't imagine that the time needed to increment the iterator could be significant compared to the time needed to do the floating-point computations. And yet I'm getting a 30% decrease in execution speed when using a map with Intel C++. Hence my concern that there is something else fundamentally different between the map and vector templates that is having an effect that I am not aware of (and should be!). Can your run a profiler ? Jul 22 '05 #5

 P: n/a Jim West wrote: I understand that. However, I do six fast Fourier transforms (lengths up to 100s) and several matrix-vector multiplies within each loop. I can't imagine that the time needed to increment the iterator could be significant compared to the time needed to do the floating-point computations. And yet I'm getting a 30% decrease in execution speed when using a map with Intel C++. Hence my concern that there is something else fundamentally different between the map and vector templates that is having an effect that I am not aware of (and should be!). Well, you can always take a look into the actual implementation of 'std::map<>::iterator's increment operator, see how "heavy" it really is and decide whether the observed 30% reduction in execution speed is justified. Also, some other factors could be at play here. For example, adjacent elements of 'std::vector' are stored in adjacent regions of memory, which is not necessarily the case with 'std::map'. This can have noticeable impact on the efficiency of hardware memory caches, which in turn might have noticeable impact on the overall performance. -- Best regards, Andrey Tarasevich Jul 22 '05 #6

 P: n/a "Jim West" wrote in message news:sl********************@detrick.westlan... In article , Gianni Mariani wrote: Jim West wrote: I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. Incrementing a vector iterator is usually a single instruction (add offset) while a map iterator is usually a memory lookup (or many memory lookups) and hence more expensive computationally. I understand that. However, I do six fast Fourier transforms (lengths up to 100s) and several matrix-vector multiplies within each loop. I can't imagine that the time needed to increment the iterator could be significant compared to the time needed to do the floating-point computations. And yet I'm getting a 30% decrease in execution speed when using a map with Intel C++. Hence my concern that there is something else fundamentally different between the map and vector templates that is having an effect that I am not aware of (and should be!). Looking through the headers on my system (VC6), there is a major difference in the way end() is implemented. In we have: iterator end() {return (_Last); } where the 'iterator' type is a typedef for a pointer In we have: iterator end() {return (_Tr.end()); } which leads to (in xtree): iterator end() {return (iterator(_Head)); } which leads to: iterator(_Nodeptr _P) : const_iterator(_P) {} and: const_iterator(_Nodeptr _P) : _Ptr(_P) {} So, for vector, end() just returns a pointer but, for map, each call results in some memory allocation and object construction. What happens if you cache the return value of end() when using the map (assuming you are not altering the map so as to make this dangerous)? Tom Jul 22 '05 #7

 P: n/a Thomas Wintschel wrote in news:s9SBb.44169\$d35.14903@edtnps84: Looking through the headers on my system (VC6), there is a major difference in the way end() is implemented. In we have: iterator end() {return (_Last); } where the 'iterator' type is a typedef for a pointer In we have: iterator end() {return (_Tr.end()); } which leads to (in xtree): iterator end() {return (iterator(_Head)); } which leads to: iterator(_Nodeptr _P) : const_iterator(_P) {} and: const_iterator(_Nodeptr _P) : _Ptr(_P) {} So, for vector, end() just returns a pointer but, for map, each call results in some memory allocation and object construction. What happens if you cache the return value of end() when using the map (assuming you are not altering the map so as to make this dangerous)? A std::map (and std::set and std::multi...) end() iterator is fixed. Insert's don't invalidate iterator's, and erase only invalidates the iterators to the elements('s) being erased, and you can't erase end(). So this shouldn't be a problem. Rob. -- http://www.victim-prime.dsl.pipex.com/ Jul 22 '05 #8

 P: n/a In article , Thomas Wintschel wrote: So, for vector, end() just returns a pointer but, for map, each call results in some memory allocation and object construction. What happens if you cache the return value of end() when using the map (assuming you are not altering the map so as to make this dangerous)? That is something that would never have occured to me. I will check it in the morning. Following Gianni's advice, I ran the profiler (why I didn't do it in the first place is beyond me!) and two things jumped out at me: 1) The map version had the entry: (lines broken) 4.57 106.10 9.36 33835812 0.00 0.00 std::_Tree, GROUP, std::less >, std::allocator const, GROUP> >, false>::const_iterator::_Inc() which would account for about half the time difference (the 9.36 s is the actual time). I found that quite surprising. 2) The entry for operator* multiplying a float by a complex grew from 1.20 s with vector to 8.90 s with map. I have no idea why that might occur, other than an optimizing difference. (I get exactly the same answer, down to the last decimal, so I don't think I'm doing any extra calculations with the map.) I'll tweak the stopping test tomrrow after I have had some time to sleep. Thanks for the help, everyone. Jul 22 '05 #9

 P: n/a "Rob Williscroft" wrote in message news:Xn**********************************@195.129. 110.200... Thomas Wintschel wrote in news:s9SBb.44169\$d35.14903@edtnps84: Looking through the headers on my system (VC6), there is a major difference in the way end() is implemented. In we have: iterator end() {return (_Last); } where the 'iterator' type is a typedef for a pointer In we have: iterator end() {return (_Tr.end()); } which leads to (in xtree): iterator end() {return (iterator(_Head)); } which leads to: iterator(_Nodeptr _P) : const_iterator(_P) {} and: const_iterator(_Nodeptr _P) : _Ptr(_P) {} So, for vector, end() just returns a pointer but, for map, each call results in some memory allocation and object construction. What happens if you cache the return value of end() when using the map (assuming you are not altering the map so as to make this dangerous)? A std::map (and std::set and std::multi...) end() iterator is fixed. Insert's don't invalidate iterator's, and erase only invalidates the iterators to the elements('s) being erased, and you can't erase end(). So this shouldn't be a problem. Rob. -- http://www.victim-prime.dsl.pipex.com/ Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty vector and called end() on each a hundred million times. The map clocked in at 34 seconds and the vector at 9. I guess this only really matters to those who must iterate over grains of sand in a desert. Tom Jul 22 '05 #10

 P: n/a Gianni's advice, I ran the profiler (why I didn't do it in the first place is beyond me!) and two things jumped out at me: 1) The map version had the entry: (lines broken) 4.57 106.10 9.36 33835812 0.00 0.00 std::_Tree, GROUP, std::less >, std::allocator const, GROUP> >, false>::const_iterator::_Inc() which would account for about half the time difference (the 9.36 s is the actual time). I found that quite surprising. 2) The entry for operator* multiplying a float by a complex grew from 1.20 s with vector to 8.90 s with map. I have no idea why that might occur, other than an optimizing difference. (I get exactly the same answer, down to the last decimal, so I don't think I'm doing any extra calculations with the map.) As far as the second point is concerned, it is difficult to to tell why this happens without seeing the code. But as Andrey suggested the memory access patterns may be an issue. Cache misses are very expensive on modern processors. -- Peter van Merkerk peter.van.merkerk(at)dse.nl Jul 22 '05 #11

 P: n/a Thomas Wintschel wrote: Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty vector and called end() on each a hundred million times. The map clocked in at 34 seconds and the vector at 9. I guess this only really matters to those who must iterate over grains of sand in a desert. Carefull how you interpret this. 9 = time(vector.end()) + loop overhead 34 = time(map.end()) + loop overhead if: 9 = loop overhead then time(map.end()) is infinitly more expensive than time(map.end()) ... BTW - it is likely that the loop overhead is more expensive than calling vector.end() because vector.end() is more than likely inlined. In a good optimizing compiler, it may not call it at all if it does nothing. However, my tests with GCC 3.3.1 show that map operations are ~3 times slower. Jul 22 '05 #12

 P: n/a In article , Jim West wrote: 1) The map version had the entry: (lines broken) 4.57 106.10 9.36 33835812 0.00 0.00 std::_Tree, GROUP, std::less >, std::allocator const, GROUP> >, false>::const_iterator::_Inc() Well, I figured out this part. I had a direct lookup of the map elements using a DIMS as the index buried in the inner loop that didn't do anything (left over from a developmental version of the code). Apparently it was optimized when switching to the vector but not when iterating over the same map the lookup refered to. Even I know that is an extremely slow operation! When I got rid of it the time reduced to 66 seconds, versus 60 seconds for the vector version. I feel a bit like an idiot for this one. Sorry to waste everyone's time. 2) The entry for operator* multiplying a float by a complex grew from 1.20 s with vector to 8.90 s with map. I have no idea why that might occur, other than an optimizing difference. (I get exactly the same answer, down to the last decimal, so I don't think I'm doing any extra calculations with the map.) Still wondering about this one, though. My best guess is that it has to do with cache hits (as suggested) since copying the groups into a vector puts them into contiguous memory. Although right now I must admit it is probably more likely that I messed up somewhere else. Thanks again for the help, everyone. Jul 22 '05 #13

 P: n/a In article , Jim West wrote: 2) The entry for operator* multiplying a float by a complex grew from 1.20 s with vector to 8.90 s with map. I have no idea why that might occur, other than an optimizing difference. (I get exactly the same answer, down to the last decimal, so I don't think I'm doing any extra calculations with the map.) Still wondering about this one, though. My best guess is that it has to do with cache hits (as suggested) since copying the groups into a vector puts them into contiguous memory. Although right now I must admit it is probably more likely that I messed up somewhere else. If anyone is still interested, this one went away when I upgraded to Intel C++ version 8.0 (from 7.1). Apparently it was an optimizing problem afterall. Jul 22 '05 #14

 P: n/a "Jim West" wrote in message news:sl**************************@jwest.ecen.oksta te.edu... I'm curious why I might be getting such a large performance difference between using a map iterator and a vector iterator. [snip] Comparative performance tests "vector iterator vs. map iterator" can be seen at : http://article.gmane.org/gmane.comp.....perfometer/24 We can see that a map iterator uses more time than a vector iterator. ===================================== Alex Vinokur mailto:al****@connect.to http://mathforum.org/library/view/10978.html news://news.gmane.org/gmane.comp.lang.c++.perfometer ===================================== Jul 22 '05 #15