
vector::insert performance tip.

Hi,
Trying to dump a set<int> (s) into a vector (v), my guess was that
v.insert(v.end(),s.begin(),s.end())
would be at least as efficient as
copy(s.begin(), s.end(), back_inserter(v))
It turns out that it is almost two times slower (even without using
reserve() before copy), and that's because it traverses the set twice.
I'm posting it as an exception to the usual (and usually correct)
advice
"Prefer member functions to algorithms".

Regards
Kostas

Apr 28 '07 #1
kostas wrote:
Trying to dump a set<int> (s) into a vector (v), my guess was that
v.insert(v.end(),s.begin(),s.end())
would be at least as efficient as
copy(s.begin(), s.end(), back_inserter(v))
It turns out that it is almost two times slower (even without using
reserve() before copy), and that's because it traverses the set twice.
I don't see why it would need to traverse the set twice. Any hint
to why it was happening?
I'm posting it as an exception to the usual (and usually correct)
advice
"Prefer member functions to algorithms".
I don't think the advice has anything to do with performance. The
usual (and always correct) advice WRT performance is "measure first
then try to optimize".

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Apr 28 '07 #2
On Apr 28, 10:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
I don't see why it would need to traverse the set twice. Any hint
to why it was happening?
The STL implementation I use needs first to calculate the distance
s.end() - s.begin() in order to reserve the appropriate space. Is this
not the standard (or the usual) implementation of vector::insert?
I don't think the advice has anything to do with performance.
With what, then?

Apr 28 '07 #3
On 28 Apr 2007 14:32:55 -0700, kostas <sk******@gmail.com> wrote:
>On Apr 28, 10:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
>I don't see why it would need to traverse the set twice. Any hint
to why it was happening?

The STL implementation I use needs first to calculate the
distance s.end() - s.begin() in order to reserve the appropriate
space. Is this not the standard (or the usual) implementation
of vector::insert?
It's probably the 'normal' implementation, which is faster for
random-access iterators but slower in some cases of bidirectional
iterators. I guess your number of elements is very small. Otherwise
the vector relocations in the copy(...,back_inserter(v)) variant would
be more time-consuming than the double traversal.
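As an illustrative aside (not a quote of any library): the cost of that
first counting pass depends entirely on the iterator category, since
std::distance is constant time for random-access iterators but walks every
node for bidirectional ones such as std::set's. A minimal sketch:

#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

int main()
{
    std::vector<int> v;
    std::set<int> s;
    for (int i = 0; i < 1000000; ++i) { v.push_back(i); s.insert(i); }

    // O(1) for the vector's random-access iterators (pointer subtraction).
    std::ptrdiff_t nv = std::distance(v.begin(), v.end());

    // O(n) for the set's bidirectional iterators (increments through every node).
    std::ptrdiff_t ns = std::distance(s.begin(), s.end());

    std::cout << nv << ' ' << ns << '\n';
}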
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Apr 29 '07 #4
On Apr 29, 12:09 pm, rpbg...@yahoo.com (Roland Pibinger) wrote:
I guess your number of elements is very small.
Wrong guess. Actually, in order to measure the difference I used one
million ints and more.
Otherwise
the vector relocations in the copy(...,back_inserter(v)) variant would
be more time-consuming than the double traversal.
Both the vector expansion and the set traversal are amortized linear
operations in the number of items. It just happens that in this case
of built-in types the first coefficient is smaller than the other.
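For reference, a minimal compilable sketch of the comparison with one
million ints (the timing harness here is an illustrative assumption, not
the exact benchmark used):

#include <algorithm>
#include <ctime>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

int main()
{
    std::set<int> s;
    for (int i = 0; i < 1000000; ++i)
        s.insert(i);

    std::vector<int> v1, v2;

    std::clock_t t0 = std::clock();
    v1.insert(v1.end(), s.begin(), s.end());                // member range insert
    std::clock_t t1 = std::clock();
    std::copy(s.begin(), s.end(), std::back_inserter(v2));  // algorithm + back_inserter
    std::clock_t t2 = std::clock();

    std::cout << "insert: " << double(t1 - t0) / CLOCKS_PER_SEC << " s\n"
              << "copy:   " << double(t2 - t1) / CLOCKS_PER_SEC << " s\n";
}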
Apr 29 '07 #5
On Apr 28, 9:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
kostas wrote:
Trying to dump a set<int> (s) into a vector (v), my guess was that
v.insert(v.end(),s.begin(),s.end())
would be at least as efficient as
copy(s.begin(), s.end(), back_inserter(v))
It turns out that it is almost two times slower (even without using
reserve() before copy), and that's because it traverses the set twice.
I don't see why it would need to traverse the set twice. Any hint
to why it was happening?
It's probably an "optimization":-). If the number of elements
in the set is large, and the elements are expensive to copy,
it's faster to traverse the set a first time to calculate how
many elements are needed, ensure sufficient capacity, and then
traverse the set a second time to copy them.
I'm posting it as an exception to the usual (and usually correct)
advice
"Prefer member functions to algorithms".
I don't think the advice has anything to do with performance. The
usual (and always correct) advice WRT performance is "measure first
then try to optimize".
And this turns out to be a good example of why:-). Off hand, I
would have imagined that traversing the set twice would be
faster, because it avoids unnecessary allocations. Apparently,
I'd have guessed wrong. And apparently, the author of the STL
he's using guessed wrong as well.

(I suspect which form is faster will depend on any number of
external factors, but if the number of elements is large...
std::set has very poor locality, so you're likely to get some
paging, or at least cache misses, in each pass.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 29 '07 #6
On Apr 28, 11:32 pm, kostas <skola...@gmail.com> wrote:
On Apr 28, 10:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
I don't see why it would need to traverse the set twice. Any hint
to why it was happening?
The STL implementation I use needs first to calculate
the distance s.end() - s.begin() in order to reserve the
appropriate space. Is this not the standard (or the usual)
implementation of vector::insert?
It's the required implementation: "Complexity: The constructor
template <class InputIterator> vector(InputIterator first,
InputIterator last) makes only N calls to the copy constructor
of T (where N is the distance between first and last) and no
reallocations if iterators first and last are of forward,
bidirectional, or random access categories." The apparent
assumption in the standard is that copies and allocations are
expensive, and iteration is cheap. Copying basic types,
however, is very, very cheap; modern allocators aren't that
expensive, and iterating over node based containers can be
extremely expensive due to their poor locality.
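To make that concrete, a simplified sketch of the kind of range append
this wording leads to (illustrative only, not any particular vendor's
code; range_append is a made-up name):

#include <iterator>
#include <vector>

// For forward/bidirectional iterators the implementation may make only N
// copies and no reallocations, so it has to know N before copying anything:
template <class T, class FwdIt>
void range_append(std::vector<T>& v, FwdIt first, FwdIt last)
{
    typename std::iterator_traits<FwdIt>::difference_type n =
        std::distance(first, last);   // pass 1: count the elements
    v.reserve(v.size() + n);          // one allocation, no relocations
    for (; first != last; ++first)    // pass 2: copy them
        v.push_back(*first);
}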

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 29 '07 #7
On Apr 29, 11:09 am, rpbg...@yahoo.com (Roland Pibinger) wrote:
On 28 Apr 2007 14:32:55 -0700, kostas <skola...@gmail.com> wrote:
On Apr 28, 10:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
I don't see why it would need to traverse the set twice. Any hint
to why it was happening?
The STL implementation I use needs first to calculate the
distance s.end() - s.begin() in order to reserve the appropriate
space. Is this not the standard (or the usual) implementation
of vector::insert?
It's probably the 'normal' implementation, which is faster for
random-access iterators but slower in some cases of bidirectional
iterators. I guess your number of elements is very small. Otherwise
the vector relocations in the copy(...,back_inserter(v)) variant would
be more time-consuming than the double traversal.
I just did the test with 10 million members in the set. I get
similar results. I suspect that the lack of locality in node
based containers is the reason.
--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 29 '07 #8
On Apr 29, 1:28 pm, kostas <skola...@gmail.com> wrote:

[...]
Both the vector expansion and the set traversal are amortized linear
operations in the number of items.
Which means very little on a modern machine. The time it takes
to execute a single operation can vary in significant
proportions depending on whether the required variables are in
registers, in cache, in main memory, or must be paged in. On
the machines I use, something like iter++, where iter is an
std::set<>::iterator, can vary between less than a microsecond
(everything in registers) to somewhere around 10 milliseconds
(if I get a page miss)---4 orders of magnitude.
--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 29 '07 #9
"James Kanze" <ja*********@gmail.com> wrote in message
news:11**********************@n59g2000hsh.googlegroups.com...
On Apr 29, 1:28 pm, kostas <skola...@gmail.com> wrote:

[...]
>Both the vector expansion and the set traversal are amortized linear
operations in the number of items.
>Which means very little on a modern machine. The time it takes
to execute a single operation can vary in significant
proportions depending on whether the required variables are in
registers, in cache, in main memory, or must be paged in. On
the machines I use, something like iter++, where iter is an
std::set<>::iterator, can vary between less than a microsecond
(everything in registers) to somewhere around 10 milliseconds
(if I get a page miss)---4 orders of magnitude.
[...]

I'm not sure if I'm missing the point here, but for the purposes of
complexity analysis, does it actually matter how long each individual
instruction is taking? If we've got a constant (i.e. independent of the
number of items) upper bound for the time taken for an instruction (however
large), then an operation which requires a number of instructions that is
linear in terms of the problem size will run in linear time. For example, if
I've got an operation on n items that requires no more than (say) 2n
instructions, with each instruction taking at most k milliseconds, then the
maximum time taken on a problem of size n is 2nk milliseconds, which (for k
not a function of n) is linear in n. The actual value of k isn't relevant to
the complexity analysis, even if it's relevant to the person implementing
the code.

Regards,
Stu
Apr 30 '07 #10
On Apr 29, 9:22 pm, James Kanze <james.ka...@gmail.com> wrote:
I just did the test with 10 million members in the set. I get
similar results. I suspect that the lack of locality in node
based containers is the reason.
There are not only memory locality issues involved; there is also
algorithmic complexity. Set traversal is amortized linear, but not as
"linear" as list traversal, for example.

Apr 30 '07 #11
On Apr 30, 2:30 am, "Stuart Golodetz"
<stuart.golod...@NnOeSwP.AoMx.ac.uk> wrote:
"James Kanze" <james.ka...@gmail.com> wrote in message
news:11**********************@n59g2000hsh.googlegroups.com...
On Apr 29, 1:28 pm, kostas <skola...@gmail.com> wrote:
[...]
Both the vector expansion and the set traversal are amortized linear
operations in the number of items.
Which means very little on a modern machine. The time it takes
to execute a single operation can vary in significant
proportions depending on whether the required variables are in
registers, in cache, in main memory, or must be paged in. On
the machines I use, something like iter++, where iter is an
std::set<>::iterator, can vary between less than a microsecond
(everything in registers) to somewhere around 10 milliseconds
(if I get a page miss)---4 orders of magnitude.
[...]
I'm not sure if I'm missing the point here, but for the purposes of
complexity analysis, does it actually matter how long each individual
instruction is taking?
You're missing the point that what the original poster measured
was execution time. My point is, precisely, that complexity has
little relationship to execution time. It's a useful indicator
as to whether something will scale, but even then, for example,
an O(n) algorithm will suddenly become much, much slower if page
faults occur, and for very large data sets, locality can become
as important a consideration as complexity.
If we've got a constant (i.e. independent of the number of
items) upper bound for the time taken for an instruction
(however large), then an operation which requires a number of
instructions that is linear in terms of the problem size will
run in linear time.
No. That's only true if the time taken for an instruction is
independent of the size of the data set. Paging and caching
introduce a non-linear component into the execution time of
instructions, which can depend on the size of the data set.
For example, if
I've got an operation on n items that requires no more than (say) 2n
instructions, with each instruction taking at most k milliseconds, then the
maximum time taken on a problem of size n is 2nk milliseconds, which (for k
not a function of n) is linear in n. The actual value of k isn't relevant to
the complexity analysis, even if it's relevant to the person implementing
the code.
Once k has any dependence on n, your complexity analysis no
longer makes reasonable predictions concerning the execution time.
Saying, for example, that ++i takes a maximum of 10
milliseconds, and calculating bounds from that, doesn't mean
much if most of the time it only takes 1 or 2 microseconds, and
if it will only take more than 5 milliseconds when the number of
elements exceeds, say, 10 million. You still do get asymptotic
performance with enough elements, but the number of elements
needed to get there is so high that it won't occur in actual
practice. For performance reasons, if nothing else; once you
start getting page faults, your performance will end up being
unacceptable.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 30 '07 #12
On Apr 30, 8:57 am, kostas <skola...@gmail.com> wrote:
On Apr 29, 9:22 pm, James Kanze <james.ka...@gmail.com> wrote:
I just did the test with 10 million members in the set. I get
similar results. I suspect that the lack of locality in node
based containers is the reason.
There are not only memory locality issues involved; there is also
algorithmic complexity. Set traversal is amortized linear, but not as
"linear" as list traversal, for example.
Both can play a role. Typically allocators will allocate
successively allocated objects close to one another, which means
that locality when iterating over a set will be better if the
set was created by insertions in order. A quick test of 50
million elements on a Linux based PC shows that iterating over a
set where the elements were inserted in a random order takes
roughly 8 or 9 times longer than iterating over one where they
were inserted in the final order (and that iterating over an
std::list is considerably faster than either---about 3 or 4
times faster than the sorted set). This is probably due
partially to a simpler algorithm (i.e. the elementary operation
of iter++ is faster, even if everything is already in the
cache), and partially due to better locality: the node of a list
is smaller, so more of them fit into each cache line.
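For what it's worth, a sketch of that kind of traversal experiment (the
harness, element count, and shuffling scheme below are assumptions, not
the exact program behind the figures above):

#include <algorithm>
#include <ctime>
#include <iostream>
#include <list>
#include <set>
#include <vector>

template <class Container>
double time_traversal(const Container& c, long long& sum)
{
    std::clock_t t0 = std::clock();
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        sum += *it;                                   // touch every node
    return double(std::clock() - t0) / CLOCKS_PER_SEC;
}

int main()
{
    const int N = 50000000;   // 50 million, as above; scale down if memory is tight
    std::vector<int> values;
    for (int i = 0; i < N; ++i) values.push_back(i);

    std::set<int> ordered(values.begin(), values.end());    // nodes allocated in final order
    std::random_shuffle(values.begin(), values.end());
    std::set<int> shuffled(values.begin(), values.end());   // nodes allocated in random order
    std::list<int> lst(values.begin(), values.end());

    long long sum = 0;        // printed so the loops aren't optimized away
    std::cout << "ordered set:  " << time_traversal(ordered, sum)  << " s\n"
              << "shuffled set: " << time_traversal(shuffled, sum) << " s\n"
              << "list:         " << time_traversal(lst, sum)      << " s\n"
              << "(checksum: " << sum << ")\n";
}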

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 30 '07 #13
James Kanze wrote:
On Apr 30, 2:30 am, "Stuart Golodetz"
<stuart.golod...@NnOeSwP.AoMx.ac.uk> wrote:
>"James Kanze" <james.ka...@gmail.com> wrote in message
>news:11**********************@n59g2000hsh.googlegroups.com...
On Apr 29, 1:28 pm, kostas <skola...@gmail.com> wrote:
> [...]
>>>Both the vector expansion and the set traversal are amortized linear
operations in the number of items.
Which means very little on a modern machine. The time it takes
to execute a single operation can vary in significant
proportions depending on whether the required variables are in
registers, in cache, in main memory, or must be paged in. On
the machines I use, something like iter++, where iter is an
std::set<>::iterator, can vary between less than a microsecond
(everything in registers) to somewhere around 10 milliseconds
(if I get a page miss)---4 orders of magnitude.
>[...]
>I'm not sure if I'm missing the point here, but for the purposes of
complexity analysis, does it actually matter how long each individual
instruction is taking?

You're missing the point that what the original poster measured
was execution time. My point is, precisely, that complexity has
little relationship to execution time. It's a useful indicator
as to whether something will scale, but even then, for example,
an O(n) algorithm will suddenly become much, much slower if page
faults occur, and for very large data sets, locality can become
as important a consideration as complexity.
Agreed.
>If we've got a constant (i.e. independent of the number of
items) upper bound for the time taken for an instruction
(however large), then an operation which requires a number of
instructions that is linear in terms of the problem size will
run in linear time.

No. That's only true if the time taken for an instruction is
independent of the size of the data set.
I think that's what I said, actually, assuming we equate "number of
items" and "size of the data set".
Paging and caching
introduce a non-linear component into the execution time of
instructions, which can depend on the size of the data set.
Ah, this is the point which I missed. If the execution time for an
instruction is dependent on the problem size then it's a different
matter entirely. Thanks for clarifying :)
>For example, if
I've got an operation on n items that requires no more than (say) 2n
instructions, with each instruction taking at most k milliseconds, then the
maximum time taken on a problem of size n is 2nk milliseconds, which (for k
not a function of n) is linear in n. The actual value of k isn't relevant to
the complexity analysis, even if it's relevant to the person implementing
the code.

Once k has any dependence on n, your complexity analysis no
longer makes reasonable predictions concerning the execution time.
Saying, for example, that ++i takes a maximum of 10
milliseconds, and calculating bounds from that, doesn't mean
much if most of the time it only takes 1 or 2 microseconds, and
if it will only take more than 5 milliseconds when the number of
elements exceeds, say, 10 million. You still do get asymptotic
performance with enough elements, but the number of elements
needed to get there is so high that it won't occur in actual
practice. For performance reasons, if nothing else; once you
start getting page faults, your performance will end up being
unacceptable.
Once again, agreed. I think we seem to have a clear case of a discussion
heading along the lines of:

"If and only if X (about which I personally know little), then Y."
"Well, actually, not X."
"Ok, then we're agreed that not Y."

(In this particular instance, I think what I'm trying to say is that X
is "k has no dependence on n" and Y is "the complexity analysis given
above makes sense".)

No worries, anyway, was just curious what you were driving at :)

Regards,
Stu

[...]
Apr 30 '07 #14
On Apr 30, 5:14 pm, Stuart Golodetz
<stuart.golod...@nNeOwS.oPxA.aMc.uk> wrote:
James Kanze wrote:
On Apr 30, 2:30 am, "Stuart Golodetz"
<stuart.golod...@NnOeSwP.AoMx.ac.uk> wrote:
"James Kanze" <james.ka...@gmail.com> wrote in message
[...]
Paging and caching
introduce a non-linear component into the execution time of
instructions, which can depend on the size of the data set.
Ah, this is the point which I missed. If the execution time for an
instruction is dependent on the problem size then it's a different
matter entirely. Thanks for clarifying :)
It's not necessarily directly dependent on the problem size, but
if the time for an instruction can vary over several orders of
magnitude, depending on locality of access, it starts to mean that
locality of access can be more significant than complexity with
regard to execution time.

Strictly speaking, complexity still wins out in the end; at
some point, you end up getting page faults even with the fastest
algorithm, and the individual operations become enormously
expensive everywhere. In practice, however, we aren't concerned
with such cases, since when (and if) we get there, the program
will take forever anyway; the solution isn't to find a faster
algorithm, but to buy more main memory:-).

BTW: I wasn't so much disagreeing with what you said, but
putting it into the practical context which was being discussed.
--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

May 1 '07 #15
On Apr 28, 7:53 pm, kostas <skola...@gmail.com> wrote:
Hi,
Trying to dump a set<int> (s) into a vector (v), my guess was that
v.insert(v.end(),s.begin(),s.end())
would be at least as efficient as
copy(s.begin(), s.end(), back_inserter(v))
It turns out that it is almost two times slower (even without using
reserve() before copy), and that's because it traverses the set twice.
I'm posting it as an exception to the usual (and usually correct)
advice
"Prefer member functions to algorithms".
So, would it be a good idea to add an insert variant that, instead of
the last iterator of the [first, last) insertion range, takes as an
argument the number of steps from the first iterator? In the case of
back insertion it would not make any difference, but what if you
wanted to insert the set at v.begin()?
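A sketch of what such a count-based variant could look like as a free
function (insert_n is a hypothetical name, not part of the standard
library, and this version assumes T is default-constructible):

#include <vector>

// Hypothetical helper: insert n elements read from 'first' at index 'pos'.
// Knowing n up front lets the vector make room in a single step, and the
// source range is traversed only once.
template <class T, class InputIt>
void insert_n(std::vector<T>& v,
              typename std::vector<T>::size_type pos,
              InputIt first,
              typename std::vector<T>::size_type n)
{
    v.insert(v.begin() + pos, n, T());          // open a gap of n slots
    typename std::vector<T>::iterator out = v.begin() + pos;
    for (typename std::vector<T>::size_type i = 0; i != n; ++i, ++first, ++out)
        *out = *first;                          // single pass over the source
}

// Usage: dump the set at the front of the vector with one traversal of it:
// insert_n(v, 0, s.begin(), s.size());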

Regards
Kostas
May 1 '07 #16
