By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
449,110 Members | 990 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 449,110 IT Pros & Developers. It's quick & easy.

Inserting into std::set

P: n/a
I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::insert() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator points
immediately before or after where the inserted value should go) then the
insertion can happen in amortized constant time rather than logarithmic
time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #1
Share this Question
Share on Google+
12 Replies


P: n/a
On Thu, 26 Oct 2006 18:48:57 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,
>I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?
Sounds valid to me. However, I think that I would prefer to use the
iterator returned by the previous insert() call.

Oct 26 '06 #2

P: n/a
Marcus Kwok wrote:
I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::insert() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator
points immediately before or after where the inserted value should
go) then the insertion can happen in amortized constant time rather
than logarithmic time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end()
as the hint to insert(). Is this a valid assumption, or does this
depend on how std::set is actually implemented?
Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #3

P: n/a
Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@comacast.netwrote:
Why don't you try it both ways and compare the time it takes your program
to form your 'set'?
OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #4

P: n/a
Marcus Kwok wrote:
>Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@comacast.netwrote:
>Why don't you try it both ways and compare the time it takes your
program to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint.
All of them were pretty close. Interestingly, the one with no hint
was actually the fastest, but only very slightly faster than using
end() as the hint (I will consider them the same due to the
resolution of my timing results). Also interesting was that the one
using the previous insertion's iterator was the slowest, but only by
about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.
While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #5

P: n/a
Victor Bazarov <v.********@comacast.netwrote:
While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.
Well, it looks like on my implementation, std::set is implemented in
terms of a Red-Black tree. As far as I can tell, the overload without a
hint will just traverse the tree to find the right spot. I can't really
see what exactly the version with a hint is doing... all those leading
underscores make my eyes hurt :)

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #6

P: n/a
Marcus Kwok wrote:
>Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@comacast.netwrote:
>Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.
Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.
Oct 26 '06 #7

P: n/a
Mark P <us****@fall2005remove.fastmailcaps.fmwrote:
Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.
Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #8

P: n/a
Marcus Kwok wrote:
Mark P <us****@fall2005remove.fastmailcaps.fmwrote:
>Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?
std::unique or std::unique_copy.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." For more information about this book, see
www.petebecker.com/tr1book.
Oct 27 '06 #9

P: n/a
On Fri, 27 Oct 2006 13:20:13 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,
>Mark P <us****@fall2005remove.fastmailcaps.fmwrote:
>Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?
If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.
Oct 27 '06 #10

P: n/a
Pete Becker <pe********@acm.orgwrote:
Marcus Kwok wrote:
>Mark P <us****@fall2005remove.fastmailcaps.fmwrote:
>>Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?

std::unique or std::unique_copy.
Thanks, I'll look into them.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #11

P: n/a
David Harmon <so****@netcom.comwrote:
If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.
That's a good idea. Unfortunately, I should have said that our data
"should be" sorted before reading in, but there is the possibility that
something might come in out of order... though this possibility is
rather low, I still need to handle it properly in case it isn't.

Thanks for your input.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #12

P: n/a
Marcus Kwok wrote:
David Harmon <so****@netcom.comwrote:
>If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.

That's a good idea. Unfortunately, I should have said that our data
"should be" sorted before reading in, but there is the possibility that
something might come in out of order... though this possibility is
rather low, I still need to handle it properly in case it isn't.

Thanks for your input.
So apply std::sort to the initial data, then apply std::unique or
std::unique_copy as Pete Becker suggested.

As a rule of thumb, if the data only needs to be sorted once, then a
dynamically sorted container such as std::set is probably overkill
(read: inefficient). std::set is nice if you need to sort "online", but
red-black tree algorithms are much more complex than simple sorting.
Oct 27 '06 #13

This discussion thread is closed

Replies have been disabled for this discussion.