Inserting into std::set

I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::insert() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator points
immediately before or after where the inserted value should go) then the
insertion can happen in amortized constant time rather than logarithmic
time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #1
On Thu, 26 Oct 2006 18:48:57 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,
> I would like to take advantage of this fact. My input data should
> already be in sorted order, therefore I think I can use my_set.end() as
> the hint to insert(). Is this a valid assumption, or does this depend
> on how std::set is actually implemented?

Sounds valid to me. However, I think that I would prefer to use the
iterator returned by the previous insert() call.
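
For illustration, here is a minimal sketch of both hint strategies, assuming int elements and a C++11 compiler; the function names are made up for this example. The hint only affects speed, never correctness. As far as the Standard goes, the C++03 wording promised amortized constant time when the new element goes right after the hint, while C++11 and later promise it when the element goes right before the hint, so end() is the natural hint for ascending input under the newer wording.

```cpp
#include <set>
#include <vector>

// Build a set from input that is expected to be ascending, always passing
// end() as the hint. A wrong hint costs time but never breaks the set:
// out-of-order values are still placed correctly and duplicates are rejected.
std::set<int> build_with_end_hint(const std::vector<int>& input) {
    std::set<int> result;
    for (int value : input) {
        result.insert(result.end(), value);
    }
    return result;
}

// Same idea, but reusing the iterator returned by the previous hinted insert.
// The hinted overload returns an iterator to the inserted element, or to the
// element that was already there.
std::set<int> build_with_previous_hint(const std::vector<int>& input) {
    std::set<int> result;
    std::set<int>::iterator hint = result.end();
    for (int value : input) {
        hint = result.insert(hint, value);
    }
    return result;
}
```

Both functions should produce the same set for any input; which one is faster is up to the implementation and is worth measuring.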

Oct 26 '06 #2
Marcus Kwok wrote:
> I am not sure if this is something that is covered by the Standard, or
> if it's an implementation detail of my Standard Library.
>
> I am reading in a large amount of data into a std::set. There is an
> overload for std::set::insert() that takes in an iterator as a hint as
> to where the new value should be inserted, and my implementation
> (Dinkumware) says that if the hint is good (meaning the iterator
> points immediately before or after where the inserted value should
> go) then the insertion can happen in amortized constant time rather
> than logarithmic time.
>
> I would like to take advantage of this fact. My input data should
> already be in sorted order, therefore I think I can use my_set.end()
> as the hint to insert(). Is this a valid assumption, or does this
> depend on how std::set is actually implemented?

Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #3
Marcus Kwok wrote:
> [using a hint when inserting into std::set]

Victor Bazarov <v.********@comacast.net> wrote:
> Why don't you try it both ways and compare the time it takes your program
> to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.
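
A comparison like the one described could be set up roughly as follows. This is only a sketch, assuming int elements, C++11, and std::chrono::steady_clock for timing; the names time_fill_ms, Timings, and compare_insert_variants are invented for this example, and the numbers it produces will vary with the machine, the library, and whatever else is running.

```cpp
#include <chrono>
#include <set>
#include <vector>

// Runs `fill` on a fresh set and returns the elapsed wall time in milliseconds.
template <typename Fill>
double time_fill_ms(Fill fill, const std::vector<int>& input) {
    std::set<int> target;
    const auto start = std::chrono::steady_clock::now();
    fill(target, input);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timings {
    double no_hint_ms;
    double end_hint_ms;
    double previous_hint_ms;
};

// Times the three variants from the post on the same input.
Timings compare_insert_variants(const std::vector<int>& input) {
    Timings t;
    t.no_hint_ms = time_fill_ms(
        [](std::set<int>& s, const std::vector<int>& in) {
            for (int v : in) s.insert(v);               // no hint
        },
        input);
    t.end_hint_ms = time_fill_ms(
        [](std::set<int>& s, const std::vector<int>& in) {
            for (int v : in) s.insert(s.end(), v);      // end() as hint
        },
        input);
    t.previous_hint_ms = time_fill_ms(
        [](std::set<int>& s, const std::vector<int>& in) {
            std::set<int>::iterator hint = s.end();
            for (int v : in) hint = s.insert(hint, v);  // previous result as hint
        },
        input);
    return t;
}
```

Running each variant several times on an otherwise idle machine and taking the best or median time would give more trustworthy figures than a single pass.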

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #4
Marcus Kwok wrote:
> Marcus Kwok wrote:
>> [using a hint when inserting into std::set]
>
> Victor Bazarov <v.********@comacast.net> wrote:
>> Why don't you try it both ways and compare the time it takes your
>> program to form your 'set'?
>
> OK, so I tested three different versions of my code: one where no hint
> is supplied, one where I use my_set.end() as the hint, and one where I
> use the iterator returned from the previous iteration as the hint.
> All of them were pretty close. Interestingly, the one with no hint
> was actually the fastest, but only very slightly faster than using
> end() as the hint (I will consider them the same due to the
> resolution of my timing results). Also interesting was that the one
> using the previous insertion's iterator was the slowest, but only by
> about 3%.
>
> Since the timing differences are fairly negligible (and also not
> definitive since I had other processes running at the same time), I
> think I will just stick to the simple version of insert(), since it is
> the clearest.

While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #5
Victor Bazarov <v.********@comacast.net> wrote:
> While doing that, have you tried looking at the implementation of
> the 'insert' without the argument? Could it be that it actually falls
> back onto 'insert' with a hint and gives 'end()' as the hint? Not that
> it should make much of a difference for you, of course. Just curious,
> I guess.

Well, it looks like on my implementation, std::set is implemented in
terms of a Red-Black tree. As far as I can tell, the overload without a
hint will just traverse the tree to find the right spot. I can't really
see what exactly the version with a hint is doing... all those leading
underscores make my eyes hurt :)

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #6
Marcus Kwok wrote:
> Marcus Kwok wrote:
>> [using a hint when inserting into std::set]
>
> Victor Bazarov <v.********@comacast.net> wrote:
>> Why don't you try it both ways and compare the time it takes your program
>> to form your 'set'?
>
> OK, so I tested three different versions of my code: one where no hint
> is supplied, one where I use my_set.end() as the hint, and one where I
> use the iterator returned from the previous iteration as the hint. All
> of them were pretty close. Interestingly, the one with no hint was
> actually the fastest, but only very slightly faster than using end() as
> the hint (I will consider them the same due to the resolution of my
> timing results). Also interesting was that the one using the previous
> insertion's iterator was the slowest, but only by about 3%.
>
> Since the timing differences are fairly negligible (and also not
> definitive since I had other processes running at the same time), I
> think I will just stick to the simple version of insert(), since it is
> the clearest.

Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.
Oct 26 '06 #7
Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
> Keep in mind that inserting elements into a set in sorted order may be
> far from optimal and that the run time may be dominated by the
> rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #8
Marcus Kwok wrote:
> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>> Keep in mind that inserting elements into a set in sorted order may be
>> far from optimal and that the run time may be dominated by the
>> rebalancing required to maintain the red-black tree.
>
> Hmm, OK, in that case let me add something to my situation. I have a
> large list of data that I am adding to my container in sorted order.
> There is a possibility of duplicates, but we do not want to add
> duplicates. Initially, I was pushing the data to the back of a vector,
> but checking if the element was already in the vector before pushing it
> back. This repeated searching was very inefficient and was what caused
> me to redesign it using a set, since I can just insert without needing
> to check before. Using a set instead of the previous method with vector
> caused a 4-5x speed increase.
>
> Do you have another suggestion for something I can try?

std::unique or std::unique_copy.
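
As a rough sketch of the std::unique_copy route, assuming int elements and input that really is sorted (the function name is invented for this example): std::unique_copy only drops duplicates that sit next to each other, which is why the sorted order matters.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Copies `sorted_input` into a new vector, skipping every element that is
// equal to the one just before it. On sorted input this removes all
// duplicates; on unsorted input only adjacent repeats are removed.
std::vector<int> without_adjacent_duplicates(const std::vector<int>& sorted_input) {
    std::vector<int> result;
    result.reserve(sorted_input.size());
    std::unique_copy(sorted_input.begin(), sorted_input.end(),
                     std::back_inserter(result));
    return result;
}
```

This does a single linear pass and needs no tree at all, so it avoids both the repeated searching of the vector approach and the per-node allocations of a set.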

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." For more information about this book, see
www.petebecker.com/tr1book.
Oct 27 '06 #9
On Fri, 27 Oct 2006 13:20:13 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,
> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>> Keep in mind that inserting elements into a set in sorted order may be
>> far from optimal and that the run time may be dominated by the
>> rebalancing required to maintain the red-black tree.
>
> Hmm, OK, in that case let me add something to my situation. I have a
> large list of data that I am adding to my container in sorted order.
> There is a possibility of duplicates, but we do not want to add
> duplicates. Initially, I was pushing the data to the back of a vector,
> but checking if the element was already in the vector before pushing it
> back. This repeated searching was very inefficient and was what caused
> me to redesign it using a set, since I can just insert without needing
> to check before. Using a set instead of the previous method with vector
> caused a 4-5x speed increase.
>
> Do you have another suggestion for something I can try?

If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.
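
A minimal sketch of that check, assuming int elements and a vector as the destination; append_if_new is a hypothetical helper name. It uses only operator<, the same comparison a std::set would use.

```cpp
#include <vector>

// Appends `value` to `out` unless it equals the last element already kept.
// Returns false if `value` is less than the last element, which means the
// input was not sorted after all; nothing is appended in that case and the
// caller decides how to recover.
bool append_if_new(std::vector<int>& out, int value) {
    if (out.empty()) {
        out.push_back(value);
        return true;
    }
    if (value < out.back()) {
        return false;              // out of order
    }
    if (out.back() < value) {
        out.push_back(value);      // strictly greater: a new element
    }
    return true;                   // equal: duplicate skipped
}
```

Each incoming item costs at most two comparisons, so the whole read is linear in the number of items.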
Oct 27 '06 #10
Pete Becker <pe********@acm.org> wrote:
> Marcus Kwok wrote:
>> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>>> Keep in mind that inserting elements into a set in sorted order may be
>>> far from optimal and that the run time may be dominated by the
>>> rebalancing required to maintain the red-black tree.
>>
>> Hmm, OK, in that case let me add something to my situation. I have a
>> large list of data that I am adding to my container in sorted order.
>> There is a possibility of duplicates, but we do not want to add
>> duplicates. Initially, I was pushing the data to the back of a vector,
>> but checking if the element was already in the vector before pushing it
>> back. This repeated searching was very inefficient and was what caused
>> me to redesign it using a set, since I can just insert without needing
>> to check before. Using a set instead of the previous method with vector
>> caused a 4-5x speed increase.
>>
>> Do you have another suggestion for something I can try?
>
> std::unique or std::unique_copy.

Thanks, I'll look into them.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #11
David Harmon <so****@netcom.com> wrote:
> If your data is sorted, then you only need to check if each item is
> the same as the previous, right? And at the same time, a check if
> the item is "less than" the previous will verify that it really is
> sorted.

That's a good idea. Unfortunately, I should have said that our data
"should be" sorted before reading in, but there is the possibility that
something might come in out of order... though this possibility is
rather low, I still need to handle it properly in case it isn't.

Thanks for your input.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #12
Marcus Kwok wrote:
> David Harmon <so****@netcom.com> wrote:
>> If your data is sorted, then you only need to check if each item is
>> the same as the previous, right? And at the same time, a check if
>> the item is "less than" the previous will verify that it really is
>> sorted.
>
> That's a good idea. Unfortunately, I should have said that our data
> "should be" sorted before reading in, but there is the possibility that
> something might come in out of order... though this possibility is
> rather low, I still need to handle it properly in case it isn't.
>
> Thanks for your input.

So apply std::sort to the initial data, then apply std::unique or
std::unique_copy as Pete Becker suggested.

As a rule of thumb, if the data only needs to be sorted once, then a
dynamically sorted container such as std::set is probably overkill
(read: inefficient). std::set is nice if you need to sort "online", but
red-black tree algorithms are much more complex than simple sorting.
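
The sort-then-unique approach might look something like this, assuming int elements held in a std::vector and a C++11 compiler for std::is_sorted; the function name is made up for this example. The is_sorted check is an optional shortcut for the common case where the data arrives already in order.

```cpp
#include <algorithm>
#include <vector>

// Leaves `data` sorted in ascending order with no duplicate values.
void sort_and_deduplicate(std::vector<int>& data) {
    // The input is usually sorted already, so a linear check can skip the sort.
    if (!std::is_sorted(data.begin(), data.end())) {
        std::sort(data.begin(), data.end());
    }
    // std::unique moves the kept elements to the front and returns the new
    // logical end; erase then drops the leftover tail.
    data.erase(std::unique(data.begin(), data.end()), data.end());
}
```

All of the work happens once, after the data has been read, on contiguous storage.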
Oct 27 '06 #13
