Inserting into std::set

Marcus Kwok

I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::insert() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator points
immediately before or after where the inserted value should go) then the
insertion can happen in amortized constant time rather than logarithmic
time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply

Oct 26 '06 #1
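
A minimal sketch of the approach asked about above, assuming C++11 or later and int keys; build_with_end_hint is a hypothetical name, not something from the thread. As far as I know, the C++11 wording makes the hinted overload amortized constant when the new element is inserted right before the hint, which holds for end() whenever the new value is greater than everything already stored. The C++03 wording said "right after" the hint, so at the time of the question the benefit of end() was arguably up to the implementation.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: builds a set from input that is expected to be ascending.
// The hint only affects speed; the result is correct even if the input is unsorted.
std::set<int> build_with_end_hint(const std::vector<int>& input) {
    std::set<int> s;
    for (int v : input) {
        // end() is the ideal hint when v is greater than every stored element.
        s.insert(s.end(), v);
    }
    assert(s.size() <= input.size());  // duplicates are dropped, never added
    return s;
}
```

A wrong hint does not break anything: the insertion falls back to the ordinary logarithmic search.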

David Harmon

On Thu, 26 Oct 2006 18:48:57 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,

> I would like to take advantage of this fact. My input data should
> already be in sorted order, therefore I think I can use my_set.end() as
> the hint to insert(). Is this a valid assumption, or does this depend
> on how std::set is actually implemented?

Sounds valid to me. However, I think that I would prefer to use the
iterator returned by the previous insert() call.

Oct 26 '06 #2
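
The suggestion above can be sketched as follows, again assuming C++11 or later and int keys; build_with_previous_hint is a hypothetical name. Note that for ascending input the new element lands right after this hint, which matches the C++03 complexity wording, while the C++11 wording (as far as I know) favours the position right before the hint.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: reuses the iterator returned by the previous
// insert() call as the hint for the next one.
std::set<int> build_with_previous_hint(const std::vector<int>& input) {
    std::set<int> s;
    std::set<int>::iterator hint = s.end();
    for (int v : input) {
        // The hinted overload returns an iterator to the inserted element,
        // or to the existing equal element if v was a duplicate.
        hint = s.insert(hint, v);
        assert(*hint == v);
    }
    return s;
}
```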

Victor Bazarov

Marcus Kwok wrote:

> I am not sure if this is something that is covered by the Standard, or
> if it's an implementation detail of my Standard Library.
>
> I am reading in a large amount of data into a std::set. There is an
> overload for std::set::insert() that takes in an iterator as a hint as
> to where the new value should be inserted, and my implementation
> (Dinkumware) says that if the hint is good (meaning the iterator
> points immediately before or after where the inserted value should
> go) then the insertion can happen in amortized constant time rather
> than logarithmic time.
>
> I would like to take advantage of this fact. My input data should
> already be in sorted order, therefore I think I can use my_set.end()
> as the hint to insert(). Is this a valid assumption, or does this
> depend on how std::set is actually implemented?

Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Oct 26 '06 #3
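
One way to run the comparison suggested above is a small timing harness. This is only a sketch, assuming C++11 or later and int keys; Strategy, BuildResult and time_build are hypothetical names, and the thread does not show the code that was actually timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical names for the three variants under comparison.
enum class Strategy { NoHint, EndHint, PreviousHint };

struct BuildResult {
    std::size_t size;                             // elements stored in the set
    std::chrono::steady_clock::duration elapsed;  // time spent building the set
};

// Builds a set from input with the chosen strategy and measures the build time.
BuildResult time_build(const std::vector<int>& input, Strategy strategy) {
    const auto start = std::chrono::steady_clock::now();
    std::set<int> s;
    std::set<int>::iterator hint = s.end();
    for (int v : input) {
        switch (strategy) {
            case Strategy::NoHint:       s.insert(v);              break;
            case Strategy::EndHint:      s.insert(s.end(), v);     break;
            case Strategy::PreviousHint: hint = s.insert(hint, v); break;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(s.size() <= input.size());
    return BuildResult{s.size(), stop - start};
}
```

A single run is noisy, so it is worth repeating each strategy several times on an optimized build and comparing the best or median time.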

Marcus Kwok

Victor Bazarov <v.********@comacast.net> wrote:
> Marcus Kwok wrote:
>> [using a hint when inserting into std::set]
>
> Why don't you try it both ways and compare the time it takes your program
> to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.

Oct 26 '06 #4

Victor Bazarov

Marcus Kwok wrote:
> Victor Bazarov <v.********@comacast.net> wrote:
>> Marcus Kwok wrote:
>>> [using a hint when inserting into std::set]
>>
>> Why don't you try it both ways and compare the time it takes your
>> program to form your 'set'?
>
> OK, so I tested three different versions of my code: one where no hint
> is supplied, one where I use my_set.end() as the hint, and one where I
> use the iterator returned from the previous iteration as the hint.
> All of them were pretty close. Interestingly, the one with no hint
> was actually the fastest, but only very slightly faster than using
> end() as the hint (I will consider them the same due to the
> resolution of my timing results). Also interesting was that the one
> using the previous insertion's iterator was the slowest, but only by
> about 3%.
>
> Since the timing differences are fairly negligible (and also not
> definitive since I had other processes running at the same time), I
> think I will just stick to the simple version of insert(), since it is
> the clearest.

While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.

Oct 26 '06 #5

Marcus Kwok

Victor Bazarov <v.********@comacast.net> wrote:
> While doing that, have you tried looking at the implementation of
> the 'insert' without the argument? Could it be that it actually falls
> back onto 'insert' with a hint and gives 'end()' as the hint? Not that
> it should make much of a difference for you, of course. Just curious,
> I guess.

Well, it looks like on my implementation, std::set is implemented in
terms of a Red-Black tree. As far as I can tell, the overload without a
hint will just traverse the tree to find the right spot. I can't really
see what exactly the version with a hint is doing... all those leading
underscores make my eyes hurt :)

Oct 26 '06 #6

Mark P

Marcus Kwok wrote:
> Victor Bazarov <v.********@comacast.net> wrote:
>> Marcus Kwok wrote:
>>> [using a hint when inserting into std::set]
>>
>> Why don't you try it both ways and compare the time it takes your program
>> to form your 'set'?
>
> OK, so I tested three different versions of my code: one where no hint
> is supplied, one where I use my_set.end() as the hint, and one where I
> use the iterator returned from the previous iteration as the hint. All
> of them were pretty close. Interestingly, the one with no hint was
> actually the fastest, but only very slightly faster than using end() as
> the hint (I will consider them the same due to the resolution of my
> timing results). Also interesting was that the one using the previous
> insertion's iterator was the slowest, but only by about 3%.
>
> Since the timing differences are fairly negligible (and also not
> definitive since I had other processes running at the same time), I
> think I will just stick to the simple version of insert(), since it is
> the clearest.

Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Oct 26 '06 #7

Marcus Kwok

Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
> Keep in mind that inserting elements into a set in sorted order may be
> far from optimal and that the run time may be dominated by the
> rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?

Oct 27 '06 #8
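
For comparison, the two approaches described above can be sketched side by side, assuming C++11 or later and int keys. The thread does not show the original code, so the use of std::find and both function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Assumed shape of the original approach: a linear search before every
// push_back, which costs O(n) per element and O(n^2) overall.
std::vector<int> dedupe_with_vector(const std::vector<int>& input) {
    std::vector<int> out;
    for (int v : input) {
        if (std::find(out.begin(), out.end(), v) == out.end()) {
            out.push_back(v);
        }
    }
    return out;
}

// The redesign: std::set::insert ignores values that are already present,
// at O(log n) per element.
std::set<int> dedupe_with_set(const std::vector<int>& input) {
    std::set<int> out;
    for (int v : input) {
        out.insert(v);
    }
    assert(out.size() <= input.size());
    return out;
}
```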

Pete Becker

Marcus Kwok wrote:
> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>> Keep in mind that inserting elements into a set in sorted order may be
>> far from optimal and that the run time may be dominated by the
>> rebalancing required to maintain the red-black tree.
>
> Hmm, OK, in that case let me add something to my situation. I have a
> large list of data that I am adding to my container in sorted order.
> There is a possibility of duplicates, but we do not want to add
> duplicates. Initially, I was pushing the data to the back of a vector,
> but checking if the element was already in the vector before pushing it
> back. This repeated searching was very inefficient and was what caused
> me to redesign it using a set, since I can just insert without needing
> to check before. Using a set instead of the previous method with vector
> caused a 4-5x speed increase.
>
> Do you have another suggestion for something I can try?

std::unique or std::unique_copy.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." For more information about this book, see
www.petebecker.com/tr1book.

Oct 27 '06 #9
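
A short sketch of the two algorithms named above, assuming the data sits in a std::vector<int>; the function names are hypothetical. Both algorithms only collapse runs of adjacent equal elements, so they remove all duplicates only when the input is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// In place: std::unique moves the kept elements to the front and returns
// the new logical end; erase() then trims the leftover tail.
void remove_adjacent_duplicates(std::vector<int>& v) {
    v.erase(std::unique(v.begin(), v.end()), v.end());
    assert(std::adjacent_find(v.begin(), v.end()) == v.end());
}

// Copying: std::unique_copy writes one element per run of equal values
// and leaves the input untouched.
std::vector<int> copy_without_adjacent_duplicates(const std::vector<int>& input) {
    std::vector<int> out;
    std::unique_copy(input.begin(), input.end(), std::back_inserter(out));
    return out;
}
```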

David Harmon

On Fri, 27 Oct 2006 13:20:13 +0000 (UTC) in comp.lang.c++,
ri******@gehennom.invalid (Marcus Kwok) wrote,

> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>> Keep in mind that inserting elements into a set in sorted order may be
>> far from optimal and that the run time may be dominated by the
>> rebalancing required to maintain the red-black tree.
>
> Hmm, OK, in that case let me add something to my situation. I have a
> large list of data that I am adding to my container in sorted order.
> There is a possibility of duplicates, but we do not want to add
> duplicates. Initially, I was pushing the data to the back of a vector,
> but checking if the element was already in the vector before pushing it
> back. This repeated searching was very inefficient and was what caused
> me to redesign it using a set, since I can just insert without needing
> to check before. Using a set instead of the previous method with vector
> caused a 4-5x speed increase.
>
> Do you have another suggestion for something I can try?

If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.

Oct 27 '06 #10
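
The single-pass idea above might look like this. It is a sketch assuming C++11 or later and int values; UniqueResult and unique_if_sorted are hypothetical names. It uses only operator<, the same comparison std::set relies on by default.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: the de-duplicated values collected so far, plus
// a flag telling whether the whole input was in non-descending order.
struct UniqueResult {
    std::vector<int> values;
    bool sorted;
};

// Keeps each value that is greater than the last one kept, skips values
// equal to it, and stops at the first value that is less than it.
UniqueResult unique_if_sorted(const std::vector<int>& input) {
    UniqueResult result{{}, true};
    for (int v : input) {
        if (!result.values.empty()) {
            const int previous = result.values.back();
            if (v < previous) {          // out of order: report and stop
                result.sorted = false;
                break;
            }
            if (!(previous < v)) {       // equal to the previous value: skip
                continue;
            }
        }
        result.values.push_back(v);
    }
    assert(result.values.size() <= input.size());
    return result;
}
```

When sorted comes back false, the caller can fall back to a slower path that tolerates out-of-order input.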

Marcus Kwok

Pete Becker <pe********@acm.org> wrote:
> Marcus Kwok wrote:
>> Mark P <us****@fall2005remove.fastmailcaps.fm> wrote:
>>> Keep in mind that inserting elements into a set in sorted order may be
>>> far from optimal and that the run time may be dominated by the
>>> rebalancing required to maintain the red-black tree.
>>
>> Hmm, OK, in that case let me add something to my situation. I have a
>> large list of data that I am adding to my container in sorted order.
>> There is a possibility of duplicates, but we do not want to add
>> duplicates. Initially, I was pushing the data to the back of a vector,
>> but checking if the element was already in the vector before pushing it
>> back. This repeated searching was very inefficient and was what caused
>> me to redesign it using a set, since I can just insert without needing
>> to check before. Using a set instead of the previous method with vector
>> caused a 4-5x speed increase.
>>
>> Do you have another suggestion for something I can try?
>
> std::unique or std::unique_copy.

Thanks, I'll look into them.

Oct 27 '06 #11

Marcus Kwok

David Harmon <so****@netcom.com> wrote:
> If your data is sorted, then you only need to check if each item is
> the same as the previous, right? And at the same time, a check if
> the item is "less than" the previous will verify that it really is
> sorted.

That's a good idea. Unfortunately, I should have said that our data
"should be" sorted before reading in, but there is the possibility that
something might come in out of order... though this possibility is
rather low, I still need to handle it properly in case it isn't.

Thanks for your input.

Oct 27 '06 #12

Mark P

Marcus Kwok wrote:
> David Harmon <so****@netcom.com> wrote:
>> If your data is sorted, then you only need to check if each item is
>> the same as the previous, right? And at the same time, a check if
>> the item is "less than" the previous will verify that it really is
>> sorted.
>
> That's a good idea. Unfortunately, I should have said that our data
> "should be" sorted before reading in, but there is the possibility that
> something might come in out of order... though this possibility is
> rather low, I still need to handle it properly in case it isn't.
>
> Thanks for your input.

So apply std::sort to the initial data, then apply std::unique or
std::unique_copy as Pete Becker suggested.

As a rule of thumb, if the data only needs to be sorted once, then a
dynamically sorted container such as std::set is probably overkill
(read: inefficient). std::set is nice if you need to sort "online", but
red-black tree algorithms are much more complex than simple sorting.

Oct 27 '06 #13
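
The sort-then-unique recipe above can be sketched as follows, assuming C++11 or later and data held in a std::vector<int>; sort_and_unique is a hypothetical name. The std::is_sorted check is an addition of this sketch, not something proposed in the thread: it skips the sort in the common case where the input arrives already ordered.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts the data once if needed, then drops adjacent duplicates.
void sort_and_unique(std::vector<int>& v) {
    // Input is usually sorted already, so a linear check can avoid the sort.
    if (!std::is_sorted(v.begin(), v.end())) {
        std::sort(v.begin(), v.end());
    }
    v.erase(std::unique(v.begin(), v.end()), v.end());
    assert(std::adjacent_find(v.begin(), v.end()) == v.end());
}
```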
