473,770 Members | 2,153 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Inserting into std::set

I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::inser t() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator points
immediately before or after where the inserted value should go) then the
insertion can happen in amortized constant time rather than logarithmic
time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #1
12 3522
On Thu, 26 Oct 2006 18:48:57 +0000 (UTC) in comp.lang.c++,
ri******@gehenn om.invalid (Marcus Kwok) wrote,
>I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end() as
the hint to insert(). Is this a valid assumption, or does this depend
on how std::set is actually implemented?
Sounds valid to me. However, I think that I would prefer to use the
iterator returned by the previous insert() call.

Oct 26 '06 #2
Marcus Kwok wrote:
I am not sure if this is something that is covered by the Standard, or
if it's an implementation detail of my Standard Library.

I am reading in a large amount of data into a std::set. There is an
overload for std::set::inser t() that takes in an iterator as a hint as
to where the new value should be inserted, and my implementation
(Dinkumware) says that if the hint is good (meaning the iterator
points immediately before or after where the inserted value should
go) then the insertion can happen in amortized constant time rather
than logarithmic time.

I would like to take advantage of this fact. My input data should
already be in sorted order, therefore I think I can use my_set.end()
as the hint to insert(). Is this a valid assumption, or does this
depend on how std::set is actually implemented?
Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #3
Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@com acast.netwrote:
Why don't you try it both ways and compare the time it takes your program
to form your 'set'?
OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #4
Marcus Kwok wrote:
>Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@com acast.netwrote:
>Why don't you try it both ways and compare the time it takes your
program to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint.
All of them were pretty close. Interestingly, the one with no hint
was actually the fastest, but only very slightly faster than using
end() as the hint (I will consider them the same due to the
resolution of my timing results). Also interesting was that the one
using the previous insertion's iterator was the slowest, but only by
about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.
While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Oct 26 '06 #5
Victor Bazarov <v.********@com acast.netwrote:
While doing that, have you tried looking at the implementation of
the 'insert' without the argument? Could it be that it actually falls
back onto 'insert' with a hint and gives 'end()' as the hint? Not that
it should make much of a difference for you, of course. Just curious,
I guess.
Well, it looks like on my implementation, std::set is implemented in
terms of a Red-Black tree. As far as I can tell, the overload without a
hint will just traverse the tree to find the right spot. I can't really
see what exactly the version with a hint is doing... all those leading
underscores make my eyes hurt :)

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 26 '06 #6
Marcus Kwok wrote:
>Marcus Kwok wrote:
[using a hint when inserting into std::set]

Victor Bazarov <v.********@com acast.netwrote:
>Why don't you try it both ways and compare the time it takes your program
to form your 'set'?

OK, so I tested three different versions of my code: one where no hint
is supplied, one where I use my_set.end() as the hint, and one where I
use the iterator returned from the previous iteration as the hint. All
of them were pretty close. Interestingly, the one with no hint was
actually the fastest, but only very slightly faster than using end() as
the hint (I will consider them the same due to the resolution of my
timing results). Also interesting was that the one using the previous
insertion's iterator was the slowest, but only by about 3%.

Since the timing differences are fairly negligible (and also not
definitive since I had other processes running at the same time), I
think I will just stick to the simple version of insert(), since it is
the clearest.
Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.
Oct 26 '06 #7
Mark P <us****@fall200 5remove.fastmai lcaps.fmwrote:
Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.
Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Oct 27 '06 #8
Marcus Kwok wrote:
Mark P <us****@fall200 5remove.fastmai lcaps.fmwrote:
>Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?
std::unique or std::unique_cop y.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." For more information about this book, see
www.petebecker.com/tr1book.
Oct 27 '06 #9
On Fri, 27 Oct 2006 13:20:13 +0000 (UTC) in comp.lang.c++,
ri******@gehenn om.invalid (Marcus Kwok) wrote,
>Mark P <us****@fall200 5remove.fastmai lcaps.fmwrote:
>Keep in mind that inserting elements into a set in sorted order may be
far from optimal and that the run time may be dominated by the
rebalancing required to maintain the red-black tree.

Hmm, OK, in that case let me add something to my situation. I have a
large list of data that I am adding to my container in sorted order.
There is a possibility of duplicates, but we do not want to add
duplicates. Initially, I was pushing the data to the back of a vector,
but checking if the element was already in the vector before pushing it
back. This repeated searching was very inefficient and was what caused
me to redesign it using a set, since I can just insert without needing
to check before. Using a set instead of the previous method with vector
caused a 4-5x speed increase.

Do you have another suggestion for something I can try?
If your data is sorted, then you only need to check if each item is
the same as the previous, right? And at the same time, a check if
the item is "less than" the previous will verify that it really is
sorted.
Oct 27 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
2639
by: ma740988 | last post by:
The position returned via the STL std::set container never made much sense to me. When you insert elements within the container, the position returned - via find - does not reflect the actual position of the value within the container. std::string - for instance - does. How then do I get the actual position of the value when using std::set? # include <iostream> using std::cout; using std::cin;
11
3251
by: snnn | last post by:
On the book <Generic Programming and the STL>( Matthew . H . Austern ),this function is defined as iterator set::begin() const. However, why should a const object returns a non-const iterator? Then, I found, in this book, the semantic of set::iterator is defined as same as set::const_iterator. Both of them must be const! I tried to read the source of GNU STL(version 3.4.1).They were using a red-black tree to implant it (std::set has a...
5
8749
by: Peter Jansson | last post by:
Hello, I have the following code: std::map<int,std::set<std::string> > k; k="1234567890"; k="2345678901"; //... std::set<std::string> myMethod(std::map<int,std::set<std::string> > k) throw(std::runtime_error)
10
7559
by: danibe | last post by:
I never had any problems storing pointers in STL containers such std::vector and std::map. The benefit of storing pointers instead of the objects themselves is mainly saving memory resources and performance (STL containers hold *copies* of what's passed to them via the insert() method). However, I am not sure how to accomplish that using std::set. For various reasons, I cannot use vector or map in my application. But I would like to...
16
5088
by: Cory Nelson | last post by:
Does anyone know how std::set prevents duplicates using only std::less? I've tried looking through a couple of the STL implementations and their code is pretty unreadable (to allow for different compilers, I guess).
2
8154
by: shuisheng | last post by:
Dear All, std::set is sorted. So I am wondering is there any fast way to access (sucn as random access) to its elements just like std::vector. Assume I have a set std::set<inta; So I can easily get access to such as a.
7
5770
by: Renzr | last post by:
I have a problem about the std::set<>iterator. After finding a term in the std::set<>, i want to know the distance from the current term to the begin(). But i have got a error. Please offer me help, thank you. I am a freshman about the STL. The following is the code. #include <set> #include <iostream> #include <vector> int main() {
2
3092
by: mathieu | last post by:
hi there, I would like to know if the following piece of code is garantee to work. I am afraid that the cstring address I am using in the std::map found from a request in std::set is not garantee to remain the same as the std::set grows... thanks -Mathieu
2
2796
by: Markus Dehmann | last post by:
I want to derive from std::set, like shown below. But when I try to declare an iterator over the contained elements I get an error, see the twp uncommented lines: #include <set> template<class T> class MySet : public std::set<T>{ public: MySet() : std::set<T>() {} void foo(){
1
10002
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9869
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8883
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7415
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6676
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5312
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5449
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3575
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2816
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.