By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,742 Members | 1,206 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,742 IT Pros & Developers. It's quick & easy.

std::set constructor taking a sorted sequence

P: n/a
In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?
Jun 10 '07 #1
Share this Question
Share on Google+
7 Replies


P: n/a
desktop wrote:
In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?
The answer is in the question. If the sequence is ordered, there is no
need to repeat the research each time (logn), so the set can be
constructed in a linear time (you already know where to place the
element, that would be the action requiring logn time).

Regards,

Zeppe
Jun 10 '07 #2

P: n/a
On Jun 10, 6:15 pm, Zeppe <z...@remove.all.this.long.comment.email.it>
wrote:
desktop wrote:
In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?

The answer is in the question. If the sequence is ordered, there is no
need to repeat the research each time (logn), so the set can be
constructed in a linear time (you already know where to place the
element, that would be the action requiring logn time).

Regards,

Zeppe
Most of the standard libraries make use of red-black trees for
implementing set and map. In such case the insertion cannot be linear
in time, immaterial of input is sorted or not sorted. And the claim
that "you already know where to place the element" is not true across
calls, as the search always begins at root.

I wrote a test program (which I tested on Solaris running Sun C++ 5.8
compiler) to confirm that sorted elements' insertion doesn't take
linear time.

I am still curious to know if it is possible to construct a set in
linear time.

Rosarin Roy

Jun 10 '07 #3

P: n/a
In article <f4**********@news.net.uni-c.dk>, as****@asd.com says...
In the C++ standard page 472
It's much better to cite section numbers or (better still) section names
-- page numbers change from one version of the standard to the next, but
the section names remain much closer to constant.
it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?
Doing it with random access iterators is (mostly) pretty simple. You
basically ignore the fact that you're working with a red-black tree (or
AVL tree) and instead just construct a perfectly balanced tree -- take
the median item in the input sequence, and make that your root. This
leaves two halves of the input, which you recursively insert as the left
and right sub-trees of the current node.

This means you never re-balance the tree during creation (because you're
creating it as close to perfectly balanced as possible given the number
of nodes), and each node you insert is always a direct descendent of the
current node, so you never have to do a Log(N) traversal to find where
to insert a node.

With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 11 '07 #4

P: n/a
On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.comwrote:
In article <f4hpva$85...@news.net.uni-c.dk>, asd...@asd.com says...
Doing it with random access iterators is (mostly) pretty simple. You
basically ignore the fact that you're working with a red-black tree (or
AVL tree) and instead just construct a perfectly balanced tree -- take
the median item in the input sequence, and make that your root. This
leaves two halves of the input, which you recursively insert as the left
and right sub-trees of the current node.

This is a very nice idea but unfortunately it imposes (small)
unnecessary
overhead when inserting unsorted ranges.

With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.
This is a bit overcomplicated and I guess that the set implementation
is
able to avoid it by making very good use of the "hint version" of
insert().
According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."

Jun 11 '07 #5

P: n/a
In article <11*********************@p77g2000hsh.googlegroups. com>,
v.*********@gmail.com says...
On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.comwrote:
[ ... ]
With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.

This is a bit overcomplicated and I guess that the set implementation
is able to avoid it by making very good use of the "hint version" of
insert(). According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."
This has one minor problem: while it gives amortized linear complexity,
that doesn't (strictly speaking) meet the requirement for strictly
linear complexity.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 11 '07 #6

P: n/a
Rosarin Roy wrote:
On Jun 10, 6:15 pm, Zeppe <z...@remove.all.this.long.comment.email.it>
wrote:
>desktop wrote:
>>In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
[cut]
Most of the standard libraries make use of red-black trees for
implementing set and map. In such case the insertion cannot be linear
in time, immaterial of input is sorted or not sorted. And the claim
that "you already know where to place the element" is not true across
calls, as the search always begins at root.
as you can see from the quoted message, the OP says that "you can
construct a std::set in linear time if the constructor gets a sorted
sequence of elements." Of course if you make different calls it's not
true any more. The reference constructor is that one accepting iterator
first, iterator last. In that case, the comparison is done on the fly
while inserting. I don't really know the red-black tree behaviour, but I
can expect (correct me if I'm wrong) that, being a balanced binary tree,
the insertion will imply a tree rebalancing which is O(1) (that is, it
does not depend on the number of nodes). So, if I already know where to
put the next value, I just have to do the balancing, that is O(1), and
the total complexity is O(n).
I wrote a test program (which I tested on Solaris running Sun C++ 5.8
compiler) to confirm that sorted elements' insertion doesn't take
linear time.

I am still curious to know if it is possible to construct a set in
linear time.
You have to build the set with the proper constructor. You will see a
nice linear increment in the performance. Use this test:

#include <set>
#include <vector>
#include <boost/date_time/posix_time/posix_time.hpp>

int main()
{
std::vector<longv(100000);
for(std::size_t i = 0; i < 100000; ++i){
v[i] = i;
}

for(std::size_t i = 0; i < 100; ++i){
std::vector<long>::const_iterator begin = v.begin();
std::vector<long>::const_iterator end = v.begin() + 1000 * i;

boost::posix_time::ptime startTime =
boost::posix_time::microsec_clock::local_time();
std::set<longs(begin, end);
boost::posix_time::ptime endTime =
boost::posix_time::microsec_clock::local_time();

std::cout << endTime - startTime << std::endl;
}
return 0;
}

and plot the results.

Regards,

Zeppe
Jun 11 '07 #7

P: n/a
Jerry Coffin wrote:
In article <11*********************@p77g2000hsh.googlegroups. com>,
v.*********@gmail.com says...
>On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.comwrote:

[ ... ]
>>With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.
This is a bit overcomplicated and I guess that the set implementation
is able to avoid it by making very good use of the "hint version" of
insert(). According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."

This has one minor problem: while it gives amortized linear complexity,
that doesn't (strictly speaking) meet the requirement for strictly
linear complexity.
Is it not enough to argument that the re balancing only takes constant
time when the sequence are sorted?

Re balancing can potentially take O(lg n) time. But when the sequence is
sorted the while loop in re balancing will be executed at most one time
so you get O(n) * O(1) which is O(n).
Jun 11 '07 #8

This discussion thread is closed

Replies have been disabled for this discussion.