std::set constructor taking a sorted sequence

desktop

In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?

Jun 10 '07 #1

Zeppe

desktop wrote:

> In the C++ standard page 472 it says that you can construct a std::set
> in linear time if the constructor gets a sorted sequence of elements.
> But how is this possible when insert takes logarithmic time? Should the
> time not be nlgn where n is the number of elements?

The answer is in the question. If the sequence is ordered, there is no
need to repeat the O(log n) search for each element, so the set can be
constructed in linear time: you already know where to place each
element, and finding that place is the step that would otherwise take
O(log n) time.
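A small sketch (not part of the original thread) makes the difference measurable by counting comparator calls. It builds the same sorted data once with the iterator-range constructor and once with plain insert(value), whose search starts at the root every time. CountingLess, sorted_input and the two comparisons_* helpers are hypothetical names, and the exact counts depend on the library implementation, so only the rough ratio is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Comparator that counts how often it is invoked (hypothetical helper).
struct CountingLess {
    std::size_t* calls;
    bool operator()(long a, long b) const {
        ++*calls;
        return a < b;
    }
};

// Sorted input 0, 1, ..., n-1.
std::vector<long> sorted_input(std::size_t n) {
    std::vector<long> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<long>(i);
    return v;
}

// Comparisons used by the iterator-range constructor on sorted input.
std::size_t comparisons_range_ctor(std::size_t n) {
    std::vector<long> v = sorted_input(n);
    std::size_t calls = 0;
    std::set<long, CountingLess> s(v.begin(), v.end(), CountingLess{&calls});
    return calls;
}

// Comparisons used by inserting the same values one at a time, without a hint.
std::size_t comparisons_plain_insert(std::size_t n) {
    std::vector<long> v = sorted_input(n);
    std::size_t calls = 0;
    std::set<long, CountingLess> s(CountingLess{&calls});
    for (long x : v) s.insert(x);  // each call searches down from the root
    return calls;
}
```

In common implementations (libstdc++ and libc++, for example) the range constructor appends each element using end() as an internal hint, so sorted input costs only one or two comparisons per element, while the unhinted loop costs on the order of log n per element.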

Regards,

Zeppe

Jun 10 '07 #2

Rosarin Roy

On Jun 10, 6:15 pm, Zeppe <z...@remove.all.this.long.comment.email.it>
wrote:

> desktop wrote:
>
>> In the C++ standard page 472 it says that you can construct a std::set
>> in linear time if the constructor gets a sorted sequence of elements.
>> But how is this possible when insert takes logarithmic time? Should the
>> time not be nlgn where n is the number of elements?
>
> The answer is in the question. If the sequence is ordered, there is no
> need to repeat the research each time (logn), so the set can be
> constructed in a linear time (you already know where to place the
> element, that would be the action requiring logn time).
>
> Regards,
>
> Zeppe

Most standard libraries use red-black trees to implement set and map.
In that case, building the set by insertion cannot take linear time,
regardless of whether the input is sorted. And the claim that "you
already know where to place the element" does not hold across calls,
because each search begins at the root.

I wrote a test program (tested on Solaris with the Sun C++ 5.8
compiler) which confirms that inserting sorted elements does not take
linear time.

I am still curious to know if it is possible to construct a set in
linear time.

Rosarin Roy

Jun 10 '07 #3

Jerry Coffin

In article <f4**********@news.net.uni-c.dk>, as****@asd.com says...

> In the C++ standard page 472

It's much better to cite section numbers or (better still) section names
-- page numbers change from one version of the standard to the next, but
the section names remain much closer to constant.

> it says that you can construct a std::set
> in linear time if the constructor gets a sorted sequence of elements.
> But how is this possible when insert takes logarithmic time? Should the
> time not be nlgn where n is the number of elements?

Doing it with random access iterators is (mostly) pretty simple. You
basically ignore the fact that you're working with a red-black tree (or
AVL tree) and instead just construct a perfectly balanced tree -- take
the median item in the input sequence, and make that your root. This
leaves two halves of the input, which you recursively insert as the left
and right sub-trees of the current node.

This means you never re-balance the tree during creation (because you're
creating it as close to perfectly balanced as possible given the number
of nodes), and each node you insert is always a direct descendant of the
current node, so you never have to do an O(log N) traversal to find where
to insert a node.
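The median-splitting idea can be sketched on a plain binary search tree. This is only an illustration, not how any particular std::set is implemented: Node, build_balanced, height and in_order are hypothetical names, and a real red-black tree node would also carry a parent pointer and a colour. Each element is visited once, and the median is found in O(1) because the iterators are random access, so the construction is O(n).

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Minimal binary-search-tree node (hypothetical; not std::set's internal node type).
struct Node {
    long value;
    std::unique_ptr<Node> left, right;
    explicit Node(long v) : value(v) {}
};

// Build a perfectly balanced BST from the sorted range [first, last).
// The median becomes the root; the two halves become the subtrees.
// Every element is touched exactly once, so the total work is O(n).
template <class RandomIt>
std::unique_ptr<Node> build_balanced(RandomIt first, RandomIt last) {
    if (first == last) return nullptr;
    RandomIt mid = first + (last - first) / 2;  // O(1) with random-access iterators
    std::unique_ptr<Node> node(new Node(*mid));
    node->left = build_balanced(first, mid);
    node->right = build_balanced(mid + 1, last);
    return node;
}

// Number of levels in the tree (0 for an empty tree).
int height(const Node* n) {
    if (!n) return 0;
    int l = height(n->left.get());
    int r = height(n->right.get());
    return 1 + (l > r ? l : r);
}

// In-order traversal, which should reproduce the sorted input.
void in_order(const Node* n, std::vector<long>& out) {
    if (!n) return;
    in_order(n->left.get(), out);
    out.push_back(n->value);
    in_order(n->right.get(), out);
}
```

For 1000 sorted values the resulting tree has height 10, the minimum possible for 1000 nodes, and an in-order walk returns the input unchanged.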

With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.
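The temporary-copy approach described above can be sketched as follows, assuming a single-pass source such as std::istream_iterator (chosen here only as an example of an input iterator). set_from_input is a hypothetical helper name. The data is copied into a std::vector, whose size and median are then available in constant time, and the set is built from that random-access range; the price is the extra O(n) buffer.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <vector>

// Copy a single-pass input range into a vector, then build the set from it.
// Once the elements are in a vector, distance and median are O(1) to find.
template <class InputIt>
std::set<typename std::iterator_traits<InputIt>::value_type>
set_from_input(InputIt first, InputIt last) {
    typedef typename std::iterator_traits<InputIt>::value_type T;
    std::vector<T> buffer(first, last);  // the temporary copy (extra O(n) space)
    return std::set<T>(buffer.begin(), buffer.end());
}
```

For example, calling set_from_input(std::istream_iterator<int>(in), std::istream_iterator<int>()) on a std::istringstream holding "1 2 3 5 8 13" yields a six-element set.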

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 11 '07 #4

V.R. Marinov

On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.com> wrote:

> In article <f4hpva$85...@news.net.uni-c.dk>, asd...@asd.com says...
> Doing it with random access iterators is (mostly) pretty simple. You
> basically ignore the fact that you're working with a red-black tree (or
> AVL tree) and instead just construct a perfectly balanced tree -- take
> the median item in the input sequence, and make that your root. This
> leaves two halves of the input, which you recursively insert as the left
> and right sub-trees of the current node.

This is a very nice idea, but unfortunately it imposes a (small)
unnecessary overhead when inserting unsorted ranges.

> With Input Iterators, this gets a bit ugly -- doing this in linear time
> requires figuring the distance (and median item) in constant time. You
> could temporarily copy from the input sequence to a vector or deque, but
> that would rarely be worthwhile. In theory I believe this should be a
> win for a sufficiently large collection, but by the time it gets large
> enough, the space for the temporary copy would probably be prohibitive.

This is a bit overcomplicated, and I guess that the set implementation
is able to avoid it by making very good use of the "hint version" of
insert(). According to the standard, the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."
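Building a set from sorted input with the hinted overload can be sketched as below. For ascending input, end() is the natural hint: C++11 reworded the guarantee as "amortized constant if t is inserted right before p", and each new largest element goes right before end(). For contrast, the sketch also uses begin() as a deliberately poor hint, which makes the library fall back to an ordinary search. CountingLess and count_hinted are hypothetical helpers, and the exact counts vary between implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Comparator that counts its invocations (hypothetical helper).
struct CountingLess {
    std::size_t* calls;
    bool operator()(long a, long b) const {
        ++*calls;
        return a < b;
    }
};

// Insert 0, 1, ..., n-1 in ascending order using insert(hint, value).
// If use_end_hint is true the hint is end(), exactly where each new element
// belongs; otherwise begin() is used as a deliberately poor hint.
std::size_t count_hinted(std::size_t n, bool use_end_hint) {
    std::size_t calls = 0;
    std::set<long, CountingLess> s(CountingLess{&calls});
    for (std::size_t i = 0; i < n; ++i) {
        long x = static_cast<long>(i);
        s.insert(use_end_hint ? s.end() : s.begin(), x);
    }
    return calls;
}
```

With the good hint the total stays within a small constant times n; with the poor hint each insertion costs a full root-to-leaf search again.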

Jun 11 '07 #5

Jerry Coffin

In article <11*********************@p77g2000hsh.googlegroups.com>,
v.*********@gmail.com says...

> On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.com> wrote:

[ ... ]

>> With Input Iterators, this gets a bit ugly -- doing this in linear time
>> requires figuring the distance (and median item) in constant time. You
>> could temporarily copy from the input sequence to a vector or deque, but
>> that would rarely be worthwhile. In theory I believe this should be a
>> win for a sufficiently large collection, but by the time it gets large
>> enough, the space for the temporary copy would probably be prohibitive.
>
> This is a bit overcomplicated and I guess that the set implementation
> is able to avoid it by making very good use of the "hint version" of
> insert(). According to the standard the complexity of this function is:
>
> "logarithmic in general, but amortized constant if t is inserted right
> after p."

This has one minor problem: while it gives amortized linear complexity,
that doesn't (strictly speaking) meet the requirement for strictly
linear complexity.

Jun 11 '07 #6

Zeppe

Rosarin Roy wrote:

> On Jun 10, 6:15 pm, Zeppe <z...@remove.all.this.long.comment.email.it>
> wrote:
>> desktop wrote:
>>> In the C++ standard page 472 it says that you can construct a std::set
>>> in linear time if the constructor gets a sorted sequence of elements.

[cut]

> Most of the standard libraries make use of red-black trees for
> implementing set and map. In such case the insertion cannot be linear
> in time, immaterial of input is sorted or not sorted. And the claim
> that "you already know where to place the element" is not true across
> calls, as the search always begins at root.

As you can see from the quoted message, the OP says that "you can
construct a std::set in linear time if the constructor gets a sorted
sequence of elements." Of course, if you make separate insert calls it's
no longer true. The constructor in question is the one accepting
(iterator first, iterator last). In that case, the comparison is done on
the fly while inserting. I don't really know the red-black tree
behaviour, but I would expect (correct me if I'm wrong) that, since it is
a balanced binary tree, each insertion implies a tree rebalancing which
is O(1) (that is, it does not depend on the number of nodes). So, if I
already know where to put the next value, I just have to do the
balancing, which is O(1), and the total complexity is O(n).

> I wrote a test program (which I tested on Solaris running Sun C++ 5.8
> compiler) to confirm that sorted elements' insertion doesn't take
> linear time.
>
> I am still curious to know if it is possible to construct a set in
> linear time.

You have to build the set with the proper constructor. Then you will see
the construction time grow linearly with the number of elements. Use this test:

    #include <cstddef>
    #include <iostream>
    #include <set>
    #include <vector>
    #include <boost/date_time/posix_time/posix_time.hpp>

    int main()
    {
        // Sorted input: 0, 1, ..., 99999.
        std::vector<long> v(100000);
        for(std::size_t i = 0; i < v.size(); ++i){
            v[i] = static_cast<long>(i);
        }

        // Time the range constructor on the first 0, 1000, ..., 99000 elements.
        for(std::size_t i = 0; i < 100; ++i){
            std::vector<long>::const_iterator begin = v.begin();
            std::vector<long>::const_iterator end = v.begin() + 1000 * i;

            boost::posix_time::ptime startTime =
                boost::posix_time::microsec_clock::local_time();
            std::set<long> s(begin, end);
            boost::posix_time::ptime endTime =
                boost::posix_time::microsec_clock::local_time();

            // One elapsed time per line, ready to be plotted.
            std::cout << endTime - startTime << std::endl;
        }
        return 0;
    }

and plot the results.
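If Boost is not available, the same measurement can be sketched with C++11 <chrono>. time_range_ctor is a hypothetical helper: it returns (number of elements, microseconds) pairs for sizes 0, step, 2*step and so on, using steady_clock because it is monotonic. As with the test above, individual timings are noisy, so plot them rather than trusting a single run.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Time the iterator-range constructor on sorted input of growing size.
// Returns (number of elements, elapsed microseconds) for 0, step, 2*step, ...
std::vector<std::pair<std::size_t, long long>>
time_range_ctor(std::size_t steps, std::size_t step) {
    std::vector<long> v(steps * step);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = static_cast<long>(i);  // sorted

    std::vector<std::pair<std::size_t, long long>> results;
    for (std::size_t i = 0; i < steps; ++i) {
        std::size_t count = step * i;
        auto first = v.begin();
        auto last = v.begin() + static_cast<std::ptrdiff_t>(count);

        auto start = std::chrono::steady_clock::now();
        std::set<long> s(first, last);
        auto stop = std::chrono::steady_clock::now();

        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        results.emplace_back(s.size(), static_cast<long long>(us.count()));
    }
    return results;
}
```

Printing each pair from a main function, for example with time_range_ctor(100, 1000), gives the same series of sizes as the Boost version.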

Regards,

Zeppe

Jun 11 '07 #7

desktop

Jerry Coffin wrote:

> In article <11*********************@p77g2000hsh.googlegroups.com>,
> v.*********@gmail.com says...
>> On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus.com> wrote:
>
> [ ... ]
>
>>> With Input Iterators, this gets a bit ugly -- doing this in linear time
>>> requires figuring the distance (and median item) in constant time. You
>>> could temporarily copy from the input sequence to a vector or deque, but
>>> that would rarely be worthwhile. In theory I believe this should be a
>>> win for a sufficiently large collection, but by the time it gets large
>>> enough, the space for the temporary copy would probably be prohibitive.
>> This is a bit overcomplicated and I guess that the set implementation
>> is able to avoid it by making very good use of the "hint version" of
>> insert(). According to the standard the complexity of this function is:
>>
>> "logarithmic in general, but amortized constant if t is inserted right
>> after p."
>
> This has one minor problem: while it gives amortized linear complexity,
> that doesn't (strictly speaking) meet the requirement for strictly
> linear complexity.

Isn't it enough to argue that the rebalancing only takes constant
time when the sequence is sorted?

Rebalancing can potentially take O(lg n) time. But when the sequence is
sorted, the while loop in the rebalancing will be executed at most once,
so you get O(n) * O(1), which is O(n).

Jun 11 '07 #8

Similar topics

A non-const std::set iterator

by: Michael Klatt | last post by:

I am trying to write an iterator for a std::set that allows the iterator target to be modified. Here is some relvant code: template <class Set> // Set is an instance of std::set<> class...

C / C++

which type should "std::set::begin() const" return?

by: snnn | last post by:

On the book <Generic Programming and the STL>( Matthew . H . Austern ),this function is defined as iterator set::begin() const. However, why should a const object returns a non-const iterator?...

C / C++

std::map<int,std::set<std::string> > Wrong? (Segmentation fault.)

by: Peter Jansson | last post by:

Hello, I have the following code: std::map<int,std::set<std::string> > k; k="1234567890"; k="2345678901"; //... std::set<std::string> myMethod(std::map<int,std::set<std::string> > k)...

C / C++

can std::set hold pointers to keys instead of the keys themselves?

by: danibe | last post by:

I never had any problems storing pointers in STL containers such std::vector and std::map. The benefit of storing pointers instead of the objects themselves is mainly saving memory resources and...

C / C++

std::set and insert speed

by: asdf | last post by:

I have a program that reads sorted data from a database and inserts it element by element into a set. If the data is sorted is there a faster way to insert ? Meaning is there a way to tell the...

C / C++

Help! How to access to std::set elements?

by: shuisheng | last post by:

Dear All, std::set is sorted. So I am wondering is there any fast way to access (sucn as random access) to its elements just like std::vector. Assume I have a set std::set<inta; So I can...

C / C++

Inserting into std::set

by: Marcus Kwok | last post by:

I am not sure if this is something that is covered by the Standard, or if it's an implementation detail of my Standard Library. I am reading in a large amount of data into a std::set. There is...

C / C++

factor 50.000 between std::list and std::set?

by: desktop | last post by:

If I have a sorted std::list with 1.000.000 elements it takes 1.000.000 operations to find element with value = 1.000.000 (need to iterator through the whole list). In comparison, if I have a...

C / C++

A question about the std::set<>::iterator

by: Renzr | last post by:

I have a problem about the std::set<>iterator. After finding a term in the std::set<>, i want to know the distance from the current term to the begin(). But i have got a error. Please offer me...

C / C++
