Delete from a std::set in amortized constant time

On 6/12/07 4:00 PM, in article f4**********@news.net.uni-c.dk, "desktop"
<ff*@sss.comwrote:

In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q is
a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a red-black
tree where insert/delete takes O(lg n) time. Or are there some other
explanation for this complexity?

The explanation is simple. The removal time of a node when measured for an
RB tree includes the time needed to find the node to be deleted in the tree.

In this case of a call to set::erase(), no time needs to be spent searching
for the node (that is, the item) to be removed because its location is
passed to the erase method as a parameter. So by skipping the search for the
item, the item is able to be removed from the set in amortized constant (and
not logarithmic) time.

Greg

Jun 12 '07 #3

desktop

Dave Steffen wrote:

desktop <ff*@sss.comwrites:

>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

IIRC insert takes log n time, since you have to search for the right
place to insert it.

"Remove this value" also takes log n time, since you have to search
for the thing to delete.

In this case, you already know what thing to delete, since you've
got an iterator to it. No searching required, just fiddling with
some internal bookkeeping.

But you still need to do the following re balancing that can take O(lg
n) time

Jun 12 '07 #4

On 6/12/07 4:26 PM, in article f4**********@news.net.uni-c.dk, "desktop"
<ff*@sss.comwrote:

Dave Steffen wrote:
>desktop <ff*@sss.comwrites:

>>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

IIRC insert takes log n time, since you have to search for the right
place to insert it.

"Remove this value" also takes log n time, since you have to search
for the thing to delete.

In this case, you already know what thing to delete, since you've
got an iterator to it. No searching required, just fiddling with
some internal bookkeeping.

But you still need to do the following re balancing that can take O(lg
n) time

But the amortized time for rebalancing an RB tree is O(1) - in another
words, a constant amount of time. So the C++ Standard's performance
requirements for deleting an item from a set can be met by implementing the
set with an RB tree.

Greg

Jun 13 '07 #5

desktop

Greg Herlihy wrote:

>

On 6/12/07 4:26 PM, in article f4**********@news.net.uni-c.dk, "desktop"
<ff*@sss.comwrote:

>Dave Steffen wrote:
>>desktop <ff*@sss.comwrites:

In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?
IIRC insert takes log n time, since you have to search for the right
place to insert it.

"Remove this value" also takes log n time, since you have to search
for the thing to delete.

In this case, you already know what thing to delete, since you've
got an iterator to it. No searching required, just fiddling with
some internal bookkeeping.
But you still need to do the following re balancing that can take O(lg
n) time

But the amortized time for rebalancing an RB tree is O(1) - in another
words, a constant amount of time. So the C++ Standard's performance
requirements for deleting an item from a set can be met by implementing the
set with an RB tree.

Greg

The delete version in In Introduction To Algorithms by Thomas Cormen is
the version that takes a pointer to an element (not a key that first has
to be found).

But I can't see how you can avoid the O (lg n ) time since the
subroutine 'tree-successor' has a running time equal to the height of
the tree.

Jun 13 '07 #6

Greg Herlihy wrote:

In this case of a call to set::erase(), no time needs to be spent searching
for the node (that is, the item) to be removed because its location is
passed to the erase method as a parameter. So by skipping the search for the
item, the item is able to be removed from the set in amortized constant (and
not logarithmic) time.

And the algorithm somehow manages to rebalance the tree in amortized
constant time?

Jun 13 '07 #7

James Kanze

On Jun 13, 1:00 am, desktop <f...@sss.comwrote:

In the C++ standard sec 23.1.2 table 69 it says that erase(q)
where q is a pointer to an element can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

The standard never makes any requirements with regards to time.
In a very real way, it can't; there are too many variables
involved that are beyond the control of the implementation.

--
James Kanze (GABI Software, from CAI) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 13 '07 #8

desktop

James Kanze wrote:

On Jun 13, 1:00 am, desktop <f...@sss.comwrote:
>In the C++ standard sec 23.1.2 table 69 it says that erase(q)
where q is a pointer to an element can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

Ok so when erase(p) is said to be amortized constant they exclude the
time used to re balance the tree which would result in logarithmic time.

Jun 13 '07 #9

V.R. Marinov

desktop wrote:

In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q is
a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a red-black
tree where insert/delete takes O(lg n) time. Or are there some other
explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

--
it's mail not fail

Jun 13 '07 #10

Zeppe

V.R. Marinov wrote:

desktop wrote:
>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

You don't read carefully enough. The OP argues that the rebalancing of
the tree takes O(log n) time, so, despite of the search, the complexity
would be O(log n) just for the rebalancing.

Regards,

Zeppe

Jun 13 '07 #11

V.R. Marinov

Zeppe wrote:

V.R. Marinov wrote:
>desktop wrote:
>>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

You don't read carefully enough. The OP argues that the rebalancing of
the tree takes O(log n) time, so, despite of the search, the complexity
would be O(log n) just for the rebalancing.

I guess I was trying to say that rebalancing doesn't take O(lg(N))
complexity.
--
it's mail not fail

Jun 13 '07 #12

desktop

V.R. Marinov wrote:

desktop wrote:
>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

As James Kanze pointed out the constant complexity is based on only
contained objects and not the extra time needed to do re balancing which
leads to a O (lg n) bound.

Jun 13 '07 #13

Zeppe

desktop wrote:

James Kanze wrote:
>On Jun 13, 1:00 am, desktop <f...@sss.comwrote:
>>In the C++ standard sec 23.1.2 table 69 it says that erase(q)
where q is a pointer to an element can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

Ok so when erase(p) is said to be amortized constant they exclude the
time used to re balance the tree which would result in logarithmic time.

Not at all.

1) the sentence in 23.1/2 refers to the fact that the complexity is
independent on the type of object that is stored in the container. The
complexity is calculated in terms of operations on the objects, exactly
how normally it's calculated in the computer science books.

2) I don't have any book on data structures, but wikipedia tells that
the rebalancing takes O(log N) or amortized O(1) time. So, it's
reasonable to say that if you do not need to search but just to
rebalance, the required time is amortized O(1). Amortized O(1) is not
the same of O(1): the rebalancing IS O(log N) in the worst case, but
there is the guarantee that the worst case can't happen too often, so in
average it's O(1). More accurately, it means that if you have to erase a
sequence, this will be done in O(n) (not O(n log n)) time. The standard
apparently gives only this "weak" requirement.

3) having said so, I would suggest you to try to read carefully
something about these data structures if you are interest in them,
because it seems like you are trying to understand pretty complex and
theoretical stuff having done just a quick look trough some book. The
questions you are asking for are quite difficult, and almost anybody
here is proficient with the theory of RB trees (that are not strictly
related with c++ either).

Regards,

Zeppe

Jun 13 '07 #14

desktop

Zeppe wrote:

desktop wrote:
>James Kanze wrote:
>>On Jun 13, 1:00 am, desktop <f...@sss.comwrote:
In the C++ standard sec 23.1.2 table 69 it says that erase(q)
where q is a pointer to an element can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

Ok so when erase(p) is said to be amortized constant they exclude the
time used to re balance the tree which would result in logarithmic time.

Not at all.

1) the sentence in 23.1/2 refers to the fact that the complexity is
independent on the type of object that is stored in the container. The
complexity is calculated in terms of operations on the objects, exactly
how normally it's calculated in the computer science books.

2) I don't have any book on data structures, but wikipedia tells that
the rebalancing takes O(log N) or amortized O(1) time. So, it's
reasonable to say that if you do not need to search but just to
rebalance, the required time is amortized O(1). Amortized O(1) is not
the same of O(1): the rebalancing IS O(log N) in the worst case, but
there is the guarantee that the worst case can't happen too often, so in
average it's O(1). More accurately, it means that if you have to erase a
sequence, this will be done in O(n) (not O(n log n)) time. The standard
apparently gives only this "weak" requirement.

But delete not only uses re balancing, it also uses tree-successor which
have a running time equal to the height of the tree. This subroutine is
used in the version of delete that takes a pointer to the element thats
going to be deleted.

3) having said so, I would suggest you to try to read carefully
something about these data structures if you are interest in them,
because it seems like you are trying to understand pretty complex and
theoretical stuff having done just a quick look trough some book. The
questions you are asking for are quite difficult, and almost anybody
here is proficient with the theory of RB trees (that are not strictly
related with c++ either).

The literature I read don't cover these special cases or uses of the
red/black tree and the basics they cover are not very explanatory (don't
think you can find a book covering the requirements of the operations in
the C++ standard in this kind of detail) but I agree that its getting
Off-Topic and I will make sure to post theoretical questions in a more
appropriate group.

Jun 13 '07 #15

James Kanze

On Jun 13, 3:15 pm, desktop <f...@sss.comwrote:

James Kanze wrote:
On Jun 13, 1:00 am, desktop <f...@sss.comwrote:
In the C++ standard sec 23.1.2 table 69 it says that erase(q)
where q is a pointer to an element can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

Ok so when erase(p) is said to be amortized constant they
exclude the time used to re balance the tree which would
result in logarithmic time.

Essentially, yes. They also exclude the time to deallocate the
node; the time necessary for deallocation could depend on the
number of individual blocks allocated. They also exclude any
time necessary for paging, due to poor locality or whatever,
which might also depend (indirectly) on the size of the set. I
imagine that most people would agree that these last two are
reasonable to exclude; there's no way to really say much about
them anyway. But that means that the complexity guarantees
can't be measured in time. And the results of measuring them in
operations on the contained object results in the time to
rebalance the tree being excluded as well.

Whether this is significant depends on the object type. If
comparison or destruction are complicated operations, the few
pointer operations necessary for rebalancing won't be
significant, even if the tree is 20 deep, and you hit the worst
case. If the set only contains int's, and you use a blocked
fixed size allocator which can deallocate in with only one or
two pointer manipulations, the rebalancing is a major part of
the job.

So I'm not saying that it's good or bad, the way it's specified.
I'm just saying that that's how it's specified.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 13 '07 #16

On 6/13/07 5:14 AM, in article 46**********************@news.song.fi, "Juha
Nieminen" <no****@thanks.invalidwrote:

Greg Herlihy wrote:
>In this case of a call to set::erase(), no time needs to be spent searching
for the node (that is, the item) to be removed because its location is
passed to the erase method as a parameter. So by skipping the search for the
item, the item is able to be removed from the set in amortized constant (and
not logarithmic) time.

And the algorithm somehow manages to rebalance the tree in amortized
constant time?

Yes.

And if you would like to know exactly how a RB tree is rebalanced, you can
look up the relevant entry in Wikipedia as a starting point.

Greg

Jun 14 '07 #17

V.R. Marinov wrote:

I guess I was trying to say that rebalancing doesn't take O(lg(N))
complexity.

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).
Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

Jun 14 '07 #18

desktop

Juha Nieminen wrote:

V.R. Marinov wrote:
>I guess I was trying to say that rebalancing doesn't take O(lg(N))
complexity.

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).
Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

The tree-successor subroutine in delete (before the call to re-balance)
also takes O(lg n) time so this must dominate delete even though you can
do re balancing in amortized constant time.

Jun 14 '07 #19

On 6/14/07 3:44 AM, in article 46**********************@news.song.fi, "Juha
Nieminen" <no****@thanks.invalidwrote:

V.R. Marinov wrote:
>I guess I was trying to say that rebalancing doesn't take O(lg(N))
complexity.

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).

Not really. Remember that deleting an item must require no more than an
amortized constant number of operations: therefore, if erasing an item from
a set takes O(log(N)) operations (say, to rebalance the underlying RB
tree), then it must be the case that none of the preceding (or following)
items erased could have required more than a constant number of operations
to complete. In other words, in order to achieve a constant amortized
operation count, there must be a sharp limit on how often a "worst case" may
occur relative to the number of "best cases" that occur when erasing an item
from a set.

So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case. In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.

Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

Yes, but the worst cases occur less and less frequently and are therefore
always compensated by an ever-increasing number of constant "best cases". So
the average complexity per deletion when the best cases are included with
the worst cases - is still a constant.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

Your explanation is entirely backwards. First, it seems that you must be
describing the number of steps needed to find a specified item in a
container. Second, Linear complexity, O(N), is much worse than logarithmic
complexity, O(log(N)) So O(N) is the expected worst case to find an item in
certain unsorted containers (such as a vector). To find an item in a set on
the other hand, has only a logarithmic worst-case complexity O(log(N)) - a
significant improvement over a vector.

In any event, searching in a set is not comparable to erasing or adding an
item (at a known location) - since, when searching there is no limit on the
frequency of "worst" cases searches versus "best" case searches - and in
fact every search could be a worst case - so its complexity has to be
reported with the most pessimistic assumption - just in case.

Greg

Jun 15 '07 #20

Greg Herlihy wrote:

So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

I think you don't really understand the O() notation. The O()
notation expresses the asymptotic *upper limit* for the time required
to perform the specified operation. Or in other words, it describes
the *worst case*, ie. the largest amount of time the operation could
possibly take. It doesn't matter if this happens rarely. O() doesn't
express the average time nor the amortized time for that operation.

Even if erasing the element can be done in constant time one
billion times for each lg(n)-time erase, erasing the element is
still an O(lg(n))-time algorithm. As I said, O() expresses the
absolute worst case scenario.

"Amortized time" is a completely different thing. It says that if
you perform the operation a very large amount of times over a typical
data set, how much *in average* it will take time.

A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case.

Insertion into a vector *is* an O(n) operation. It doesn't matter if
the operation is constant-time when amortized over a large amount of
insertions. O() expresses the worst possible case of one operation.

In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.

Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.

> It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

Your explanation is entirely backwards. First, it seems that you must be
describing the number of steps needed to find a specified item in a
container.

No, I'm describing a tree traversal from begin() to end(). The entire
traversal takes O(n) time even though one single iterator increment
is an O(lg(n)) operation.

(Common sense would dictate that if an iterator increment is an
O(lg(n)) operation and it's done n times, the entire traversal must
be an O(n*lg(n)) operation. However, quite curiously, it's not. It's
an O(n) operation. It's actually rather easy and intuitive to prove.)

Second, Linear complexity, O(N), is much worse than logarithmic
complexity, O(log(N))

So what? I was talking about an entire tree traversal, from
begin() to end(). It's a linear operation.

So O(N) is the expected worst case to find an item in
certain unsorted containers (such as a vector). To find an item in a set on
the other hand, has only a logarithmic worst-case complexity O(log(N)) - a
significant improvement over a vector.

I was not talking about finding an item.

Jun 15 '07 #21

Hans Bos

"Greg Herlihy" <gr****@pacbell.netwrote in message
news:C2******************@pacbell.net...

On 6/14/07 3:44 AM, in article 46**********************@news.song.fi, "Juha
Nieminen" <no****@thanks.invalidwrote:

>V.R. Marinov wrote:
>>I guess I was trying to say that rebalancing doesn't take O(lg(N))
complexity.

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).

Not really. Remember that deleting an item must require no more than an
amortized constant number of operations: therefore, if erasing an item from
a set takes O(log(N)) operations (say, to rebalance the underlying RB
tree), then it must be the case that none of the preceding (or following)
items erased could have required more than a constant number of operations
to complete. In other words, in order to achieve a constant amortized
operation count, there must be a sharp limit on how often a "worst case" may
occur relative to the number of "best cases" that occur when erasing an item
from a set.

So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

.....

>
Yes, but the worst cases occur less and less frequently and are therefore
always compensated by an ever-increasing number of constant "best cases". So
the average complexity per deletion when the best cases are included with
the worst cases - is still a constant.

This is not true. It depends on the order of deletion.
If each time the root of the tree is deleted, each delete operation will take
O(logN) time.

So even amortized this is O(logN) per deletion.

If you always delete a leaf node, the deletion will take O(1) time (ignoring
rebalancing).

Hans.

Jun 15 '07 #22

The delete version in In Introduction To Algorithms by Thomas Cormen is
the version that takes a pointer to an element (not a key that first has
to be found).

But I can't see how you can avoid the O (lg n ) time since the
subroutine 'tree-successor' has a running time equal to the height of
the tree.

Has a worst case running time equal to the height of the tree. But the
average time is O(1).

john

Jun 15 '07 #23

But delete not only uses re balancing, it also uses tree-successor which
have a running time equal to the height of the tree.

Tree successor has a worst case running time equal to the height of the
tree, but it's average time is O(1).

john

Jun 15 '07 #24

desktop wrote:

V.R. Marinov wrote:
>desktop wrote:
>>In the C++ standard sec 23.1.2 table 69 it says that erase(q) where q
is a pointer to an element can be done in amortized constant time.

I guess that is not worst case since std::set is practically a
red-black tree where insert/delete takes O(lg n) time. Or are there
some other explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

As James Kanze pointed out the constant complexity is based on only
contained objects and not the extra time needed to do re balancing which
leads to a O (lg n) bound.

That may be true of the standard (standard is pretty useless in that
case IMHO), but it is still the case that to delete from a red-black
tree given the node to delete, is in the average case, a constant time
operation.

john

Jun 15 '07 #25

On 6/15/07 2:35 AM, in article 46**********************@news.song.fi, "Juha
Nieminen" <no****@thanks.invalidwrote:

Greg Herlihy wrote:
>So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

I think you don't really understand the O() notation. The O()
notation expresses the asymptotic *upper limit* for the time required
to perform the specified operation. Or in other words, it describes
the *worst case*, ie. the largest amount of time the operation could
possibly take. It doesn't matter if this happens rarely. O() doesn't
express the average time nor the amortized time for that operation.

You misunderstand the word "limit" as it applies to big-O notation. The word
"limit" here has the same meaning as it does in calculus. A limit is not the
maximum number of operations of a "worst case" scenario - rather a limit is
the value that the entire series of individual worst cases is converging
upon (or approaching). So a big-O complexity measure really does convey
something akin to the "average" (or expected) number of operations (in the
worst case) and really says nothing about how many operations that the worst
of the worst cases could take - a number that may greatly exceed the
complexity described by the big-O notation.

Even if erasing the element can be done in constant time one
billion times for each lg(n)-time erase, erasing the element is
still an O(lg(n))-time algorithm. As I said, O() expresses the
absolute worst case scenario.

No. In order for erasing an item from a set to have O(log(N)) complexity,
the limit of the series of "worst-case" deletions that could be performed
upon a set must approach O(log(N)). Yet no matter how you carefully you
might select the items to be placed in the set and no matter how carefully
you might order each deletion in order to maximize the number of operations
required - the number of operations of each deletion when divided by the
number of items deleted will not approach O(log(N)) but will instead
converge toward a constant value. In other words, deleting an item from a
set has constant and not a logarithmic complexity.

"Amortized time" is a completely different thing. It says that if
you perform the operation a very large amount of times over a typical
data set, how much *in average* it will take time.

No. Amortized complexity is not a statistical measure nor does it describe
the "average case" complexity (and the data set may not be "typical" it must
be a worst case data set). So amortized complexity describes the limit of
the series of "worst cases" (so it's similar in concept to the average
"worst" case). Note also that an amortized complexity need not be a
constant. The amortized complexity of a binary search is O(log(N)) since the
number of operations over the entire series of "worst case" searches does
not converge upon a constant but approaches an ever-increasing value - its
"limit".

>A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case.

Insertion into a vector *is* an O(n) operation. It doesn't matter if
the operation is constant-time when amortized over a large amount of
insertions. O() expresses the worst possible case of one operation.

Insertion at the start of a vector (or a fixed distance) from the start does
have linear complexity - while inserting at the end (or a fixed distance
before the end) has a constant time complexity.

>In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.

Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.

No, inserting at the end of a vector has constant-time complexity. It is not
an O(N) operation by any means.

Greg

Jun 15 '07 #26

desktop

John Harrison wrote:

>>
But delete not only uses re balancing, it also uses tree-successor
which have a running time equal to the height of the tree.

Tree successor has a worst case running time equal to the height of the
tree, but it's average time is O(1).

john

hehe everything seems to be running in amortized constant time now :-)
Do you no of any documents that proves that tree-successor runs in
amortized constant time?

Jun 15 '07 #27

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

On 2007-06-16 00:40, Greg Herlihy wrote:

On 6/15/07 2:35 AM, in article 46**********************@news.song.fi, "Juha
Nieminen" <no****@thanks.invalidwrote:

>Greg Herlihy wrote:
>>So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

I think you don't really understand the O() notation. The O()
notation expresses the asymptotic *upper limit* for the time required
to perform the specified operation. Or in other words, it describes
the *worst case*, ie. the largest amount of time the operation could
possibly take. It doesn't matter if this happens rarely. O() doesn't
express the average time nor the amortized time for that operation.

You misunderstand the word "limit" as it applies to big-O notation. The word
"limit" here has the same meaning as it does in calculus. A limit is not the
maximum number of operations of a "worst case" scenario - rather a limit is
the value that the entire series of individual worst cases is converging
upon (or approaching). So a big-O complexity measure really does convey
something akin to the "average" (or expected) number of operations (in the
worst case) and really says nothing about how many operations that the worst
of the worst cases could take - a number that may greatly exceed the
complexity described by the big-O notation.

> Even if erasing the element can be done in constant time one
billion times for each lg(n)-time erase, erasing the element is
still an O(lg(n))-time algorithm. As I said, O() expresses the
absolute worst case scenario.

No. In order for erasing an item from a set to have O(log(N)) complexity,
the limit of the series of "worst-case" deletions that could be performed
upon a set must approach O(log(N)). Yet no matter how you carefully you
might select the items to be placed in the set and no matter how carefully
you might order each deletion in order to maximize the number of operations
required - the number of operations of each deletion when divided by the
number of items deleted will not approach O(log(N)) but will instead
converge toward a constant value. In other words, deleting an item from a
set has constant and not a logarithmic complexity.

> "Amortized time" is a completely different thing. It says that if
you perform the operation a very large amount of times over a typical
data set, how much *in average* it will take time.

No. Amortized complexity is not a statistical measure nor does it describe
the "average case" complexity (and the data set may not be "typical" it must
be a worst case data set). So amortized complexity describes the limit of
the series of "worst cases" (so it's similar in concept to the average
"worst" case). Note also that an amortized complexity need not be a
constant. The amortized complexity of a binary search is O(log(N)) since the
number of operations over the entire series of "worst case" searches does
not converge upon a constant but approaches an ever-increasing value - its
"limit".

>>A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case.

Insertion into a vector *is* an O(n) operation. It doesn't matter if
the operation is constant-time when amortized over a large amount of
insertions. O() expresses the worst possible case of one operation.

Insertion at the start of a vector (or a fixed distance) from the start does
have linear complexity - while inserting at the end (or a fixed distance
before the end) has a constant time complexity.

>>In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.

Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.

No, inserting at the end of a vector has constant-time complexity. It is not
an O(N) operation by any means.

Are you sure, I was under the impression that insertions in a vector
(even at the end) was amortized constant time due to the need to re-
allocate now and then. Insertions at the end (or anywhere provided a
pointer/iterator) to a list on the other hand is constant since there
will never be any reallocations.

--
Erik Wikström

Jun 15 '07 #28

desktop wrote:

John Harrison wrote:

>>>
But delete not only uses re balancing, it also uses tree-successor
which have a running time equal to the height of the tree.

Tree successor has a worst case running time equal to the height of
the tree, but it's average time is O(1).

john

hehe everything seems to be running in amortized constant time now :-)
Do you no of any documents that proves that tree-successor runs in
amortized constant time?

Well, exercise 13.2-4 in Cormen states (in effect) prove that you can
use tree-successor to iterate through a n-node binary search tree in
O(n) time. So each indiviual step must be O(1).

But I'm no mathematical expert.

Jun 16 '07 #29

John Harrison wrote:

desktop wrote:
>John Harrison wrote:
>>>>
But delete not only uses re balancing, it also uses tree-successor
which have a running time equal to the height of the tree.

Tree successor has a worst case running time equal to the height of
the tree, but it's average time is O(1).

john

hehe everything seems to be running in amortized constant time now :-)
Do you no of any documents that proves that tree-successor runs in
amortized constant time?

Well, exercise 13.2-4 in Cormen states (in effect) prove that you can
use tree-successor to iterate through a n-node binary search tree in
O(n) time. So each indiviual step must be O(1).

But I'm no mathematical expert.

I think the proof would be along the lines of this.

A binary tree of N nodes has N-1 links. If tree-successor is called to
iterate though the tree from beginning to end then each link will be
traversed twice (once in each direction). Therefore 2N-2 traverals are
made. Therefore an average of (2N-2)/N traverals for each step.
Therefore O(1) for each step.

john

Jun 16 '07 #30

Hans Bos

"John Harrison" <jo*************@hotmail.comwrote in message
news:01*************@newsfe1-win.ntli.net...

desktop wrote:
>John Harrison wrote:
>>>>
But delete not only uses re balancing, it also uses tree-successor which
have a running time equal to the height of the tree.

Tree successor has a worst case running time equal to the height of the
tree, but it's average time is O(1).

john

hehe everything seems to be running in amortized constant time now :-) Do you
no of any documents that proves that tree-successor runs in amortized
constant time?

Well, exercise 13.2-4 in Cormen states (in effect) prove that you can use
tree-successor to iterate through a n-node binary search tree in O(n) time. So
each indiviual step must be O(1).

This is only true as long as the tree doesn't change during the iteration.

If you iterate from the root, it takes O(logN) time to find the successor.
If that successor is moved to the root, before you iterate to it's successor,
that step will also take O(logN) time.

So it the tree is modified while iterating, the amoritized time can still be
O(logN) per iteration.

Hans.

Jun 16 '07 #31

Jerry Coffin

In article <46*********************@news.xs4all.nl>, ha******@xelion.nl
says...

[ ... ]

If each time the root of the tree is deleted, each delete operation will take
O(logN) time.

So even amortized this is O(logN) per deletion.

One minor addition to note: since you are deleting nodes, the value of N
decreases each iteration, sone when you delete the last node (for
example) N is 1.

In terms of big-O notation, this has no effect since O(Log(N/2)) is the
same as O(Log(N)) -- both are logarithmic. In the real world it's often
worth keeping in mind though...

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 17 '07 #32

John Harrison wrote:

it's average time is O(1).

Nitpicking here, but I'm not sure you should be using the O()
notation for the average time. The O notation expresses the worst
case scenario and is not related to the average cases in any way.

More correct would probably be "in average it's a constant-time
operation".

Jun 17 '07 #33