Bytes | Developer Community
Time complexity of size() for std::set

Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.
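By "tracking the size manually" I mean something like this minimal sketch (SizedSet is just an illustrative name, not an existing class): keep a counter next to the set so the count is O(1) regardless of what set::size() costs.

```cpp
#include <cstddef>
#include <set>

// Hypothetical wrapper: keeps its own element count alongside the set,
// so size() is O(1) by construction whatever std::set::size() costs.
template <typename T>
class SizedSet {
public:
    SizedSet() : n_(0) {}

    bool insert(const T& v) {
        bool added = s_.insert(v).second;
        if (added) ++n_;              // count only genuinely new elements
        return added;
    }

    std::size_t erase(const T& v) {
        std::size_t removed = s_.erase(v);
        n_ -= removed;
        return removed;
    }

    std::size_t size() const { return n_; }   // O(1) by construction

private:
    std::set<T> s_;
    std::size_t n_;                   // the manually tracked size
};
```

The only subtlety is remembering that insert() of a duplicate must not bump the counter, which is what checking the `.second` of the insert result does.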

Thanks,

--
Lionel B
Feb 1 '07 #1
  • viewed: 6012
26 Replies
On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.
That would be pointless. size() is O(1).

/Peter

Feb 1 '07 #2
peter koch wrote:
On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
>Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).
Nit: the standard says size() "should" have constant time. Implementations may
take license to do worse. I know that some do this for std::list<> as part of
some trade-off with other operations. However, this reason would not apply
to std::set<> as far as I can see.
Best

Kai-Uwe Bux

Feb 1 '07 #3
"peter koch" <pe***************@gmail.com> wrote in message
news:11*********************@s48g2000cws.googlegroups.com...
On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
>Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

/Peter
Actually, size() is preferred to be O(1) by the standard, but it's not
guaranteed. So an implementation could very well choose to make it O(n)
(see 23.1/5: "[size()] should have constant complexity"; it says
"should", not "shall").

- Sylvester
Feb 1 '07 #4
On Thu, 01 Feb 2007 08:13:53 -0500, Kai-Uwe Bux wrote:
peter koch wrote:
>On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
>>Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

Nit: the standard says "should" have constant time. Implementations may take
license to do worse. I know that some do this for std::list<> as part of
some trade-off with other operations.
I was aware of that, hence my reluctance to use size() for std::set.
However, this reason would not apply to std::set<> as far as I can see.
Ok, I guess the only option is to try it and see...
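For instance, something along these lines (a crude sketch, assuming a compiler with C++11 <chrono>; time_size_calls is a made-up helper name): time a burst of size() calls at several set sizes and see whether the per-call cost grows with n.

```cpp
#include <chrono>
#include <cstddef>
#include <set>

// Crude empirical check: time 'calls' invocations of size() on a set
// holding n elements. If size() is O(1) the result should stay roughly
// flat as n grows; if it is O(n) it should grow with n. The checksum
// keeps the optimizer from discarding the loop.
long long time_size_calls(std::size_t n, int calls = 1000000) {
    std::set<int> s;
    for (std::size_t i = 0; i < n; ++i)
        s.insert(static_cast<int>(i));

    std::size_t checksum = 0;
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i)
        checksum += s.size();
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    // Sanity check that also makes checksum live (not dead code).
    if (checksum != static_cast<std::size_t>(calls) * n)
        return -1;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling it for, say, n = 1000 and n = 1000000 and comparing the two timings would show whether the implementation's size() cost depends on n.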

Thanks,

--
Lionel B
Feb 1 '07 #5
Sylvester Hesp wrote:
"peter koch" <pe***************@gmail.com> wrote in message
news:11*********************@s48g2000cws.googlegroups.com...
>On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
>>Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.
That would be pointless. size() is O(1).

/Peter

Actually, size() is preferred to be O(1) by the standard, but it's not
guaranteed. So an implementation could very well choose to make it O(n)
(see 23.1/5: "[size()] should have constant complexity"; it says
"should", not "shall").
That's correct. On the other hand, for associative containers, an
implementation that didn't implement size() as O(1) would be doing so
simply out of perversity. The underlying issue is list::splice, where
you can insert a range of arbitrary size from some other list. The
implementation can either count the elements being inserted, making
splice O(n), or it can make size() O(n). With an associative container,
inserting more than one element is inherently O(n), since each element
has to be put into the right place. So there's no reason to make size()
slow.
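To make that trade-off concrete, here is a hedged sketch (splice_front_of is an invented helper name) of the range overload of list::splice, with the counting walk made explicit:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>

// Sketch of the trade-off: splicing a subrange out of another list
// relinks nodes in O(1), but the number of nodes moved is not known to
// either list unless somebody walks the range and counts -- which is
// what std::advance pays for below. An implementation must either do
// that walk inside splice (keeping size() O(1)) or skip it and make
// size() count nodes on demand.
std::pair<std::size_t, std::size_t>
splice_front_of(std::list<int>& dst, std::list<int>& src, std::size_t count) {
    std::list<int>::iterator first = src.begin();
    std::list<int>::iterator last = first;
    std::advance(last, count);                 // the O(count) walk
    dst.splice(dst.end(), src, first, last);   // node relinking itself is O(1)
    return std::make_pair(dst.size(), src.size());
}
```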

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Feb 1 '07 #6
In article <ep**********@south.jnrs.ja.net>,
"Lionel B" <me@privacy.net> wrote:
On Thu, 01 Feb 2007 08:13:53 -0500, Kai-Uwe Bux wrote:
peter koch wrote:
On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).
Nit: the standard says "should" have constant time. Implementations may take
license to do worse. I know that some do this for std::list<> as part of
some trade-off with other operations.

I was aware of that, hence my reluctance to use size() for std::set.
However, this reason would not apply to std::set<> as far as I can see.

Ok, I guess the only option is to try it and see...
Lionel,

I agree that this is your only option. I also believe this represents a
serious defect in the C++ standard. I am asking your permission to use
this thread as the basis for a defect report against the current working
draft. Please see:

http://home.twcny.rr.com/hinnant/cpp...iew/lwg-active.html#632

for an unofficial preview. Please feel free to contact me, publicly or
privately. Your options are:

1. Don't contact me at all. I will interpret this as refusing
permission to use your words and name in the defect report, and I will
remove them prior to submitting this document to the committee.

2. Tell me to remove your name/words from the issue and I will do so at
once.

3. Give me permission, but rewrite some or all of the text of the
issue. What is said here is completely up to you. You may propose a
resolution or not. Note that the committee may not accept your proposal
though.

4. Give me permission to go with what I have unchanged.

(#3 is my actual preference :-))

Respectfully,
Howard Hinnant
Library Working Group Chair
C++ Standards Committee
(contact info in the header of this post, and at the top of the linked
webpage is correct)
Feb 1 '07 #7
On 1 Feb., 13:43, "peter koch" <peter.koch.lar...@gmail.com> wrote:
On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
Hi,
Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?
I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

/Peter
I prefer to answer my own post rather than reply to all the other
knowledgeable people who have corrected me. As I see it, "should" is a
very strong suggestion that requires quite a lot to disregard. For
std::list this reason exists in splice, which in special situations can
be made significantly faster. This "excuse" does not exist for std::set
(and std::map), which in my opinion means that you will not be able to
find a quality standard library where size() is not O(1).
I agree that this might be considered a defect as suggested by Howard
Hinnant.

/Peter
Feb 1 '07 #8
"peter koch" <pe***************@gmail.com> wrote in message
news:11*********************@p10g2000cwp.googlegroups.com...
On 1 Feb., 13:43, "peter koch" <peter.koch.lar...@gmail.com> wrote:
>On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
Hi,
Anyone know if the Standard has anything to say about the time
complexity
of size() for std::set?
I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

/Peter

I prefer to answer my own post rather than reply to all the other
knowledgeable people who have corrected me. So far as I see it,
"should" is a very strong suggestion that requires quite a lot to
disregard. For std::list this reason exists in splice, which in special
situations can be made significantly faster. This "excuse" does not
exist for std::set (and std::map), which in my opinion means that you
will not be able to find a quality standard library where size is not
O(1).
I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain. Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.

I remember being told years ago that the purpose of OOP was to give
the member functions exclusive access to the stored state so that
nothing bad can ever happen to that state. If we're not going to
preserve that property, why bother with encapsulation?
I agree that this might be considered a defect as suggested by Howard
Hinnant.
Yup.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 1 '07 #9
P.J. Plauger wrote:
"peter koch" <pe***************@gmail.com> wrote in message
news:11*********************@p10g2000cwp.googlegroups.com...
>On 1 Feb., 13:43, "peter koch" <peter.koch.lar...@gmail.com> wrote:
>>On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:

Hi,

Anyone know if the Standard has anything to say about the time
complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

/Peter

I prefer to answer my own post rather than reply to all the other
knowledgeable people who have corrected me. So far as I see it,
"should" is a very strong suggestion that requires quite a lot to
disregard. For std::list this reason exists in splice, which in special
situations can be made significantly faster. This "excuse" does not
exist for std::set (and std::map), which in my opinion means that you
will not be able to find a quality standard library where size is not
O(1).

I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.
Huh? That was a little too condensed for me; I got lost at the "whose head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?
Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.

I remember being told years ago that the purpose of OOP was to give
the member functions exclusive access to the stored state so that
nothing bad can ever happen to that state. If we're not going to
preserve that property, why bother with encapsulation?
[snip]
Best

Kai-Uwe Bux
Feb 1 '07 #10
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
P.J. Plauger wrote:
.....
>I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?
std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.
>Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 2 '07 #11
On Thu, 01 Feb 2007 11:53:31 -0500, Howard Hinnant wrote:
In article <ep**********@south.jnrs.ja.net>,
>"Lionel B" <me@privacy.net> wrote:
>On Thu, 01 Feb 2007 08:13:53 -0500, Kai-Uwe Bux wrote:
peter koch wrote:

On 1 Feb., 12:46, "Lionel B" <m...@privacy.net> wrote:
Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

Nit: the standard says "should" have constant time. Implementations may take
license to do worse. I know that some do this for std::list<> as part of
some trade-off with other operations.

I was aware of that, hence my reluctance to use size() for std::set.
However, this reason would not apply to std::set<> as far as I can see.

Ok, I guess the only option is to try it and see...

Lionel,

I agree that this is your only option. I also believe this represents a
serious defect in the C++ standard. I am asking your permission to use
this thread as the basis for a defect report against the current working
draft. Please see:

http://home.twcny.rr.com/hinnant/cpp...iew/lwg-active.html#632

for an unofficial preview. Please feel free to contact me, publicly or
privately. Your options are:
As far as I am concerned this conversation is already in the public domain
and you may do with it as you please - short, obviously, of misrepresenting
my views.

In particular, note that I wasn't actually criticising the standard, merely
seeking clarification. I have followed the debate on time complexity of
container size() functions - in this and previous threads - with interest,
but really don't have much to contribute to it.

If I have any recommendation to the C++ Standards Committee it is that
implementations must (not "should"!) document clearly[1], where known, the
time complexity of *all* container access operations.

[1] In my case (gcc 4.1.1) I can't swear that the time complexity of size()
for std::set is not documented... but if it is it's certainly well hidden
away.

[...]

Best regards,

--
Lionel B
Feb 2 '07 #12
P.J. Plauger wrote:
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
>P.J. Plauger wrote:
.....
>>I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose
head follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.
Thanks a lot. I see the problem now.

However, following your suggestion, I thought of "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
hit my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if it
were implemented to be linear time. I would even contend that for some
applications a list is the container of choice precisely _because_ you save
time not checking for this problem.
>>Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.
I am not yet convinced that this would be better. Maybe sorting is the only
reasonable application for this splice, and that is taken care of by a
member function. However, it seems that making splice O(n) and size O(1)
would reduce the advantages of std::list<> over the other sequence
containers to just the validity of iterators.

I think I would prefer to have an implementation that puts range checks into
debug versions and leaves them out with -DNDEBUG. This way, I can check my
programs for correctness and still get optimum performance once I have
convinced myself that the code is sound.
Best

Kai-Uwe Bux
Feb 2 '07 #13
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
P.J. Plauger wrote:
>"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
>>P.J. Plauger wrote:
.....
I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus
snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose
head follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

Thanks a lot. I see the problem now.

However, following your suggestion, I thought of "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
hit my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if
it were implemented to be linear time. I would even contend that for some
applications a list is the container of choice precisely _because_ you save
time not checking for this problem.
Gee, my version of list merge-sort splices the *entire* contents
of c2 into c1 each time. And that overload of splice does *not*
have to do the check I describe. You can splice:

-- all of another list into this one

-- one element of this list somewhere else

-- one element of another list into this one

-- a subrange of this list somewhere else

-- a subrange of another list into this one

It's only the last case, where the subrange is neither empty
nor the full list, that you must walk it to count up elements
and check for a head node.
>>>Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.

I am not yet convinced that this would be better. Maybe sorting is the only
reasonable application for this splice, and that is taken care of by a
member function. However, it seems that making splice O(n) and size O(1)
would reduce the advantages of std::list<> over the other sequence
containers to just the validity of iterators.
I'm not talking about *always* making splice O(N) -- just this
one case. (And the C++ Standard allows that case already,
oddly enough.) Most cases are and should remain O(1).
I think I would prefer to have an implementation that puts range checks into
debug versions and leaves them out with -DNDEBUG. This way, I can check my
programs for correctness and still get optimum performance once I have
convinced myself that the code is sound.
If only the failure mode weren't so awful.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 2 '07 #14
In article <ep**********@murdoch.acc.Virginia.EDU>,
Kai-Uwe Bux <jk********@gmx.net> wrote:
However, following your suggestion, I thought of "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
hit my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if it
were implemented to be linear time.
<nod> I've implemented list::sort using merge-sort and "splice some from
other" as well. And I also preferred an O(1) list::size. And I also
knew that I could not afford an O(N) splice-some-from-other in the
middle of my merge sort.

I solved it by implementing a new splice-some-from-other signature that
is guaranteed to be O(1) even when size() is also O(1):

void splice(iterator position, list& x, iterator first,
iterator last, size_type n);

Precondition: distance(first, last) == n
Complexity: O(1)

In the merge sort, if you know the size of the list (in O(1) time), then
you also know the size of the half of the list that you're wanting to
splice into an auxiliary list. So you can pass that information (n)
into the splice function and it then does not need to compute it. A
debug build can easily verify the precondition.
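The split step of such a merge sort might look like this sketch (split_front_half is an invented name); the point is that the element count of the spliced half is known up front, so it is exactly the n one would hand to the proposed five-argument splice:

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Split step of a list merge sort: lst.size() is known (in O(1) if
// size() is constant time), so the size of the half being spliced off
// is known a priori. With the proposed splice(pos, x, first, last, n)
// overload, 'half' would be passed as n and neither splice nor size()
// would ever need to count nodes.
std::list<int> split_front_half(std::list<int>& lst) {
    std::size_t half = lst.size() / 2;         // count known without walking
    std::list<int>::iterator mid = lst.begin();
    std::advance(mid, half);                   // (in a real merge sort this
                                               // iterator is already at hand)
    std::list<int> front;
    front.splice(front.begin(), lst, lst.begin(), mid);
    return front;
}
```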

After coding merge sort this way I started looking around for other
use-cases of splice-some-from-other where distance(first, last) is *not*
known a-priori, or easily computed without changing the complexity of
the overall algorithm. I didn't find any. I'm not claiming such
applications don't exist. But I am claiming that they appear rare.

During this search I also looked for use-cases which assumed an O(1)
list::size. I found lots.

http://home.twcny.rr.com/hinnant/cpp...list_size.html

-Howard
Feb 2 '07 #15
On Thu, 1 Feb 2007 18:59:08 -0500, P.J. Plauger <pj*@dinkumware.com> wrote:
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
>P.J. Plauger wrote:
.....
>>I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.
Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
Feb 2 '07 #16
"Jorgen Grahn" <gr********@snipabacken.dyndns.org> wrote in message
news:sl***********************@frailea.sa.invalid...
On Thu, 1 Feb 2007 18:59:08 -0500, P.J. Plauger <pj*@dinkumware.com>
wrote:
>"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...
>>P.J. Plauger wrote:
.....
I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus
snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose
head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?
Where did that come from? I said nothing like that.
What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?
Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.

You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but not
all. And even if an implementation wanted to do this sort of checking all
the time, it can't. Time complexity requirements in the C++ Standard often
forbid it. So the C++ Standard partly institutionalizes the wild 'n wooly
approach to programming.

Template class list has just one member function, the overload of splice
that moves in part of another list, that *requires* linear time to do its
work safely. Some of us have seen to it that the C++ Standard *permits*
this member function to take that time if the implementer so chooses. It
so happens that this is the only member function across all containers
that *can* go faster by taking advantage of another latitude in the C++
Standard: to have list::size() take linear time instead of constant. Put
simply, slow size() is there to permit fast partial splice (which is
still not guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 3 '07 #17
* P.J. Plauger:
>
Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.
Ah, well. Essentially there are then, in practice, two kinds of
implementations of std::list, namely fast size, and fast (unsafe) splice.

But with a given standard library implementation there's just one.

Wouldn't it be better to have both, accessible to the user, who could
then make the decision to use one or the other or both, with an
implementation-defined typedef of std::list for default? Was this
considered? And if it was, considering that the two could share most of
their implementations, why was it rejected?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 3 '07 #18
* Alf P. Steinbach:
* P.J. Plauger:
>>
Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

Ah, well. Essentially there are then, in practice, two kinds of
implementations of std::list, namely fast size, and fast (unsafe) splice.

But with a given standard library implementation there's just one.

Wouldn't it be better to have both, accessible to the user, who could
then make the decision to use one or the other or both, with an
implementation-defined typedef of std::list for default?
Ouch, I wrote that forgetting that there was no template typedef at the
time (or still, for that matter), but one could have used e.g.

template< typename A >
struct list { typedef fast_size_list<A> type; };

(with all the template parameters of std::list of course), or, for the
same convenience as we have now with std::list, exact same usage,

template< typename A >
struct list : fast_size_list<A> { ... };

Was this
considered? And if it was, considering that the two could share most of
their implementations, why was it rejected?
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 3 '07 #19
P.J. Plauger wrote:
"Jorgen Grahn" <gr********@snipabacken.dyndns.org> wrote in message
news:sl***********************@frailea.sa.invalid...
>On Thu, 1 Feb 2007 18:59:08 -0500, P.J. Plauger <pj*@dinkumware.com>
wrote:
>>"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:ep**********@murdoch.acc.Virginia.EDU...

P.J. Plauger wrote:
.....
I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus
snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose
head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

Where did that come from? I said nothing like that.
>What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?

Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.
The second philosophy may have been documented in design papers for C++; if
so, however, it does not show in the standard. The standard defines C++ as
a set of mechanisms that one can use to implement all sorts of mischief.
C++ does in no way enforce or even promote OOP; designing an OO-approved
class with guarded invariants is not even simpler in C++ than designing an
OO-nightmare.

To propose an amendment of the standard based on a philosophical inclination
that so far has not shown in the standard seems a little bit of a stretch.

You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but not
all. And even if an implementation wanted to do this sort of checking all
the time, it can't. Time complexity requirements in the C++ Standard often
forbid it.
Often? With regard to range checking I wonder: if every iterator carries a
pointer to its container one can check the validity of a range in linear
time (and constant time for random access iterators). Any algorithm that
takes a range allows for linear time on non-random access iterators anyway.

Could you give an example where a checking implementation is ruled out by
the complexity requirements of the standard?

So the C++ Standard partly institutionalizes the wild 'n wooly
approach to programming.
True! Now, opinions may vary as to whether that is a good thing or not.

By and large, I get by with putting a safety layer between my code and the
language: e.g., a pointer_to<T> template that wraps raw pointers and allows
for checking 0-dereferencing and tracing allocations and deallocations, or
a safe<int> type that checks for arithmetic overflows. I am not sure
whether I want the standard to change in this regard. After all, if you
want safety, you can have it. It just comes at a price.

Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP
That would be the only place in the standard then. Maybe you are onto
something in setting a precedent.

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.
I am by and large happy with the undefined behavior solution that the
current standard employs, although I would love to have a debug-version of
the standard library that defines all those unsound ranges and out of
bounds accesses to violate some assert().

What bothers me though, is the latitude within the complexity requirements.
I'd rather have the standard declare unambiguously that size() is constant
time. That would at least be predictable. (Similarly, I think the standard
could require sort() to be O(n log(n)) and nth_element() to be linear.)
Best

Kai-Uwe Bux
Feb 3 '07 #20
"Alf P. Steinbach" <al***@start.no> wrote in message
news:52*************@mid.individual.net...
>* P.J. Plauger:
>>
Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

Ah, well. Essentially there are then, in practice, two kinds of
implementations of std::list, namely fast size, and fast (unsafe) splice.

But with a given standard library implementation there's just one.
Not necessarily. Some might choose the safer form only when "iterator
debugging" is enabled, which we often do.
Wouldn't it be better to have both, accessible to the user, who could then
make the decision to use one or the other or both, with an
implementation-defined typedef of std::list for default? Was this
considered? And if it was, considering that the two could share most of
their implementations, why was it rejected?
No, this hasn't been discussed. There are many ways to push down
one bedspring, only to have another pop up in its place. I have
little sympathy for enlarging an already large interface to give
programmers their choice of low and/or high bedsprings.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 3 '07 #21
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:eq**********@murdoch.acc.Virginia.EDU...
P.J. Plauger wrote:
.....
>>What is different here compared to std::copy and all other functions
that
take two iterators and expect [first, last) to be valid? Am I missing
something?

Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.

The second philosophy may have been documented in design papers for C++;
if
so, however, it does not show in the standard.
Nonsense. Look at all the required "protected" members, and all the
internals documented as "exposition only" (so you can't reliably
muck with them). This is the very stuff of OOP.
The standard defines C++ as
a set of mechanisms that one can use to implement all sorts of mischief.
Or not.
C++ does in no way enforce or even promote OOP; designing an OO-approved
class with guarded invariants is not even simpler in C++ than designing an
OO-nightmare.
You're overstating the case to the point of incorrectness.
To propose an amendment of the standard based on a philosophical
inclination
that so far has not shown in the standard seems a little bit of a stretch.
Huh? All I was doing was supporting an oft-expressed desire to
guarantee that container size() be O(1). My observation is that
there are even hygienic reasons for disallowing the one bit of
latitude that requires size() be O(N). I'm personally quite
content with the state of the C++ Standard (in this regard).

Having said that, I still dispute your claim that there is no
flavor of OOP in Standard C++. And I can report that *many*
proposed changes to the C++ Standard have philosophical
rationale.
>You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but not
all. And even if an implementation wanted to do this sort of checking all
the time, it can't. Time complexity requirements in the C++ Standard
often
forbid it.

Often? With regard to range checking I wonder: if every iterator carries a
pointer to its container one can check the validity of a range in linear
time (and constant time for random access iterators). Any algorithm that
takes a range allows for linear time on non-random access iterators
anyway.

Could you give an example where a checking implementation is ruled out by
the complexity requirements of the standard?
Many of the complexity requirements are expressed as specific
maxima -- you just can't do more comparisons, or assignments,
or whatever. In fact, complexity requirements that merely
dictate big-O time complexity are essentially untestable and
de facto toothless, except as a QOI issue.
>So the C++ Standard partly institutionalizes the wild 'n wooly
approach to programming.

True! Now, opinions may vary as to whether that is a good thing or not.
It's one thing to permit it; it's quite another to *mandate* it.
That was my point.
By and large, I get by with putting a safety layer between my code and the
language: e.g., a pointer_to<T> template that wraps raw pointers and
allows
for checking 0-dereferencing and tracing allocations and deallocations, or
a safe<int> type that checks for arithmetic overflows. I am not sure
whether I want the standard to change in this regard. After all, if you
want safety, you can have it. It just comes at a price.
Agreed. The issue is whether you can simultaneously pay the price
and conform. There are compelling reasons to permit both at once,
wherever possible.
>Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

That would be the only place in the standard then. Maybe you are onto
something in setting a precedent.
Indeed this *is* a rare circumstance, which is why I've pointed
it out repeatedly.
>What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

I am by and large happy with the undefined behavior solution that the
current standard employs, although I would love to have a debug-version of
the standard library that defines all those unsound ranges and out of
bounds accesses to violate some assert().
I know where you can license one...
What bothers me though, is the latitude within the complexity
requirements.
I'd rather have the standard declare unambiguously that size() is constant
time. That would at least be predictable. (Similarly, I think the standard
could require sort() to be O(n log(n)) and nth_element() to be linear.)
I agree that big-O time complexity should favor the programmer,
wherever practically possible.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 3 '07 #22
P.J. Plauger wrote:
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:eq**********@murdoch.acc.Virginia.EDU...
>P.J. Plauger wrote:
.....
>>>What is different here compared to std::copy and all other functions
that
take two iterators and expect [first, last) to be valid? Am I missing
something?

Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.

The second philosophy may have been documented in design papers for C++;
if
so, however, it does not show in the standard.

Nonsense. Look at all the required "protected" members, and all the
internals documented as "exposition only" (so you can't reliably
muck with them). This is the very stuff of OOP.
> The standard defines C++ as
a set of mechanisms that one can use to implement all sorts of mischief.

Or not.
>C++ does in no way enforce or even promote OOP; designing an OO-approved
class with guarded invariants is not even simpler in C++ than designing
an OO-nightmare.

You're overstating the case to the point of incorrectness.
True :-) I should have marked this paragraph as a rant.

However, the very last sentence (that writing an OO-nightmare class in C++
is not any more difficult than writing sound and proper OO-classes) is, I
think, true. We see those nightmares posted here regularly. It must be darn
easy to write that kind of code. (And I think, I am falling into that trap
every now and then.)

>To propose an amendment of the standard based on a philosophical
inclination
that so far has not shown in the standard seems a little bit of a
stretch.

Huh? All I was doing was supporting an oft-expressed desire to
guarantee that container size() be O(1). My observation is that
there are even hygienic reasons for disallowing the one bit of
latitude that requires size() be O(N).
Point taken.
I'm personally quite
content with the state of the C++ Standard (in this regard).
Generally, I would favor somewhat stronger guarantees. Although here it
is a minor issue and, unlike other cases, this one involves a real
trade-off.
Having said that, I still dispute your claim that there is no
flavor of OOP in Standard C++. And I can report that *many*
proposed changes to the C++ Standard have philosophical
rationale.
You are right: I overstated my case. There do exist instances of OOP in the
standard.
>>You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but
not all. And even if an implementation wanted to do this sort of
checking all the time, it can't. Time complexity requirements in the C++
Standard often
forbid it.

Often? With regard to range checking I wonder: if every iterator carries
a pointer to its container one can check the validity of a range in
linear time (and constant time for random access iterators). Any
algorithm that takes a range allows for linear time on non-random access
iterators anyway.

Could you give an example where a checking implementation is ruled out by
the complexity requirements of the standard?

Many of the complexity requirements are expressed as specific
maxima -- you just can't do more comparisons, or assignments,
or whatever. In fact, complexity requirements that merely
dictate big-O time complexity are essentially untestable and
de facto toothless, except as a QOI issue.
True, but I still don't see how those sharp requirements (on the number of
assignments or comparisons) would interfere with range checking. I must be
missing something here. Please let me reiterate my request for a concrete
example. (Sorry for being a pain, but I find this particular point rather
interesting; and I have already learned quite a bit from the code sample
you provided upthread).
[undisputed items snipped]
Thanks

Kai-Uwe Bux
Feb 3 '07 #23
In article <e7******************************@giganews.com>,
pj*@dinkumware.com says...

[ ... ]
What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.
Do you know if anybody has worked on new language for C++ 0x to require
no worse than linear complexity for list, and constant complexity for
all other containers?

--
Later,
Jerry.

The universe is a figment of its own imagination.
Feb 3 '07 #24
On Sat, 3 Feb 2007 06:39:03 -0500, P.J. Plauger <pj*@dinkumware.com> wrote:
"Jorgen Grahn" <gr********@snipabacken.dyndns.org> wrote in message
news:sl***********************@frailea.sa.invalid...
>On Thu, 1 Feb 2007 18:59:08 -0500, P.J. Plauger <pj*@dinkumware.com>
wrote:
[std::list::splice()]
>>But think of all the time you've saved not checking for this
problem.

Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

Where did that come from? I said nothing like that.
It's the way I understand the Standard Library. Things like std::copy offer
speed and flexibility, and in return I take care when choosing which
iterators to feed into it.

So it just made sense to me that list::splice() chooses speed over safety,
too. After all, I can keep count of a list's size myself (as long as I
don't splice it), but I cannot write a splice() which works at O(1).

[snip some informative text]
I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.
Yes, that is the difference I don't see. But my understanding and experience
are limited. (And I doubt I'll ever have a use for splicing lists that way.)

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
Feb 3 '07 #25
"Kai-Uwe Bux" <jk********@gmx.net> wrote in message
news:eq**********@murdoch.acc.Virginia.EDU...
...
>>Could you give an example where a checking implementation is ruled out
by
the complexity requirements of the standard?

Many of the complexity requirements are expressed as specific
maxima -- you just can't do more comparisons, or assignments,
or whatever. In fact, complexity requirements that merely
dictate big-O time complexity are essentially untestable and
de facto toothless, except as a QOI issue.

True, but I still don't see how those sharp requirements (on the number of
assignments or comparisons) would interfere with range checking. I must be
missing something here. Please let me reiterate my request for a concrete
example. (Sorry for being a pain, but I find this particular point rather
interesting; and I have already learned quite a bit from the code sample
you provided upthread).
Our library fails a dozen-odd complexity tests in our Proofer when we
run it in iterator-debugging mode. I'm late for a big delivery right
now, so I don't have time to run the tests and identify the culprits.
But here's a quick general example. It's not about range checking, but
it is about violating complexity requirements to do good checking:

We check predicates that are required to enforce strict weak ordering
by sometimes evaluating them twice; if a < b is true we verify that
b < a is not also true. This sometimes violates the limit on the number
of compares, but it has proved repeatedly to be a godsend in detecting
program flaws before access violations obscure the cause.

Best I can do right now.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 3 '07 #26
"Jerry Coffin" <jc*****@taeus.com> wrote in message
news:MP************************@news.sunsite.dk...
In article <e7******************************@giganews.com>,
pj*@dinkumware.com says...

[ ... ]
>What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

Do you know if anybody has worked on new language for C++ 0x to require
no worse than linear complexity for list, and constant complexity for
all other containers?
I believe there's a DR about the complexity of size() in general.
Don't remember the details.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 3 '07 #27
