Bytes | Developer Community

list.clear() missing?!?

I tried to clear a list today (which I do rather rarely, considering
that just doing l = [] works most of the time) and was shocked, SHOCKED
to notice that there is no clear() method. Dicts have it, sets have it,
why do lists have to be second class citizens?

Apr 11 '06 #1
Ville Vainio wrote:
I tried to clear a list today (which I do rather rarely, considering
that just doing l = [] works most of the time) and was shocked, SHOCKED
to notice that there is no clear() method. Dicts have it, sets have it,
why do lists have to be second class citizens?


because Python already has a perfectly valid way to clear a list,
perhaps ?

del l[:]

(lists are not mappings, so the duck typing argument doesn't really
apply here.)

</F>

Apr 11 '06 #2
Fredrik Lundh wrote:
I tried to clear a list today (which I do rather rarely, considering
that just doing l = [] works most of the time) and was shocked, SHOCKED
to notice that there is no clear() method. Dicts have it, sets have it,
why do lists have to be second class citizens?
because Python already has a perfectly valid way to clear a list,
perhaps ?

del l[:]


Ok. That's pretty non-obvious but now that I've seen it I'll probably
remember it. I did a stupid "while l: l.pop()" loop myself.
(lists are not mappings, so the duck typing argument doesn't really
apply here.)


I was thinking of list as a "mutable collection", and clear() is
certainly a very natural operation for them.

Apr 11 '06 #3
Ville Vainio wrote:
I tried to clear a list today (which I do rather rarely, considering
that just doing l = [] works most of the time) and was shocked, SHOCKED
to notice that there is no clear() method. Dicts have it, sets have it,
why do lists have to be second class citizens?


This gets brought up all the time (search the archives for your
favorite), but your options are basically (renaming your list to lst for
readability) one of:

del lst[:]

lst[:] = []

or if you don't need to modify the list in place,

lst = []

Personally, I tend to go Fredrik's route and use the first.
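A quick sketch of the behavioural difference (modern Python 3 syntax; `alias` is just an illustrative second name for the same object):

```python
lst = [1, 2, 3]
alias = lst          # second reference to the same list object

del lst[:]           # option 1: clears in place, so alias sees it
assert alias == [] and lst is alias

lst[:] = [1, 2, 3]   # option 2: slice assignment, also in place
assert alias == [1, 2, 3]

lst = []             # option 3: rebinds the name; alias keeps the old contents
assert alias == [1, 2, 3] and lst is not alias
```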

If you feel really strongly about this though, you might consider
writing up a PEP. It's been contentious enough that there's not much
chance of getting a change without one.

STeVe
Apr 11 '06 #4
On Tue, 2006-04-11 at 10:42 -0600, Steven Bethard wrote:
one of::

del lst[:]

lst[:] = []

or if you don't need to modify the list in place,

lst = []

Personally, I tend to go Fredrik's route and use the first.


I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?

Cheers,

--
Felipe.

Apr 11 '06 #5
Steven Bethard wrote:
If you feel really strongly about this though, you might consider
writing up a PEP. It's been contentious enough that there's not much
chance of getting a change without one.


No strong feelings here, and I'm sure greater minds than me have
already hashed this over sufficiently.

It's just that, when I have an object, and am wondering how I can clear
it, I tend to look what methods it has first and go to google looking
for "idioms" second.

Perhaps a "clear" method could be added that raises
PedagogicException("Use del lst[:], stupid!")?

*ducks*

Apr 11 '06 #6
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?


In the first benchmark, you need space for two lists: the old one and
the new one; in the other benchmarks you need only a single block of
memory (*). Concluding from here gets difficult - you would have to study
the malloc implementation to find out whether it works better in one
case over the other. Could also be an issue of processor cache: one
may fit into the cache, but the other may not.

Regards,
Martin

(*) plus, you also need the integer objects twice.
Apr 11 '06 #7
Ville Vainio wrote:
It's just that, when I have an object, and am wondering how I can clear
it, I tend to look what methods it has first and go to google looking
for "idioms" second.


I guess del on a list is not that common, so people tend to not know
that it works on lists (and slices!), too. It's too bad that lists have
a pop() method these days, so people can do x.pop() even if they don't
need the value, instead of doing del x[-1]. I don't think I ever needed
to del a slice except for clearing the entire list (and I don't need to
do that often, either - I just throw the list away).
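For completeness, the full range of what del covers on a list (a small sketch in modern Python):

```python
x = list(range(10))

del x[-1]       # drop the last item (like x.pop() when the value isn't needed)
assert x == [0, 1, 2, 3, 4, 5, 6, 7, 8]

del x[2:5]      # delete a slice from the middle
assert x == [0, 1, 5, 6, 7, 8]

del x[:]        # clear the whole list in place
assert x == []
```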

Regards,
Martin
Apr 11 '06 #8
Steven Bethard wrote:

lst[:] = []
lst = []


What's the difference here?
Apr 11 '06 #9
On Tue, 2006-04-11 at 17:56 +0000, John Salerno wrote:
Steven Bethard wrote:

lst[:] = []
lst = []


What's the difference here?


lst[:] = [] makes the specified slice become []. As we specified ":", it
transforms the entire list into [].

lst = [] binds the name lst to a new empty list, discarding whatever it
referred to before.

This might help:

>>> lst = range(10)
>>> id(lst), lst
(-1210826356, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> lst[:] = []
>>> id(lst), lst
(-1210826356, [])
>>> lst = range(10)
>>> id(lst), lst
(-1210844052, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> lst = []
>>> id(lst), lst
(-1210826420, [])

You see? lst[:] = [] removes all elements from the list that lst refers to,
while lst = [] just creates a new list and discards the old one. The
difference shows up, for example, when the list is shared:

>>> lst = range(3)
>>> x = [lst, lst, lst]
>>> x
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
>>> lst[:] = []
>>> x
[[], [], []]
>>> lst = range(3)
>>> x = [lst, lst, lst]
>>> x
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
>>> lst = []
>>> x

[[0, 1, 2], [0, 1, 2], [0, 1, 2]]

HTH,

--
Felipe.

Apr 11 '06 #10
John Salerno wrote:
Steven Bethard wrote:

lst[:] = []
lst = []


What's the difference here?


L[:] = [] modifies the object in place, L = [] binds the variable to a
new object. compare and contrast:

>>> L = ["a", "b", "c"]
>>> M = L
>>> L
['a', 'b', 'c']
>>> M
['a', 'b', 'c']
>>> L is M
True
>>> L[:] = []
>>> L
[]
>>> M
[]
>>> L is M
True

>>> L = ["a", "b", "c"]
>>> M = L
>>> L
['a', 'b', 'c']
>>> M
['a', 'b', 'c']
>>> L = []
>>> L
[]
>>> M
['a', 'b', 'c']
>>> L is M

False

</F>

Apr 11 '06 #11
John Salerno wrote:
Steven Bethard wrote:

lst[:] = []
lst = []

What's the difference here?

>>> lst = [1,2,3]
>>> lst2 = lst
>>> lst[:] = []
>>> lst2
[]
>>> lst = [1,2,3]
>>> lst2 = lst
>>> lst = []
>>> lst2
[1, 2, 3]


Duncan
Apr 11 '06 #12
Felipe Almeida Lessa wrote:
You see? lst[:] removes all elements from the list that lst refers to,
while lst = [] just creates a new list and discard the only one. The
difference is, for example:


Thanks, your explanation was great!
Apr 11 '06 #13
Fredrik Lundh wrote:
John Salerno wrote:
Steven Bethard wrote:

lst[:] = []
lst = []

What's the difference here?


L[:]= modifies the object in place, L=[] binds the variable to a
new object. compare and contrast:


Thanks guys, your explanations are really helpful. I think what had me
confused at first was my understanding of what L[:] does on either side
of the assignment operator. On the left, it just chooses those elements
and edits them in place; on the right, it makes a copy of that list,
right? (Which I guess is still more or less *doing* the same thing, just
for different purposes)
Apr 11 '06 #14
John Salerno wrote:
Thanks guys, your explanations are really helpful. I think what had me
confused at first was my understanding of what L[:] does on either side
of the assignment operator. On the left, it just chooses those elements
and edits them in place; on the right, it makes a copy of that list,
right? (Which I guess is still more or less *doing* the same thing, just
for different purposes)


Interestingly, if it was just a "clear" method nobody would be confused.

Apr 11 '06 #15
On Tue, 11 Apr 2006 14:49:04 -0700, Ville Vainio wrote:
John Salerno wrote:
Thanks guys, your explanations are really helpful. I think what had me
confused at first was my understanding of what L[:] does on either side
of the assignment operator. On the left, it just chooses those elements
and edits them in place; on the right, it makes a copy of that list,
right? (Which I guess is still more or less *doing* the same thing, just
for different purposes)


Interestingly, if it was just a "clear" method nobody would be confused.


Even more importantly, you could say help(list.clear) and learn something
useful, instead of trying help(del) and getting a syntax error.
>>> help(del)
  File "<stdin>", line 1
    help(del)
            ^
SyntaxError: invalid syntax

I know Python isn't a purely OOP language, but in my opinion using
statements like del should be avoided when there is an easy and natural OO
way of doing it. Something like name.del() which means "delete the
reference to name in name's namespace" feels wrong, and may not even be
possible with Python's object model, so del name is a natural way to do
it.

But name.clear() meaning "mutate the object referenced by name to the
empty state" is a very natural candidate for a method, and I don't
understand why lists shouldn't have it.

For lists, it would be natural to have a hypothetical clear() method
accept an index or slice as an argument, so that these are equivalent:

del L[:] <=> L.clear()
del L[n] <=> L.clear(n)
del L[a:b] <=> L.clear(slice(a, b))
# untested reference implementation:
class ClearableList(list):
    def clear(self, obj=None):
        if obj is None:
            obj = slice(0, len(self))
        if isinstance(obj, int):
            self.__delitem__(obj)
        elif isinstance(obj, slice):
            self.__delslice__(obj.start, obj.stop)
        else:
            raise TypeError
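A self-contained variant of that sketch which runs as-is on a modern Python (the clear-with-an-argument API is, of course, the hypothetical proposal above, not a real list method); plain del dispatches on index vs. slice, so __delslice__ isn't needed:

```python
class ClearableList(list):
    """Hypothetical list with a clear() that accepts an index or a slice."""
    def clear(self, obj=None):
        if obj is None:
            obj = slice(None)        # no argument: clear everything
        if isinstance(obj, (int, slice)):
            del self[obj]            # del handles both index and slice deletion
        else:
            raise TypeError("expected an int or a slice")

L = ClearableList([0, 1, 2, 3, 4])
L.clear(0)              # like del L[0]
assert L == [1, 2, 3, 4]
L.clear(slice(1, 3))    # like del L[1:3]
assert L == [1, 4]
L.clear()               # like del L[:]
assert L == []
```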
--
Steven.

Apr 12 '06 #16
On Tue, 11 Apr 2006 19:15:18 +0200, Martin v. Löwis wrote:
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?


In the first benchmark, you need space for two lists: the old one and
the new one;


Er, what new list? I see only one list, x = range(100000), which is merely
created, and then nothing is done to it. Have I missed something?

I understood Felipe to be asking, why does it take longer to just create a
list, than it takes to create a list AND then do something to it?

--
Steven.

Apr 12 '06 #17
On Wed, 2006-04-12 at 11:36 +1000, Steven D'Aprano wrote:
On Tue, 11 Apr 2006 19:15:18 +0200, Martin v. Löwis wrote:
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?
In the first benchmark, you need space for two lists: the old one and
the new one;


Er, what new list? I see only one list, x = range(100000), which is merely
created then nothing done to it. Have I missed something?


He's talking about the garbage collector.

Talking about the GC, do you want to see something *really* odd?

$ python2.4 -mtimeit -s 'from gc import collect' 'collect(); x = range(100000); '
100 loops, best of 3: 13 msec per loop

$ python2.4 -mtimeit -s 'from gc import collect' 'collect(); x = range(100000); del x[:]'
100 loops, best of 3: 8.19 msec per loop

$ python2.4 -mtimeit -s 'from gc import collect' 'collect(); x = range(100000); x[:] = []'
100 loops, best of 3: 8.16 msec per loop

$ python2.4 -mtimeit -s 'from gc import collect' 'collect(); x = range(100000); del x'
100 loops, best of 3: 8.3 msec per loop
But in this case I got the answer (I think):
- When you explicitly delete the objects, the GC already knows that they
can be collected, so it just throws the objects away.
- When we let the "x" variable continue to survive, the GC has to look
at all 100001 objects to see if they can be collected -- just to see
that they can't.

Also, IIRC "del x" is slower than "x = []" because removing a name from
the namespace is more expensive than just assigning something else to
it. Right?
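Whether that's true is an implementation detail, but the difference behind the question is visible in the bytecode (a sketch for a modern CPython; dis.get_instructions postdates the 2.4 discussed here):

```python
import dis

def drop():
    x = [1, 2, 3]
    del x          # compiles to DELETE_FAST: unbind the local name

def rebind():
    x = [1, 2, 3]
    x = []         # compiles to STORE_FAST: bind the name to a new object

assert any(i.opname == "DELETE_FAST" for i in dis.get_instructions(drop))
assert any(i.opname == "STORE_FAST" for i in dis.get_instructions(rebind))
```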
I understood Felipe to be asking, why does it take longer to just create a
list, than it takes to create a list AND then do something to it?


I see dead people... ;-)

--
Felipe.

Apr 12 '06 #18
Felipe Almeida Lessa <fe**********@gmail.com> writes:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?


I get similar behaviour. No idea why.

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.99 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.49 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.47 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.6 msec per loop

Dan
Apr 12 '06 #19
Martin v. Löwis wrote:
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?
In the first benchmark, you need space for two lists: the old one and
the new one; the other benchmarks you need only a single block of
memory (*).


I don't follow you here :)
Concluding from here gets difficult - you would have to study
the malloc implementation to find out whether it works better in one
case over the other.
That's not the case here. The following program prints the same
addresses whether you comment or uncomment del
---------------------------------
ids = range(10)
for i in xrange(10):
    x = range(100000)
    #del x[:]
    ids[i] = id(x)

print ids
--------------------------------

Could also be an issue of processor cache: one
may fit into the cache, but the other may not.


That's not the reason, since the same time difference happens with
smaller arrays. I think the difference is how items are deallocated in
these two cases.

Calling del invokes list_ass_slice that deallocates from 0 to end
whereas ordinary removal of a list was changed to backward iteration
(the reason is in the comment:
/* Do it backwards, for Christian Tismer.
There's a simple test case where somehow this reduces
thrashing when a *very* large list is created and
immediately deleted. */

Usually iterating from low addresses to higher addresses is better for
the CPU. On my CPU (Pentium M, 1.7 GHz) it's 20% faster:

Here are my results:

C:\py>python -mtimeit "x = range(10000); del x[:]"
1000 loops, best of 3: 213 usec per loop

C:\py>python -mtimeit "x = range(10000); del x"
1000 loops, best of 3: 257 usec per loop

C:\py>python -mtimeit "x = range(10000); "
1000 loops, best of 3: 258 usec per loop

C:\py>python -mtimeit "x = range(1000); del x[:]"
10000 loops, best of 3: 21.4 usec per loop

C:\py>python -mtimeit "x = range(1000); del x"
10000 loops, best of 3: 25.2 usec per loop

C:\py>python -mtimeit "x = range(1000); "
10000 loops, best of 3: 25.6 usec per loop

I don't have a development environment on my computer so I can't test
my thoughts. I could be wrong about the reason.

Apr 12 '06 #20

Martin v. Löwis wrote:
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?


In the first benchmark, you need space for two lists: the old one and
the new one; the other benchmarks you need only a single block of
memory (*). Concluding from here gets difficult - you would have to study
the malloc implementation to find out whether it works better in one
case over the other. Could also be an issue of processor cache: one
may fit into the cache, but the other may not.


Addition to the previous message. Now I follow you :) There are indeed
two arrays, and the cache seems to be the second reason for the slowdown,
but iterating backwards also contributes to it.

Apr 12 '06 #21

Felipe Almeida Lessa wrote:
On Wed, 2006-04-12 at 11:36 +1000, Steven D'Aprano wrote:
On Tue, 11 Apr 2006 19:15:18 +0200, Martin v. Löwis wrote:
Felipe Almeida Lessa wrote:
> I love benchmarks, so as I was testing the options, I saw something very
> strange:
>
> $ python2.4 -mtimeit 'x = range(100000); '
> 100 loops, best of 3: 6.7 msec per loop
> $ python2.4 -mtimeit 'x = range(100000); del x[:]'
> 100 loops, best of 3: 6.35 msec per loop
> $ python2.4 -mtimeit 'x = range(100000); x[:] = []'
> 100 loops, best of 3: 6.36 msec per loop
> $ python2.4 -mtimeit 'x = range(100000); del x'
> 100 loops, best of 3: 6.46 msec per loop
>
> Why the first benchmark is the slowest? I don't get it... could someone
> test this, too?

In the first benchmark, you need space for two lists: the old one and
the new one;


Er, what new list? I see only one list, x = range(100000), which is merely
created then nothing done to it. Have I missed something?


He's talking about the garbage collector.


To be exact, the reason for the two arrays is timeit.py. It doesn't place
the code to time into a separate namespace but injects it into a for loop,
so the actual code timed is:

for _i in _it:
    x = range(100000)

and that makes two arrays with 100,000 items each exist for a short time,
starting from the second iteration.

Apr 12 '06 #22
On Wed, 12 Apr 2006 00:33:29 -0700, Serge Orlov wrote:

Felipe Almeida Lessa wrote:
On Wed, 2006-04-12 at 11:36 +1000, Steven D'Aprano wrote:
> On Tue, 11 Apr 2006 19:15:18 +0200, Martin v. Löwis wrote:
>
> > Felipe Almeida Lessa wrote:
> >> I love benchmarks, so as I was testing the options, I saw something very
> >> strange:
> >>
> >> $ python2.4 -mtimeit 'x = range(100000); '
> >> 100 loops, best of 3: 6.7 msec per loop
> >> $ python2.4 -mtimeit 'x = range(100000); del x[:]'
> >> 100 loops, best of 3: 6.35 msec per loop
> >> $ python2.4 -mtimeit 'x = range(100000); x[:] = []'
> >> 100 loops, best of 3: 6.36 msec per loop
> >> $ python2.4 -mtimeit 'x = range(100000); del x'
> >> 100 loops, best of 3: 6.46 msec per loop
> >>
> >> Why the first benchmark is the slowest? I don't get it... could someone
> >> test this, too?
> >
> > In the first benchmark, you need space for two lists: the old one and
> > the new one;
>
> Er, what new list? I see only one list, x = range(100000), which is merely
> created then nothing done to it. Have I missed something?


He's talking about the garbage collector.


To be exact, the reason for the two arrays is timeit.py. It doesn't place
the code to time into a separate namespace but injects it into a for loop,
so the actual code timed is:

for _i in _it:
    x = range(100000)

and that makes two arrays with 100,000 items each exist for a short time,
starting from the second iteration.


But that is precisely the same for the other timeit tests too.

for _i in _it:
    x = range(100000)
    del x[:]

etc.

The question remains -- why does it take longer to do X than it takes to
do X and then Y?
--
Steven.

Apr 12 '06 #23
Steven D'Aprano wrote:
But that is precisely the same for the other timeit tests too.

for _i in _it:
    x = range(100000)
        Allocate list.
        Allocate ob_item array to hold pointers to 100000 objects
        Allocate 99900 integer objects
        set up list
    del x[:]
        Calls list_clear which:
            decrements references to 99900 integer objects, freeing them
            frees the ob_item array

... next time round ...
    x = range(100000)
        Allocate new list.
        Allocate ob_item array (probably picks up same memory block as last time)
        Allocate 99900 integer objects (probably reusing same memory as last time)
        set up list

etc.

The question remains -- why does it take longer to do X than it takes to
do X and then Y?

for _i in _it:
    x = range(100000)
        Allocate list.
        Allocate ob_item array to hold pointers to 100000 objects
        Allocate 99900 integer objects
        set up list

... next time round ...
    x = range(100000)
        Allocate another list
        allocate a second ob_item array
        allocate another 99900 integer objects
        set up the list
        then delete the original list, decrement and release the original
        integers, free the original ob_item array.

This uses twice as much of everything except the list object itself. The
actual work done is the same, but I guess there are likely to be more cache
misses. There is also the question of whether the memory allocator manages
to reuse the recently freed blocks. With one large block I expect it might
well end up reusing it; with two large blocks being freed alternately it
might not manage to reuse either (but that is just a guess and maybe system
dependent).
Apr 12 '06 #24
Steven D'Aprano wrote:
But name.clear() meaning "mutate the object referenced by name to the
empty state" is a very natural candidate for a method, and I don't
understand why lists shouldn't have it.


Funny this even comes up, because I was just trying to 'clear' a list
the other day. But it sounds like it's been an issue for a while. :)
Apr 12 '06 #25
Steven D'Aprano wrote:
On Tue, 11 Apr 2006 14:49:04 -0700, Ville Vainio wrote:
John Salerno wrote:
Thanks guys, your explanations are really helpful. I think what had me
confused at first was my understanding of what L[:] does on either side
of the assignment operator. On the left, it just chooses those elements
and edits them in place; on the right, it makes a copy of that list,
right? (Which I guess is still more or less *doing* the same thing, just
for different purposes)

Interestingly, if it was just a "clear" method nobody would be confused.


Even more importantly, you could say help(list.clear) and learn something
useful, instead of trying help(del) and getting a syntax error.

[snip more reasons to add list.clear()]

I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?

STeVe
Apr 12 '06 #26
Steven Bethard wrote:
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


What are the usual arguments against adding it?
Apr 12 '06 #27
John Salerno wrote:
Steven Bethard wrote:
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


What are the usual arguments against adding it?


That there should be one obvious way to do it.

Yes, I know that it can be debated whether "del x[:]" is obvious, and
fortunately I'm not the one to decide <wink>.

Georg
Apr 12 '06 #28
Georg Brandl wrote:
John Salerno wrote:
Steven Bethard wrote:
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


What are the usual arguments against adding it?

That there should be one obvious way to do it.

Yes, I know that it can be debated whether "del x[:]" is obvious, and
fortunately I'm not the one to decide <wink>.


It's so "obvious", I couldn't even find it in the tutorial the other day
when this thread came up. It's a wonder I ever learned that one myself,
and I don't know where I saw it. .clear(), on the other hand, would
clearly be easy to add/find in the tutorial... and we'd all save time
answering the question over and over around here (valuable time we would
then be able to using telling people who hadn't, to go and read the
tutorial! :-) ).
-Peter

Apr 12 '06 #29
John Salerno wrote:
Steven Bethard wrote:
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


What are the usual arguments against adding it?


I think one is that it's very rare to need it. Most of the time, just
rebinding the name to a new empty list is easier, and possibly faster.
The main problem with that approach is that in multithreaded code that
can be a source of subtle bugs. On the other hand, most things can be a
source of subtle bugs in multithreaded code, so maybe that's not a good
reason to add clear().
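A sketch of the kind of subtle bug meant here (names are illustrative, not from the thread): when a list is shared between a producer and a consumer, rebinding clears only the local name, while in-place clearing is what the other side actually observes.

```python
shared = [1, 2, 3]

def drain_wrong(buf):
    items = list(buf)
    buf = []           # BUG: rebinds the local name; caller's list is untouched
    return items

def drain_right(buf):
    items = list(buf)
    del buf[:]         # clears the *shared* list in place
    return items

assert drain_wrong(shared) == [1, 2, 3] and shared == [1, 2, 3]  # not drained!
assert drain_right(shared) == [1, 2, 3] and shared == []          # drained
```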

Saving us time answering the question repeatedly (and trying to justify
the current state of affairs) might be justification enough...

-Peter

Apr 12 '06 #30
John Salerno wrote:
Steven Bethard wrote:
I think these are all good reasons for adding a clear method, but
being that it has been so hotly contended in the past, I don't think
it will get added without a PEP. Anyone out there willing to take out
the best examples from this thread and turn it into a PEP?


What are the usual arguments against adding it?


I think most of the other folks have already given you the answers, but
the main ones I remember:

(1) There are already at least two ways of spelling it. We don't need
another.

(2) Clearing a list is a pretty uncommon operation. Usually just
creating a new list is fine (and easier).

Disclaimer: I'm happy with adding list.clear(), so it's quite possible
I'm misrepresenting the argument here. I guess that's just all the more
reason to have a PEP for it: to lay out all the pros and cons.

STeVe
Apr 12 '06 #31
Steven D'Aprano wrote:
$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?

In the first benchmark, you need space for two lists: the old one and
the new one;


Er, what new list? I see only one list, x = range(100000), which is merely
created then nothing done to it. Have I missed something?


See Duncan's explanation. This code is run many times, allocating many
lists. However, only two different lists exist at any point in time.

A Python list consists of two memory blocks: the list proper (a few
bytes), plus the "guts", i.e. a variable-sized block of pointers to
the objects. del x[:] frees the guts.
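On a newer CPython this is directly observable: sys.getsizeof (which postdates 2.4, and whose numbers are CPython-specific) reports the list header plus its pointer block, not the items.

```python
import sys

x = list(range(100000))
ident = id(x)
before = sys.getsizeof(x)   # header + large ob_item pointer block

del x[:]                    # frees the pointer block, keeps the object

assert id(x) == ident               # same list object survives
assert sys.getsizeof(x) < before    # but its "guts" were released
```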
I understood Felipe to be asking, why does it take longer to just create a
list, than it takes to create a list AND then do something to it?


Actually, the same code (deallocation of all integers and the list
blocks) appears in either case. However, in the one case it is triggered
explicitly, and before the new allocation; in the other case, the
allocation of the new objects occurs before the old ones are released.

Regards,
Martin
Apr 12 '06 #32
Ville Vainio wrote:
Fredrik Lundh wrote:
because Python already has a perfectly valid way to clear a list,
perhaps ?

del l[:]

Ok. That's pretty non-obvious but now that I've seen it I'll probably
remember it. I did a stupid "while l: l.pop()" loop myself.


Actually, it's in the Library Reference (that we keep under
our pillows) section 2.3.6.4 Mutable Sequence Types.

Cheers, Mel.
Apr 12 '06 #33
[Steven Bethard]
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


Something this small doesn't need a PEP. I'll just send a note to
Guido asking for a pronouncement.

Here's a draft list of pros and cons (any changes or suggestions are
welcome):

Pros:
-----

* s.clear() is more obvious in intent

* easier to figure-out, look-up, and remember than either s[:]=[] or
del s[:]

* parallels the api for dicts, sets, and deques (increasing the
expectation that lists will too)

* the existing alternatives are a bit perlish

* the OP is shocked, SHOCKED that python got by for 16 years without
list.clear()
Cons:
-----

* makes the api fatter (there are already two ways to do it)

* expanding the api makes it more difficult to write classes that can
be polymorphically substituted for lists

* learning slices is basic to the language (this lesson shouldn't be
skipped)

* while there are valid use cases for re-using lists, the technique is
already overused for unsuccessful attempts to micro-optimize (creating
new lists is surprisingly fast)

* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)

Apr 12 '06 #34
On Wed, 2006-04-12 at 12:40 -0700, Raymond Hettinger wrote:
* the existing alternatives are a bit perlish


I love this argument =D! "perlish"... lol...

Cheers,

--
Felipe.

Apr 12 '06 #35
[Felipe Almeida Lessa]
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?

[Dan Christensen] I get similar behaviour. No idea why.


It is an effect of the memory allocator and fragmentation. The first
builds up a list with increasingly larger sizes. It periodically
cannot grow in-place because something is in the way (some other
object) so it needs to move its current entries to another, larger
block and grow from there. In contrast, the other variants are reusing
the previously cleared-out large block.

Just for grins, replace the first with:
'x=None; x=range(100000)'
The assignment to None frees the reference to the previous list and
allows it to be cleared so that its space is immediately available to
the new list being formed by range().

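That explanation can be checked directly with the timeit module. A rough sketch (note that Python 3's range is lazy, so list() is needed to reproduce the list-building cost of Python 2's range; absolute numbers will vary by machine):

```python
import timeit

# Build a fresh 100k-element list each pass, with and without freeing
# the previous one before building the next.
plain = timeit.timeit('x = list(range(100000))', number=100)
freed = timeit.timeit('x = None; x = list(range(100000))', number=100)

# On the machines in this thread, the "freed" variant came out slightly
# faster because the old block is available for reuse before the new
# list is grown.
print('plain: %.4fs  freed: %.4fs' % (plain, freed))
```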
Apr 12 '06 #36
Raymond Hettinger wrote:
Cons:
-----
* learning slices is basic to the language (this lesson shouldn't be
skipped)
And yet it doesn't appear to be in the tutorial. I could have missed
it, but I've looked in a number of the obvious places, without actually
going through it (again) from start to finish. Also, googling for
"slice site:docs.python.org", you have to go to the *sixth* entry before
you can find the first mention of "del x[:]" and what it does. I think
given the current docs it's possible to learn all kinds of things about
slicing and still not make the non-intuitive leap that "del x[slice]" is
actually how you spell "delete contents of list in-place".
* while there are valid use cases for re-using lists, the technique is
already overused for unsuccessful attempts to micro-optimize (creating
new lists is surprisingly fast)
Not just valid use-cases, but ones for which "x = []" is entirely buggy,
yet not obviously so, especially to newcomers.
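A sketch of that bug, using a hypothetical buffer-draining helper (the function names are made up for illustration):

```python
def drain(buf):
    """Return the current items and try to reset the caller's buffer."""
    items = list(buf)
    buf = []          # BUG: rebinds the local name only
    return items

shared = [1, 2, 3]
drain(shared)
assert shared == [1, 2, 3]   # caller's list is untouched

def drain_fixed(buf):
    items = list(buf)
    del buf[:]        # mutates the shared list in place
    return items

assert drain_fixed(shared) == [1, 2, 3]
assert shared == []          # caller's list really was cleared
```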
* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)


The first phrase is insulting and untrue, the second merely untrue (as
even your own list of pros and cons shows), and the last completely
valid and highly relevant, yet not overriding.

-Peter

Apr 12 '06 #37
Raymond Hettinger wrote:
* easier to figure-out, look-up, and remember than either s[:]=[] or
del s[:]
Easier is an understatement - it's something you figure out
automatically. When I want to do something w/ an object, looking at its
methods (via code completion) is the very first thing.
* the OP is shocked, SHOCKED that python got by for 16 years without
list.clear()
I'm sure you realize I was being sarcastic...
* learning slices is basic to the language (this lesson shouldn't be
skipped)
Assigning to slices is much less important, and is something I always
never do (and hence forget).
* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)


I understand that these are the main arguments. However, as it stands
there is no one *obvious* way to clear a list in-place. I agree that
it's rare to even need it, but when you do a it's a little bit of a
surprise.

Apr 12 '06 #38
Ville Vainio wrote:
Assigning to slices is much less important, and is something I always
never do (and hence forget).


ALMOST never, of course.

Apr 12 '06 #39
In article <11**********************@u72g2000cwu.googlegroups.com>,
Raymond Hettinger <py****@rcn.com> wrote:
[Steven Bethard]
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?


Something this small doesn't need a PEP. I'll just send a note to
Guido asking for a pronouncement.

Here's a draft list of pros and cons (any changes or suggestions are
welcome):

Pros:
-----

* s.clear() is more obvious in intent


Serious question: Should it work more like "s=[]" or more like
"s[:]=[]". I'm assuming the latter, but the fact that there is
a difference is an argument for not hiding this operation behind
some syntactic sugar.

Alan
--
Defendit numerus
Apr 12 '06 #40
On Wed, 12 Apr 2006 12:40:52 -0700, Raymond Hettinger wrote:
Something this small doesn't need a PEP. I'll just send a note to
Guido asking for a pronouncement.
Raymond, if you're genuinely trying to help get this sorted in the
fairest, simplest way possible, I hope I speak for everyone when
I say thank you, your efforts are appreciated.

But there is this:
* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)


Is this sort of editorialising fair, or just a way of not-so-subtly
encouraging Guido to reject the whole idea, now and forever?

Convenience and obviousness are important for APIs -- that's why lists
have pop, extend and remove methods. The only difference I can see between
a hypothetical clear and these is that clear can be replaced with a
one-liner, while the others need at least two, e.g. for extend:

for item in seq:
    L.append(item)

Here is another Pro for your list:

A list.clear method will make deleting items from a list more OO,
consistent with almost everything else you do to lists, and less
procedural. This is especially true if clear() takes an optional index (or
two), allowing sections of the list to be cleared, not just the entire
list.
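That optional-index variant is already spellable with slice deletion, which is presumably the behaviour a hypothetical clear(i, j) would mirror (a sketch):

```python
L = list(range(10))

del L[2:5]     # clear just a section, in place
assert L == [0, 1, 5, 6, 7, 8, 9]

del L[:]       # clear everything
assert L == []
```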
--
Steven.

Apr 12 '06 #41
On Wed, 12 Apr 2006 15:36:47 -0700, Alan Morgan wrote:
Serious question: Should it work more like "s=[]" or more like
"s[:]=[]". I'm assuming the latter, but the fact that there is
a difference is an argument for not hiding this operation behind
some syntactic sugar.


Er, I don't see how it can possibly work like s = []. That merely
reassigns a new empty list to the name s, it doesn't touch the existing
list (which may or may not be garbage collected soon/immediately
afterwards).

As far as I know, it is impossible -- or at least very hard -- for an
object to know which namespaces it is in, so it could reassign one name but
not the others. Even if such a thing were possible, I think it is an
absolutely bad idea.

--
Steven.

Apr 12 '06 #42
In article <pa****************************@REMOVETHIScyber.com.au>,
Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
On Wed, 12 Apr 2006 15:36:47 -0700, Alan Morgan wrote:
Serious question: Should it work more like "s=[]" or more like
"s[:]=[]". I'm assuming the latter, but the fact that there is
a difference is an argument for not hiding this operation behind
some syntactic sugar.


Er, I don't see how it can possibly work like s = []. That merely
reassigns a new empty list to the name s, it doesn't touch the existing
list (which may or may not be garbage collected soon/immediately
afterwards).


Right. I was wondering what would happen in this case:

s=[1,2,3]
t=s
s.clear()
t # [] or [1,2,3]??

If you know your basic python it is "obvious" what would happen
if you do s=[] or s[:]=[] instead of s.clear() and I guess it is
equally "obvious" which one s.clear() must mimic. I'm still not
used to dealing with mutable lists.

Alan
--
Defendit numerus
Apr 13 '06 #43
Alan Morgan wrote:
Right. I was wondering what would happen in this case:

s=[1,2,3]
t=s
s.clear()
t # [] or [1,2,3]??

If you know your basic python it is "obvious" what would happen
if you do s=[] or s[:]=[] instead of s.clear() and I guess it is
equally "obvious" which one s.clear() must mimic. I'm still not
used to dealing with mutable lists.


If you know your basic python :-), you know that s[:] = [] is doing the
only thing that s.clear() could possibly do, which is changing the
contents of the list which has the name "s" bound to it (and which might
have other names bound to it, just like any object in Python). It
*cannot* be doing the same as "s=[]" which does not operate on the list
but creates an entirely new one and rebinds the name "s" to it.

The only possible answer for your question above is "t is s" and "t ==
[]" because you haven't rebound the names.
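In other words (a minimal sketch of the aliasing):

```python
s = [1, 2, 3]
t = s           # t and s name the same list object
s[:] = []       # mutate that object in place
assert t is s
assert t == []  # both names see the emptied list
```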

-Peter

Apr 13 '06 #44
Steven D'Aprano wrote:
Convenience and obviousness are important for APIs -- that's why lists
have pop, extend and remove methods. The only difference I can see between
a hypothetical clear and these is that clear can be replaced with a
one-liner, while the others need at least two, e.g. for extend:

for item in seq:
L.append(item)


It's not even clear that extend needs two lines:
s = range(5)
more = list('abc')
s[:] = s + more
s

[0, 1, 2, 3, 4, 'a', 'b', 'c']

Okay, it's not obvious, but I don't think s[:]=[] is really any more
obvious as a way to clear the list.

Clearly .extend() needs to be removed from the language as it is an
unnecessary extension to the API using slicing, which everyone should
already know about...

-Peter

Apr 13 '06 #45
In article <ma***************************************@python.org>,
Peter Hansen <pe***@engcorp.com> wrote:
Alan Morgan wrote:
Right. I was wondering what would happen in this case:

s=[1,2,3]
t=s
s.clear()
t # [] or [1,2,3]??

If you know your basic python it is "obvious" what would happen
if you do s=[] or s[:]=[] instead of s.clear() and I guess it is
equally "obvious" which one s.clear() must mimic. I'm still not
used to dealing with mutable lists.


If you know your basic python :-), you know that s[:] = [] is doing the
only thing that s.clear() could possibly do,


Ah, but if you know your basic python then you wouldn't be looking for
s.clear() in the first place; you'd just use s[:]=[] (or s=[], whichever
is appropriate). IOW, the people most in need of s.clear() are those
least likely to be able to work out what it is actually doing.
Personally, it seems more *reasonable* to me, a novice python user,
for s.clear() to behave like s=[] (or, perhaps, for s=[] and s[:]=[] to
mean the same thing). The fact that it can't might be an argument for
not having it in the first place.

Alan
--
Defendit numerus
Apr 13 '06 #46

"Peter Hansen" <pe***@engcorp.com> wrote in message
news:e1**********@sea.gmane.org...
It's not even clear that extend needs two lines:
s = range(5)
more = list('abc')
s[:] = s + more
s
[0, 1, 2, 3, 4, 'a', 'b', 'c']


This is not the same as list.extend because it makes a separate
intermediate list instead of doing the extension completely in place.
However, the following does mimic .extend.
s=range(5)
s[len(s):] = list('abc')
s

[0, 1, 2, 3, 4, 'a', 'b', 'c']

So, at the cost of writing and calling len(s), you are correct that .extend
is not necessary.

Terry Jan Reedy

Apr 13 '06 #47
Raymond Hettinger <py****@rcn.com> writes:
Felipe Almeida Lessa writes:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop

Why the first benchmark is the slowest? I don't get it... could someone
test this, too?
It is an effect of the memory allocator and fragmentation. The first
builds up a list with increasingly larger sizes.


I don't see what you mean by this. There are many lists all of
the same size. Do you mean some list internal to the memory
allocator?
It periodically
cannot grow in-place because something is in the way (some other
object) so it needs to move its current entries to another, larger
block and grow from there. In contrast, the other variants are reusing
the previously cleared-out large block.

Just for grins, replace the first with:
'x=None; x=range(100000)'
The assignment to None frees the reference to the previous list and
allows it to be cleared so that its space is immediately available to
the new list being formed by range().


It's true that this runs at the same speed as the del variants on my
machine. That's not too surprising to me, but I still don't
understand why the del variants are more than 5% faster than the first
version.

Once this is understood, is it something that could be optimized?
It's pretty common to rebind a variable to a new value, and if
this could be improved 5%, that would be cool. But maybe it
wouldn't affect anything other than such micro benchmarks.

Dan
Apr 13 '06 #48
Peter Hansen wrote:
* learning slices is basic to the language (this lesson shouldn't be
skipped)
And yet it doesn't appear to be in the tutorial.


oh, please.

slices are explained in the section on strings, and in the section on lists,
and used to define the behaviour of the list methods in the second section
on lists, ...
I could have missed it, but I've looked in a number of the obvious places


http://docs.python.org/tut/node5.htm...00000000000000

section 3.1.2 contains an example that shows to remove stuff from a list,
in place.

if you want a clearer example, please consider donating some of your time
to the pytut wiki:

http://pytut.infogami.com/

</F>

Apr 13 '06 #49
Peter Hansen wrote:
It's not even clear that extend needs two lines:
>>> s = range(5)
>>> more = list('abc')
>>> s[:] = s + more
>>> s

[0, 1, 2, 3, 4, 'a', 'b', 'c']

Okay, it's not obvious, but I don't think s[:]=[] is really any more
obvious as a way to clear the list.

Clearly .extend() needs to be removed from the language as it is an
unnecessary extension to the API using slicing


you just flunked the "what Python has to do to carry out a certain operation"
part of the "how Python works, intermediate level" certification.

</F>

Apr 13 '06 #50
