Bytes IT Community

mutable list iterators - a proposal

hi,

I like the way that Python does lists, and I love the way it does
iterators. But I've decided I don't like what it does with iterators
of lists. Lists are supposed to be mutable sequences, but try to use
an iterator of a list that you're mutating and watch it all fall to
pieces. That is, if you change the length of a section of the list
through which the iterator has already passed, it will lose track of
where it is. I think this should be fixed for 2.4. I'm not sure if
this is a bug or a PEP, so I'd like to hear from others on this
newsgroup.
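A minimal demonstration of the complaint, in modern Python (the 2.3 behavior is the same): once the list shrinks before the iterator's position, the iterator silently skips an element.

```python
lst = ['a', 'b', 'c', 'd', 'e']
it = iter(lst)
next(it)           # 'a'
next(it)           # 'b'
del lst[0]         # delete an item the iterator has already passed
skipped = next(it)
print(skipped)     # prints 'd' -- 'c' was silently skipped
```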

I've coded the following Python implementation of how I think lists
should work. It looks a little odd, and I'm sure that a C
implementation could be quite a bit more efficient (at least I hope
that is the case), and it wouldn't leak iterators like this one does.
But I think I've thought of, and dealt with, all of the weird cases.
These include: adding before the last yielded item, subtracting before
the next item to be yielded, adding between the last-yielded and
next-yielded items, subtracting the next-yielded item, etc. I think
that this subclass does the right thing in each of these cases. I'm
curious to see if others agree, and whether they think something like
this should be in the next version of Python. This code should work
with 2.3.

I hope this isn't in conflict with any previous irrevocable
pronouncements. b-)

Hope you like it,
Jess

import itertools

class mutable_iterable_list(list):
    """just like a list except it iterates gracefully under mutation"""
    def __init__(self, object):
        list.__init__(self, object)
        # set up a way for other methods to communicate with iterators
        self._mutations = {}
        self._iterator_keys = itertools.count()
    def __iter__(self):
        # get a unique name for this iterator and 'register' it
        key = self._iterator_keys.next()
        self._mutations[key] = []
        i = 0
        try:
            while True:
                # calculate the cumulative effect of all changes to the list
                # since the last yield statement
                for mutation in self._mutations[key]:
                    delta = 0
                    # we must use nesting to accommodate slice deletion -
                    # if we didn't accumulate in delta, i and j could pass
                    # each other in the night
                    # otherwise a list of (j, d)'s would do
                    for j, d in mutation:
                        if j < i:
                            delta += d
                    i += delta
                self._mutations[key] = []
                yield self[i]
                i += 1
        except IndexError:
            pass
        del self._mutations[key]  # this iterator needs no further updates
    # avoid calling list.__delslice__ and list.__setslice__
    def __delslice__(self, i, j):
        self.__delitem__(slice(i, j))
    def __setslice__(self, i, j, object):
        self.__setitem__(slice(i, j), object)
    def __delitem__(self, index):
        list.__delitem__(self, index)
        if type(index) is slice:
            # range() can't handle None as an argument
            start = index.start or 0
            if index.stop is None:
                stop = len(self)
            else:
                stop = index.stop
            step = index.step or 1
            # record the implicit deletion at each point where it occurs
            mutations = zip(range(start, stop, step),
                            itertools.repeat(-1))
            # inform each iterator of the changes to the list
            for it in self._mutations:
                self._mutations[it].append(mutations)
        else:
            for it in self._mutations:
                self._mutations[it].append(((index, -1),))
    def __setitem__(self, index, object):
        list.__setitem__(self, index, object)
        # otherwise __setitem__ has no effect on iterators
        if type(index) is slice and index.step is None:
            start = index.start or 0
            if index.stop is None:
                stop = len(self)
            else:
                stop = index.stop
            mutations = zip(range(start, stop), itertools.repeat(-1))
            for it in self._mutations:
                self._mutations[it].append(mutations)
                # record the insertion at the beginning of the slice - this
                # is added separately so that if i was pointing to part of
                # the affected slice, it will go back to the beginning of
                # the added sequence
                self._mutations[it].append(((start, len(object)),))
    def insert(self, index, object):
        list.insert(self, index, object)
        for it in self._mutations:
            self._mutations[it].append(((index, 1),))
    def pop(self, index):
        r = list.pop(self, index)
        for it in self._mutations:
            self._mutations[it].append(((index, -1),))
        return r
    def remove(self, object):
        index = self.index(object)
        list.remove(self, object)
        for it in self._mutations:
            self._mutations[it].append(((index, -1),))
Jul 18 '05 #1
9 Replies


In article <e4**************************@posting.google.com>,
Jess Austin <ja*****@post.harvard.edu> wrote:

I like the way that Python does lists, and I love the way it does
iterators. But I've decided I don't like what it does with iterators
of lists. Lists are supposed to be mutable sequences, but try to use
an iterator of a list that you're mutating and watch it all fall to
pieces. That is, if you change the length of a section of the list
through which the iterator has already passed, it will lose track of
where it is. I think this should be fixed for 2.4. I'm not sure if
this is a bug or a PEP, so I'd like to hear from others on this
newsgroup.
I'll guarantee that it won't be fixed for 2.4. This subject has come up
many times long before iterators were introduced, and the answer has
always been the same: if you want to mutate, make a copy or make *very*
sure that your mutations don't muck the loop.
I hope this isn't in conflict with any previous irrevocable
pronouncements. b-)


Your hope is in vain. ;-)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"usenet imitates usenet" --Darkhawk
Jul 18 '05 #2


"Jess Austin" <ja*****@post.harvard.edu> wrote in message
news:e4**************************@posting.google.com...
hi,

I like the way that Python does lists, and I love the way it does
iterators. But I've decided I don't like what it does with iterators
of lists. Lists are supposed to be mutable sequences,
and they are.
but try to use an iterator of a list that you're mutating
and watch it all fall to pieces.
Replacing items, which is by far the most common need, works fine.
That is, if you change the length of a section of the list
through which the iterator has already passed, it will lose track of
where it is.
Same is true of dicts. The limitation is documented. The need is
relatively rare. There are several alternatives, usually better.

1. Make a copy and iterate through that.
2. Create a new object as you iterate (for deletes only, use filter to do this).
3. Make a list of desired edits (something like your _mutations list) and
apply that *after* the iteration to do all the edits at once.

Note that filter is O(n) timewise. Simply deleting items in place and
closing up the gap, one at a time, as your approach would appear to do, is
O(n**2). Applying an edit list in a batch after the initial scan may also
be O(n) for typical cases.

4. Actually, for deletion (filtration) in place, there *is* an O(n) single
scan method. The secret is to *not* change the length of the list while
iterating but to merely move kept objects to the end of a 'kept' section of
the list (using two indexes) and then truncate the list (deleting the now
unneeded tail) *after* the scan.
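That single-scan compaction might look like this (the function name is mine, not Terry's):

```python
def filter_in_place(lst, keep):
    """Delete items failing keep() in one O(n) pass, mutating lst."""
    write = 0
    for read in range(len(lst)):
        if keep(lst[read]):
            lst[write] = lst[read]   # move kept item into the 'kept' section
            write += 1
    del lst[write:]                  # truncate the unneeded tail once
    return lst

filter_in_place([1, 2, 3, 4, 5, 6], lambda x: x % 2)   # -> [1, 3, 5]
```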

5. For insertion while scanning, use an index var in a while loop and bump
up the index after doing insertions. But notice again that this can make
for an O(n**2) algorithm and it might really be better to make a list of
slices and then paste them together in a separate post-scan batch process.

6. Or, insert a large empty slice (hopefully large enough to accommodate
the net expansion) at the front of the list before starting and then copy
and insert into the empty space as you scan. (This is a generalization of
the deletion algorithm, 4.) If you run out of room due to an initial
underestimate, make another large insertion. Truncate after if there is
leftover space at the end.
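One possible rendition of point 6 (entirely my own sketch; the function name, the expander callback, and the pad policy are invented). Output is copied into the reserved space at the front, and another empty slice is inserted whenever the estimate runs out:

```python
def expand_in_place(lst, expander, pad):
    """Rewrite lst in one scan; expander(item) returns the list of items
    that should replace it.  pad (>= 1) is the guessed net growth."""
    lst[:0] = [None] * pad           # reserve empty space up front
    write, read, n = 0, pad, len(lst)
    while read < n:
        for out in expander(lst[read]):
            if write == read:        # reserved space exhausted: insert more
                lst[read:read] = [None] * pad
                read += pad
                n += pad
            lst[write] = out
            write += 1
        read += 1
    del lst[write:]                  # truncate leftover space at the end
    return lst

expand_in_place([1, 2, 3], lambda x: [x, x], 2)   # -> [1, 1, 2, 2, 3, 3]
```

Deletion falls out as the special case where the expander returns an empty list.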

Bottom line: because of the O(n**2) trap, insertion and deletion are
trickier than mere replacement.
I think this should be fixed for 2.4. I'm not sure if
this is a bug or a PEP, so I'd like to hear from others on this
newsgroup.


The listobject.c code is being reworked to make common list operations
quite noticeably faster. A slowdown to enable doing something that is
better done elsewise will not be accepted. In most cases, the limitation
protects people from making a bad algorithm choice.

Terry J. Reedy


Jul 18 '05 #3

hi Terry,

Thanks for your thoughtful response. I've tried to address the points
you made, but I could have missed something. b-)
"Terry Reedy" <tj*****@udel.edu> wrote in message news:<ma*************************************@pyth on.org>...
Replacing items, which is by far the most common need, works fine.
Fair enough.

That is, if you change the length of a section of the list
through which the iterator has already passed, it will lose track of
where it is.


Same is true of dicts. The limitation is documented. ...


Actually, dicts just give up at that point. The fact that lists don't
implied to me that *someone* thought modification should work with
iterators. Your "other algorithm" suggestions all seem to
specifically avoid the idiom that I'm proposing, which is iteration
over a list that can be modified, such that the loop body can feel the
logical effects of that modification.

All of your suggestions are valid ways to deal with the current
system, by deliberately building more machinery on top of it. But
none of them allow me to simply code a loop that assumes it will get
the item that is now after the last one it got, even if it may have
made changes to the list in the interim. The class I posted does
allow that.

Note that filter is O(n) timewise. Simply deleting items in place and
closing up the gap, one at a time, as your approach would appear to do, is
O(n**2). Applying an edit list in a batch after the initial scan may also
be O(n) for typical cases. ...
Bottom line: because of the O(n**2) trap, insertion and deletion are
trickier than mere replacement.
I'm not handling the deletion myself; every method of my class besides
__iter__ calls the corresponding method of list right away. I'll
grant you that the methods other than __iter__ shouldn't spend so much
time building sequences to record what their actions were. I think
that instead they should just save a reference to their index value
(which is either a number or a slice), and the iterator can deal with
interpreting the meaning whenever it gets called. I doubt that saving
a reference to an already existing object would slow any of these
methods down that much. And of course the methods could make sure
that there are existing iterators before doing anything else. As for
the iterator, if coded correctly it only has to go through the work of
interpreting the mutation records if there are actually mutation
records. That is, the slowdown that has to be somewhere here would
only occur if the user specifically coded a loop that modified its
list.

I think I'll spend some time implementing some of the above changes to
see if I can get them to work.

The listobject.c code is being reworked to make common list operations
quite noticeably faster. A slowdown to enable doing something that is
better done elsewise will not be accepted. In most cases, the limitation
protects people from making a bad algorithm choice.


I understand what you're saying here. Most of the community doesn't
care about this. The only way to get something like this into core is
to prove it won't cost anyone anything, and then get real lucky. I
wouldn't put it as "protecting people" so much as I would put it "not
eating their cycles when they're not looking". I did actually have an
algorithm in mind when I started down this road. After inputting the
following code, calling tourney(n) builds and returns a tournament
with fair pairings for a number of parties seeded 1 through n. For
example:

>>> tourney(35)
(((((1, (32, 33)), (16, 17)), ((8, 25), (9, 24))), (((4, 29), (13, 20)),
((5, 28), (12, 21)))), ((((2, (31, 34)), (15, 18)), ((7, 26), (10, 23))),
(((3, (30, 35)), (14, 19)), ((6, 27), (11, 22)))))

It uses the list code I posted before. It appears to me to be O(n),
which seems like the best possible for this problem.

thanks,
Jess
import math
from itertools import izip, chain, repeat
from mutable_iterable_list import mutable_iterable_list as mil

class tree(object):
    def __repr__(self):
        return '(' + repr(self.top) + ', ' + repr(self.bottom) + ')'
    def SetTop(self, top):
        self.top = top
    def SetBottom(self, bottom):
        self.bottom = bottom

class seed(object):
    """trees grow from seeds"""
    def __init__(self, setter, value):
        self.setter, self.value = setter, value
    def __call__(self, value):
        t = tree()
        t.SetTop(seed(t.SetTop, self.value))
        t.SetBottom(seed(t.SetBottom, value))
        self.setter(t)
        self.setter = t.SetTop
        return t
    def __repr__(self):
        return repr(self.value)

def tourney(n):
    """Return a tree structure that reflects a tournament of n parties"""
    tournament = tree()
    tournament.SetTop(seed(tournament.SetTop, 1))
    seeds = mil((tournament.top,))
    for x, s in izip(range(2, n + 1),
                     chain(*repeat(seeds, int(math.log(n, 2)) + 1))):
        seeds[:0] = [s(x).bottom]
    return tournament.top
Jul 18 '05 #4

aa**@pythoncraft.com (Aahz) wrote in message news:<c3**********@panix2.panix.com>...
I'll guarantee that it won't be fixed for 2.4. This subject has come up
many times long before iterators were introduced, and the answer has
always been the same: if you want to mutate, make a copy or make *very*
sure that your mutations don't muck the loop.


If this was hashed out so completely before iterators, I wonder if
iterators might make a different solution more plausible. Such a
solution wouldn't have been considered since then, because, well, it
has already been hashed out.
I hope this isn't in conflict with any previous irrevocable
pronouncements. b-)


Your hope is in vain. ;-)


It usually is; still I hope. b-)

Jess
Jul 18 '05 #5

[Jess Austin]
hi Terry,

Thanks for your thoughtful response. I've tried to address the points
you made, but I could have missed something. b-)
List iterators likely won't change because
* it does not encourage an error prone programming style
* it is documented to perform as it currently does
* there is likely existing code that relies on the current behavior
* there are performance issues with changing it
* there do not seem to be compelling, general use cases to warrant a change

Also, I'm not sure your proposal is self-consistent. If I read it
correctly, there is a strong notion of having the iterator remember
the last item emitted even if its position changes. However, with a
combination of slice deletion and insertion, the notion of the last
item emitted becomes muddy:

lyst = range(10)
it = iter(lyst)
for i in xrange(5):
    it.next()
lyst[3:7] = [20]
print it.next()    # Should this print 20 or 7 ?
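For comparison, a stock list iterator in modern Python just keeps advancing its index, so the equivalent snippet yields neither 20 nor 7:

```python
lyst = list(range(10))
it = iter(lyst)
for i in range(5):
    next(it)               # consumes 0, 1, 2, 3, 4
lyst[3:7] = [20]           # lyst is now [0, 1, 2, 20, 7, 8, 9]
value = next(it)           # the iterator is at index 5, so this is 8
print(value)
```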

All of your suggestions are valid ways to deal with the current
system, by deliberately building more machinery on top of it. But
none of them allow me to simply code a loop that assumes it will get
the item that is now after the last one it got, even if it may have
made changes to the list in the interim. The class I posted does
allow that.


Py2.4 adds a new type, collections.deque(), that can reasonably be
made to do what you're asking (workable because there is no slicing,
just appending or popping from either end and setitem value
mutations):
>>> from collections import deque
>>> d = deque('abcdefg')
>>> it = iter(d)
>>> it.next()
'a'
>>> it.next()
'b'
>>> d.extend('hijk')
>>> d.appendleft('z')
>>> d.appendleft('y')
>>> list(it)   # this picks up right where it left off
['c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
>>> d
deque(['y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])

Raymond Hettinger
Jul 18 '05 #6

Sorry I didn't respond in a timely fashion; I've been out of town.

py****@rcn.com (Raymond Hettinger) wrote in message news:<5d**************************@posting.google.com>...
List iterators likely won't change because
* it does not encourage an error prone programming style
* it is documented to perform as it currently does
* there is likely existing code that relies on the current behavior
* there are performance issues with changing it
* there do not seem to be compelling, general use cases to warrant a change
These are all reasonable objections.

Also, I'm not sure your proposal is self-consistent. If I read it
correctly, there is a strong notion of having the iterator remember
the last item emitted even if its position changes. However, with a
combination of slice deletion and insertion, the notion of the last
item emitted becomes muddy:

lyst = range(10)
it = iter(lyst)
for i in xrange(5):
    it.next()
lyst[3:7] = [20]
print it.next()    # Should this print 20 or 7 ?
In this case my class returns a 20. My thinking on this was that if a
slice containing the "normal" next item is replaced by a slice, the
iterator should go to the beginning of the replacement slice, EXCEPT
in the case where the new slice is the same length as the old slice,
in which case the iterator should stay where it is. The exception is
for backwards compatibility with current uses.

Py2.4 adds a new type, collections.deque(), that can reasonably be
made to do what you're asking (workable because there is no slicing,
just appending or popping from either end and setitem value
mutations):
>>> from collections import deque
>>> d = deque('abcdefg')
>>> it = iter(d)
>>> it.next()
'a'
>>> it.next()
'b'
>>> d.extend('hijk')
>>> d.appendleft('z')
>>> d.appendleft('y')
>>> list(it)   # this picks up right where it left off
['c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
>>> d
deque(['y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])


This looks great. It would indeed work for the class I had in mind.
Frankly, the fact that the iterator for deque works *correctly* seems
to beg the question, "why doesn't the iterator for list work
correctly?" I know, see above for the reasons. The current list
iteration stills seems ugly, and I know it's jarring to new users
because I've been jarred by it, have forgotten it, and then been
jarred again. As for performance, I think that it could be coded so
that all performance penalties would be within the iterator, and there
only after the list has been changed. There follows a revision of the
code I posted originally, which attempts to demonstrate that.

cheers,
Jess
import itertools

class muterable_list(list):
    """just like a list except it iterates gracefully under mutation"""
    Insertion = object()
    Deletion = object()
    def __init__(self, object_):
        list.__init__(self, object_)
        self._mutations = []
        self._iterators = {}
        self._iterators_keys = itertools.count()
    def __iter__(self):
        key = self._iterators_keys.next()
        self._iterators[key] = []
        i = 0
        try:
            while True:
                for position, direction in self._iterators[key]:
                    if isinstance(position, int):
                        if position < i:
                            if direction is self.Deletion:
                                i += -1
                            else:
                                i += 1
                    elif direction is self.Deletion:  # slice deletion
                        start, stop, step = position.indices(i)
                        i += (start - stop)/step
                    elif position.start < i:  # slice insertion
                        i += position.stop - position.start
                self._iterators[key] = []
                yield self[i]
                i += 1
                for iterator in self._iterators:
                    self._iterators[iterator].extend(self._mutations)
                self._mutations = []
        except IndexError:
            del self._iterators[key]
    def __delslice__(self, i, j):
        list.__delslice__(self, i, j)
        if self._iterators:
            self._mutations.append((slice(i, j), self.Deletion))
    def __setslice__(self, i, j, object_):
        list.__setslice__(self, i, j, object_)
        if self._iterators:
            length = len(object_)
            if length != (j - i):
                self._mutations.append((slice(i, j), self.Deletion))
                self._mutations.append((slice(i, i + length),
                                        self.Insertion))
    def __delitem__(self, index):
        list.__delitem__(self, index)
        if self._iterators:
            self._mutations.append((index, self.Deletion))
    def __setitem__(self, index, object_):
        list.__setitem__(self, index, object_)
        if self._iterators:
            length = len(object_)
            if (isinstance(index, slice) and (index.step == 1 or
                    index.step is None) and length != (index.start -
                    index.stop)):
                self._mutations.append((index, self.Deletion))
                self._mutations.append((slice(index.start,
                                              index.start + length),
                                        self.Insertion))
    def insert(self, index, object_):
        list.insert(self, index, object_)
        if self._iterators:
            self._mutations.append((index, self.Insertion))
    def pop(self, index):
        r = list.pop(self, index)
        if self._iterators:
            self._mutations.append((index, self.Deletion))
        return r
    def remove(self, object_):
        if self._iterators:
            mutation = self.index(object_), self.Deletion
            list.remove(self, object_)
            self._mutations.append(mutation)
        else:
            list.remove(self, object_)
Jul 18 '05 #7

Also, I'm not sure your proposal is self-consistent. If I read it
correctly, there is a strong notion of having the iterator remember
the last item emitted even if its position changes. However, with a
combination of slice deletion and insertion, the notion of the last
item emitted becomes muddy:

lyst = range(10)
it = iter(lyst)
for i in xrange(5):
    it.next()
lyst[3:7] = [20]
print it.next()    # Should this print 20 or 7 ?
In this case my class returns a 20. My thinking on this was that if a
slice containing the "normal" next item is replaced by a slice, ...


When designing a new type, the trick is to avoid areas where two
different people could reasonably expect different "normal" behaviors.
Otherwise, you're doing more harm than good and making it likely that
your users will stumble into a class of bugs that are extremely
difficult to hunt down.

They are truly much better off with the paradigm of looping through
one list and building another. In-place alteration is as icebergs are
to the Titanic. Only the simplest cases are reasonable (changing
values but not size; or reverse iterating and popping away items after
the current position).
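The second "simple case" mentioned above, sketched in modern Python: walking indexes from the back means a pop only shifts positions that have already been visited.

```python
lst = [1, 2, 3, 4, 5, 6]
for i in range(len(lst) - 1, -1, -1):   # iterate in reverse over indexes
    if lst[i] % 2 == 0:
        lst.pop(i)     # safe: only already-visited positions shift left
# lst is now [1, 3, 5]
```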
Py2.4 adds a new type, collections.deque(), that can reasonably be made to do what you're asking (workable because there is no slicing,
just appending or popping from either end and setitem value
mutations):

. . .
This looks great. It would indeed work for the class I had in mind.
Frankly, the fact that the iterator for deque works *correctly* seems
to beg the question, "why doesn't the iterator for list work
correctly?"
What makes lists different is that slicing operations can alter them
in the middle. The inconsistencies disappear at the end points.
Deques only mutate at the ends, so you always have a clear definition
of "what's next".
There follows a revision of the
code I posted originally, which attempts to demonstrate that.
The code is an improvement, but still suffers from the definitional
ambiguity mentioned above. I believe this is inescapable. Consider a
line at a movie: you leave the line, others join in the middle, others
leave, a few more join, a few more leave; now who is the person who
"naturally" comes after you?

All the contortions in the code still suggest to me that the use case
needs a different algorithm. Try rewriting it with separate input and
output lists. My bet is that the code is clearer and more
maintainable.
import itertools
Hey, hey, I love to see people using my module ;-)

class muterable_list(list):
    """just like a list except it iterates gracefully under mutation"""


There is one word in the docstring that is debatable. Any guesses ;-)

Raymond Hettinger
Jul 18 '05 #8

I hope I'm not belaboring the point. b-)

py****@rcn.com (Raymond Hettinger) wrote in message news:<5d**************************@posting.google.com>...
When designing a new type, the trick is to avoid areas where two
different people could reasonably expect different "normal" behaviors.
Otherwise, you're doing more harm than good and making it likely that
your users will stumble into a class of bugs that are extremely
difficult to hunt down.
This seems like good advice, but the current implementation of lists
in Python doesn't follow it wrt modifying while iterating. People who
remember all the footnotes to the docs expect one thing, and the rest
of us expect something else. Maybe it would be better if list
iterators puked in the same fashion that dict iterators do immediately
after the dict instance is modified. Then users wouldn't start by
appending to lists over which they are iterating (which works,
although I'm not sure it will with the new reversed iterators), and go
on to expect to be able to modify lists in a general fashion.

The code is an improvement, but still suffers from the definitional
ambiguity mentioned above. I believe this is inescapable. ...
Thanks. I feel that in allowing an operation that was previously
implicitly forbidden, one can be forgiven if there is some subtlety to
the semantics of the operation. But that's debatable.

All the contortions in the code still suggest to me that the use case
needs a different algorithm. Try rewriting it with separate input and
output lists. My bet is that the code is clearer and more
maintainable.


My original use case was 3 classes in 30 lines, not including the
subclassing of list. I thought the subclassing of list was fairly
componentized, and it would be unnecessary if the list implementation
were modified as I've suggested. That doesn't look too likely. b-)
I'll certainly be using collections.deque after I install 2.4!

import itertools


Hey, hey, I love to see people using my module ;-)


I love to use it. Iterators and list comprehensions were good, but
coupled with itertools they are great.

Thanks, Raymond, for all you've done for Python.

later,
Jess
Jul 18 '05 #9

[Raymond]
All the contortions in the code still suggest to me that the use case
needs a different algorithm. Try rewriting it with separate input and
output lists. My bet is that the code is clearer and more
maintainable.

[Jess Austin]
My original use case was 3 classes in 30 lines, not including the
subclassing of list.
Depending on your application, there may be a good alternative to
mutating in-place while iterating. Consider using a Queue object or
something similar (collections.deque in Py2.4, deque objects on ASPN,
or lists with append() and pop(0)):

while inputlist:
    datum = inputlist.pop(0)
    [processing with the element]
    if somecondition:
        inputlist.append(datum)
    else:
        continue  # effectively removing the element

The idea is to get around the issues of mutating in-place by removing
the element in question, processing it, and if needed again, adding to
the end of the queue.

The ASPN recipes on queues show how to do this a bit more efficiently
than with the list object, but the idea is the same.
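A concrete rendition of that queue idiom in modern Python (the halving task is purely illustrative), using the deque type discussed above:

```python
from collections import deque

tasks = deque([8, 3, 10, 7])
done = []
while tasks:
    n = tasks.popleft()       # take the next element off the front
    if n % 2 == 0:
        tasks.append(n // 2)  # still needs work: requeue at the back
    else:
        done.append(n)        # finished with this element
# done collects the odd results in completion order: [3, 7, 5, 1]
```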
I love to use it. Iterators and list comprehensions were good, but
coupled with itertools they are great.

Thanks, Raymond, for all you've done for Python.


Thanks for the accolades.
Raymond
Jul 18 '05 #10
