By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,247 Members | 1,057 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,247 IT Pros & Developers. It's quick & easy.

Unexpected Behavior Iterating over a Mutating Object

P: n/a
OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.

Second, this isn't my code. I wouldn't do this. But a colleague did,
got an unexpected result, and asked me why. I think I can infer what
is occurring, and I was able to find a simple work-around. But I
thought I'd ask about it anyway.

I've been pushing Python at work for use as a scripting language to
get simple tasks done. My colleague is modifying a parts library for
a schematic capture program with a Python script. The library is
entirely in ASCII text.

He's trying to delete a group of parts from the library. The simple
code below shows what is occurring (taken from an IDLE session, Python
2.4.1#65 for Windoze). The "parts" to be deleted in the example
contain the string 'DEL':

---begin included file---
data = [ 'First', 'Second DEL', 'Third', 'Fourth', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']
bfr = []
for item in data: if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)

data ['First', 'Third', 'Fourth', 'DEL Sixth', 'Eighth DEL', 'Tenth',
'Eleventh', 'Twelfth'] bfr ['Second DEL', 'Fifth DEL', 'Seventh DEL', 'Ninth DEL']
---end included file---

It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.

The workaround is very simple:

---begin included file--- for item in data: if item.find('DEL') >= 0:
bfr.append(item)

for item in bfr: data.remove(item)

data ['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth'] bfr

['Second DEL', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL']
---end included file---

Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

Thanks,

-=Dave
--
Change is inevitable, progress is not.
Sep 13 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
You are correct if you remove items from a list that you
are iterating over you get the results you see. You either
need to iterate over the list in reverse order or better yet
create a new list from the old one with the items removed.

In Python 2.4 you can do:

data = [ 'First', 'Second DEL', 'Third', 'Fourth',
'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']

bfr=[x for x in data if 'DEL' not in x]

Note: I'm not sure why 'DEL' is at the end of all the strings
EXCEPT the Sixth one where it is at the beginning, but if the
"DEL" was consistently at the end of the strings I would do
this instead (which would also work in Python 2.2+):

bfr=[x for x in data if not x.endswith('DEL')]

This also is much easier to read (IMHO) than the loops, etc.
of the original code. In Python 2.4 list comprehensions are
also quite efficient.

Larry Bates

Dave Hansen wrote:
OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.

Second, this isn't my code. I wouldn't do this. But a colleague did,
got an unexpected result, and asked me why. I think I can infer what
is occurring, and I was able to find a simple work-around. But I
thought I'd ask about it anyway.

I've been pushing Python at work for use as a scripting language to
get simple tasks done. My colleague is modifying a parts library for
a schematic capture program with a Python script. The library is
entirely in ASCII text.

He's trying to delete a group of parts from the library. The simple
code below shows what is occurring (taken from an IDLE session, Python
2.4.1#65 for Windoze). The "parts" to be deleted in the example
contain the string 'DEL':

---begin included file---
data = [ 'First', 'Second DEL', 'Third', 'Fourth',
'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']

bfr = []
for item in data:
if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)
data
['First', 'Third', 'Fourth', 'DEL Sixth', 'Eighth DEL', 'Tenth',
'Eleventh', 'Twelfth']
bfr
['Second DEL', 'Fifth DEL', 'Seventh DEL', 'Ninth DEL']
---end included file---

It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.

The workaround is very simple:

---begin included file---
for item in data:
if item.find('DEL') >= 0:
bfr.append(item)
for item in bfr:
data.remove(item)
data
['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']
bfr


['Second DEL', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL']
---end included file---

Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

Thanks,

-=Dave

Sep 13 '05 #2

P: n/a
Dave Hansen wrote:
...
data = [ 'First', 'Second DEL', 'Third', 'Fourth', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']bfr = []
for item in data:

if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)


assuming Python 2.4:

kept = []
for position, text in reversed(list(enumerate(data))):
if 'DEL' in text:
removed.append(text)
else:
del data[position]

This uses reversed so positions remain valid as the list is pruned.

--Scott David Da*****@Acm.Org
Scott.Da*****@Acm.Org
Sep 13 '05 #3

P: n/a
[Dave Hansen]
It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.
Your analysis is dead-on. Instead of being the "desired" item, the
iterator fetches a[0], a[1], a[2], ... as determined by the state of
the list when the index is retrieved.
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me.
Right.

But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'


It is intended. The behavior is defined as above (a series of indexed
lookups).

BTW, the usual ways to deal with this are to:
1. iterating from the right using: for item in reversed(data)
2. making a copy of the original list before iterating
3. creating a new result list: data = [x for x in data if 'DEL' not in
x]
Raymond

Sep 13 '05 #4

P: n/a
Dave Hansen wrote:
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'


a C programmer wouldn't have much problem with this, of course,
since "for in" behaves like:

for (tmp = 0; var = getitem(seq, tmp); tmp++) {
...
}

</F>

Sep 14 '05 #5

P: n/a
Dave Hansen wrote:
(snip code snippets and sensible explanations)
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me.
It as *always* been a bad idea to modify a list in place (I mean adding
or removing items) while iterating over it, whatever the language. If
you *really* need to do such a thing (for effeciency reasons - and then
it's usually in low-level C code), you'd better use indexed access, and
adjust the index as needed - but this results in tricky, hard to
maintain code.
But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'


Not being a Language Lawyer(tm), I can't tell for sure, but I'd think
it's the expected behavior. Anyway it's not a behavior I'd relie upon,
since it would be too much of a dirty trick anyway.

My 2 cents
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
Sep 14 '05 #6

P: n/a
On Tue, 13 Sep 2005 21:28:21 GMT, id**@hotmail.com (Dave Hansen)
wrote:
OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.


Thanks to everyone who responded. All is clear now. And I know I
need to look deeper into list comprehensions...

Regards,

-=Dave
--
Change is inevitable, progress is not.
Sep 14 '05 #7

P: n/a
On Wed, 14 Sep 2005 11:12:14 +0200, bruno modulix <on***@xiludom.gro> wrote:
Dave Hansen wrote:
(snip code snippets and sensible explanations)
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me.
It as *always* been a bad idea to modify a list in place (I mean adding
or removing items) while iterating over it, whatever the language. If
you *really* need to do such a thing (for effeciency reasons - and then
it's usually in low-level C code), you'd better use indexed access, and
adjust the index as needed - but this results in tricky, hard to
maintain code.

Do you consider this to be tricky, or likely to be hard to maintain?
(not specifically tested beyond what you see ;-)
data = [ 'First', 'Second DEL', 'Third', 'Fourth', ... 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
... 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] iw = 0
for item in data: ... if 'DEL' in item: continue
... data[iw] = item
... iw += 1
... del data[iw:]
data ['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']

Or that maybe a utility function like the following would be hard to maintain?
def filter_in_place(L, keep=bool): ... iw = 0
... for item in L:
... if keep(item):
... L[iw] = item
... iw += 1
... del L[iw:]
... data = [ 'First', 'Second DEL', 'Third', 'Fourth', ... 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
... 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] data ['First', 'Second DEL', 'Third', 'Fourth', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL'
, 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] filter_in_place(data, lambda x: 'DEL' not in x)
data

['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']

But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'


Not being a Language Lawyer(tm), I can't tell for sure, but I'd think
it's the expected behavior. Anyway it's not a behavior I'd relie upon,
since it would be too much of a dirty trick anyway.

I think it depends on the kind of mutation involved. A mutation of the _structure_
of a container being iterated over effectively modifies the initial parameters of
the iteration control, so there you need either to have intimate knowledge of how
the iterator does its thing, or you have to insulate yourself from that need.
OTOH, a mutation of the content can be safe, if it doesn't modify the content
yet to be accessed during the iteration.

In the case of a list, the order can be depended on, so it's not that hard.
Other iterable containers are not so clear, and of course even in a list you
could have cases of access to early items having side effects on upcoming items,
e.g., if the "keep" evaluation had side effects on deep data shared between early
and later list items. But that would be nasty any which way ;-)

Regards,
Bengt Richter
Sep 14 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.