Bytes IT Community

Deleting from a list while iterating

The problems of this are well known, and a suggestion for making this
easier was recently posted on python-dev. However, I believe this can
be done just as well without a change to the language. What's more,
most of the suggested methods (in my search results as well as the
suggestion itself) do not scale well, which my approach would solve.

My approach is to build a set of indexes to remove while iterating,
then use a list comprehension to filter them out afterwards. Timings of
this and two other common approaches follow:

setapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    remove = set()
    for index, x in enumerate(items):
        # ...do something...
        if x < 0.5:
            remove.add(index)
    items = [x for index, x in enumerate(items) if index not in remove]
"""

copyapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for x in items[:]:
        if x < 0.5:
            items.remove(x)
"""

reverseapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for index in range(len(items) - 1, -1, -1):
        if items[index] < 0.5:
            del items[index]
"""
>>> import timeit
>>> timeit.Timer(stmt='func(1000)', setup=setapproach).timeit(1)
0.0016040802001953125
>>> timeit.Timer(stmt='func(1000)', setup=copyapproach).timeit(1)
0.0085191726684570312
>>> timeit.Timer(stmt='func(1000)', setup=reverseapproach).timeit(1)
0.0011308193206787109
>>> timeit.Timer(stmt='func(10000)', setup=setapproach).timeit(1)
0.021183013916015625
>>> timeit.Timer(stmt='func(10000)', setup=copyapproach).timeit(1)
1.0268981456756592
>>> timeit.Timer(stmt='func(10000)', setup=reverseapproach).timeit(1)
0.038264989852905273
>>> timeit.Timer(stmt='func(100000)', setup=setapproach).timeit(1)
0.23896384239196777
>>> timeit.Timer(stmt='func(100000)', setup=copyapproach).timeit(1)
274.57498288154602
>>> timeit.Timer(stmt='func(100000)', setup=reverseapproach).timeit(1)
2.2382969856262207

As you can see, although reverse iteration is somewhat faster at
smaller sizes, a set is substantially faster at larger sizes, and I
believe is more readable anyway. Copying shouldn't even be considered
unless you know the size will always be trivial (< 1000).

I'm sure there's a few other approaches that would do even better under
certain conditions. One is a generator, if your input and output
should both be iterators. Another is using slicing to move contiguous
sections of retained items over the removed items. I leave both of
these as an exercise for the reader.
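For reference, the generator variant mentioned above might look like this minimal sketch (the function name and the hard-coded `x < 0.5` removal condition are illustrative, carried over from the timing code):

```python
def keep_items(items, threshold=0.5):
    """Yield only the items that survive the removal test.

    Input and output are both iterables, so nothing is copied up
    front and the results can be consumed lazily.
    """
    for x in items:
        # ...do something with x here...
        if x >= threshold:  # keep anything not marked for removal
            yield x

kept = list(keep_items([0.1, 0.7, 0.4, 0.9]))
# kept == [0.7, 0.9]
```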

--
Adam Olsen, aka Rhamphoryncus

Dec 3 '06 #1
Share this Question
Share on Google+
9 Replies


In <11*********************@79g2000cws.googlegroups.com>, Rhamphoryncus
wrote:
> My approach is to build a set of indexes to remove while iterating,
> then use a list comprehension to filter them out afterwards. Timings of
> this and two other common approaches follow:
>
> setapproach = """\
> def func(count):
>     from random import random
>     items = [random() for i in xrange(count)]
>     remove = set()
>     for index, x in enumerate(items):
>         # ...do something...
>         if x < 0.5:
>             remove.add(index)
>     items = [x for index, x in enumerate(items) if index not in remove]
> """
Why do you make it that complicated? If you are going to build a new list
anyway, this can be done without the `set()` and just one listcomp:

items = [x for x in items if x >= 0.5]

No need to iterate twice over the `items`. The two other approaches you
gave are just needed if it's important that the elements are deleted "in
place", i.e. that you don't rebind `items` to a new object.

Ciao,
Marc 'BlackJack' Rintsch

Dec 3 '06 #2

Rhamphoryncus wrote:
> As you can see, although reverse iteration is somewhat faster at
> smaller sizes, a set is substantially faster at larger sizes, and I
> believe is more readable anyway.
your set approach doesn't modify the list in place, though; it creates
a new list, in a rather roundabout way. if modification in place isn't
important, the normal way is of course to create a *new* list:

items = [i for i in items if not i < 0.5]

on my machine, that's about two orders of magnitude faster than your
"fast" approach for n=100000.

(or twice as fast, if I don't factor out the time it takes to *create*
the original list from the benchmark).

</F>

Dec 3 '06 #3

Fredrik Lundh wrote:
> on my machine, that's about two orders of magnitude faster than your
> "fast" approach for n=100000.
oops. forget that; it's three times faster, if you're actually creating
the entire list, and not just a generator that will create it on demand ;-)

</F>

Dec 3 '06 #4

Rhamphoryncus schrieb:
> setapproach = """\
> def func(count):
>     from random import random
>     items = [random() for i in xrange(count)]
>     remove = set()
>     for index, x in enumerate(items):
>         # ...do something...
>         if x < 0.5:
>             remove.add(index)
>     items = [x for index, x in enumerate(items) if index not in remove]
> """
This is different from the other approaches in that it doesn't
modify items. If you wanted a new list, you could incrementally
build one already in the first pass, no need to collect the
indices first (as BlackJack explains).

If you wanted in-place modification, I'd do

newitems = []
for x in items:
    if not (x < 0.5):
        newitems.append(x)
items[:] = newitems
> copyapproach = """\
> def func(count):
>     from random import random
>     items = [random() for i in xrange(count)]
>     for x in items[:]:
>         if x < 0.5:
>             items.remove(x)
> """
This happens to work for your example, but is incorrect
in the general case: you meant to write

del items[i - removed]

here, as .remove(x) will search the list again, looking
for the first value to remove. If your condition for
removal happens to leave some items equal to x in the
list, it would remove the wrong item. Because the numbering
changes while the iteration is in progress, you have
to count the number of removed items also.
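A small self-contained demonstration of the failure mode described above (the `Tagged` wrapper is illustrative; it exists only so we can observe which of two equal elements was actually removed):

```python
class Tagged:
    """Value-equal wrapper so we can see WHICH element gets removed."""
    def __init__(self, value, tag):
        self.value = value
        self.tag = tag
    def __eq__(self, other):
        return self.value == other.value   # equality ignores the tag

first = Tagged(0.3, 'first')
second = Tagged(0.3, 'second')
items = [first, second]

# Suppose the "...do something..." phase decides to drop only `second`.
# list.remove() searches by equality, so it deletes `first` instead:
buggy = list(items)
buggy.remove(second)
assert buggy[0].tag == 'second'     # the wrong element survived

# Index-based deletion, counting prior removals, deletes exactly the
# element that was visited:
fixed = list(items)
removed = 0
for i, x in enumerate(items):
    if x is second:                 # positional/stateful removal decision
        del fixed[i - removed]
        removed += 1
assert fixed[0].tag == 'first'      # the intended element survived
```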

Regards,
Martin
Dec 3 '06 #5

Marc 'BlackJack' Rintsch wrote:
> Why do you make it that complicated? If you are going to build a new list
> anyway, this can be done without the `set()` and just one listcomp:

Fredrik Lundh wrote:
> your set approach doesn't modify the list in place, though; it creates
> a new list, in a rather roundabout way. if modification in place isn't
> important, the normal way is of course to create a *new* list:
>
> items = [i for i in items if not i < 0.5]
>
> on my machine, that's about two orders of magnitude faster than your
> "fast" approach for n=100000.
Sorry, I should have clarified that the original post assumed you
needed info from the "do something" phase to determine if an element is
removed or not. As you say, a list comprehension is superior if that
is not necessary.

--
Adam Olsen, aka Rhamphoryncus

Dec 3 '06 #6

Marc 'BlackJack' Rintsch wrote:
> No need to iterate twice over the `items`. The two other approaches you
> gave are just needed if it's important that the elements are deleted "in
> place", i.e. that you don't rebind `items` to a new object.
and even when you do, that can often be written as, e.g:

items[:] = (i for i in items if not i < 0.5)

</F>

Dec 3 '06 #7

Rhamphoryncus wrote:
> Sorry, I should have clarified that the original post assumed you
> needed info from the "do something" phase to determine if an element is
> removed or not. As you say, a list comprehension is superior if that
> is not necessary.
that's spelled

out = []
for i in items:
    ... do something ...
    if i > 0.5:
        out.append(i)

in Python, and is only a little slower than a list comprehension, as
written above. if performance is really important, move the method
lookup out of the loop:

out = []
append = out.append
for i in items:
    ... do something ...
    if i > 0.5:
        append(i)

</F>

Dec 3 '06 #8

Martin v. Löwis wrote:
> Rhamphoryncus schrieb:
>> setapproach = """\
>> def func(count):
>>     from random import random
>>     items = [random() for i in xrange(count)]
>>     remove = set()
>>     for index, x in enumerate(items):
>>         # ...do something...
>>         if x < 0.5:
>>             remove.add(index)
>>     items = [x for index, x in enumerate(items) if index not in remove]
>> """
>
> This is different from the other approaches in that it doesn't
> modify items. If you wanted a new list, you could incrementally
> build one already in the first pass, no need to collect the
> indices first (as BlackJack explains).
I didn't feel this distinction was worth mentioning, since "oldlist[:]
= newlist" is so trivial. The only solution that really avoids making
a copy is the reverse-iteration one.

> If you wanted in-place modification, I'd do
>
> newitems = []
> for x in items:
>     if not (x < 0.5):
>         newitems.append(x)
> items[:] = newitems
I agree, that does seem simpler.

> copyapproach = """\
> def func(count):
>     from random import random
>     items = [random() for i in xrange(count)]
>     for x in items[:]:
>         if x < 0.5:
>             items.remove(x)
> """
>
> This happens to work for your example, but is incorrect
> in the general case: you meant to write
>
> del items[i - removed]
>
> here, as .remove(x) will search the list again, looking
> for the first value to remove. If your condition for
> removal happens to leave some items equal to x in the
> list, it would remove the wrong item. Because the numbering
> changes while the iteration is in progress, you have
> to count the number of removed items also.
I agree that the example I gave here sucks. However, I copied it from
another posting as a recommended method to get removal to work *right*
(without noticing how slow it is).

There seems to have been many distinct approaches to this problem over
the years. Hopefully we can converge on a single ideal solution (or a
few situational ones) as TOOWTDI.

--
Adam Olsen, aka Rhamphoryncus

Dec 3 '06 #9

Rhamphoryncus schrieb:
>> This is different from the other approaches in that it doesn't
>> modify items. If you wanted a new list, you could incrementally
>> build one already in the first pass, no need to collect the
>> indices first (as BlackJack explains).
>
> I didn't feel this distinction was worth mentioning, since "oldlist[:]
> = newlist" is so trivial. The only solution that really avoids making
> a copy is the reverse-iteration one.
The distinction is relevant for performance, as the slice assignment
has a complexity linear with the number of elements.

Whether making a copy is expensive depends on the precise data: the
solution that deletes in reverse may have quadratic complexity
(if the number of removed elements grows with the list size);
the version that appends all remaining elements to a new list and
then does slice assignment should behave better than quadratically
(in recent versions, it should give you linear performance).
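As an illustrative sketch of the two in-place strategies being compared (function names and the 0.5 threshold are assumptions carried over from the earlier timings), both produce the same list, but the rebuild-and-slice-assign version does linear work:

```python
import random

def prune_reverse(items, threshold=0.5):
    # In place, no copy; but every `del` shifts the remaining tail
    # left, so with many removals the worst case is O(n**2).
    for i in range(len(items) - 1, -1, -1):
        if items[i] < threshold:
            del items[i]

def prune_rebuild(items, threshold=0.5):
    # Builds a temporary list, then slice-assigns it back over the
    # original object: O(n) overall, at the cost of a transient copy.
    items[:] = [x for x in items if x >= threshold]

data = [random.random() for _ in range(1000)]
a, b = list(data), list(data)
prune_reverse(a)
prune_rebuild(b)
assert a == b   # same surviving elements, same order, same list objects mutated
```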

Regards,
Martin
Dec 3 '06 #10

This discussion thread is closed; replies have been disabled.