how to remove 50000 elements from a 100000 list?

I want to remove about 50000 elements from a list,which has 100000
elements.
sample code like below:

a=range(10)
b=range(4)
for x in b: .... a.remove(x)
.... a [4, 5, 6, 7, 8, 9]

when a and b is small size, it will finished quickly, but when a and b
have many elements.
such as: a=range(100000)
b=range(50000)
for x in b:

.... a.remove(x)
....
it will very slowly. Shall I change to another data structure and choos
a better arithmetic?
any suggestion is welcome.
thanks a lot!

May 5 '06 #1

Subscribe Post Reply

1659

Diez B. Roggisch

Ju Hui wrote:

I want to remove about 50000 elements from a list,which has 100000
elements.
sample code like below:
a=range(10)
b=range(4)
for x in b: ... a.remove(x)
... a [4, 5, 6, 7, 8, 9]

when a and b is small size, it will finished quickly, but when a and b
have many elements.
such as: a=range(100000)
b=range(50000)
for x in b:

... a.remove(x)
...
it will very slowly. Shall I change to another data structure and choos
a better arithmetic?

How about a listcomprehension?
new_list = [e for e in old_list if predicate(e)]
# Possibly you need this, too:
old_list[:] = new_list
Diez

May 5 '06 #2

roman.zeyde

Try to use set objects:

a=set(range(100000))
b=set(range(50000))
a = a - b

May 5 '06 #3

Tim Chase

> but when a and b have many elements. such as:

a=range(100000)
b=range(50000)
for x in b:
... a.remove(x)
...
it will very slowly.

Well, your problem is rather ambiguous. In your examples,
you're removing contiguous ranges. Thus, you should be able
to do something like

a = range(100000)
del a[:50000]
which may have a considerable speedup. However, if B
contains a disjoint sets of entries, the simple solution
will require A*B checks, making it dependant on the size of
A and B (as you've discovered).

Assuming the "del" method is considerably faster, you might
be able to do some sort of optimization of the disjoint set,
using the above-mentioned method. Break B into contiguous
pieces, and then pass those as slices to delete.

Or if B is a pattern of some sort, you could do
a = range(100000)
del a[::2] #delete even numbers
a = range(100000)
del a[1::2] #delete odd numbers
a = range(100000)
del a[2::5] # delete every 5th element beginning with the 3rd

Another attempt might be to try
a = [x for x in a if x not in b]

However, this is still doing A*B checks, and will likely
degrade with as their sizes increase.

Just a few ideas.

-tkc

May 5 '06 #4

Jack Orenstein

On May 5, 2006, at 9:36 AM, Ju Hui wrote:

a=range(100000)
b=range(50000)
for x in b:
... a.remove(x)
...
it will very slowly. Shall I change to another data structure and

choos a better arithmetic?
any suggestion is welcome.

If removal is an O(n) operation, then removing 1/2 the list will take
O(n**2), which you don't want. You'd be better off with the contents of
"a" being in a hash table (O(1) removal in practice) or a balanced
tree (O(log n) removal).

Another possibility: If the a and b lists are ordered in the same way,
then you could walk through the lists in order using a merge procedure,
generating a new list as you go.

After ruling out slow data structures and algorithms, you'll almost
certainly be better off using something built in to Python rather than
coding your own data structure in Python.

Jack Orenstein

May 5 '06 #5

Larry Bates

Ju Hui wrote:

I want to remove about 50000 elements from a list,which has 100000
elements.
sample code like below:
a=range(10)
b=range(4)
for x in b: ... a.remove(x)
... a [4, 5, 6, 7, 8, 9]

when a and b is small size, it will finished quickly, but when a and b
have many elements.
such as: a=range(100000)
b=range(50000)
for x in b:

... a.remove(x)
...
it will very slowly. Shall I change to another data structure and choos
a better arithmetic?
any suggestion is welcome.
thanks a lot!

A list isn't going to respond very well to random removals of large
numbers of elements like this. Remember that it must do linear search
to find the value to be removed and then basically build a completely
new list with the remaining values. Depending on the data in question,
you might be able to leverage things. Are the two lists sorted? If so
you can iterate over both of them and build a third list with the results.

This is not particularly 'elegant' but it is fast for sorted lists:

import time
a=range(100000)
b=range(50000)

start_time=time.time()
for x in b:
a.remove(x)

stop_time=time.time()
print "brute force elapsed time=%.2f seconds" % (stop_time-start_time)

start_time=time.time()
n=[]
i1=0
i2=0
while 1:
try: v1=a[i1]
except IndexError:
break

try: v2=b[i2]
except IndexError:
n.extend(a[i1:])
break

if v1 < v2:
n.append(v1)
i1+=1
continue

elif v1 > v2:
i2+=1
continue

else:
i1+=1
i2+=1
continue

stop_time=time.time()
print "new elapsed time=%.2f seconds" % (stop_time-start_time)

start_time=time.time()
a=set(a)
b=set(b)
a.symmetric_difference(b)
stop_time=time.time()
print "sets elapsed time=%.2f seconds" % (stop_time-start_time)

brute force elapsed time=4.13 seconds
new elapsed time=0.05 seconds
sets elapsed time=0.03 seconds

There may be other ways using bisect or some other module. If data
is random, unsorted or has duplicates, I would look at in-memory
database like SQLlite instead.

If the values are unique you can do something like:

a=set(range(100000))
b=set(range(50000))

a.symmetric_difference(b)

Which is the fastest way.

It all depends on the "real" data which we can't tell from your
test using range().

-Larry Bates

May 5 '06 #6

Diez B. Roggisch

> How about a listcomprehension?

new_list = [e for e in old_list if predicate(e)]
# Possibly you need this, too:
old_list[:] = new_list

forgot the predicate. And you should use a set of objects to remove, as then
you'd have O(1) behavior for the in-operator

So:

to_remove = set(b)
new_list = [e for e in a if e not in to_remove]

Diez

May 5 '06 #7

Sion Arrowsmith

Tim Chase <py*********@tim.thechases.com> wrote:

Another attempt might be to try
a = [x for x in a if x not in b]
However, this is still doing A*B checks, and will likely
degrade with as their sizes increase.

Combine this with the use of sets others have suggested if the
order of a matters, ie:

bset = set(b)
a = [ x for x in a if x not in bset ]

which gets you down to O(A) since set membership is O(1).

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

May 5 '06 #8

Ju Hui

cool!
thanks you all!
I choose
a=set(range(100000))
b=set(range(50000))
a.symmetric_difference(b)
certainly,the real data not range(), I am satisfied the performance of
set.difference

thank you all again!

May 5 '06 #9

Peter Otten

Ju Hui wrote:

cool!
thanks you all!
I choose
a=set(range(100000))
b=set(range(50000))
a.symmetric_difference(b)
certainly,the real data not range(), I am satisfied the performance of
set.difference

thank you all again!

Be warned that this may /add/ items to a:

set("abc").symmetric_difference("bcd")

set(['a', 'd'])

If your subject is correct you want difference(), not
symmetric_difference().

Peter

May 5 '06 #10

Andrew Gwozdziewycz

It's easy in this case:

a = range(50000, 100000)

But, I'm just trying to add some humor to this thread :)

On May 5, 2006, at 9:36 AM, Ju Hui wrote:

I want to remove about 50000 elements from a list,which has 100000
elements.
sample code like below:
a=range(10)
b=range(4)
for x in b: ... a.remove(x)
... a [4, 5, 6, 7, 8, 9]

when a and b is small size, it will finished quickly, but when a and b
have many elements.
such as: a=range(100000)
b=range(50000)
for x in b:

... a.remove(x)
...
it will very slowly. Shall I change to another data structure and
choos
a better arithmetic?
any suggestion is welcome.
thanks a lot!

--
http://mail.python.org/mailman/listinfo/python-list

---
Andrew Gwozdziewycz
ap****@gmail.com
http://www.23excuses.com
http://ihadagreatview.org
http://and.rovir.us

May 5 '06 #11

Larry Bates

Peter Otten wrote:

Ju Hui wrote:
cool!
thanks you all!
I choose
a=set(range(100000))
b=set(range(50000))
a.symmetric_difference(b)
certainly,the real data not range(), I am satisfied the performance of
set.difference

thank you all again!

Be warned that this may /add/ items to a:
set("abc").symmetric_difference("bcd")

set(['a', 'd'])

If your subject is correct you want difference(), not
symmetric_difference().

Peter

Whoops. My bad. Peter is correct.

-Larry

May 5 '06 #12

Ju Hui

to Andrew Gwozdziewycz:
Real humor...
Peter Otten:
thanks your reminder, in my project, a will a superset of b.
so symmetric_difference equals difference.
thank you all again!

May 6 '06 #13

Similar topics

Selecting elements from a list

by: Martin Christensen | last post by:

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Howdy! Suppose I have a list A containing elements that I want to put into list B if they satisfy some property. The obvious way of doing it...

Python

STL function REMOVE doesn't work

by: Tony Johansson | last post by:

Hello! I have some problem with STL function remove I have two classes called Handle which is a template class and Integer which is not a template class. The Integer class is just a wrapper...

C / C++

list remove and insert

by: Arne Claus | last post by:

Hi If've just read, that remove() on a list does not actually remove the elements, but places them at the end of the list (according to TC++STL by Josuttis). It also says, that remove returns a...

C / C++

Function to remove an item from a linked list failing ~1 out of 10,000 times

by: Kieran Simkin | last post by:

Hi all, I'm having some trouble with a linked list function and was wondering if anyone could shed any light on it. Basically I have a singly-linked list which stores pid numbers of a process's...

C / C++

numarray :: multiplying all the elements in 1d array

by: Suresh Jeevanandam | last post by:

Hi all, Lets say I have an array: from numarray import * a = array() I want to multiply out all the elements and get the result. r = 1.0 for i in a: r = r*i

Python

Remove "smallest" element from a linked list?

by: eksamor | last post by:

I have a simple linked list: struct element { struct element *next; int start; }; struct list { struct element *head;

C / C++

how to remove multiple occurrences of a string within a list?

by: bahoo | last post by:

Hi, I have a list like and as output I want If I myList.remove('0024') then only the first instance of '0024' is removed.

Python

<select>.remove() Performance and Necessity

by: Richard Maher | last post by:

Hi, I have read many of the copius entries on the subject of IE performance (or the lack thereof) when populating Select Lists. I don't mind the insert performance so much, (I get 100x120byte...

Javascript

Regex to remove \t \r \n from string

by: morleyc | last post by:

Hi, i would like to remove a number of characters from my string (\t \r \n which are throughout the string), i know regex can do this but i have no idea how. Any pointers much appreciated. Chris

C# / C Sharp

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server