Bytes | Software Development & Data Engineering Community

Eliminating duplicate entries from a list efficiently

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul
Jul 18 '05 #1
In article <cc**********@216.39.144.46>, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul


Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The only
problem is that it scrambles the order of the items in the list, which
may or may not be a problem, depending on your application.
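A runnable sketch of this dictionary approach (the function name `uniq_dict` is mine; in Python 3, `keys()` returns a view, so it needs a `list()` around it):

```python
def uniq_dict(items):
    """Collect items as dict keys; duplicate keys collapse automatically.

    Expected O(n) for hashable items. In Python 3.7+ dicts preserve
    insertion order, so the result keeps the first occurrence of each
    item; in older Pythons the order was arbitrary, as Roy notes.
    """
    d = {}
    for item in items:
        d[item] = True
    return list(d.keys())  # list() needed in Python 3, where keys() is a view

print(uniq_dict(['a', 'b', 'c', 'a']))
```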
Jul 18 '05 #2
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u
[...]


That's slow for large lists. Better to use a dictionary for u. Also, it's
unwise to reuse builtin names like "list" as variable names.

Many discussions of this can be found in the mailing list/newsgroup
archives via http://groups.google.com. I also remember some neat
solutions for this problem in the Python Cookbook.

-- Gerhard

Jul 18 '05 #3
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine,


The less naive approach is to go to the Python Cookbook, which contains
lots of recipes for doing all sorts of things, commented on and
improved by the community.

The starting page is:

http://aspn.activestate.com/ASPN/Python/Cookbook/

For example here is a recipe by one Tim Peters that shows how to
do it, including all the subtle cases you didn't even realise
existed.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560

I would highly recommend getting the Python Cookbook dead tree
version.

Roger
Jul 18 '05 #4
[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.


If order is not important, here is a short idiom:

from sets import Set as set
d = list(set(d))

This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So it might be
a good bet to learn to depend on them in the long run.
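For reference, the sets module was folded into a built-in set type in Python 2.4, so the same idiom today needs no import at all (a sketch; as above, the order of the result is not guaranteed):

```python
def uniq_set(items):
    """Deduplicate via a set: expected O(n), but order is not preserved."""
    return list(set(items))

# Sort the result only for a deterministic display; the set itself is unordered.
print(sorted(uniq_set(['a', 'b', 'c', 'a'])))
```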

--
François Pinard http://www.iro.umontreal.ca/~pinard

Jul 18 '05 #5
Roy Smith <ro*@panix.com> wrote in news:roy-39*******************@reader2.panix.com:
Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()


Or just:

aList = dict.fromkeys(aList).keys()

which does the same thing (except I renamed the variable to avoid confusion
with the list type).
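A quick check of the fromkeys idiom (in Python 3, keys() returns a view, so it needs a list() around it; and since dicts preserve insertion order in Python 3.7+, the first occurrence of each item is kept):

```python
a_list = ['a', 'b', 'c', 'a']
# fromkeys builds a dict with each item as a key and None as the value;
# duplicate keys collapse, leaving one entry per distinct item.
a_list = list(dict.fromkeys(a_list))  # list() needed in Python 3
print(a_list)
```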

Jul 18 '05 #6
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    if not list:
        return []  # guard: list[0] below would fail on an empty list
    list.sort()  # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)
Jul 18 '05 #7
Take a look at recipe here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/204297

another way would be to create a dictionary (which would
eliminate the duplicates) and then convert back to a list.

uniq_list = dict(zip(list, list)).keys()
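A quick check of this zip/dict variant (in Python 3, keys() returns a view rather than a list, so a list() wrapper is needed; the sample data is made up for illustration):

```python
data = ['a', 'b', 'c', 'a']
# zip pairs each item with itself; dict() then collapses duplicate keys,
# leaving one entry per distinct item.
uniq_list = list(dict(zip(data, data)).keys())
print(uniq_list)
```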

FYI,
Larry Bates
Syscon, Inc.
"phil hunt" <ze******@zen.co.uk> wrote in message
news:sl********************@cabalamat.cabalamat.or g...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    if not list:
        return []  # guard: list[0] below would fail on an empty list
    list.sort()  # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)

Jul 18 '05 #8
Larry Bates wrote:
Take a look at recipe here:...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow


Or even (you get to the first few earlier):

def uniq(lst):
    matched = {}
    for item in lst:
        if item not in matched:
            yield item
            matched[item] = item
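Since this version is a generator, you consume it with list() or a for loop; the first unique items arrive before the whole input has been scanned. A self-contained usage sketch (uniq_gen is my name for it, to avoid clashing with the earlier definitions):

```python
def uniq_gen(lst):
    """Yield each item the first time it appears, lazily."""
    matched = {}
    for item in lst:
        if item not in matched:
            yield item
            matched[item] = item

# Consume the generator all at once...
print(list(uniq_gen(['a', 'b', 'c', 'a'])))
# ...or stop early without scanning the rest of the input.
first = next(uniq_gen(['a', 'b', 'c', 'a']))
print(first)
```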

--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #9
Paul <pa**@oz.net> writes:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:


I generally do it by sorting the list and then making a pass through
it moving stuff down:

def uniq(a):
    "Remove duplicate elements from a. a must be sorted."
    p = 0
    for i in xrange(1, len(a)):
        if a[i] != a[p]:
            p = p + 1
            a[p] = a[i]
    del a[p+1:]
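A runnable check of this in-place compaction (uniq_inplace is my name for it; range replaces Python 2's xrange):

```python
def uniq_inplace(a):
    """Remove duplicates from a sorted list in place.

    p marks the end of the deduplicated prefix; each new distinct
    element is copied down next to it, and the leftover tail is deleted.
    """
    p = 0
    for i in range(1, len(a)):
        if a[i] != a[p]:
            p = p + 1
            a[p] = a[i]
    del a[p + 1:]

data = ['a', 'a', 'b', 'c', 'c']
uniq_inplace(data)
print(data)
```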
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.
