Bytes | Developer Community
Eliminating duplicates entries from a list efficiently

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul
Jul 18 '05 #1
In article <cc**********@216.39.144.46>, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul


Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The only
problem is that it scrambles the order of the items in the list, which
may or may not be a problem, depending on your application.
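For reference, a runnable sketch of this idea (the helper name uniq_dict is illustrative; note that in Python 3.7+ dictionaries preserve insertion order, so the scrambling caveat only applies to older versions):

```python
def uniq_dict(items):
    # Build one dict key per distinct item; O(n) expected time overall.
    seen = {}
    for item in items:
        seen[item] = True
    return list(seen.keys())
```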
Jul 18 '05 #2
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u [...]


That's slow for large lists. Better to use a dictionary for u. Also, it's
unwise to reuse builtin names like "list" as variable names.

Many discussions of this can be found in the mailing list/newsgroup
archives via http://groups.google.com. I also remember some neat
solutions for this problem in the Python Cookbook.

-- Gerhard

Jul 18 '05 #3
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine,


The less naive approach is to go to the Python Cookbook, which contains
lots of recipes for doing all sorts of things, commented on and
improved by the community.

The starting page is:

http://aspn.activestate.com/ASPN/Python/Cookbook/

For example here is a recipe by one Tim Peters that shows how to
do it, including all the subtle cases you didn't even realise
existed.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560

I would highly recommend getting the Python Cookbook dead tree
version.

Roger
Jul 18 '05 #4
[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.


If order is not important, here is a short idiom:

from sets import Set as set
d = list(set(d))

This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
to make them faster in subsequent Python releases. So it might be a
good bet to learn to depend on them in the long run.
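In modern Python the sets module is gone and set is a builtin, so the same idiom needs no import (a sketch; the function name uniq_set is illustrative):

```python
def uniq_set(items):
    # set is a builtin in modern Python; no "from sets import Set" needed.
    # O(n) expected time, but the result order is arbitrary.
    return list(set(items))
```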

--
François Pinard http://www.iro.umontreal.ca/~pinard

Jul 18 '05 #5
Roy Smith <ro*@panix.com> wrote in news:roy-
39*******************@reader2.panix.com:
Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()


Or just:

aList = dict.fromkeys(aList).keys()

which does the same thing (except I renamed the variable to avoid confusion
with the list type).
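Wrapped in a function for modern Python (the name uniq_fromkeys is illustrative), this one-liner becomes:

```python
def uniq_fromkeys(items):
    # dict.fromkeys builds one key per distinct item in a single call;
    # in Python 3.7+ the keys come back in first-seen order.
    return list(dict.fromkeys(items))
```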

Jul 18 '05 #6
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    list.sort() # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u
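The same sort-then-scan strategy can be written with the standard library's itertools.groupby, which avoids hand-rolling the sentinel value (a sketch; the name uniq_groupby is illustrative):

```python
from itertools import groupby

def uniq_groupby(items):
    # Sorting brings duplicates together; groupby then collapses each
    # run of equal values down to its key.
    return [key for key, _ in groupby(sorted(items))]
```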

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)
Jul 18 '05 #7
Take a look at recipe here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/204297

Another way would be to create a dictionary (which would
eliminate the duplicates) and then convert back to a list:

uniq_list = dict(zip(list, list)).keys()

FYI,
Larry Bates
Syscon, Inc.
"phil hunt" <ze******@zen.co.uk> wrote in message
news:sl********************@cabalamat.cabalamat.org...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    list.sort() # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)

Jul 18 '05 #8
Larry Bates wrote:
Take a look at recipe here:...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow


Or even (you get to the first few earlier):

def uniq(lst):
    matched = {}
    for item in lst:
        if item not in matched:
            yield item
            matched[item] = item
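Because this version is a generator, a caller can stop consuming it early and the rest of the input is never scanned. A usage sketch (names are illustrative):

```python
def uniq_lazy(lst):
    # Generator version: yields each item the first time it is seen.
    seen = {}
    for item in lst:
        if item not in seen:
            yield item
            seen[item] = True

# Take only the first two unique items, then stop iterating.
first_two = []
for item in uniq_lazy(['a', 'b', 'a', 'c', 'd']):
    first_two.append(item)
    if len(first_two) == 2:
        break
```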

--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #9
Paul <pa**@oz.net> writes:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:


I generally do it by sorting the list and then making a pass through
it moving stuff down:

def uniq(a):
    "Remove duplicate elements from a. a must be sorted."
    p = 0
    for i in xrange(1, len(a)):
        if a[i] != a[p]:
            p = p + 1
            a[p] = a[i]
    del a[p+1:]
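A modern-Python adaptation of the same in-place pass (range replaces xrange; the function name is illustrative), with a usage line showing that the list is modified rather than returned:

```python
def uniq_inplace(a):
    """Remove duplicate elements from a in place. a must be sorted."""
    p = 0
    for i in range(1, len(a)):
        if a[i] != a[p]:
            # Item differs from the last kept value: slide it down.
            p += 1
            a[p] = a[i]
    del a[p + 1:]  # drop the leftover tail of duplicates

data = ['a', 'a', 'b', 'c', 'c']
uniq_inplace(data)
```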
Jul 18 '05 #10
