473,237 Members | 1,258 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,237 software developers and data experts.

Eliminating duplicates entries from a list efficiently

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul
Jul 18 '05 #1
9 2286
In article <cc**********@216.39.144.46>, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul


Something like:

d = {}
for item in list:
d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The only
problem is that it scrambles the order of the items in the list, which
may or may not be a problem, depending on your application.
Jul 18 '05 #2
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u [...]


That's slow for large lists. Better use a dictionary u. Also it's unwise
to reuse builtin names like "list" as variable names.

Many discussions of this can be found in the mailing list/newsgroup
archives via http://groups.google.com I also remember some neat
solutions for this problem in the Python cookbook.

-- Gerhard

Jul 18 '05 #3
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine,


The less naive approach to go to the Python cookbook which contains
lots of recipes for doing all sorts of things, commented on and
improved by the community.

The starting page is:

http://aspn.activestate.com/ASPN/Python/Cookbook/

For example here is a recipe by one Tim Peters that shows how to
do it, including all the subtle cases you didn't even realise
existed.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560

I would highly recommend getting the Python Cookbook dead tree
version.

Roger
Jul 18 '05 #4
[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

Something like: d = {}
for item in list:
d[item] = True
list = d.keys() This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.


If order is not important, here is a short idiom:

from sets import Set as set
d = list(set(d))

This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So, it might be
a good bet, learning to depend on them in the long run.

--
François Pinard http://www.iro.umontreal.ca/~pinard

Jul 18 '05 #5
Roy Smith <ro*@panix.com> wrote in news:roy-
39*******************@reader2.panix.com:
Something like:

d = {}
for item in list:
d[item] = True
list = d.keys()


Or just:

aList = dict.fromkeys(aList).keys()

which does the same thing (except I renamed the variable to avoid confusion
with the list type).

Jul 18 '05 #6
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
list.sort() # put duplicate entries together
lastValue = "different from 0th value" + str(list[0])
u = []
for item in list:
if item != lastValue:
u.append(item)
lastValue = item
return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)
Jul 18 '05 #7
Take a look at recipe here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/204297

another way would be to create a dictionary (which would
eliminate the duplicates) and then convert back to a list.

uniq_list=dict(zip(list,list)).keys()

FYI,
Larry Bates
Syscon, Inc.
"phil hunt" <ze******@zen.co.uk> wrote in message
news:sl********************@cabalamat.cabalamat.or g...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
list.sort() # put duplicate entries together
lastValue = "different from 0th value" + str(list[0])
u = []
for item in list:
if item != lastValue:
u.append(item)
lastValue = item
return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)

Jul 18 '05 #8
Larry Bates wrote:
Take a look at recipe here:...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow


Or even (you get to the first few earlier):

def uniq(lst):
matched = {}
for item in lst:
if item not in matched:
yield item
matched[item] = item

--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #9
Paul <pa**@oz.net> writes:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:


I generally do it by sorting the list and then making a pass through
it moving stuff down:

def uniq(a):
"Remove duplicate elements from a. a must be sorted."
p=0
for i in xrange(1,len(a)):
if a[i] != a[p]:
p=p+1
a[p]=a[i]
del a[p+1:]
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

22
by: christof hoeke | last post by:
hello, this must have come up before, so i am already sorry for asking but a quick googling did not give me any answer. i have a list from which i want a simpler list without the duplicates an...
1
by: A.T. | last post by:
I have an XML document that looks like the following, except that it has about 30,000 entries. <root> <version id="A-0000"> <!many elements within- removed for ease of reading> </version>...
1
by: MHenry | last post by:
Hi, I have a table with duplicate records. Some of the duplicates need to be eliminated from the table and some need not. A duplicate record does not need to be eliminated if the one record...
6
by: Marlene | last post by:
Hi All I have the following scenario, where I have found all the duplicates in a table, based on an order number and a part number (item).I might have something like this: Order PODate Rec...
3
by: AK | last post by:
Hi Our product uses MS-SQL Server 2000. One of our customer has 10 installations with each installation stroring data in its own database. Now the customer wants to consolidate these databases...
9
by: =?Utf-8?B?YWJjZA==?= | last post by:
I have a list of items. Some of the items are another list and that list item could be another list. Now I want to filter the main list by the inner list item value based on some criteria... My...
3
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 3 Jan 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). For other local times, please check World Time Buddy In...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: fareedcanada | last post by:
Hello I am trying to split number on their count. suppose i have 121314151617 (12cnt) then number should be split like 12,13,14,15,16,17 and if 11314151617 (11cnt) then should be split like...
0
Git
by: egorbl4 | last post by:
Скачал я git, хотел начать настройку, а там вылезло вот это Что это? Что мне с этим делать? ...
1
by: davi5007 | last post by:
Hi, Basically, I am trying to automate a field named TraceabilityNo into a web page from an access form. I've got the serial held in the variable strSearchString. How can I get this into the...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.