Bytes | Software Development & Data Engineering Community

Eliminating duplicate entries from a list efficiently

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul
Jul 18 '05 #1
In article <cc**********@216.39.144.46>, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul


Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The only
problem is that it scrambles the order of the items in the list, which
may or may not be a problem, depending on your application.
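A runnable sketch of this dictionary approach (the function name `uniq_dict` is mine; in Python 3, `keys()` returns a view, so it needs a `list()` around it):

```python
def uniq_dict(items):
    """Collect items as dict keys; duplicate keys collapse automatically.

    Expected O(n) for hashable items. In Python 3.7+ dicts preserve
    insertion order, so the result keeps the first occurrence of each
    item; in older Pythons the order was arbitrary, as Roy notes.
    """
    d = {}
    for item in items:
        d[item] = True
    return list(d.keys())  # list() needed in Python 3, where keys() is a view

print(uniq_dict(['a', 'b', 'c', 'a']))
```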
Jul 18 '05 #2
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u
[...]


That's slow for large lists. Better to use a dictionary for u. Also, it's
unwise to reuse builtin names like "list" as variable names.

Many discussions of this can be found in the mailing list/newsgroup
archives via http://groups.google.com. I also remember some neat
solutions for this problem in the Python Cookbook.

-- Gerhard

Jul 18 '05 #3
Paul wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine,


The less naive approach is to go to the Python Cookbook, which contains
lots of recipes for doing all sorts of things, commented on and
improved by the community.

The starting page is:

http://aspn.activestate.com/ASPN/Python/Cookbook/

For example here is a recipe by one Tim Peters that shows how to
do it, including all the subtle cases you didn't even realise
existed.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560

I would highly recommend getting the Python Cookbook dead tree
version.

Roger
Jul 18 '05 #4
[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.


If order is not important, here is a short idiom:

from sets import Set as set
d = list(set(d))

This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So it might be
a good bet to learn to depend on them in the long run.
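For reference, the sets module was folded into a built-in set type in Python 2.4, so the same idiom today needs no import at all (a sketch; as above, the order of the result is not guaranteed):

```python
def uniq_set(items):
    """Deduplicate via a set: expected O(n), but order is not preserved."""
    return list(set(items))

# Sort the result only for a deterministic display; the set itself is unordered.
print(sorted(uniq_set(['a', 'b', 'c', 'a'])))
```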

--
François Pinard http://www.iro.umontreal.ca/~pinard

Jul 18 '05 #5
Roy Smith <ro*@panix.com> wrote in news:roy-39*******************@reader2.panix.com:
Something like:

d = {}
for item in list:
    d[item] = True
list = d.keys()


Or just:

aList = dict.fromkeys(aList).keys()

which does the same thing (except I renamed the variable to avoid confusion
with the list type).
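A quick check of the fromkeys idiom (in Python 3, keys() returns a view, so it needs a list() around it; and since dicts preserve insertion order in Python 3.7+, the first occurrence of each item is kept):

```python
a_list = ['a', 'b', 'c', 'a']
# fromkeys builds a dict with each item as a key and None as the value;
# duplicate keys collapse, leaving one entry per distinct item.
a_list = list(dict.fromkeys(a_list))  # list() needed in Python 3
print(a_list)
```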

Jul 18 '05 #6
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    if not list:
        return []  # guard: list[0] below would fail on an empty list
    list.sort()  # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)
Jul 18 '05 #7
Take a look at recipe here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/204297

another way would be to create a dictionary (which would
eliminate the duplicates) and then convert back to a list.

uniq_list = dict(zip(list, list)).keys()
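A quick check of this zip/dict variant (in Python 3, keys() returns a view rather than a list, so a list() wrapper is needed; the sample data is made up for illustration):

```python
data = ['a', 'b', 'c', 'a']
# zip pairs each item with itself; dict() then collapses duplicate keys,
# leaving one entry per distinct item.
uniq_list = list(dict(zip(data, data)).keys())
print(uniq_list)
```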

FYI,
Larry Bates
Syscon, Inc.
"phil hunt" <ze******@zen.co.uk> wrote in message
news:sl********************@cabalamat.cabalamat.or g...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
    u = []
    for x in list:
        if x not in u: u.append(x)
    return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

How about:

def uniq(list):
    if not list:
        return []  # guard: list[0] below would fail on an empty list
    list.sort()  # put duplicate entries together
    lastValue = "different from 0th value" + str(list[0])
    u = []
    for item in list:
        if item != lastValue:
            u.append(item)
            lastValue = item
    return u

--
"It's easier to find people online who openly support the KKK than
people who openly support the RIAA" -- comment on Wikipedia
(Email: zen19725 at zen dot co dot uk)

Jul 18 '05 #8
Larry Bates wrote:
Take a look at recipe here:...
On 3 Jul 2004 00:40:00 GMT, Paul <pa**@oz.net> wrote:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow


Or even (you get to the first few earlier):

def uniq(lst):
    matched = {}
    for item in lst:
        if item not in matched:
            yield item
            matched[item] = item
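Since this version is a generator, you consume it with list() or a for loop; the first unique items arrive before the whole input has been scanned. A self-contained usage sketch (uniq_gen is my name for it, to avoid clashing with the earlier definitions):

```python
def uniq_gen(lst):
    """Yield each item the first time it appears, lazily."""
    matched = {}
    for item in lst:
        if item not in matched:
            yield item
            matched[item] = item

# Consume the generator all at once...
print(list(uniq_gen(['a', 'b', 'c', 'a'])))
# ...or stop early without scanning the rest of the input.
first = next(uniq_gen(['a', 'b', 'c', 'a']))
print(first)
```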

--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #9
Paul <pa**@oz.net> writes:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:


I generally do it by sorting the list and then making a pass through
it moving stuff down:

def uniq(a):
    "Remove duplicate elements from a. a must be sorted."
    p = 0
    for i in xrange(1, len(a)):
        if a[i] != a[p]:
            p = p + 1
            a[p] = a[i]
    del a[p+1:]
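A runnable check of this in-place compaction (uniq_inplace is my name for it; range replaces Python 2's xrange):

```python
def uniq_inplace(a):
    """Remove duplicates from a sorted list in place.

    p marks the end of the deduplicated prefix; each new distinct
    element is copied down next to it, and the leftover tail is deleted.
    """
    p = 0
    for i in range(1, len(a)):
        if a[i] != a[p]:
            p = p + 1
            a[p] = a[i]
    del a[p + 1:]

data = ['a', 'a', 'b', 'c', 'c']
uniq_inplace(data)
print(data)
```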
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.
