[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:
Something like:
d = {}
for item in list:
d[item] = True
list = d.keys()
This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.
If order is not important, here is a short idiom:
from sets import Set as set
d = list(set(d))
This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So, it might be
a good bet, learning to depend on them in the long run.
--
François Pinard
http://www.iro.umontreal.ca/~pinard