473,406 Members | 2,707 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

finding items that occur more than once in a list

Is there a more efficient way to do this?

def f(L):
'''Return a set of the items that occur more than once in L.'''
L = list(L)
for item in set(L):
L.remove(item)
return set(L)
|>f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])

Mar 18 '08 #1
22 6855
On Mar 18, 11:57*am, Simon Forman <sajmik...@gmail.comwrote:
Is there a more efficient way to do this?

def f(L):
* * '''Return a set of the items that occur more than once in L.'''
* * L = list(L)
* * for item in set(L):
* * * * L.remove(item)
* * return set(L)

|>f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])
def f(L):
D = dict()
for item in L:
if item in D:
D[item] += 1
else:
D[item] = 1
return [i for i,j in D.items() if j 1]

That would be my way to do it, would need to test it via several
thousand iterations to see which one is most efficient though.
Mar 18 '08 #2
Simon Forman <sa*******@gmail.comwrites:
Is there a more efficient way to do this?
http://aspn.activestate.com/ASPN/Coo.../Recipe/502263
Mar 18 '08 #3
Simon Forman wrote:
Is there a more efficient way to do this?

def f(L):
'''Return a set of the items that occur more than once in L.'''
L = list(L)
for item in set(L):
L.remove(item)
return set(L)
That's neat, but quadratic time because list.remove() requires
a linear search. We can make an efficient variant by using
remove on a set rather than a list:

def multiples(lst):
singles = set(lst)
mults = set()
for x in lst:
if x in singles:
singles.remove(x)
else:
mults.add(x)
return mults

Though probably better is:

def multiples(lst):
seen = set()
mults = set()
for x in lst:
if x in seen:
mults.add(x)
else:
seen.add(x)
return mults
I've typically used dicts for such things, as in:

def multiples(lst):
h = {}
for x in lst:
h[x] = h.get(x, 0) + 1
return set([x for x in h if h[x] 1])
--
--Bryan
Mar 18 '08 #4
On Mar 18, 2:57 am, Simon Forman <sajmik...@gmail.comwrote:
Is there a more efficient way to do this?

def f(L):
'''Return a set of the items that occur more than once in L.'''
L = list(L)
for item in set(L):
L.remove(item)
return set(L)

|>f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])

def f(L):
seen = set()
dups = set()
for e in L:
if e in seen:
dups.add(e)
else:
seen.add(e)
return dups

Raymond
Mar 18 '08 #5
Ninereeds <st*************@aol.comwrites:
As for the growth pattern, each time you grow the table you have to
redistribute all the items previously inserted to new locations.
Resizes would get rarer as more items are added due to the
exponential growth, but every table resize would take longer too
since there are more items to move. Inserting n items still
intuitively looks like O(n^2) to me.

That said, it does remind me of that old exponential realloc trick
for array resizing. Same thing, I suppose, since a hash table is
basically an array. Maybe my math "intuition" is just wrong.
That's it. While table resizes grow linearly in complexity, they
become geometrically rarer. This is exactly what happens when
resizing dynamic arrays such as Python lists. Applying your
intuition, appending n elements to a Python list would also have
O(n^2) complexity, which it doesn't. See, for example,
http://en.wikipedia.org/wiki/Dynamic...amortized_cost
Mar 18 '08 #6
On 18 Mar, 10:57, Simon Forman <sajmik...@gmail.comwrote:
def f(L):
'''Return a set of the items that occur more than once in L.'''
L = list(L)
for item in set(L):
L.remove(item)
return set(L)

def nonunique(lst):
slst = sorted(lst)
return list(set([s[0] for s in
filter(lambda t : not(t[0]-t[1]), zip(slst[:-1],slst[1:]))]))


Mar 18 '08 #7
On 18 Mar, 22:22, sturlamolden <sturlamol...@yahoo.nowrote:
def nonunique(lst):
slst = sorted(lst)
return list(set([s[0] for s in
filter(lambda t : not(t[0]-t[1]), zip(slst[:-1],slst[1:]))]))

Or perhaps better:

def nonunique(lst):
slst = sorted(lst)
return list(set([s[0] for s in
filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))

Mar 18 '08 #8
On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.nowrote:
def nonunique(lst):
slst = sorted(lst)
return list(set([s[0] for s in
filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))
Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
the set function, there is more structure to exploit when the list is
sorted:
def nonunique(lst):
slst = sorted(lst)
dups = [s[0] for s in
filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
return [dups[0]] + [s[1] for s in
filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]

Mar 18 '08 #9
On Mar 18, 9:56*pm, sturlamolden <sturlamol...@yahoo.nowrote:
On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.nowrote:
def nonunique(lst):
* *slst = sorted(lst)
* *return list(set([s[0] for s in
* * * *filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))

Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
the set function, there is more structure to exploit when the list is
sorted:

def nonunique(lst):
* *slst = sorted(lst)
* *dups = [s[0] for s in
* * * * filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
* *return [dups[0]] + [s[1] for s in
* * * * filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
Argh! What's wrong with something like:

def duplicates(l):
i = j = object()
for k in sorted(l):
if i != j == k: yield k
i, j = j, k
>>list(duplicates([1, 2, 3, 2, 2, 4, 1]))
[1, 2]

--
Arnaud

Mar 18 '08 #10
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.comwrote:
def nonunique(lst):
slst = sorted(lst)
dups = [s[0] for s in
filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
return [dups[0]] + [s[1] for s in
filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]

Argh! What's wrong with something like:

def duplicates(l):
i = j = object()
for k in sorted(l):
if i != j == k: yield k
i, j = j, k

Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).


Mar 18 '08 #11
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.nowrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.comwrote:
def nonunique(lst):
slst = sorted(lst)
dups = [s[0] for s in
filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
return [dups[0]] + [s[1] for s in
filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
Argh! What's wrong with something like:
def duplicates(l):
i = j = object()
for k in sorted(l):
if i != j == k: yield k
i, j = j, k

Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
Mar 19 '08 #12
On 19 Mar, 22:48, John Machin <sjmac...@lexicon.netwrote:
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
Is a Python set implemented using a hash table?
Mar 19 '08 #13
On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.netwrote:
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.nowrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.comwrote:
def nonunique(lst):
slst = sorted(lst)
dups = [s[0] for s in
filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
return [dups[0]] + [s[1] for s in
filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
Argh! What's wrong with something like:
def duplicates(l):
i = j = object()
for k in sorted(l):
if i != j == k: yield k
i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).

I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
It's not as much O(N)... Paul Robin's uses a sort first which is
definitely not O(N). Paul's could be prettied up a bit but the general
principle is sound.
Mar 19 '08 #14
On Mar 20, 9:57 am, Justin Bozonier <darkxant...@gmail.comwrote:
On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.netwrote:
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.nowrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.comwrote:
def nonunique(lst):
slst = sorted(lst)
dups = [s[0] for s in
filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
return [dups[0]] + [s[1] for s in
filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
Argh! What's wrong with something like:
def duplicates(l):
i = j = object()
for k in sorted(l):
if i != j == k: yield k
i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.

It's not as much O(N)... Paul Robin's uses a sort first which is
definitely not O(N). Paul's could be prettied up a bit but the general
principle is sound.
"""
from collections import defaultdict
def nonunique(lst):
d = defaultdict(int)
for x in lst: d[x] += 1
return [x for x,n in d.iterkeys() if n 1]
"""

I see no sort here.
Mar 19 '08 #15
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.nowrote:
On 19 Mar, 22:48, John Machin <sjmac...@lexicon.netwrote:
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.

Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?

Mar 19 '08 #16
On 20 Mar, 00:16, John Machin <sjmac...@lexicon.netwrote:
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
I had not looked at it, but now I have. Is seems Hettinger is the
author :) Ok, so sets are implemented as hash tables. Then I agree,
use Raymond Hettinger's solution.
Mar 20 '08 #17
En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sj******@lexicon.net>
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.nowrote:
>Is a Python set implemented using a hash table?

What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
That comment was unnecesarily harsh, don't you think?

--
Gabriel Genellina

Mar 20 '08 #18
On Mar 20, 12:50*pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sjmac...@lexicon.net*
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.nowrote:
Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?

That comment was unnecesarily harsh, don't you think?
No, otherwise I wouldn't have made it.
Mar 20 '08 #19
On Mar 20, 2:07*am, John Machin <sjmac...@lexicon.netwrote:
On Mar 20, 12:50*pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sjmac...@lexicon.net*
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.nowrote:
>Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
That comment was unnecesarily harsh, don't you think?

No, otherwise I wouldn't have made it.
Screenfuls is an awful lot, the way you guys write. Besides, what's
the answer? What don't you understand?
Mar 21 '08 #20
On Mar 21, 3:39*pm, John Machin <sjmac...@lexicon.netwrote:
On Mar 22, 1:11 am, castiro...@gmail.com wrote:
A collision sequence is not so rare.
>>[ hash( 2**i ) for i in range( 0, 256, 32 ) ]
[1, 1, 1, 1, 1, 1, 1, 1]

Bryan did qualify his remarks: "If we exclude the case where an
adversary is choosing the keys, ..."
Some adversary. What, you mean, my boss or my customers?
Mar 22 '08 #21
ca********@gmail.com wrote:
John Machin wrote:
>On Mar 22, 1:11 am, castiro...@gmail.com wrote:
>>A collision sequence is not so rare.
>[ hash( 2**i ) for i in range( 0, 256, 32 ) ]
[1, 1, 1, 1, 1, 1, 1, 1]
>Bryan did qualify his remarks: "If we exclude the case where an
adversary is choosing the keys, ..."

Some adversary. What, you mean, my boss or my customers?
We mean that the party supplying the keys deliberately chose
them to make the hash table inefficient. In this thread the goal
is efficiency; a party working toward an opposing goal is an
adversary.

If you find real-world data sets that tend to induce bad-case
behavior in Python's hash, please do tell. It would be reason
enough to adjust the hash function. The hashes in popular
software such as Python are already quite well vetted.

Even a hash function that behaves as a random oracle has
worst-case quadratic-time in the algorithm here, but that's
an issue in theory and not in serving customers.
--
--Bryan
Mar 22 '08 #22
On 22 Mar, 09:31, Bryan Olson <fakeaddr...@nowhere.orgwrote:
Even a hash function that behaves as a random oracle has
worst-case quadratic-time in the algorithm here
In which case inserts are not amortized to O(1).
Mar 22 '08 #23

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Xif | last post by:
Hi I want to have control over all items added to a list. The first attempt was to subclass list and override its .append() method. Problem is, there are plenty of other ways the list can...
2
by: Arpan De | last post by:
Suppose I have the following ASP code: <% Dim strName ..................... ..................... ..................... ..................... ..................... .....................
6
by: Nimesh | last post by:
I need to find customer's names that repeat / occur more than once in the same table. I have treid many options and I have tried comparing the column to itself but Oracle gives me an error. ...
1
by: Simon Forman | last post by:
I've got a function that I'd like to improve. It takes a list of lists and a "target" element, and it returns the set of the items in the lists that appear either before or after the target...
5
by: ShaunO | last post by:
BACKGROUND I have the following classes as part of a program that opens 3 Asynchronous Sockets. Each socket is in a separate instance of a wrapping class. 1x User Interface class 1x Client...
3
by: MikeP | last post by:
Hi there, I've tried hunting for this in the C# specs but can't find the relevant info. Does anyone know what the official rule says (or doesn't say) about removing an event more than once. If...
1
by: wbsurfver | last post by:
Here is the problem, I am working with a bunch of old code so I can really restructure the includes, otherwise I guess I could change all the #include statements to #include-once. Anyway, If I...
1
by: puzzlecracker | last post by:
I get this error during compilation and not sure what the source of it is: Error 2 The item "value\port\Port.cs" was specified more than once in the "Sources" parameter. Duplicate items are not...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.