By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
 440,928 Members | 1,210 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,928 IT Pros & Developers. It's quick & easy.

finding items that occur more than once in a list

 P: n/a Is there a more efficient way to do this? def f(L): '''Return a set of the items that occur more than once in L.''' L = list(L) for item in set(L): L.remove(item) return set(L) |>f([0, 0, 1, 1, 2, 2, 3]) set([0, 1, 2]) Mar 18 '08 #1
Share this Question
22 Replies

 P: n/a On Mar 18, 11:57*am, Simon Forman f([0, 0, 1, 1, 2, 2, 3]) set([0, 1, 2]) def f(L): D = dict() for item in L: if item in D: D[item] += 1 else: D[item] = 1 return [i for i,j in D.items() if j 1] That would be my way to do it, would need to test it via several thousand iterations to see which one is most efficient though. Mar 18 '08 #2

 P: n/a Simon Forman

 P: n/a Simon Forman wrote: Is there a more efficient way to do this? def f(L): '''Return a set of the items that occur more than once in L.''' L = list(L) for item in set(L): L.remove(item) return set(L) That's neat, but quadratic time because list.remove() requires a linear search. We can make an efficient variant by using remove on a set rather than a list: def multiples(lst): singles = set(lst) mults = set() for x in lst: if x in singles: singles.remove(x) else: mults.add(x) return mults Though probably better is: def multiples(lst): seen = set() mults = set() for x in lst: if x in seen: mults.add(x) else: seen.add(x) return mults I've typically used dicts for such things, as in: def multiples(lst): h = {} for x in lst: h[x] = h.get(x, 0) + 1 return set([x for x in h if h[x] 1]) -- --Bryan Mar 18 '08 #4

 P: n/a On Mar 18, 2:57 am, Simon Forman f([0, 0, 1, 1, 2, 2, 3]) set([0, 1, 2]) def f(L): seen = set() dups = set() for e in L: if e in seen: dups.add(e) else: seen.add(e) return dups Raymond Mar 18 '08 #5

 P: n/a Ninereeds

 P: n/a On 18 Mar, 10:57, Simon Forman

 P: n/a On 18 Mar, 22:22, sturlamolden

 P: n/a On 18 Mar, 22:25, sturlamolden

 P: n/a On Mar 18, 9:56*pm, sturlamolden >list(duplicates([1, 2, 3, 2, 2, 4, 1])) [1, 2] -- Arnaud Mar 18 '08 #10

 P: n/a On 18 Mar, 23:45, Arnaud Delobelle

 P: n/a On Mar 19, 10:08 am, sturlamolden

 P: n/a On 19 Mar, 22:48, John Machin

 P: n/a On Mar 19, 2:48 pm, John Machin

 P: n/a On Mar 20, 9:57 am, Justin Bozonier

 P: n/a On Mar 20, 9:14 am, sturlamolden

 P: n/a On 20 Mar, 00:16, John Machin

 P: n/a En Wed, 19 Mar 2008 20:16:36 -0300, John Machin escribió: On Mar 20, 9:14 am, sturlamolden Is a Python set implemented using a hash table? What don't you understand about the comments in the first two screenfuls of Objects/setobject.c? That comment was unnecesarily harsh, don't you think? -- Gabriel Genellina Mar 20 '08 #18

 P: n/a On Mar 20, 12:50*pm, "Gabriel Genellina" wrote: En Wed, 19 Mar 2008 20:16:36 -0300, John Machin

 P: n/a On Mar 20, 2:07*am, John Machin wrote:En Wed, 19 Mar 2008 20:16:36 -0300, John Machin Is a Python set implemented using a hash table? What don't you understand about the comments in the first two screenfuls of Objects/setobject.c? That comment was unnecesarily harsh, don't you think? No, otherwise I wouldn't have made it. Screenfuls is an awful lot, the way you guys write. Besides, what's the answer? What don't you understand? Mar 21 '08 #20

 P: n/a On Mar 21, 3:39*pm, John Machin >[ hash( 2**i ) for i in range( 0, 256, 32 ) ] [1, 1, 1, 1, 1, 1, 1, 1] Bryan did qualify his remarks: "If we exclude the case where an adversary is choosing the keys, ..." Some adversary. What, you mean, my boss or my customers? Mar 22 '08 #21

 P: n/a ca********@gmail.com wrote: John Machin wrote: >On Mar 22, 1:11 am, castiro...@gmail.com wrote: >>A collision sequence is not so rare.>[ hash( 2**i ) for i in range( 0, 256, 32 ) ][1, 1, 1, 1, 1, 1, 1, 1] >Bryan did qualify his remarks: "If we exclude the case where anadversary is choosing the keys, ..." Some adversary. What, you mean, my boss or my customers? We mean that the party supplying the keys deliberately chose them to make the hash table inefficient. In this thread the goal is efficiency; a party working toward an opposing goal is an adversary. If you find real-world data sets that tend to induce bad-case behavior in Python's hash, please do tell. It would be reason enough to adjust the hash function. The hashes in popular software such as Python are already quite well vetted. Even a hash function that behaves as a random oracle has worst-case quadratic-time in the algorithm here, but that's an issue in theory and not in serving customers. -- --Bryan Mar 22 '08 #22

 P: n/a On 22 Mar, 09:31, Bryan Olson

This discussion thread is closed

Replies have been disabled for this discussion.