Is there a more efficient way to do this?

def f(L):
    '''Return a set of the items that occur more than once in L.'''
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)

>>> f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])
On Mar 18, 11:57 am, Simon Forman <sajmik...@gmail.com> wrote:
Is there a more efficient way to do this?

def f(L):
    '''Return a set of the items that occur more than once in L.'''
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)

>>> f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])
def f(L):
    D = dict()
    for item in L:
        if item in D:
            D[item] += 1
        else:
            D[item] = 1
    return [i for i, j in D.items() if j > 1]
That would be my way to do it; you'd need to test it over several
thousand iterations to see which one is most efficient, though.
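The "several thousand iterations" test described above can be sketched with the standard timeit module (Python 3 syntax; the function names, sample data, and repetition count here are illustrative, not from the thread):

```python
import timeit

def f_remove(L):
    """Original approach: delete one copy of each distinct item; the leftovers are duplicates."""
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)

def f_count(L):
    """Counting approach: tally occurrences, keep items seen more than once."""
    D = {}
    for item in L:
        D[item] = D.get(item, 0) + 1
    return set(i for i, j in D.items() if j > 1)

data = list(range(200)) * 2   # every value occurs exactly twice

assert f_remove(data) == f_count(data)  # both must agree before timing them

t_remove = timeit.timeit(lambda: f_remove(data), number=200)
t_count = timeit.timeit(lambda: f_count(data), number=200)
print(f"remove-based: {t_remove:.4f}s  count-based: {t_count:.4f}s")
```

As the list grows, the remove-based version should fall behind, since each `list.remove()` is a linear scan.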
Simon Forman wrote:
Is there a more efficient way to do this?

def f(L):
    '''Return a set of the items that occur more than once in L.'''
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)
That's neat, but quadratic time because list.remove() requires
a linear search. We can make an efficient variant by using
remove on a set rather than a list:
def multiples(lst):
    singles = set(lst)
    mults = set()
    for x in lst:
        if x in singles:
            singles.remove(x)
        else:
            mults.add(x)
    return mults
Though probably better is:
def multiples(lst):
    seen = set()
    mults = set()
    for x in lst:
        if x in seen:
            mults.add(x)
        else:
            seen.add(x)
    return mults
I've typically used dicts for such things, as in:
def multiples(lst):
    h = {}
    for x in lst:
        h[x] = h.get(x, 0) + 1
    return set([x for x in h if h[x] > 1])

Bryan  
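As an aside, the dict-tally idiom Bryan describes is exactly what `collections.Counter` (added later, in Python 2.7/3.1) packages up. A sketch of the same function in those terms, written for Python 3:

```python
from collections import Counter

def multiples(lst):
    """Return the set of items occurring more than once in lst."""
    counts = Counter(lst)  # one pass: O(n) expected with a well-behaved hash
    return {x for x, n in counts.items() if n > 1}

print(multiples([0, 0, 1, 1, 2, 2, 3]))  # → {0, 1, 2}
```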
On Mar 18, 2:57 am, Simon Forman <sajmik...@gmail.com> wrote:
Is there a more efficient way to do this?

def f(L):
    '''Return a set of the items that occur more than once in L.'''
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)

>>> f([0, 0, 1, 1, 2, 2, 3])
set([0, 1, 2])
def f(L):
    seen = set()
    dups = set()
    for e in L:
        if e in seen:
            dups.add(e)
        else:
            seen.add(e)
    return dups
Raymond  
Ninereeds <st*************@aol.com> writes:
As for the growth pattern, each time you grow the table you have to
redistribute all the items previously inserted to new locations.
Resizes would get rarer as more items are added due to the
exponential growth, but every table resize would take longer too
since there are more items to move. Inserting n items still
intuitively looks like O(n^2) to me.
That said, it does remind me of that old exponential realloc trick
for array resizing. Same thing, I suppose, since a hash table is
basically an array. Maybe my math "intuition" is just wrong.
That's it. While table resizes grow linearly in complexity, they
become geometrically rarer. This is exactly what happens when
resizing dynamic arrays such as Python lists. Applying your
intuition, appending n elements to a Python list would also have
O(n^2) complexity, which it doesn't. See, for example, http://en.wikipedia.org/wiki/Dynamic...amortized_cost  
On 18 Mar, 10:57, Simon Forman <sajmik...@gmail.com> wrote:
def f(L):
    '''Return a set of the items that occur more than once in L.'''
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)
def nonunique(lst):
    slst = sorted(lst)
    return list(set([s[0] for s in
        filter(lambda t : not(t[0] < t[1]), zip(slst[:-1], slst[1:]))]))
On 18 Mar, 22:22, sturlamolden <sturlamol...@yahoo.no> wrote:
def nonunique(lst):
    slst = sorted(lst)
    return list(set([s[0] for s in
        filter(lambda t : not(t[0] < t[1]), zip(slst[:-1], slst[1:]))]))
Or perhaps better:
def nonunique(lst):
    slst = sorted(lst)
    return list(set([s[0] for s in
        filter(lambda t : t[0] != t[1], zip(slst[:-1], slst[1:]))]))
On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.no> wrote:
def nonunique(lst):
    slst = sorted(lst)
    return list(set([s[0] for s in
        filter(lambda t : t[0] != t[1], zip(slst[:-1], slst[1:]))]))
Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
the set function, there is more structure to exploit when the list is
sorted:
def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]
On Mar 18, 9:56 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.no> wrote:

def nonunique(lst):
    slst = sorted(lst)
    return list(set([s[0] for s in
        filter(lambda t : t[0] != t[1], zip(slst[:-1], slst[1:]))]))

Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
the set function, there is more structure to exploit when the list is
sorted:

def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]
Argh! What's wrong with something like:
def duplicates(l):
    i = j = object()
    for k in sorted(l):
        if i != j == k: yield k
        i, j = j, k

>>> list(duplicates([1, 2, 3, 2, 2, 4, 1]))
[1, 2]

Arnaud  
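Arnaud's adjacent-compare scan over the sorted list can also be phrased with itertools.groupby, which does the run bookkeeping itself. This is a sketch of the same O(n log n) idea, not anyone's posted code:

```python
from itertools import groupby

def duplicates(lst):
    """Yield each value that appears more than once, in sorted order."""
    for key, group in groupby(sorted(lst)):
        if sum(1 for _ in group) > 1:  # run length > 1 means a duplicate
            yield key

print(list(duplicates([1, 2, 3, 2, 2, 4, 1])))  # → [1, 2]
```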
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]

Argh! What's wrong with something like:

def duplicates(l):
    i = j = object()
    for k in sorted(l):
        if i != j == k: yield k
        i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).  
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]

Argh! What's wrong with something like:

def duplicates(l):
    i = j = object()
    for k in sorted(l):
        if i != j == k: yield k
        i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.  
On 19 Mar, 22:48, John Machin <sjmac...@lexicon.net> wrote:
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
Is a Python set implemented using a hash table?  
On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.net> wrote:
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]

Argh! What's wrong with something like:

def duplicates(l):
    i = j = object()
    for k in sorted(l):
        if i != j == k: yield k
        i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
It's not as much O(N)... Paul Robin's uses a sort first which is
definitely not O(N). Paul's could be prettied up a bit but the general
principle is sound.  
On Mar 20, 9:57 am, Justin Bozonier <darkxant...@gmail.com> wrote:
On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.net> wrote:
On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
def nonunique(lst):
    slst = sorted(lst)
    dups = [s[0] for s in
        filter(lambda t : t[0] == t[1], zip(slst[:-1], slst[1:]))]
    return [dups[0]] + [s[1] for s in
        filter(lambda t : t[0] != t[1], zip(dups[:-1], dups[1:]))]

Argh! What's wrong with something like:

def duplicates(l):
    i = j = object()
    for k in sorted(l):
        if i != j == k: yield k
        i, j = j, k
Nice, and more readable. But I'd use Paul Robin's solution. It is O(N)
as opposed to ours which are O(N log N).
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
It's not as much O(N)... Paul Robin's uses a sort first which is
definitely not O(N). Paul's could be prettied up a bit but the general
principle is sound.
"""
from collections import defaultdict
def nonunique(lst):
d = defaultdict(int)
for x in lst: d[x] += 1
return [x for x,n in d.iterkeys() if n 1]
"""
I see no sort here.  
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.no> wrote:
On 19 Mar, 22:48, John Machin <sjmac...@lexicon.net> wrote:
I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
and is IMHO more readable than Paul's.
Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?  
On 20 Mar, 00:16, John Machin <sjmac...@lexicon.net> wrote:
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
I had not looked at it, but now I have. It seems Hettinger is the
author :) Ok, so sets are implemented as hash tables. Then I agree:
use Raymond Hettinger's solution.
En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sj******@lexicon.net>
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.no> wrote:
>Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
That comment was unnecessarily harsh, don't you think?

Gabriel Genellina  
On Mar 20, 12:50 pm, "Gabriel Genellina" <gagsl...@yahoo.com.ar>
wrote:
En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sjmac...@lexicon.net>
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.no> wrote:
Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
That comment was unnecessarily harsh, don't you think?
No, otherwise I wouldn't have made it.  
On Mar 20, 2:07 am, John Machin <sjmac...@lexicon.net> wrote:
On Mar 20, 12:50 pm, "Gabriel Genellina" <gagsl...@yahoo.com.ar>
wrote:
En Wed, 19 Mar 2008 20:16:36 -0300, John Machin <sjmac...@lexicon.net>
escribió:
On Mar 20, 9:14 am, sturlamolden <sturlamol...@yahoo.no> wrote:
>Is a Python set implemented using a hash table?
What don't you understand about the comments in the first two
screenfuls of Objects/setobject.c?
That comment was unnecessarily harsh, don't you think?
No, otherwise I wouldn't have made it.
Screenfuls is an awful lot, the way you guys write. Besides, what's
the answer? What don't you understand?  
On Mar 21, 3:39 pm, John Machin <sjmac...@lexicon.net> wrote:
On Mar 22, 1:11 am, castiro...@gmail.com wrote:
A collision sequence is not so rare.
>>> [hash(2**i) for i in range(0, 256, 32)]
[1, 1, 1, 1, 1, 1, 1, 1]
Bryan did qualify his remarks: "If we exclude the case where an
adversary is choosing the keys, ..."
Some adversary. What, you mean, my boss or my customers?  
 ca********@gmail.com wrote:
John Machin wrote:
>On Mar 22, 1:11 am, castiro...@gmail.com wrote:
>> A collision sequence is not so rare.
>> >>> [hash(2**i) for i in range(0, 256, 32)]
>> [1, 1, 1, 1, 1, 1, 1, 1]
>Bryan did qualify his remarks: "If we exclude the case where an adversary is choosing the keys, ..."
Some adversary. What, you mean, my boss or my customers?
We mean that the party supplying the keys deliberately chose
them to make the hash table inefficient. In this thread the goal
is efficiency; a party working toward an opposing goal is an
adversary.
If you find real-world data sets that tend to induce bad-case
behavior in Python's hash, please do tell. It would be reason
enough to adjust the hash function. The hashes in popular
software such as Python are already quite well vetted.
Even a hash function that behaves as a random oracle has
worst-case quadratic time in the algorithm here, but that's
an issue in theory and not in serving customers.

Bryan  
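Bryan's "adversary" can be simulated without touching Python's own hash: give every key the same hash value, so all entries land in one probe chain and building a set degrades to quadratic-time comparisons. The Collider class below is a deliberately contrived sketch (Python 3), not an attack on any real hash:

```python
import time

class Collider:
    """All instances share one hash value, so every insert collides."""
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 42            # deliberate: the adversary's choice
    def __eq__(self, other):
        return isinstance(other, Collider) and self.v == other.v

def build_set(keys):
    s = set()
    for k in keys:
        s.add(k)
    return s

n = 500
colliding = [Collider(i) for i in range(n)]
honest = list(range(n))       # distinct hashes: the normal case

t0 = time.perf_counter(); build_set(colliding); t1 = time.perf_counter()
t2 = time.perf_counter(); build_set(honest); t3 = time.perf_counter()

# Each colliding insert probes past every prior key: ~n^2/2 __eq__ calls total.
print(f"colliding: {t1 - t0:.5f}s  distinct hashes: {t3 - t2:.5f}s")
```

This is exactly why later Python versions added hash randomization for strings: it denies an external adversary the ability to precompute colliding keys.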
On 22 Mar, 09:31, Bryan Olson <fakeaddr...@nowhere.org> wrote:
Even a hash function that behaves as a random oracle has
worst-case quadratic time in the algorithm here
In which case inserts are not amortized to O(1).

(Thread closed; replies: 22; asked: Mar 18 '08; viewed: 5361 times.)