By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,050 Members | 1,019 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,050 IT Pros & Developers. It's quick & easy.

Clustering the keys of a dict according to its values

P: n/a
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.
Regards,
Florian
Nov 14 '08 #1
Share this Question
Share on Google+
12 Replies


P: n/a
Florian Brucker:
We may assume that all values in the
original dict/list can be used as dict keys.
Are you sure? Is this for school?

Bye,
bearophile
Nov 14 '08 #2

P: n/a
Are you sure? Is this for school?

Yes, I'm pretty sure (the values of the original dict are integers), and
no, this isn't for school :) If the "We may assume ..." made you think
so, I guess that's the way everybody formulates requirements after
having read to many math papers :D

If it is of interest to you: I'm trying to reproduce a certain sorted
version of the keys of a dict (sorted according to their value) where
the values are not unique, and thus the sorted version is not unique,
either. All I have left of the original sorting is some statistics, so I
can check for every sorted version if it's the same as the original. If
I have the clustering I'm talking about getting all the valid sorted
lists is a matter of concatenating permutations of the clusters.
Regards,
Florian
Nov 14 '08 #3

P: n/a
On Nov 14, 1:16 pm, Florian Brucker <t...@torfbold.comwrote:
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.

Regards,
Florian
Your implementation is pretty inefficient as it iterates over the
whole dictionary as many times as there are distinct values in it.
Here is a simpler solution.

from collections import defaultdict

def cluster(d):
clusters = defaultdict(list)
for key, val in d.iteritems():
clusters[val].append(key)
return clusters
Or without defaultdict:

def cluster(d):
clusters = {}
for key, val in d.iteritems():
clusters.setdefault(val, []).append(key)
return clusters
--
Arnaud
Nov 14 '08 #4

P: n/a
Not much tested:

from collections import defaultdict

def cluster(pairs):
"""
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d) == {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
True
>>p = [1, 2, 1, 1, 2, 3]
cluster(p) == {1: [0, 2, 3], 2: [1, 4], 3: [5]}
True
"""
d = defaultdict(list)
if isinstance(pairs, list):
for k, v in enumerate(pairs):
d[v].append(k)
else:
for k, v in pairs.iteritems():
d[v].append(k)
return d

if __name__ == "__main__":
import doctest
doctest.testmod()
print "Doctests finished.\n"

Bye,
bearophile
Nov 14 '08 #5

P: n/a
Alternative version:

def cluster(data):
d = defaultdict(list)
pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()
for k, v in pairs:
d[v].append(k)
return d

Bye,
bearophile
Nov 14 '08 #6

P: n/a
On Nov 14, 11:24*pm, bearophileH...@lycos.com wrote:
Alternative version:

def cluster(data):
* * d = defaultdict(list)
* * pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()
* * for k, v in pairs:
* * * * d[v].append(k)
* * return d

Bye,
bearophile
Very nice, +1.
Nov 14 '08 #7

P: n/a
Florian Brucker a écrit :
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
from collections import defaultdict

def cluster(source):
iteritems = getattr(
source, "iteritems",
lambda : enumerate(source)
)
index = defaultdict(list)
for k, v in iteritems():
index[k].append(v)
return index

HTH
Nov 14 '08 #8

P: n/a
be************@lycos.com a écrit :
Alternative version:

def cluster(data):
d = defaultdict(list)
pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()

What is data is another type of sequence or iterable ?-)
for k, v in pairs:
d[v].append(k)
return d

Bye,
bearophile
Nov 14 '08 #9

P: n/a
Florian Brucker <to**@torfbold.comwrote:
That is, generate a new dict which holds for each value of the old
dict a list of the keys of the old dict that have that very value.
Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
import itertools

def invert(d):
def hack(d):
try:
return d.iteritems()
except AttributeError:
return itertools.izip(itertools.count(), d)
dd = {}
for k, v in hack(d):
dd.setdefault(v, []).append(k)
return dd

(It's the setdefault trick which is the important part.)

-- [mdw]
Nov 14 '08 #10

P: n/a
Wow, thanks everybody! There's a lot to learn for me from these examples...
Enjoy your weekend!
Florian

Florian Brucker wrote:
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.
Regards,
Florian
Nov 15 '08 #11

P: n/a
Bruno Desthuilliers:
What is data is another type of sequence or iterable ?-)<
The original problem statement was:

Florian Brucker:
>Given a dictionary, I want to create a clustered version of it, collecting keys that have the same value: [...] Another requirement is that it should also work on lists, in that case with indices instead of keys. We may assume that all values in the original dict/list can be used as dict keys.<
If the problem changes, then the code has to/can change. When you
write code it's better to avoid over-generalization (this is also a
rule in Agile programming practices).

Bye,
bearophile
Nov 16 '08 #12

P: n/a
Florian Brucker wrote:
Florian Brucker wrote:
>Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
> >>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old
dict a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
[...]
Wow, thanks everybody! There's a lot to learn for me from these
examples...
>

d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}

from itertools import groupby

d2 = dict(
(key, [G[1] for G in g]) for (key, g) in
groupby(
sorted( (val, key) for (key, val) in d.iteritems() ),
lambda X: X[0]
)
)

print d2
{1: ['a', 'c', 'd'], 2: ['b', 'e'], 3: ['f']}
;-) *ducks*

G.

Nov 16 '08 #13

This discussion thread is closed

Replies have been disabled for this discussion.