473,395 Members | 1,720 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Clustering the keys of a dict according to its values

Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.
Regards,
Florian
Nov 14 '08 #1
12 4968
Florian Brucker:
We may assume that all values in the
original dict/list can be used as dict keys.
Are you sure? Is this for school?

Bye,
bearophile
Nov 14 '08 #2
Are you sure? Is this for school?

Yes, I'm pretty sure (the values of the original dict are integers), and
no, this isn't for school :) If the "We may assume ..." made you think
so, I guess that's the way everybody formulates requirements after
having read to many math papers :D

If it is of interest to you: I'm trying to reproduce a certain sorted
version of the keys of a dict (sorted according to their value) where
the values are not unique, and thus the sorted version is not unique,
either. All I have left of the original sorting is some statistics, so I
can check for every sorted version if it's the same as the original. If
I have the clustering I'm talking about getting all the valid sorted
lists is a matter of concatenating permutations of the clusters.
Regards,
Florian
Nov 14 '08 #3
On Nov 14, 1:16 pm, Florian Brucker <t...@torfbold.comwrote:
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.

Regards,
Florian
Your implementation is pretty inefficient as it iterates over the
whole dictionary as many times as there are distinct values in it.
Here is a simpler solution.

from collections import defaultdict

def cluster(d):
clusters = defaultdict(list)
for key, val in d.iteritems():
clusters[val].append(key)
return clusters
Or without defaultdict:

def cluster(d):
clusters = {}
for key, val in d.iteritems():
clusters.setdefault(val, []).append(key)
return clusters
--
Arnaud
Nov 14 '08 #4
Not much tested:

from collections import defaultdict

def cluster(pairs):
"""
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d) == {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
True
>>p = [1, 2, 1, 1, 2, 3]
cluster(p) == {1: [0, 2, 3], 2: [1, 4], 3: [5]}
True
"""
d = defaultdict(list)
if isinstance(pairs, list):
for k, v in enumerate(pairs):
d[v].append(k)
else:
for k, v in pairs.iteritems():
d[v].append(k)
return d

if __name__ == "__main__":
import doctest
doctest.testmod()
print "Doctests finished.\n"

Bye,
bearophile
Nov 14 '08 #5
Alternative version:

def cluster(data):
d = defaultdict(list)
pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()
for k, v in pairs:
d[v].append(k)
return d

Bye,
bearophile
Nov 14 '08 #6
On Nov 14, 11:24*pm, bearophileH...@lycos.com wrote:
Alternative version:

def cluster(data):
* * d = defaultdict(list)
* * pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()
* * for k, v in pairs:
* * * * d[v].append(k)
* * return d

Bye,
bearophile
Very nice, +1.
Nov 14 '08 #7
Florian Brucker a écrit :
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
from collections import defaultdict

def cluster(source):
iteritems = getattr(
source, "iteritems",
lambda : enumerate(source)
)
index = defaultdict(list)
for k, v in iteritems():
index[k].append(v)
return index

HTH
Nov 14 '08 #8
be************@lycos.com a écrit :
Alternative version:

def cluster(data):
d = defaultdict(list)
pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()

What is data is another type of sequence or iterable ?-)
for k, v in pairs:
d[v].append(k)
return d

Bye,
bearophile
Nov 14 '08 #9
Florian Brucker <to**@torfbold.comwrote:
That is, generate a new dict which holds for each value of the old
dict a list of the keys of the old dict that have that very value.
Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
import itertools

def invert(d):
def hack(d):
try:
return d.iteritems()
except AttributeError:
return itertools.izip(itertools.count(), d)
dd = {}
for k, v in hack(d):
dd.setdefault(v, []).append(k)
return dd

(It's the setdefault trick which is the important part.)

-- [mdw]
Nov 14 '08 #10
Wow, thanks everybody! There's a lot to learn for me from these examples...
Enjoy your weekend!
Florian

Florian Brucker wrote:
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
>>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.
Regards,
Florian
Nov 15 '08 #11
Bruno Desthuilliers:
What is data is another type of sequence or iterable ?-)<
The original problem statement was:

Florian Brucker:
>Given a dictionary, I want to create a clustered version of it, collecting keys that have the same value: [...] Another requirement is that it should also work on lists, in that case with indices instead of keys. We may assume that all values in the original dict/list can be used as dict keys.<
If the problem changes, then the code has to/can change. When you
write code it's better to avoid over-generalization (this is also a
rule in Agile programming practices).

Bye,
bearophile
Nov 16 '08 #12
Florian Brucker wrote:
Florian Brucker wrote:
>Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:
> >>d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
cluster(d)
{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old
dict a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.
[...]
Wow, thanks everybody! There's a lot to learn for me from these
examples...
>

d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}

from itertools import groupby

d2 = dict(
(key, [G[1] for G in g]) for (key, g) in
groupby(
sorted( (val, key) for (key, val) in d.iteritems() ),
lambda X: X[0]
)
)

print d2
{1: ['a', 'c', 'd'], 2: ['b', 'e'], 3: ['f']}
;-) *ducks*

G.

Nov 16 '08 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Gerrit Holl | last post by:
Hi, is it guaranteed that dict(zip(d.keys(), d.values())) == d? In words, do .keys() and .values() always have the same order? Is it safe to rely on this? yours, Gerrit. --
1
by: Alexander Kervero | last post by:
Hi ,today i was reading diveinto python book,in chapter 5 it has a very generic module to get file information,html,mp3s ,etc. The code of the example is here :...
5
by: soup_or_power | last post by:
Hi I have an associative array like this: arr=30; arr=20;arr=40;arr=10; I want the sort function to sort keys in ascending order of the values on the right hand side with the following result:...
6
by: David Rasmussen | last post by:
If I have a collection of dicts like: john = {'id': 1, 'name': "John Cleese", 'year': 1939} graham = {'id': 2, 'name': "Graham Chapman", 'year': 1941} I could store all of them in a list. But...
90
by: Christoph Zwerschke | last post by:
Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4. Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in...
6
by: Paul Rubin | last post by:
I'm thinking of proposing that a values method be added to set objects, analogously with dicts. If x is a set, x.values() would be the same as list(x). This feels logical, and it would allow...
1
by: geerttennapel | last post by:
Hello, I'm building an video applicatie and i have a little problem. When building the application, I have insert a field that contains multiple keys from another table. Now I'm trying tot make a...
6
by: Davy | last post by:
Hi all, I have a dictionary with n elements, and I want to get the m(m<=n) keys with the largest values. For example, I have dic that includes n=4 elements, I want m=2 keys have the largest...
13
by: Nader | last post by:
Hello, I have a dictionary and will get all keys which have the same values. d = {('a' : 1), ('b' : 3), ('c' : 2),('d' : 3),('e' : 1),('f' : 4)} I will something as : d.keys(where their...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.