Returning histogram-like data for items in a list

Hi there,

I have a list:
L1 = [1,1,1,2,2,3]

How can I easily turn this into a list of tuples where the first element
is the list element and the second is the number of times it occurs in
the list (I think that this is referred to as a histogram):

i.e.:

L2 = [(1,3),(2,2),(3,1)]

I was doing something like:

myDict = {}
for i in L1:
    myDict.setdefault(i,[]).append(i)

then doing this:

L2 = []
for k, v in myDict.iteritems():
    L2.append((k, len(v)))

This works but I sort of feel like there ought to be an easier way,
rather than to have to store the list elements, when all I want is a
count of them. Would anyone care to comment?

I also tried this trick, where locals()['_[1]'] refers to the list
comprehension itself as it gets built, but it gave me unexpected results:
>>> L2 = [(i, len(i)) for i in L2 if not i in locals()['_[1]']]
>>> L2
[((1, 3), 2), ((2, 2), 2), ((3, 1), 2)]

i.e. I don't understand why each tuple is being counted as well.

Regards,

Ric
Jul 21 '05 #1
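
A likely explanation for the unexpected result above, offered as a sketch rather than something stated in the thread: the comprehension iterates over L2, which already holds the (item, count) tuples, so each i is a tuple and len(i) is always 2. The filter never matches because the list being built holds the new pairs, not the original tuples. The example below drops the locals() lookup, which depended on an interpreter detail, and uses a `seen` set as an assumed stand-in for "items already emitted".

```python
# L2 already holds the (item, count) tuples produced by the earlier loop.
L2 = [(1, 3), (2, 2), (3, 1)]

# Same expression as in the post, minus the locals() filter:
# each i is a 2-tuple, so len(i) is 2 every time.
result = [(i, len(i)) for i in L2]
print(result)  # [((1, 3), 2), ((2, 2), 2), ((3, 1), 2)]

# Iterating over the original list instead, with an explicit set of
# already-seen items (hypothetical replacement for the locals() trick).
L1 = [1, 1, 1, 2, 2, 3]
seen = set()
counts = []
for item in L1:
    if item not in seen:
        seen.add(item)
        counts.append((item, L1.count(item)))
print(counts)  # [(1, 3), (2, 2), (3, 1)]
```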
Ric Deez wrote:
> Hi there,
>
> I have a list:
> L1 = [1,1,1,2,2,3]
>
> How can I easily turn this into a list of tuples where the first element
> is the list element and the second is the number of times it occurs in
> the list (I think that this is referred to as a histogram):
>
> i.e.:
>
> L2 = [(1,3),(2,2),(3,1)]

>>> import itertools
>>> L1 = [1,1,1,2,2,3]
>>> L2 = [(key, len(list(group))) for key, group in itertools.groupby(L1)]
>>> L2
[(1, 3), (2, 2), (3, 1)]
--
Michael Hoffman
Jul 22 '05 #2
"Michael Hoffman" <ca*******@mh391.invalid> wrote:
> Ric Deez wrote:
>> Hi there,
>>
>> I have a list:
>> L1 = [1,1,1,2,2,3]
>>
>> How can I easily turn this into a list of tuples where the first element
>> is the list element and the second is the number of times it occurs in
>> the list (I think that this is referred to as a histogram):
>>
>> i.e.:
>>
>> L2 = [(1,3),(2,2),(3,1)]
>
> >>> import itertools
> >>> L1 = [1,1,1,2,2,3]
> >>> L2 = [(key, len(list(group))) for key, group in itertools.groupby(L1)]
> >>> L2
> [(1, 3), (2, 2), (3, 1)]
> --
> Michael Hoffman


This is correct only if equal items are already adjacent in the original list; to be on the
safe side, sort it first:
L2 = [(key, len(list(group))) for key, group in itertools.groupby(sorted(L1))]
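
To illustrate why the sort matters, here is a small sketch: itertools.groupby only merges consecutive equal items, so an unsorted list yields one entry per run rather than one per value. The helper name run_lengths and the sample list are made up for this example.

```python
import itertools

def run_lengths(seq):
    # One (key, length) pair per run of consecutive equal items.
    return [(key, len(list(group))) for key, group in itertools.groupby(seq)]

unsorted_items = [1, 2, 1, 1, 2, 3]

# Without sorting, the value 1 appears in two separate runs.
print(run_lengths(unsorted_items))          # [(1, 1), (2, 1), (1, 2), (2, 1), (3, 1)]

# After sorting, each value forms a single run, giving true counts.
print(run_lengths(sorted(unsorted_items)))  # [(1, 3), (2, 2), (3, 1)]
```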

Or if you care about performance rather than number of lines, use this:

def hist(seq):
    h = {}
    for i in seq:
        try: h[i] += 1
        except KeyError: h[i] = 1
    return h.items()

George
Jul 22 '05 #3
Adding to George's reply, if you want slightly more performance, you
can avoid the exception with something like

def hist(seq):
    h = {}
    for i in seq:
        h[i] = h.get(i,0)+1
    return h.items()

Jeethu Rao

Jul 22 '05 #4
Ric Deez wrote:
> Hi there,
>
> I have a list:
> L1 = [1,1,1,2,2,3]
>
> How can I easily turn this into a list of tuples where the first element
> is the list element and the second is the number of times it occurs in
> the list (I think that this is referred to as a histogram):
>
> i.e.:
>
> L2 = [(1,3),(2,2),(3,1)]
>
> I was doing something like:
>
> myDict = {}
> for i in L1:
>     myDict.setdefault(i,[]).append(i)
>
> then doing this:
>
> L2 = []
> for k, v in myDict.iteritems():
>     L2.append((k, len(v)))
>
> This works but I sort of feel like there ought to be an easier way,

If you don't care about order (but your solution isn't guaranteed to
preserve order either...):

L2 = dict([(item, L1.count(item)) for item in L1]).items()

But this may be inefficient if the list is large, so...

def hist(seq):
    d = {}
    for item in seq:
        if not item in d:
            d[item] = seq.count(item)
    return d.items()

> I also tried this trick, where locals()['_[1]'] refers to the list

I'm not sure I understand how that one works... But anyway, please avoid
this kind of horror unless you're engaged in a WORN contest with a
Perl-monger !-).
Jul 22 '05 #5
"jeethu_rao" <je*******@gmail.com> wrote:
> Adding to George's reply, if you want slightly more performance, you
> can avoid the exception with something like
>
> def hist(seq):
>     h = {}
>     for i in seq:
>         h[i] = h.get(i,0)+1
>     return h.items()
>
> Jeethu Rao


The performance penalty of the exception is imposed only the first time a distinct item is found. So
unless you have a huge list of distinct items, I seriously doubt that this is faster at any
measurable rate.
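
One rough way to check this claim is to time both variants on a list with few distinct items and on one where every item is distinct. This is only a sketch: the function names hist_try and hist_get, the list sizes, and the repeat count are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import timeit

def hist_try(seq):
    # try/except version: the KeyError path runs once per distinct item.
    h = {}
    for i in seq:
        try:
            h[i] += 1
        except KeyError:
            h[i] = 1
    return list(h.items())

def hist_get(seq):
    # dict.get version: no exception, but a method call on every item.
    h = {}
    for i in seq:
        h[i] = h.get(i, 0) + 1
    return list(h.items())

few_distinct = [n % 10 for n in range(10000)]   # 10 distinct values
all_distinct = list(range(10000))               # every value is new

for label, data in (("few distinct", few_distinct), ("all distinct", all_distinct)):
    t_try = timeit.timeit(lambda: hist_try(data), number=20)
    t_get = timeit.timeit(lambda: hist_get(data), number=20)
    print(f"{label}: try/except {t_try:.4f}s, get {t_get:.4f}s")
```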

George
Jul 22 '05 #6

"Ric Deez" <de**@next-level.com.au> wrote in message
news:db**********@nnrp.waia.asn.au...
> I have a list:
> L1 = [1,1,1,2,2,3]
> How can I easily turn this into a list of tuples where the first element
> is the list element and the second is the number of times it occurs in
> the list (I think that this is referred to as a histogram):


For ease of reading (but not efficiency) I like:
hist = [(x,L1.count(x)) for x in set(L1)]
See http://aspn.activestate.com/ASPN/Coo.../Recipe/277600
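
For readers on a newer Python, collections.Counter (added to the standard library in Python 2.7 and 3.1, so after this thread) may give both the readability and a single pass over the list. A minimal sketch, assuming Python 3.7 or later so that items come back in first-seen order:

```python
from collections import Counter

L1 = [1, 1, 1, 2, 2, 3]

# Counter counts hashable items in one pass over the list.
hist = list(Counter(L1).items())
print(hist)  # [(1, 3), (2, 2), (3, 1)]

# most_common(n) returns the n most frequent (item, count) pairs.
print(Counter(L1).most_common(1))  # [(1, 3)]
```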

Alan Isaac
Jul 22 '05 #7
