# groupby

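Can someone explain why, in the 2nd example, m doesn't print the list [1, 1, 1], which I had expected?

    >>> from itertools import groupby
    >>> for k, g in groupby([1, 1, 1, 2, 2, 3]):
    ...     print k, list(g)
    ...
    1 [1, 1, 1]
    2 [2, 2]
    3 [3]

    >>> m = list(groupby([1, 1, 1, 2, 2, 3]))
    >>> m
    [(1, <itertools._grouper object at 0x...>), (2, <itertools._grouper object at 0x...>), (3, <itertools._grouper object at 0x...>)]
    >>> list(m[0][1])
    []

thanks,
bryan

May 23 '06 #1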
I've tripped on this more than once, but it's in the docs (http://docs.python.org/lib/itertools-functions.html):

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list"

George

May 23 '06 #2
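To make the quoted documentation concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). It assumes nothing beyond `itertools.groupby`; the names `consumed_in_loop`, `lazy`, `keys` and `first_group` are illustrative only. Calling `list()` on the groupby object advances it all the way to the end before any group is read, which is why the stored group comes back empty.

```python
from itertools import groupby

data = [1, 1, 1, 2, 2, 3]

# Each group is turned into a list before groupby is advanced again,
# so the items are still available.
consumed_in_loop = [(k, list(g)) for k, g in groupby(data)]

# list() runs the groupby object to exhaustion first. The keys survive,
# but the group iterators share the same underlying source, which has
# already moved past their items.
lazy = list(groupby(data))
keys = [k for k, _ in lazy]
first_group = list(lazy[0][1])

print(consumed_in_loop)  # [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]
print(keys)              # [1, 2, 3]
print(first_group)       # []
```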

I read that description in the docs so many times before I posted here. Now that I've read it about 10 more times, I finally get it. There's just something about the wording that kept tripping me up, but I can't explain why :)

thanks,
bryan

May 23 '06 #3

So here's how to save the values from the iterators while iterating over the groupby:

    >>> m = [(x,list(y)) for x,y in groupby([1, 1, 1, 2, 2, 3])]
    >>> m
    [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]

-- Paul

May 23 '06 #4
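The same idea can be wrapped in a small helper, sketched here in Python 3 syntax. The function name `materialize_groups` is hypothetical and not part of itertools; it only passes its arguments through to `itertools.groupby(iterable, key)` and stores each group as a list while the group is still current.

```python
from itertools import groupby


def materialize_groups(iterable, key=None):
    """Return [(key, [items, ...]), ...] with every group stored as a list."""
    # list(g) is evaluated before groupby moves on to the next group.
    return [(k, list(g)) for k, g in groupby(iterable, key)]


m = materialize_groups([1, 1, 1, 2, 2, 3])
print(m)        # [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]
print(m[0][1])  # [1, 1, 1]
```

Because the groups are now plain lists, they can be indexed and read as often as needed, at the cost of holding all items in memory.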

Playing some more with groupby. Here's a one-liner to tally a list of integers into a histogram:

    import itertools
    import random

    # create data set, random selection of numbers from 1-10
    dataValueRange = range(1,11)
    data = [random.choice(dataValueRange) for i in xrange(10)]
    print data

    # tally values into histogram
    # (from the inside out):
    # - sort data into ascending order, so groupby will see all like values together
    # - call groupby, return iterator of (value, valueItemIterator) tuples
    # - tally groupby results into a dict of value -> valueFrequency
    # - expand dict into histogram list, filling in zeroes for any keys that didn't get a value
    hist = [ (k1, dict((k,len(list(g))) for k,g in itertools.groupby(sorted(data))).get(k1,0))
             for k1 in dataValueRange ]
    print hist

Gives:

    [9, 6, 8, 3, 2, 3, 10, 7, 6, 2]
    [(1, 0), (2, 2), (3, 2), (4, 0), (5, 0), (6, 2), (7, 1), (8, 1), (9, 1), (10, 1)]

Change the generation of the original data list to 10,000 values, and you get something like:

    [(1, 995), (2, 986), (3, 941), (4, 998), (5, 978), (6, 1007), (7, 997), (8, 1033), (9, 1038), (10, 1027)]

If you know there won't be any zero-frequency values (or don't care about them), you can skip the fill-in-the-zeros step with one of these expressions:

    histAsList = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]
    histAsDict = dict((k,len(list(g))) for k,g in itertools.groupby(sorted(data)))

-- Paul

May 27 '06 #5
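For readers on Python 3, the histogram above might be sketched as follows. This is a variation, not Paul's exact code: it builds the tally dict once instead of rebuilding it for every key, and the names `value_range`, `runs`, `tally` and `matches_counter` are illustrative. The fixed seed is an assumption added only so the run is repeatable. The short `runs` example shows why the sort matters, since groupby merges only adjacent equal values.

```python
import random
from collections import Counter
from itertools import groupby

random.seed(0)  # assumption: fixed seed, only to make the sketch repeatable
value_range = range(1, 11)
data = [random.choice(value_range) for _ in range(10)]

# Without sorting, equal values that are not adjacent land in separate groups.
runs = [(k, len(list(g))) for k, g in groupby([1, 1, 2, 1])]
print(runs)  # [(1, 2), (2, 1), (1, 1)]

# Sort first, tally each run once, then fill in zeroes for missing values.
tally = {k: len(list(g)) for k, g in groupby(sorted(data))}
hist = [(k, tally.get(k, 0)) for k in value_range]
print(hist)

# Cross-check against collections.Counter, which counts without sorting.
matches_counter = tally == Counter(data)
print(matches_counter)  # True
```

If only the counts are needed, `Counter(data)` is the more direct tool; the groupby version is mainly useful when the grouped items themselves are wanted as well.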

