Is there a way I can do time series calculation, such as a moving
average in list comprehension syntax? I'm new to python but it looks
like list comprehension's 'head' can only work at a value at a time. I
also tried using the reduce function and passed in my list and another
function which calculates a moving average outside the list comp. ...
but I'm not clear how to do it. Any ideas? Thanks.
I suggest that statistical data, including time series, be stored and
processed in arrays, such as the one found in NumPy. You can compute
averages using the "sum" function and array slices.
Write explicit for loops, possibly with nested if conditionals, that do
exactly what you want. The functional expressions are abbreviations for
certain patterns of induction. Except as an educational exercise, I do not
think it worthwhile to go through contortions to force fit a problem to a
pattern it does not really fit.
Terry Jan Reedy
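As an illustration of the explicit-loop approach suggested above, here is a minimal sketch written for Python 3 (the rest of the thread uses Python 2). The function name moving_average_loop and the sample data are assumptions made for the example, not code from the thread.

```python
def moving_average_loop(values, n):
    """Return the average of each run of n consecutive values."""
    if n <= 0:
        raise ValueError("n must be positive")
    averages = []
    # 'end' is the exclusive end index of the current window.
    for end in range(n, len(values) + 1):
        total = 0.0
        for value in values[end - n:end]:
            total += value
        averages.append(total / n)
    return averages


print(moving_average_loop([4, 8, 15, 16, 23, 42], 3))
# prints [9.0, 13.0, 18.0, 27.0]
```

If the input is shorter than n, no full window exists and the function returns an empty list.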
I agree with others that reduce is not the best way to do this. But,
to satisfy your curiosity, I offer this horribly inefficient way to use
"reduce" to calculate the average of a list:
from __future__ import division

def reduceaverage(acc, x):
    # acc holds [running sum, running count, running average]
    return [acc[0] + x, acc[1] + 1, (acc[0] + x) / (acc[1] + 1)]

numbers = [4, 8, 15, 16, 23, 42]
print reduce(reduceaverage, numbers, [0, 0, 0])[2]
...basically, the idea is to write a function that takes as its first
argument the accumulated values, and as its second argument the next
value in the list. In Python, this is almost always the wrong way to
do something, but it is kind of geeky and LISPish.
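The same accumulator pattern can be stretched to a moving average, which is what the original question asked for. The sketch below is written for Python 3, where reduce lives in functools; the names moving_average_reduce and step are hypothetical, and n is assumed to be positive. It is meant to show the pattern, not to recommend it.

```python
from functools import reduce


def moving_average_reduce(values, n):
    """Moving average via reduce; assumes n > 0."""
    def step(acc, x):
        # acc is (last values seen, averages produced so far)
        window, averages = acc
        window = (window + [x])[-n:]  # keep only the last n values
        if len(window) == n:
            averages = averages + [sum(window) / n]
        return window, averages

    _, averages = reduce(step, values, ([], []))
    return averages


print(moving_average_reduce([4, 8, 15, 16, 23, 42], 3))
# prints [9.0, 13.0, 18.0, 27.0]
```

Each step copies both lists, so this is slow for long inputs, in keeping with the "horribly inefficient" warning above.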
Do you mean something like this?
for i in xrange(5, len(ts)):
    # compute and print moving average from i-5 to i
    print i, sum(ts[i-5:i]) / 5.
Well, you could iterate over an index into the list:
from __future__ import division

def moving_average(sequence, n):
    return [sum(sequence[i:i+n]) / n
            for i in xrange(len(sequence) - n + 1)]
Of course, that's hardly efficient. You really want to use the value
calculated for the i_th term in the (i+1)th term's evaluation. While
it's not easy (or pretty) to store state between iterations in a list
comprehension, this is the perfect use for a generator:
def generator_to_list(f):
    # decorator: collect everything the generator yields into a list
    return lambda *args, **keywords: list(f(*args, **keywords))

@generator_to_list
def moving_average(sequence, n):
    assert len(sequence) >= n and n > 0
    average = sum(sequence[:n]) / n
    yield average
    for i in xrange(1, len(sequence) - n + 1):
        # add the value entering the window, drop the one leaving it
        average += (sequence[i+n-1] - sequence[i-1]) / n
        yield average
Lonnie Princehouse wrote: You really want to use the value calculated for the i_th term in the (i+1)th term's evaluation.
It may sometimes be necessary to recalculate the average for every iteration
to avoid error accumulation. Another tradeoff with your optimization is
that it becomes harder to switch the accumulation function from average to
max, say.
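To make that tradeoff concrete, here is a minimal Python 3 sketch of the recalculate-every-window approach with a pluggable aggregation function. The names moving_apply and mean are hypothetical and do not come from the thread.

```python
def moving_apply(sequence, n, func):
    """Apply func to every window of n consecutive items."""
    return [func(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def mean(window):
    return sum(window) / len(window)


data = [4, 8, 15, 16, 23, 42]
print(moving_apply(data, 3, mean))  # prints [9.0, 13.0, 18.0, 27.0]
print(moving_apply(data, 3, max))   # prints [15, 16, 23, 42]
```

Because every window is computed from scratch, no rounding error carries over from one step to the next, at the cost of O(n) work per window.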
Lonnie Princehouse wrote: While it's not easy (or pretty) to store state between iterations in a list comprehension, this is the perfect use for a generator:
def generator_to_list(f):
    return lambda *args, **keywords: list(f(*args, **keywords))

@generator_to_list
def moving_average(sequence, n):
    assert len(sequence) >= n and n > 0
    average = sum(sequence[:n]) / n
    yield average
    for i in xrange(1, len(sequence) - n + 1):
        average += (sequence[i+n-1] - sequence[i-1]) / n
        yield average
Here are two more that work with arbitrary iterables:
from __future__ import division
from itertools import islice, tee, izip
from collections import deque

def window(items, n):
    it = iter(items)
    w = deque(islice(it, n-1))
    for item in it:
        w.append(item)
        yield w  # for a robust implementation:
                 # yield tuple(w)
        w.popleft()

def moving_average1(items, n):
    return (sum(w)/n for w in window(items, n))

def moving_average2(items, n):
    first_items, last_items = tee(items)
    accu = sum(islice(last_items, n-1))
    for first, last in izip(first_items, last_items):
        accu += last
        yield accu/n
        accu -= first
While moving_average1() is even slower than your inefficient variant,
moving_average2() seems to be a tad faster than the efficient one.
Peter
I used the following to return an array of the average of the last n
values. It's not particularly pretty, but it works:
import random

# set number of values to average
weighting = 10

# an array of values we want to calculate a running average on
ratings = []

# an array of running averages
running_avg = []

# some routine to fill ratings with the values
r = random.Random()
for i in range(0, 20):
    ratings.append(float(r.randint(0, 99)))

for i in range(1, 1 + len(ratings)):
    if i < weighting:
        # not enough values yet: use the raw value
        running_avg.append(ratings[i - 1])
    else:
        running_avg.append(reduce(lambda s, a: s + a,
                                  ratings[i - weighting : i]) /
                           len(ratings[i - weighting : i]))

for i in range(0, len(ratings)):
    print "%3d: %3d %5.2f" % (i, ratings[i], running_avg[i])
sample output:
0: 34 34.00
1: 28 28.00
2: 58 58.00
3: 16 34.00
4: 74 44.00
5: 32 45.00
6: 74 49.00
7: 21 50.25
8: 78 51.25
9: 28 50.25
10: 32 39.75
11: 93 57.75
12: 2 38.75
13: 7 33.50
14: 8 27.50
15: 30 11.75
16: 1 11.50
17: 8 11.75
18: 40 19.75
19: 8 14.25
For all but the first 3 rows, the third column is the average of the
values in the 2nd column for this and the preceding 3 rows (so this
sample run used weighting = 4 rather than the 10 shown in the code).

Jim Segrave (je*@jes2.demon.nl)
[Peter Otten]
from __future__ import division
from itertools import islice, tee, izip
. . .
def moving_average2(items, n):
    first_items, last_items = tee(items)
    accu = sum(islice(last_items, n-1))
    for first, last in izip(first_items, last_items):
        accu += last
        yield accu/n
        accu -= first

While moving_average1() is even slower than your inefficient variant, moving_average2() seems to be a tad faster than the efficient one.
This is nicely done and scales up well. Given an n-average of m items,
it has O(n) memory consumption and O(m) running time. In contrast, the
other variants do more work than necessary by pulling the whole
sequence into memory or by re-summing all n items at every step,
resulting in O(m) memory consumption and O(m*n) running time.
This recipe gets my vote for the best solution.
Raymond
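For readers on Python 3, where izip and xrange no longer exist, the following sketch shows one way to get the same O(n) memory and O(m) running time. It swaps the tee-based recipe for a deque plus a running total, so it is a different technique and not Peter's code; the name streaming_moving_average is hypothetical.

```python
from collections import deque


def streaming_moving_average(items, n):
    """Yield the average of each run of n consecutive items.

    Works on any iterable and holds at most n items in memory.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    window = deque()
    total = 0
    for item in items:
        window.append(item)
        total += item
        if len(window) > n:
            # drop the oldest value so the window stays at n items
            total -= window.popleft()
        if len(window) == n:
            yield total / n


print(list(streaming_moving_average(iter([4, 8, 15, 16, 23, 42]), 3)))
# prints [9.0, 13.0, 18.0, 27.0]
```

With floating-point input the running total can accumulate rounding error over long streams, which is the concern raised earlier in the thread about incremental updates.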
