
I'm new to Python and to Numerical Python. I have a program written
in another language that used arrays extensively. I'm trying to
convert it to Python.
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the Dow Jones Industrial
Average for its entire history. How would one iterate through the
entire array calculating the 10-day average of the daily high price or the
daily low price?
If someone could write some pseudocode that would point me in the
right direction, I would be most appreciative.
Thanks.  
 mc*****@bigfoot.com (2mc) wrote previously:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history.
I'm not sure exactly where in the array the various highs and lows are
stored (different rows?).
But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.
I have an intro article on Numerical Python forthcoming at IBM dW. But
the manual for Numeric is quite excellent to start with. Still, the
package is probably overkill for the simple and small operations
mentioned.
Yours, David...

mertz@gnosis.cx
The specter of free information is haunting the `Net! All the
powers of IP and cryptotyranny have entered into an unholy
alliance...ideas have nothing to lose but their chains. Unite
against "intellectual property" and anti-privacy regimes!
  

David Mertz wrote:
... But in general, an average is 'sum(highs)/len(highs)'. The version of 'sum()' in Numeric will work a lot faster though... although it makes no difference for 10 prices. If you start worrying about a million prices, you might see a significant boost using Numeric.
No kidding: Numeric.sum is an *order of magnitude faster* for this:
[alex@lancelot pop]$ timeit.py -s'import Numeric' -s'x=Numeric.array(map(float,range(1000000)))' 'Numeric.sum(x)'
10 loops, best of 3: 2.11e+04 usec per loop
[alex@lancelot pop]$ timeit.py -s'import Numeric' -s'x=Numeric.array(map(float,range(1000000)))' 'sum(x)'
10 loops, best of 3: 3.05e+05 usec per loop
The difference between 21 milliseconds with Numeric.sum and about 300 with
the builtin sum can be *very* significant if you sum millions of floats in
your bottlenecks.
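A comparable measurement can be sketched with the standard `timeit` module. This is only an illustration under stated assumptions: it uses Python 3 and modern NumPy in place of the Numeric package from the thread, and a smaller array so it runs quickly. The absolute timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np  # assumption: modern NumPy standing in for Numeric

# Smaller than the million floats in the post, to keep the run short.
x = np.arange(100_000, dtype=float)

# Time the array-aware sum against the builtin, which loops in Python.
t_numpy = timeit.timeit(lambda: np.sum(x), number=20)
t_builtin = timeit.timeit(lambda: sum(x), number=20)

print(f"np.sum:      {t_numpy:.4f} s for 20 runs")
print(f"builtin sum: {t_builtin:.4f} s for 20 runs")
```

Both calls return the same total; only the time spent differs.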
I think the chapter on Numeric in "Python in a Nutshell" is a good intro
(thanks to lots of crucial input from Eric Jones and Paul DuBois!), and
you can read it for free with the usual trick: get an O'Reilly safari
subscription and use the free first two weeks to read the parts that
interest you, then cancel the subscription so you won't have to pay.
Alex  

2mc: Assume an array in Numerical Python, the contents of which are the daily open, high, low, and last prices of the DOW Jones Industrial Average for its entire history. How would one iterate throughout the entire array calculating the 10 day average of daily high price or the daily low price?
The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.
Neil  
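The accumulator technique described above can be sketched in plain Python as follows. The function name `moving_average` and the sample prices are made up for illustration, and this version makes one particular choice for the open question in the post: it produces no output before the nth point.

```python
def moving_average(values, n):
    """Return the n-point moving averages of values, one per full window."""
    if n <= 0:
        raise ValueError("window length must be positive")
    averages = []
    acc = 0.0
    for i, v in enumerate(values):
        acc += v                      # add the value entering the window
        if i >= n:
            acc -= values[i - n]      # subtract the value leaving the window
        if i >= n - 1:                # emit only once the window is full
            averages.append(acc / n)
    return averages


highs = [10.0, 11.0, 12.0, 13.0, 14.0]   # made-up prices
print(moving_average(highs, 3))           # [11.0, 12.0, 13.0]
```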

Neil Hodgson wrote: 2mc:
Assume an array in Numerical Python, the contents of which are the daily open, high, low, and last prices of the DOW Jones Industrial Average for its entire history. How would one iterate throughout the entire array calculating the 10 day average of daily high price or the daily low price?
The common technique for moving averages is to maintain a single accumulator value over the last n points. For each new point, remove (by subtracting) the value at the beginning of the window and add in the value at the end of the window. The value of the accumulator divided by n is the moving average. You will need to define what you want as output, if any, before the nth point.
While it might be a common technique, it's also inefficient (for
Numeric) because you have to iterate over tens of thousands of entries
in a Python loop. Fortunately, Numeric provides a much, much faster
way to do this using slicing.
Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:
from Numeric import zeros, Float
A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0
Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.
This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.
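
For readers without Numeric, the same shifted-slice idea can be sketched with modern NumPy. NumPy, the function name `moving_average_slices`, and the sample data are assumptions for illustration and do not come from the post.

```python
import numpy as np  # assumption: modern NumPy standing in for Numeric


def moving_average_slices(d, n=10):
    """n-day moving average of a 1-D array, built from n shifted slices."""
    d = np.asarray(d, dtype=float)
    m = len(d) - n + 1                 # number of complete windows
    if m <= 0:
        return np.empty(0)
    a = np.zeros(m)
    for j in range(n):                 # n Python iterations over long slices
        a += d[j:j + m]
    return a / n


highs = np.arange(1.0, 13.0)           # made-up data: 1.0 through 12.0
print(moving_average_slices(highs, 10))   # [5.5 6.5 7.5]
```

Here `m` plays the role of N-9 in the post, generalized to any window length.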

CARL BANKS http://www.aerojockey.com/software
As the newest Lady Turnpot descended into the kitchen wrapped only in
her celery-green dressing gown, her creamy bosom rising and falling
like a temperamental souffle, her tart mouth pursed in distaste, the
sous-chef whispered to the scullery boy, "I don't know what to make of
her."
Laurel Fortuner, Montendre, France
1992 Bulwer-Lytton Fiction Contest Winner
 me***@gnosis.cx (David Mertz) wrote in message news:<ma************************************@python.org>...
I'm not sure exactly where in the array the various highs and lows are stored (different rows?).
I apologize for not making this clear. If an array were to be viewed
as a spreadsheet, then rows would be individual days and columns would
be date, open price, high price, etc.
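The layout described above can be sketched like this, assuming modern NumPy rather than Numeric. The column order and the prices are hypothetical (the post only says "date, open price, high price, etc."), and the date column is left out so that the array holds only floats.

```python
import numpy as np  # assumption: modern NumPy standing in for Numeric

# Hypothetical column positions, chosen for this illustration only.
OPEN, HIGH, LOW, LAST = 0, 1, 2, 3

# One row per day, one column per price field (made-up numbers).
prices = np.array([
    [10.0, 13.0, 8.0, 11.0],
    [11.0, 14.0, 9.0, 12.0],
    [12.0, 15.0, 10.0, 13.0],
])

highs = prices[:, HIGH]    # every row, high-price column
print(highs)               # [13. 14. 15.]
print(highs.mean())        # 14.0
```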
But in general, an average is 'sum(highs)/len(highs)'. The version of 'sum()' in Numeric will work a lot faster though... although it makes no difference for 10 prices. If you start worrying about a million prices, you might see a significant boost using Numeric.
Assuming the entire daily price history of the Dow Jones (all of last
century through today), finding the 10 day average for each day would
*not* work significantly faster with Numeric? What if I were doing
the standard deviation of price for 25 days and I wanted this done for
each price (open, high, low, close)? Still no speed enhancement of
any significance?
I have an intro article on Numerical Python forthcoming at IBM dW. But the manual for Numeric is quite excellent to start with. Still, the package is probably overkill for the simple and small operations mentioned.
Yours, David...
I have been reading the manual. I'm still working to get the cobwebs
out of my mind, that is, the ways I did things in the other language
I used.
Thank you for your input. And, I look forward to any further comments
my clarification may elicit. Thanks.
Matt  

numarray, the planned successor to Numeric 23, has a function cumsum
(cumulative sum) which might be helpful.
numarray appears to be largely operational; you might consider it.
Colin W.

Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the total number of days in the array. To calculate 10-day averages, you can add ten slices offset by one, then divide the sum by 10:
A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0
Bam, that's it. Instead of looping over tens of thousands of slices of ten, you're now looping over ten slices of tens of thousands.
This works because, when you add ten slices offset by one, the result is that each item contains the sum of ten consecutive numbers from the original array.
Thank you very much for this code. It took me several looks at it,
before I understood it. I'm still getting the cobwebs out of my brain
over the way I used to do things.
I have 2 questions. This code snippet runs at "array speed," right?
In other words, it will run much faster than if I tried to duplicate
this with lists, correct?
Also, can you give me an idea of how you would accomplish the 10 day
standard deviation of prices instead of the 10 day average? It would
really help me get a handle on this.
Thank you.
Matt  

2mc wrote:
Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the total number of days in the array. To calculate 10-day averages, you can add ten slices offset by one, then divide the sum by 10:
A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0
Bam, that's it. Instead of looping over tens of thousands of slices of ten, you're now looping over ten slices of tens of thousands.
This works because, when you add ten slices offset by one, the result is that each item contains the sum of ten consecutive numbers from the original array. Thank you very much for this code. It took me several looks at it, before I understood it. I'm still getting the cobwebs out of my brain over the way I used to do things.
I have 2 questions. This code snippet runs at "array speed," right? In other words, it will run much faster than if I tried to duplicate this with lists, correct?
Yes, it should be far faster. The main benefit of Numeric arrays is
that they can do all kinds of array operations without your having to
write a loop in Python.
The main advantage of this method over the method using "sum" is that
this method operates on large arrays while iterating 10 times in
Python. The method using "sum" operates on 10-element arrays while
iterating 100 thousand or so times in Python. It's easy to see that
the former method puts Numeric to better use.
Also, can you give me an idea of how you would accomplish the 10 day standard deviation of prices instead of the 10 day average? It would really help me get a handle on this.
Well, I don't remember the exact formula for standard deviation, but
here is an example that does something to that effect (it uses the
average calculated above):
S = zeros(N-9,Float)
for j in range(10):
    S += sqrt(D[j:N-9+j] - A)
S /= 10.0
The key is to think of the array slices as if they were regular scalar
values. The above method calculates the SD (not really) pretty much
the same way one would calculate the SD using regular old numbers.
Note that sqrt is Numeric.sqrt, not math.sqrt.
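
For reference, the population standard deviation of a window is the square root of the mean squared deviation from the window's average. The sketch below applies the same shifted-slice pattern to that formula. It assumes modern NumPy instead of Numeric, the name `moving_std_slices` and the sample data are made up, and it divides by n (not n-1), which is a choice the thread does not settle.

```python
import numpy as np  # assumption: modern NumPy standing in for Numeric


def moving_std_slices(d, n=10):
    """Rolling population standard deviation along the first axis."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0] - n + 1                     # number of complete windows
    if m <= 0:
        return np.empty((0,) + d.shape[1:])
    mean = np.zeros((m,) + d.shape[1:])
    for j in range(n):                         # rolling mean, as in the average code
        mean += d[j:j + m]
    mean /= n
    var = np.zeros_like(mean)
    for j in range(n):                         # accumulate squared deviations
        var += (d[j:j + m] - mean) ** 2
    return np.sqrt(var / n)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up values
print(moving_std_slices(data, 8))                  # [2.]
```

Because the slices are taken along the first axis, a 2-D array with one row per day yields the rolling deviation of every price column in a single call.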

CARL BANKS http://www.aerojockey.com/software

"Neil Hodgson" <nh******@bigpond.net.au> wrote in message news:<S0********************@newsserver.bigpond.net.au>... 2mc:
Assume an array in Numerical Python, the contents of which are the daily open, high, low, and last prices of the DOW Jones Industrial Average for its entire history. How would one iterate throughout the entire array calculating the 10 day average of daily high price or the daily low price?
The common technique for moving averages is to maintain a single accumulator value over the last n points. For each new point, remove (by subtracting) the value at the beginning of the window and add in the value at the end of the window. The value of the accumulator divided by n is the moving average. You will need to define what you want as output, if any, before the nth point.
Neil
While this is common in C/C++, it is not the most efficient way in
Python when you have numpy around to do the loops in C, if used correctly.
I use the following setup:
from Numeric import array, cumsum, concatenate
def movavg(s, n):
    ''' returns an n period moving average for the time series s
    s is a list ordered from oldest (index 0) to most recent (index -1)
    n is an integer
    returns a numeric array of the moving average
    '''
    s = array(s)
    # prepend 0 so that c[k] is the sum of the first k values
    c = concatenate(([0.0], cumsum(s)))
    return (c[n:] - c[:-n]) / float(n)
This should run in near-constant time with regard to n (and, of course,
in time linear in the length of s). At least one person has said yuk because
of the numerical issue of losing precision in the cumsum, but for
small n's, and values like you will see in stock prices and indices,
I don't think this is too much of a problem. Someone may have
a more numerically stable version, but then you could just implement
the C-code version and wrap it for Python.
Sean  
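The cumulative-sum idea can also be sketched with modern NumPy (an assumption; the post uses Numeric). The name `movavg_cumsum` is made up. A zero is prepended to the cumulative sum so that each difference spans exactly n values, and the result is compared with directly computed window means as a rough check on the precision concern raised above.

```python
import numpy as np  # assumption: modern NumPy standing in for Numeric


def movavg_cumsum(s, n):
    """n-period moving average of s via differences of a cumulative sum."""
    s = np.asarray(s, dtype=float)
    if n <= 0 or n > len(s):
        return np.empty(0)
    c = np.concatenate(([0.0], np.cumsum(s)))   # c[k] = sum of the first k values
    return (c[n:] - c[:-n]) / n                 # c[k+n] - c[k] = sum of s[k:k+n]


prices = np.array([1.0, 2.0, 3.0, 4.0])         # made-up values
print(movavg_cumsum(prices, 2))                 # [1.5 2.5 3.5]

# Rough precision check against means computed window by window.
direct = np.array([prices[i:i + 2].mean() for i in range(len(prices) - 1)])
print(np.allclose(movavg_cumsum(prices, 2), direct))   # True
```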
 mc*****@bigfoot.com (2mc) wrote previously:
I apologize for not making this clear. If an array were to be viewed
as a spreadsheet, then rows would be individual days and columns would
be date, open price, high price, etc.
Aside from the speed increase, Numeric provides quite a few handy syntax
tricks. For example, in a crude version of your problem, I first create
the "spreadsheet", and populate it with silly open/close/high/low
prices.
>>> from Numeric import *
>>> stock = zeros((5,20),Int)
>>> stock[0,:] = range(20)   # number the days
>>> stock[1,:] = [10]*20     # open price
>>> stock[2,:] = [11]*20     # close price
>>> stock[3,:] = [13]*20     # high price
>>> stock[4,:] = [8]*20      # low price
>>> print stock
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
[10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
[11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11]
[13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13]
[ 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8]]
Of course, it is an odd stock that opens, closes, and peaks at the same
price on every day. :)
Next, I can find a particular average high for a range:
>>> avg_high_day5to15 = sum(stock[3,5:15])/len(stock[3,5:15])
>>> avg_high_day5to15
13
You might want to generalize it in a function:
>>> def ten_day_avg_high(beg):
...     return sum(stock[3,beg:beg+10])/len(stock[3,beg:beg+10])
Which lets you calculate your running averages:
>>> map(ten_day_avg_high, range(20))
[13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13]
This works even near the end, where there are fewer than ten days to
average: we divide by the length of the tail, not by '10', to make sure
of that.
You can work out deviations and other statistics in similar ways.
Yours, David...

Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
