
Numerical Python question

P: n/a
2mc
I'm new to Python and to Numerical Python. I have a program written
in another language that used arrays extensively. I'm trying to
convert it to Python.

Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?

If someone could write some pseudo-code that would point me in the
right direction, I would be most appreciative.

Thanks.
Jul 18 '05 #1
10 Replies


P: n/a
mc*****@bigfoot.com (2mc) wrote previously:
|Assume an array in Numerical Python, the contents of which are the
|daily open, high, low, and last prices of the DOW Jones Industrial
|Average for its entire history.

I'm not sure exactly where in the array the various highs and lows are
stored (different rows?).

But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.

I have an intro article on Numerical Python forthcoming at IBM dW. But
the manual for Numeric is quite excellent to start with. Still, the
package is probably overkill for the simple and small operations
mentioned.

Yours, David...

--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
..cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
Jul 18 '05 #2

P: n/a
David Mertz wrote:
...
But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.


No kidding: Numeric.sum is an *order of magnitude faster* for this:

[alex@lancelot pop]$ timeit.py -s'import Numeric' \
    -s'x=Numeric.array(map(float,range(1000000)))' 'Numeric.sum(x)'
10 loops, best of 3: 2.11e+04 usec per loop

[alex@lancelot pop]$ timeit.py -s'import Numeric' \
    -s'x=Numeric.array(map(float,range(1000000)))' 'sum(x)'
10 loops, best of 3: 3.05e+05 usec per loop

The difference between 21 milliseconds with Numeric.sum and about 300
with the built-in sum can be *very* significant if you sum millions of
floats in your bottlenecks.
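
A comparable measurement can be sketched with the standard timeit module. This is only an illustration: it assumes NumPy is installed and uses it as a present-day stand-in for Numeric, the helper name time_sums is made up for this sketch, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np


def time_sums(n=100_000, repeats=3):
    """Time the built-in sum against np.sum on the same float array.

    Returns (builtin_seconds, numpy_seconds, builtin_total, numpy_total).
    NumPy is an assumption here, standing in for Numeric.
    """
    x = np.arange(n, dtype=float)
    # The built-in sum walks the array one element at a time in the interpreter.
    builtin_seconds = min(timeit.repeat(lambda: sum(x), number=1, repeat=repeats))
    # np.sum runs the whole loop in compiled code.
    numpy_seconds = min(timeit.repeat(lambda: np.sum(x), number=1, repeat=repeats))
    return builtin_seconds, numpy_seconds, float(sum(x)), float(np.sum(x))


if __name__ == "__main__":
    slow, fast, total_a, total_b = time_sums()
    print("built-in sum: %.6f s, np.sum: %.6f s" % (slow, fast))
```

Both calls return the same total; only the time taken to get there differs.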

I think the chapter on Numeric in "Python in a Nutshell" is a good intro
(thanks to lots of crucial input from Eric Jones and Paul DuBois!), and
you can read it for free with the usual trick -- get an O'Reilly safari
subscription and use the free first two weeks to read the parts that
interest you, then cancel the subscription so you won't have to pay.
Alex

Jul 18 '05 #3

P: n/a
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.
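
The accumulator technique described above can be sketched in plain Python as follows. The function name moving_average is made up for this illustration, and skipping the output before the nth point is just one possible choice.

```python
def moving_average(values, n):
    """Return the n-point moving averages of values using one accumulator.

    Output starts at the nth point; earlier points are skipped, which is
    only one possible answer to what to emit before the window is full.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    averages = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value              # add the value entering the window
        if i >= n:
            window_sum -= values[i - n]  # subtract the value leaving it
        if i >= n - 1:
            averages.append(window_sum / n)
    return averages


highs = [10.0, 11.0, 12.0, 13.0, 14.0]
print(moving_average(highs, 3))  # [11.0, 12.0, 13.0]
```

Each new point costs one addition and one subtraction, regardless of the window size.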

Neil
Jul 18 '05 #4

P: n/a
Neil Hodgson wrote:
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.


While it might be a common technique, it's also inefficient (for
Numeric) because you have to iterate over tens of thousands of entries
in a Python loop. Fortunately, Numeric provides a much, much faster
way to do this using slicing.

Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.
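
For readers without Numeric, the same offset-slice idea can be sketched with NumPy. NumPy is an assumption here (it is not what the post uses), and the function name sliced_moving_average is made up for the illustration.

```python
import numpy as np


def sliced_moving_average(d, window=10):
    """Moving average of a 1-D array by summing `window` offset slices."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    if n < window:
        return np.zeros(0)               # not enough data for one full window
    a = np.zeros(n - window + 1)
    for j in range(window):
        # Slice j holds, for every output position, the j-th value of its window.
        a += d[j:n - window + 1 + j]
    return a / window


d = np.arange(1.0, 13.0)          # 1, 2, ..., 12
print(sliced_moving_average(d))   # [5.5 6.5 7.5]
```

The Python-level loop runs only `window` times; each pass adds a whole slice in compiled code.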
--
CARL BANKS http://www.aerojockey.com/software

As the newest Lady Turnpot descended into the kitchen wrapped only in
her celery-green dressing gown, her creamy bosom rising and falling
like a temperamental souffle, her tart mouth pursed in distaste, the
sous-chef whispered to the scullery boy, "I don't know what to make of
her."
--Laurel Fortuner, Montendre, France
1992 Bulwer-Lytton Fiction Contest Winner
Jul 18 '05 #5

P: n/a
2mc
me***@gnosis.cx (David Mertz) wrote in message news:<ma************************************@python.org>...
I'm not sure exactly where in the array the various highs and lows are
stored (different rows?).

I apologize for not making this clear. If an array were to be viewed
as a spreadsheet, then rows would be individual days and columns would
be date, open price, high price, etc.

But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.

Assuming the entire daily price history of the Dow Jones (all of last
century through today), finding the 10 day average for each day would
*not* work significantly faster with Numeric? What if I were doing
the standard deviation of price for 25 days and I wanted this done for
each price - open, high, low, close? Still no speed enhancement of
any significance?

I have an intro article on Numerical Python forthcoming at IBM dW. But
the manual for Numeric is quite excellent to start with. Still, the
package is probably overkill for the simple and small operations
mentioned.

Yours, David...


I have been reading the manual. I'm still working to get the cobwebs
out of my mind - which are the ways I did things in the other language
I used.

Thank you for your input. And, I look forward to any further comments
my clarification may elicit. Thanks.

Matt
Jul 18 '05 #6

P: n/a
numarray, the planned successor to Numeric 23, has a function cumsum
(cumulative sum) which might be helpful.

numarray appears to be largely operational; you might consider it.

Colin W.

Carl Banks wrote:
Neil Hodgson wrote:
2mc:

Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.

While it might be a common technique, it's also inefficient (for
Numeric) because you have to iterate over tens of thousands of entries
in a Python loop. Fortunately, Numeric provides a much, much faster
way to do this using slicing.

Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.


Jul 18 '05 #7

P: n/a
2mc
Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.


Thank you very much for this code. It took me several looks at it,
before I understood it. I'm still getting the cobwebs out of my brain
over the way I used to do things.

I have 2 questions. This code snippet runs at "array speed," right?
In other words, it will run much faster than if I tried to duplicate
this with lists, correct?

Also, can you give me an idea of how you would accomplish the 10 day
standard deviation of prices instead of the 10 day average? It would
really help me get a handle on this.

Thank you.

Matt
Jul 18 '05 #8

P: n/a
2mc wrote:
Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.
Thank you very much for this code. It took me several looks at it,
before I understood it. I'm still getting the cobwebs out of my brain
over the way I used to do things.

I have 2 questions. This code snippet runs at "array speed," right?
In other words, it will run much faster than if I tried to duplicate
this with lists, correct?


Yes, it should be far faster. The main benefit of Numeric arrays is
they can do all kinds of array operations without having to write a
loop in Python.

The main advantage of this method over the method using "sum" is that
this method operates on large arrays while iterating 10 times in
Python. The method using "sum" operates on 10-element arrays while
iterating 100 thousand or so times in Python. It's easy to see that
the former method puts Numeric to better use.

Also, can you give me an idea of how you would accomplish the 10 day
standard deviation of prices instead of the 10 day average? It would
really help me get a handle on this.


Well, I'm going from memory on the formula for standard deviation, but
here is an example that computes it as the square root of the mean
squared deviation (it uses the average calculated above):

S = zeros(N-9,Float)
for j in range(10):
    S += (D[j:N-9+j] - A)**2
S = sqrt(S/10.0)

The key is to think of the array slices as if they were regular scalar
values. The above method calculates the SD pretty much the same way
one would calculate the SD using regular old numbers.

Note that sqrt is Numeric.sqrt, not math.sqrt.
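
As an illustration of treating slices like scalars, here is a sketch of a moving standard deviation built from offset slices. It assumes NumPy rather than Numeric and the population form of the standard deviation (divide by the window size); the function name sliced_moving_std is made up for this example.

```python
import numpy as np


def sliced_moving_std(d, window=10):
    """Population standard deviation over each run of `window` values."""
    d = np.asarray(d, dtype=float)
    m = len(d) - window + 1            # number of complete windows
    if m <= 0:
        return np.zeros(0)
    mean = np.zeros(m)
    for j in range(window):
        mean += d[j:m + j]
    mean /= window
    sq_dev = np.zeros(m)
    for j in range(window):
        # Squared deviation of the j-th value of every window from that window's mean.
        sq_dev += (d[j:m + j] - mean) ** 2
    return np.sqrt(sq_dev / window)


d = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(sliced_moving_std(d, window=8))   # [2.]
```

Dividing by `window - 1` instead would give the sample standard deviation, if that is the form wanted.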
--
CARL BANKS http://www.aerojockey.com/software

As the newest Lady Turnpot descended into the kitchen wrapped only in
her celery-green dressing gown, her creamy bosom rising and falling
like a temperamental souffle, her tart mouth pursed in distaste, the
sous-chef whispered to the scullery boy, "I don't know what to make of
her."
--Laurel Fortuner, Montendre, France
1992 Bulwer-Lytton Fiction Contest Winner
Jul 18 '05 #9

P: n/a
"Neil Hodgson" <nh******@bigpond.net.au> wrote in message news:<S0********************@news-server.bigpond.net.au>...
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.

Neil

While this is common in C/C++, it is not the most efficient way in
Python when you have Numeric around: used correctly, it does the loops
in C for you.

I use the following setup

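from Numeric import array, concatenate, cumsum, Float

def movavg(s, n):
    ''' returns an n period moving average for the time series s

    s is a list ordered from oldest (index 0) to most recent (index -1)
    n is an integer

    returns a numeric array of the moving average
    '''
    s = array(s, Float)
    # prepend 0 so that c[i] is the sum of the first i values of s
    c = cumsum(concatenate(([0.0], s)))
    # c[i+n] - c[i] is the sum of the n values starting at index i
    return (c[n:] - c[:-n]) / float(n)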

This should run in near constant time with regard to n (and, of course,
in time linear in the length of s). At least one person has said yuk
because of the numerical issue of losing precision in the cumsum, but for
small n's, and values like you will see in stock prices and indices,
I don't think this is too much of a problem. Someone may have
a more numerically stable version, but then you could just implement
the C-code version and wrap it for Python.
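
The cumulative-sum approach can also be sketched with NumPy, which is an assumption here since the post itself uses Numeric. The function name cumsum_moving_average is made up for the illustration. A zero is prepended to the running totals so that the difference of two entries spans exactly n values.

```python
import numpy as np


def cumsum_moving_average(s, n):
    """n-period moving average of s computed from one cumulative sum."""
    if n <= 0:
        raise ValueError("n must be positive")
    s = np.asarray(s, dtype=float)
    # Prepend 0 so that c[i] is the sum of the first i values of s.
    c = np.concatenate(([0.0], np.cumsum(s)))
    # c[i + n] - c[i] is the sum of s[i], ..., s[i + n - 1].
    return (c[n:] - c[:-n]) / n


print(cumsum_moving_average([1, 2, 3, 4, 5], 2))  # [1.5 2.5 3.5 4.5]
```

The precision concern mentioned above applies here too: the running totals grow with the length of the series, so very long inputs with large values lose low-order digits in the differences.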

Sean
Jul 18 '05 #10

P: n/a
mc*****@bigfoot.com (2mc) wrote previously:
|I apologize for not making this clear. If an array were to be viewed
|as a spreadsheet, then rows would be individual days and columns would
|be date, open price, high price, etc.

Aside from the speed increase, Numeric provides quite a few handy syntax
tricks. For example, in a crude version of your problem, I first create
the "spreadsheet", and populate it with silly open/close/high/low
prices.
>>> from Numeric import *
>>> stock = zeros((5,20),Int)
>>> stock[0,:] = range(20)   # number the days
>>> stock[1,:] = [10]*20     # open price
>>> stock[2,:] = [11]*20     # close price
>>> stock[3,:] = [13]*20     # high price
>>> stock[4,:] = [8]*20      # low price
>>> print stock
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
 [11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11]
 [13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13]
 [ 8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8]]

Of course, it is an odd stock that opens, closes, and peaks at the same
price on every day. :-)

Next, I can find a particular average high for a range:
>>> avg_high_day5to15 = sum(stock[3,5:15])/len(stock[3,5:15])
>>> avg_high_day5to15
13

You might want to generalize it in a function:
>>> def ten_day_avg_high(beg):
...     return sum(stock[3,beg:beg+10])/len(stock[3,beg:beg+10])

Which lets you calculate your running averages:
>>> map(ten_day_avg_high, range(20))

[13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13]

This works even near the end, where there are fewer than ten days to
average--we divide by the length of the tail, not by '10' to make sure
of that.
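
For the layout described in the question (one row per day, one column per field), the same idea can be sketched with NumPy column slicing. NumPy, the column order, and the names make_prices and avg_from are all assumptions made for this illustration; the prices are made up, as above.

```python
import numpy as np

# Hypothetical column order for this sketch: day, open, high, low, close.
DAY, OPEN, HIGH, LOW, CLOSE = range(5)


def make_prices(days=20):
    """Build a days x 5 table of made-up prices, one row per day."""
    prices = np.zeros((days, 5))
    prices[:, DAY] = np.arange(days)
    prices[:, OPEN] = 10.0
    prices[:, HIGH] = 13.0
    prices[:, LOW] = 8.0
    prices[:, CLOSE] = 11.0
    return prices


def avg_from(prices, column, beg, span=10):
    """Average of `column` over up to `span` days starting at row `beg`.

    `beg` must be a valid row index; near the end of the table the
    window is shorter and the divisor shrinks with it.
    """
    window = prices[beg:beg + span, column]
    return window.sum() / len(window)


prices = make_prices()
running = [avg_from(prices, HIGH, beg) for beg in range(len(prices))]
print(running[0], running[-1])   # 13.0 13.0
```

Passing LOW or CLOSE instead of HIGH gives the corresponding averages without changing the function.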

You can work out deviations and other statistics in similar ways.

Yours, David...

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.

Jul 18 '05 #11
