Bytes | Software Development & Data Engineering Community

Numerical Python question

2mc
I'm new to Python and to Numerical Python. I have a program written
in another language that used arrays extensively. I'm trying to
convert it to Python.

Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?

If someone could write some pseudo-code that would point me in the
right direction, I would be most appreciative.

Thanks.
Jul 18 '05 #1
mc*****@bigfoot.com (2mc) wrote previously:
|Assume an array in Numerical Python, the contents of which are the
|daily open, high, low, and last prices of the DOW Jones Industrial
|Average for its entire history.

I'm not sure exactly where in the array the various highs and lows are
stored (different rows?).

But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.
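Applied to every 10-day window, that sum/len formula gives the straightforward moving average. Here is a minimal sketch in plain Python 3, assuming the highs are already in a list; the function name and the sample values are hypothetical:

```python
def naive_moving_average(values, n):
    """Average of every run of n consecutive values, re-summing each window."""
    if not 1 <= n <= len(values):
        raise ValueError("window length n must be between 1 and len(values)")
    averages = []
    for start in range(len(values) - n + 1):
        window = values[start:start + n]      # n consecutive days
        averages.append(sum(window) / len(window))
    return averages

highs = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up daily highs
print(naive_moving_average(highs, 3))         # [11.0, 12.0, 13.0, 14.0]
```

Each window is summed from scratch, so the work grows with both the number of days and the window length. That repeated work is what the faster approaches in this thread avoid.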

I have an intro article on Numerical Python forthcoming at IBM dW. But
the manual for Numeric is quite excellent to start with. Still, the
package is probably overkill for the simple and small operations
mentioned.

Yours, David...

--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
..cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
Jul 18 '05 #2
David Mertz wrote:
...
But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.


No kidding: Numeric.sum is an *order of magnitude faster* for this:

[alex@lancelot pop]$ timeit.py -s'import Numeric' \
    -s'x=Numeric.array(map(float,range(1000000)))' 'Numeric.sum(x)'
10 loops, best of 3: 2.11e+04 usec per loop

[alex@lancelot pop]$ timeit.py -s'import Numeric' \
    -s'x=Numeric.array(map(float,range(1000000)))' 'sum(x)'
10 loops, best of 3: 3.05e+05 usec per loop

The difference between 21 milliseconds with Numeric.sum and about 300 with
the built-in sum can be *very* significant if you sum millions of floats in
your bottlenecks.

I think the chapter on Numeric in "Python in a Nutshell" is a good intro
(thanks to lots of crucial input from Eric Jones and Paul DuBois!), and
you can read it for free with the usual trick -- get an O'Reilly Safari
subscription and use the free first two weeks to read the parts that
interest you, then cancel the subscription so you won't have to pay.
Alex

Jul 18 '05 #3
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.
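A minimal sketch of that accumulator in plain Python 3 (the function name is hypothetical). It chooses to emit nothing until the first full window, which is one possible answer to the question of what to output before the nth point:

```python
def running_moving_average(values, n):
    """n-point moving average maintained with a single running total.

    One average is returned per complete window; the first n-1 points
    produce no output.
    """
    if not 1 <= n <= len(values):
        raise ValueError("window length n must be between 1 and len(values)")
    total = sum(values[:n])                  # accumulator for the first window
    averages = [total / n]
    for i in range(n, len(values)):
        total += values[i] - values[i - n]   # add the newest point, drop the oldest
        averages.append(total / n)
    return averages

print(running_moving_average([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], 3))
# [11.0, 12.0, 13.0, 14.0]
```

Each step costs one addition and one subtraction regardless of n, but the loop itself still runs in Python. Over very long series the running total can also pick up rounding error, because every step adds and subtracts.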

Neil
Jul 18 '05 #4
Neil Hodgson wrote:
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.


While it might be a common technique, it's also inefficient (for
Numeric) because you have to iterate over tens of thousands of entries
in a Python loop. Fortunately, Numeric provides a much, much faster
way to do this using slicing.

Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

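from Numeric import zeros, Float

A = zeros(N-9, Float)     # one slot per complete 10-day window
for j in range(10):
    A += D[j:N-9+j]       # add the slice shifted by j days
A /= 10.0                 # A[i] is now the mean of D[i], ..., D[i+9]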

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.
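To make the bookkeeping visible, here is the same shifted-slice idea written with plain Python lists for an arbitrary window length n (the function name is hypothetical). With lists the inner loop has to be spelled out. With Numeric arrays, that whole inner loop is the single expression `A += D[j:N-9+j]`, which is where the speed comes from:

```python
def shifted_slice_average(D, n=10):
    """Moving average built by adding n slices of D, each shifted by one day.

    A[i] ends up as (D[i] + D[i+1] + ... + D[i+n-1]) / n.
    """
    N = len(D)
    if not 1 <= n <= N:
        raise ValueError("window length n must be between 1 and len(D)")
    A = [0.0] * (N - n + 1)
    for j in range(n):                            # only n passes over the data...
        for i, x in enumerate(D[j:N - n + 1 + j]):
            A[i] += x                             # ...each adding one shifted slice
    return [a / n for a in A]

print(shifted_slice_average([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], 3))
# [11.0, 12.0, 13.0, 14.0]
```

With n=10, the slice bounds `D[j:N - n + 1 + j]` reduce to Carl's `D[j:N-9+j]`.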
--
CARL BANKS http://www.aerojockey.com/software

As the newest Lady Turnpot descended into the kitchen wrapped only in
her celery-green dressing gown, her creamy bosom rising and falling
like a temperamental souffle, her tart mouth pursed in distaste, the
sous-chef whispered to the scullery boy, "I don't know what to make of
her."
--Laurel Fortuner, Montendre, France
1992 Bulwer-Lytton Fiction Contest Winner
Jul 18 '05 #5
2mc
me***@gnosis.cx (David Mertz) wrote in message news:<ma************************************@python.org>...
I'm not sure exactly where in the array the various highs and lows are
stored (different rows?).

I apologize for not making this clear. If an array were to be viewed
as a spreadsheet, then rows would be individual days and columns would
be date, open price, high price, etc.
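With that layout, each price series is one column. Here is a minimal sketch in plain Python 3 of pulling columns out of a list of rows. The column constants and the sample values are made up for illustration. With a 2-D Numeric array the same selection is a slice, for example `prices[:, 2]` for the high column of a hypothetical array `prices`:

```python
# Hypothetical column positions: one row per day,
# columns = day number, open, high, low, close.
DAY, OPEN, HIGH, LOW, CLOSE = range(5)

rows = [
    (1, 100.0, 102.0, 99.0, 101.0),    # made-up values
    (2, 101.0, 103.5, 100.5, 103.0),
    (3, 103.0, 104.0, 101.0, 101.5),
]

highs = [row[HIGH] for row in rows]    # one column = one price series
lows = [row[LOW] for row in rows]
print(highs)                           # [102.0, 103.5, 104.0]
```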
But in general, an average is 'sum(highs)/len(highs)'. The version of
'sum()' in Numeric will work a lot faster though... although it makes no
difference for 10 prices. If you start worrying about a million prices,
you might see a significant boost using Numeric.

Assuming the entire daily price history of the Dow Jones (all of last
century through today), finding the 10-day average for each day would
*not* work significantly faster with Numeric? What if I were doing
the standard deviation of price for 25 days and I wanted this done for
each price - open, high, low, close? Still no speed enhancement of
any significance?
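For the 25-day standard deviation of each price, here is a minimal sketch in plain Python 3 using the standard library's `statistics.pstdev` (population standard deviation) on every window of every column. The helper names, the column mapping, and the sample rows are hypothetical. This is the straightforward per-window approach, not a fast array-based one:

```python
import statistics

def rolling_pstdev(values, n):
    """Population standard deviation of every run of n consecutive values."""
    if not 1 <= n <= len(values):
        raise ValueError("window length n must be between 1 and len(values)")
    return [statistics.pstdev(values[i:i + n])
            for i in range(len(values) - n + 1)]

def rolling_pstdev_by_column(rows, columns, n=25):
    """Apply rolling_pstdev to each price column of a one-row-per-day table."""
    return {name: rolling_pstdev([row[idx] for row in rows], n)
            for name, idx in columns.items()}

# Hypothetical layout (day, open, high, low, close) with made-up prices.
columns = {"open": 1, "high": 2, "low": 3, "close": 4}
rows = [(day, 100.0 + day, 101.0 + day, 99.0 + day, 100.5 + day)
        for day in range(30)]
result = rolling_pstdev_by_column(rows, columns, n=25)
print(len(result["high"]))   # 6 windows: 30 days - 25 + 1
```

Swap in `statistics.stdev` if you want the sample (n - 1) form. Because every window is rescanned, the cost grows with both the number of days and the window length.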
I have an intro article on Numerical Python forthcoming at IBM dW. But
the manual for Numeric is quite excellent to start with. Still, the
package is probably overkill for the simple and small operations
mentioned.

Yours, David...


I have been reading the manual. I'm still working to get the cobwebs
out of my mind - which are the ways I did things in the other language
I used.

Thank you for your input. And, I look forward to any further comments
my clarification may elicit. Thanks.

Matt
Jul 18 '05 #6
numarray, the planned successor to Numeric 23, has a function cumsum
(cumulative sum) which might be helpful.

numarray appears to be largely operational; you might consider it.

Colin W.

Jul 18 '05 #7
2mc
Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.


Thank you very much for this code. It took me several looks at it,
before I understood it. I'm still getting the cobwebs out of my brain
over the way I used to do things.

I have 2 questions. This code snippet runs at "array speed," right?
In other words, it will run much faster than if I tried to duplicate
this with lists, correct?

Also, can you give me an idea of how you would accomplish the 10 day
standard deviation of prices instead of the 10 day average? It would
really help me get a handle on this.

Thank you.

Matt
Jul 18 '05 #8
2mc wrote:
Carl Banks <im*****@aerojockey.invalid> wrote in message news:<aw****************@nwrdny02.gnilink.net>...
Let's say D[i] is the daily high at the end of day i, and N is the
total number of days in the array. To calculate 10-day averages, you
can add ten slices offset by one, then divide the sum by 10:

A = zeros(N-9,Float)
for j in range(10):
    A += D[j:N-9+j]
A /= 10.0

Bam, that's it. Instead of looping over tens of thousands of slices
of ten, you're now looping over ten slices of tens of thousands.

This works because, when you add ten slices offset by one, the result
is that each item contains the sum of ten consecutive numbers from the
original array.

Thank you very much for this code. It took me several looks at it,
before I understood it. I'm still getting the cobwebs out of my brain
over the way I used to do things.

I have 2 questions. This code snippet runs at "array speed," right?
In other words, it will run much faster than if I tried to duplicate
this with lists, correct?


Yes, it should be far faster. The main benefit of Numeric arrays is that
they can do all kinds of array operations without your having to write a
loop in Python.

The main advantage of this method over the method using "sum" is that
this method operates on large arrays while iterating only 10 times in
Python. The method using "sum" operates on 10-element arrays while
iterating 100 thousand or so times in Python. It's easy to see that
the former method puts Numeric to better use.

Also, can you give me an idea of how you would accomplish the 10 day
standard deviation of prices instead of the 10 day average? It would
really help me get a handle on this.


Well, the (population) standard deviation of each window is the square
root of the mean squared deviation from that window's average, so it
takes a second pass over the same ten slices, using the average A
calculated above:

from Numeric import zeros, Float, sqrt

S = zeros(N-9, Float)
for j in range(10):
    d = D[j:N-9+j] - A   # deviation of each day from its window's mean
    S += d*d             # accumulate the squared deviations
S = sqrt(S/10.0)         # standard deviation of each 10-day window

The key is to think of the array slices as if they were regular scalar
values. The above method calculates the SD pretty much the same way
one would calculate it using regular old numbers.

Note that sqrt is Numeric.sqrt, not math.sqrt.
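For comparison, here is the same two-pass calculation with plain Python 3 lists for an arbitrary window length n (the function name is hypothetical). It computes the population standard deviation: one set of shifted slices produces the window means, and a second set accumulates the squared deviations from those means:

```python
import math

def shifted_slice_std(D, n=10):
    """Population standard deviation of each n-day window of D, via shifted slices."""
    N = len(D)
    if not 1 <= n <= N:
        raise ValueError("window length n must be between 1 and len(D)")
    m = N - n + 1                       # number of complete windows
    A = [0.0] * m                       # first pass: window means
    for j in range(n):
        for i, x in enumerate(D[j:m + j]):
            A[i] += x
    A = [a / n for a in A]
    S = [0.0] * m                       # second pass: squared deviations
    for j in range(n):
        for i, x in enumerate(D[j:m + j]):
            d = x - A[i]                # deviation from window i's mean
            S[i] += d * d
    return [math.sqrt(s / n) for s in S]

print(shifted_slice_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 8))   # [2.0]
```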
--
CARL BANKS http://www.aerojockey.com/software

Jul 18 '05 #9
"Neil Hodgson" <nh******@bigpond.net.au> wrote in message news:<S0********************@news-server.bigpond.net.au>...
2mc:
Assume an array in Numerical Python, the contents of which are the
daily open, high, low, and last prices of the DOW Jones Industrial
Average for its entire history. How would one iterate throughout the
entire array calculating the 10 day average of daily high price or the
daily low price?


The common technique for moving averages is to maintain a single
accumulator value over the last n points. For each new point, remove (by
subtracting) the value at the beginning of the window and add in the value
at the end of the window. The value of the accumulator divided by n is the
moving average. You will need to define what you want as output, if any,
before the nth point.

Neil

While this is common in C/C++, it is not the most efficient way in
Python when you have numpy around to do the loops in C if used correctly.

I use the following setup

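from Numeric import array, concatenate, cumsum, Float

def movavg(s, n):
    ''' returns an n period moving average for the time series s

    s is a list ordered from oldest (index 0) to most recent (index -1)
    n is an integer, 1 <= n <= len(s)

    returns a numeric array of the moving average
    '''
    s = array(s, Float)
    # prepend 0 so that c[k] is the sum of the first k values of s
    c = concatenate((array([0.0]), cumsum(s)))
    # c[k+n] - c[k] is the sum of the n values s[k], ..., s[k+n-1]
    return (c[n:] - c[:-n]) / float(n)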

This should run in nearly constant time with respect to n (and, of
course, in time linear in the length of s). At least one person has said
"yuck" because of the numerical issue of losing precision in the cumsum,
but for small n's and values like those you will see in stock prices and
indices, I don't think this is too much of a problem. Someone may have
a more numerically stable version, but then you could just implement
the C version and wrap it for Python.
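The same cumulative-sum idea in plain Python 3, using `itertools.accumulate` (the function name is hypothetical). Prepending a zero to the running totals makes every window exactly n values long, including the first one:

```python
from itertools import accumulate

def cumsum_moving_average(s, n):
    """n-period moving average from differences of a cumulative sum.

    c[k] is the sum of the first k values (c[0] == 0), so
    c[k + n] - c[k] is the sum of s[k], ..., s[k + n - 1].
    """
    if not 1 <= n <= len(s):
        raise ValueError("window length n must be between 1 and len(s)")
    c = [0.0] + list(accumulate(float(x) for x in s))
    return [(c[k + n] - c[k]) / n for k in range(len(s) - n + 1)]

print(cumsum_moving_average([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], 3))
# [11.0, 12.0, 13.0, 14.0]
```

The precision concern comes from subtracting two large running totals: the longer the series, the larger the totals, and the more of each window's sum is lost to rounding.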

Sean
Jul 18 '05 #10
mc*****@bigfoot.com (2mc) wrote previously:
|I apologize for not making this clear. If an array were to be viewed
|as a spreadsheet, then rows would be individual days and columns would
|be date, open price, high price, etc.

Aside from the speed increase, Numeric provides quite a few handy syntax
tricks. For example, in a crude version of your problem, I first create
the "spreadsheet", and populate it with silly open/close/high/low
prices.

>>> from Numeric import *
>>> stock = zeros((5,20),Int)
>>> stock[0,:] = range(20)   # number the days
>>> stock[1,:] = [10]*20     # open price
>>> stock[2,:] = [11]*20     # close price
>>> stock[3,:] = [13]*20     # high price
>>> stock[4,:] = [8]*20      # low price
>>> print stock
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
 [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
 [11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11]
 [13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13]
 [ 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8]]

Of course, it is an odd stock that opens, closes, and peaks at the same
price on every day. :-)

Next, I can find a particular average high for a range:

>>> avg_high_day5to15 = sum(stock[3,5:15])/len(stock[3,5:15])
>>> avg_high_day5to15
13

You might want to generalize it in a function:

>>> def ten_day_avg_high(beg):
...     return sum(stock[3,beg:beg+10])/len(stock[3,beg:beg+10])
...

Which lets you calculate your running averages:

>>> map(ten_day_avg_high, range(20))
[13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13]

This works even near the end, where there are fewer than ten days to
average--we divide by the length of the tail, not by '10' to make sure
of that.
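Here is a plain Python 3 version of the same function, generalized to any list and any window length (the name is hypothetical). In the session above the prices are integers, so the Python 2 of the day discards any fraction when dividing. Python 3's `/` keeps it:

```python
def forward_window_averages(values, n=10):
    """Average of values[beg:beg+n] for every start index beg.

    Near the end the slice is shorter than n, so each average divides by
    the slice's actual length rather than by n.
    """
    averages = []
    for beg in range(len(values)):
        window = values[beg:beg + n]
        averages.append(sum(window) / len(window))
    return averages

print(forward_window_averages([1.0, 2.0, 3.0, 4.0], n=3))   # [2.0, 3.0, 3.5, 4.0]
```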

You can work out deviations and other statistics in similar ways.
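As one illustration of "other statistics" in the same style, here is a minimal plain Python 3 sketch that reports the lowest value, the highest value, and the population standard deviation of each forward window. It uses the standard library's `statistics.pstdev`, and the function name is hypothetical:

```python
import statistics

def window_stats(series, n=10):
    """(min, max, population std dev) of series[beg:beg+n] for every start index.

    Windows near the end are shorter than n; each statistic simply uses
    the values that remain, like ten_day_avg_high above.
    """
    stats = []
    for beg in range(len(series)):
        window = series[beg:beg + n]
        stats.append((min(window), max(window), statistics.pstdev(window)))
    return stats

for lo, hi, sd in window_stats([1.0, 3.0, 2.0], n=2):
    print(lo, hi, sd)
```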

Yours, David...

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.

Jul 18 '05 #11
