Why does scipy make my program slow?

Why are the execution times of test(readdata()) and test(randomdata()) in the
following program different?
My test file 150Hz10dB.wav has 2586024 samples, so I set the randomdata function
to return a list with 2586024 samples.
The result is:
2586024
<type 'list'>
10.8603842736
2586024
<type 'list'>
2.16525233979
test(randomdata()) is 5x faster than test(readdata()).
If I remove "from scipy import *", I get the following result:
2586024
<type 'list'>
2.21851601473
2586024
<type 'list'>
2.13885042216

So, what is the problem with scipy?
Python 2.4.2, scipy ver. 0.5.1

import wave
from scipy import *
from time import *
import random
from array import array

def readdata():
    f = wave.open("150Hz10dB.wav", "rb")
    t = f.getparams()
    SampleRate = t[2]
    data = array("h", f.readframes(t[3]))
    f.close()
    left = data[0::2]
    mean = sum(left)/float(len(left))
    left = [abs(x-mean) for x in left]
    return left

def randomdata():
    return [random.random()*32768.0 for i in xrange(2586024)]

def test(data):
    print len(data)
    print type(data)
    envelop = []
    e = 0.0
    ga, gr = 0.977579425259, 0.999773268338
    ga1, gr1 = 1.0 - ga, 1.0 - gr
    start = clock()
    for x in data:
        if e < x:
            e *= ga
            e += ga1*x
        else:
            e *= gr
            e += gr1*x
        envelop.append(e)
    print clock() - start
    return envelop

test(readdata())
test(randomdata())

Jan 16 '07 #1
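
For readers on Python 3, the timing harness in the question can be sketched roughly as below. This is a port under assumptions, not the poster's code: `envelope` and `timed` are hypothetical names, `time.perf_counter` stands in for `clock()`, and a short synthetic list replaces the WAV data.

```python
import random
import time

# Attack / release coefficients copied from the post.
GA, GR = 0.977579425259, 0.999773268338


def envelope(data, ga=GA, gr=GR):
    """One-pole envelope follower: coefficient ga while rising, gr while falling."""
    out = []
    e = 0.0
    for x in data:
        g = ga if e < x else gr      # pick the coefficient for this sample
        e = e * g + (1.0 - g) * x    # same update as e *= g; e += (1 - g) * x
        out.append(e)
    return out


def timed(data):
    """Return (envelope, elapsed seconds) for one pass over data."""
    start = time.perf_counter()
    env = envelope(data)
    return env, time.perf_counter() - start


# Synthetic input (hypothetical size, much smaller than the 2586024 samples in the post).
samples = [random.random() * 32768.0 for _ in range(10000)]
env, seconds = timed(samples)
print(len(env), type(samples[0]).__name__, seconds)
```

Printing the type of an element rather than the type of the list is what later exposes the difference discussed in the replies.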
4 Replies


You're importing (through scipy) numpy's sum() function. The result type of that
function is a numpy scalar type. The set of scalar types was introduced for a
number of reasons, mostly having to do with being able to represent the full
range of numerical datatypes that Python does not have builtin types for.
Unfortunately, the code paths that get executed when arithmetic is performed
with such scalars are still suboptimal; I believe they are still going through
the full ufunc machinery.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jan 16 '07 #2
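
The shadowing mechanism described in this reply can be illustrated without scipy. The sketch below uses the standard-library `math` module as a stand-in (an assumption made only so the example runs anywhere): a star import rebinds `pow` in the same way that, per the reply, `from scipy import *` rebound `sum`.

```python
import builtins
import math

# Run the star import inside a scratch namespace so its effect is easy to inspect.
ns = {}
exec("from math import *", ns)

print(ns["pow"] is math.pow)       # the star import bound the name "pow"
print(type(builtins.pow(2, 3)))    # built-in pow keeps ints as ints
print(type(ns["pow"](2, 3)))       # math.pow returns a float instead

# A safer habit: import the module and qualify each call,
# so built-in names are never silently replaced.
explicit = math.pow(2, 3)
```

The same name now returns a different type, and nothing at the call site shows it. Importing the module by name (or calling `builtins.sum` explicitly) keeps the choice visible.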

Thanks. Following your hint, I changed type(data) to type(data[0]), and I get
<type 'float'>
<type 'numpy.float64'>
So calculating with float is about 5x faster than with numpy.float64.

Jan 16 '07 #3
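
The type difference reported here can be reproduced with a small sketch, assuming a current NumPy and Python 3. The five-sample `array` is a hypothetical stand-in for the WAV data, and `builtins.sum` is called explicitly to play the role of the unshadowed built-in.

```python
import builtins
from array import array

import numpy as np

left = array("h", [3, -1, 4, -1, 5])   # stand-in for the decoded WAV samples

py_mean = builtins.sum(left) / float(len(left))   # built-in sum
np_mean = np.sum(left) / float(len(left))         # numpy's sum, as after the star import

# The mean's type propagates into every element of the resulting list.
py_list = [abs(x - py_mean) for x in left]
np_list = [abs(x - np_mean) for x in left]

print(type(py_mean).__name__, type(py_list[0]).__name__)
print(type(np_mean).__name__, type(np_list[0]).__name__)
```

Both lists hold the same values; only the element type differs, which is why `type(data)` printed `list` in both runs and hid the cause.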

HYRY wrote:
> Thanks. Following your hint, I changed type(data) to type(data[0]), and I get
> <type 'float'>
> <type 'numpy.float64'>
> So calculating with float is about 5x faster than with numpy.float64.

Approximately. numpy functions all upcast int to int32, float to float32,
int32/float to float32, etc. This is probably bad behavior.
float32 arrays should only arise from numpy.array(l, dtype=numpy.float32).

In your example, you would do best to switch to numpy/scipy types very early
(without also mixing in the Python array type) and do the
array computations with scipy:

left = [abs(x-mean) for x in left]

->

data = scipy.fromstring(f.readframes(t[3]), "h")
...
left = abs(left-mean)

Code test(data) similarly; see also scipy.signal.lfilter etc.

Then cast down to Python types late, e.g. float(mynumfloat).
The type magic, the speed loss, and the pickle problems will
probably only disappear when float and int are handled as extra
(more conservative) types in numpy, with numpy scalar types only
on request. Currently numpy uses Python.
Robert
Jan 16 '07 #4
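
The "convert early, compute with arrays, cast down late" advice in this post might look like the sketch below on a current NumPy. `np.frombuffer` is swapped in here for the array-construction call in the post, and the eight hard-coded samples are a hypothetical stand-in for `f.readframes(...)`.

```python
from array import array

import numpy as np

# Stand-in for f.readframes(...): interleaved 16-bit stereo frames (L, R, L, R, ...).
raw = array("h", [100, 7, -300, 7, 200, 7, 0, 7]).tobytes()

data = np.frombuffer(raw, dtype=np.int16)   # view the bytes as int16, no Python loop
left = data[0::2].astype(np.float64)        # left channel, widened before arithmetic
rectified = np.abs(left - left.mean())      # vectorized form of abs(x - mean)

# Convert to plain Python floats only at the end, if a list is still needed.
as_list = rectified.tolist()
print(as_list)
```

The envelope loop in the question switches coefficients per sample, so it does not map directly onto a single linear filter. The preprocessing step is the part that vectorizes cleanly.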

Robert Kern wrote:
> You're importing (through scipy) numpy's sum() function. The result type of that
> function is a numpy scalar type. The set of scalar types was introduced for a
> number of reasons, mostly having to do with being able to represent the full
> range of numerical datatypes that Python does not have builtin types for.
> Unfortunately, the code paths that get executed when arithmetic is performed
> with such scalars are still suboptimal; I believe they are still going through
> the full ufunc machinery.

This should not be true in the 1.0 release of NumPy. The numpy scalars
do their own math which has less overhead than ufunc-based math. But,
there is still more overhead than with simple floats because mixed-type
arithmetic is handled more generically (the same algorithm covers all
the cases).

The speed could be improved but hasn't been because it is so easy to get
a Python float if you are concerned about speed.

-Travis

Jan 17 '07 #5
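
Getting a Python float back, as this reply suggests, takes one call. A minimal sketch, assuming a current NumPy; the sample values are made up for illustration.

```python
import numpy as np

samples = np.array([3, -1, 4, -1, 5], dtype=np.int16)

np_mean = samples.mean()        # numpy scalar
py_mean = float(np_mean)        # plain Python float
py_mean_alt = np_mean.item()    # .item() also returns the Python equivalent

# With a Python float mean and Python ints from tolist(), the per-element
# arithmetic stays on built-in types throughout.
py_values = [abs(x - py_mean) for x in samples.tolist()]
print(type(np_mean).__name__, type(py_mean).__name__, py_values)
```

Applied to the question's code, wrapping the mean in `float(...)` before the list comprehension would keep every element of `left` a built-in float.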
