472,357 Members | 2,015 Online

# why scipy cause my program slow?

Why the exec time of test(readdata()) and test(randomdata()) of
following program is different?
my test file 150Hz10dB.wav has 2586024 samples, so I set randomdata
function
to return a list with 2586024 samples.
the exec result is:
2586024
<type 'list'>
10.8603842736
2586024
<type 'list'>
2.16525233979
test(randomdata()) is 5x faster than test(readdata())
if I remove "from scipy import *" then I get the following result:
2586024
<type 'list'>
2.21851601473
2586024
<type 'list'>
2.13885042216

So, what the problem with scipy?
Python 2.4.2, scipy ver. 0.5.1
import wave
from scipy import *
from time import *
import random
from array import array

f = wave.open("150Hz10dB.wav", "rb")
t = f.getparams()
SampleRate = t[2]
f.close()
left = data[0::2]
mean = sum(left)/float(len(left))
left = [abs(x-mean) for x in left]
return left

def randomdata():
return [random.random()*32768.0 for i in xrange(2586024)]

def test(data):
print len(data)
print type(data)
envelop = []
e = 0.0
ga, gr = 0.977579425259, 0.999773268338
ga1, gr1 = 1.0 - ga, 1.0 - gr
start = clock()
for x in data:
if e < x:
e *= ga
e += ga1*x
else:
e *= gr
e += gr1*x
envelop.append(e)
print clock() - start
return envelop

test(randomdata())

Jan 16 '07 #1
4 2737
HYRY wrote:
Why the exec time of test(readdata()) and test(randomdata()) of
following program is different?
my test file 150Hz10dB.wav has 2586024 samples, so I set randomdata
function
to return a list with 2586024 samples.
the exec result is:
2586024
<type 'list'>
10.8603842736
2586024
<type 'list'>
2.16525233979
test(randomdata()) is 5x faster than test(readdata())
if I remove "from scipy import *" then I get the following result:
2586024
<type 'list'>
2.21851601473
2586024
<type 'list'>
2.13885042216

So, what the problem with scipy?
You're importing (through scipy) numpy's sum() function. The result type of that
function is a numpy scalar type. The set of scalar types was introduced for a
number of reasons, mostly having to do with being able to represent the full
range of numerical datatypes that Python does not have builtin types for.
Unfortunately, the code paths that get executed when arithmetic is performed
sith such scalars are still suboptimal; I believe they are still going through
the full ufunc machinery.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco

Jan 16 '07 #2
Thanks, by your hint, I change type(data) to type(data[0]), and I get
<type 'float'>
<type 'numpy.float64'>
So, calculate with float is about 5x faster numpy.float64.

Robert Kern wrote:
HYRY wrote:
Why the exec time of test(readdata()) and test(randomdata()) of
following program is different?
my test file 150Hz10dB.wav has 2586024 samples, so I set randomdata
function
to return a list with 2586024 samples.
the exec result is:
2586024
<type 'list'>
10.8603842736
2586024
<type 'list'>
2.16525233979
test(randomdata()) is 5x faster than test(readdata())
if I remove "from scipy import *" then I get the following result:
2586024
<type 'list'>
2.21851601473
2586024
<type 'list'>
2.13885042216

So, what the problem with scipy?

You're importing (through scipy) numpy's sum() function. The result type of that
function is a numpy scalar type. The set of scalar types was introduced for a
number of reasons, mostly having to do with being able to represent the full
range of numerical datatypes that Python does not have builtin types for.
Unfortunately, the code paths that get executed when arithmetic is performed
sith such scalars are still suboptimal; I believe they are still going through
the full ufunc machinery.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco
Jan 16 '07 #3
HYRY wrote:
Thanks, by your hint, I change type(data) to type(data[0]), and I get
<type 'float'>
<type 'numpy.float64'>
So, calculate with float is about 5x faster numpy.float64.
approx..
numpy funcs all upcast int to int32 and float to float32 and
int32/float to float32 etc. This is probably ill behavior.
float32 arrays should only arise if numpy.array(l,dtype=numpy.float32)

In your example you'll best go to numpy/scipy types very early
(not mixing with the python array type in addition) and do the
array computations with scipy

left = [abs(x-mean) for x in left]

->

...
left = abs(left-mean)

and cast types down to Python types late like float(mynumfloat) ...
The type magic and speed loss will and pickle problems will
probably only disapear, when float & int are handled as extra
(more conservative) types in numpy - with numpy scalar types only
on request. Currently numpy uses Python.
Robert
Jan 16 '07 #4
Robert Kern wrote:
HYRY wrote:
>>Why the exec time of test(readdata()) and test(randomdata()) of
following program is different?
my test file 150Hz10dB.wav has 2586024 samples, so I set randomdata
function
to return a list with 2586024 samples.
the exec result is:
2586024
<type 'list'>
10.8603842736
2586024
<type 'list'>
2.16525233979
test(randomdata()) is 5x faster than test(readdata())
if I remove "from scipy import *" then I get the following result:
2586024
<type 'list'>
2.21851601473
2586024
<type 'list'>
2.13885042216

So, what the problem with scipy?

You're importing (through scipy) numpy's sum() function. The result type of that
function is a numpy scalar type. The set of scalar types was introduced for a
number of reasons, mostly having to do with being able to represent the full
range of numerical datatypes that Python does not have builtin types for.
Unfortunately, the code paths that get executed when arithmetic is performed
sith such scalars are still suboptimal; I believe they are still going through
the full ufunc machinery.
This should not be true in the 1.0 release of NumPy. The numpy scalars
do their own math which has less overhead than ufunc-based math. But,
there is still more overhead than with simple floats because mixed-type
arithmetic is handled more generically (the same algorithm covers all
the cases).

The speed could be improved but hasn't been because it is so easy to get
a Python float if you are concerned about speed.

-Travis

Jan 17 '07 #5

This thread has been closed and replies have been disabled. Please start a new discussion.