# Why does scipy make my program slow?

Why is the execution time of test(readdata()) and test(randomdata()) in the following program different? My test file 150Hz10dB.wav has 2586024 samples, so I made the randomdata function return a list with 2586024 samples.

The result is:

    2586024
    10.8603842736
    2586024
    2.16525233979

test(randomdata()) is 5x faster than test(readdata()). If I remove "from scipy import *", I get the following result:

    2586024
    2.21851601473
    2586024
    2.13885042216

So, what is the problem with scipy?

Python 2.4.2, scipy ver. 0.5.1

    import wave
    from scipy import *
    from time import *
    import random
    from array import array

    def readdata():
        f = wave.open("150Hz10dB.wav", "rb")
        t = f.getparams()
        SampleRate = t[2]
        data = array("h", f.readframes(t[3]))
        f.close()
        # keep every second sample, then remove the mean and rectify
        left = data[0::2]
        mean = sum(left)/float(len(left))
        left = [abs(x-mean) for x in left]
        return left

    def randomdata():
        return [random.random()*32768.0 for i in xrange(2586024)]

    def test(data):
        print len(data)
        print type(data)
        envelop = []
        e = 0.0
        ga, gr = 0.977579425259, 0.999773268338
        ga1, gr1 = 1.0 - ga, 1.0 - gr
        start = clock()
        # time only the envelope loop
        for x in data:
            if e < x:
                e *= ga
                e += ga1*x
            else:
                e *= gr
                e += gr1*x
            envelop.append(e)
        print clock() - start
        return envelop

    test(readdata())
    test(randomdata())

Jan 16 '07 #1

HYRY wrote:

> Why is the execution time of test(readdata()) and test(randomdata()) in the following program different? My test file 150Hz10dB.wav has 2586024 samples, so I made the randomdata function return a list with 2586024 samples.
>
> test(randomdata()) is 5x faster than test(readdata()). If I remove "from scipy import *", both calls take about 2.2 seconds.
>
> So, what is the problem with scipy?

You're importing (through scipy) numpy's sum() function. The result type of that function is a numpy scalar type. The set of scalar types was introduced for a number of reasons, mostly having to do with being able to represent the full range of numerical datatypes that Python does not have builtin types for. Unfortunately, the code paths that get executed when arithmetic is performed with such scalars are still suboptimal; I believe they are still going through the full ufunc machinery.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Jan 16 '07 #2

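
A minimal sketch of the effect described above, written for current Python 3 and NumPy rather than the Python 2.4 setup in the thread. The sample values are made up, and `np.sum` stands in for the `sum` name that the star import put in scope; `builtins.sum` is what the name refers to without it.

```python
import builtins
from array import array

import numpy as np

# Hypothetical stand-in for the 16-bit samples read from the WAV file.
left = array("h", [3, -7, 12, -1, 5, -9])

# `sum` as seen after a star import that brings in NumPy's sum().
np_mean = np.sum(left) / float(len(left))
# `sum` as seen without the star import (the built-in).
py_mean = builtins.sum(left) / float(len(left))

# The NumPy scalar propagates into every element of the list.
np_left = [abs(x - np_mean) for x in left]
py_left = [abs(x - py_mean) for x in left]

print(type(np_mean), type(np_left[0]))
print(type(py_mean), type(py_left[0]))
```

The two lists hold the same numbers; only the element type differs, which is why the slowdown shows up later in the loop and not in readdata() itself.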
Thanks. Following your hint, I changed type(data) to type(data[0]) and checked the element types. So, calculating with float is about 5x faster than with numpy.float64.

Jan 16 '07 #3

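
The comparison can be reproduced with a rough timing sketch like the one below. It assumes Python 3 and a current NumPy, uses a smaller, randomly generated data set (100,000 samples, an arbitrary choice), and wraps the loop from test() in a hypothetical helper named `envelope`. The ratio you measure will depend on the NumPy version and need not match the 5x reported in the thread.

```python
import random
import time

import numpy as np


def envelope(data):
    """Attack/release envelope follower, same arithmetic as the loop in test()."""
    ga, gr = 0.977579425259, 0.999773268338
    ga1, gr1 = 1.0 - ga, 1.0 - gr
    e = 0.0
    out = []
    for x in data:
        if e < x:
            e *= ga
            e += ga1 * x
        else:
            e *= gr
            e += gr1 * x
        out.append(e)
    return out


def timed(data):
    """Return the envelope and the wall-clock seconds the loop took."""
    start = time.perf_counter()
    result = envelope(data)
    return result, time.perf_counter() - start


random.seed(0)
py_data = [random.random() * 32768.0 for _ in range(100_000)]
np_data = [np.float64(x) for x in py_data]  # same values, NumPy scalar type

py_env, py_seconds = timed(py_data)
np_env, np_seconds = timed(np_data)

print("float:        ", py_seconds)
print("numpy.float64:", np_seconds)
```

Note that in the second run `e` itself becomes a numpy.float64 after the first iteration, so every operation in the loop involves a NumPy scalar.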
HYRY wrote:

> Thanks. Following your hint, I changed type(data) to type(data[0]) and checked the element types. So, calculating with float is about 5x faster than with numpy.float64.

Approximately. numpy functions all upcast int to int32, float to float32, int32/float to float32, etc. This is probably ill behavior. float32 arrays should only arise from an explicit numpy.array(l,dtype=numpy.float32).

In your example it is best to move to numpy/scipy types very early (without mixing in the Python array type as well) and do the array computations with scipy:

    left = [abs(x-mean) for x in left]

becomes

    data = scipy.array(f.readframes(t[3]),"h")
    ...
    left = abs(left-mean)

Code test(data) in a similar way (see also scipy.signal.lfilter etc.), and cast down to Python types late, e.g. float(mynumfloat).

The type magic, the speed loss and the pickle problems will probably only disappear when float and int are handled as separate (more conservative) types in numpy, with numpy scalar types only on request. Currently numpy uses Python.

Robert

Jan 16 '07 #4

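
One way to read the "array early, Python types late" advice is sketched below for current Python 3 and NumPy. It swaps in `numpy.frombuffer` to turn the raw frame bytes into an array, which is an assumption and not the call used in the post; the `frames` bytes are a made-up stand-in for `f.readframes(t[3])`, and the data is assumed to be interleaved 16-bit stereo in native byte order (a real WAV file is little-endian, so `"<i2"` would be the safer dtype there).

```python
from array import array

import numpy as np

# Hypothetical stand-in for f.readframes(t[3]): interleaved 16-bit frames.
frames = array("h", [100, -100, -300, 300, 250, -250, -50, 50]).tobytes()

# Go to a NumPy array straight away instead of array("h", ...).
samples = np.frombuffer(frames, dtype=np.int16)

# Every second sample, as in data[0::2]; float64 avoids int16 overflow.
left = samples[0::2].astype(np.float64)

# Vectorized replacement for [abs(x - mean) for x in left].
left = np.abs(left - left.mean())

# Cast down late: tolist() yields built-in floats for the Python loop.
left_list = left.tolist()

print(left_list)
print(type(left[0]), type(left_list[0]))
```

Iterating over `left` directly would hand numpy.float64 scalars to the loop again, so the `tolist()` step matters if the envelope loop stays in pure Python.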
Robert Kern wrote:

> You're importing (through scipy) numpy's sum() function. The result type of that function is a numpy scalar type. The set of scalar types was introduced for a number of reasons, mostly having to do with being able to represent the full range of numerical datatypes that Python does not have builtin types for. Unfortunately, the code paths that get executed when arithmetic is performed with such scalars are still suboptimal; I believe they are still going through the full ufunc machinery.

This should not be true in the 1.0 release of NumPy. The numpy scalars do their own math, which has less overhead than ufunc-based math. But there is still more overhead than with simple floats, because mixed-type arithmetic is handled more generically (the same algorithm covers all the cases). The speed could be improved, but it hasn't been because it is so easy to get a Python float if you are concerned about speed.

-Travis

Jan 17 '07 #5
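
Getting a Python float back is a one-line change. The sketch below assumes Python 3 with a current NumPy and uses made-up sample values; it shows two conversions that exist on NumPy scalars, the `float()` constructor and the `.item()` method, applied to the mean before it is used in the list comprehension.

```python
from array import array

import numpy as np

# Hypothetical stand-in for the samples read from the WAV file.
left = array("h", [3, -7, 12, -1, 5, -9])

# Convert the NumPy scalar once, right where it is produced.
mean = float(np.sum(left)) / len(left)

# Every element is now a built-in float, so the later loop stays fast.
left_abs = [abs(x - mean) for x in left]

# .item() is an equivalent way to unwrap a NumPy scalar.
unwrapped = np.float64(0.5).item()

print(type(mean), type(left_abs[0]), type(unwrapped))
```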

