
Computing correlations with SciPy

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?
>>> X = [1, 2, 3, 4, 5]
>>> Y = [5, 4, 3, 2, 1]
>>> import scipy
>>> scipy.corrcoef(X,Y)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
    d = diag(c)
  File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
    raise ValueError, "Input must be 1- or 2-d."
ValueError: Input must be 1- or 2-d.

Thomas Philips

Mar 16 '06 #1
On Thu, 2006-03-16 at 07:49 -0800, tk****@hotmail.com wrote:
> I want to compute the correlation between two sequences X and Y, and
> tried using SciPy to do so without success. Here's what I have; how
> can I correct it?

$ python2.4
Python 2.4.2 (#2, Nov 20 2005, 17:04:48)
[GCC 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)] on linux2
>>> x = [1,2,3,4,5]
>>> y = [5,4,3,2,1]
>>> import scipy
>>> scipy.corrcoef(x, y)
array([[ 1., -1.],
       [-1.,  1.]])

Looks fine to me...

Mar 16 '06 #2
>>>>> "tkpmep" == tkpmep <tk****@hotmail.com> writes:

tkpmep> I want to compute the correlation between two sequences X
tkpmep> and Y, and tried using SciPy to do so without success.
tkpmep> Here's what I have; how can I correct it?

tkpmep> >>> X = [1, 2, 3, 4, 5]
tkpmep> >>> Y = [5, 4, 3, 2, 1]
tkpmep> >>> import scipy
tkpmep> >>> scipy.corrcoef(X,Y)
tkpmep> Traceback (most recent call last):
tkpmep>   File "<interactive input>", line 1, in ?
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
tkpmep>     d = diag(c)
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
tkpmep>     raise ValueError, "Input must be 1- or 2-d."
tkpmep> ValueError: Input must be 1- or 2-d.

Hmm, this may be a bug in SciPy. matplotlib also defines a corrcoef
function, which you may want to use until this problem gets sorted out:

In : X = [1, 2, 3, 4, 5]

In : Y = [5, 4, 3, 2, 1]

In : matplotlib.mlab.corrcoef(X,Y)
Out:
array([[ 1., -1.],
       [-1.,  1.]])
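The off-diagonal -1 in these matrices is Pearson's r: the covariance of the two sequences divided by the product of their standard deviations. As a cross-check, here is a minimal pure-Python sketch (it assumes Python 3 division; pearson_r is a hypothetical helper, not a SciPy or matplotlib function):

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-products and sums of squares; the 1/(N-1) factors
    # of covariance and variance cancel in the ratio, so they are omitted.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]
print(pearson_r(X, Y))  # -1.0
```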
Mar 16 '06 #3
tk****@hotmail.com wrote:
> I want to compute the correlation between two sequences X and Y, and
> tried using SciPy to do so without success. Here's what I have; how
> can I correct it?

This was a bug in NumPy (inherited from Numeric, actually). The fix is
in NumPy's SVN repository.

Here are the new versions of those functions, which should work as you
wish (again, these are in SVN, but perhaps you have a binary install).

These functions belong in <site-packages>/numpy/lib/function_base.py

# These names are already in scope inside numpy/lib/function_base.py;
# the import is only needed to run the functions standalone.
from numpy import array, concatenate, diag, dot, multiply, newaxis, sqrt

def cov(m, y=None, rowvar=1, bias=0):
    """Estimate the covariance matrix.

    If m is a vector, return the variance. For matrices return the
    covariance matrix.

    If y is given it is treated as an additional (set of)
    variable(s).

    Normalization is by (N-1) where N is the number of observations
    (unbiased estimate). If bias is 1 then normalization is by N.

    If rowvar is non-zero (default), then each row is a variable with
    observations in the columns, otherwise each column
    is a variable and the observations are in the rows.
    """
    # array() accepts ndmin (current numpy.asarray does not) and copies
    # its input, so centering below never modifies the caller's data.
    X = array(m, ndmin=2)
    if X.shape[0] == 1:
        rowvar = 1
    if rowvar:
        axis = 0
        tup = (slice(None), newaxis)
    else:
        axis = 1
        tup = (newaxis, slice(None))
    if y is not None:
        y = array(y, ndmin=2)
        X = concatenate((X, y), axis)

    # Center each variable; out-of-place so integer input becomes float.
    X = X - X.mean(axis=1-axis)[tup]
    # N is the number of observations per variable.
    if rowvar:
        N = X.shape[1]
    else:
        N = X.shape[0]

    if bias:
        fact = N*1.0
    else:
        fact = N-1.0

    if not rowvar:
        return (dot(X.transpose(), X.conj()) / fact).squeeze()
    else:
        return (dot(X, X.transpose().conj()) / fact).squeeze()

def corrcoef(x, y=None, rowvar=1, bias=0):
    """The correlation coefficients
    """
    c = cov(x, y, rowvar, bias)
    try:
        d = diag(c)
    except ValueError:  # scalar covariance
        return 1
    return c/sqrt(multiply.outer(d, d))
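The last line of corrcoef divides each covariance c[i, j] by sqrt(c[i, i] * c[j, j]). A minimal sketch of that same normalization, assuming a current NumPy install rather than the 2006 SVN code (it uses np.cov, np.diag, np.multiply.outer and np.corrcoef), including the 0-d case that the try/except above handles:

```python
import numpy as np

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]

# Covariance matrix of the two variables (rows are variables), N-1 normalization.
c = np.cov(X, Y)
# Variances sit on the diagonal; divide c[i, j] by sqrt(c[i, i] * c[j, j]).
d = np.diag(c)
r = c / np.sqrt(np.multiply.outer(d, d))
assert np.allclose(r, np.corrcoef(X, Y))

# The 1/N vs 1/(N-1) factor cancels in the ratio, so `bias` does not change r.
cb = np.cov(X, Y, bias=True)
db = np.diag(cb)
assert np.allclose(cb / np.sqrt(np.multiply.outer(db, db)), r)

# A single variable gives a 0-d covariance, which np.diag rejects with
# ValueError; that is the case the patched corrcoef maps to 1.
c1 = np.cov(X)
try:
    np.diag(c1)
except ValueError as exc:
    print("diag failed:", exc)
```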

Mar 17 '06 #4
Tested it and it works like a charm! Thank you very much for fixing
this. Not knowing what SVN is, I simply copied the code into the
appropriate library files, and it works perfectly well.

May I suggest a simple enhancement: modify corrcoef so that if it is
fed two 1-dimensional arrays, it returns a scalar. cov does something
similar for covariances: if you feed it just one vector, it returns a
scalar, and if you feed it two, it returns the covariance matrix, i.e.:

>>> x = [1, 2, 3, 4, 5]
>>> z = [5, 4, 3, 2, 1]
>>> scipy.cov(x,z)
array([[ 2.5, -2.5],
       [-2.5,  2.5]])
>>> scipy.cov(x)
2.5

I suspect that the majority of users use corrcoef to obtain point
estimates of the correlation between two vectors, and relatively few will
estimate a full correlation matrix, as this method tends not to be robust to
the presence of noise and/or errors in the data.
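The enhancement suggested above can be sketched as a thin wrapper around np.corrcoef. corrcoef_scalar is a hypothetical name, not a NumPy or SciPy function; the sketch assumes a current NumPy and simply returns the off-diagonal entry when both inputs are 1-D, passing anything else through as the full matrix:

```python
import numpy as np

def corrcoef_scalar(x, y):
    """Hypothetical helper: Pearson r of two 1-D sequences as a Python float.

    For inputs that are not both 1-D, return np.corrcoef's full matrix.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    r = np.corrcoef(x, y)
    if x.ndim == 1 and y.ndim == 1:
        # For two 1-D inputs the result is [[1, r], [r, 1]]; keep r.
        return float(r[0, 1])
    return r

print(corrcoef_scalar([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

Returning a plain float only in the two-vector case mirrors the cov behaviour described above, at the cost of a return type that depends on the input shape.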

Thomas Philips

Mar 19 '06 #5
