numpy/scipy: correlation - Page 2

robert

Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
m, m-err, o, o-err, r-coef,r-coef-err ?

Or a formula to to compute the 3 error ranges?

-robert

PS:

numpy.corrcoef computes only the bare coeff:

>>numpy.corrcoe f((0,1,2,3.0),( 2,5,6,7.0),)

array([[ 1. , 0.95618289],
[ 0.95618289, 1. ]])

with ints it goes computes wrong:

>>numpy.corrcoe f((0,1,2,3),(2, 5,6,7),)

array([[ 1. , 0.94491118],
[ 0.94491118, 1. ]])

Nov 11 '06

Subscribe Reply

22401

Robert Kern

sturlamolden wrote:

Robert Kern wrote:

>The difference between the two models is that the first places no restrictions
on the distribution of x. The second does; both the x and y marginal
distribution s need to be normal. Under the first model, the correlation
coefficient has no meaning.

That is not correct. The correlation coefficient is meaningful in both
models, but must be interpreted differently. However, in both cases a
correlation coefficient of 1 or -1 indicates an exact linear
relationship between x and y.

Under the first model ("linear regression"), the squared correlation
coefficient is the "explained variance", i.e. the the proportion of y's
variance accounted for by the model y = m*x + o.

Point.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Nov 12 '06 #11

Robert Kern

robert wrote:

I remember once I saw somewhere a formula for an error range of the corrcoef. but cannot find it anymore.

There is no such thing as "a formula for an error range" in a vacuum like that.
Each formula has a model attached to it. If your data does not follow that
model, then any such estimate of the error of your estimate is meaningless.

sturlamolden pointed out to me that I was wrong in thinking that the correlation
coefficient was meaningless in the linear regression case. However, the error
estimate that is used for bivariate normal correlation coefficients will almost
certainly not apply to "correlatio n coefficient qua sqare root of the fraction
of y's variance explained by the linear model". In that context, the
"correlatio n coefficient" is not an estimate of some parameter. It has no
uncertainty attached to it.

http://en.wikipedia.org/wiki/Pearson...ficient#Trivia
says:
In MATLAB, corr(X) calculates Pearsons correlation coefficient along with p-value.

Does anybody know how this prob.-value is computed/motivated? Such thing would be very helpful for numpy/scipy too.

scipy.stats.pea rsonr()

As with all such frequentist hypothesis testing nonsense, one takes the null
hypothesis (in this case, a bivariate normal distribution of points with 0
correlation), finds the distribution of the test statistic given the number of
points sampled, and then finds the probability of getting a test statistic "at
least as extreme" as the test statistic you actually got.

my old func is simply hands-on based on
n,sum_x,sum_y,s um_xy,sum_xx,su m_yy=len(vx),vx .sum(),vy.sum() ,(vx*vy).sum(), (vx*vx).sum(),( vy*vy).sum()
Guess its already fast for large data?

You really want to use a 2-pass algorithm to avoid numerical problems.

Note: numpy.corrcoef strikes on 2 points:

>>>numpy.corrco ef(([1,2],[1,2]))

array([[ -1.#IND, -1.#IND],
[ -1.#IND, -1.#IND]])

This was fixed in SVN.

PPS:

A compatible scipy binary (0.5.2?) for numpy 1.0 was announced some weeks back. Think currently many users suffer when trying to get started with incompatible most-recent libs of scipy and numpy.

Well, go ahead and poke the people that said they would have a Windows binary
ready. Or provide one yourself.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Nov 12 '06 #12

sturlamolden

robert wrote:

t = r * sqrt( (n-2)/(1-r**2) )

yet too lazy/practical for digging these things from there. You obviously got it - out of that, what would be a final estimate for an error range of r (n big) ?
that same "const. * (1-r**2)/sqrt(n)" which I found in that other document ?

I gave you th formula. Solve for r and you get the confidence interval.
You will need to use the inverse cumulative Student t distribution.

Another quick-and-dirty solution is to use bootstrapping.

from numpy import mean, std, sum, sqrt, sort
from numpy.random import randint

def bootstrap_corre lation(x,y):
idx = randint(len(x), size=(1000,len( x)))
bx = x[idx] # reasmples x with replacement
by = y[idx] # resamples y with replacement
mx = mean(bx,1)
my = mean(by,1)
sx = std(bx,1)
sy = std(by,1)
r = sort(sum( (bx - mx.repeat(len(x ),0).reshape(bx .shape)) *
(by - my.repeat(len(y ),0).reshape(by .shape)), 1) /
((len(x)-1)*sx*sy))
#bootstrap confidence interval (NB! biased)
return (r[25],r[975])

My main concern is, how to respect the fact, that the (x,y) points may not distribute well along the regression line.

The bootstrap is "non-parametric" in the sense that it is distribution
free.

Nov 12 '06 #13

sturlamolden

While I am at it, lets add the bootstrap estimate of the standard error
as well.

from numpy import mean, std, sum, sqrt, sort, corrcoef, tanh, arctanh
from numpy.random import randint

def bootstrap_corre lation(x,y):
idx = randint(len(x), size=(1000,len( x)))
bx = x[idx]
by = y[idx]
mx = mean(bx,1)
my = mean(by,1)
sx = std(bx,1)
sy = std(by,1)
r = sort(sum( (bx - mx.repeat(len(x ),0).reshape(bx .shape)) *
(by - my.repeat(len(y ),0).reshape(by .shape)), 1)
/ ((len(x)-1)*sx*sy))
#bootstrap confidence interval (NB! biased)
conf_interval = (r[25],r[975])
#bootstrap standard error using Fisher's z-transform (NB! biased)
std_err = tanh(std(arctan h(r))*(len(r)/(len(r)-1.0)))
return (std_err, conf_interval)

Since we are not using bias corrected and accelerated bootstrap, the
standard error are more meaningful, although both estimates are biased.

I said that the boostrap is "distributi on free". That is not quite the
case either. The assumed distribution is the provided sample
distribution, i.e. a sum of Dirac delta functions. Often this is not a
very sound assumption, and the bootstrap must then be improved e.g.
using the BCA procedure. That is particularly important for confidence
intervals, but not so much for standard errors. However for a
quick-and-dirty standard error estimate, we can just ignore the small
bias.

Now, if you wonder why I used the standard deviation to obtain the
standard error (without dividing by sqrt(n) ), the answer is that the
standard error is the standard deviation of an estimator. As we have
1000 samples from the correlation's sampling distribution in r, we get
the "standard error of the correlation" by taking the standard
deviation of r. The z-transform is required before we can compute mean
and standard deviations of correlations.

Nov 12 '06 #14

robert

sturlamolden wrote:

robert wrote:

>>t = r * sqrt( (n-2)/(1-r**2) )

>yet too lazy/practical for digging these things from there. You obviously got it - out of that, what would be a final estimate for an error range of r (n big) ?
that same "const. * (1-r**2)/sqrt(n)" which I found in that other document ?

I gave you th formula. Solve for r and you get the confidence interval.
You will need to use the inverse cumulative Student t distribution.

Another quick-and-dirty solution is to use bootstrapping.

from numpy import mean, std, sum, sqrt, sort
from numpy.random import randint

def bootstrap_corre lation(x,y):
idx = randint(len(x), size=(1000,len( x)))
bx = x[idx] # reasmples x with replacement
by = y[idx] # resamples y with replacement
mx = mean(bx,1)
my = mean(by,1)
sx = std(bx,1)
sy = std(by,1)
r = sort(sum( (bx - mx.repeat(len(x ),0).reshape(bx .shape)) *
(by - my.repeat(len(y ),0).reshape(by .shape)), 1) /
((len(x)-1)*sx*sy))
#bootstrap confidence interval (NB! biased)
return (r[25],r[975])

>My main concern is, how to respect the fact, that the (x,y) points may not distribute well along the regression line.

The bootstrap is "non-parametric" in the sense that it is distribution
free.

thanks for the bootstrap tester. It confirms mainly the "r_stderr = (1-r**2)/sqrt(n)" formula. The assymetry of r (-1..+1) is less a problem.
Yet my main problem, how to respect clumpy distribution in the data points, is still the same.
In practice think of a situation where data out of an experiment has an unkown damping/filter (or whatever unkown data clumper) on it, thus lots of redundancy in effect.
An extreme example is to just duplicate data:

>>x ,y =[0.,0,0,0,1]*10 ,[0.,1,1,1,1]*10
xx,yy=[0.,0,0,0,1]*100,[0.,1,1,1,1]*100
correlation(x ,y)

(0.25, 0.132582521472, 0.25, 0.75)

>>correlation(x x,yy)

(0.25, 0.0419262745781 , 0.25, 0.75)

>>bootstrap_cor relation(array( x),array(y))

(0.148447544378 , 0.375391432338)

>>bootstrap_cor relation(array( xx),array(yy))

(0.215668822617 , 0.285633303438)

>>>

here the bootstrap test will as well tell us, that the confidence intervall narrows down by a factor ~sqrt(10) - just the same as if there would be 10-fold more of well distributed "new" data. Thus this kind of error estimation has no reasonable basis for data which is not very good.

The interesting task is probably this: to check for linear correlation but "weight clumping of data" somehow for the error estimation.
So far I can only think of kind of geometric density approach...

Or is there a commonly known straight forward approach/formula for this problem?
In this formula which I can remember weakly somehow - I think there were other basic sum terms like sum_xxy, sum_xyy,.. in it (which are not needed for the formula for r itself )
Robert

Nov 15 '06 #15

sturlamolden

robert wrote:

here the bootstrap test will as well tell us, that the confidence intervall narrows down by a factor ~sqrt(10) - just the same as if there would be 10-fold more of well distributed "new" data. Thus this kind of error estimation has no reasonable basis for data which is not very good.

The confidence intervals narrows when the amount of independent data
increases. If you don't understand why, then you lack a basic
understanding of statistics. Particularly, it is a fundamental
assumption in most statistical models that the data samples are
"IDENTICALL Y AND INDEPENDENTLY DISTRIBUTED", often abbreviated "i.i.d."
And it certainly is assumed in this case. If you tell the computer (or
model) that you have i.i.d. data, it will assume it is i.i.d. data,
even when its not. The fundamental law of computer science also applies
to statistics: shit in = shit out. If you nevertheless provide data
that are not i.i.d., like you just did, you will simply obtain invalid
results.

The confidence interval concerns uncertainty about the value of a
population parameter, not about the spread of your data sample. If you
collect more INDEPENDENT data, you know more about the population from
which the data was sampled. The confidence interval has the property
that it will contain the unknown "true correlation" 95% of the times it
is generated. Thus if you two samples WITH INDEPENDENT DATA from the
same population, one small and one large, the large sample will
generate a narrower confidence interval. Computer intensive methods
like bootstrapping and asymptotic approximations derived analytically
will behave similarly in this respect. However, if you are dumb enough
to just provide duplications of your data, the computer is dumb enough
to accept that they are obtained statistically independently. In
statistical jargon this is called "pseudo-sampling", and is one of the
most common fallacies among uneducated practitioners.

Statistical software doesn't prevent the practitioner from shooting
himself in the leg; it actually makes it a lot easier. Anyone can paste
data from Excel into SPSS and hit "ANOVA" in the menu. Whether the
output makes any sense is a whole other story. One can duplicate each
sample three or four times, and SPSS would be ignorant of that fact. It
cannot guess that you are providing it with crappy data, and prevent
you from screwing up your analysis. The same goes for NumPy code. The
statistical formulas you type in Python have certain assumptions, and
when they are violated the output is of no value. The more severe the
violation, the less valuable is the output.

The interesting task is probably this: to check for linear correlation but "weight clumping of data" somehow for the error estimation.

If you have a pathological data sample, then you need to specify your
knowledge in greater detail. Can you e.g. formulate a reasonable
stochastic model for your data, fit the model parameters using the
data, and then derive the correlation analytically?

I am beginning to think your problem is ill defined because you lack a
basic understanding of maths and statistics. For example, it seems you
were confusing numerical error (rounding and truncation error) with
statistical sampling error, you don't understand why standard errors
decrease with sample size, you are testing with pathological data, you
don't understand the difference between independent data and data
duplications, etc. You really need to pick up a statistics textbook and
do some reading, that's my advice.

Nov 16 '06 #16

robert

sturlamolden wrote:

robert wrote:

>here the bootstrap test will as well tell us, that the confidence intervall narrows down by a factor ~sqrt(10) - just the same as if there would be 10-fold more of well distributed "new" data. Thus this kind of error estimation has no reasonable basis for data which is not very good.

The confidence intervals narrows when the amount of independent data
increases. If you don't understand why, then you lack a basic
understanding of statistics. Particularly, it is a fundamental
assumption in most statistical models that the data samples are
"IDENTICALL Y AND INDEPENDENTLY DISTRIBUTED", often abbreviated "i.i.d."
And it certainly is assumed in this case. If you tell the computer (or
model) that you have i.i.d. data, it will assume it is i.i.d. data,
even when its not. The fundamental law of computer science also applies
to statistics: shit in = shit out. If you nevertheless provide data
that are not i.i.d., like you just did, you will simply obtain invalid
results.

The confidence interval concerns uncertainty about the value of a
population parameter, not about the spread of your data sample. If you
collect more INDEPENDENT data, you know more about the population from
which the data was sampled. The confidence interval has the property
that it will contain the unknown "true correlation" 95% of the times it
is generated. Thus if you two samples WITH INDEPENDENT DATA from the
same population, one small and one large, the large sample will
generate a narrower confidence interval. Computer intensive methods
like bootstrapping and asymptotic approximations derived analytically
will behave similarly in this respect. However, if you are dumb enough
to just provide duplications of your data, the computer is dumb enough
to accept that they are obtained statistically independently. In
statistical jargon this is called "pseudo-sampling", and is one of the
most common fallacies among uneducated practitioners.

that duplication is just an extreme example to show my need: When I get the data, there can be an inherent filter/damping or other mean of clumping on the data which I don't know of beforehand. My model is basically linear (its a preparation step for ranking valuable input data for a classification task, thus for data reduction), just the degree of clumping in the data is unknown. Thus the formula for r is ok, but that bare i.i.d. formula for the error "(1-r**2)/sqrt(n)" (or bootstrap_test same) is blind on that.

Statistical software doesn't prevent the practitioner from shooting
himself in the leg; it actually makes it a lot easier. Anyone can paste
data from Excel into SPSS and hit "ANOVA" in the menu. Whether the
output makes any sense is a whole other story. One can duplicate each
sample three or four times, and SPSS would be ignorant of that fact. It
cannot guess that you are providing it with crappy data, and prevent
you from screwing up your analysis. The same goes for NumPy code. The
statistical formulas you type in Python have certain assumptions, and
when they are violated the output is of no value. The more severe the
violation, the less valuable is the output.

>The interesting task is probably this: to check for linear correlation but "weight clumping of data" somehow for the error estimation.

If you have a pathological data sample, then you need to specify your
knowledge in greater detail. Can you e.g. formulate a reasonable
stochastic model for your data, fit the model parameters using the
data, and then derive the correlation analytically?

no, its too complex. Or its just: additional clumping/fractality in the data.
Thus, linear correlation is supposed, but the x,y data distribution may have "less than 2 dimensions". No better model.

Think of such example: A drunken (x,y) 2D walker is supposed to walk along a diagonal, but he makes frequent and unpredictable pauses/slow motion. You get x,y coordinates in 1 per second. His speed and time pattern at all do not matter - you just want to know how well he keeps his track.
( My application data is even worse/blackbox, there is not even such "model" )

I am beginning to think your problem is ill defined because you lack a
basic understanding of maths and statistics. For example, it seems you
were confusing numerical error (rounding and truncation error) with
statistical sampling error, you don't understand why standard errors
decrease with sample size, you are testing with pathological data, you
don't understand the difference between independent data and data
duplications, etc. You really need to pick up a statistics textbook and
do some reading, that's my advice.

I think I understand all this very well. Its not on this level. The problem has also nothing to do with rounding, sampling errors etc.
Of course the error ~1/sqrt(n) is the basic assumption - not what I not know, but what I "complain" about :-) (Thus I even guessed the "dumb" formula for r_err well before I saw it somewhere. This is all not the real question.).
Yet I need a way to _NOT_ just fall on that ~1/sqrt(n) for the error, when there is unknown clumping in the data. It has to be a smarter - say automatic non-i.i.d. computation for a reasonable confidence intervall/error of the correlation - in absence of a final/total modell.

Thats not even an exceptional application. In most measured data which is created by iso-timestep sampling (thus not "pathologic al" so far?), the space of some interesting 2 variables may be walked "non-iso". Think of any time series data, where most of the data is "boring"/redundant because the flux of the experiment is so, that interesting things happen only occasionally. In absence of a full model for the "whole history", one could try to preprocess the x,y data by attaching a density weight in order to make it "non-pathological" before feeding it into the formula for r,r_err. Yet this is expensive. Or one could think of computing a rough fractal dimension and decorate the error like fracconst * (1-r**2)/sqrt(n)

The (fast) formula I'm looking for - possibly it doesn't exist - should do this in a rush.
Robert

Nov 16 '06 #17

sturlamolden

robert wrote:

Think of such example: A drunken (x,y) 2D walker is supposed to walk along a diagonal, but he makes frequent and unpredictable pauses/slow motion. You get x,y coordinates in 1 per second. His speed and time pattern at all do not matter - you just want to know how well he keeps his track.

In which case you have time series data, i.e. regular samples from p(t)
= [ x(t), y(t) ]. Time series have some sort of autocorrelation in the
samples as well, which must be taken into account. Even tough you could
weight each point by the drunkard's speed, a correlation or linear
regression would still not make any sense here, as such analyses are
based on the assumption of no autocorrelation in the samples or the
residuals. Correlation has no meaning if y[t] is correlated with
y[t+1], and regression has no meaning if the residual e[t] is
correlated with the residual e[t+1].

A state space model could e.g. be applicable. You could estimate the
path of the drunkard using a Kalman filter to compute a Taylor series
expansion p(t) = p0 + v*t + 0.5*a*t**2 + ... for the path at each step
p(t). When you have estimates for the state parameters s, v, and a, you
can compute some sort of measure for the drunkard's deviation from his
ideal path.

However, if you don't have time series data, you should not treat your
data as such.

If you don't know how your data is generated, there is no way to deal
with them correctly. If the samples are time series they must be
threated as such, if they are not they should not. If the samples are
i.i.d. each point count equally much, if they are not they do not. If
you have a clumped data due to time series or lack of i.i.d., you must
deal with that. However, data can be i.i.d. and clumped, if the
underlying distribution is clumped. In order to determine the cause,
you must consider how your data are generated and how your data are
sampled. You need meta-information about your data to determine this.
Matlab or Octave will help you with this, and it is certainly not a
weakness of NumPy as you implied in your original post. There is no way
to put magic into any numerical computation. Statistics always require
formulation of specific assumptions about the data. If you cannot think
clearly about your data, then that is the problem you must solve.

Nov 16 '06 #18

robert

sturlamolden wrote:

robert wrote:

>Think of such example: A drunken (x,y) 2D walker is supposed to walk along a diagonal, but he makes frequent and unpredictable pauses/slow motion. You get x,y coordinates in 1 per second. His speed and time pattern at all do not matter - you just want to know how well he keeps his track.

In which case you have time series data, i.e. regular samples from p(t)
= [ x(t), y(t) ]. Time series have some sort of autocorrelation in the
samples as well, which must be taken into account. Even tough you could
weight each point by the drunkard's speed, a correlation or linear
regression would still not make any sense here, as such analyses are
based on the assumption of no autocorrelation in the samples or the
residuals. Correlation has no meaning if y[t] is correlated with
y[t+1], and regression has no meaning if the residual e[t] is
correlated with the residual e[t+1].

A state space model could e.g. be applicable. You could estimate the
path of the drunkard using a Kalman filter to compute a Taylor series
expansion p(t) = p0 + v*t + 0.5*a*t**2 + ... for the path at each step
p(t). When you have estimates for the state parameters s, v, and a, you
can compute some sort of measure for the drunkard's deviation from his
ideal path.

However, if you don't have time series data, you should not treat your
data as such.

If you don't know how your data is generated, there is no way to deal
with them correctly. If the samples are time series they must be
threated as such, if they are not they should not. If the samples are
i.i.d. each point count equally much, if they are not they do not. If
you have a clumped data due to time series or lack of i.i.d., you must
deal with that. However, data can be i.i.d. and clumped, if the
underlying distribution is clumped. In order to determine the cause,
you must consider how your data are generated and how your data are
sampled. You need meta-information about your data to determine this.
Matlab or Octave will help you with this, and it is certainly not a
weakness of NumPy as you implied in your original post. There is no way
to put magic into any numerical computation. Statistics always require
formulation of specific assumptions about the data. If you cannot think
clearly about your data, then that is the problem you must solve.

yes, in the example of the drunkard time-series its possible to go to better model - yet even there it is very expensive (in relation to the stats-improvement) to worry too much about the best model for such guy :-).

In the field of datamining with many datatracks and typically digging first for multiple but smaller correlations - without a practical bottom-up modell, I think one falls regularly back to a certain basic case - maybe the most basic model for data at all: that basic modell is possibly that of a "hunter" which waits mostly, but only acts, if rare goodies are in front of him.
Again in the most basic case, when having 2D x,y data in front of you without a relyable time-path or so, you see this: a density distribution of points. There is possibly a linear correlation on the highest scale - which you are interested in - but the points show also inhomgenity/clumping, and this rises the question of influence on r_err. What now? One sees clearly that its nonsense to make just plain average stats.
I think this case is a most basic default for data - even compared to the common textbook-i.i.d. case. In fact, one can recognize such kind of stats, which repects mere (inhomogous) data-density itself, as (kind of simple/indepent) auto-bayesian stats vs. dumb averaging.

I think one can almost always do this "bayesian density weighter/filter" as better option compared to mere average stats in that case of x,y correlation when there is obvioulsy interesting correlation but where you are too lazy..to..princ ipally unable to itch out a modell on physics level. The latter requirement is in fact what any averaging stats cries for at any price - but how often can you do it in real world applications ...

( In reality there is anyway no way to eliminate auto-correlation in the composition of data. Everything and everybody lies :-) )

Thats where a top-down (model-free) bayesian stats approach will pay off: In the previous extreme example of criminal data duplication - I'm sure - it will totally neutralize the attack without question. In the drunkard time-series example it will tell me very reliably how well this guy will keep track - without need for a complex model. In the case of good i.i.d. data distribution it will tell me the same as simple stats. Just good news ...

Thus I can possibly say it so now: I have the problem for guessing linear correlation (coefficient with error) on x,y data with respect to the (most general assupmtion) "bayesian" background of inhomogenous data distribution.
Therefore I'm seeking a (fast/efficient/approx.) formula for r/r_err. I guess the formula for r does not change (much) compared to that for simple averaging stats, but the formula for r_err will.

Maybe its easy with some existing means of numpy/scipy already. Maybe not. I'm far from finding the (efficient) math myself, but I know what I want - and can see if a formula really does it.
Robert

Nov 16 '06 #19

Similar topics

2592

NumPy error

by: jason | last post by:

Hello: I am using the following versions of Python and packages on Windows XP (SP2): Python 2.4.2 NumPy 0.9.4.win32-py2.4 SciPy 0.4.4 for Python 2.4 and Pentium 4/SSE2 In the Python Shell I am running the following:

Python

2584

Scientific Computing with NumPy

by: mclaugb | last post by:

Has anyone recompiled the Scientific Computing package using NumPy instead of Numeric? I need a least squares algorithm and a Newton Rhaphson algorithm which is contained in Numeric but all the documentation out there says that Numeric is crap and all code should be using NumPy. Thanks, Bryan

Python

1527

Trouble with numpy-0.9.4 and numpy-0.9.5

by: drife | last post by:

Hello, I use the Python Numeric package extensively, and had been an avid user of the "old" scipy. In my view, both pieces of software are truly first rate, and have greatly improved my productivity in the area of scientific analysis. Thus, I was excited to make the transition to the new scipy core (numpy). Unfortunately, I am experiencing a problem that I cannot sort out. I am running Python 2.4.2 on a Debian box (V3.1), using

Python

3612

Error with: pickle.dumps(numpy.float32)

by: Iljya | last post by:

Hello, I need to pickle the type numpy.float32 but encounter an error when I try to do so. I am able to pickle the array itself, it is specifically the type that I cannot pickle. I am using: Numpy version: 0.9.4 Python version: 2.4.3 Windows XP

Python

2526

switching to numpy and failing, a user story

by: greg.landrum | last post by:

After using numeric for almost ten years, I decided to attempt to switch a large codebase (python and C++) to using numpy. Here's are some comments about how that went. - The code to automatically switch python stuff over just kind of works. But it was a 90% solution, I could do the rest by hand. Of course, the problem is that then the code is still using the old numeric API, so it's not a long term solution. Unfortunately, to switch to...

Python

3947

latest numpy & scipy - incompatible ?

by: robert | last post by:

I'm using latest numpy & scipy. What is this problem ? : RuntimeError: module compiled against version 1000002 of C-API but this version of numpy is 1000009 Traceback (most recent call last): File "<interactive input>", line 1, in ? File "C:\PYTHON23\Lib\site-packages\scipy\stats\__init__.py", line 7, in ? from stats import * File "C:\PYTHON23\Lib\site-packages\scipy\stats\stats.py", line 191, in ? import scipy.special as special File...

Python

1616

trouble installing numpy 1.0.2 and scipy.0.5.2

by: 1960_j | last post by:

I have tried to install numpy and scipy on python 5.2. Using gcc 2.95.3, lapack 3.1.1 and ATLAS 3.6.0. When installin numpy it seems to work but when I try to run test get error no test for numpy. When I try to Install scipy only get error. Any ideas on how to install would be appreciated. Thanks

Python

2138

Build Python, Numpy and Scipy source with Visual Studio 6.0 forwindows

by: Richard_Martineau | last post by:

Hello All I wonder if anyone can advise me or has done similar to the following? Basically I've downloaded the Python 2.5.2 source code that builds with Visual Studio 6.0. I've built Python for windows. This was easy (it even came with the pcbuild.dsw workspace file). Great! Now comes the troubled bit...I now look for similar source code for Python extensions Numpy and Scipy but the source code and directories

Python

4221

How to "reduce" a numpy array using a costum binary function

by: Slaunger | last post by:

I know there must be a simple method to do this. I have implemented this function for calculating a checksum based on a ones complement addition: def complement_ones_checksum(ints): """ Returns a complements one checksum based on a specified numpy.array of dtype=uint16 """

Python

9521

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9333

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

9945

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

6599

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5214

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5361

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

3863

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3442

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

2733

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General