I am writing the following code and my matrice: one is 3x40000 and the

other one 40000x3.

the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)

{

for (j=0;j<3;j++)

{

w[i][j]=0;

for (k=0;k<40000;k++)

{

w[i][j]+=xyz_trans[i][k]*xyz[k][j];

}

}

}

What is your code when you multiply 2 big matrice? Is there any faster

algorithm? I see MATLAB handles this task so quick.

I'm going to risk blowing smoke (i.e., I haven't tested what I'm

saying so take it with a huge grain of salt). With that in mind, here

are some ideas, which you can try and see if they help.

1) Split into two parts (as the other poster mentioned), one for

initialization and one for multiplication.

2) In the second part, switch the order of k and j. This might

potentially increase your cache hit rate, which can have a significant

speed increase. (Because in this case you'd always be accessing

consecutive memory locations, i.e., only changing the last index.)

The resulting code would be:

for (i=0;i<3;i++) {

for (j=0;j<3;j++) {

w[i][j]=0;

}

}

for (i=0;i<3;i++) {

for (k=0;k<40000;k++) {

for (j=0;j<3;j++) {

w[i][j] += xyz_trans[i][k] * xyz[k][j];

}

}

}

3) In the off chance you haven't already, set your compiler's flags to

maximum optimization.

4) Maybe introduce temporary variables for some things (hopefully

compiler optimization would catch those). I probably wouldn't do

this, but the code would look something like this (second part only):

for (i=0;i<3;i++) {

double* wp = w[i];

for (k=0;k<40000;k++) {

double* xyzp = xyz[k];

double xyz_trans_val = xyz_trans[i][k];

for (j=0;j<3;j++) {

wp[j] += xyz_trans_val * xyzp[j];

}

}

}

As I said, I wouldn't do it because I expect my compiler to do this

kind of thing for me. If I were really desparate, I might look at the

generated assembly to figure out if my compiler is actually doing

that.

Finally, as mentioned, I haven't tested this, so it might actually

make the code slower.

And it looks like xyz_trans might be the same as xyz transposed, in

which case you could rewrite the whole thing with only xyz and without

creating a separate array.

Michael

P.S. Matrix multiplication as written is N^3 if you multiply NxN and

NxN matrices. There's a better algorithm that's N^2.5 or something

like that, but since one of your dimensions is 3, it's not likely to

help much (and it is a significant pain to implement).