I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:
/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/
for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[i][j]=0;
for (k=0;k<40000;k+ +)
{
w[i][j]+=xyz_trans[i][k]*xyz[k][j];
}
}
}
What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.
I'm going to risk blowing smoke (i.e., I haven't tested what I'm
saying so take it with a huge grain of salt). With that in mind, here
are some ideas, which you can try and see if they help.
1) Split into two parts (as the other poster mentioned), one for
initialization and one for multiplication.
2) In the second part, switch the order of k and j. This might
potentially increase your cache hit rate, which can have a significant
speed increase. (Because in this case you'd always be accessing
consecutive memory locations, i.e., only changing the last index.)
The resulting code would be:
for (i=0;i<3;i++) {
for (j=0;j<3;j++) {
w[i][j]=0;
}
}
for (i=0;i<3;i++) {
for (k=0;k<40000;k+ +) {
for (j=0;j<3;j++) {
w[i][j] += xyz_trans[i][k] * xyz[k][j];
}
}
}
3) In the off chance you haven't already, set your compiler's flags to
maximum optimization.
4) Maybe introduce temporary variables for some things (hopefully
compiler optimization would catch those). I probably wouldn't do
this, but the code would look something like this (second part only):
for (i=0;i<3;i++) {
double* wp = w[i];
for (k=0;k<40000;k+ +) {
double* xyzp = xyz[k];
double xyz_trans_val = xyz_trans[i][k];
for (j=0;j<3;j++) {
wp[j] += xyz_trans_val * xyzp[j];
}
}
}
As I said, I wouldn't do it because I expect my compiler to do this
kind of thing for me. If I were really desparate, I might look at the
generated assembly to figure out if my compiler is actually doing
that.
Finally, as mentioned, I haven't tested this, so it might actually
make the code slower.
And it looks like xyz_trans might be the same as xyz transposed, in
which case you could rewrite the whole thing with only xyz and without
creating a separate array.
Michael
P.S. Matrix multiplication as written is N^3 if you multiply NxN and
NxN matrices. There's a better algorithm that's N^2.5 or something
like that, but since one of your dimensions is 3, it's not likely to
help much (and it is a significant pain to implement).