445,750 Members | 1,214 Online Need help? Post your question and get tips & solutions from a community of 445,750 IT Pros & Developers. It's quick & easy.

# how to speed up the matrix multiplication in C

 P: n/a Hi all, Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } What is your code when you multiply 2 big matrice? Is there any faster algorithm? I see MATLAB handles this task so quick. Thanks Mar 16 '07 #1
7 Replies

 P: n/a Vi*******@gmail.com wrote: Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? Yes, but AFAIK its rather hairy and only useful for large (no, larger than that) matricies. IMBW. I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } How slow is "too slow"? It's pretty much instant on the-here machine. (The inner loop is only executed 360000 times. I'm assuming that the arrays are float/double.) But if you /really/ want more speed: Um. Well. You're not exactly /helping/ the compiler. For example, the expression `w[i][j]` is evaluated /every time round the inner loop/. So is the expression `xyz_trans[i]`. You can have a temp for w[i][j], and you can declare a local for xyz_trans[i]. -- Chris "electric hedgehog" Dollin "What I don't understand is this ..." Trevor Chaplin, /The Beiderbeck Affair/ Mar 16 '07 #2

 P: n/a I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } What is your code when you multiply 2 big matrice? Is there any faster algorithm? I see MATLAB handles this task so quick. I'm going to risk blowing smoke (i.e., I haven't tested what I'm saying so take it with a huge grain of salt). With that in mind, here are some ideas, which you can try and see if they help. 1) Split into two parts (as the other poster mentioned), one for initialization and one for multiplication. 2) In the second part, switch the order of k and j. This might potentially increase your cache hit rate, which can have a significant speed increase. (Because in this case you'd always be accessing consecutive memory locations, i.e., only changing the last index.) The resulting code would be: for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; } } for (i=0;i<3;i++) { for (k=0;k<40000;k++) { for (j=0;j<3;j++) { w[i][j] += xyz_trans[i][k] * xyz[k][j]; } } } 3) In the off chance you haven't already, set your compiler's flags to maximum optimization. 4) Maybe introduce temporary variables for some things (hopefully compiler optimization would catch those). I probably wouldn't do this, but the code would look something like this (second part only): for (i=0;i<3;i++) { double* wp = w[i]; for (k=0;k<40000;k++) { double* xyzp = xyz[k]; double xyz_trans_val = xyz_trans[i][k]; for (j=0;j<3;j++) { wp[j] += xyz_trans_val * xyzp[j]; } } } As I said, I wouldn't do it because I expect my compiler to do this kind of thing for me. If I were really desparate, I might look at the generated assembly to figure out if my compiler is actually doing that. Finally, as mentioned, I haven't tested this, so it might actually make the code slower. And it looks like xyz_trans might be the same as xyz transposed, in which case you could rewrite the whole thing with only xyz and without creating a separate array. Michael P.S. Matrix multiplication as written is N^3 if you multiply NxN and NxN matrices. There's a better algorithm that's N^2.5 or something like that, but since one of your dimensions is 3, it's not likely to help much (and it is a significant pain to implement). Mar 16 '07 #3

 P: n/a On Mar 16, 9:37 am, VijaKh...@gmail.com wrote: Hi all, Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } What is your code when you multiply 2 big matrice? Is there any faster algorithm? I see MATLAB handles this task so quick. Thanks I posted a C Strassen matrix multiply in this thread: http://groups.google.com/group/comp....be9b87e8427d4b There is a link to a C++ version in this thread: http://groups.google.com/group/comp....54b01120cc246b Mar 16 '07 #4

 P: n/a On 16 Mar 2007 09:37:36 -0700, Vi*******@gmail.com wrote: >Hi all,Is there any method which can implememt the matrix multiplicationfaster than using the formula as we often do by hand?I am writing the following code and my matrice: one is 3x40000 and theother one 40000x3.the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } }What is your code when you multiply 2 big matrice? Is there any fasteralgorithm? I see MATLAB handles this task so quick. Your problem is that you have lots of caches faults and perhaps even page fault because the stride in one of the terms in the dot product isn't equal to one. This being C code, the stride in xyz[k][j] is 3 rather than 1. This is a well known issue in matrix multiplication. If feasible the solution is to transpose the matrix causing trouble first. BTW this really isn't a C problem. Comp.programming or sci.math.num-analysis might be more appropriate groups. Mar 16 '07 #5

 P: n/a On Mar 16, 4:37 pm, VijaKh...@gmail.com wrote: Hi all, Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } What is your code when you multiply 2 big matrice? Is there any faster algorithm? I see MATLAB handles this task so quick. Thanks Your post is missing some important things: How exactly are i, j, k, xyz, xyz_trans and w declared? How fast or slow exactly is the code, on which processor? That information is needed to decide if there is something significantly wrong with your code, or whether this is just a bit slow because your code is quite clumsy, or whether this code is actually as good as you could expect. I suggest you replace your code with (I left out some declarations because you left out stuff that I would need to know) for (i=0;i<3;i++) { s0 = s1 = s2 = 0; for (k=0;k<40000;k++) { t = xyz_trans[i][k]; p = &xyz[k]; s0 += t * p ; s1 _= t * p ; s2 += t * p ; } w [i] = s0; w [i] = s1; w [i] = s2; } Measure the times and post it here, plus post the _complete_ source, then we can have another look. About a millisecond on a modern desktop computer seems reasonable. Mar 17 '07 #6

 P: n/a In article <11*********************@l75g2000hse.googlegroups. comVi*******@gmail.com writes: Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/ for (i=0;i<3;i++) { for (j=0;j<3;j++) { w[i][j]=0; for (k=0;k<40000;k++) { w[i][j]+=xyz_trans[i][k]*xyz[k][j]; } } } It depends on the processor, and I think also on caching. But you might try the following: for(i = 0; i < 3; i++) for(j = 0; j < 3; j++) w[i][j] = 0; for(k = 0; k < 40000; k++) { for(i = 0; i < 3; i++) { xyzt = xyz_trans[i][k]; for(j = 0; j < 3; j++) { w[i][j] += xyzt * xyz[k][j]; } } } Under some circumstances it works better. -- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131 home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/ Mar 17 '07 #7

 P: n/a In article <11**********************@y66g2000hsf.googlegroups .com"Michael"

### This discussion thread is closed

Replies have been disabled for this discussion. 