for loop speed problem

Hi

this code:

    float tmp = 0.;
    for (int i = 0; i < 2000; i++) {
        for (int j = i; j < 2000; j++) {
            for (int k = 0; k < 9000; k++) {
                tmp += tmp_mat(k,i) * tmp_mat(k,j);
            }
            tmp = 0.;
        }
        cout << "J: " << i << endl;
    }

is 100 times faster than this:

    float tmp = 0.;
    for (int i = 0; i < 2000; i++) {
        for (int j = i; j < 2000; j++) {
            for (int k = 0; k < 9000; k++) {
                tmp += tmp_mat(k,i) * tmp_mat(k,j);
            }
            data2d[i][j] = tmp; // the problem
            tmp = 0.;
        }
        cout << "J: " << i << endl;
    }

Can anybody help me?

greetings
Mario

Oct 28 '05 #1
9 Replies

Mario Lüttich wrote [10/28/2005 08:32 AM]:

> Hi
> this code:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> is 100 times faster than this:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             data2d[i][j] = tmp; // the problem
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> Can anybody help me?
> greetings
> Mario

What is the type of data2d? Is it a 2D array?

Oct 28 '05 #2

On Fri, 28 Oct 2005 15:32:51 +0200, Mario Lüttich wrote:

> Hi
> this code:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> is 100 times faster than this:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             data2d[i][j] = tmp; // the problem
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> Can anybody help me?
> greetings
> Mario

What about the following?

    for (int i = 0; i < 2000; i++) {
        float& where_to_put_it=data2d[i][j];
        where_to_put_it=0;
        for (int j = i; j < 2000; j++) {
            for (int k = 0; k < 9000; k++) {
                where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
            }
        }
        cout << "J: " << i << endl;
    }

... and if we knew more details, we could give more suggestions.

Best regards,
-- Zara

Oct 28 '05 #3

Hi,

This doesn't look like the same code. I think you're dereferencing j before declaring it; possibly you want to move the lines around like this:

    for (int i = 0; i < 2000; i++) {
        for (int j = i; j < 2000; j++) {
            float& where_to_put_it=data2d[i][j];
            where_to_put_it=0;
            for (int k = 0; k < 9000; k++) {
                where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
            }
        }
        cout << "J: " << i << endl;
    }

Nice idea using a reference, although I suspect this will run at the same speed. How about this instead:

    for (int i = 0; i < 2000; i++) {
        float* where_is_i = data2d[i];
        for (int j = i; j < 2000; j++) {
            float& where_to_put_it = where_is_i[j];
            where_to_put_it=0;
            for (int k = 0; k < 9000; k++) {
                where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
            }
        }
        cout << "J: " << i << endl;
    }

The idea here is to reduce the number of times that you dereference data2d with the same value of i.

Oct 28 '05 #4

Mario Lüttich wrote:

> Hi
> this code:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> is 100 times faster than this:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             data2d[i][j] = tmp; // the problem
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> Can anybody help me?

The speed difference is not surprising: the compiler has probably noticed that the loop over k in the first example can be eliminated, because its result (in tmp) is never used.

Oct 28 '05 #5
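One way to test the dead-code explanation in the reply above is to make each result observable without storing it in data2d. The sketch below is only an illustration under stated assumptions: `Matrix` is a hypothetical stand-in for the poster's `tmp_mat` (its real type and memory layout were never shown in the thread), and `checksum_of_products` is a made-up name. Every dot product is added to a checksum that the function returns, so the compiler can no longer discard the k loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tmp_mat: a rows x cols matrix stored row by row.
// The layout is an assumption; the thread never shows the real type.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& operator()(std::size_t k, std::size_t i) { return data[k * cols + i]; }
    float operator()(std::size_t k, std::size_t i) const { return data[k * cols + i]; }
};

// Same triple loop as in the question, but each dot product of columns i and j
// is folded into a checksum that the caller receives, so the work is "used".
double checksum_of_products(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double checksum = 0.0;
    for (std::size_t i = 0; i < m.cols; ++i) {
        for (std::size_t j = i; j < m.cols; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < m.rows; ++k) {
                tmp += m(k, i) * m(k, j);
            }
            checksum += tmp;  // keeps the result of the k loop alive
        }
    }
    return checksum;
}
```

If this version takes about as long as the variant that writes to data2d, the 100x gap most likely came from the eliminated loop and not from the store itself.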

Mario Lüttich wrote:

> Hi
> this code:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> is 100 times faster than this:
>
>     float tmp = 0.;
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             for (int k = 0; k < 9000; k++) {
>                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>             data2d[i][j] = tmp; // the problem
>             tmp = 0.;
>         }
>         cout << "J: " << i << endl;
>     }
>
> Can anybody help me?
> greetings
> Mario

Mario,

Use pointers to loop through data2d, incrementing them as required, like the code shown below. Incrementing pointers is quicker than the calculations to index the array with i and j.

    int data[10][10];
    int (*pRow)[10] = data,*pCol;
    int i,j;
    for (i = 0;i < 10;++i,++pRow) {
        pCol = *pRow;
        for (j = 0;j < 10;++j,++pCol) {
            *pCol = i * j;
        }
    }

JFJB

Oct 28 '05 #6

n2xssvv g02gfr12930 wrote:

> Mario Lüttich wrote:
>
> > Hi
> > this code:
> >
> >     float tmp = 0.;
> >     for (int i = 0; i < 2000; i++) {
> >         for (int j = i; j < 2000; j++) {
> >             for (int k = 0; k < 9000; k++) {
> >                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
> >             }
> >             tmp = 0.;
> >         }
> >         cout << "J: " << i << endl;
> >     }
> >
> > is 100 times faster than this:
> >
> >     float tmp = 0.;
> >     for (int i = 0; i < 2000; i++) {
> >         for (int j = i; j < 2000; j++) {
> >             for (int k = 0; k < 9000; k++) {
> >                 tmp += tmp_mat(k,i) * tmp_mat(k,j);
> >             }
> >             data2d[i][j] = tmp; // the problem
> >             tmp = 0.;
> >         }
> >         cout << "J: " << i << endl;
> >     }
> >
> > Can anybody help me?
> > greetings
> > Mario
>
> Mario,
>
> Use pointers to loop through data2d, incrementing them as required, like the code shown below. Incrementing pointers is quicker than the calculations to index the array with i and j.
>
>     int data[10][10];
>     int (*pRow)[10] = data,*pCol;
>     int i,j;
>     for (i = 0;i < 10;++i,++pRow) {
>         pCol = *pRow;
>         for (j = 0;j < 10;++j,++pCol) {
>             *pCol = i * j;
>         }
>     }
>
> JFJB

Thanks, everybody!

greetings
Mario

Oct 31 '05 #7

Mario Lüttich wrote:

> Hi
> this code:
> [snip code that does not store the results anywhere]
> is 100 times faster than this
> [snip other code]
> can anybody help me?

Look at the assembler file generated by the compiler and see what it is doing. Step through the code instruction by instruction in a debugger. If the optimiser is eliminating the inner loop, then that code will not be exercised.

Try turning the optimiser off and see whether the two versions are still so dissimilar. If one version is calling a library routine instead of using built-in instructions, you will notice that too.

If the assembly looks fine, then it is possible that architecture issues (cache lines, FPU pipelines) are dominating. It's interesting to see what your compiler is actually doing.

Oct 31 '05 #8
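Alongside reading the assembly, the advice above can be backed with a measurement. The following is a rough sketch, not a rigorous benchmark: `seconds_taken` and `products_with_store` are hypothetical names, and the matrix is assumed to be a row-major depth x n array held in a `std::vector<float>`, which may not match the real `tmp_mat`. Building it once with the optimiser on and once with it off, and timing both loop variants the same way, should show whether the gap survives.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): runs `work` once and returns
// the elapsed wall-clock time in seconds.
template <typename Work>
double seconds_taken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// The variant that keeps the store: products of columns i and j (j >= i) of an
// assumed row-major depth x n matrix `m` are written to the n x n array `out`.
void products_with_store(const std::vector<float>& m, std::size_t depth,
                         std::size_t n, std::vector<float>& out) {
    assert(m.size() == depth * n && out.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < depth; ++k) {
                tmp += m[k * n + i] * m[k * n + j];
            }
            out[i * n + j] = tmp;  // the store that the question blames
        }
    }
}
```

A call such as `seconds_taken([&] { products_with_store(m, depth, n, out); })` returns the time for one run; repeating it a few times and taking the smallest value usually gives a steadier number.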

On 28 Oct 2005 08:45:40 -0700, "chrisg67" wrote:

> Hi,
>
> This doesn't look like the same code. I think you're dereferencing j before declaring it; possibly you want to move the lines around like this:
>
>     for (int i = 0; i < 2000; i++) {
>         for (int j = i; j < 2000; j++) {
>             float& where_to_put_it=data2d[i][j];
>             where_to_put_it=0;
>             for (int k = 0; k < 9000; k++) {
>                 where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>         }
>         cout << "J: " << i << endl;
>     }

Sure. Writing and not testing => bugging ;-)

> Nice idea using a reference, although I suspect this will run at the same speed. How about this instead:
>
>     for (int i = 0; i < 2000; i++) {
>         float* where_is_i = data2d[i];
>         for (int j = i; j < 2000; j++) {
>             float& where_to_put_it = where_is_i[j];
>             where_to_put_it=0;
>             for (int k = 0; k < 9000; k++) {
>                 where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>         }
>         cout << "J: " << i << endl;
>     }
>
> The idea here is to reduce the number of times that you dereference data2d with the same value of i.

OK. What about this one?

    float **where_is_i = data2d;
    for (int i = 0; i < 2000; ++i, ++where_is_i) {
        // start at column i of the current row (assumes data2d is a float**)
        float *where_to_put_it = *where_is_i + i;
        for (int j = i; j < 2000; ++j, ++where_to_put_it) {
            *where_to_put_it = 0;
            for (int k = 0; k < 9000; k++) {
                // accumulate into the element, not into the pointer itself
                *where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
            }
        }
        cout << "J: " << i << endl;
    }

It has more additions and fewer multiplications/shifts. BTW, I have not tested it; it is just for fun.

Oct 31 '05 #9

Zara wrote:

> On 28 Oct 2005 08:45:40 -0700, "chrisg67" wrote:
>
> > Hi,
> >
> > This doesn't look like the same code. I think you're dereferencing j before declaring it; possibly you want to move the lines around like this:
> >
> >     for (int i = 0; i < 2000; i++) {
> >         for (int j = i; j < 2000; j++) {
> >             float& where_to_put_it=data2d[i][j];
> >             where_to_put_it=0;
> >             for (int k = 0; k < 9000; k++) {
> >                 where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
> >             }
> >         }
> >         cout << "J: " << i << endl;
> >     }
>
> Sure. Writing and not testing => bugging ;-)
>
> > Nice idea using a reference, although I suspect this will run at the same speed. How about this instead:
> >
> >     for (int i = 0; i < 2000; i++) {
> >         float* where_is_i = data2d[i];
> >         for (int j = i; j < 2000; j++) {
> >             float& where_to_put_it = where_is_i[j];
> >             where_to_put_it=0;
> >             for (int k = 0; k < 9000; k++) {
> >                 where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
> >             }
> >         }
> >         cout << "J: " << i << endl;
> >     }
> >
> > The idea here is to reduce the number of times that you dereference data2d with the same value of i.
>
> OK. What about this one?
>
>     float **where_is_i = data2d;
>     for (int i = 0; i < 2000; ++i, ++where_is_i) {
>         // start at column i of the current row (assumes data2d is a float**)
>         float *where_to_put_it = *where_is_i + i;
>         for (int j = i; j < 2000; ++j, ++where_to_put_it) {
>             *where_to_put_it = 0;
>             for (int k = 0; k < 9000; k++) {
>                 // accumulate into the element, not into the pointer itself
>                 *where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
>             }
>         }
>         cout << "J: " << i << endl;
>     }
>
> It has more additions and fewer multiplications/shifts. BTW, I have not tested it; it is just for fun.

This one is faster. And my first case was effectively running without the last loop.

Thanks to everyone who helped me.

Nov 1 '05 #10
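Since several of the variants in this thread were posted untested, a small equivalence check may be useful before timing anything. This is a sketch under assumptions that the thread does not confirm: data2d is treated as a `float**` (an array of row pointers), tmp_mat as a row-major depth x n array of floats, and the names `fill_indexed` and `fill_pointer_walk` are invented for the example. Both functions fill the upper triangle with the products of columns i and j; the second walks pointers instead of indexing.

```cpp
#include <cassert>
#include <cstddef>

// Indexed version, as in the original question. Layout assumptions:
// data2d is an array of n row pointers, tmp_mat is row-major depth x n.
void fill_indexed(float** data2d, const float* tmp_mat,
                  std::size_t n, std::size_t depth) {
    assert(data2d != nullptr && tmp_mat != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < depth; ++k) {
                tmp += tmp_mat[k * n + i] * tmp_mat[k * n + j];
            }
            data2d[i][j] = tmp;
        }
    }
}

// Pointer-walking version: one pointer steps through the rows, a second one
// steps through the columns of the current row, starting at column i.
void fill_pointer_walk(float** data2d, const float* tmp_mat,
                       std::size_t n, std::size_t depth) {
    assert(data2d != nullptr && tmp_mat != nullptr);
    float** row = data2d;
    for (std::size_t i = 0; i < n; ++i, ++row) {
        float* out = *row + i;
        for (std::size_t j = i; j < n; ++j, ++out) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < depth; ++k) {
                sum += tmp_mat[k * n + i] * tmp_mat[k * n + j];
            }
            *out = sum;  // a single store per (i, j) pair
        }
    }
}
```

Running both on a small matrix and comparing the two outputs element by element is a cheap way to catch mistakes such as incrementing the wrong pointer. Whether the pointer version is actually faster depends on the compiler, so it is worth measuring in each case.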