P: n/a

Hi
this code:
float tmp = 0.;
for (int i = 0; i < 2000; i++) {
  for (int j = i; j < 2000; j++) {
    for (int k = 0; k < 9000; k++) {
      tmp += tmp_mat(k,i) * tmp_mat(k,j);
    }
    tmp = 0.;
  }
  cout << "J: " << i << endl;
}
is 100 times faster than this:
float tmp = 0.;
for (int i = 0; i < 2000; i++) {
  for (int j = i; j < 2000; j++) {
    for (int k = 0; k < 9000; k++) {
      tmp += tmp_mat(k,i) * tmp_mat(k,j);
    }
    data2d[i][j] = tmp; // the problem
    tmp = 0.;
  }
  cout << "J: " << i << endl;
}
Can anybody help me?
greetings
Mario
P: n/a

Mario Lüttich wrote [10/28/2005 08:32 AM]: [snip quoted question]
What is the type of data2d? Is it a 2D array?
P: n/a

On Fri, 28 Oct 2005 15:32:51 +0200, Mario Lüttich <ml*****@gwdg.de>
wrote: [snip quoted question]
What about the following?
for (int i = 0; i < 2000; i++) {
  float& where_to_put_it=data2d[i][j];
  where_to_put_it=0;
  for (int j = i; j < 2000; j++) {
    for (int k = 0; k < 9000; k++) {
      where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
    }
  }
  cout << "J: " << i << endl;
}
.... and if we knew more details, we could give more suggestions.
Best regards,
 Zara  
P: n/a

Hi,
This doesn't look like the same code. I think you're using j
before declaring it; possibly you want to move the lines around like
this:
for (int i = 0; i < 2000; i++) {
  for (int j = i; j < 2000; j++) {
    float& where_to_put_it=data2d[i][j];
    where_to_put_it=0;
    for (int k = 0; k < 9000; k++) {
      where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
    }
  }
  cout << "J: " << i << endl;
}
Nice idea using a reference, although I suspect this will be the
same speed. How about this instead:
for (int i = 0; i < 2000; i++)
{
  float* where_is_i = data2d[i];
  for (int j = i; j < 2000; j++)
  {
    float& where_to_put_it = where_is_i[j];
    where_to_put_it=0;
    for (int k = 0; k < 9000; k++)
    {
      where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
    }
  }
  cout << "J: " << i << endl;
}
The idea here is to reduce the number of times that you dereference
data2d with the same value of i.  
P: n/a

Mario Lüttich wrote: [snip quoted question]
The speed difference is not surprising since the compiler has probably
noticed that the loop on k in the first example can be eliminated since
its output value (in tmp) is not used.  
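A minimal sketch of that effect, assuming a small row-major `Matrix` struct as a stand-in for tmp_mat (its real type is not shown in the thread). The names `Matrix`, `dot_discarded` and `gram_upper` are made up for illustration. In the first function nothing observable depends on tmp, so an optimizing compiler is allowed to drop the arithmetic; in the second, every sum is stored and returned, so the work has to be done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tmp_mat: rows x cols floats, stored row-major.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> v;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c, 0.0f) {}
    float& operator()(std::size_t k, std::size_t i) {
        assert(k < rows && i < cols);
        return v[k * cols + i];
    }
    float operator()(std::size_t k, std::size_t i) const {
        assert(k < rows && i < cols);
        return v[k * cols + i];
    }
};

// Like the fast version in the question: tmp is reset and never read,
// so the compiler may legally remove the whole k loop.
void dot_discarded(const Matrix& m) {
    float tmp = 0.0f;
    for (std::size_t i = 0; i < m.cols; ++i) {
        for (std::size_t j = i; j < m.cols; ++j) {
            for (std::size_t k = 0; k < m.rows; ++k) {
                tmp += m(k, i) * m(k, j);
            }
            tmp = 0.0f;  // result thrown away
        }
    }
}

// Like the slow version: each sum is stored, so it must be computed.
// Returns a cols x cols row-major array; only entries with j >= i are filled.
std::vector<float> gram_upper(const Matrix& m) {
    std::vector<float> out(m.cols * m.cols, 0.0f);
    for (std::size_t i = 0; i < m.cols; ++i) {
        for (std::size_t j = i; j < m.cols; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < m.rows; ++k) {
                tmp += m(k, i) * m(k, j);
            }
            out[i * m.cols + j] = tmp;
        }
    }
    return out;
}
```

Timing `dot_discarded` against `gram_upper` with optimization on and off should show whether the store itself is expensive or whether the first loop was simply never executed.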
P: n/a

Mario Lüttich wrote: [snip quoted question]
Mario,
Use pointers to loop through data2d, incrementing them as required,
as in the code shown below. Incrementing pointers can be quicker than
recomputing the array index from i and j.
int data[10][10];
int (*pRow)[10] = data, *pCol;
int i, j;
for (i = 0; i < 10; ++i, ++pRow)
{
  pCol = *pRow;
  for (j = 0; j < 10; ++j, ++pCol)
  {
    *pCol = i * j;
  }
}
JFJB  
P: n/a

n2xssvv g02gfr12930 wrote: [snip quoted question and pointer-increment example]
Thanks, everybody!
greetings
Mario  
P: n/a

Mario Lüttich wrote: Hi
this code: [snip code that does not store the results anywhere] is 100 times faster than this [snip other code] can anybody help me?
Look at the assembler file generated by the compiler. See what it is
doing. Step through the code instruction by instruction in a debugger.
If the optimiser is eliminating the inner loop then that code will not
be exercised. Try turning the optimiser off and seeing if the versions
are still so dissimilar. If one version is calling a library routine
instead of using builtin instructions, you will notice that too. If
the assembly looks fine, then it is possible that architecture issues
(cache lines, FPU pipelines) are dominating. It's interesting to see
what your compiler is actually doing.
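One of the cache effects mentioned above can be sketched as follows. This assumes, hypothetically, that the matrix is stored row-major in a flat array (the thread never says how tmp_mat is laid out). With that layout, running k in the innermost loop jumps a whole row per step, whereas moving k outward lets the innermost loop walk contiguous memory. The names `gram_k_inner` and `gram_k_outer` are made up, and whether the reordering helps on a given machine has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x holds rows*cols floats, row-major: element (k, i) is x[k * cols + i].
// Both functions fill the upper triangle (j >= i) of a cols x cols result.

// Same loop order as in the question: k innermost, stride of `cols` floats.
std::vector<float> gram_k_inner(const std::vector<float>& x,
                                std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols);
    std::vector<float> g(cols * cols, 0.0f);
    for (std::size_t i = 0; i < cols; ++i) {
        for (std::size_t j = i; j < cols; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < rows; ++k) {
                tmp += x[k * cols + i] * x[k * cols + j];
            }
            g[i * cols + j] = tmp;
        }
    }
    return g;
}

// Reordered: k outermost, j innermost, so the inner loop reads one row of x
// and updates one row of g contiguously.
std::vector<float> gram_k_outer(const std::vector<float>& x,
                                std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols);
    std::vector<float> g(cols * cols, 0.0f);
    for (std::size_t k = 0; k < rows; ++k) {
        const float* row = x.data() + k * cols;
        for (std::size_t i = 0; i < cols; ++i) {
            const float xi = row[i];
            float* gi = g.data() + i * cols;
            for (std::size_t j = i; j < cols; ++j) {
                gi[j] += xi * row[j];
            }
        }
    }
    return g;
}
```

If tmp_mat is actually column-major, the original loop order is already the contiguous one and this reordering would not be expected to help.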
P: n/a

On 28 Oct 2005 08:45:40 -0700, "chrisg67" <ch******@gmail.com> wrote:
> This doesn't look like the same code, I think you're using j before declaring it [snip reordered code]
Sure. Writing and not testing => bugging ;)
> Nice idea using a reference though [snip row-pointer version]
> The idea here is to reduce the number of times that you dereference data2d with the same value of i.
OK. What about this one?
float **where_is_i=data2d;
for (int i = 0; i < 2000; ++i,++where_is_i) {
  // start at column i of row i
  float *where_to_put_it=*where_is_i+i;
  for (int j = i; j < 2000; ++j,++where_to_put_it) {
    *where_to_put_it=0;
    for (int k = 0; k < 9000; k++) {
      *where_to_put_it += tmp_mat(k,i) * tmp_mat(k,j);
    }
  }
  cout << "J: " << i << endl;
}
It has more additions and fewer multiplications/shifts. BTW, I have
not tested it; it is just for fun.
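Since the snippet above is untested, here is a hedged sketch of how one might check a pointer-walking loop against plain indexing on a small input. It assumes data2d is an array of row pointers (float**), as the snippet does, and uses a flat row-major vector as a hypothetical stand-in for tmp_mat. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain indexing, as in the original question.
// x is rows x cols, row-major: element (k, i) is x[k * cols + i].
void fill_indexed(float** data2d, const std::vector<float>& x,
                  std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < cols; ++i) {
        for (std::size_t j = i; j < cols; ++j) {
            float tmp = 0.0f;
            for (std::size_t k = 0; k < rows; ++k) {
                tmp += x[k * cols + i] * x[k * cols + j];
            }
            data2d[i][j] = tmp;
        }
    }
}

// Pointer-walking variant: one pointer steps over the rows,
// another steps over the columns j >= i of the current row.
void fill_pointer_walk(float** data2d, const std::vector<float>& x,
                       std::size_t rows, std::size_t cols) {
    float** row_ptr = data2d;
    for (std::size_t i = 0; i < cols; ++i, ++row_ptr) {
        float* out = *row_ptr + i;  // column i of row i
        for (std::size_t j = i; j < cols; ++j, ++out) {
            *out = 0.0f;
            for (std::size_t k = 0; k < rows; ++k) {
                *out += x[k * cols + i] * x[k * cols + j];
            }
        }
    }
}

// Returns true when both variants produce the same upper triangle.
bool variants_agree(const std::vector<float>& x,
                    std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols);
    std::vector<float> a(cols * cols, 0.0f), b(cols * cols, 0.0f);
    std::vector<float*> pa(cols), pb(cols);
    for (std::size_t i = 0; i < cols; ++i) {
        pa[i] = a.data() + i * cols;
        pb[i] = b.data() + i * cols;
    }
    fill_indexed(pa.data(), x, rows, cols);
    fill_pointer_walk(pb.data(), x, rows, cols);
    return a == b;
}
```

Running `variants_agree` on a tiny matrix before timing anything catches mistakes such as incrementing the wrong pointer or adding to a pointer instead of to the value it points at.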
P: n/a

Zara wrote: [snip quoted discussion and pointer-walking code]
This one is faster. And my first case was without the last loop.
Thanks to everyone who helped me.
