Hello
I am working on optimizing a function that does some extensive work.
Running the profiler, I identified this function as the bottleneck.
A simplified version of the function looks like this:
void a( int n, int m, int *p )
{
    for ( int i=0; i<n; ++i )
    {
        for ( int j=0; j<m; j += 4 )
        {
            const int data1 = p[SIZE*i+0];
            const int data2 = p[SIZE*i+1];
            const int data3 = p[SIZE*i+2];
            const int data4 = p[SIZE*i+3];
            // use data in the calculation and return the result to
            // p[j+SIZE*i]
        }
    }
}
SIZE is some predefined constant.
Now they have introduced another variable: the number of data values
used in the calculation. I have to modify the function to handle this
while keeping it as fast as possible. The number of values is not that
big (2 to 8).
Now, I have several solutions, and would like to hear which in your
opinion is the best.
Solution 1:
template < int N >
void a( int n, int m, int *p )
{
    for ( int i=0; i<n; ++i )
    {
        for ( int j=0; j<m; j += 4 )
        {
            int pData[N];
            for ( int k=0; k<N; ++k )
            {
                pData[k] = p[SIZE*i+k];
            }
            // use pData in the calculation and return the result to
            // p[j+SIZE*i]
        }
    }
}
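Since the number of values is only known at run time, I suppose the
template would need a small dispatcher in front of it. Below is a rough
sketch of what I have in mind, not tested for speed. The summation is
only a placeholder for the real calculation, and the name a_dispatch
and the value 16 for SIZE are made up for illustration.

```cpp
constexpr int SIZE = 16;  // assumption: stands in for the real predefined constant

// Solution 1 with a placeholder calculation (sum of the N values).
template <int N>
void a(int n, int m, int* p)
{
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < m; j += 4)
        {
            int pData[N];
            for (int k = 0; k < N; ++k)
            {
                pData[k] = p[SIZE * i + k];
            }
            // Placeholder for the real calculation: add up the N values.
            int result = 0;
            for (int k = 0; k < N; ++k)
            {
                result += pData[k];
            }
            p[j + SIZE * i] = result;
        }
    }
}

// Hypothetical run-time dispatcher: picks the instantiation matching
// count (2 to 8). Returns false and leaves p untouched otherwise.
bool a_dispatch(int count, int n, int m, int* p)
{
    switch (count)
    {
    case 2: a<2>(n, m, p); return true;
    case 3: a<3>(n, m, p); return true;
    case 4: a<4>(n, m, p); return true;
    case 5: a<5>(n, m, p); return true;
    case 6: a<6>(n, m, p); return true;
    case 7: a<7>(n, m, p); return true;
    case 8: a<8>(n, m, p); return true;
    default: return false;
    }
}
```

The caller would then write something like a_dispatch(count, n, m, p),
so the switch runs once per call and the inner loops see N as a
compile-time constant.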
Solution 2:
Define 7 different functions, each doing the calculation on a fixed
number of values (2 to 8). After all, the template will be instantiated
into these anyway.
Maybe solution 3?
Looking forward to hearing from you.
Cheers!