473,387 Members | 1,542 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

optimisation of a function

Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Solution 2:
Define 7 different functions, which do the calculation on 2 to 8 number
of data. After all, the template will unroll into these.

Maybe solution 3?

Looking forward to hear from you.

Cheers!
Oct 10 '07 #1
2 1189
anon <an**@no.nowrote in
news:fe**********@el-srv04-CHE.srvnet.eastlink.de:
Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}
Um... does your profiler say that your function is taking too much time
initializing those variables?
SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).
"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.
>
Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}
Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?
Solution 2:
Define 7 different functions, which do the calculation on 2 to 8
number of data. After all, the template will unroll into these.
Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.

Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point
to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.
Oct 10 '07 #2
Andre Kostur wrote:
anon <an**@no.nowrote in
news:fe**********@el-srv04-CHE.srvnet.eastlink.de:
>Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

Um... does your profiler say that your function is taking too much time
initializing those variables?
>SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.
Sorry, thats typo :( Those depend on j:
const int data1 = p[SIZE*i+4*j+0];
const int data2 = p[SIZE*i+4*j+1];
const int data3 = p[SIZE*i+4*j+2];
const int data4 = p[SIZE*i+4*j+3];
>
>Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?
yes. That + same thing for pData:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ N ) <<<
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k]; <<<
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

>
>Solution 2:
Define 7 different functions, which do the calculation on 2 to 8
number of data. After all, the template will unroll into these.

Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.
I am mostly concerned with performances for this function. If it is
better to have 8 different function, I would go for it.
My profiler says that I loose some time in the 3rd loop:
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k];
}
to initialize these values, and to initialize and handle k.
>
Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point
Those data is used in a complex math function, impossible to optimize
further.
to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.
Yes, thats the best test.
Oct 10 '07 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Simon Elliott | last post by:
What optimisation do compilers typically provide when passing STL containers around? For example, if I do something like this: struct Tbloggs { std::string s1; }; typedef...
16
by: simonwittber | last post by:
Hello People. I've have a very tight inner loop (in a game app, so every millisecond counts) which I have optimised below: def loop(self): self_pool = self.pool self_call_exit_funcs =...
13
by: David Mitchell | last post by:
I use the above function in queries for a number of forms and reports. The reports take approx 20 seconds to open. There are only 100 product id's in tblProducts. My concern is that the time will...
17
by: EC-AKD | last post by:
Hi All, I am new to the concept of optimising codes in C. I was wondering if C level inlining of code is any way comparable to macros. I believe that inlining is equivalent to writing macros....
8
by: Jon Maz | last post by:
Hi, I'm facing a code-optimisation issue on an asp.net/vb.net/SQL Server 2000 project. A web page containing not much more than 3 DropDownLists is taking nigh on 6 seconds to load, because each...
1
by: David Welch | last post by:
Hi, I have a bit of code where I am relying on empty base member optimisation. The bit of code is below: template<typename Enum> struct EncodePrefix { template<Enum e> struct Apply
1
by: grid | last post by:
Hi, I was exploring the affect of cache on program performance/optimisation.Is it the compilers responsibility only to consider this kind of optimisation or the programmer can do his bit in this...
2
by: special_dragonfly | last post by:
Hello, I know this might be a little cheeky, and if it is, please say, but I need a little hand optimising some code. For the simple reason that this is 'company' code and I have no idea what I'm...
2
by: =?Utf-8?B?cmljaGFyZCBzYW5jZW5vdA==?= | last post by:
When calling the DrawSomething function, i get an access violation in release mode. This error happens when "Increase speed (/02)" is enabled (Preferences->C/C++ -Optimisation -Increase speed...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.