optimisation of a function

anon

Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Solution 2:
Define 7 different functions, which do the calculation on 2 to 8 number
of data. After all, the template will unroll into these.

Maybe solution 3?

Looking forward to hear from you.

Cheers!

Oct 10 '07 #1

Subscribe Post Reply

1189

Andre Kostur

anon <an**@no.nowrote in
news:fe**********@el-srv04-CHE.srvnet.eastlink.de:

Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

Um... does your profiler say that your function is taking too much time
initializing those variables?

SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.

>
Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?

Solution 2:
Define 7 different functions, which do the calculation on 2 to 8
number of data. After all, the template will unroll into these.

Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.

Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point
to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.

Oct 10 '07 #2

anon

Andre Kostur wrote:

anon <an**@no.nowrote in
news:fe**********@el-srv04-CHE.srvnet.eastlink.de:

>Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

Um... does your profiler say that your function is taking too much time
initializing those variables?

>SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.

Sorry, thats typo :( Those depend on j:
const int data1 = p[SIZE*i+4*j+0];
const int data2 = p[SIZE*i+4*j+1];
const int data3 = p[SIZE*i+4*j+2];
const int data4 = p[SIZE*i+4*j+3];

>
>Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?

yes. That + same thing for pData:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ N ) <<<
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k]; <<<
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

>
>Solution 2:
Define 7 different functions, which do the calculation on 2 to 8
number of data. After all, the template will unroll into these.

Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.

I am mostly concerned with performances for this function. If it is
better to have 8 different function, I would go for it.
My profiler says that I loose some time in the 3rd loop:
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k];
}
to initialize these values, and to initialize and handle k.

>
Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point

Those data is used in a complex math function, impossible to optimize
further.

to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.

Yes, thats the best test.

Oct 10 '07 #3

by: Simon Elliott | last post by:

What optimisation do compilers typically provide when passing STL containers around? For example, if I do something like this: struct Tbloggs { std::string s1; }; typedef...

C / C++

Magic Optimisation

by: simonwittber | last post by:

Hello People. I've have a very tight inner loop (in a game app, so every millisecond counts) which I have optimised below: def loop(self): self_pool = self.pool self_call_exit_funcs =...

Python

Albert Kallal's QuantityOnHand - Speed Optimisation

by: David Mitchell | last post by:

I use the above function in queries for a number of forms and reports. The reports take approx 20 seconds to open. There are only 100 product id's in tblProducts. My concern is that the time will...

Microsoft Access / VBA

Optimisation techniques

by: EC-AKD | last post by:

Hi All, I am new to the concept of optimising codes in C. I was wondering if C level inlining of code is any way comparable to macros. I believe that inlining is equivalent to writing macros....

C / C++

DB optimisation vs OOP layer separation

by: Jon Maz | last post by:

Hi, I'm facing a code-optimisation issue on an asp.net/vb.net/SQL Server 2000 project. A web page containing not much more than 3 DropDownLists is taking nigh on 6 seconds to load, because each...

Visual Basic .NET

Empty base member optimisation with multiple inheritance

by: David Welch | last post by:

Hi, I have a bit of code where I am relying on empty base member optimisation. The bit of code is below: template<typename Enum> struct EncodePrefix { template<Enum e> struct Apply

C / C++

Program performance/optimisation

by: grid | last post by:

Hi, I was exploring the affect of cache on program performance/optimisation.Is it the compilers responsibility only to consider this kind of optimisation or the programmer can do his bit in this...

C / C++

Help with optimisation

by: special_dragonfly | last post by:

Hello, I know this might be a little cheeky, and if it is, please say, but I need a little hand optimising some code. For the simple reason that this is 'company' code and I have no idea what I'm...

Python

VC++ bug with code optimisation enabled?

by: =?Utf-8?B?cmljaGFyZCBzYW5jZW5vdA==?= | last post by:

When calling the DrawSomething function, i get an access violation in release mode. This error happens when "Increase speed (/02)" is enabled (Preferences->C/C++ -Optimisation -Increase speed...

.NET Framework

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

optimisation of a function

Similar topics