code optimization

syed

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

#define RIDX(i,j,n) ((i)*(n)+(j))

void myrotate(int dim, pixel *src, pixel *dst)
{

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}

Nov 15 '05 #1

Subscribe Post Reply

2972

Richard Tobin

In article <11*********************@z14g2000cwz.googlegroups. com>,
syed <su*******@gmail.com> wrote:

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?
Probably not much. You are changing the order of the elements, so
you can't use memcpy().
#define RIDX(i,j,n) ((i)*(n)+(j))

Depending on what you are doing with the array, you might be able to
avoid copying it altogether, and instead use a different version of
RIDX to access the elements in the new order.

-- Richard

Nov 15 '05 #2

Ulrich Eckhardt

One thing up front: C itself doesn't define anything like performance. I'll
somewhat assume a typical desktop CPU.

syed wrote:

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

#define RIDX(i,j,n) ((i)*(n)+(j))
Why? Why not use an inline function?
void myrotate(int dim, pixel *src, pixel *dst)
Okay, a few points here:
- If 'dim' doesn't change, you could make it a constant giving the compiler
possibility to optimise code better.
- You're not going to modify 'src', so make that a pointer to const pixel.
Note that this won't necessarily influence performance.
- There's the 'restrict' property in C99 which (AFAIK) can be used to tell
the compiler that the two ranges don't overlap. This might make things
faster as the compiler can prefetch the read-values earlier.
{

for (i = 0; i < dim; i++)
Hmm, where is 'i'? Making it a global doesn't speed up things. Also:
assert(src && dst && (dim>0));

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];

Hmm, not much that can be done here. What I'd try is to compute a pointer
to the row in the inner loop in order not to redo this computation for
every pixel. Maybe a "duff's device" would help, but I'd rather expect
current compilers to unroll loops themselves.

Lastly, the probably fastest way is when this is simply not done - instead
you could compute the index differently when accessing the matrix.

Uli

Nov 15 '05 #3

Syed

Hi

Thanks for your reply, it was very helpful. I would like to iterate through
the 2-dimensional array using a single loop (instead of two). Is it
possible?

Also, is there any room for loop-unrolling in the code snippet?

Thanks again.
Syed Ijaz

-----
#define RIDX(i,j,n) ((i)*(n)+(j))

void myrotate(int dim, pixel *src, pixel *dst)
{

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}

Nov 15 '05 #4

Malcolm

"syed" <su*******@gmail.com> wrote

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

#define RIDX(i,j,n) ((i)*(n)+(j))

void myrotate(int dim, pixel *src, pixel *dst)
{

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}

You are calculating i and j * dim on each iteration.
If you maintain an offset index and increment or decrement by dim on each
pass, you might save a few instructions.

Nov 15 '05 #5

Richard Tobin

In article <dh**********@nwrdmz01.dmz.ncs.ea.ibs-infra.bt.com>,
Malcolm <re*******@btinternet.com> wrote:

#define RIDX(i,j,n) ((i)*(n)+(j)) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
You are calculating i and j * dim on each iteration.

A reasonable compiler will avoid this.

In general modern compilers are likely to be better at this kind of
micro-optimisation than programmers, because they know the
characteristics of the target processor. Hand-optimising for
one architecture may make things worse on another.

-- Richard

Nov 15 '05 #6

Walter Bright

"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:dh***********@pc-news.cogsci.ed.ac.uk...

In article <dh**********@nwrdmz01.dmz.ncs.ea.ibs-infra.bt.com>,
Malcolm <re*******@btinternet.com> wrote:
#define RIDX(i,j,n) ((i)*(n)+(j)) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];

You are calculating i and j * dim on each iteration.

A reasonable compiler will avoid this.

In general modern compilers are likely to be better at this kind of
micro-optimisation than programmers, because they know the
characteristics of the target processor. Hand-optimising for
one architecture may make things worse on another.

Right. I'd add that it makes sense to look at the optimized assembler
produced by one's particular compiler, as it may already be doing quite a
few loop optimizations on it. Hand doing the optimizations may actually wind
up defeating the compiler ones and make things worse. Many compilers do
very, very well at optimizing loops such as this.

-Walter Bright
www.digitalmars.com free C, C++, D programming language compilers

Nov 15 '05 #7

Old Wolf

Ulrich Eckhardt wrote:

One thing up front: C itself doesn't define anything like performance.
I'll somewhat assume a typical desktop CPU.

syed wrote:
I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

#define RIDX(i,j,n) ((i)*(n)+(j))

Why? Why not use an inline function?

Inline functions are a C99 feature, although many (but not all)
C89 compilers support them. One compiler I use will recognize the
keyword, but never actually inline any function (which is
conforming behaviour, in fact).

Macros may be 'evil', but can also be useful IMHO.

Nov 15 '05 #8

Sascha Springer

syed schrieb:

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

This depends on many things like processor type (RISC, CISC, VLIW,
etc.), number of registers, number of pipelines, cache sizes
(instruction, data), instruction set (SIMD, DSP) and available resources
(RAM) of the architecture involved in the given scenario.

So in general we cannot give you hints about how to optimize your code
when we don't know the details. Anyway, I'm afraid discussing such
details would be even more OT in this group.

However, the compiler for your specific platform should give you good
optimization and profiling options to find a way to improve the
performance of your code. Probably there's an optimized library for such
problems provided by a third party vendor.

Regards
Sascha

Nov 15 '05 #9

Kenneth Brody

syed wrote:

I want clues to optimize a function that copies elements of a
2-dimensional array to another. Is there a way to optimize this piece
of code so that it runs faster?

#define RIDX(i,j,n) ((i)*(n)+(j))

void myrotate(int dim, pixel *src, pixel *dst)
{

for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}

In the "src[RIDX(i,j,dim)]" expression, "i" and "dim" don't change for the
duration of the inner for-loop. Since your RIDX macro will cause these two
same values to be multiplied each time through the loop, you can optimize
this away by calculating this only once:

for (i = 0; i < dim; i++)
{
int temp = i*dim;
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src[temp+j];
}

You can do this one better by not calcluating the addition each time,
either:

for (i = 0; i < dim; i++)
{
pixel *src2 = &src[RIDX(i,0,dim)];
for (j = 0; j < dim; j++)
dst[RIDX(dim-1-j, i, dim)] = src2[j];
}

A similar optimization can be done on the dst[] array by realizing that the
subscript will be decrementing by dim each time through the loop. Calculate
the initial pointer as "&dst[RIDX(dim-1,i,dim)]" and then decrement that
pointer by "dim" each time through the loop.

for (i = 0; i < dim; i++)
{
pixel *src2 = &src[RIDX(i,0,dim)];
pixel *dst2 = &dst[RIDX(dim-1,i,dim)];
for (j = 0; j < dim; j++)
{
*dst2 = *src2++;
dst2 -= dim;
}
}

You have now eliminated the need for almost all of the multiplcations,
going from dim^2 to dim*2.

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Nov 15 '05 #10

Anonymous 7843

In article <dh***********@pc-news.cogsci.ed.ac.uk>,
Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:

In article <11*********************@z14g2000cwz.googlegroups. com>,
syed <su*******@gmail.com> wrote:
#define RIDX(i,j,n) ((i)*(n)+(j))

Depending on what you are doing with the array, you might be able to
avoid copying it altogether, and instead use a different version of
RIDX to access the elements in the new order.

Specifically...

#define ROTATED_RIDX(i,j,n) ((j)*(n)+((dim)-(i)-1))

Nov 15 '05 #11

Anonymous 7843

In article <wh_Ze.16262$mH.10320@fed1read07>, I wrote:

In article <dh***********@pc-news.cogsci.ed.ac.uk>,
Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:

In article <11*********************@z14g2000cwz.googlegroups. com>,
syed <su*******@gmail.com> wrote:
#define RIDX(i,j,n) ((i)*(n)+(j))

Depending on what you are doing with the array, you might be able to
avoid copying it altogether, and instead use a different version of
RIDX to access the elements in the new order.

Specifically...

#define ROTATED_RIDX(i,j,n) ((j)*(n)+((dim)-(i)-1))

Sigh, let me fix it before someone else does it for me:

#define ROTATED_RIDX(i,j,n) ((j)*(n)+((n)-(i)-1))

Nov 15 '05 #12

by: PWalker | last post by:

Hi, I have written code that I would like to optimize. I need to push it to the limit interms of speed as the accuracy of results are proportional to runtime. First off, would anyone know any...

C / C++

much optimizing = bad code ? (array/vector)

by: Hagen | last post by:

Hi, I have a question that you probably shouldnÂ´t worry about since the compiler cares for it, but anyways: When you run your compiler with optimization turned on (eg. g++ with -Ox flag) and...

C / C++

Compiler code optimization: see code below

by: joshc | last post by:

I'm writing some C to be used in an embedded environment and the code needs to be optimized. I have a question about optimizing compilers in general. I'm using GCC for the workstation and Diab...

C / C++

wrinting an optimised code

by: junky_fellow | last post by:

What are the basic guidelines to write an optimised code. What points should one keep in mind for this ? Is this purely architecture or complier specific ? Are there any general techniques that...

C / C++

Efficiently compiled code

by: cyberscout | last post by:

OK I have some code which I didn't write and I'm toying with whether I need to tidy it up. In the code is the line shown in Example 1 Exampe 1: sprintf(stringvariable, "%s", "String"); ...

C / C++

code optimization- Removing if .. else conditions in for loops

by: Kunal | last post by:

Hello, I need help in removing if ..else conditions inside for loops. I have used the following method but I am not sure whether it has actually helped. Below is an example to illustrate what I...

C / C++

optimized code

by: Mike | last post by:

Is it still true that the managed C++ compiler will produce much better opimizations than the C# compiler, or have some of the more global/aggressive opimizations been rolled into the 2005...

C# / C Sharp

Can you write code directly in CIL ???

by: Peter Olcott | last post by:

Cab you write code directly in the Common Intermediate language? I need to optimize a critical real-time function.

C# / C Sharp

simple code performance question

by: galiorenye | last post by:

Hi, Given this code: A** ppA = new A*; A *pA = NULL; for(int i = 0; i < 10; ++i) { pA = ppA; //do something with pA

C / C++

Behavior of the code

by: somenath | last post by:

Hi All, I was going through one of the exercise of one C tutorial . In that they have given one small code and asked about the output. #include <stdio.h> int main(void) { int x =...

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

code optimization

Similar topics