Bytes IT Community

Help Me.

P: n/a
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
int i;
long dotp = *sum;
short c = *b++;
short d;
for (i = 0; i < 74; i++)
{
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
}
sqr += (long)c * c; /* Loop epilogue */
d = *a++;
d = *a++;
dotp += (long)c * d;
c = *b++;
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
*sum = dotp;
return sqr;
}
Can anybody suggest me any further optimization possible in the above
code?
Please help me.

Sumkari

Jul 7 '06 #1
14 Replies


P: n/a
ku********@gmail.com wrote:
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.
<snip>
>
Can anybody suggest me any further optimization possible in the above
code?
What does your profiler tell you? I assume you profiled the code before
you optimised it.

--
Ian Collins.
Jul 7 '06 #2

P: n/a
Ian Collins wrote:
ku********@gmail.com wrote:
>Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.
<snip>
>Can anybody suggest me any further optimization possible in the above
code?

What does your profiler tell you? I assume you profiled the code before
you optimised it.
Exactly. Remember Hoare's Law/Knuth's Law (attribution varies):
"Premature optimization is the root of all evil".
Jul 7 '06 #3

P: n/a
Hi Sukumari,
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.
use of pointers doesn't mean optimizing the code.

you did not mention what your code is meant for. Sorry to say that it
has some bugs.
short c = *b++;
here pointer b is incremented not the value of b. even if you wanted to
increment the value, you cannot do that because it is a constant.

and all increments to a, b will lead to pointer increment in your code
and the output is undefined.

Specify briefly what mac function does.

-- Murali Krishna

Jul 7 '06 #4

P: n/a
On 6 Jul 2006 21:55:01 -0700, ku********@gmail.com wrote:
>Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
<...>
}
Can anybody suggest me any further optimization possible in the above
code?
Normally, optimizations such as loop unrolling are done by the
compiler itself. And no optimization should be attempted before profiling
tells you it is necessary.

But you may take a look at the following; it *may* be more optimized:
extern long mac(const short *a, const short *b, long sqr, long * sum)
{
int i;
long dotp = *sum;
short c = *b;
short d;
for (i = 0; i < 74; i++)
{
sqr += (long)c * c;
d = *a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
c = *++b;
}
sqr += (long)c * c; /* Loop epilogue */
d = *++a;
d = *++a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
*sum = dotp;
return sqr;
}
This way, you increment pointers a and b one time fewer, and using
preincrement *might* be faster than postincrement because it is
unnecessary to keep the old value after changing it.

*BUT* compiler optimizations might do better than this, and only a
profiler can tell us which is the best option.

Best regards,

Zara
Jul 7 '06 #5

P: n/a
Zara wrote:
On 6 Jul 2006 21:55:01 -0700, ku********@gmail.com wrote:
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
<...>
}
Can anybody suggest me any further optimization possible in the above
code?
Normally, optimizations such as loop unrolling are done by the
compiler itself. And no optimization should be attempted before profiling
tells you it is necessary.

But you may take a look at the following; it *may* be more optimized:
extern long mac(const short *a, const short *b, long sqr, long * sum)
{
int i;
long dotp = *sum;
short c = *b;
short d;
for (i = 0; i < 74; i++)
{
sqr += (long)c * c;
d = *a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
c = *++b;
}
sqr += (long)c * c; /* Loop epilogue */
d = *++a;
d = *++a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
*sum = dotp;
return sqr;
}

This way, you increment pointers a and b one time less, and using
preincrement *might* be faster than postincrement because it is
unnecessary to keep old value after changing it.

*BUT* compiler optimizations might do better than this, and only a
profiler can tell us which is the best option.

Best regards,

Zara
Pre-increment and post-increment will give different output, no
matter which one is faster.
c = *++b;
will again lead to pointer increment not the value increment.

-- Murali Krishna

Jul 7 '06 #6

P: n/a
pre increments and post increment will result in different o/p no
matter which one is faster.
c = *++b;

will again lead to pointer increment not the value increment.
Maybe he intends to increment pointer. I get a feeling b is an array
that he has declared as a pointer. But, as you said, without more
details, it's hard to figure out.

Jul 7 '06 #7

P: n/a
"Murali Krishna" <pm*********@gmail.com> wrote:
Hi Sukumari,
I have following code which I have optimized using
different optimization techniques like loop unrolling,
use of pointers.

use of pointers doesn't mean optimizing the code.

you did not mention what your code is meant for. Sorry to
say that it has some bugs.
short c = *b++;

here pointer b is incremented not the value of b. even if
you wanted to increment the value, you cannot do that because
it is a constant,
Well, not exactly. The declaration "const short *b" doesn't
mean that either b or the thing it points to is const; it
just means that (*b) cannot be changed through pointer b.
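
A minimal sketch of that point, under the assumption that b points at two or more valid shorts. The function name sum_first_two is made up for illustration and is not part of the original code. It shows that the pointer itself can be advanced, that writing through it is rejected by the compiler, and that the array it points into can still be changed by its owner:

```cpp
// Hypothetical helper, not from the original post.
// Assumes b points at two or more valid shorts.
long sum_first_two(const short* b)
{
    short first = *b++;   // reads b[0], then advances the local copy of b
    short second = *b;    // reads what the caller knows as b[1]
    // *b = 0;            // would not compile: *b is read-only through b
    return static_cast<long>(first) + second;
}
```

The caller's array does not need to be const. If the caller later writes data[1] = 5 to its own array, a second call sees the new value.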

So "short c = *b++" is technically valid C++. It stores (*b)
in c, then increments b. Now, as to why you'd want to do
that, I don't know. But the OP does that same thing repeatedly
in the program, so I doubt it's a "bug". Probably a heavily
unrolled version of something that would be better expressed
at a more abstract level, unless it's extremely time-critical
(avionics? guided-missile control? spacecraft navigation?).
But if it's a regular app running on someone's PC, such
aggressive overoptimization seems silly to me. Why go to
such lengths to save a few nanoseconds? Especially at the
cost of clarity and maintainability?
and all increments to a, b will lead to pointer increment
in your code and the output is undefined.
Not "undefined", just "unknown". The output is dependent on the
pointers pointing to something valid, which is always the case
when using C-style "passing by reference", which is why I detest
that and rarely use it. I generally use C++ references instead.
Incrementing a pointer that points to something in another
function seems like shooting in the dark to me. I wouldn't do
that unless I had no other choice.
--
Cheers,
Robbie Hatley
Tustin, CA, USA
lonewolfintj at pacbell dot net
(put "[usenet]" in subject to bypass spam filter)
http://home.pacbell.net/earnur/
Jul 7 '06 #8

P: n/a

Robbie Hatley wrote:
"Murali Krishna" <pm*********@gmail.com> wrote:
Hi Sukumari,
I have following code which I have optimized using
different optimization techniques like loop unrolling,
use of pointers.
use of pointers doesn't mean optimizing the code.

you did not mention what your code is meant for. Sorry to
say that it has some bugs.
short c = *b++;
here pointer b is incremented not the value of b. even if
you wanted to increment the value, you cannot do that because
it is a constant,

Well, not exactly. The declaration "const short *b" doesn't
mean that either b or the thing it points to is const; it
just means that (*b) cannot be changed through pointer b.

So "short c = *b++" is technically valid C++. It stores (*b)
in c, then increments b. Now, as to why you'd want to do
that, I don't know. But the OP does that same thing repeatedly
in the program, so I doubt it's a "bug". Probably a heavily
unrolled version of something that would be better expressed
at a more abstract level, unless it's extremely time-critical
(avionics? guided-missile control? spacecraft navigation?).
But if it's a regular app running on someone's PC, such
aggressive overoptimization seems silly to me. Why go to
such lengths to save a few nanoseconds? Especially at the
cost of clarity and maintainability?
and all increments to a, b will lead to pointer increment
in your code and the output is undefined.

Not "undefined", just "unknown". The output is dependent on the
pointers pointing to something valid, which is always the case
when using C-style "passing by reference", which is why I detest
that and rarely use it. I generally use C++ references instead.
Incrementing a pointer that points to something in another
function seems like shooting in the dark to me. I wouldn't do
that unless I had no other choice.
--
Cheers,
Robbie Hatley
Tustin, CA, USA
lonewolfintj at pacbell dot net
(put "[usenet]" in subject to bypass spam filter)
http://home.pacbell.net/earnur/
here pointer b is incremented not the value of b. even if
you wanted to increment the value, you cannot do that because
it is a constant,

Well, not exactly. The declaration "const short *b" doesn't
mean that either b or the thing it points to is const; it
just means that (*b) cannot be changed through pointer b.
That's what I meant.
in the program, so I doubt it's a "bug". Probably a heavily
unrolled version of something that would be better expressed
at a more abstract level, unless it's extremely time-critical
(avionics? guided-missile control? spacecraft navigation?).
LOL.
Not "undefined", just "unknown".
OK. a better word.
Incrementing a pointer that points to something in another
function seems like shooting in the dark to me. I wouldn't do
that unless I had no other choice.
yes. as Vikram said..
Maybe he intends to increment pointer. I get a feeling b is an array
that he has declared as a pointer. But, as you said, without more
details, it's hard to figure out.
-- Murali Krishna

Jul 7 '06 #9

P: n/a
On 6 Jul 2006 23:27:05 -0700, "Murali Krishna" <pm*********@gmail.com>
wrote:
>Zara wrote:
>On 6 Jul 2006 21:55:01 -0700, ku********@gmail.com wrote:
>Can anybody help me...
<..>
>
pre increments and post increment will result in different o/p no
matter which one is faster.
I don't understand what you mean.
>
>c = *++b;

will again lead to pointer increment not the value increment.
Well, I proposed an "optimization" for the code presented. I never
tried to understand if it did what it should, I only proposed an
alternative that works almost exactly the same, and gives the same
result.

Zara
Jul 7 '06 #10

P: n/a

Zara wrote:
pre increments and post increment will result in different o/p no
matter which one is faster.
I don't understand what you mean.
Her first code had..

c = *b++;

will assign b's value to c and pointer b increments to next address.

you wrote..
c = *++b;
First, pointer b is incremented and value contained in next address
space will be assigned to c.

so post and pre increment will have different results. I think you
already know this.
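
A small sketch of the difference, and of why the two styles can still visit the same elements when the first read is adjusted to match. The names sum_post and sum_pre are invented for this illustration, and both assume b points at n valid shorts:

```cpp
#include <cstddef>

// Post-increment style, as in the original post: read, then advance.
long sum_post(const short* b, std::size_t n)
{
    if (n == 0) return 0;
    long total = 0;
    short c = *b++;                       // c = b[0]; b now points at b[1]
    for (std::size_t i = 0; i + 1 < n; ++i) {
        total += c;
        c = *b++;                         // read current element, then advance
    }
    return total + c;
}

// Pre-increment style, as in the proposed rewrite: advance, then read.
// The first read must be a plain *b, otherwise b[0] would be skipped.
long sum_pre(const short* b, std::size_t n)
{
    if (n == 0) return 0;
    long total = 0;
    short c = *b;                         // c = b[0]; b still points at b[0]
    for (std::size_t i = 0; i + 1 < n; ++i) {
        total += c;
        c = *++b;                         // advance first, then read
    }
    return total + c;
}
```

Both functions read b[0] through b[n-1] exactly once. The only difference is that the pointer in sum_pre trails one element behind the pointer in sum_post.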
Well, I proposed an "optimization" for the code presented. I never
tried to understand if it did what it should, I only proposed an
alternative that works almost exactly the same, and gives the same
result.
I appreciate your intention to provide optimized code. If one posts a
query that is not brief and clear, we cannot send a perfect answer to
that and it will lead to long discussion like this.

any how, in the original code,in the third statement,

the original code is as follows..
short c = *b++;
you changed that to..
short c = *b;
If the function is meant for receiving array of shorts, this code will
not do the right justification.

-- Murali Krishna

Jul 7 '06 #11

P: n/a
On 7 Jul 2006 02:48:41 -0700, "Murali Krishna" <pm*********@gmail.com>
wrote:
<...>
>
I appreciate your intention to provide optimized code. If one posts a
query that is not brief and clear, we cannot send a perfect answer to
that and it will lead to long discussion like this.

any how, in the original code,in the third statement,

the original code is as follows..
>short c = *b++;

you changed that to..
>short c = *b;

If the function is meant for receiving array of shorts, this code will
not do the right justification.

-- Murali Krishna
Yes, there was some error. The proposition is:

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
int i;
long dotp = *sum;
short c = *b;
short d;
for (i = 0; i < 74; ++i,++a)
{
sqr += (long)c * c;
d = *a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
c = *++b;
}
sqr += (long)c * c; /* Loop epilogue */
d = *a;
d = *++a;
dotp += (long)c * d;
c = *++b;
sqr += (long)c * c;
d = *++a;
dotp += (long)c * d;
*sum = dotp;
return sqr;
}
Zara
Jul 7 '06 #12

P: n/a
In article <11**********************@s16g2000cws.googlegroups .com>,
ku********@gmail.com says...
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.
You're at the point that real optimization is going to depend on the
target. That said, there are some possibilities that may be worth
trying -- though it's open to question whether they'll really help or
not.

[ ... ]
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
}
First of all, I'd point out that many compilers can automatically
unroll loops -- and do it with some knowledge of the cache size of
the target processor and such, so they have a better chance of (for
example) stopping unrolling before it becomes harmful.

To get any benefit from unrolling loops yourself, you usually need to
(at least) ensure against any dependencies between the unrolled
iterations to help execute more in parallel. In this case, you've
done more or less the reverse. With a really simple loop, most
compilers could probably figure out what's independent between
iterations, and execute more in parallel. By accessing values via the
same pointers, you may be forcing serialization.

I'd try a loop body more like this:

for (int i=0; i<74; i++, a+=2, b+=2) {
sqr += b[0] * b[0] + b[1] * b[1];
dotp += b[0] * a[0] + b[1] * a[1];
}

This makes it much more obvious that the multiplications are
independent of each other, so they can all be carried out in
parallel. Your compiler/CPU may have been able to figure that out
anyway, but this may improve their chances a bit.

With most modern compilers, array-style notation carries no penalty,
and even if it did, the win from executing two multiplications in
parallel will be much larger than the loss from array-style notation
could ever be.
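
For reference, here is one way the loop body above might look as a complete function. This is only a sketch: the name mac_indexed and the pairs parameter are assumptions added for illustration (the original hard-codes the count), and the (long) casts from the original post are kept so the products are formed in long, not int:

```cpp
// Hypothetical complete version of the indexed loop; not the original
// poster's code. Assumes a and b each point at 2 * pairs valid shorts.
long mac_indexed(const short* a, const short* b, long sqr, long* sum, int pairs)
{
    long dotp = *sum;
    for (int i = 0; i < pairs; i++, a += 2, b += 2) {
        // The four products do not depend on each other.
        sqr  += (long)b[0] * b[0] + (long)b[1] * b[1];
        dotp += (long)b[0] * a[0] + (long)b[1] * a[1];
    }
    *sum = dotp;
    return sqr;
}
```

For the 150-element case discussed in this thread, pairs would be 75. Whether this is actually faster than the hand-unrolled pointer version is something only a measurement on the target can answer.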

To make a real difference, however, you'll probably have to resort to
platform-specific optimizations. At least in my recent experience,
cache management is probably the single biggest factor for code like
this. With a modern processor, it's often convenient to think of the
CPU as infinitely fast, so your primary job is to optimize the
availability of operands to it.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 7 '06 #13

P: n/a
ku********@gmail.com wrote:
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.
Are you sure that your compiler doesn't already do all this,
and maybe much more? Have you timed the before/after code?

What was the original code? There are chances that someone
could select another algorithm for a major speedup, or see
something basic that you missed, but hiding the original
code makes that much harder.

Many attempts at "optimizing" end up being just "obfuscation"
and end up making the code slower, as well as introducing
bugs. Seeing others' comments, have you checked that the
results returned are the same for the "optimized" and original
code? If you don't care what it returns, you could replace the body
with "return 42;" and it would run faster.

Also, I'd advise leaving the original "unoptimized" code
in the source somehow (maybe using something like
"#ifdef UN_OPTIMIZED ... #else ... #endif"). It would make
it easier for future debuggers to understand what you are
actually trying to do.
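
A rough sketch of what that could look like. The function name mac_both and the element count parameter n are assumptions made for this example, and the plain loop is only a guess at what the unoptimized original did, since that code was never posted:

```cpp
// Hypothetical layout; assumes a and b each point at n valid shorts.
long mac_both(const short* a, const short* b, long sqr, long* sum, int n)
{
    long dotp = *sum;
#ifdef UN_OPTIMIZED
    /* Straightforward version, kept as documentation and as a reference. */
    for (int i = 0; i < n; i++) {
        sqr  += (long)b[i] * b[i];
        dotp += (long)b[i] * a[i];
    }
#else
    /* Unrolled by two, with an epilogue for an odd element count. */
    int i = 0;
    for (; i + 1 < n; i += 2) {
        sqr  += (long)b[i] * b[i] + (long)b[i + 1] * b[i + 1];
        dotp += (long)b[i] * a[i] + (long)b[i + 1] * a[i + 1];
    }
    if (i < n) {
        sqr  += (long)b[i] * b[i];
        dotp += (long)b[i] * a[i];
    }
#endif
    *sum = dotp;
    return sqr;
}
```

Building once with UN_OPTIMIZED defined and once without, then running the same inputs through both builds, gives a cheap check that the "optimized" path still returns the same results.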
extern long mac(const short *a, const short *b, long sqr, long * sum)
{
....
}
Can anybody suggest me any further optimization possible in the above
code?
What are the "optimization" requirements?
Speed, size, accuracy, maximum buzz-words?

Jul 7 '06 #14

P: n/a
ku********@gmail.com schrieb:
Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
Functions have external linkage by default, so you don't have to specify extern.

To return something in parameter 'sum', you could use a reference as it
is more convenient in C++.
int i;
Put that in the for-loop header:

for (int i = 0; ...)
long dotp = *sum;
short c = *b++;
short d;
for (i = 0; i < 74; i++)
{
Are the arrays a and b really of constant length (150 elements)?

Remove the magic number. Use a constant defined somewhere instead.
Or better, use std::vector instead and get the size by a.size(), or use
the iterator.
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
c = *b++;
}
sqr += (long)c * c; /* Loop epilogue */
d = *a++;
d = *a++;
This is an error, isn't it?
dotp += (long)c * d;
c = *b++;
sqr += (long)c * c;
d = *a++;
dotp += (long)c * d;
*sum = dotp;
return sqr;
}
It seems that the 4th parameter (sum) becomes the dot product (a1*b1 +
a2*b2 + ...) and the return value becomes (b1)^2 + (b2)^2 + (b3)^2...

I would expect the function to return the squares of the _first_
parameter (i.e. (a1)^2 + (a2)^2 + (a3)^2...) and the dot product of the
first and second.
Can anybody suggest me any further optimization possible in the above
code?
Yes. Comment your code. Document what this function does, what
parameters it expects and what it returns/changes. Let the compiler do
the optimization, unless you really need the speedup and have profiled your code.
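
Pulling the suggestions in this post together, a plain version might look roughly like the following. It is a sketch under assumptions: the name mac_vec is invented, the inputs are taken as std::vector<short>, sum becomes a reference, and the function is assumed to be meant to compute the dot product of a and b plus the sum of squares of b, which is what the posted code appears to do apart from the duplicated line:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rewrite, not the original poster's code.
// Adds the dot product of a and b to sum, and returns sqr plus the
// sum of squares of b. Only the common prefix of a and b is used.
long mac_vec(const std::vector<short>& a, const std::vector<short>& b,
             long sqr, long& sum)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        sqr += static_cast<long>(b[i]) * b[i];
        sum += static_cast<long>(b[i]) * a[i];
    }
    return sqr;
}
```

There is no magic number here because the length comes from the containers, and the loop is left for the compiler to unroll if it decides that is worthwhile.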

--
Thomas
Jul 7 '06 #15
