Help Me.

Can anybody help me...
I have following code which I have optimized using different
optimization techniques
like loop unrolling, use of pointers.

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
    int i;
    long dotp = *sum;
    short c = *b++;
    short d;
    for (i = 0; i < 74; i++)
    {
        sqr += (long)c * c;
        d = *a++;
        dotp += (long)c * d;
        c = *b++;
        sqr += (long)c * c;
        d = *a++;
        dotp += (long)c * d;
        c = *b++;
    }
    sqr += (long)c * c; /* Loop epilogue */
    d = *a++;
    d = *a++;
    dotp += (long)c * d;
    c = *b++;
    sqr += (long)c * c;
    d = *a++;
    dotp += (long)c * d;
    *sum = dotp;
    return sqr;
}
Can anybody suggest me any further optimization possible in the above
code?
Please help me.

Sumkari

Jul 7 '06 #1
ku********@gmail.com wrote:
> Can anybody help me...
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.
> <snip>
>
> Can anybody suggest me any further optimization possible in the above
> code?

What does your profiler tell you? I assume you profiled the code before
you optimised it.

--
Ian Collins.
Jul 7 '06 #2
Ian Collins wrote:
> ku********@gmail.com wrote:
>> Can anybody help me...
>> I have following code which I have optimized using different
>> optimization techniques
>> like loop unrolling, use of pointers.
>> <snip>
>> Can anybody suggest me any further optimization possible in the above
>> code?
>
> What does your profiler tell you? I assume you profiled the code before
> you optimised it.

Exactly. Remember Hoare's Law/Knuth's Law (attribution varies):
"Premature optimization is the root of all evil".
Jul 7 '06 #3
Hi Sukumari,
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.

Use of pointers doesn't mean optimizing the code.

You did not mention what your code is meant for. Sorry to say that it
has some bugs.
> short c = *b++;

Here pointer b is incremented, not the value of b. Even if you wanted to
increment the value, you cannot do that because it is a constant.

All increments to a and b will lead to pointer increments in your code,
and the output is undefined.

Briefly specify what the mac function does.

-- Murali Krishna

Jul 7 '06 #4
On 6 Jul 2006 21:55:01 -0700, ku********@gmail.com wrote:
> Can anybody help me...
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.
>
> extern long mac(const short *a, const short *b, long sqr, long * sum)
> {
> <...>
> }
> Can anybody suggest me any further optimization possible in the above
> code?

Normally, optimizations such as loop unrolling are done by the
compiler itself. And no optimization should be attempted before profiling
tells you it is necessary.

But you may take a look at the following; it *may* be more optimized:

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
    int i;
    long dotp = *sum;
    short c = *b;
    short d;
    for (i = 0; i < 74; i++)
    {
        sqr += (long)c * c;
        d = *a;
        dotp += (long)c * d;
        c = *++b;
        sqr += (long)c * c;
        d = *++a;
        dotp += (long)c * d;
        c = *++b;
    }
    sqr += (long)c * c; /* Loop epilogue */
    d = *++a;
    d = *++a;
    dotp += (long)c * d;
    c = *++b;
    sqr += (long)c * c;
    d = *++a;
    dotp += (long)c * d;
    *sum = dotp;
    return sqr;
}

This way, you increment pointers a and b one time less, and using
preincrement *might* be faster than postincrement because it is
unnecessary to keep old value after changing it.

*BUT* compiler optimizations might do better than this, and only a
profiler can tell us which is the best option.
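
Short of a full profiler, a crude timing loop can at least put two variants side by side. The following is only a sketch, assuming the standard C `clock()` function; `mac_fn`, `mac_plain`, `time_calls`, and the 150-element length are illustrative names and assumptions, not part of the original post.

```c
#include <time.h>

/* Signature shared by every variant of the routine in this thread. */
typedef long (*mac_fn)(const short *a, const short *b, long sqr, long *sum);

enum { MAC_LEN = 150 };  /* assumption: number of elements in a and b */

/* Plain, un-unrolled version, used here only as something to time. */
long mac_plain(const short *a, const short *b, long sqr, long *sum)
{
    long dotp = *sum;
    int i;
    for (i = 0; i < MAC_LEN; i++) {
        sqr  += (long)b[i] * b[i];
        dotp += (long)a[i] * b[i];
    }
    *sum = dotp;
    return sqr;
}

/* Hypothetical helper: CPU seconds spent in `reps` calls of `fn`. */
double time_calls(mac_fn fn, const short *a, const short *b, long reps)
{
    volatile unsigned long sink = 0;  /* keeps the calls from being dropped */
    clock_t start = clock();
    long r;
    for (r = 0; r < reps; r++) {
        long sum = 0;
        long sqr = fn(a, b, 0, &sum);
        sink += (unsigned long)sqr + (unsigned long)sum;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_calls` with the same arrays and repetition count for each variant gives numbers that can be compared, although they will vary with compiler flags and hardware, and `clock()` is coarse on many systems.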

Best regards,

Zara
Jul 7 '06 #5
Zara wrote:
> <snip>
> This way, you increment pointers a and b one time less, and using
> preincrement *might* be faster than postincrement because it is
> unnecessary to keep old value after changing it.
>
> *BUT* compiler optimizations might do better than this, and only a
> profiler can tell us which is the best option.

Pre-increment and post-increment will produce different output, no
matter which one is faster.

    c = *++b;

will again lead to a pointer increment, not a value increment.

-- Murali Krishna

Jul 7 '06 #6
> pre increments and post increment will result in different o/p no
> matter which one is faster.
> c = *++b;
>
> will again lead to pointer increment not the value increment.

Maybe he intends to increment the pointer. I get the feeling b is an array
that he has declared as a pointer. But, as you said, without more
details, it's hard to figure out.

Jul 7 '06 #7
"Murali Krishna" <pm*********@gmail.com> wrote:
> Hi Sukumari,
>> I have following code which I have optimized using
>> different optimization techniques like loop unrolling,
>> use of pointers.
>
> use of pointers doesn't mean optimizing the code.
>
> you did not mention what your code is meant for. Sorry to
> say that it has some bugs.
>> short c = *b++;
>
> here pointer b is incremented not the value of b. even if
> you wanted to increment the value, you cannot do that because
> it is a constant,

Well, not exactly. The declaration "const short *b" doesn't
mean that either b or the thing it points to is const; it
just means that (*b) cannot be changed through pointer b.

So "short c = *b++" is technically valid C++. It stores (*b)
in c, then increments b. Now, as to why you'd want to do
that, I don't know. But the OP does that same thing repeatedly
in the program, so I doubt it's a "bug". Probably a heavily
unrolled version of something that would be better expressed
at a more abstract level, unless it's extremely time-critical
(avionics? guided-missile control? spacecraft navigation?).
But if it's a regular app running on someone's PC, such
aggressive overoptimization seems silly to me. Why go to
such lengths to save a few nanoseconds? Especially at the
cost of clarity and maintainability?
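
The distinction between a pointer to const data and a const pointer can be sketched in a few lines. This is only an illustration; the function names `sum_first_two` and `zero_first` are made up for the example and do not come from the thread.

```c
/* Pointer to const short: the pointer may move, but the data it points
   to may not be written through it. */
long sum_first_two(const short *b)
{
    long total = *b++;   /* read b[0], then advance the pointer */
    total += *b;         /* read b[1] */
    /* *b = 0; */        /* would not compile: *b is read-only via b */
    return total;
}

/* Const pointer to short: the data may be written, but the pointer
   itself may not be changed. */
void zero_first(short *const p)
{
    *p = 0;              /* fine: the pointed-to short is not const here */
    /* p++; */           /* would not compile: p itself is const */
}
```

Note that `sum_first_two` can be called on an ordinary, writable array: the `const` restricts what the function may do through `b`, not what the caller may do with its own data.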
> and all increments to a, b will lead to pointer increment
> in your code and the output is undefined.

Not "undefined", just "unknown". The output is dependent on the
pointers pointing to something valid, which is always the case
when using C-style "passing by reference", which is why I detest
that and rarely use it. I generally use C++ references instead.
Incrementing a pointer that points to something in another
function seems like shooting in the dark to me. I wouldn't do
that unless I had no other choice.
--
Cheers,
Robbie Hatley
Tustin, CA, USA
lonewolfintj at pacbell dot net
(put "[usenet]" in subject to bypass spam filter)
http://home.pacbell.net/earnur/
Jul 7 '06 #8

Robbie Hatley wrote:
>> here pointer b is incremented not the value of b. even if
>> you wanted to increment the value, you cannot do that because
>> it is a constant,
>
> Well, not exactly. The declaration "const short *b" doesn't
> mean that either b or the thing it points to is const; it
> just means that (*b) cannot be changed through pointer b.

That's what I meant.

> in the program, so I doubt it's a "bug". Probably a heavily
> unrolled version of something that would be better expressed
> at a more abstract level, unless it's extremely time-critical
> (avionics? guided-missile control? spacecraft navigation?).

LOL.

> Not "undefined", just "unknown".

OK, a better word.

> Incrementing a pointer that points to something in another
> function seems like shooting in the dark to me. I wouldn't do
> that unless I had no other choice.

Yes, as Vikram said:

> Maybe he intends to increment pointer. I get a feeling b is an array
> that he has declared as a pointer. But, as you said, without more
> details, its hard to figure out.

-- Murali Krishna

Jul 7 '06 #9
On 6 Jul 2006 23:27:05 -0700, "Murali Krishna" <pm*********@gmail.com>
wrote:
> Zara wrote:
>> On 6 Jul 2006 21:55:01 -0700, ku********@gmail.com wrote:
>>> Can anybody help me...
>> <..>
>
> pre increments and post increment will result in different o/p no
> matter which one is faster.

I don't understand what you mean.

> c = *++b;
>
> will again lead to pointer increment not the value increment.

Well, I proposed an "optimization" for the code presented. I never
tried to understand whether it did what it should; I only proposed an
alternative that works almost exactly the same, and gives the same
result.

Zara
Jul 7 '06 #10

Zara wrote:
>> pre increments and post increment will result in different o/p no
>> matter which one is faster.
>
> I don't understand what you mean.

Her first code had:

    c = *b++;

This assigns the value that b points to to c, and then pointer b is
incremented to the next address.

You wrote:

    c = *++b;

First, pointer b is incremented, and then the value stored at that next
address is assigned to c.

So post-increment and pre-increment give different results. I think you
already know this.
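
Both points can be checked with a small sketch. A single `*p++` and a single `*++p` do read different elements, yet a walk that starts with a plain `*b` and then uses `*++b` visits the same elements as a walk that uses `*b++` throughout. The function names below are hypothetical, and `n` is assumed to be at least 1.

```c
/* Value read by `*p++` when p starts at v: element v[0]. */
short read_post(const short *v)
{
    const short *p = v;
    short c = *p++;          /* read first, then advance */
    (void)p;                 /* the advanced pointer is not needed here */
    return c;
}

/* Value read by `*++p` when p starts at v: element v[1]. */
short read_pre(const short *v)
{
    const short *p = v;
    return *++p;             /* advance first, then read */
}

/* Sum n elements with post-increment, as in the original post. */
long walk_post(const short *b, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += *b++;
    return s;
}

/* Sum the same n elements with pre-increment: one plain read first,
   then advance before every later read. */
long walk_pre(const short *b, int n)
{
    long s = *b;
    int i;
    for (i = 1; i < n; i++)
        s += *++b;
    return s;
}
```

Whether the rewritten walk is correct therefore depends on every first read being changed from `*b++` to `*b`, and on each pointer being advanced exactly once per element.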
> Well, I proposed an "optimization" for the code presented. I never
> tried to understand if it did what it should, I only proposed an
> alternative that works almost exactly the same, and gives the same
> result.

I appreciate your intention to provide optimized code. If one posts a
query that is not brief and clear, we cannot send a perfect answer to
it, and it leads to a long discussion like this.

Anyhow, in the original code, the third statement is:

    short c = *b++;

You changed that to:

    short c = *b;

If the function is meant to receive an array of shorts, this code will
not do the right thing.

-- Murali Krishna

Jul 7 '06 #11
On 7 Jul 2006 02:48:41 -0700, "Murali Krishna" <pm*********@gmail.com>
wrote:
> <...>
>
> I appreciate your intention to provide optimized code. If one posts a
> query that is not brief and clear, we cannot send perfect answer to
> that and it will lead to long discussion like this.
>
> any how, in the original code,in the third statement,
>
> the original code is as follows..
>> short c = *b++;
>
> you changed that to..
>> short c = *b;
>
> If the function is meant for receiving array of shorts, this code will
> not do the right justification.
>
> -- Murali Krishna

Yes, there was an error. The proposal is:

extern long mac(const short *a, const short *b, long sqr, long * sum)
{
    int i;
    long dotp = *sum;
    short c = *b;
    short d;
    for (i = 0; i < 74; ++i,++a)
    {
        sqr += (long)c * c;
        d = *a;
        dotp += (long)c * d;
        c = *++b;
        sqr += (long)c * c;
        d = *++a;
        dotp += (long)c * d;
        c = *++b;
    }
    sqr += (long)c * c; /* Loop epilogue */
    d = *a;
    d = *++a;
    dotp += (long)c * d;
    c = *++b;
    sqr += (long)c * c;
    d = *++a;
    dotp += (long)c * d;
    *sum = dotp;
    return sqr;
}

Zara
Jul 7 '06 #12
In article <11**********************@s16g2000cws.googlegroups.com>,
ku********@gmail.com says...
> Can anybody help me...
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.

You're at the point that real optimization is going to depend on the
target. That said, there are some possibilities that may be worth
trying -- though it's open to question whether they'll really help or
not.

[ ... ]
>         sqr += (long)c * c;
>         d = *a++;
>         dotp += (long)c * d;
>         c = *b++;
>         sqr += (long)c * c;
>         d = *a++;
>         dotp += (long)c * d;
>         c = *b++;
>     }

First of all, I'd point out that many compilers can automatically
unroll loops -- and do it with some knowledge of the cache size of
the target processor and such, so they have a better chance of (for
example) stopping unrolling before it becomes harmful.

To get any benefit from unrolling loops yourself, you usually need to
(at least) ensure against any dependencies between the unrolled
iterations to help execute more in parallel. In this case, you've
done more or less the reverse. With a really simple loop, most
compilers could probably figure out what's independent between
iterations, and execute more in parallel. By accessing values via the
same pointers, you may be forcing serialization.

I'd try a loop body more like this:

for (int i=0; i<74; i++, a+=2, b+=2) {
    sqr += (long)b[0] * b[0] + (long)b[1] * b[1];
    dotp += (long)b[0] * a[0] + (long)b[1] * a[1];
}

This makes it much more obvious that the multiplications are
independent of each other, so they can all be carried out in
parallel. Your compiler/CPU may have been able to figure that out
anyway, but this may improve their chances a bit.

With most modern compilers, array-style notation carries no penalty,
and even if it did, the win from executing two multiplications in
parallel will be much larger than the loss from array-style notation
could ever be.
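
Written out as a complete function, the array-style loop might look like the sketch below. It is not the original poster's code: the name `mac_pairs` is hypothetical, the arrays are assumed to hold 150 elements (so 75 pairs and no epilogue), and the `(long)` casts from the original post are kept so that the products are not computed in `int`.

```c
#define MAC_PAIRS 75   /* assumption: 150 elements, two per iteration */

long mac_pairs(const short *a, const short *b, long sqr, long *sum)
{
    long dotp = *sum;
    int i;
    for (i = 0; i < MAC_PAIRS; i++, a += 2, b += 2) {
        /* The four products below do not depend on one another. */
        sqr  += (long)b[0] * b[0] + (long)b[1] * b[1];
        dotp += (long)b[0] * a[0] + (long)b[1] * a[1];
    }
    *sum = dotp;
    return sqr;
}
```

Because each iteration handles a full pair, the separate loop epilogue of the original disappears. Whether this is actually faster than the pointer-walking version still has to be measured on the target.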

To make a real difference, however, you'll probably have to resort to
platform-specific optimizations. At least in my recent experience,
cache management is probably the single biggest factor for code like
this. With a modern processor, it's often convenient to think of the
CPU as infinitely fast, so your primary job is to optimize the
availability of operands to it.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 7 '06 #13
ku********@gmail.com wrote:
> Can anybody help me...
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.

Are you sure that your compiler doesn't already do all this,
and maybe much more? Have you timed the before/after code?

What was the original code? There are chances that someone
could select another algorithm for a major speedup, or see
something basic that you missed, but hiding the original
code makes that much harder.

Many attempts at "optimizing" end up being just "obfuscation"
and end up making the code slower, as well as introducing
bugs. Seeing others' comments, have you checked that the
results returned are the same for the "optimized" and original
code? If you don't care what it returns, you could replace the body
with "return 42;" and it would run even faster.

Also, I'd advise leaving the original "unoptimized" code
in the source somehow (maybe using something like
"#ifdef UN_OPTIMIZED ... #else ... #endif"). It would make
it easier for future debuggers to understand what you are
actually trying to do.
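
A sketch of that idea, slightly adapted: rather than compiling only one variant, both are kept under separate names so they can also be checked against each other, and the `UN_OPTIMIZED` macro merely selects which one `MAC_IMPL` refers to. The names `mac_reference`, `mac_unrolled`, `MAC_IMPL`, and `variants_agree`, as well as the 150-element length, are assumptions made for the example.

```c
enum { MAC_LEN = 150 };  /* assumption: number of elements in a and b */

/* Readable reference: one element per iteration. */
long mac_reference(const short *a, const short *b, long sqr, long *sum)
{
    long dotp = *sum;
    int i;
    for (i = 0; i < MAC_LEN; i++) {
        sqr  += (long)b[i] * b[i];
        dotp += (long)a[i] * b[i];
    }
    *sum = dotp;
    return sqr;
}

/* Hand-unrolled variant: two elements per iteration, pointer walking. */
long mac_unrolled(const short *a, const short *b, long sqr, long *sum)
{
    long dotp = *sum;
    int i;
    for (i = 0; i < MAC_LEN / 2; i++) {
        short c = *b++;
        short d = *a++;
        sqr  += (long)c * c;
        dotp += (long)c * d;
        c = *b++;
        d = *a++;
        sqr  += (long)c * c;
        dotp += (long)c * d;
    }
    *sum = dotp;
    return sqr;
}

/* Build with -DUN_OPTIMIZED to fall back to the readable version. */
#ifdef UN_OPTIMIZED
#define MAC_IMPL mac_reference
#else
#define MAC_IMPL mac_unrolled
#endif

/* Returns 1 when both variants produce the same return value and sum. */
int variants_agree(const short *a, const short *b)
{
    long sum_ref = 7, sum_opt = 7;   /* same arbitrary starting sum */
    long sqr_ref = mac_reference(a, b, 3, &sum_ref);
    long sqr_opt = mac_unrolled(a, b, 3, &sum_opt);
    return sqr_ref == sqr_opt && sum_ref == sum_opt;
}
```

Running `variants_agree` over a few input arrays, including negative and extreme values, is a cheap check that the hand-optimized version has not changed the result.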
> extern long mac(const short *a, const short *b, long sqr, long * sum)
> {
> ....
> }
> Can anybody suggest me any further optimization possible in the above
> code?

What are the "optimization" requirements?
Speed, size, accuracy, maximum buzz-words?

Jul 7 '06 #14
ku********@gmail.com wrote:
> Can anybody help me...
> I have following code which I have optimized using different
> optimization techniques
> like loop unrolling, use of pointers.
>
> extern long mac(const short *a, const short *b, long sqr, long * sum)
> {

Functions are extern by default. You don't have to specify it.

To return something in parameter 'sum', you could use a reference, as it
is more convenient in C++.

>     int i;

Put that in the for-loop header:

    for (int i = 0; ...)

>     long dotp = *sum;
>     short c = *b++;
>     short d;
>     for (i = 0; i < 74; i++)
>     {

Are the arrays a and b really of constant length (150 elements)?

Remove the magic number. Use a constant defined somewhere instead.
Or better, use std::vector instead and get the size by a.size(), or use
iterators.
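
In plain C, where std::vector is not available, one way to drop the magic number is to pass the length in as a parameter. The sketch below is not the original poster's function: `mac_n` and its `n` parameter are hypothetical, and the epilogue handles only the single leftover element of an odd-length input.

```c
#include <stddef.h>

/* Same computation, with the element count supplied by the caller. */
long mac_n(const short *a, const short *b, size_t n, long sqr, long *sum)
{
    long dotp = *sum;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {        /* two elements per pass */
        sqr  += (long)b[i] * b[i] + (long)b[i + 1] * b[i + 1];
        dotp += (long)a[i] * b[i] + (long)a[i + 1] * b[i + 1];
    }
    if (i < n) {                            /* epilogue: odd leftover */
        sqr  += (long)b[i] * b[i];
        dotp += (long)a[i] * b[i];
    }
    *sum = dotp;
    return sqr;
}
```

A caller with fixed-size arrays can still name the size once, for example with `sizeof a / sizeof a[0]`, instead of repeating a literal such as 74 or 150.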
>         sqr += (long)c * c;
>         d = *a++;
>         dotp += (long)c * d;
>         c = *b++;
>         sqr += (long)c * c;
>         d = *a++;
>         dotp += (long)c * d;
>         c = *b++;
>     }
>     sqr += (long)c * c; /* Loop epilogue */
>     d = *a++;
>     d = *a++;

This is an error, isn't it?

>     dotp += (long)c * d;
>     c = *b++;
>     sqr += (long)c * c;
>     d = *a++;
>     dotp += (long)c * d;
>     *sum = dotp;
>     return sqr;
> }

It seems that the 4th parameter (sum) becomes the dot product (a1*b1 +
a2*b2 + ...) and the return value becomes (b1)^2 + (b2)^2 + (b3)^2...

I would expect the function to return the sum of squares of the _first_
parameter (i.e. (a1)^2 + (a2)^2 + (a3)^2...) and the dot product of the
first and second.

> Can anybody suggest me any further optimization possible in the above
> code?

Yes. Comment your code. Document what this function does, what
parameters it expects, and what it returns/changes. Let the compiler do
the optimization, unless you really need the speedup and have profiled your code.

--
Thomas
Jul 7 '06 #15
