
Hi, how can I optimize the following code?

I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?

Dec 11 '06 #1

kenneth wrote:
I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
You can quite often optimize an O(N^2) operation by coming up with an
algorithm that is not O(N^2). The compiler will optimize your loops
quite a bit, but that doesn't resolve the problem of having an
algorithm that is quadratic in N. I don't know what you are trying to
do; you might describe the problem in comp.programming and see if they
have some ideas.

In other words, look to the algorithms you are using not ways to
micro-optimize.

Dec 11 '06 #2
kenneth wrote:
int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
I can't say how to optimize for multithreading, but
don't use the pow function to compute squares.
pow takes the exponent as a floating-point number, and it may be
implemented in terms of exponential and logarithm functions.

Use something like
----
inline double SquareDouble(double r)
{
    return r * r;
}
----
or
----
#define square(x) ((x)*(x))
----

If you use GNU C, consider the const attribute (on a function
definition the attribute has to come before the declarator):
----
__attribute__((const)) inline double SquareDouble(double r)
{
    return r * r;
}
----
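
Applied to the original loop, a minimal sketch (the extern
declarations are my guess at how r and pot are defined in the
OP's code, not something that was posted):
----
#include <math.h>

extern double r[3][500];
extern double pot;

static inline double SquareDouble(double x)
{
    return x * x;
}

void computePot(void)
{
    int i, j;
    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            double distx = SquareDouble( r[0][j] - r[0][i] );
            double disty = SquareDouble( r[1][j] - r[1][i] );
            double distz = SquareDouble( r[2][j] - r[2][i] );
            pot += 1.0 / sqrt( distx + disty + distz );
        }
    }
}
----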

--
Tommi Höynälänmaa
sähköposti / e-mail: to***************@iki.fi
kotisivu / homepage: http://www.iki.fi/tohoyn/
Dec 11 '06 #3
kenneth wrote:
I was trying to use multiple thread to optimize my following code, but
met some problems, anyone can help me?
I'll try with a single thread. Multiple threads would require
platform-specific solutions, so they are outside the scope of this NG.
k[3][500] are initialized.
What's k?
int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
Sure it can. First of all, forget that 'pow' thing.
To get the square, just multiply the value by itself.
This will definitely make it much faster.

Another thing is that you don't need to index into
the second vector inside the inner loop; you can just
as well do it outside, like in:

for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
        distx = (r[0][j]-r0i)*(r[0][j]-r0i);
        // ...
    }
}

That should give your code a boost.
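
Spelled out completely, a sketch of the whole function with
both changes applied (the extern declarations are my
assumption about your globals, not something you posted):

#include <math.h>

extern double r[3][500];
extern double pot;

void computePot(void)
{
    int i, j;
    for( i=0; i<500; i++ ) {
        register double r0i = r[0][i];  /* loaded once per i */
        register double r1i = r[1][i];
        register double r2i = r[2][i];
        for( j=0; j<i-1; j++ ) {
            double dx = r[0][j] - r0i;
            double dy = r[1][j] - r1i;
            double dz = r[2][j] - r2i;
            pot += 1.0 / sqrt( dx*dx + dy*dy + dz*dz );
        }
    }
}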

HTH,
- J.
Dec 11 '06 #4
"kenneth" <ja*************@gmail.comwrote in message
news:11**********************@79g2000cws.googlegroups.com...
>I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
Hi,

in principle I would let the compiler worry about that and not try to
micro-optimize. However, you can move the assignments out into register
variables, as Jacek already showed, and get rid of the pow(). Additionally,
it's cheap to check whether some "manual" loop-unrolling via meta-templates
might get you some speed; a sketch of the idea follows.
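
To sketch the meta-template idea (Unroll and PotBody are names I
made up for illustration, not a library facility):

#include <cmath>

// Compile-time unroller: run(body, base) expands to
// body(base); body(base+1); ... body(base+N-1);
template <int N>
struct Unroll {
    template <typename Body>
    static void run(Body& body, int base) {
        Unroll<N - 1>::run(body, base);
        body(base + N - 1);
    }
};

template <>
struct Unroll<0> {
    template <typename Body>
    static void run(Body&, int) {}   // terminates the recursion
};

// Example body: accumulates the 1/dist terms for a fixed i.
struct PotBody {
    const double *rx, *ry, *rz;      // rows r[0], r[1], r[2]
    double r0i, r1i, r2i;            // hoisted r[.][i]
    double sum;
    void operator()(int j) {
        double dx = rx[j] - r0i;
        double dy = ry[j] - r1i;
        double dz = rz[j] - r2i;
        sum += 1.0 / std::sqrt(dx*dx + dy*dy + dz*dz);
    }
};

The inner loop then steps j by four and calls Unroll<4>::run(body, j)
for the bulk, finishing the leftover iterations with a plain loop.
Whether this actually beats the compiler's own unrolling is something
only a measurement will tell.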

Cheers
Chris
Dec 11 '06 #5
Jacek Dziedzic wrote:
Another thing is that you don't need to index into
the second vector inside the inner loop; you can just
as well do it outside, like in:

for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
        distx = (r[0][j]-r0i)*(r[0][j]-r0i);
        // ...
    }
}

That should give your code a boost.
Doubtful. Any compiler worth its salt would have hoisted the evaluations out
of the loop by itself. Just as any compiler worth its salt will convert
'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into 'distx = (r[0][j]-r0i); distx *=
distx;'.

But the fact is that code like what the OP showed /rarely/ benefits from
micro-optimizations; that's not to say that making such optimizations isn't
necessary at times -- only that it is a much better idea to focus on the
algorithm and try to make it simpler.

-n
Dec 11 '06 #6
kenneth wrote:
I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
If you have access to an MP system and your compiler supports OpenMP,
you might be able to get some gains.
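
For instance, a minimal sketch (the extern declarations are my
assumption about the OP's globals; compile with the compiler's
OpenMP switch, e.g. -fopenmp for gcc or -openmp for icc):

#include <math.h>

extern double r[3][500];
extern double pot;

void computePot(void)
{
    int i, j;
    double sum = 0.0;

    /* dynamic schedule because the triangular loop makes the
       iterations unbalanced; the reduction avoids races on sum */
    #pragma omp parallel for private(j) reduction(+:sum) schedule(dynamic)
    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            double dx = r[0][j] - r[0][i];
            double dy = r[1][j] - r[1][i];
            double dz = r[2][j] - r[2][i];
            sum += 1.0 / sqrt( dx*dx + dy*dy + dz*dz );
        }
    }
    pot += sum;
}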

--
Ian Collins.
Dec 12 '06 #7
Nikolaos D. Bougalis wrote:
Jacek Dziedzic wrote:
> Another thing is that you don't need to index into
the second vector inside the inner loop; you can just
as well do it outside, like in:

for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
        distx = (r[0][j]-r0i)*(r[0][j]-r0i);
        // ...
    }
}

That should give your code a boost.

Doubtful. Any compiler worth its salt would have hoisted the
evaluations out of the loop by itself. Just as any compiler worth its
salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into 'distx =
(r[0][j]-r0i); distx *= distx;'.
I disagree. First of all, I doubt any compiler would turn
pow(double,2) into double*double.

Secondly, the conversion of distx=something*something into
distx=something; distx*=distx does not necessarily speed
anything up; that depends on the computer architecture,
so I doubt "any compiler worth its salt will convert" it.

Thirdly, even though a good compiler is supposed to
move out of the loop whatever does not depend on the loop
variable, my experience shows it is worth trying
to help it. I have recently spent a few days, if not weeks,
having to micro-optimize tight loops, working with a compiler
that is rather good at optimizing, the Intel compiler for
Itanium 2 (this platform _needs_ good optimization at
the compiler level). I think you would be surprised at how
dumb the compiler sometimes was, or rather, at how typical,
simple, out-of-the-book optimizations made the code
faster even at -O3.

Therefore, since the OP asked for ways to optimize, I gave
him two hints. I'm 99% sure they won't slow the code down
and pretty sure they will speed it up. I'm too lazy to try
it out, though; anyway, it depends on the platform.
But the fact is that code like what the OP showed /rarely/ benefits
from micro-optimizations; that's not to say that making such
optimizations isn't necessary at times -- only that it is a much better
idea to focus on the algorithm and try to make it simpler.
Of course. But I assumed the OP had run out of options on this one.
For example, this looked like a calculation of a Coulomb-type potential.
Typically, for potential calculations one assumes a cutoff
radius, beyond which the interactions are assumed to vanish.
Here, such an algorithmic optimization would turn this O(N^2) into
O(N). With this potential, however, it is impossible,
because 1/r decays too slowly. That led me to the conclusion that
the OP had already run out of options for algorithmic optimization
and is left with micro-optimizations.
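
(For a short-range potential, where the cutoff does apply, the
per-pair test is just a comparison of squared distances, so it does
not even cost a sqrt. A sketch, with rcut standing for whatever
radius the model prescribes:

#include <math.h>

/* contribution of one pair; zero beyond the cutoff radius rcut */
static double pairTerm(double dx, double dy, double dz, double rcut)
{
    double d2 = dx*dx + dy*dy + dz*dz;
    if (d2 >= rcut*rcut)         /* compare squares: no sqrt needed */
        return 0.0;
    return 1.0 / sqrt(d2);
}

The real O(N) saving then comes from neighbour lists or a cell
decomposition, so that pairs beyond rcut are never even visited.)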

howgh :),
- J.
Dec 12 '06 #8
Jacek Dziedzic wrote:
Nikolaos D. Bougalis wrote:
>Jacek Dziedzic wrote:
>> Another thing is that you don't need to index into
the second vector inside the inner loop; you can just
as well do it outside, like in:

for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
        distx = (r[0][j]-r0i)*(r[0][j]-r0i);
        // ...
    }
}

That should give your code a boost.

Doubtful. Any compiler worth its salt would have hoisted the
evaluations out of the loop by itself. Just as any compiler worth its
salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into 'distx
= (r[0][j]-r0i); distx *= distx;'.

I disagree. First of all, I doubt any compiler would turn
pow(double,2) into double*double.
I said the compiler would hoist the r[0][i], r[1][i] and r[2][i] out of the
loop. I said nothing about converting the pow(d,2) into d*d, although I don't
see why a compiler could not do that, if it had intrinsic support for pow()
and understood the semantics of the pow() call.

Secondly, the conversion of distx=something*something into
distx=something; distx*=distx does not necessarily speed
anything up; that depends on the computer architecture,
so I doubt "any compiler worth its salt will convert" it.
You are misrepresenting what I said. In your code "something" was a complex
expression that required evaluation. Do you honestly believe that any decent
compiler would calculate "a-b" twice to do (a-b)*(a-b)? Common subexpression
elimination is one of the most basic optimizations a compiler can and does
perform.

Basic optimizations aside, you would really be surprised at what compilers
can do nowadays. I was quite stunned to see the latest CL.EXE, for example, do
some crazy tricks in a math-intensive function, shaving off two divisions by
using multiplications and additions instead and managing to keep both the U
and V pipes going at full speed, significantly improving throughput. I
couldn't even come close to matching it, and I'm not a newbie at the
optimization game.

Thirdly, even though a good compiler is supposed to
move out of the loop whatever does not depend on the loop
variable, my experience shows it is worth trying
to help it. I have recently spent a few days, if not weeks,
having to micro-optimize tight loops, working with a compiler
that is rather good at optimizing, the Intel compiler for
Itanium 2 (this platform _needs_ good optimization at
the compiler level). I think you would be surprised at how
dumb the compiler sometimes was, or rather, at how typical,
simple, out-of-the-book optimizations made the code
faster even at -O3.
I don't disagree that hand-optimization and a helping hand to the compiler
are necessary even today. What I am saying is that the manual hoist operation
you showed, and most such trivial optimizations, are unlikely to have any
significant effect. The compiler will take care of simple things like that if
you compile with optimizations enabled and you don't use volatile every other
line.

A very good example of an optimization that programmers can make but
compilers cannot is easily demonstrated with this little function that sums a
floating-point array (extern float *huge_array):

The naive approach is this:

float array_sum()
{
    float sum = 0;

    for(int i = 0; i < huge_array_size; i++)
        sum += huge_array[i];

    return sum;
}

The better approach takes advantage of the commutativity and associativity
of addition, a reordering the compiler cannot make on floats because of
restrictions placed by the ANSI/ISO C and C++ standards:

float array_sum()
{
    float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;

    /* assumes huge_array_size is a multiple of 4 */
    for(int i = 0; i < huge_array_size; i += 4)
    {
        sum0 += huge_array[i];
        sum1 += huge_array[i + 1];
        sum2 += huge_array[i + 2];
        sum3 += huge_array[i + 3];
    }

    return (sum0 + sum2) + (sum1 + sum3);
}
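
(The sketch assumes huge_array_size is a multiple of 4; a short scalar
loop would have to mop up the remainder. To be fair, a compiler that is
allowed to relax the floating-point rules, e.g. gcc with -ffast-math,
may perform this reassociation by itself.)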

-n
Dec 12 '06 #9
Nikolaos D. Bougalis wrote:
Jacek Dziedzic wrote:
>Nikolaos D. Bougalis wrote:
>>Jacek Dziedzic wrote:

Another thing is that you don't need to index into
the second vector inside the inner loop; you can just
as well do it outside, like in:

for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
        distx = (r[0][j]-r0i)*(r[0][j]-r0i);
        // ...
    }
}

That should give your code a boost.

Doubtful. Any compiler worth its salt would have hoisted the
evaluations out of the loop by itself. Just as any compiler worth
its salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into
'distx = (r[0][j]-r0i); distx *= distx;'.

I disagree. First of all, I doubt any compiler would turn
pow(double,2) into double*double.

I said the compiler would hoist the r[0][i], r[1][i] and r[2][i] out
of the loop. I said nothing about converting the pow(d,2) into d*d,
Yes, but you said "doubtful" to my statement "that should give
your code a boost". What I meant was: for this code to give no
boost, the compiler would have to turn pow(double,2) into
double*double. Therefore, by being doubtful about this code's
potential boost, you implied that action of the compiler.
although I don't see why a compiler could not do that, if it had
intrinsic support for pow() and understood the semantics of the pow()
call.
OK, perhaps a smart compiler could do that. But honestly
I wouldn't expect it.
> Secondly, the conversion of distx=something*something into
distx=something; distx*=distx does not necessarily speed
anything up; that depends on the computer architecture,
so I doubt "any compiler worth its salt will convert" it.

You are misrepresenting what I said. In your code "something" was a
complex expression that required evaluation. Do you honestly believe
that any decent compiler would calculate "a-b" twice to do (a-b)*(a-b)?
No, of course I don't believe that. This was a misunderstanding.
I meant that (a-b) would be stored in a register, squared there,
and sent to distx. I somehow thought you implied that a memory
location would be squared, rather than a register.
Common subexpression elimination is one of the most basic optimizations
a compiler can and does perform.
Yes, I don't claim a-b will be evaluated twice.
Basic optimizations aside, you would really be surprised at what
compilers can do nowadays. I was quite stunned to see the latest CL.EXE,
for example, do some crazy tricks in a math-intensive function, shaving
off two divisions by using multiplications and additions instead and
managing to keep both the U and V pipes going at full speed, significantly
improving throughput. I couldn't even come close to matching it, and I'm
not a newbie at the optimization game.
Yes, I also usually trust the compiler to optimize better, especially
since for the Itanium the compilers are supposed to be good at it. Maybe
I was lucky, then, that I managed to squeeze some extra ticks from the
code I was working on by simple tricks like changing

for(i) a[i]=something;
into
while(i++) *(a_ptr+i)=something;

or the other way round.
I don't disagree that hand-optimization and a helping hand to the
compiler are necessary even today. What I am saying is that the manual
hoist operation you showed, and most such trivial optimizations, are
unlikely to have any significant effect. The compiler will take care of
simple things like that if you compile with optimizations enabled and
you don't use volatile every other line.
OK, then that was just a suggestion for the OP to try and see for
himself. I think neither I can make the general statement "taking
things out of the loop will make things faster" nor you the general
statement "taking things out of the loop will not make things faster,
because the compiler has done it already", since this all depends on
the compiler.

We'll never know unless the OP tries.

I could just as well have said "make sure you have -O3", and then
somebody could say "come on, it's on by default" -- just a
suggestion to try out.
>
A very good example of an optimization that programmers can make but
compilers cannot is easily demonstrated with this little function that
sums a floating-point array (extern float *huge_array):

The naive approach is this:

float array_sum()
{
    float sum = 0;

    for(int i = 0; i < huge_array_size; i++)
        sum += huge_array[i];

    return sum;
}

The better approach takes advantage of the commutativity and associativity
of addition, a reordering the compiler cannot make on floats because of
restrictions placed by the ANSI/ISO C and C++ standards:

float array_sum()
{
    float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;

    /* assumes huge_array_size is a multiple of 4 */
    for(int i = 0; i < huge_array_size; i += 4)
    {
        sum0 += huge_array[i];
        sum1 += huge_array[i + 1];
        sum2 += huge_array[i + 2];
        sum3 += huge_array[i + 3];
    }

    return (sum0 + sum2) + (sum1 + sum3);
}
OK.

- J.
Dec 12 '06 #10

[ After re-reading this thread, I think I might have come off a bit too
aggressive. It was not my intent, and I hope there are no hard feelings. ]

A lot of times, simple tricks can help the compiler along and result in much
better code. While slightly off-topic, I'd be interested in knowing what kind
of gains you saw in Itanium builds after the micro-optimizations you made. You
can contact me in private if you don't mind sharing.

-n
Dec 12 '06 #11
kenneth wrote:
I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
I think you're too dum for the explanation.
Dec 17 '06 #12
Uenal S. Mutlu wrote:
kenneth wrote:
>I was trying to use multiple threads to optimize the following code, but
ran into some problems. Can anyone help me?

k[3][500] are initialized.

int computePot() {
    int i, j;

    for( i=0; i<500; i++ ) {
        for( j=0; j<i-1; j++ ) {
            distx = pow( (r[0][j] - r[0][i]), 2 );
            disty = pow( (r[1][j] - r[1][i]), 2 );
            distz = pow( (r[2][j] - r[2][i]), 2 );
            dist = sqrt( distx + disty + distz );
            pot += 1.0 / dist;
        }
    }
}

I doubt whether this code can really be optimized. Can it?
I think you're too dum for the explanation.
A bit rich from one who can't spell.

--
Ian Collins.
Dec 17 '06 #13

Uenal S. Mutlu wrote:
>
I think you're too dum for the explanation.
Hey! don't break wind???...
[NOTE: As long as you post this kind of stuff in this group, I shall
follow up with this advice!]

Dec 17 '06 #14
Ian Collins wrote:
A bit rich from one who can't spell.
I arege taht his cmmonet was ianpprorpaite, but dsylxeia has nthonig to
do wtih inetlilegnece.

Dec 17 '06 #15
frame wrote:
Uenal S. Mutlu wrote:
>>
I think you're too dum for the explanation.

Hey! don't break wind???...
[NOTE: As long as you post this kind of stuff in this group, I shall
follow up with this advice!]
Please don't feed the troll. I have already reported his destructive
actions to his ISP and his NSP.

--
WW aka Attila
:::
Constant change is here to stay
Dec 19 '06 #16

