Bytes | Software Development & Data Engineering Community
Explanation requested for this speed increase

I have a function that took around 13.75 seconds to complete before I
modified it; after the modification it takes 0.325 seconds.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);

Point is a simple x,y,z coordinate structure (3 doubles).
TheDistances is an array containing room for InputSize doubles.

Most of the time consumed (99%+) is in the following for loops:

for (size_t i = 0; i < OutputSize; ++i)
{
    for (size_t j = 0; j < OutputSize; ++j)
    {
        for (size_t k = 0; k < InputSize; ++k)
        {
            TheDistances[k] = sqrt(pow(TempPoint.x - (*Input)[k].x, 2) +
                                   pow(TempPoint.y - (*Input)[k].y, 2));
            if (TheDistances[k] != 0)
            {
                TheDistances[k] = pow(TheDistances[k], 2) *
                                  (log(TheDistances[k]) - 1);
            }
        }
    } // for (size_t j = 0; j < OutputSize; ++j)
} // for (size_t i = 0; i < OutputSize; ++i)

This takes roughly 13.75 seconds.
By copying the values in Input to a Point* InputCopy and then
replacing the first statement of the inner loop with

    TheDistances[k] = sqrt(pow(TempPoint.x - InputCopy[k].x, 2) +
                           pow(TempPoint.y - InputCopy[k].y, 2));

I get the time down to roughly 0.325 seconds.
What am I missing here that explains this efficiency increase?

Jul 23 '05 #1

Hi,

But where are you using the outer loop variables i and j? It seems you
are repeating the same calculation (OutputSize)^2 times.

-vs_p

Jul 23 '05 #2
Whoops, I thought I had copied the entire inside section. The i and j
loops are used to get a value for TempPoint.
And yes, this thing loops OutputSize^2 * InputSize times.
This is the modified section (altered a bit more to lower round-off
error):
for (size_t i = 0; i < OutputSize; ++i)
{
    for (size_t j = 0; j < OutputSize; ++j)
    {
        TempPoint = Output[j + i * OutputSize];
        TempPoint.z = 0;
        for (size_t k = 0; k < InputSize; ++k)
        {
            TheDistances[k] = pow(TempPoint.x - WorkCopyOfInput[k].x, 2) +
                              pow(TempPoint.y - WorkCopyOfInput[k].y, 2);
            if (TheDistances[k] != 0)
            {
                TheDistances[k] = pow(TheDistances[k], 2) *
                                  (log(sqrt(TheDistances[k])) - 1);
            }
        }
        for (size_t k = 0; k < InputSize; ++k)
        {
            TempPoint.z += TheDistances[k] * Weights[k];
        }
        Output[j + i * OutputSize].z = TempPoint.z;
    } // for (size_t j = 0; j < OutputSize; ++j)
} // for (size_t i = 0; i < OutputSize; ++i)

The original part in the innermost loop was:

    TheDistances[k] = pow(TempPoint.x - (*Input)[k].x, 2) +
                      pow(TempPoint.y - (*Input)[k].y, 2);
    if (TheDistances[k] != 0)
    {
        TheDistances[k] = pow(TheDistances[k], 2) *
                          (log(sqrt(TheDistances[k])) - 1);
    }
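
For readers following the arithmetic, here is a minimal sketch of the per-point kernel, not code from the thread. It assumes the intended quantity is r^2 * (log(r) - 1) as in the first post, where r is the planar distance. Note that the modified listing applies pow(..., 2) to the already squared distance, which would give r^4 instead, so the assumption matters. The function names are hypothetical. Writing d = r^2, log(sqrt(d)) equals 0.5 * log(d), so the kernel can be computed from the squared distance without sqrt() or pow():

```cpp
#include <cassert>
#include <cmath>

// Kernel written in terms of the distance r, as in the first post:
// r^2 * (log(r) - 1), with the value left at 0 when r is 0.
double kernel_from_distance(double r)
{
    if (r == 0.0)
        return 0.0;
    return r * r * (std::log(r) - 1.0);
}

// Same kernel written in terms of the squared distance d = r^2.
// Uses log(sqrt(d)) == 0.5 * log(d), so no sqrt() or pow() call is needed.
double kernel_from_squared_distance(double d)
{
    assert(d >= 0.0); // a squared distance cannot be negative
    if (d == 0.0)
        return 0.0;
    return d * (0.5 * std::log(d) - 1.0);
}
```

Both forms should agree up to rounding; which one is faster or more accurate on a given compiler is something to measure rather than assume.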

Jul 23 '05 #3

ve*********@hotmail.com wrote:
I have a function that before I modified it took around 13.75 seconds
to complete after the modification it took .325 seconds to complete.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);

Most of the time consumed (99%+) is in the following for loops:

...
This takes roughly 13.75 seconds.

By copying the values in Input to Point* InputCopy ...
I get the speed up to roughly 0.325 seconds.
What am I missing here that i get this efficiency increase?


A logical assumption is that the compiler spotted that
InputCopy doesn't change. This means it can cache these
values more aggressively. Input points to unknown memory,
which might even overlap Output. That means the compiler
has to be rather careful, and cannot use cached Input values
at all.
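
A minimal sketch of the two access patterns being compared, under stated assumptions: the function names are hypothetical, Point is the three-double struct described in the first post, and whether a particular compiler really reloads *Input on each access is not something this code can show by itself. It only illustrates that both variants compute the same values while giving the optimizer different information:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Variant 1: dereferences Input on every access, like the original loop.
// The optimizer has to prove that neither the stores to TheDistances nor
// the math library calls can change *Input before it may keep it cached.
void distances_via_double_pointer(Point **Input, std::size_t InputSize,
                                  const Point &p, double *TheDistances)
{
    for (std::size_t k = 0; k < InputSize; ++k) {
        TheDistances[k] = std::sqrt(std::pow(p.x - (*Input)[k].x, 2) +
                                    std::pow(p.y - (*Input)[k].y, 2));
    }
}

// Variant 2: works on a private copy, like InputCopy in the thread.
// The copy is local and const, so nothing else in the loop can modify it.
void distances_via_local_copy(Point **Input, std::size_t InputSize,
                              const Point &p, double *TheDistances)
{
    assert(Input != nullptr && *Input != nullptr);
    const std::vector<Point> InputCopy(*Input, *Input + InputSize);
    for (std::size_t k = 0; k < InputSize; ++k) {
        TheDistances[k] = std::sqrt(std::pow(p.x - InputCopy[k].x, 2) +
                                    std::pow(p.y - InputCopy[k].y, 2));
    }
}
```

In a real function the copy would be made once, outside the i and j loops, so its cost is paid a single time.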

HTH,
Michiel Salters

Jul 23 '05 #4
msalters wrote:

ve*********@hotmail.com wrote:
I have a function that before I modified it took around 13.75 seconds
to complete after the modification it took .325 seconds to complete.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);

Most of the time consumed (99%+) is in the following for loops:

...
This takes roughly 13.75 seconds.

By copying the values in Input to Point* InputCopy ...
I get the speed up to roughly 0.325 seconds.
What am I missing here that i get this efficiency increase?


A logical assumption is that the compiler spotted that
InputCopy doesn't change. This means it can cache these
values more aggressively. Input points to unknown memory,
which might even overlap Output. That means the compiler
has to be rather careful, and cannot use cached Input values
at all.


That's a sensible explanation. But honestly, I don't buy it.
Not for a speedup from 13 to 0.3 seconds in a tight O(n^3) loop
that is full of calls to sqrt() and pow(). Even if the value
of *Input is fetched from memory every time, the time for that
would be overwhelmed by the remaining code.

To the OP: Are you sure you aren't comparing apples with oranges?
That is, a debug build with a release build?
I notice that i and j are *not* used anywhere in the
loop body. So while the compiler will leave those two loops
intact in debug mode, it could optimize them away in a
release build.
The debug build would then do the very same
calculation OutputSize*OutputSize times, while
the release build does it only once.

Are you sure you didn't ruin anything else? Did you
check whether the results are still correct?
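
One way to run the check asked for above, as a sketch only: time each variant in the same build configuration and compare their outputs element by element. The helper names max_abs_difference and seconds_taken are hypothetical and not from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two result arrays.
// A value of 0 (or one within rounding error) suggests both variants
// computed the same thing.
double max_abs_difference(const std::vector<double> &a,
                          const std::vector<double> &b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}

// Runs the given callable once and returns the elapsed wall-clock seconds.
template <typename Work>
double seconds_taken(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_taken on the original and the modified function, with both compiled using identical settings, and then passing the two z-value arrays to max_abs_difference would rule out both the debug-versus-release mix-up and an accidental change in the results.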

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #5
I'm not comparing a release build with a debug build.
I first wrote a short version of the problem, but then realized
that it would mangle my question, and I thought I had replaced the whole
thing with the code in the loops (I posted the complete version later).
i and j are used to grab a new value for TempPoint.

Still, it's puzzling that this is happening. In any case, the amount of
extra memory used by copying the input isn't that large (compared to
some of the other memory demands of the program this function is in).

Jul 23 '05 #6
