473,795 Members | 2,805 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to make it faster?

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
Apr 7 '08 #1
48 2130


istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
Twice faster would help a lot.
Apr 7 '08 #2

<is*********@gm ail.comwrote in message
double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
Make a and b float arrays, and sum a float too. On some architectures that
will give about a 2 * speedup. Of course you'll have to use single precision
throughout the program.
The code is so simple that there is unlikely to be any sort of other
micro-optimisiation you can do. However it may be possible to call foo()
less often by rearranging the algorithm.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Apr 7 '08 #3

<is*********@gm ail.comwrote in message
news:6d******** *************** ***********@k10 g2000prm.google groups.com...
>

istillsh...@gma il.com wrote:
>When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Twice faster would help a lot.
Suggestions have already been made for improving the loop.

Reducing the calls from 1000000 will help a lot more, for that the overall
strategy needs to be looked at.

Then there's the data: does sz need to be that high? Is there anything
special about the a and b data (like mostly zeros or ones)?

How accurate does the sum have to be? If a,b change slowly, perhaps sum only
every other pair (then double the result)?

Have you thought about using integers (fixed point)? This might need a
global change.

--
Bart
Apr 7 '08 #4
is*********@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
With very old compilers you might get some improvement
by "unrolling" the loop, but most compilers have been able
to do that on their own for the past couple decades. As
Willem suggests, examine the generated code to see whether
the loop is unrolled already -- if it isn't, do so manually,
but it probably will have been.

Similarly, you could try rearrangements like

double foo(double *a, double *b, int sz) {
double sum = 0.0;
while (--sz >= 0)
sum += *a++ * *b++;
return sum;
}

.... but compilers have been doing *that* for even longer
than they've been unrolling loops. Adding `register' to
the declarations of a,b,sz and possibly sum might help,
but again: today's compilers are unlikely to overlook that
sort of optimization.

The big payoff -- or potential payoff, because it might
not be feasible -- would come not from making foo() faster,
but from calling it less often. You mention a million
calls to compute dot-products of 5000-place vectors; are
all those ten billion numbers really independent? If one
pair is very similar to the next except perhaps for a small
number of elements, maybe you could just make adjustments
to the original dot-product instead of recomputing the whole
thing from scratch:

double dot = foo(a, b, sz);
/* Make a change in the k'th place: */
dot -= a[k] * b[k];
a[k] = newa;
b[k] = newb;
dot += a[k] * b[k];

If you do this a million times you'll need to worry about
the propagation of small errors; consider using something
like Kahan's summation formula (GIYF) for the adjustments.
(You might also use Kahan's formula for the original sum;
it's slower, but that might not be an issue if you can get
a substantial reduction in foo() calls.)

Or maybe there are many changes to a[] and b[] between
foo() calls, but they tend to be clustered. In that case
you might decompose both arrays into 100 or so segments,
calculate the sum of each, make the changes, and then
recalculate only the segments that changed. Taken to the
extreme, this suggests turning the paired vectors into
trees, with the a[i],b[i] at the leaves, the pairwise sums
one level up, the four-way sums a level higher still, and
so on until the grand total is at the root: Changing the
k'th pair of numbers would then need only about thirteen
additions for a 5000-element pair. (You could do the
same thing with higher-degree trees, too.)

Whatever the situation, I think it unlikely you'll get
a twofold speedup by tweaking foo() itself; stepping back
and looking at the wider problem may have more promise.

--
Er*********@sun .com
Apr 7 '08 #5
is*********@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
Nobody has mentioned "loop unrolling"? Intel compilers also support
embedded prefetch directives if your target platform is Intel based.

Fei
Apr 7 '08 #6
In article <47************ @gmail.com>, Fei Liu <fe*****@gmail. comwrote:
>Nobody has mentioned "loop unrolling"?
Eric, Willem, and I all mentioned loop unrolling, with Eric and I
mentioning it directly under that name and Willem mentioning the
technique without giving it that name.
--
"Eightly percent of the people in the world are fools and the
rest of us are in danger of contamination." -- Walter Matthau
Apr 7 '08 #7
On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;

}
Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For examples, a[i] can only be 0, 0.25, 0.5,
or 0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....

But this solution made running time longer. I speculated that
conditional statements were more expensive than multiplication.

Do you have any better solution?
Apr 7 '08 #8
On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;

}
Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....

But the above solution made the running time considerably longer. I
speculated that conditional statements were more expensive than
multiplication.

Are better solutions possible?
Apr 7 '08 #9
In article <89************ *************** *******@k1g2000 prb.googlegroup s.com>,
<is*********@gm ail.comwrote:
>Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
...

But the above solution made the running time considerably longer. I
speculated that conditional statements were more expensive than
multiplication .
Yes, that's quite likely. I would expect a modern compiler to generate
good code for a simple loop of multiplications .

If the values are from some small set, perhaps it would be better not
to store them as floating point numbers. Can you use integers instead
by multiplying by a constant?

Or you could store the values as small integers even if they are not
simple multiples, and have a 2-dimensional array in which you look up
the answers.

Or perhaps you could split your problem into multiple threads.

Of course, if your sequence of multiplications is really so huge that
it is important to speed it up, you should also reconsider your
algorithm.

-- Richard
--
:wq
Apr 7 '08 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
1247
by: Kamilche | last post by:
I have a routine that I really need, but it slows down processing significantly. Can you spot any ineffeciencies in the code? This code makes a critical function of mine run about 7x slower than using a prebuilt format string. For maximum flexibility, it would be best to calculate the format string using this method, so I'd dearly love to keep it. def fmtstring(args): delim = '\0'
1
2197
by: Brent Patroch | last post by:
Hello, Novice here, I am doing bulk emails using CDO, connection to a smtp server at another location. I am trying to streamline my script, or through it out and start over to make it faster. I know that I am doing something wrong and that messages should be sending much faster. Any help appreciated: Set objConfig = Server.CreateObject("CDO.Configuration")
6
1829
by: ajikoe | last post by:
def f(x,y): return math.sin(x*y) + 8 * x I have code like this: def main(): n = 2000 a = zeros((n,n), Float) xcoor = arange(0,1,1/float(n)) ycoor = arange(0,1,1/float(n))
19
1991
by: pkilambi | last post by:
I wrote this function which does the following: after readling lines from file.It splits and finds the word occurences through a hash table...for some reason this is quite slow..can some one help me make it faster... f = open(filename) lines = f.readlines() def create_words(lines): cnt = 0 spl_set = '+' for content in lines:
8
3272
by: Scott Emick | last post by:
I am using the following to compute distances between two lat/long coordinates for a store locator - (VB .NET 2003) it seems to take a long time to iterate through like 100-150 locations - about 10-15 seconds...I want to make the code faster. I changed it to be multi-threaded, and it doesn't really make it any faster. The bottleneck seems to be with the math computations. Any ideas like changing my data types or other ideas etc would...
10
2186
by: Extremest | last post by:
I know there are ways to make this a lot faster. Any newsreader does this in seconds. I don't know how they do it and I am very new to c#. If anyone knows a faster way please let me know. All I am doing is quering the db for all the headers for a certain group and then going through them to find all the parts of each post. I only want ones that are complete. Meaning all segments for that one file posted are there. using System;
12
1636
by: vunet.us | last post by:
Is there a suggestion I can make this code run faster: if(document.getElementById("1")){ doOne(); } if(document.getElementById("2")){ doTwo(); } .................... if(document.getElementById("n")){ doN(); } It is a simplified version above. There is a large number of these repetitive actions. So I wanted to change them for:
13
2136
by: Simply_Red | last post by:
Hi, is there a way to make this function faster??? struct Points { double X; double Y; };
7
1924
by: Steve Bergman | last post by:
I'm involved in a discussion thread in which it has been stated that: """ Anything written in a language that is 20x slower (Perl, Python, PHP) than C/C++ should be instantly rejected by users on those grounds alone. """ I've challenged someone to beat the snippet of code below in C, C++, or assembler, for reading in one million pairs of random floats and
0
9673
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10443
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10165
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10002
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9044
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5437
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5565
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4113
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2921
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.