When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
} 48 2130
istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Twice faster would help a lot.
<is*********@gm ail.comwrote in message
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Make a and b float arrays, and sum a float too. On some architectures that
will give about a 2 * speedup. Of course you'll have to use single precision
throughout the program.
The code is so simple that there is unlikely to be any sort of other
micro-optimisiation you can do. However it may be possible to call foo()
less often by rearranging the algorithm.
--
Free games and programming goodies. http://www.personal.leeds.ac.uk/~bgy1mm
<is*********@gm ail.comwrote in message
news:6d******** *************** ***********@k10 g2000prm.google groups.com...
>
istillsh...@gma il.com wrote:
>When I used gprof to see which function consumed most running time, I identified the following one. sz was less than 5000 on average, but foo had been called about 1,000,000 times. I have tried using "register sum = 0.0" and saw some improvement. My question is how to improve foo further to make it faster.
double foo(double *a, double *b, int sz) {
double sum = 0.0; int i;
for (i = 0; i < sz; i++) sum += a[i]*b[i];
return sum; }
Twice faster would help a lot.
Suggestions have already been made for improving the loop.
Reducing the calls from 1000000 will help a lot more, for that the overall
strategy needs to be looked at.
Then there's the data: does sz need to be that high? Is there anything
special about the a and b data (like mostly zeros or ones)?
How accurate does the sum have to be? If a,b change slowly, perhaps sum only
every other pair (then double the result)?
Have you thought about using integers (fixed point)? This might need a
global change.
--
Bart is*********@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
With very old compilers you might get some improvement
by "unrolling" the loop, but most compilers have been able
to do that on their own for the past couple decades. As
Willem suggests, examine the generated code to see whether
the loop is unrolled already -- if it isn't, do so manually,
but it probably will have been.
Similarly, you could try rearrangements like
double foo(double *a, double *b, int sz) {
double sum = 0.0;
while (--sz >= 0)
sum += *a++ * *b++;
return sum;
}
.... but compilers have been doing *that* for even longer
than they've been unrolling loops. Adding `register' to
the declarations of a,b,sz and possibly sum might help,
but again: today's compilers are unlikely to overlook that
sort of optimization.
The big payoff -- or potential payoff, because it might
not be feasible -- would come not from making foo() faster,
but from calling it less often. You mention a million
calls to compute dot-products of 5000-place vectors; are
all those ten billion numbers really independent? If one
pair is very similar to the next except perhaps for a small
number of elements, maybe you could just make adjustments
to the original dot-product instead of recomputing the whole
thing from scratch:
double dot = foo(a, b, sz);
/* Make a change in the k'th place: */
dot -= a[k] * b[k];
a[k] = newa;
b[k] = newb;
dot += a[k] * b[k];
If you do this a million times you'll need to worry about
the propagation of small errors; consider using something
like Kahan's summation formula (GIYF) for the adjustments.
(You might also use Kahan's formula for the original sum;
it's slower, but that might not be an issue if you can get
a substantial reduction in foo() calls.)
Or maybe there are many changes to a[] and b[] between
foo() calls, but they tend to be clustered. In that case
you might decompose both arrays into 100 or so segments,
calculate the sum of each, make the changes, and then
recalculate only the segments that changed. Taken to the
extreme, this suggests turning the paired vectors into
trees, with the a[i],b[i] at the leaves, the pairwise sums
one level up, the four-way sums a level higher still, and
so on until the grand total is at the root: Changing the
k'th pair of numbers would then need only about thirteen
additions for a 5000-element pair. (You could do the
same thing with higher-degree trees, too.)
Whatever the situation, I think it unlikely you'll get
a twofold speedup by tweaking foo() itself; stepping back
and looking at the wider problem may have more promise.
-- Er*********@sun .com is*********@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Nobody has mentioned "loop unrolling"? Intel compilers also support
embedded prefetch directives if your target platform is Intel based.
Fei
In article <47************ @gmail.com>, Fei Liu <fe*****@gmail. comwrote:
>Nobody has mentioned "loop unrolling"?
Eric, Willem, and I all mentioned loop unrolling, with Eric and I
mentioning it directly under that name and Willem mentioning the
technique without giving it that name.
--
"Eightly percent of the people in the world are fools and the
rest of us are in danger of contamination." -- Walter Matthau
On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For examples, a[i] can only be 0, 0.25, 0.5,
or 0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,
if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....
But this solution made running time longer. I speculated that
conditional statements were more expensive than multiplication.
Do you have any better solution?
On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,
if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....
But the above solution made the running time considerably longer. I
speculated that conditional statements were more expensive than
multiplication.
Are better solutions possible?
In article <89************ *************** *******@k1g2000 prb.googlegroup s.com>,
<is*********@gm ail.comwrote:
>Suppose in my application I know that a[i]*b[i] can only take a limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or 0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8. I thought I could pre-compute their products and fetch them given certain a[i] and b[i]. For example,
if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05; else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1; ...
But the above solution made the running time considerably longer. I speculated that conditional statements were more expensive than multiplication .
Yes, that's quite likely. I would expect a modern compiler to generate
good code for a simple loop of multiplications .
If the values are from some small set, perhaps it would be better not
to store them as floating point numbers. Can you use integers instead
by multiplying by a constant?
Or you could store the values as small integers even if they are not
simple multiples, and have a 2-dimensional array in which you look up
the answers.
Or perhaps you could split your problem into multiple threads.
Of course, if your sequence of multiplications is really so huge that
it is important to speed it up, you should also reconsider your
algorithm.
-- Richard
--
:wq This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Kamilche |
last post by:
I have a routine that I really need, but it slows down processing
significantly. Can you spot any ineffeciencies in the code?
This code makes a critical function of mine run about 7x slower than
using a prebuilt format string. For maximum flexibility, it would be
best to calculate the format string using this method, so I'd dearly
love to keep it.
def fmtstring(args):
delim = '\0'
|
by: Brent Patroch |
last post by:
Hello,
Novice here, I am doing bulk emails using CDO, connection to a smtp
server at another location. I am trying to streamline my script, or
through it out and start over to make it faster.
I know that I am doing something wrong and that messages should be
sending much faster. Any help appreciated:
Set objConfig = Server.CreateObject("CDO.Configuration")
|
by: ajikoe |
last post by:
def f(x,y):
return math.sin(x*y) + 8 * x
I have code like this:
def main():
n = 2000
a = zeros((n,n), Float)
xcoor = arange(0,1,1/float(n))
ycoor = arange(0,1,1/float(n))
|
by: pkilambi |
last post by:
I wrote this function which does the following:
after readling lines from file.It splits and finds the word occurences
through a hash table...for some reason this is quite slow..can some one
help me make it faster...
f = open(filename)
lines = f.readlines()
def create_words(lines):
cnt = 0
spl_set = '+'
for content in lines:
|
by: Scott Emick |
last post by:
I am using the following to compute distances between two lat/long
coordinates for a store locator - (VB .NET 2003)
it seems to take a long time to iterate through like 100-150 locations -
about 10-15 seconds...I want to make the code faster. I changed it to be
multi-threaded, and it doesn't really make it any faster. The bottleneck
seems to be with the math computations.
Any ideas like changing my data types or other ideas etc would...
| |
by: Extremest |
last post by:
I know there are ways to make this a lot faster. Any
newsreader does this in seconds. I don't know how they do
it and I am very new to c#. If anyone knows a faster way
please let me know. All I am doing is quering the db for
all the headers for a certain group and then going through
them to find all the parts of each post. I only want ones
that are complete. Meaning all segments for that one file
posted are there.
using System;
|
by: vunet.us |
last post by:
Is there a suggestion I can make this code run faster:
if(document.getElementById("1")){ doOne(); }
if(document.getElementById("2")){ doTwo(); }
....................
if(document.getElementById("n")){ doN(); }
It is a simplified version above. There is a large number of these
repetitive actions. So I wanted to change them for:
|
by: Simply_Red |
last post by:
Hi,
is there a way to make this function faster???
struct Points {
double X;
double Y;
};
|
by: Steve Bergman |
last post by:
I'm involved in a discussion thread in which it has been stated that:
"""
Anything written in a language that is 20x slower (Perl, Python,
PHP) than C/C++ should be instantly rejected by users on those grounds
alone.
"""
I've challenged someone to beat the snippet of code below in C, C++,
or assembler, for reading in one million pairs of random floats and
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
| |
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |