How to make it faster?

istillshine

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Apr 7 '08 #1

Subscribe Reply

2130

1
2
3
>
Last »

istillshine

istillsh...@gma il.com wrote:

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Twice faster would help a lot.

Apr 7 '08 #2

Malcolm McLean

<is*********@gm ail.comwrote in message

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Make a and b float arrays, and sum a float too. On some architectures that
will give about a 2 * speedup. Of course you'll have to use single precision
throughout the program.
The code is so simple that there is unlikely to be any sort of other
micro-optimisiation you can do. However it may be possible to call foo()
less often by rearranging the algorithm.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Apr 7 '08 #3

Bartc

<is*********@gm ail.comwrote in message
news:6d******** *************** ***********@k10 g2000prm.google groups.com...

>

istillsh...@gma il.com wrote:
>When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Twice faster would help a lot.

Suggestions have already been made for improving the loop.

Reducing the calls from 1000000 will help a lot more, for that the overall
strategy needs to be looked at.

Then there's the data: does sz need to be that high? Is there anything
special about the a and b data (like mostly zeros or ones)?

How accurate does the sum have to be? If a,b change slowly, perhaps sum only
every other pair (then double the result)?

Have you thought about using integers (fixed point)? This might need a
global change.

--
Bart

Apr 7 '08 #4

Eric Sosman

is*********@gma il.com wrote:

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

With very old compilers you might get some improvement
by "unrolling" the loop, but most compilers have been able
to do that on their own for the past couple decades. As
Willem suggests, examine the generated code to see whether
the loop is unrolled already -- if it isn't, do so manually,
but it probably will have been.

Similarly, you could try rearrangements like

double foo(double *a, double *b, int sz) {
double sum = 0.0;
while (--sz >= 0)
sum += *a++ * *b++;
return sum;
}

.... but compilers have been doing *that* for even longer
than they've been unrolling loops. Adding `register' to
the declarations of a,b,sz and possibly sum might help,
but again: today's compilers are unlikely to overlook that
sort of optimization.

The big payoff -- or potential payoff, because it might
not be feasible -- would come not from making foo() faster,
but from calling it less often. You mention a million
calls to compute dot-products of 5000-place vectors; are
all those ten billion numbers really independent? If one
pair is very similar to the next except perhaps for a small
number of elements, maybe you could just make adjustments
to the original dot-product instead of recomputing the whole
thing from scratch:

double dot = foo(a, b, sz);
/* Make a change in the k'th place: */
dot -= a[k] * b[k];
a[k] = newa;
b[k] = newb;
dot += a[k] * b[k];

If you do this a million times you'll need to worry about
the propagation of small errors; consider using something
like Kahan's summation formula (GIYF) for the adjustments.
(You might also use Kahan's formula for the original sum;
it's slower, but that might not be an issue if you can get
a substantial reduction in foo() calls.)

Or maybe there are many changes to a[] and b[] between
foo() calls, but they tend to be clustered. In that case
you might decompose both arrays into 100 or so segments,
calculate the sum of each, make the changes, and then
recalculate only the segments that changed. Taken to the
extreme, this suggests turning the paired vectors into
trees, with the a[i],b[i] at the leaves, the pairwise sums
one level up, the four-way sums a level higher still, and
so on until the grand total is at the root: Changing the
k'th pair of numbers would then need only about thirteen
additions for a 5000-element pair. (You could do the
same thing with higher-degree trees, too.)

Whatever the situation, I think it unlikely you'll get
a twofold speedup by tweaking foo() itself; stepping back
and looking at the wider problem may have more promise.

--
Er*********@sun .com

Apr 7 '08 #5

Fei Liu

is*********@gma il.com wrote:

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}

Nobody has mentioned "loop unrolling"? Intel compilers also support
embedded prefetch directives if your target platform is Intel based.

Fei

Apr 7 '08 #6

Walter Roberson

In article <47************ @gmail.com>, Fei Liu <fe*****@gmail. comwrote:

>Nobody has mentioned "loop unrolling"?

Eric, Willem, and I all mentioned loop unrolling, with Eric and I
mentioning it directly under that name and Willem mentioning the
technique without giving it that name.
--
"Eightly percent of the people in the world are fools and the
rest of us are in danger of contamination." -- Walter Matthau

Apr 7 '08 #7

istillshine

On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;

}

Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For examples, a[i] can only be 0, 0.25, 0.5,
or 0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....

But this solution made running time longer. I speculated that
conditional statements were more expensive than multiplication.

Do you have any better solution?

Apr 7 '08 #8

istillshine

On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;

}

Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
....

But the above solution made the running time considerably longer. I
speculated that conditional statements were more expensive than
multiplication.

Are better solutions possible?

Apr 7 '08 #9

Richard Tobin

In article <89************ *************** *******@k1g2000 prb.googlegroup s.com>,
<is*********@gm ail.comwrote:

>Suppose in my application I know that a[i]*b[i] can only take a
limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
I thought I could pre-compute their products and fetch them given
certain a[i] and b[i]. For example,

if (a[i] == 0.25 && b[i] == 0.2)
sum += 0.05;
else if (a[i] == 0.25 && b[i] == 0.4)
sum += 0.1;
...

But the above solution made the running time considerably longer. I
speculated that conditional statements were more expensive than
multiplication .

Yes, that's quite likely. I would expect a modern compiler to generate
good code for a simple loop of multiplications .

If the values are from some small set, perhaps it would be better not
to store them as floating point numbers. Can you use integers instead
by multiplying by a constant?

Or you could store the values as small integers even if they are not
simple multiples, and have a 2-dimensional array in which you look up
the answers.

Or perhaps you could split your problem into multiple threads.

Of course, if your sequence of multiplications is really so huge that
it is important to speed it up, you should also reconsider your
algorithm.

-- Richard
--
:wq

Apr 7 '08 #10

Similar topics

1247

Can you make this faster?

by: Kamilche | last post by:

I have a routine that I really need, but it slows down processing significantly. Can you spot any ineffeciencies in the code? This code makes a critical function of mine run about 7x slower than using a prebuilt format string. For maximum flexibility, it would be best to calculate the format string using this method, so I'd dearly love to keep it. def fmtstring(args): delim = '\0'

Python

2197

CDO Bulk Email Help - Need to make it faster

by: Brent Patroch | last post by:

Hello, Novice here, I am doing bulk emails using CDO, connection to a smtp server at another location. I am trying to streamline my script, or through it out and start over to make it faster. I know that I am doing something wrong and that messages should be sending much faster. Any help appreciated: Set objConfig = Server.CreateObject("CDO.Configuration")

ASP / Active Server Pages

1829

how to make this code faster

by: ajikoe | last post by:

def f(x,y): return math.sin(x*y) + 8 * x I have code like this: def main(): n = 2000 a = zeros((n,n), Float) xcoor = arange(0,1,1/float(n)) ycoor = arange(0,1,1/float(n))

Python

1991

help make it faster please

by: pkilambi | last post by:

I wrote this function which does the following: after readling lines from file.It splits and finds the word occurences through a hash table...for some reason this is quite slow..can some one help me make it faster... f = open(filename) lines = f.readlines() def create_words(lines): cnt = 0 spl_set = '+' for content in lines:

Python

3272

how can I make this code run faster

by: Scott Emick | last post by:

I am using the following to compute distances between two lat/long coordinates for a store locator - (VB .NET 2003) it seems to take a long time to iterate through like 100-150 locations - about 10-15 seconds...I want to make the code faster. I changed it to be multi-threaded, and it doesn't really make it any faster. The bottleneck seems to be with the math computations. Any ideas like changing my data types or other ideas etc would...

Visual Basic .NET

2186

Can anyone tell me of any optimations I could do to this to make it faster?

by: Extremest | last post by:

I know there are ways to make this a lot faster. Any newsreader does this in seconds. I don't know how they do it and I am very new to c#. If anyone knows a faster way please let me know. All I am doing is quering the db for all the headers for a certain group and then going through them to find all the parts of each post. I only want ones that are complete. Meaning all segments for that one file posted are there. using System;

C# / C Sharp

1636

Make This Subroutine Faster

by: vunet.us | last post by:

Is there a suggestion I can make this code run faster: if(document.getElementById("1")){ doOne(); } if(document.getElementById("2")){ doTwo(); } .................... if(document.getElementById("n")){ doN(); } It is a simplified version above. There is a large number of these repetitive actions. So I wanted to change them for:

Javascript

2136

make it faster

by: Simply_Red | last post by:

Hi, is there a way to make this function faster??? struct Points { double X; double Y; };

C / C++

1924

How to make this unpickling/sorting demo faster?

by: Steve Bergman | last post by:

I'm involved in a discussion thread in which it has been stated that: """ Anything written in a language that is 20x slower (Perl, Python, PHP) than C/C++ should be instantly rejected by users on those grounds alone. """ I've challenged someone to beat the snippet of code below in C, C++, or assembler, for reading in one million pairs of random floats and

Python

9673

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

10443

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10165

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

10002

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

9044

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

5437

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5565

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4113

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

2921

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General