473,804 Members | 3,311 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to make it faster?

When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.

double foo(double *a, double *b, int sz)
{

double sum = 0.0;
int i;

for (i = 0; i < sz; i++)
sum += a[i]*b[i];

return sum;
}
Apr 7 '08
48 2139
is*********@gma il.com writes:
>>Or compute exp(sum) instead of sum, if that doesn't overflow:
(...)
I came up with a solution.
exp_sum *= (z1/z2)*HUGE_VAL;
return sz*log(HUGE_VAL ) + log(exp_sum)

How large this HUGE_VAL should be?
Depends on your data set and the range of result values. Looks to me
like if z1/z2 are normally about the same size, HUGE_VAL should be 1...
I have little experience with numerical programming, but I suppose a
good value would be one which normally yields exp_sum == ca 1, i.e.
HUGE_VAL = exp(result/sz). But if you need this, maybe that means
the data set is too variable and you had better do log() in the loop.

--
Hallvard
Apr 9 '08 #41
is*********@gma il.com writes:
exp_sum *= (z1/z2)*HUGE_VAL;
return sz*log(HUGE_VAL ) + log(exp_sum)
Eh. I'm getting tired, but shouldn't that be exp_sum *= (z1/z2) /
HUGE_VAL or return log(exp_sum) - sz*log(HUGE_VAL )?

And a similar error carried over to my answer. (The idea was to
try to collect a normal value of the answer in HUGE_VAL so
so log(exp_sum) return approximately 0 for a normal data set.)

--
Hallvard
Apr 9 '08 #42
On Apr 9, 1:19 pm, Hallvard B Furuseth <h.b.furus...@u sit.uio.no>
wrote:
istillsh...@gma il.com writes:
Can also get rid of the multiplications with exp0:
Instead of exp1, exp2, x0, ... z2, keep variables
with values (current exp1)/exp0 ... (current z2)/exp0.
I don't understand this

exp1 = exp(x->data[0] - 1.0);
exp2 = exp(x->data[1] - 1.0);
for (i = 0; i < sz; i++) {
x0 = g0[i];
y0 = f0[i];
(no other changes in the loop)
}

Variables <exp,x,y,z><0,1 ,2now have values (original value / exp0).
sum and df->data do not change (except due to rounding), since
they get values (foo/exp0) / (bar/exp0) instead of foo/bar.
Got it. So smart.
Apr 9 '08 #43
On Apr 9, 2:29 pm, istillsh...@gma il.com wrote:
>
Combined with the previous 3 secs' saving, the running time has
reduced to 11 secs from 71 secs. Thank you. I really learned
something new and useful.
After setting -O2 flag in compiling, it only took 7 secs. The running
time is satisfactory. Your ideas are indispensable.
Apr 9 '08 #44
On Apr 9, 2:29 pm, istillsh...@gma il.com wrote:
Combined with the previous 3 secs' saving, the running time has
reduced to 11 secs from 71 secs. Thank you. I really learned
something new and useful.
After setting -O2 flag in compiling, it only took 7 secs.
Apr 9 '08 #45
On Apr 9, 2:50 pm, Paul Hsieh <websn...@gmail .comwrote:
#define ITER 5

Change this to:

#define ITER 64

Just trust me, and try it out.
I tried using "#define ITER 64". Overflow occurred in this setting.
I got "1.#J". I don't know what "1.#J" stands for? When I changed
the type of sump and sumd to long double, the problem disappeared.
The program saved 1 sec because of larger ITER.

I also tried another way:

#define ITER 64

register double sum = 0.0;

if (i % ITER == 0) {
sum += log(prodn/prodd); /* introducing log, but not so frequent */
prodn = 1.0;
prodd = 1.0;
}

In the way, the running time remained the same: 11 secs before setting
-O2, and 7 secs after setting -O2. However, the return value was a
little different.
Apr 9 '08 #46
On Apr 9, 3:51 pm, istillsh...@gma il.com wrote:
On Apr 9, 2:50 pm, Paul Hsieh <websn...@gmail .comwrote:
#define ITER 5
Change this to:
#define ITER 64
Just trust me, and try it out.

I tried using "#define ITER 64". Overflow occurred in this setting.
I got "1.#J". I don't know what "1.#J" stands for?
Its a kind of NaN which is in fact an overflow condition of some
kind. In that case, you should try 4 or 8. But the key is to use a
power of 2. This will allow your compiler to use an "as-if" rule to
reduce the cost of executing the "%" operator. Otherwise you can use
a target counter:

if (i >= tgtI) {
tgtI += ITER;
prodn /= prodd;
prodd = 1.0;
}

That would probably work just as well.
[...] When I changed
the type of sump and sumd to long double, the problem disappeared.
The program saved 1 sec because of larger ITER.

I also tried another way:

#define ITER 64

register double sum = 0.0;

if (i % ITER == 0) {
sum += log(prodn/prodd); /* introducing log, but not so frequent */
prodn = 1.0;
prodd = 1.0;

}

In the way, the running time remained the same: 11 secs before setting
-O2, and 7 secs after setting -O2. However, the return value was a
little different.
Well I would suggest trying other values of ITER first.
Unfortunately, long double sounds like its use a larger likely more
expensive data type which may likely have impact on the performance of
rest of your code. So I would recommend against doing that.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Apr 9 '08 #47
On Apr 9, 7:50*pm, Paul Hsieh <websn...@gmail .comwrote:
Glad to hear it. *Transcendental calls are the worst in terms of
performance. *After that comes divides and modulos. *After that is
branch mis-predictions. *If you can optimize *all that, you will at
least be within 2x-4x of optimal (barring algorithm and IO
improvements) pretty much all time.
Usually the total cost is computational cost plus cost of data access.
Since the computational cost has by now been reduced by a factor ten,
it would now be a good idea to look at things like cache usage, but
that would require a bigger picture.

It was mentioned that g [i] is always one of 0.0, 0.25, 0.5 and 0.75.
A trivial change (changing these arrays from double to float) would
save 25% of memory bandwidth. A simple experiment would be to measure
the time of one program run, then do the same number of calls to foo,
but always with the same set of data. If that is a lot faster, then
cache improvements can help.
Apr 11 '08 #48
On Apr 11, 3:56*pm, "christian. bau" <christian....@ cbau.wanadoo.co .uk>
wrote:
On Apr 9, 7:50*pm, Paul Hsieh <websn...@gmail .comwrote:
Glad to hear it. *Transcendental calls are the worst in terms of
performance. *After that comes divides and modulos. *After that is
branch mis-predictions. *If you can optimize *all that, you will at
least be within 2x-4x of optimal (barring algorithm and IO
improvements) pretty much all time.

Usually the total cost is computational cost plus cost of data access.
Since the computational cost has by now been reduced by a factor ten,
it would now be a good idea to look at things like cache usage, but
that would require a bigger picture.

It was mentioned that g [i] is always one of 0.0, 0.25, 0.5 and 0.75.
A trivial change (changing these arrays from double to float) would
save 25% of memory bandwidth. A simple experiment would be to measure
the time of one program run, then do the same number of calls to foo,
but always with the same set of data. If that is a lot faster, then
cache improvements can help.
If the problem were scaled so that g[i] contained characters of
0,1,2,3 {which represent multiples of 0.25} it would reduce the
requirement for 4 byte float to 1 byte for char (assuming 32 bit float
and 8 bit char).
Jun 27 '08 #49

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
1248
by: Kamilche | last post by:
I have a routine that I really need, but it slows down processing significantly. Can you spot any ineffeciencies in the code? This code makes a critical function of mine run about 7x slower than using a prebuilt format string. For maximum flexibility, it would be best to calculate the format string using this method, so I'd dearly love to keep it. def fmtstring(args): delim = '\0'
1
2197
by: Brent Patroch | last post by:
Hello, Novice here, I am doing bulk emails using CDO, connection to a smtp server at another location. I am trying to streamline my script, or through it out and start over to make it faster. I know that I am doing something wrong and that messages should be sending much faster. Any help appreciated: Set objConfig = Server.CreateObject("CDO.Configuration")
6
1829
by: ajikoe | last post by:
def f(x,y): return math.sin(x*y) + 8 * x I have code like this: def main(): n = 2000 a = zeros((n,n), Float) xcoor = arange(0,1,1/float(n)) ycoor = arange(0,1,1/float(n))
19
1993
by: pkilambi | last post by:
I wrote this function which does the following: after readling lines from file.It splits and finds the word occurences through a hash table...for some reason this is quite slow..can some one help me make it faster... f = open(filename) lines = f.readlines() def create_words(lines): cnt = 0 spl_set = '+' for content in lines:
8
3277
by: Scott Emick | last post by:
I am using the following to compute distances between two lat/long coordinates for a store locator - (VB .NET 2003) it seems to take a long time to iterate through like 100-150 locations - about 10-15 seconds...I want to make the code faster. I changed it to be multi-threaded, and it doesn't really make it any faster. The bottleneck seems to be with the math computations. Any ideas like changing my data types or other ideas etc would...
10
2187
by: Extremest | last post by:
I know there are ways to make this a lot faster. Any newsreader does this in seconds. I don't know how they do it and I am very new to c#. If anyone knows a faster way please let me know. All I am doing is quering the db for all the headers for a certain group and then going through them to find all the parts of each post. I only want ones that are complete. Meaning all segments for that one file posted are there. using System;
12
1637
by: vunet.us | last post by:
Is there a suggestion I can make this code run faster: if(document.getElementById("1")){ doOne(); } if(document.getElementById("2")){ doTwo(); } .................... if(document.getElementById("n")){ doN(); } It is a simplified version above. There is a large number of these repetitive actions. So I wanted to change them for:
13
2136
by: Simply_Red | last post by:
Hi, is there a way to make this function faster??? struct Points { double X; double Y; };
7
1925
by: Steve Bergman | last post by:
I'm involved in a discussion thread in which it has been stated that: """ Anything written in a language that is 20x slower (Perl, Python, PHP) than C/C++ should be instantly rejected by users on those grounds alone. """ I've challenged someone to beat the snippet of code below in C, C++, or assembler, for reading in one million pairs of random floats and
0
9708
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9588
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10340
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10327
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9161
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7625
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6857
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5527
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
3828
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.