When I used gprof to see which function consumed most running time, I
identified the following one. sz was less than 5000 on average, but
foo had been called about 1,000,000 times. I have tried using
"register sum = 0.0" and saw some improvement. My question is how to
improve foo further to make it faster.
double foo(double *a, double *b, int sz)
{
double sum = 0.0;
int i;
for (i = 0; i < sz; i++)
sum += a[i]*b[i];
return sum;
}
Apr 7 '08
48 2139 is*********@gma il.com writes:
>>Or compute exp(sum) instead of sum, if that doesn't overflow: (...)
I came up with a solution.
exp_sum *= (z1/z2)*HUGE_VAL;
return sz*log(HUGE_VAL ) + log(exp_sum)
How large this HUGE_VAL should be?
Depends on your data set and the range of result values. Looks to me
like if z1/z2 are normally about the same size, HUGE_VAL should be 1...
I have little experience with numerical programming, but I suppose a
good value would be one which normally yields exp_sum == ca 1, i.e.
HUGE_VAL = exp(result/sz). But if you need this, maybe that means
the data set is too variable and you had better do log() in the loop.
--
Hallvard is*********@gma il.com writes:
exp_sum *= (z1/z2)*HUGE_VAL;
return sz*log(HUGE_VAL ) + log(exp_sum)
Eh. I'm getting tired, but shouldn't that be exp_sum *= (z1/z2) /
HUGE_VAL or return log(exp_sum) - sz*log(HUGE_VAL )?
And a similar error carried over to my answer. (The idea was to
try to collect a normal value of the answer in HUGE_VAL so
so log(exp_sum) return approximately 0 for a normal data set.)
--
Hallvard
On Apr 9, 1:19 pm, Hallvard B Furuseth <h.b.furus...@u sit.uio.no>
wrote:
istillsh...@gma il.com writes:
Can also get rid of the multiplications with exp0:
Instead of exp1, exp2, x0, ... z2, keep variables
with values (current exp1)/exp0 ... (current z2)/exp0.
I don't understand this
exp1 = exp(x->data[0] - 1.0);
exp2 = exp(x->data[1] - 1.0);
for (i = 0; i < sz; i++) {
x0 = g0[i];
y0 = f0[i];
(no other changes in the loop)
}
Variables <exp,x,y,z><0,1 ,2now have values (original value / exp0).
sum and df->data do not change (except due to rounding), since
they get values (foo/exp0) / (bar/exp0) instead of foo/bar.
Got it. So smart.
On Apr 9, 2:29 pm, istillsh...@gma il.com wrote:
>
Combined with the previous 3 secs' saving, the running time has
reduced to 11 secs from 71 secs. Thank you. I really learned
something new and useful.
After setting -O2 flag in compiling, it only took 7 secs. The running
time is satisfactory. Your ideas are indispensable.
On Apr 9, 2:29 pm, istillsh...@gma il.com wrote:
Combined with the previous 3 secs' saving, the running time has
reduced to 11 secs from 71 secs. Thank you. I really learned
something new and useful.
After setting -O2 flag in compiling, it only took 7 secs.
On Apr 9, 2:50 pm, Paul Hsieh <websn...@gmail .comwrote:
#define ITER 5
Change this to:
#define ITER 64
Just trust me, and try it out.
I tried using "#define ITER 64". Overflow occurred in this setting.
I got "1.#J". I don't know what "1.#J" stands for? When I changed
the type of sump and sumd to long double, the problem disappeared.
The program saved 1 sec because of larger ITER.
I also tried another way:
#define ITER 64
register double sum = 0.0;
if (i % ITER == 0) {
sum += log(prodn/prodd); /* introducing log, but not so frequent */
prodn = 1.0;
prodd = 1.0;
}
In the way, the running time remained the same: 11 secs before setting
-O2, and 7 secs after setting -O2. However, the return value was a
little different.
On Apr 9, 3:51 pm, istillsh...@gma il.com wrote:
On Apr 9, 2:50 pm, Paul Hsieh <websn...@gmail .comwrote:
#define ITER 5
Change this to:
#define ITER 64
Just trust me, and try it out.
I tried using "#define ITER 64". Overflow occurred in this setting.
I got "1.#J". I don't know what "1.#J" stands for?
Its a kind of NaN which is in fact an overflow condition of some
kind. In that case, you should try 4 or 8. But the key is to use a
power of 2. This will allow your compiler to use an "as-if" rule to
reduce the cost of executing the "%" operator. Otherwise you can use
a target counter:
if (i >= tgtI) {
tgtI += ITER;
prodn /= prodd;
prodd = 1.0;
}
That would probably work just as well.
[...] When I changed
the type of sump and sumd to long double, the problem disappeared.
The program saved 1 sec because of larger ITER.
I also tried another way:
#define ITER 64
register double sum = 0.0;
if (i % ITER == 0) {
sum += log(prodn/prodd); /* introducing log, but not so frequent */
prodn = 1.0;
prodd = 1.0;
}
In the way, the running time remained the same: 11 secs before setting
-O2, and 7 secs after setting -O2. However, the return value was a
little different.
Well I would suggest trying other values of ITER first.
Unfortunately, long double sounds like its use a larger likely more
expensive data type which may likely have impact on the performance of
rest of your code. So I would recommend against doing that.
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
On Apr 9, 7:50*pm, Paul Hsieh <websn...@gmail .comwrote:
Glad to hear it. *Transcendental calls are the worst in terms of
performance. *After that comes divides and modulos. *After that is
branch mis-predictions. *If you can optimize *all that, you will at
least be within 2x-4x of optimal (barring algorithm and IO
improvements) pretty much all time.
Usually the total cost is computational cost plus cost of data access.
Since the computational cost has by now been reduced by a factor ten,
it would now be a good idea to look at things like cache usage, but
that would require a bigger picture.
It was mentioned that g [i] is always one of 0.0, 0.25, 0.5 and 0.75.
A trivial change (changing these arrays from double to float) would
save 25% of memory bandwidth. A simple experiment would be to measure
the time of one program run, then do the same number of calls to foo,
but always with the same set of data. If that is a lot faster, then
cache improvements can help.
On Apr 11, 3:56*pm, "christian. bau" <christian....@ cbau.wanadoo.co .uk>
wrote:
On Apr 9, 7:50*pm, Paul Hsieh <websn...@gmail .comwrote:
Glad to hear it. *Transcendental calls are the worst in terms of
performance. *After that comes divides and modulos. *After that is
branch mis-predictions. *If you can optimize *all that, you will at
least be within 2x-4x of optimal (barring algorithm and IO
improvements) pretty much all time.
Usually the total cost is computational cost plus cost of data access.
Since the computational cost has by now been reduced by a factor ten,
it would now be a good idea to look at things like cache usage, but
that would require a bigger picture.
It was mentioned that g [i] is always one of 0.0, 0.25, 0.5 and 0.75.
A trivial change (changing these arrays from double to float) would
save 25% of memory bandwidth. A simple experiment would be to measure
the time of one program run, then do the same number of calls to foo,
but always with the same set of data. If that is a lot faster, then
cache improvements can help.
If the problem were scaled so that g[i] contained characters of
0,1,2,3 {which represent multiples of 0.25} it would reduce the
requirement for 4 byte float to 1 byte for char (assuming 32 bit float
and 8 bit char). This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Kamilche |
last post by:
I have a routine that I really need, but it slows down processing
significantly. Can you spot any ineffeciencies in the code?
This code makes a critical function of mine run about 7x slower than
using a prebuilt format string. For maximum flexibility, it would be
best to calculate the format string using this method, so I'd dearly
love to keep it.
def fmtstring(args):
delim = '\0'
|
by: Brent Patroch |
last post by:
Hello,
Novice here, I am doing bulk emails using CDO, connection to a smtp
server at another location. I am trying to streamline my script, or
through it out and start over to make it faster.
I know that I am doing something wrong and that messages should be
sending much faster. Any help appreciated:
Set objConfig = Server.CreateObject("CDO.Configuration")
|
by: ajikoe |
last post by:
def f(x,y):
return math.sin(x*y) + 8 * x
I have code like this:
def main():
n = 2000
a = zeros((n,n), Float)
xcoor = arange(0,1,1/float(n))
ycoor = arange(0,1,1/float(n))
|
by: pkilambi |
last post by:
I wrote this function which does the following:
after readling lines from file.It splits and finds the word occurences
through a hash table...for some reason this is quite slow..can some one
help me make it faster...
f = open(filename)
lines = f.readlines()
def create_words(lines):
cnt = 0
spl_set = '+'
for content in lines:
|
by: Scott Emick |
last post by:
I am using the following to compute distances between two lat/long
coordinates for a store locator - (VB .NET 2003)
it seems to take a long time to iterate through like 100-150 locations -
about 10-15 seconds...I want to make the code faster. I changed it to be
multi-threaded, and it doesn't really make it any faster. The bottleneck
seems to be with the math computations.
Any ideas like changing my data types or other ideas etc would...
| |
by: Extremest |
last post by:
I know there are ways to make this a lot faster. Any
newsreader does this in seconds. I don't know how they do
it and I am very new to c#. If anyone knows a faster way
please let me know. All I am doing is quering the db for
all the headers for a certain group and then going through
them to find all the parts of each post. I only want ones
that are complete. Meaning all segments for that one file
posted are there.
using System;
|
by: vunet.us |
last post by:
Is there a suggestion I can make this code run faster:
if(document.getElementById("1")){ doOne(); }
if(document.getElementById("2")){ doTwo(); }
....................
if(document.getElementById("n")){ doN(); }
It is a simplified version above. There is a large number of these
repetitive actions. So I wanted to change them for:
|
by: Simply_Red |
last post by:
Hi,
is there a way to make this function faster???
struct Points {
double X;
double Y;
};
|
by: Steve Bergman |
last post by:
I'm involved in a discussion thread in which it has been stated that:
"""
Anything written in a language that is 20x slower (Perl, Python,
PHP) than C/C++ should be instantly rejected by users on those grounds
alone.
"""
I've challenged someone to beat the snippet of code below in C, C++,
or assembler, for reading in one million pairs of random floats and
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
| |
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |