ch***********@gmail.com wrote on 07/27/06 11:22:
Just want an opinion. I have an algorithm that needs to run as fast as
possible; in fact, it's already too slow. I've made as many algorithmic
changes as I can to reduce the amount of code, so now I'm turning to
micro-optimizations.
One function that gets run a bazillion times is the pow() function from
math.h. However I've realized probably 90% of the time, the exponent
will be 0. 99.9% of the time, the exponent will lie between -3 and 3.
It sounds like you're approaching things correctly: You've
tried changing the algorithms, rearranging the calculations,
and other such structural changes (that's where the big wins
are), and found yourself still a little short of the required
performance. And you've made measurements that finger pow()
as one of the big time sinks, and you've counted the number
of times it's called, and you've even made an empirical study
of the exponent values.
(You HAVE, right? The "bazillion" and "90%" and "99.9%"
are actual measurements, right? Or at least they're rough
statements backed by measurements, right? If so, fine -- but
if not, go back and make the measurements before proceeding.)
Once you've got measurements that finger pow() as the
culprit and give some actual information about the exponent
distribution, your inline function seems a good way to exploit
the knowledge you've acquired about the program, knowledge
that pow() itself does not have.
Whether this will save "enough" time is something you
can only find out by measuring. You may be able to tell just
by considering your initial measurements, though: If you find
that pow() takes 20% of the running time, replacing pow() with
something that takes *zero* time can only speed up the program
by 25%. If you need a 10% speedup, it might be within reach --
but if you need to double the program's speed, this isn't
going to do the trick.
So I read that switches can sometimes be optimized into jump tables if
the case expressions are nearly contiguous integers. So I coded up this
little ditty:
static inline double
mult_power(double x, int y)
{
    switch ((y))
    {
    case  0 : return 1.0;
    case -3 : return 1.0 / ((x)*(x)*(x));
    case -2 : return 1.0 / ((x)*(x));
    case -1 : return 1.0 / (x);
    case  1 : return (x);
    case  2 : return (x)*(x);
    case  3 : return (x)*(x)*(x);
    default : return pow((x), (y));
    }
}
I actually haven't tested this vs. sticking the 0 in the middle (which I
will be doing shortly), but is this a sound replacement? Or is there
perhaps a more efficient solution?
The C language itself says nothing about the speeds of
the constructs you write in it, so questions of efficiency
must always refer to the platform or platforms. Still, it
is quite unlikely that the speed will be affected noticeably
by the order in which the case labels appear in the source
code. ((Also, the speed probably won't be affected if you
get rid of some of those needless parentheses ...))
If your "zero 90% of the time" figure is literally true,
you might (*might*) gain a milliquiver or so by testing for
it before doing the switch:
if (y == 0)
return 1.0;
switch (y) {
...
}
Differences due to this sort of thing are usually "down in
the noise," but a bazillion milliquivers is a hectotwitch
(IIRC), which might be noticeable. Keep in mind, though,
that this "optimization" might actually slow you down --
measure, measure, measure!
--
Er*********@sun.com