ch***********@gmail.com wrote on 07/27/06 11:22:
Just want an opinion. I have an algorithm that needs to run as fast as
possible; in fact, it's already too slow. I've made as many algorithmic
changes as I can to reduce the amount of code, so now I'm turning to
micro-optimizations.
One function that gets run a bazillion times is the pow() function from
math.h. However I've realized probably 90% of the time, the exponent
will be 0. 99.9% of the time, the exponent will lie between -3 and 3.
It sounds like you're approaching things correctly: You've
tried changing the algorithms, rearranging the calculations,
and other such structural changes (that's where the big wins
are), and found yourself still a little short of the required
performance. And you've made measurements that finger pow()
as one of the big time sinks, and you've counted the number
of times it's called, and you've even made an empirical study
of the exponent values.
(You HAVE, right? The "bazillion" and "90%" and "99.9%"
are actual measurements, right? Or at least they're rough
statements backed by measurements, right? If so, fine -- but
if not, go back and make the measurements before proceeding.)
Once you've got measurements that finger pow() as the
culprit and give some actual information about the exponent
distribution, your inline function seems a good way to exploit
the knowledge you've acquired about the program, knowledge
that pow() itself does not have.
Whether this will save "enough" time is something you
can only find out by measuring. You may be able to tell just
by considering your initial measurements, though: If you find
that pow() takes 20% of the running time, replacing pow() with
something that takes *zero* time can only speed up the program
by 25%. If you need a 10% speedup, it might be within reach --
but if you need to double the program's speed, this isn't
going to do the trick.
So I read that switches can sometimes be optimized into jump tables if
the case expressions are nearly contiguous integers. So I coded up this
little ditty:
static inline double
mult_power(double x, int y)
{
    switch ((y))
    {
    case  0 : return 1.0;
    case -3 : return 1.0 / ((x)*(x)*(x));
    case -2 : return 1.0 / ((x)*(x));
    case -1 : return 1.0 / (x);
    case  1 : return (x);
    case  2 : return (x)*(x);
    case  3 : return (x)*(x)*(x);
    default : return pow((x), (y));
    }
}
I actually haven't tested this vs. sticking the 0 in the middle (which I
will be doing shortly), but is this a sound replacement? Or is there
perhaps a more efficient solution?
The C language itself says nothing about the speeds of
the constructs you write in it, so questions of efficiency
must always refer to the platform or platforms. Still, it
is quite unlikely that the speed will be affected noticeably
by the order in which the case labels appear in the source
code. ((Also, the speed probably won't be affected if you
get rid of some of those needless parentheses ...))
If your "zero 90% of the time" figure is literally true,
you might (*might*) gain a milliquiver or so by testing for
it before doing the switch:
if (y == 0)
return 1.0;
switch (y) {
...
}
Differences due to this sort of thing are usually "down in
the noise," but a bazillion milliquivers is a hectotwitch
(IIRC), which might be noticeable. Keep in mind, though,
that this "optimization" might actually slow you down --
measure, measure, measure!
--
Er*********@sun.com