inline function call

Riko Wichmann

hi everyone,

I'm googeling since some time, but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

Thanks and cheers,

Riko

Jan 4 '06 #1

Subscribe Reply

17485

M1st0

I think it does'n exists.
But should be.

You can also roll up your own using some templating library..

Jan 4 '06 #2

Diez B. Roggisch

Riko Wichmann wrote:

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

No. That is simply impossible in python as well as in java where functions
are always virtual, meaning they are looked up at runtime. Because you'd
never know _which_ code to insert of all the different foo()-methods that
might be around there.

Do you have an actual use-case for that? I mean, do you have code that runs
slow, but with inlined code embarrassingly faster?

Regards,

Diez

Jan 4 '06 #3

Riko Wichmann

> Do you have an actual use-case for that? I mean, do you have code that runs

slow, but with inlined code embarrassingly faster?

Well, I guess it would not actually be embarrassingly faster. From
trying various things and actually copying the function code into the
DoMC routine, I estimate to get about 15-20% reduction in the execution
time. It ran very slow, in the beginning but after applying some other
'fastpython' techniques it's actually quite fast ....

'inlining' is mostly a matter of curiosity now :)

here is the code snipplet:

-----------------------------------------------------------------

[... cut out some stuff here ....]
# riskfunc(med, low, high):
# risk function for costs: triangular distribution
# implemented acoording to:
http://www.brighton-webs.co.uk/distr...triangular.asp
def riskfunc(med, low, high):
if med != 0.0:
u = random()
try:
if u <= (med-low)/(high-low):
r = low+sqrt(u*(high-low)*(med-low))
else:
r = high - sqrt((1.0-u)*(high-low)*(high-med))

except ZeroDivisionError: # case high = low
r = med
else:
r = 0.0

return r
# doMC:
# run the MC of the cost analysis
#
def doMC(Ntrial = 1):

from math import sqrt

start = time.time()
print 'run MC with ', Ntrial, ' trials'

# start with a defined seed for reproducability

total = 0.0

for i in range(Ntrial):

summe = 0.0
for k in range(len(Gcost)):

x = riskfunc(Gcost[k], Gdown[k], Gup[k])
summe += x

# store the value 'summe' for later usage
# ..... more code here
print "Summe : ", summe
stop = time.time()
print 'Computing time: ', stop-start

################################################## ##################
################################################## ##################

if __name__ == '__main__':

n = 100000
doMC(n)

Jan 4 '06 #4

Rocco Moretti

Riko Wichmann wrote:

hi everyone,

I'm googeling since some time, but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

The cannonical answer is "you probably don't need to do that."

If you're still set on inlining functions, take a look at bytecodehacks:
http://bytecodehacks.sourceforge.net/

Jan 4 '06 #5

Peter Hansen

Riko Wichmann wrote:

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

I know a simple technique that could should basically avoid the function
call overhead that might be worrying you, but it's really not suitable
to use except in case of a really serious bottleneck. How bad is the
performance in the particular case that concerns you? What kinds of
timing measurement have you got that show the problem?

(The technique is basically a particular way of using a generator...
should be obvious and easy to figure out if it's really the "function
call overhead" that is your bottleneck.)

-Peter

Jan 4 '06 #6

Peter Hansen

Riko Wichmann wrote:

def riskfunc(med, low, high):
if med != 0.0:
u = random()
try:
if u <= (med-low)/(high-low):
r = low+sqrt(u*(high-low)*(med-low))
else:
r = high - sqrt((1.0-u)*(high-low)*(high-med))

except ZeroDivisionError: # case high = low
r = med
else:
r = 0.0

return r

Since math.sqrt() is in C, the overhead of the sqrt() call is probably
minor, but the lookup is having to go to the global namespace which is a
tiny step beyond looking just at the locals. Using a default argument
to get a local could make a small difference. That is do this instead:

def riskfunc(med, low, high, sqrt=math.sqrt):

Same thing with calling random(), which is doing a global lookup first
to find the function.

By the way, you'll get better timing information by learning to use the
timeit module. Among other things, depending on your platform and how
long the entire loop takes to run, using time.time() could be given you
pretty coarse results.

You might also try precalculating high-low and storing it in a temporary
variable to avoid the duplicate calculations.

-Peter

Jan 4 '06 #7

Stuart D. Gathman

On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:

I'm googeling since some time, but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

In standard python, the answer is no. The reason is that all python
functions are effectively "virtual", and you don't know *which* version to
inline.

HOWEVER, there is a slick product called Psyco:

http://psyco.sourceforge.net/

which gets around this by creating multiple versions of functions which
contain inlined (or compiled) code. For instance, if foo(a,b) is often
called with a and b of int type, then a special version of foo is compiled
that is equivalent performance wise to foo(int a,int b). Dynamically
finding the correct version of foo at runtime is no slower than normal
dynamic calls, so the result is a very fast foo function. The only
tradeoff is that every specialized version of foo eats memory. Psyco
provides controls allowing you to specialize only those functions that
need it after profiling your application.

--
Stuart D. Gathman <st****@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

Jan 4 '06 #8

Riko Wichmann

Hey guys,

thanks for all the quick replies! In addition to the tips Peter and
Stuart gave me above, I also followed some of the hints found under

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

That greatly improved performance from about 3 minutes initially (inner
loop about 2000, outer loop about 10000 runs - I think) down to a few
seconds. My question on the inline function call was triggered by the 3
minute run on a pretty small statistic (10000 events) Monte Carlo
sample. Now, I'm much more relaxed! :)

One of the biggest improvements in addition to using psyco was actually
being careful about avoiding global namespace lookup.

So, thanks again, I learned a great deal about efficient python coding
today! :)

Cheers,

Riko

Jan 4 '06 #9

Peter Hansen

Riko Wichmann wrote:

That greatly improved performance from about 3 minutes initially (inner
loop about 2000, outer loop about 10000 runs - I think) down to a few
seconds. My question on the inline function call was triggered by the 3
minute run on a pretty small statistic (10000 events) Monte Carlo
sample. Now, I'm much more relaxed! :)

One of the biggest improvements in addition to using psyco was actually
being careful about avoiding global namespace lookup.

Riko, any chance you could post the final code and a bit more detail on
exactly how much Psyco contributed to the speedup? The former would be
educational for all of us, while I'm personally very curious about the
latter because my limited attempts to use Psyco in the past have
resulted in speedups on the order of only 20% or so. (I blame my
particular application, not Psyco per se, but I'd be happy to see a
real-world case where Psyco gave a much bigger boost.)

Thanks,
-Peter

Jan 4 '06 #10

bearophileHUGS

Peter Hansen>but I'd be happy to see a real-world case where Psyco gave
a much bigger boost.)<

Psyco can be very fast, but:
- the program has to be the right one;
- you have to use "low level" programming, programming more like in C,
avoiding most of the nice things Python has, like list generators,
etc.;
- you can try to use array.array (for floats and ints), I have found
that sometimes Psyco can use them in a very fast way.

This is an example of mine, it's not really a real-world case, it looks
like C, but it shows the difference, if you switch off Psyco you can
see that it goes much slower:
http://shootout.alioth.debian.org/gp...ang=psyco&id=0

This Python version is MUCH slower, but it looks more like Python:
http://shootout.alioth.debian.org/gp...ng=python&id=0
If you try Psyco with this version you can see that it's much slower
than the other one.

This is the summary page, the Python version is about 54 times slower
than the Psyco version:
http://shootout.alioth.debian.org/gp...nkuch&lang=all

More info:
http://psyco.sourceforge.net/psycoguide/node29.html

Bye,
bearophile

Jan 4 '06 #11

Riko Wichmann

Hi Peter,

Riko, any chance you could post the final code and a bit more detail on
exactly how much Psyco contributed to the speedup? The former would be
educational for all of us, while I'm personally very curious about the
latter because my limited attempts to use Psyco in the past have
resulted in speedups on the order of only 20% or so. (I blame my
particular application, not Psyco per se, but I'd be happy to see a
real-world case where Psyco gave a much bigger boost.)

the difference between running with and without psyco is about a factor
3 for my MC simulation. Without psyco the simulation runs for 62 sec,
with it for 19 secs (still using time instead of timeit, though!:) This
is for about 2300 and 10000 in for the inner and outer loop, respectively.

A factor 3 I consider worthwhile, especially since it doesn't really
cost you much.

This is on a Dell Lat D600 running Linux (Ubuntu 5.10) with a 1.6 GHz
Pentium M and 512 MB of RAM and python2.4.

The final code snipplet is attached. However, it is essentially
unchanged compared to the piece I posted earlier which already had most
of the global namespace look-up removed. Taking care of sqrt and random
as you suggested didn't improve much anymore. So it's probably not that
educational afterall.

Cheers,

Riko

-----------------------------------------------------
# import some modules
import string
import time
from math import sqrt

# accelerate:
import psyco

# random number init
from random import random, seed
seed(1)

# riskfunc(med, low, high):
# risk function for costs: triangular distribution
# implemented acoording to:
http://www.brighton-webs.co.uk/distr...triangular.asp
def riskfunc(med, low, high):
if med != 0.0:
u = random()
try:
if u <= (med-low)/(high-low):
r = low+sqrt(u*(high-low)*(med-low))
else:
r = high - sqrt((1.0-u)*(high-low)*(high-med))

except ZeroDivisionError: # case high = low
r = med
else:
r = 0.0

return r
# doMC:
# run the MC of the cost analysis
#
def doMC(Ntrial = 1):

start = time.time()
print 'run MC with ', Ntrial, ' trials'
# now do MC simulation and calculate sums

for i in range(Ntrial):

summe = 0.0
# do MC experiments for all cost entries
for k in range(len(Gcost)):
x = riskfunc(Gcost[k], Gdown[k], Gup[k])
summe +=x

if i%(Ntrial/10) == 0:
print i, 'MC experiment processed, Summe = %10.2f' % (summe)

stop = time.time()
print 'Computing time: ', stop-start

################################################## ##################
################################################## ##################

if __name__ == '__main__':

fname_base = 'XFEL_budget-book_Master-2006-01-02_cost'

readCosts(fname_base+'.csv')

psyco.full()

n = 10000
doMC(n)

Jan 4 '06 #12

Steven D'Aprano

On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:

hi everyone,

I'm googeling since some time, but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

The closest thing to that is the following:
# original version:

for i in xrange(100000):
myObject.something.foo() # three name space lookups every loop
# inline version:

# original version:

foo = myObject.something.foo
for i in xrange(100000):
foo() # one name space lookup every loop

--
Steven.

Jan 4 '06 #13

Peter Hansen

Riko Wichmann wrote:

the difference between running with and without psyco is about a factor
3 for my MC simulation. A factor 3 I consider worthwhile, especially since it doesn't really
cost you much.
Definitely. "import psyco; psyco.full()" is pretty hard to argue
against. :-)
The final code snipplet is attached. However, it is essentially
unchanged compared to the piece I posted earlier which already had most
of the global namespace look-up removed. Taking care of sqrt and random
as you suggested didn't improve much anymore. So it's probably not that
educational afterall.

Seeing what others have achieved is always educational to the ignorant,
so I learned something. ;-)

I suspect using psyco invalidates a number of the typical Python
optimizations, and localizing global namespace lookups with default
variable assignments is probably one of them.

Thanks for posting.
-Peter

Jan 4 '06 #14

bearophileHUGS

I haven't examined the code very well, but generally I don't suggest to
use exceptions inside tight loops that that have to go fast.

Bye,
bearophile

Jan 4 '06 #15

Riko Wichmann

Peter Hansen wrote:

Seeing what others have achieved is always educational to the ignorant,
so I learned something. ;-)

Would have been even more educating, if I had my original code still at
hand for comparison, which unfortunately I didn't. But all the
improvements come from following the advises given in the URL I posted
earlier already:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Cheers,

Riko

Jan 5 '06 #16

Christopher Subich

Diez B. Roggisch wrote:

No. That is simply impossible in python as well as in java where functions
are always virtual, meaning they are looked up at runtime. Because you'd
never know _which_ code to insert of all the different foo()-methods that
might be around there.

Not quite "simply impossible" -- inlining function/method calls could
indeed be an optimization done (eventually) by PyPy or another
Psyco-like optimizer. In this case, function inlining is just something
that You Can Do if you dynamically determine that the function is a
constant object.

Since it is an optimization that makes an assumption about the constancy
of an object, this wouldn't hold true in the general case; an
interpreter which makes that dynamic optimization would need some degree
if checks to make sure that the assumption remains valid.

So it's not technically impossible, at least in the majority of cases
where functions are neither modified nor rebound, no current python
interpreter makes that assumption.

Jan 5 '06 #17

Bengt Richter

On Thu, 05 Jan 2006 08:47:37 +1100, Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:

On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:
hi everyone,

I'm googeling since some time, but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in python inline, so that the python byte compiler
does actually call the function, but sort of inserts it where the inline
call is made? Therefore avoiding the function all overhead.

The closest thing to that is the following:
# original version:

for i in xrange(100000):
myObject.something.foo() # three name space lookups every loop
# inline version:

# original version:

foo = myObject.something.foo
for i in xrange(100000):
foo() # one name space lookup every loop

But that isn't in-lining, that's hoisting some expression-evaluation
out of a loop on the assumption that the expression has constant value
and no side effects. That's good optimization, but it isn't in-lining.
Inlining is putting code to accomplish what foo does in place of the foo call.
E.g. if under simple conditions and assumptions you in-lined

def foo(x,y):return 2*x+y+1
in
a = 3*foo(b,c)

the code would be

a = 3*(2*b+c+1)

This could be achieved by a custom import function that would capture the AST
and e.g. recognize a declaration like __inline__ = foo, bar followed by defs
of foo and bar, and extracting that from the AST and modifying the rest of the
AST wherever foo and bar calls occur, and generating suitable substitutions of
suitable AST subtrees generated from the ASTs of the foo and bar ASTs and rules
for renaming and guaranteeing safe temporary names etc. The modified AST would
then pass through the rest of the process to compilation and execution for the
creation of a module. You just wouldn't be able to plain old import to do the
importing (unless you had the custom import hooked in, and assumed that sources
without __inline__ = ... statements would not occur unintended.

Without hooking, usage might look like

import inliner
mymod = inliner.__import__('mymod') # side effect to mymod.pyc and sys.modules

and then you could expect calls to designated and defined inline functions in mymod.py
to have been inlined in the code of mymod.

I've been meaning to do a proof of concept, but I've hinted at these things before,
and no one seems much interested. And besides it's really a distraction from more radical
stuff I'd like to try ;-)

Regards,
Bengt Richter

Jan 5 '06 #18

Michael Spencer

Bengt Richter wrote:
....

This could be achieved by a custom import function that would capture the AST
and e.g. recognize a declaration like __inline__ = foo, bar followed by defs
of foo and bar, and extracting that from the AST and modifying the rest of the
AST wherever foo and bar calls occur, and generating suitable substitutions of
suitable AST subtrees generated from the ASTs of the foo and bar ASTs and rules
for renaming and guaranteeing safe temporary names etc. The modified AST would
then pass through the rest of the process to compilation and execution for the
creation of a module.
I've thought about a similar approach, but I suspect that pure AST
transformations (i.e., source-code inlining) would work only in a few special
cases. General inlining would require, I guess:

* Munging the names of the in-lined local function variables, e.g., prepending
them with the name of the inlined function, and perhaps a sequence number
* Inserting a set of assignments to replicate the parameter-passing
* Replacing the return statement (there would have to be only one) with another
assignment

In order for the assignment operations to take place in an expression list, the
modifications would have to happen at the byte-code level. It may be possible
to modify compiler.pycodegen.ModuleCodeGenerator to do most of the work. Or
perhaps the new pypy compiler might be more amenable to experimenting - I
haven't looked at it.

I've been meaning to do a proof of concept, but I've hinted at these things before,
and no one seems much interested. And besides it's really a distraction from more radical
stuff I'd like to try ;-)

Regards,
Bengt Richter

Might be fun to try

Michael

Jan 6 '06 #19

Xavier Morel

Peter Hansen wrote:

Riko, any chance you could post the final code and a bit more detail on
exactly how much Psyco contributed to the speedup? The former would be
educational for all of us, while I'm personally very curious about the
latter because my limited attempts to use Psyco in the past have
resulted in speedups on the order of only 20% or so. (I blame my
particular application, not Psyco per se, but I'd be happy to see a
real-world case where Psyco gave a much bigger boost.)

Thanks,
-Peter

Someone I know created an application to compute Markus Lyapunov
fractals (aka heavy mathematical computations) (he pretty much did it to
learn Python).

Last time I checked, his code ran in roughly 3 minutes (179s) on my box
(Athlon64/3000+) without psyco and 46 seconds with psyco enabled under
Windows 2000.

Someone else got respectively 2mn34s and 13s (without and with psyco) on
a Linux box with an Athlon XP 2600+ (same frequency as my 3200+ btw, 2GHz).

My tests show a 74% speedup, and the Linux test shows a 91% speedup.

In any case, the gain is significant because the actual code is very
short (less than 200 lines, and the algorithm itself fits in under 50
lines) and is called very often (from my notes, the main function is
called 160000 times during the computation of the fractal)

Jan 6 '06 #20

inline function call

Similar topics