inline function call

hi everyone,

I've been googling for some time but can't find an answer - maybe because
the answer is 'No!'.

Can I call a function in Python inline, so that the Python bytecode compiler
doesn't actually call the function, but sort of inserts it where the inline
call is made, thereby avoiding the function call overhead?

Thanks and cheers,

Riko
Jan 4 '06 #1
I don't think it exists, but it should.

You could also roll your own using some templating library.

Jan 4 '06 #2
Riko Wichmann wrote:
> Can I call a function in Python inline, so that the Python bytecode compiler
> doesn't actually call the function, but sort of inserts it where the inline
> call is made, thereby avoiding the function call overhead?


No. That is simply impossible in Python, as well as in Java, where functions
are always virtual, meaning they are looked up at runtime. You'd never know
_which_ code to insert out of all the different foo() methods that might be
around.
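
To illustrate the "looked up at runtime" point, here is a minimal sketch in current Python 3 (not from the thread; the names `Model`, `cost` and `total` are made up for the example). The caller's code never changes, yet which body runs depends on what the name is bound to at call time:

```python
class Model:
    def cost(self):
        return 1.0


def total(model):
    # `cost` is looked up on the object each time this line executes.
    return model.cost() + model.cost()


m = Model()
before = total(m)               # uses the original method

# Rebind the method at runtime; `total` itself is left untouched.
Model.cost = lambda self: 10.0
after = total(m)                # now picks up the replacement

print(before, after)
```

A compiler that had pasted the original body of `cost` into `total` ahead of time would have produced the wrong answer for the second call.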

Do you have an actual use case for that? I mean, do you have code that runs
slowly, but would be embarrassingly faster with the code inlined?

Regards,

Diez
Jan 4 '06 #3
> Do you have an actual use-case for that? I mean, do you have code that runs
> slow, but with inlined code embarrassingly faster?


Well, I guess it would not actually be embarrassingly faster. From
trying various things and actually copying the function code into the
doMC routine, I estimate I'd get about a 15-20% reduction in execution
time. It ran very slowly in the beginning, but after applying some other
'fastpython' techniques it's actually quite fast ....

'inlining' is mostly a matter of curiosity now :)

Here is the code snippet:

-----------------------------------------------------------------

[... cut out some stuff here ....]
# riskfunc(med, low, high):
# risk function for costs: triangular distribution
# implemented according to:
# http://www.brighton-webs.co.uk/distr...triangular.asp
def riskfunc(med, low, high):
    if med != 0.0:
        u = random()
        try:
            if u <= (med-low)/(high-low):
                r = low+sqrt(u*(high-low)*(med-low))
            else:
                r = high - sqrt((1.0-u)*(high-low)*(high-med))

        except ZeroDivisionError: # case high = low
            r = med
    else:
        r = 0.0

    return r

# doMC:
# run the MC of the cost analysis
#
def doMC(Ntrial = 1):

    from math import sqrt

    start = time.time()
    print 'run MC with ', Ntrial, ' trials'

    # start with a defined seed for reproducibility

    total = 0.0

    for i in range(Ntrial):

        summe = 0.0
        for k in range(len(Gcost)):

            x = riskfunc(Gcost[k], Gdown[k], Gup[k])
            summe += x

        # store the value 'summe' for later usage
        # ..... more code here

    print "Summe : ", summe
    stop = time.time()
    print 'Computing time: ', stop-start

####################################################################
####################################################################

if __name__ == '__main__':

    n = 100000
    doMC(n)
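
For readers wondering what "copying the function code into the doMC routine" amounts to, here is a rough sketch in current Python 3. It is not the poster's actual code: the helper names `total_with_call` and `total_inlined`, the explicit `rng` argument and the sample data are assumptions made for the example.

```python
import random
from math import sqrt


def riskfunc(med, low, high, rng):
    """Triangular sample, same logic as the posted function."""
    if med == 0.0:
        return 0.0
    u = rng.random()
    try:
        if u <= (med - low) / (high - low):
            return low + sqrt(u * (high - low) * (med - low))
        return high - sqrt((1.0 - u) * (high - low) * (high - med))
    except ZeroDivisionError:  # case high == low
        return med


def total_with_call(costs, lows, highs, rng):
    total = 0.0
    for k in range(len(costs)):
        total += riskfunc(costs[k], lows[k], highs[k], rng)
    return total


def total_inlined(costs, lows, highs, rng):
    """Same computation with the function body pasted into the loop."""
    total = 0.0
    for k in range(len(costs)):
        med, low, high = costs[k], lows[k], highs[k]
        if med == 0.0:
            continue
        u = rng.random()
        try:
            if u <= (med - low) / (high - low):
                total += low + sqrt(u * (high - low) * (med - low))
            else:
                total += high - sqrt((1.0 - u) * (high - low) * (high - med))
        except ZeroDivisionError:  # case high == low
            total += med
    return total


# Made-up cost table: one zero entry and one degenerate (low == high) entry.
costs = [10.0, 0.0, 5.0, 3.0]
lows = [8.0, 0.0, 5.0, 1.0]
highs = [15.0, 0.0, 5.0, 4.0]

a = total_with_call(costs, lows, highs, random.Random(1))
b = total_inlined(costs, lows, highs, random.Random(1))
print(a == b)
```

Both versions draw the same random numbers in the same order, so with equal seeds they return the same sum; the inlined one simply saves one Python-level call per cost entry, at the price of duplicated code.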
Jan 4 '06 #4
Riko Wichmann wrote:
> hi everyone,
>
> I've been googling for some time but can't find an answer - maybe because
> the answer is 'No!'.
>
> Can I call a function in Python inline, so that the Python bytecode compiler
> doesn't actually call the function, but sort of inserts it where the inline
> call is made, thereby avoiding the function call overhead?


The canonical answer is "you probably don't need to do that."

If you're still set on inlining functions, take a look at bytecodehacks:
http://bytecodehacks.sourceforge.net/
Jan 4 '06 #5
Riko Wichmann wrote:
> Can I call a function in Python inline, so that the Python bytecode compiler
> doesn't actually call the function, but sort of inserts it where the inline
> call is made, thereby avoiding the function call overhead?


I know a simple technique that should basically avoid the function
call overhead that might be worrying you, but it's really not suitable
for use except in the case of a really serious bottleneck. How bad is the
performance in the particular case that concerns you? What kinds of
timing measurements have you got that show the problem?

(The technique is basically a particular way of using a generator...
should be obvious and easy to figure out if it's really the "function
call overhead" that is your bottleneck.)

-Peter

Jan 4 '06 #6
Riko Wichmann wrote:
> def riskfunc(med, low, high):
>     if med != 0.0:
>         u = random()
>         try:
>             if u <= (med-low)/(high-low):
>                 r = low+sqrt(u*(high-low)*(med-low))
>             else:
>                 r = high - sqrt((1.0-u)*(high-low)*(high-med))
>
>         except ZeroDivisionError: # case high = low
>             r = med
>     else:
>         r = 0.0
>
>     return r


Since math.sqrt() is in C, the overhead of the sqrt() call is probably
minor, but the lookup is having to go to the global namespace which is a
tiny step beyond looking just at the locals. Using a default argument
to get a local could make a small difference. That is do this instead:

def riskfunc(med, low, high, sqrt=math.sqrt):

Same thing with calling random(), which is doing a global lookup first
to find the function.
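
A possible full version of that suggestion, written as a sketch for current Python 3 (the thread's code is Python 2). The alias `random_module` is only there so the default value can be spelled unambiguously; it is not from the original post.

```python
import math
import random as random_module


def riskfunc(med, low, high, sqrt=math.sqrt, random=random_module.random):
    # `sqrt` and `random` are bound once, when the function is defined,
    # and are plain local names inside the body. No global lookup is
    # needed each time they are used.
    if med == 0.0:
        return 0.0
    u = random()
    try:
        if u <= (med - low) / (high - low):
            return low + sqrt(u * (high - low) * (med - low))
        return high - sqrt((1.0 - u) * (high - low) * (high - med))
    except ZeroDivisionError:  # case high == low
        return med
```

Callers still write `riskfunc(med, low, high)`; the two extra parameters exist only to carry the pre-resolved functions. The downside is that a caller can accidentally override them by passing extra positional arguments.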

By the way, you'll get better timing information by learning to use the
timeit module. Among other things, depending on your platform and how
long the entire loop takes to run, using time.time() could be giving you
pretty coarse results.
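
As a rough illustration of the timeit suggestion (a sketch in current Python 3; the two toy functions and the repeat counts are arbitrary choices for the example, not measurements from the thread):

```python
import timeit
from math import sqrt


def via_global(n=1000):
    total = 0.0
    for i in range(n):
        total += sqrt(i)          # `sqrt` is found in the module namespace
    return total


def via_local(n=1000, sqrt=sqrt):
    total = 0.0
    for i in range(n):
        total += sqrt(i)          # `sqrt` is a local name here
    return total


# timeit.repeat runs each callable `number` times per trial; taking the
# minimum over the trials reduces noise from other processes.
t_global = min(timeit.repeat(via_global, number=50, repeat=3))
t_local = min(timeit.repeat(via_local, number=50, repeat=3))
print("global lookup: %.4f s, local lookup: %.4f s" % (t_global, t_local))
```

The absolute numbers, and even which variant wins, depend on the interpreter version and machine, which is exactly why measuring beats guessing.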

You might also try precalculating high-low and storing it in a temporary
variable to avoid the duplicate calculations.
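
A sketch of that change, again in current Python 3; the variable name `span` is an arbitrary choice:

```python
from math import sqrt
from random import random, seed


def riskfunc(med, low, high):
    if med == 0.0:
        return 0.0
    u = random()
    span = high - low             # computed once, reused below
    try:
        if u <= (med - low) / span:
            return low + sqrt(u * span * (med - low))
        return high - sqrt((1.0 - u) * span * (high - med))
    except ZeroDivisionError:     # case high == low
        return med
```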

-Peter

Jan 4 '06 #7
On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:
> I've been googling for some time but can't find an answer - maybe because
> the answer is 'No!'.
>
> Can I call a function in Python inline, so that the Python bytecode compiler
> doesn't actually call the function, but sort of inserts it where the inline
> call is made, thereby avoiding the function call overhead?


In standard Python, the answer is no. The reason is that all Python
functions are effectively "virtual", and you don't know *which* version to
inline.

HOWEVER, there is a slick product called Psyco:

http://psyco.sourceforge.net/

which gets around this by creating multiple versions of functions which
contain inlined (or compiled) code. For instance, if foo(a,b) is often
called with a and b of int type, then a special version of foo is compiled
that is equivalent performance-wise to foo(int a,int b). Dynamically
finding the correct version of foo at runtime is no slower than normal
dynamic calls, so the result is a very fast foo function. The only
tradeoff is that every specialized version of foo eats memory. Psyco
provides controls allowing you to specialize only those functions that
need it after profiling your application.
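
A minimal sketch of that selective use. Psyco was only ever released for Python 2, so the import is guarded and the code falls back to the plain function elsewhere; `hot_loop` is a made-up stand-in for whatever profiling identifies as the bottleneck.

```python
def hot_loop(n):
    # Stand-in for a numeric inner loop found by profiling.
    total = 0
    for i in range(n):
        total += i * i
    return total


try:
    import psyco                  # not available on modern interpreters
except ImportError:
    psyco = None

if psyco is not None:
    # Specialize only this one function instead of calling psyco.full().
    psyco.bind(hot_loop)

print(hot_loop(1000))
```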

--
Stuart D. Gathman <st****@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

Jan 4 '06 #8
Hey guys,

thanks for all the quick replies! In addition to the tips Peter and
Stuart gave me above, I also followed some of the hints found under

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

That greatly improved performance from about 3 minutes initially (inner
loop about 2000, outer loop about 10000 runs - I think) down to a few
seconds. My question on the inline function call was triggered by the 3
minute run on a Monte Carlo sample with pretty small statistics (10000
events). Now, I'm much more relaxed! :)

One of the biggest improvements in addition to using psyco was actually
being careful about avoiding global namespace lookup.

So, thanks again, I learned a great deal about efficient python coding
today! :)

Cheers,

Riko
Jan 4 '06 #9
Riko Wichmann wrote:
> That greatly improved performance from about 3 minutes initially (inner
> loop about 2000, outer loop about 10000 runs - I think) down to a few
> seconds. My question on the inline function call was triggered by the 3
> minute run on a pretty small statistic (10000 events) Monte Carlo
> sample. Now, I'm much more relaxed! :)
>
> One of the biggest improvements in addition to using psyco was actually
> being careful about avoiding global namespace lookup.


Riko, any chance you could post the final code and a bit more detail on
exactly how much Psyco contributed to the speedup? The former would be
educational for all of us, while I'm personally very curious about the
latter because my limited attempts to use Psyco in the past have
resulted in speedups on the order of only 20% or so. (I blame my
particular application, not Psyco per se, but I'd be happy to see a
real-world case where Psyco gave a much bigger boost.)

Thanks,
-Peter

Jan 4 '06 #10
Peter Hansen wrote:
> but I'd be happy to see a real-world case where Psyco gave
> a much bigger boost.

Psyco can be very fast, but:
- the program has to be the right one;
- you have to use "low level" programming, programming more like in C,
avoiding most of the nice things Python has, like list generators,
etc.;
- you can try to use array.array (for floats and ints), I have found
that sometimes Psyco can use them in a very fast way.
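
To give an idea of the "low level" style meant here, a small sketch using array.array with an explicit index loop (the function name and data are invented for the example, and it is written so that it also runs on current Python 3, where it brings no speed benefit by itself):

```python
from array import array


def sum_of_squares(values):
    # C-like loop: explicit index, no comprehensions or generators.
    total = 0.0
    i = 0
    n = len(values)
    while i < n:
        v = values[i]
        total += v * v
        i += 1
    return total


data = array('d', [1.0, 2.0, 3.0])   # 'd' = C double, one machine type per array
print(sum_of_squares(data))
```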

This is an example of mine, it's not really a real-world case, it looks
like C, but it shows the difference, if you switch off Psyco you can
see that it goes much slower:
http://shootout.alioth.debian.org/gp...ang=psyco&id=0

This Python version is MUCH slower, but it looks more like Python:
http://shootout.alioth.debian.org/gp...ng=python&id=0
If you try Psyco with this version you can see that it's much slower
than the other one.

This is the summary page, the Python version is about 54 times slower
than the Psyco version:
http://shootout.alioth.debian.org/gp...nkuch&lang=all

More info:
http://psyco.sourceforge.net/psycoguide/node29.html

Bye,
bearophile

Jan 4 '06 #11
Hi Peter,

> Riko, any chance you could post the final code and a bit more detail on
> exactly how much Psyco contributed to the speedup? The former would be
> educational for all of us, while I'm personally very curious about the
> latter because my limited attempts to use Psyco in the past have
> resulted in speedups on the order of only 20% or so. (I blame my
> particular application, not Psyco per se, but I'd be happy to see a
> real-world case where Psyco gave a much bigger boost.)


The difference between running with and without psyco is about a factor
of 3 for my MC simulation. Without psyco the simulation runs for 62 sec,
with it for 19 sec (still using time instead of timeit, though! :) This
is for about 2300 and 10000 iterations of the inner and outer loop,
respectively.

A factor of 3 I consider worthwhile, especially since it doesn't really
cost you much.

This is on a Dell Lat D600 running Linux (Ubuntu 5.10) with a 1.6 GHz
Pentium M and 512 MB of RAM and python2.4.

The final code snippet is attached. However, it is essentially
unchanged compared to the piece I posted earlier, which already had most
of the global namespace lookups removed. Taking care of sqrt and random
as you suggested didn't improve things much further. So it's probably not
that educational after all.

Cheers,

Riko

-----------------------------------------------------
# import some modules
import string
import time
from math import sqrt

# accelerate:
import psyco

# random number init
from random import random, seed
seed(1)

# riskfunc(med, low, high):
# risk function for costs: triangular distribution
# implemented according to:
# http://www.brighton-webs.co.uk/distr...triangular.asp
def riskfunc(med, low, high):
    if med != 0.0:
        u = random()
        try:
            if u <= (med-low)/(high-low):
                r = low+sqrt(u*(high-low)*(med-low))
            else:
                r = high - sqrt((1.0-u)*(high-low)*(high-med))

        except ZeroDivisionError: # case high = low
            r = med
    else:
        r = 0.0

    return r

# doMC:
# run the MC of the cost analysis
#
def doMC(Ntrial = 1):

    start = time.time()
    print 'run MC with ', Ntrial, ' trials'
    # now do MC simulation and calculate sums

    for i in range(Ntrial):

        summe = 0.0
        # do MC experiments for all cost entries
        for k in range(len(Gcost)):
            x = riskfunc(Gcost[k], Gdown[k], Gup[k])
            summe += x

        if i%(Ntrial/10) == 0:
            print i, 'MC experiment processed, Summe = %10.2f' % (summe)

    stop = time.time()
    print 'Computing time: ', stop-start

####################################################################
####################################################################

if __name__ == '__main__':

    fname_base = 'XFEL_budget-book_Master-2006-01-02_cost'

    readCosts(fname_base+'.csv')

    psyco.full()

    n = 10000
    doMC(n)
Jan 4 '06 #12
On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:
> hi everyone,
>
> I've been googling for some time but can't find an answer - maybe because
> the answer is 'No!'.
>
> Can I call a function in Python inline, so that the Python bytecode compiler
> doesn't actually call the function, but sort of inserts it where the inline
> call is made, thereby avoiding the function call overhead?


The closest thing to that is the following:

# original version:

for i in xrange(100000):
    myObject.something.foo() # three name space lookups every loop

# inline version:

foo = myObject.something.foo
for i in xrange(100000):
    foo() # one name space lookup every loop
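
A self-contained variant of the same idea for current Python 3, with made-up classes `Something` and `MyObject` standing in for the objects above so that it can actually be run:

```python
class Something:
    def __init__(self):
        self.calls = 0

    def foo(self):
        self.calls += 1


class MyObject:
    def __init__(self):
        self.something = Something()


my_object = MyObject()

# Every iteration resolves `my_object`, `.something` and `.foo` again.
for i in range(1000):
    my_object.something.foo()

# Resolve the bound method once, then call it through a single name.
foo = my_object.something.foo
for i in range(1000):
    foo()

print(my_object.something.calls)
```

This only gives the same result as long as nothing rebinds `something` or `foo` while the loop is running.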

--
Steven.

Jan 4 '06 #13
Riko Wichmann wrote:
> the difference between running with and without psyco is about a factor
> 3 for my MC simulation.
> A factor 3 I consider worthwhile, especially since it doesn't really
> cost you much.

Definitely. "import psyco; psyco.full()" is pretty hard to argue
against. :-)

> The final code snippet is attached. However, it is essentially
> unchanged compared to the piece I posted earlier which already had most
> of the global namespace look-up removed. Taking care of sqrt and random
> as you suggested didn't improve much anymore. So it's probably not that
> educational after all.


Seeing what others have achieved is always educational to the ignorant,
so I learned something. ;-)

I suspect using psyco invalidates a number of the typical Python
optimizations, and localizing global namespace lookups with default
variable assignments is probably one of them.

Thanks for posting.
-Peter

Jan 4 '06 #14
I haven't examined the code very closely, but generally I don't recommend
using exceptions inside tight loops that have to go fast.
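
Applied to the riskfunc from this thread, that could look like the following sketch (current Python 3; the name `riskfunc_no_except` is made up). Whether it is actually faster depends on the interpreter and on how often low equals high, so it is worth timing both variants.

```python
from math import sqrt
from random import random, seed


def riskfunc_no_except(med, low, high):
    if med == 0.0:
        return 0.0
    u = random()                  # drawn first, as in the original function
    span = high - low
    if span == 0.0:               # case high == low, tested explicitly
        return med
    if u <= (med - low) / span:
        return low + sqrt(u * span * (med - low))
    return high - sqrt((1.0 - u) * span * (high - med))
```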

Bye,
bearophile

Jan 4 '06 #15
Peter Hansen wrote:
> Seeing what others have achieved is always educational to the ignorant,
> so I learned something. ;-)


It would have been even more educational if I still had my original code at
hand for comparison, which unfortunately I don't. But all the
improvements come from following the advice given at the URL I posted
earlier:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Cheers,

Riko
Jan 5 '06 #16
Diez B. Roggisch wrote:
> No. That is simply impossible in python as well as in java where functions
> are always virtual, meaning they are looked up at runtime. Because you'd
> never know _which_ code to insert of all the different foo()-methods that
> might be around there.


Not quite "simply impossible" -- inlining function/method calls could
indeed be an optimization done (eventually) by PyPy or another
Psyco-like optimizer. In this case, function inlining is just something
that You Can Do if you dynamically determine that the function is a
constant object.

Since it is an optimization that makes an assumption about the constancy
of an object, this wouldn't hold true in the general case; an
interpreter which makes that dynamic optimization would need some degree
of checks to make sure that the assumption remains valid.

So it's not technically impossible, at least in the majority of cases
where functions are neither modified nor rebound. However, no current
Python interpreter makes that assumption.
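
Written out by hand, such a checked assumption could look roughly like this. It is only a sketch of the idea in current Python 3, not how PyPy or Psyco work internally; the `functions` dict is a hypothetical stand-in for a module namespace, and `foo` is an arbitrary small function.

```python
def foo(x, y):
    return 2 * x + y + 1


functions = {"foo": foo}          # stand-in for the namespace `foo` lives in
_ORIGINAL_FOO = foo               # the object the inlined code was derived from


def compute(b, c):
    if functions["foo"] is _ORIGINAL_FOO:
        # Guard passed: the function is unchanged, use the inlined body.
        return 3 * (2 * b + c + 1)
    # Guard failed: somebody rebound foo, fall back to a real call.
    return 3 * functions["foo"](b, c)


fast = compute(1, 2)              # takes the inlined path
functions["foo"] = lambda x, y: 0
slow = compute(1, 2)              # takes the fallback path
print(fast, slow)
```

The guard is a single identity comparison, which is much cheaper than a call, but it has to be there for the result to stay correct after a rebind.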
Jan 5 '06 #17
On Thu, 05 Jan 2006 08:47:37 +1100, Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
> On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:
>> hi everyone,
>>
>> I've been googling for some time but can't find an answer - maybe because
>> the answer is 'No!'.
>>
>> Can I call a function in Python inline, so that the Python bytecode compiler
>> doesn't actually call the function, but sort of inserts it where the inline
>> call is made, thereby avoiding the function call overhead?
>
> The closest thing to that is the following:
>
> # original version:
>
> for i in xrange(100000):
>     myObject.something.foo() # three name space lookups every loop
>
> # inline version:
>
> foo = myObject.something.foo
> for i in xrange(100000):
>     foo() # one name space lookup every loop

But that isn't in-lining, that's hoisting some expression-evaluation
out of a loop on the assumption that the expression has constant value
and no side effects. That's good optimization, but it isn't in-lining.
Inlining is putting code to accomplish what foo does in place of the foo call.
E.g. if under simple conditions and assumptions you in-lined

def foo(x,y):return 2*x+y+1
in
a = 3*foo(b,c)

the code would be

a = 3*(2*b+c+1)

This could be achieved by a custom import function that captures the AST
and recognizes a declaration such as __inline__ = foo, bar followed by the
defs of foo and bar. It would extract those defs from the AST and, wherever
calls to foo and bar occur in the rest of the AST, substitute suitable
subtrees generated from the ASTs of foo and bar, with rules for renaming and
for guaranteeing safe temporary names etc. The modified AST would then pass
through the rest of the process to compilation and execution for the
creation of a module. You just wouldn't be able to use plain old import to
do the importing (unless you had the custom import hooked in, and assumed
that no source would contain an __inline__ = ... statement unintentionally).

Without hooking, usage might look like

import inliner
mymod = inliner.__import__('mymod') # side effect to mymod.pyc and sys.modules

and then you could expect calls to designated and defined inline functions in mymod.py
to have been inlined in the code of mymod.
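
For the curious, a toy version of the AST idea can be written with the `ast` module of current Python 3 (3.8 or later; the 2006 tooling was different). This is an illustrative sketch, not the proposed `inliner` module: the names `ExprInliner` and `inline_module` are invented, the functions to inline are passed in explicitly instead of being read from an `__inline__` declaration, and only functions whose whole body is a single `return <expression>` are handled.

```python
import ast
import copy


class _Substitute(ast.NodeTransformer):
    """Replace parameter names with copies of the argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class ExprInliner(ast.NodeTransformer):
    """Inline calls to functions whose whole body is `return <expression>`."""

    def __init__(self, names):
        self.names = set(names)
        self.templates = {}       # name -> (parameter names, body expression)

    def collect(self, tree):
        for node in tree.body:
            if not (isinstance(node, ast.FunctionDef) and node.name in self.names):
                continue
            a = node.args
            simple = not (a.vararg or a.kwarg or a.kwonlyargs
                          or a.defaults or a.posonlyargs)
            body = node.body
            if (simple and len(body) == 1 and isinstance(body[0], ast.Return)
                    and body[0].value is not None):
                self.templates[node.name] = ([p.arg for p in a.args], body[0].value)

    def visit_Call(self, node):
        self.generic_visit(node)  # handle calls nested in the arguments first
        func = node.func
        if not (isinstance(func, ast.Name) and func.id in self.templates):
            return node
        params, expr = self.templates[func.id]
        plain = not node.keywords and not any(
            isinstance(arg, ast.Starred) for arg in node.args)
        if not plain or len(node.args) != len(params):
            return node           # leave anything unusual as a normal call
        mapping = dict(zip(params, node.args))
        return _Substitute(mapping).visit(copy.deepcopy(expr))


def inline_module(source, names):
    """Parse `source`, inline the named functions, return the module namespace."""
    tree = ast.parse(source)
    inliner = ExprInliner(names)
    inliner.collect(tree)
    tree = inliner.visit(tree)
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<inlined>", "exec"), namespace)
    return namespace


SOURCE = '''
def foo(x, y):
    return 2*x + y + 1

def compute(b, c):
    return 3 * foo(b, c)
'''

module = inline_module(SOURCE, ["foo"])
print(module["compute"](1, 2))
```

The sketch ignores everything that makes real inlining hard: an argument expression is duplicated if its parameter is used more than once (wrong when the argument has side effects), a later rebinding of `foo` is silently ignored, and names inside lambdas or comprehensions are not treated specially.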

I've been meaning to do a proof of concept, but I've hinted at these things before,
and no one seems much interested. And besides it's really a distraction from more radical
stuff I'd like to try ;-)

Regards,
Bengt Richter
Jan 5 '06 #18
Bengt Richter wrote:
> ....
>
> This could be achieved by a custom import function that would capture the AST
> and e.g. recognize a declaration like __inline__ = foo, bar followed by defs
> of foo and bar, and extracting that from the AST and modifying the rest of the
> AST wherever foo and bar calls occur, and generating suitable substitutions of
> suitable AST subtrees generated from the ASTs of the foo and bar ASTs and rules
> for renaming and guaranteeing safe temporary names etc. The modified AST would
> then pass through the rest of the process to compilation and execution for the
> creation of a module.

I've thought about a similar approach, but I suspect that pure AST
transformations (i.e., source-code inlining) would work only in a few special
cases. General inlining would require, I guess:

* Munging the names of the in-lined local function variables, e.g., prepending
them with the name of the inlined function, and perhaps a sequence number
* Inserting a set of assignments to replicate the parameter-passing
* Replacing the return statement (there would have to be only one) with another
assignment

In order for the assignment operations to take place in an expression list, the
modifications would have to happen at the byte-code level. It may be possible
to modify compiler.pycodegen.ModuleCodeGenerator to do most of the work. Or
perhaps the new pypy compiler might be more amenable to experimenting - I
haven't looked at it.
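
Done by hand at the source level, the three steps listed above might look like this for a small two-statement function (a sketch in current Python 3; `foo`, `caller` and the `foo_1_` prefix are made up for the example):

```python
def foo(x, y):
    t = 2 * x
    return t + y + 1


def caller(b, c):
    return 3 * foo(b, c)


def caller_inlined(b, c):
    # Parameter passing replicated as assignments to munged names
    # (function name plus a sequence number as prefix).
    foo_1_x = b
    foo_1_y = c
    # The callee's local variable `t`, renamed the same way.
    foo_1_t = 2 * foo_1_x
    # The single return statement turned into another assignment.
    foo_1_result = foo_1_t + foo_1_y + 1
    return 3 * foo_1_result
```

Note that the inlined statements had to be hoisted in front of the expression that used the call, which is the difficulty mentioned above: statements cannot be placed in the middle of an expression at the source level.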

> I've been meaning to do a proof of concept, but I've hinted at these things before,
> and no one seems much interested. And besides it's really a distraction from more radical
> stuff I'd like to try ;-)
>
> Regards,
> Bengt Richter


Might be fun to try

Michael

Jan 6 '06 #19
Peter Hansen wrote:
> Riko, any chance you could post the final code and a bit more detail on
> exactly how much Psyco contributed to the speedup? The former would be
> educational for all of us, while I'm personally very curious about the
> latter because my limited attempts to use Psyco in the past have
> resulted in speedups on the order of only 20% or so. (I blame my
> particular application, not Psyco per se, but I'd be happy to see a
> real-world case where Psyco gave a much bigger boost.)
>
> Thanks,
> -Peter


Someone I know created an application to compute Markus-Lyapunov
fractals (i.e. heavy mathematical computations); he pretty much did it to
learn Python.

Last time I checked, his code ran in roughly 3 minutes (179 s) on my box
(Athlon64/3000+) without psyco and in 46 seconds with psyco enabled, under
Windows 2000.

Someone else got 2 min 34 s and 13 s respectively (without and with psyco) on
a Linux box with an Athlon XP 2600+ (same frequency as my 3200+ btw, 2GHz).

My tests show a 74% reduction in run time (about 3.9x faster), and the Linux
test shows a reduction of over 91% (about 11.8x faster).
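
The percentages are reductions in run time, (before - after) / before. A small sketch that recomputes them from the timings quoted in this thread (the function name `summarize` and the labels are made up):

```python
def summarize(before_s, after_s):
    reduction = (before_s - after_s) / before_s   # fraction of run time saved
    factor = before_s / after_s                   # how many times faster
    return reduction, factor


# (seconds without psyco, seconds with psyco), as reported in the thread
timings = {
    "fractal, Windows 2000": (179, 46),
    "fractal, Linux": (2 * 60 + 34, 13),
    "Riko's Monte Carlo run": (62, 19),
}

for label, (before, after) in timings.items():
    reduction, factor = summarize(before, after)
    print("%s: %.1f%% less run time, %.1fx faster" % (label, 100 * reduction, factor))
```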

In any case, the gain is significant because the actual code is very
short (less than 200 lines, and the algorithm itself fits in under 50
lines) and is called very often (from my notes, the main function is
called 160000 times during the computation of the fractal)
Jan 6 '06 #20
