An inline function will increase the size of the program, as every time an inline function is called the whole function body is copied in at the point of the call.
The requirement is for the fastest method, but it should also be an optimised one.
The fastest way is the inline function. There are no calls. No stack frames to allocate and deallocate.
10000 copies of the function body laid out one after another along the thread of execution is the fastest way.
But there's bloat. Next best is an inline function inside a loop.
Next best is a function call inside a loop.
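
To make the last two options concrete, here is a rough sketch of an inline function inside a loop next to a real function call inside a loop. The names (add_square, add_square_call, sum_inline, sum_call) and the NOINLINE macro are made up for illustration, and the no-inline attributes are compiler-specific (GCC/Clang and MSVC spellings are assumed here). Keep in mind that the inline keyword is only a request; the compiler makes the final decision.

```cpp
#include <cstdint>

// Assumption: compiler-specific way to forbid inlining, used only so the
// second variant really performs a call on every iteration.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Small function the compiler may expand in place: no call, no stack frame.
inline std::uint64_t add_square(std::uint64_t acc, std::uint64_t i) {
    return acc + i * i;
}

// Same body, but kept as a real function so each use is an actual call.
NOINLINE std::uint64_t add_square_call(std::uint64_t acc, std::uint64_t i) {
    return acc + i * i;
}

// Inline function inside a loop: one copy of the body, no per-iteration call.
std::uint64_t sum_inline(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = add_square(acc, i);
    }
    return acc;
}

// Function call inside a loop: one copy of the body, one call per iteration.
std::uint64_t sum_call(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = add_square_call(acc, i);
    }
    return acc;
}
```

Both sum_inline(10000) and sum_call(10000) return the same value; only the generated code differs. Which one is actually faster on a given machine is something to measure with optimisation turned on rather than assume.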