
How to force 'inline' with GCC or ICC

P: n/a
Hello

I have a program with many, many inlined template functions.
It is essential for execution speed that every (or almost every)
function marked as inline really is inlined by the compiler.

I already compiled the program with the Intel compiler (ICL) under Visual C++, and
it works fine and fast. I verified that the functions are really inlined.

But with GCC 3.4 (Linux and Cygwin) or ICC (Linux), the same program is about 5
times slower than under Windows.
The '-Winline' option of GCC shows me that many functions are not inlined
as they should be. The compiler treats the 'inline' keyword as advice
but does not follow it.
I have tried setting various GCC options, but nothing has been satisfactory so far:
-finline-limit=100000000
--param large-function-growth
--param max-inline-insns-single
....

Does anyone have suggestions on how to force GCC/ICC to obey, or how to raise the
limits that these compilers have internally?


Jul 23 '05 #1
43 Replies


P: n/a

15.05.2005 16:25 answer to:
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote:

Inlining is generally a rather tricky business. The keyword is simply a
hint and not mandatory. Whether the compiler decides to inline the code
depends not only on the compiler switches that you issue, but also on the
code itself. Virtual functions, for example, may or may not be
inlined, depending on whether the compiler can unambiguously identify the
"real" object type. The general rule is that polymorphism must work and all
optimizations must take second place to that. There are more issues (e.g.
recursion) affecting the decision to inline code. Perhaps you could give
some example code?

BTW, check if the debug & the release version of your code show the same
inlining-behavior.

Cheers
Chris

Jul 23 '05 #2

P: n/a

Did you actually supply a value to the --param arguments? Otherwise you
probably set them to zero. With -Winline g++ 3.4 tells me exactly which
parameter is exceeded.

p
Jul 23 '05 #3

P: n/a
> Did you actually supply a value to the --param arguments? Otherwise you probably set them to zero. With -Winline g++ 3.4 tells me exactly which parameter is exceeded.


I only listed the parameter names in my post.

Yes, I did supply a value (in fact I tried many values, most of the time big
values).

But GCC still doesn't inline many important functions, whereas ICL on
Windows does.

You are right, GCC tells which parameter is exceeded, so I always supplied a
bigger value for every corresponding parameter (up to astronomical values). But
it did not work: a few more functions were inlined, but the execution speed
is still very much slower than on Windows.

Is there no way to force inlining?

Pat
Jul 23 '05 #4

P: n/a
Patrick Laurent wrote:
Is there no way to force inlining?


__forceinline (instead of inline) on Windows.

Jul 23 '05 #5

P: n/a
> > Is there no way to force inlining?
>
> __forceinline (instead of inline) on Windows.

That's not my question.
I have no problem with ICL in the Visual C++ environment, and I already know
about '__forceinline' should I have a problem there.
I need to force inlining with GCC and ICC.
Jul 23 '05 #6

P: n/a
Patrick Laurent wrote:
That's not my question.
I have no problem with ICL on a visual environnement. And I already knew
'__forceinline' if I had a problem.
I need to force inlining with GCC and ICC.

I think you will find more help on this, if you consult the GCC mailing lists (and
newsgroups if such exist).
--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #7

P: n/a
I found __inline__ for GCC; it is an improvement, but it is still not
satisfactory.
I define a macro:
#define inline __inline__
Jul 23 '05 #8

P: n/a
Patrick Laurent wrote:
I found __inline__ for GCC, it is an improvement, but it is still not
satisfying.
I define a macro:
#define inline __inline__


You're more or less trying to force your compiler to do something which
it obviously has reasons not to do. In general you can trust your
compiler with these things ;-) Just because other compilers inline
some code does not necessarily mean that it can be inlined by all
compilers, because that depends very much on the whole code and the
development system, including the linker. Inlining is not as trivial as
it seems at first glance (see my previous post).

I'd suggest that you either follow Ioannis' hint and ask in a GCC-dedicated
group, or at least show us the part of your code which is not
inlined by GCC but is by other compilers.

Cheers
Chris
Jul 23 '05 #9

P: n/a
I checked: in fact __inline__ is no better.

That's the point: I obviously cannot trust GCC or ICC, because ICL on
Windows results in much quicker execution speed.
GCC and ICC do not inline functions as well as ICL.
The fact is that GCC is very, very bad in my case in comparison to ICL.

It is not possible to give a small test program. If you want to test on
your own, I suggest you download my library at the following address and compile
the following test. (No need to compile the library; it is STL-like.)
http://www.ient.rwth-aachen.de/team/...al/genial.html

The execution time on a Pentium 4, 3.2 GHz:
With ICL on Windows:
  - No SIMD: 0.368 s
  - SSE:     0.126 s
  - SSE3:    0.112 s
With GCC on Cygwin (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
  - No SIMD: 0.969 s
  - SSE:     2.069 s

#define FFT_LEVEL 32
#include "signal/fft.h"
int main()
{
    DenseVector<complex<float> >::self X(32,0);
    DenseVector<complex<float> >::self Y(X.size(),0);
    double t0=get_time();
    for (int i=0; i<1000000; ++i)
        fft(X,Y);
    cout << get_time()-t0 << endl;
}
Jul 23 '05 #10

P: n/a
Patrick Laurent wrote:
That's the point, I obviously cannot trust GCC nor ICC, because ICL on Windows results in much quicker execution speed.


You are comparing a compiler for multiple platforms and processors with a
highly specialized compiler for one processor (line). The latter is
faster. Surprise? A sports car is faster than a general-purpose van.
Surprise?

R.C.

Jul 23 '05 #11

P: n/a
I compared many other configurations (Windows/Cygwin/Linux, ICL/ICC/GCC, no
SIMD/SSE/SSE2/SSE3), but every configuration was on the same Pentium 4
3.2 GHz.
I also compiled with GCC using the Pentium-specific flags.

So please, stop defending GCC. Did you buy shares in GNU?
I found the (main) reason why it is much slower than ICL: it does not inline
the functions as much as ICL does.

Rapscallion, if I understand you correctly, it's normal that GCC is much slower
than ICL on the same system.
Maybe GCC and ICL are both good at compiling C, but ICL has a clear lead
for generic C++.

Pat

Jul 23 '05 #12

P: n/a
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message news:3e************@news.dfncis.de...
I verified, in fact __inline__ is not better.

That's the point, I obviously cannot trust GCC nor ICC, because ICL on
Windows results in much quicker execution speed.
GCC and ICC do not inline functions as good as ICL.
The fact is that GCC is very very very bad in my case in comparison to ICL.


Why are you so convinced that (lack of) inlining is responsible for the comparatively poor performance you are seeing
for GCC? There are other possible explanations (it is also quite well-known that excessive inlining can actually be
detrimental to performance - modern compilers really do know best). There are, in particular, other (particularly
floating point-related) optimisation settings for GCC that you may want to look into.

Perhaps gnu.g++.help would be a good place to ask.

Regards,

--
Lionel B

Jul 23 '05 #13

P: n/a
I am convinced that the poor performance is due to the lack of inlining
because I get slow execution speed with ICL when the functions are not
marked as 'inline'.
With the '-Winline' option of GCC, I see every function that is not inlined.

Also, the SSE mode should be much quicker than without SIMD, but it requires
much more inlining.
ICL manages it, GCC not at all. (See the speed measurements in a previous post.)
Jul 23 '05 #14

P: n/a
Patrick Laurent wrote:
> Rapscallion, if I understand you well, it's normal that GCC is much slower than ICL on the same system.

Yes, VC++ and ICL produce faster and smaller code on Windows. Is this
really surprising to you?

> Maybe GCC and ICL are both good to compile C, but ICL has a clear advance for generic C++.

A volunteer open source project vs the compiler team of a BIG company.

R.C.

Jul 23 '05 #15

P: n/a
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message news:3e************@news.dfncis.de...
> I am convicted that the poor performance is due to the lack of inlining
> because I get slow execution speed with ICL when the functions are not
> marked as 'inline'.

So that explains why inlining is crucial to the performance of the ICL compiled code :-)

> With the '-Winline' option of GCC, I see every not inlined functions.

Does ICL really inline everything you tell it to (either explicitly or via member definition in class declaration[*])? I
find that quite surprising.

[*] BTW, I note that you use the "inline" keyword within class declarations - that is
redundant, as far as I know... or is it just to generate warnings?

I also find it somewhat odd that the ICC compiler should produce much slower code for Linux than ICL does for Windows.

> Also the SSE mode should be much quicker than without SIMD, but requires
> much more inlining.
> ICL manages it, GCC not at all. (see speed measure in a previous post)


ICL does have a good reputation as an optimising compiler; I've never found gcc that fantastic for optimisation either
(on Win32 about on par with the old VC6), although it is difficult to generalise, as relative performance seems to
depend heavily on the nature of the code. As of ver 4.0 (recently released) GCC has a new optimisation framework.
Apparently new optimisations are not yet in place, but we are told to expect better optimisation in forthcoming
releases...

Regards,

--
Lionel B

Jul 23 '05 #16

P: n/a
ICL really does inline the functions.
I debugged the program in release mode (yes, it's possible), and also looked at
the assembler code. I don't understand much assembler, but it's easy to
see that the functions are really inlined.

It is also possible with ICL to use the non-standard '__forceinline'
keyword, but it does not gain much.
If such a '__forceinline' existed for GCC, it would be useful.
I use 'inline' within class declarations for test purposes; I define the
following macro for ICL:
#define inline __forceinline

I cannot explain either why ICC produces much slower code than ICL.
I do not know how to see which functions it has inlined or not.
'__forceinline' has no effect with ICC.

Pat
Jul 23 '05 #17

P: n/a
Lionel B wrote:
BTW, I note that you use the "inline" keyword within class declarations - that is
redundant, as far as I know... or is it just to generate warnings?

As far as I know it is redundant but it is also legal, no warnings should be produced.
--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #18

P: n/a
Patrick Laurent wrote:
ICL really inline the functions.
I debugged the program in release mode (yes, it's possible), and also saw
the assembler code. I don't understand much the assembler, but it's easy to
see that the functions are really inlined.

It is also possible with ICL to use the non standard '__forceinline'
keyword, but it does not bring much.
If such a '__forceinline' would exist for GCC, it would be useful.
I use the 'inline'within class declaration test purposes, I write the
following macro for ICL:
#define inline __forceinline

I cannot either explain why ICC produce much slower code than ICL.
I do not know how to see which functions it inlined or not.
The '__forceinline' has no effect with ICC.

If you want the functions to be inlined at all costs and there is no GCC switch to do that
(you should really consult GCC mailing lists and/or newsgroups for that, check
http://gcc.gnu.org for any discussion mailing list that is suitable for your subject),
then you can use the low level part of C++, macros. If you write your code with macros, it
will definitely be inlined.
--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #19

P: n/a
"Ioannis Vranos" <iv*@remove.this.grad.com> wrote in message news:1116332851.514972@athnrd02...
Lionel B wrote:
BTW, I note that you use the "inline" keyword within class declarations - that is
redundant, as far as I know... or is it just to generate warnings?


As far as I know it is redundant but it is also legal, no warnings should be produced.


Of course - I didn't express myself well: I thought it might be necessary to use the "inline" keyword to enable
reporting of non-inlining by g++ with the -Winline flag (I've no reason to suspect that that is the case, beyond the use
of "inline" by the OP).

--
Lionel B

Jul 23 '05 #20

P: n/a
I found nothing on the GCC mailing list.
I posted a mail...waiting...
Jul 23 '05 #21

P: n/a
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message
news:3e************@news.dfncis.de...
I found nothing on the GCC mailing list.
I posted a mail...waiting...


As you're obviously making use of sophisticated template mechanisms, you
should be careful comparing the results of different compilers. You should
be aware that Intel and GCC follow different approaches to how these things
are treated, and this naturally has an impact on the generated code. Hence,
as was mentioned before, the lack of speed might not necessarily be
related only to the inlining! You would have to compare the whole
generated machine code to look deeper into this.

However, the question of whether or not to inline should be left to the
compiler, as it is closely connected to the other optimizations. For example,
inlining might cause or even prevent thrashing, and the same is true for
cache misses. The decision is based on many things, e.g., does the function
call another inlined function, are there loops, does it recurse.

BTW have you tried your code with GCC 4.0 which is based on a totally
different design and also optimization approach?

Cheers
Chris
Jul 23 '05 #22

P: n/a
> As you're obviously making use of sophisticated template mechanisms you
> should be careful comparing the results of different compilers.

I agree, but I can still compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.
I don't really care how the compiler does its job, but it must give a quick
executable.
And in this case GCC is between 3 and 20 times slower.

> However, the question of to inline or not to inline should be left to the
> compiler as it is closely connected to the other optimizations.

I'd love not to be concerned with inlining and to leave this difficulty to the
compiler, but I have many reasons to think that the slowness is due to bad
inlining by GCC.

> BTW have you tried your code with GCC 4.0 which is based on a totally
> different design and also optimization approach?

I could not test with GCC 4.0 yet, because the version is very new, and our
administrator thinks (I don't know why, he must have his reasons) that it is
not stable enough.

Pat
Jul 23 '05 #23

P: n/a
Patrick Laurent wrote:
As you're obviously making use of sophisticated template mechanisms you
should be careful comparing the results of different compilers.

I agree, but I still can compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.
I don't really care how the compiler does its job, but it must give a quick
executable.
And in this case GCC is between 3 times and 20 times slower.


Ouch, then something seems to be going quite wrong here. GCC is not that bad
normally. We frequently use it here at CERN, also for speed sensitive
applications & simulations. I somehow have the impression that this
might be closely related to your code.

I just took a quick look at your code and you're swinging the keyword
inline like a hammer. Most of the non-class functions that you declare
inline won't be inlined anyway because of their structure. Inlining
those would presumably have more negative than positive effects
regarding cache misses. Furthermore, you have quite a lot of redundant
code, which could be optimized (and in most cases will be) by the
compiler. However, it strikes me as a little odd, as you're obviously very
concerned about speed.

However, the question of to inline or not to inline should be left to the
compiler as it is closely connected to the other optimizations.

I'd love not be concerned with inlining and let this difficulty to the
compiler, but I have many reasons to think that the slowness is due to bad
inlining from GCC.

BTW have you tried your code with GCC 4.0 which is based on a totally
different design and also optimization approach?


I could not test with GCC 4.0 yet, because the version is very new, and our
administrator thinks (I don't know why, he must have his reasons) that it is
not stable enough.

Pat


Okay GCC 4.0 might not be that stable, however it might be worth a shot
trying it with a local installation on your private machine.

Cheers
Chris

Jul 23 '05 #24

P: n/a
Patrick Laurent wrote:
I agree, but I still can compare the execution speed on the same computer,
with a chronometer.
The winner is the quickest.

Unless you did not express yourself well, I think that this one is original. :-)
--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #25

P: n/a
Patrick Laurent wrote:
Did you actually supply a value to the --param arguments? Otherwise
you probably set them to zero. With -Winline g++ 3.4 tells me exactly
which parameter is exceeded.


I only wrote the parameters names.

Yes,I did supply a value (in fact I tried many values, most of the time big
values).

But GCC still doesn't inline many important functions, in comparison ICL on
windows does.

You are right, GCC tells which parameter is exceeded, so I always supplied a
bigger value to every corresponding parameter (up to astronomic values). But
it did not work: a few more functions were inlined, but the execution speed
is still very much slower than on Windows.

Is there no way to force inlining?

Pat

I had a similar issue not too long ago. It was a numerical math problem
that could be parametrized with function templates. With gcc 3.4 I
cranked the parameters up (I don't recall the actual numbers or the
parameters, but I think I tried something well in the range of 100000)
until everything was inlined. It gave me a factor of ten in execution
speed. Also, make sure you change the right parameter. I am not kidding, there are lots
of parameters that have 'inline' in their name.

I have never programmed for a Windows platform, but on Linux gcc
produces very fast code in my experience, comparable in speed to the
code produced with icc or the Portland compiler.

p

Jul 23 '05 #26

P: n/a
> I just took a quick look at your code and you're swinging the keyword
> inline like a hammer. Most of the non-class functions that you declare
> inline won't be inlined anyway because of their structure. Inlining
> those would presumably have more negative than positive effects
> regarding cache misses. Furthermore, you have quite a lot of redundant
> code, which could be optimized (and in most cases will be) by the
> compiler. However, it strikes me a little odd, as your obviously very
> speed concerned.

I declared many inline functions, but they are small functions.
I admit that the combination of many functions can become quite big.
But as far as I know the 'O2' or 'O3' options consider any function (inline
or not) as potentially inlinable.
GCC might be good for C programs, but not for generic C++ (in comparison to
ICL).
If you think that my program is redundant (or bad), contributions are
welcome, but not criticism.
You could have a look at MTL, Pooma, Blitz++, Newmat: these libraries use
equivalent principles.
My philosophy has never been to adapt my code to the compilers; I want to keep my
code as clean as possible.

I had a similar issue not too long ago. It was a numerical math problem
that could be parametrized with function templates. With gcc 3.4 I
cranked the parameters up ( I don't recall the actual numbers and the
parameters, but I think I tried something well in the range of 100000)
until everything was inlined. It gave me a factor ten in execution
speed. Also change the right parameter. I am not kidding, there are lots
of parameters that have 'inline' in their name.

I have never programmed for a windows platform, but on linux gcc
produces very fast code in my experience. Comparabable in speed to the
code produced with icc or the portland compiler


I am glad to see that someone has had a similar experience to mine.
I already tried many parameters (with astronomical values as well), but nothing
was satisfactory.

In my case, ICC is about as slow as GCC; I cannot explain why it does not
compile like ICL. I strongly suspect that ICC does not inline the functions
the way I would like. But I don't know how to verify it.

Pat
Jul 23 '05 #27

P: n/a

"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message
news:3f************@news.dfncis.de...
I just took a quick look at your code and you're swinging the keyword
inline like a hammer. Most of the non-class functions that you declare
inline won't be inlined anyway because of their structure. Inlining
those would presumably have more negative than positive effects
regarding cache misses. Furthermore, you have quite a lot of redundant
code, which could be optimized (and in most cases will be) by the
compiler. However, it strikes me a little odd, as your obviously very
speed concerned.


I declared many inline functions, but there are small functions.
I admit that the combinaison of many functions can become quite big.
But as far as I know the 'O2' or 'O3' options consider any function
(inline or not) as potentially inlinable.
GCC might be good for C programs, but not for generic C++ (in comparison
to ICL).
If you think that my program is redundant (or bad), contributions are
welcome, but not critics.
You could have a look to MTL,Pooma,Blitz++,Newmat: these libraries use
equivalent princips.
My philosophy was never to adapt my code to the compilers, I want to have
my code as clean as possible.


And your philosophy is a good one, I absolutely agree on that. However, if
you really wanna drive it to the edge then you will have to resort to some
compiler-specific things at some point. The aforementioned libs make
extensive use of template metaprogramming, which is why they are fast. If I
remember correctly there should be an FFT implementation based on this
technique in Todd's Blitz++.

Anyway, first of all relax & take a deep breath. Second, if you put your
code on the web and make it public you will have to face criticism (it
happens and has happened to the best), but you should realize that criticism is
not necessarily a bad thing! Speaking of which, I never said that your code is
bad; I said that after a quick glance I saw that there are redundant
parts, and I'll stick to this statement.

In FFT for example you declare variables of value_type very often just to
assign a value of an array to it. Afterwards you call a new function with
the sum or the difference of these newly declared variables. Why not just
use the array itself? Of course, the compiler can and most certainly will
optimize this away after data-flow analysis, but you're making the life of
the compiler harder than necessary.

Regarding potential inlining candidates, I'll have to clarify some things
here. The compiler sees every function which is explicitly declared inline
or defined within the class definition as a potential inline candidate. But
this is only true if certain requirements are fulfilled; this is where
the problem is hidden, and numerous issues have to be considered. I'll only
cover a few to give you an idea:

# Virtual functions:
Virtual functions were said to be out of the game regarding inlining,
however this is not necessarily true. The important thing is that for
virtual functions polymorphism must work - so this is the main condition.
However, if the compiler has a way to figure out the actual type of the
object which will take care of the function call, then even virtual
functions can be inlined. In some cases this is easy, whereas in others this
proves quite tricky and most compilers won't go into detailed analysis here.

# Large functions:
Large functions can very often result in cache misses and are thus not
good candidates for inlining. In your code you declare quite a large number of
large functions as inline. This won't have any effect, as the compiler is
free to choose and thus, in most cases, will choose not to inline. The same
is true for recursive calls of inlined functions or inlined
functions calling other functions. This is a construct that appears quite often
in your implementation.
If you really want to inline large functions and recursive function calls
etc. then you should resort to meta-templates.

[SNIP]

Cheers
Chris
Jul 23 '05 #28

P: n/a
> And your philosophy is a good one, I absolutely agree on that. However, if
you really wanna drive it to the edge then you will have to resort to some
compiler specific things at some time. The aforementioned libs make
extensive use of meta-template programming, which is why they are fast. If
I remember correctly there should be a FFT implementation based on this
technique in Tod's Blitz++.

Anyway, first of all relax & take a depp breath. Second, if you put your
code on the web and make it public you will have to face criticism (it
happens and happened to the best), but you should realize that criticism
is not necessarily a bad thing! Speaking of this I never said that your
code is bad, but I said that after a quick glance I saw that there are
redundant parts and I'll stick to this statement.

In FFT for example you declare variables of value_type very often just to
assign a value of an array to it. Afterwards you call a new function with
the sum or the difference of these newly declared variables. Why not just
use the array itself? Of course, the compiler can and most certainly will
optimize this away after data-flow analysis, but you're making the life of
the compiler harder than necessary.
My FFT is somewhat quicker than FFTW (both compiled with ICL on Windows; I made
many speed tests with and without SIMD).
My FFT code is much smaller than FFTW; the only problem is that the
demands on the compiler are much greater.
That is where GCC and ICC fail (so far).

I accept constructive critic, but since the beginning, you critic my
programming style without understanding the reason why I programmed it so.
Here I just quote you from memory:
-GCC inlines very well
-virtual functions bla bla bla
-too many inlines in the library
-code redundant.
-...

So just know that your remark about all the variables for the FFT is totally
unfounded.
If you had understood the code, you would have understood that 'value_type'
is a highly variable type that can describe complex<float>, complex<double>,
SIMD registers, and various other types.
Storing them in a variable makes the program far quicker because the
compiler can keep them in a register once and for all.
Using only the array itself results in a catastrophic loss of time because
the compilers are not good enough to understand it, and the program loses its
time with read/write accesses to memory. I know this effect, and I could
debate it for many hours. You obviously did not even give it a try.
So I am helping the compiler very much, not the opposite.
By the way, FFTW uses this technique too.

I know the effect of large functions.
By the way, in my FFT, large function instances are only called once but
always in a loop.

Don't tell me how virtual functions work; I know it and it is not the point.
Did you see dynamic polymorphism in my FFT? No.

So please don't say that my code is redundant, but say "I don't understand
this and this" or "I would have done this differently".
You should learn a little bit of modesty, but instead of this you say:
> We frequently use it here at CERN, also for speed sensitive applications &
> simulations

I honestly do not feel at ease about your speed sensitive applications...

My question was "how to force inlining", not "Should I inline".
So far, you have not told me a single thing that would help me a little bit.

Cheers
Patrick
Jul 23 '05 #29

P: n/a
Patrick Laurent wrote:
My question was "how force inlining", not "Should I inline".

Did you ask in a GCC mailing list?

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #30

P: n/a
Patrick Laurent wrote:
[SNIP stuff that I won't start arguing about as it seems to be senseless
anyway]

I honestly do not feel at ease about your speed sensitive applications...

My question was "how force inlining", not "Should I inline".
Until yet, you did not tell a single thing that would help me a little bit.


I'm not gonna go into an argument regarding your coding style and
whatever you think I understand or don't. However, you seem to be
ignoring all the comments that some people here (very
experienced ones like Ioannis and others) gave you (like asking in a GCC
group, which was mentioned more than once!). Therefore, I'll just resort
to answering your original question again:

Read the standard! Conclusion: you cannot force inlining. End of story.

Regarding the issue of modesty: you might treat yourself to
reconsidering your attitude towards people who are spending time to help
you.

Best regards
Chris
Jul 23 '05 #31

P: n/a
> Did you ask in a GCC mailing list?

I did at gcc-help.
I was advised by someone in charge to post my problem as a bug.
It is not really a bug; nevertheless I posted my problem 2 days ago.
I am still waiting for an answer...

Pat
Jul 23 '05 #32

P: n/a
Patrick Laurent wrote:

[snip a lot of ranting]

> My question was "how force inlining", not "Should I inline".

Then the answer is:
There is no way to 'force the compiler to inline'.

If the capabilities of one compiler are not good enough
for you, you are free to choose a different compiler.

> Until yet, you did not tell a single thing that would help me a little bit.

Good luck with that attitude. After all it is *your* problem.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 23 '05 #33

P: n/a

"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message
news:3f************@news.dfncis.de...
>> Did you ask in a GCC mailing list?
>
> I did, at gcc-help.
> Someone in charge advised me to report my problem as a bug.
> It is not really a bug; nevertheless I posted my problem 2 days ago.
> I am still waiting for an answer...

What do you expect? Do you really think that GCC folks are just waiting to
look into your problem? Probably they have the same attitude towards
criticism (note that it is not spelled critic, which is something completely
different) as you. Patience is a virtue.

Chris
Jul 23 '05 #34

P: n/a
Patrick Laurent wrote:
> I did, at gcc-help.
> Someone in charge advised me to report my problem as a bug.
> It is not really a bug; nevertheless I posted my problem 2 days ago.
> I am still waiting for an answer...

OK, I would also try the plain "gcc" mailing list. If their answer doesn't satisfy you,
then perhaps you should pick another compiler.

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #35

P: n/a
> OK, I would also try the plain "gcc" mailing list. If their answer doesn't
> satisfy you, then perhaps you should pick another compiler.

I do, I use ICL on Windows.
But I want other people to be able to use my library on other systems.
They can, but in some cases they will not get a speed similar to ICL.

Pat
Jul 23 '05 #36

P: n/a
Patrick Laurent wrote:
> I do, I use ICL on Windows.
> But I want other people to be able to use my library on other systems.
> They can, but in some cases they will not get a speed similar to ICL.

Well, if it compiles, they can use it. You can't do much about compiler deficiencies (if
there is one in this case).

If by ICL you mean Intel C++ compiler, then if I recall correctly, there is also a Linux
version of it.
There are also two other ways to inline: using macros, or converting your run-time
computations to compile-time (template meta-programming). Both are an entire world of
their own (for advanced uses). Or use assembly, but that one is not portable. :-)
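
To make "converting run-time computations to compile-time" concrete, here is a minimal sketch of template meta-programming in the classic recursive style. `Log2` is a hypothetical helper invented for this example (it is not from the poster's library); it computes floor(log2(N)), the kind of constant an FFT of fixed size might need for its number of stages. The result is a compile-time constant, so there is no function call left to inline.

```cpp
// Hypothetical helper: floor(log2(N)), evaluated by the compiler.
// N must be at least 1; Log2<0> would recurse without end and fail to compile.
template <unsigned long N>
struct Log2 {
    static const unsigned value = 1 + Log2<N / 2>::value;
};

// Recursion anchor: log2(1) == 0.
template <>
struct Log2<1> {
    static const unsigned value = 0;
};
```

For instance, `Log2<1024>::value` can be used wherever an integer constant is required.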

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #37

P: n/a
> If by ICL you mean Intel C++ compiler, then if I recall correctly, there is
> also a Linux version of it.

Yes, the Intel C++ Compiler for Linux is called ICC.
But ICC is as slow as GCC (in comparison to ICL); I don't know why ICC does
not inline like ICL.

> There are also two other ways to inline: using macros, or converting your
> run-time computations to compile-time (template meta-programming). Both
> are an entire world of their own (for advanced uses). Or use assembly, but
> that one is not portable. :-)

I can exclude the macro suggestion.
I can exclude assembly too!
I think I know what you mean by 'meta programming'. I can exclude it too.
I use meta programming for type handling. I cannot imagine any way to
program an FFT, or any other signal processing function, with meta
programming. It seems to be theoretically possible, but anyway I think the
compilers are not quite ready yet.
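
For readers wondering what numeric code written with meta-programming can look like, below is a small sketch of a fixed-size dot product unrolled through template recursion. It is not an FFT and not taken from any library; `DotProduct` and `dot` are hypothetical names. Note that the technique itself depends on the compiler inlining the nested `apply` calls, so it does not sidestep the problem discussed in this thread.

```cpp
#include <cstddef>

// Recursive case: one multiply-add, then the remaining N-1 elements.
template <std::size_t N>
struct DotProduct {
    static double apply(const double* a, const double* b)
    {
        return a[0] * b[0] + DotProduct<N - 1>::apply(a + 1, b + 1);
    }
};

// Recursion anchor: an empty range contributes nothing.
template <>
struct DotProduct<0> {
    static double apply(const double*, const double*) { return 0.0; }
};

// Convenience wrapper that deduces the length from the array type.
template <std::size_t N>
inline double dot(const double (&a)[N], const double (&b)[N])
{
    return DotProduct<N>::apply(a, b);
}
```

Calling `dot(a, b)` on two `double[3]` arrays instantiates `DotProduct<3>`, `DotProduct<2>`, `DotProduct<1>` and the `DotProduct<0>` anchor; whether that collapses into straight-line code is still up to the optimizer.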

Pat

Jul 23 '05 #38

P: n/a
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message news:3f************@news.dfncis.de...
[...]
>> There are also two other ways to inline: using macros, or converting your
>> run-time computations to compile-time (template meta-programming). Both
>> are an entire world of their own (for advanced uses). Or use assembly, but
>> that one is not portable. :-)
>
> I can exclude the macro suggestion.

On what grounds?

> I can exclude assembly too!

On what grounds?

> I think I know what you mean by 'meta programming'. I can exclude it too.

On what grounds?

> I use meta programming for type handling. I cannot imagine any way to
> program an FFT, or any other signal processing function, with meta
> programming.

Because you can't imagine any way doesn't mean it's not possible.

> It seems to be theoretically possible,

.... so you *can* imagine a way to do it, then ...?

> but anyway I think the
> compilers are not quite ready yet.

How do you know that?

Patrick: Ioannis has offered a range of (to my mind reasonable and helpful) suggestions which you have summarily trashed
without any hint of justification. How do we help you? You seem to be after some magic quick fix - has it dawned on you
yet that perhaps there isn't one?

--
Lionel B

Jul 23 '05 #39

P: n/a
Patrick Laurent wrote:
> I can exclude the macro suggestion.

Actually this is what I would begin with. You could write function-like macros (any effort to
force a compiler through source code is going to be ugly anyway).
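
A minimal sketch of what the macro route looks like, and why it is ugly. All names here (`SQUARE_MACRO`, `square`, `next_value` and the two counting helpers) are invented for illustration. The macro is expanded textually by the preprocessor, so no function call can remain, but its argument is evaluated once per occurrence in the expansion.

```cpp
// Function-like macro: pure text substitution, so there is nothing left to inline.
#define SQUARE_MACRO(x) ((x) * (x))

// Type-safe counterpart; whether it is inlined is up to the compiler.
template <typename T>
inline T square(T x) { return x * x; }

// Counter used to make repeated argument evaluation visible.
static int g_calls = 0;
inline int next_value() { return ++g_calls; }

// How many times next_value() runs when it is passed to the macro.
inline int macro_evaluations()
{
    g_calls = 0;
    int r = SQUARE_MACRO(next_value());  // expands to ((next_value()) * (next_value()))
    (void)r;
    return g_calls;
}

// How many times next_value() runs when it is passed to the function.
inline int function_evaluations()
{
    g_calls = 0;
    int r = square(next_value());        // argument evaluated exactly once
    (void)r;
    return g_calls;
}
```

The double evaluation is the usual argument against macros for anything whose arguments have side effects or are expensive to compute.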
--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #40

P: n/a
>>> There are also two other ways to inline: using macros, or converting your
>>> run-time computations to compile-time (template meta-programming). Both
>>> are an entire world of their own (for advanced uses). Or use assembly,
>>> but that one is not portable. :-)
>>
>> I can exclude the macro suggestion.
>
> On what grounds?

Because:
-I don't know assembly well
-I want a partable solution
-no suggestion is feasible in a reasonable time (it tends towards infinity)

> Because you can't imagine any way doesn't mean it's not possible.

Can you program, or have you programmed, signal processing with meta-programming? I
have not.
Could you give me internet pages where meta-programming was successfully used
for signal processing or for any mathematical applications?
>> It seems to be theoretically possible,
>
> ... so you *can* imagine a way to do it, then ...?
>
>> but anyway I think the compilers are not quite ready yet.
>
> How do you know that?

I said 'I think'; I could have said 'I suppose', or 'it seems'. I did not
say 'I know'. I am not good at English. If 'to think' means 'to know
something for sure', then I used the wrong verb. Did I?

I use a large template depth (it once reached 36), and compilers already have
some difficulties.
GCC, for example, is far behind ICL at optimizing my code. ICL is the clear
winner.
So meta-programming for FFT seems to be a utopia.
> Patrick: Ioannis has offered a range of (to my mind reasonable and
> helpful) suggestions which you have summarily trashed
> without any hint of justification. How do we help you? You seem to be
> after some magic quick fix - has it dawned on you
> yet that perhaps there isn't one?

It seems indeed there is no magic fix. I don't want to reprogram everything
(in C or whatever) just to make GCC happy.
I asked in this thread whether it is possible to bring GCC to the same speed as
ICL for my case; it seems it is not.

I close the debate
Thanks to everyone.
Patrick
Jul 23 '05 #41

P: n/a

"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message
news:3f************@news.dfncis.de...
>>>> There are also two other ways to inline: using macros, or converting your
>>>> run-time computations to compile-time (template meta-programming).
>>>> Both are an entire world of their own (for advanced uses). Or use assembly,
>>>> but that one is not portable. :-)
>>>
>>> I can exclude the macro suggestion.
>>
>> On what grounds?
>
> Because:
> -I don't know assembly well


This actually puts the comments you had on registers & optimization etc. in
a very interesting perspective...
> -I want a partable solution

No, you certainly do not want a partable solution, but a portable solution
(note the spelling).

> -no suggestion is feasible in a reasonable time (it tends towards
> infinity)

On what grounds do you base this statement?

>> Because you can't imagine any way doesn't mean it's not possible.
>
> Can you program, or have you programmed, signal processing with meta-programming? I
> have not.

It's amazing that there are things that you have not yet seen ;-)

> Could you give me internet pages where meta-programming was successfully
> used for signal processing or for any mathematical applications?

Yes, I can & I already did, but you obviously did not read my postings
properly.

>>> It seems to be theoretically possible,
>>
>> ... so you *can* imagine a way to do it, then ...?
>>
>>> but anyway I think the compilers are not quite ready yet.
>>
>> How do you know that?
>
> I said 'I think'; I could have said 'I suppose', or 'it seems'. I did not
> say 'I know'. I am not good at English. If 'to think' means 'to know
> something for sure', then I used the wrong verb. Did I?
>
> I use a large template depth (it once reached 36), and compilers already have
> some difficulties.
> GCC, for example, is far behind ICL at optimizing my code. ICL is the clear
> winner.
> So meta-programming for FFT seems to be a utopia.


Wrong.
>> Patrick: Ioannis has offered a range of (to my mind reasonable and
>> helpful) suggestions which you have summarily trashed
>> without any hint of justification. How do we help you? You seem to be
>> after some magic quick fix - has it dawned on you
>> yet that perhaps there isn't one?
>
> It seems indeed there is no magic fix. I don't want to reprogram
> everything (in C or whatever) just to make GCC happy.
> I asked in this thread whether it is possible to bring GCC to the same speed as
> ICL for my case; it seems it is not.

You were told in the beginning that there is no magic fix for this, because
that is clearly what the language specification states about inlining. I'd
suggest that you go on and do some research about meta-templates, compiler
techniques, the inlining concept and optimization techniques. These are
concepts on which you obviously have to catch up.

Cheers
Chris

Jul 23 '05 #42

P: n/a
Chris, you understand nothing.
Words, words, words, only words.
If you had understood one percent of my library, you would have seen
that it is programmed with meta-programming.
What you call meta-programming, I call generic C++.

You don't need to laugh at my English or my misspelling of some words.
Learn modesty!

I have already seen the level of your programming skills; you are only
embarrassing yourself.

Stop rummaging through every forum; don't you have anything better to do
at CERN?

You should occupy yourself with your electrons and other Higgs bosons.

(German, by the way, is not my native language either)

Until we (don't) meet again

Patrick
Jul 23 '05 #43

P: n/a
"Patrick Laurent" <la*****@ient.rwth-aachen.de> wrote in message
news:3f************@news.dfncis.de...
> Chris, you understand nothing.
> Words, words, words, only words.
> If you had understood one percent of my library, you would have
> seen that it is programmed with meta-programming.
> What you call meta-programming, I call generic C++.
>
> You don't need to laugh at my English or my misspelling of some words.
> Learn modesty!
>
> I have already seen the level of your programming skills; you are only
> embarrassing yourself.
>
> Stop rummaging through every forum; don't you have anything better to do
> at CERN?
>
> You should occupy yourself with your electrons and other Higgs bosons.
>
> (German, by the way, is not my native language either)
>
> Until we (don't) meet again
>
> Patrick


Relax & take a deep breath. I certainly have better things to do than
waste more time on this. Sorry for trying to help, and I wish you good
luck!

Chris
Jul 23 '05 #44

This discussion thread is closed
