On May 12, 9:35 am, llothar <llot...@web.de> wrote:
Does anybody have some benchmarks or links to articles that compare
this for different compiler implementations?
PGO is usually best for runtime feedback on branch prediction
statistics. The compiler can then use the hinted branch instructions,
or flip the sense of the branch so it tends to be fall through more of
the time (this is easier on the decoders and trace cache). However,
this really tended to make more of a difference with the deeply
pipelined P4s than with the relatively shorter pipelines of the
Athlon/Opteron and Core architectures.
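
To make the "flip the sense of the branch" point concrete, here is a
rough sketch of doing by hand what PGO works out from profile data. It
assumes GCC or Clang, where __builtin_expect is available; the
UNLIKELY macro and the sum_nonnegative function are made up for
illustration and do not come from any particular code base.

```c
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension; elsewhere the macro is a no-op. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/*
 * Hypothetical example: sum the non-negative entries and count the
 * negative ones, which we assume are rare. The hint tells the compiler
 * to lay out the common case as the fall-through path.
 *
 * With PGO the compiler derives the same information from a training
 * run instead, e.g. with gcc:
 *   gcc -O2 -fprofile-generate prog.c -o prog
 *   ./prog            (run a representative workload)
 *   gcc -O2 -fprofile-use prog.c -o prog
 */
long sum_nonnegative(const int *v, size_t n, size_t *errors)
{
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {   /* rare path, moved off the hot path */
            (*errors)++;
            continue;
        }
        total += v[i];
    }
    return total;
}
```

Whether the hint changes the generated code at all depends on the
compiler and target, so measure before and after.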
I would especially like to see if it is useful on MSVC, Intel 9.0 C
and gcc. Also, what about the effect of "interprocedural optimization"?
I don't remember. I usually just turned it on and saw no difference.
But that's because my code tends to lean on inner loops, not call
overhead.
All my use cases are 98% integer performance dominated. Currently I
only use -O2 or -O3 for MSVC and gcc, but I would really like to know
if it is worth spending time on optimization (which means that I would
see a 20% improvement from these two kinds of optimizations).
Truly integer limited? As in cryptography or something of that
nature? If so, then your best bet is to try for SIMD or just general
parallelism. If that doesn't buy you anything, then there's not much
more you can get from the "micro-optimization" angle.
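
As a minimal sketch of what "try for SIMD" can look like for integer
work, assuming an x86 target with SSE2 (there is a plain scalar
fallback otherwise): the function name add_u32 is made up for
illustration, and a real integer-bound kernel will usually need more
restructuring than this.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for 32-bit unsigned integers. */
void add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process four 32-bit lanes per iteration; unaligned loads are allowed. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop when SSE2 is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

A loop this simple is often vectorized by the compiler on its own at
-O3, so the hand-written version mainly pays off on kernels the
auto-vectorizer gives up on.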
--
Paul Hsieh
http://www.pobox.com/~qed/ http://bstring.sf.net/