On Mar 18, 3:14 pm, santosh <santosh....@gmail.com> wrote:
> I wonder how much suboptimal register allocation would matter with
> modern processors with huge L1 and L2 caches.
The size of the L1 cache doesn't make any difference to this problem,
and the L2 cache most certainly doesn't.
A typical x86 system can perform three or four micro-operations per
cycle. An operation between registers is one micro-operation; an
operation with one memory operand takes at least three (calculate
address, load, process), so throughput is reduced by a factor of
three. You also get much bigger latencies, which is a killer with
long dependency chains, and you run out of resources for checking
memory dependencies. All in all, using memory instead of registers
is awfully bad for your code's performance.
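
A rough way to see the difference for yourself is sketched below. The
function names are made up for illustration, and `volatile` is only a
stand-in for a variable the compiler failed to keep in a register: it
forces a load and a store of the accumulator on every iteration, which
is not exactly what a real spill looks like. Both functions return the
same sum; only the dependency chain through `acc` differs.

```c
#include <stddef.h>

/* Accumulator is an ordinary local, so an optimizing compiler will
   normally keep it in a register for the whole loop. */
long sum_in_register(const int *data, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Hypothetical "spilled" version: volatile makes every update of acc
   go through memory (load, add, store), so each iteration waits on
   the previous iteration's store. */
long sum_in_memory(const int *data, size_t n)
{
    volatile long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}
```

Timing the two over a large array with optimization turned on should
show the gap, though how big it is depends on the processor and the
compiler, so treat any numbers you get as specific to your machine.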