PLS wrote:
So I'm back to my original question. Why were 32 bit registers used?
Compilers do that because modern CPUs generally work faster with full
word-sized registers than with 16-bit ones. I would just use unsigned
instead of unsigned short. Unsigned arithmetic wraps around, so the lower
16 bits come out the same whether the value overflows and gets cropped
inside a 16-bit variable, or doesn't overflow and you trim it afterwards.
All you care about is that after the 32-bit sum is calculated, you clear
the upper 16 bits:
    unsigned sum = 0;
    while(...)
    {
        sum += *p++;
    }
    unsigned short result = static_cast<unsigned short>(sum); // trim
This will usually run faster than doing the arithmetic in unsigned short,
and you'll get the same result anyway.
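
To check the "same result" claim yourself, here is a minimal sketch. The function names sum16_wide and sum16_narrow are made up for this example, and it assumes the data is an array of 16-bit values. The first function sums in a full-width unsigned and trims once at the end, the second trims after every addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sum in a full-width register, truncate once at the end.
std::uint16_t sum16_wide(const std::uint16_t* p, std::size_t n)
{
    unsigned sum = 0;                          // upper bits collect carries; that is fine
    for (std::size_t i = 0; i < n; ++i)
        sum += p[i];
    return static_cast<std::uint16_t>(sum);    // trim to the lower 16 bits
}

// Hypothetical reference: truncate to 16 bits after every addition.
std::uint16_t sum16_narrow(const std::uint16_t* p, std::size_t n)
{
    std::uint16_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum = static_cast<std::uint16_t>(sum + p[i]);
    return sum;
}
```

Both functions should return the same value for any input, because unsigned addition wraps and carries only ever travel upward, so the lower 16 bits never depend on what sits above them.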
You can do the static_cast trimming every time you need to ensure that
the value is strictly 16-bit, which in your case is only needed after
summing. Who cares what's in the upper bits when it doesn't affect the
lower ones? Just be careful: a right shift (>>) moves the upper bits down
into the lower 16, so you need to clear the upper bits out before you
shift.
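
A small sketch of the shift pitfall, with made-up helper names (shr16_unmasked, shr16_masked). It assumes unsigned is at least 32 bits wide and the shift count is below 16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shifts without clearing the upper bits first,
// so whatever sat above bit 15 slides down into the 16-bit result.
std::uint16_t shr16_unmasked(unsigned wide, unsigned count)
{
    return static_cast<std::uint16_t>(wide >> count);
}

// Hypothetical helper: clears the upper bits, then shifts.
std::uint16_t shr16_masked(unsigned wide, unsigned count)
{
    return static_cast<std::uint16_t>((wide & 0xFFFFu) >> count);
}
```

With wide = 0x21234 and count = 4, the unmasked version gives 0x2123 while the masked one gives 0x0123, which is what a true 16-bit shift of 0x1234 would produce.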
You would be surprised what a struggle it can be for the CPU to work with
16-bit registers. On many processors, 16-bit operations are second-class
citizens. If you force the processor to do them, it may have no choice but
to do extra work that normally wouldn't be necessary. For example, a
16-bit value must be zero- or sign-extended into 32 bits before it can
take part in 32-bit arithmetic, and a 16-bit result has to be merged back
into the lower 16 bits of the register (because an operation with the ax
register must not corrupt the upper 16 bits of eax). Depending on the CPU,
that can mean extra micro-operations and partial-register stalls on top
of the wasted clocks.
Tom