Jackie wrote:
> Hi everyone,
> Does anyone know when "register" declarations should be used and when
> "register" must not be used? If possible please give examples for both
> cases.
> Thanks
The register keyword should be used as a last resort when
optimizing a function, and whether it helps at all is
platform dependent. Note that the "register" keyword is
only a hint to the compiler; the compiler is allowed to
ignore your suggestion.
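One case where "register" must not be used is fixed by the C
standard: you cannot take the address of a register variable
(applying & to one is a constraint violation). The sketch below
uses hypothetical function names to put a legal use, a loop
counter whose address is never taken, next to a variable whose
address is needed. This rule is C's; C++ allowed taking the
address, deprecated the keyword in C++11 and removed it in C++17.

```c
#include <stddef.h>

/* Hypothetical example: a loop counter whose address is never taken
   is a legal candidate for "register". The compiler may still ignore
   the hint. */
size_t Count_Zeros(const unsigned char *buffer, size_t length)
{
    register size_t i;
    size_t zeros = 0;

    for (i = 0; i < length; ++i)
    {
        if (buffer[i] == 0)
        {
            ++zeros;
        }
    }
    return zeros;
}

/* Hypothetical helper that needs a pointer to its argument. */
static void Set_To_Five(int *p_value)
{
    *p_value = 5;
}

/* A variable whose address is needed must NOT be declared "register"
   in C. The compiler is required to diagnose code such as:

       register int value = 0;
       Set_To_Five(&value);

   A plain automatic variable works fine. */
int Address_Needed(void)
{
    int value = 0;
    Set_To_Five(&value);
    return value;
}
```

Declaring "value" as register in Address_Needed would make the
call Set_To_Five(&value) a compile-time error in C.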
Consider this program fragment, which sums the elements of an array:
#define ARRAY_SIZE 102400
unsigned int array[ARRAY_SIZE];
// Assume array has been filled with valid integers.

unsigned int Array_Sum(unsigned int * p_array,
                       unsigned int quantity)
{
    unsigned int i;
    unsigned int sum = 0;
    for (i = 0; i < quantity; ++i)
    {
        sum += p_array[i];
    }
    return sum;
}
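For anyone who wants to run the fragment, here is a
self-contained version with a hypothetical Fill_Array routine
standing in for "assume array has been filled". It assumes a
32-bit unsigned int, since both the array index and the total
exceed 16 bits.

```c
#define ARRAY_SIZE 102400

unsigned int array[ARRAY_SIZE];

/* Hypothetical fill routine: array[i] = i % 10. Each run of ten
   elements sums to 45, so the whole array sums to
   (102400 / 10) * 45 = 460800. */
void Fill_Array(void)
{
    unsigned int i;
    for (i = 0; i < ARRAY_SIZE; ++i)
    {
        array[i] = i % 10;
    }
}

/* The function from the post, unchanged apart from formatting. */
unsigned int Array_Sum(unsigned int * p_array,
                       unsigned int quantity)
{
    unsigned int i;
    unsigned int sum = 0;
    for (i = 0; i < quantity; ++i)
    {
        sum += p_array[i];
    }
    return sum;
}
```

After calling Fill_Array(), Array_Sum(array, ARRAY_SIZE) should
return 460800.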
In the above function, Array_Sum, the array is
accessed one location at a time. As a general rule,
fetching several memory locations in a single
operation is faster than fetching each location
with a separate access.
Here is the same function rewritten to apply that
idea, using loop unrolling:
unsigned int Array_Sum(unsigned int * p_array,
                       unsigned int quantity)
{
    unsigned int i;
    unsigned int sum = 0;
    /* declare some temporary values to use the
       processor's "extra" registers */
    register unsigned int r1, r2, r3, r4;
#define ACCESSES_PER_LOOP 4
    /* Loop while a full group of four elements remains */
    for (i = 0;
         i + ACCESSES_PER_LOOP <= quantity;
         i += ACCESSES_PER_LOOP)
    {
        /* Read from memory into several registers */
        r1 = p_array[i + 0];
        r2 = p_array[i + 1];
        r3 = p_array[i + 2];
        r4 = p_array[i + 3];
        /* Calculate sum using registers */
        sum += r1;
        sum += r2;
        sum += r3;
        sum += r4;
    }
    /* Sum up the remaining 0 to 3 numbers */
    for (; i < quantity; ++i)
    {
        sum += p_array[i];
    }
    return sum;
}
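Unrolled loops are easy to get wrong at the boundaries, so it is
worth checking the unrolled version against the simple one for
every remainder modulo 4. The sketch below renames the two
versions Array_Sum_Simple and Array_Sum_Unrolled (hypothetical
names, so both can live in one file) and adds a hypothetical
Sums_Agree self-check over lengths 0 through 33.

```c
#define ACCESSES_PER_LOOP 4

/* Reference version: one element per loop pass. */
unsigned int Array_Sum_Simple(const unsigned int *p_array,
                              unsigned int quantity)
{
    unsigned int i;
    unsigned int sum = 0;
    for (i = 0; i < quantity; ++i)
    {
        sum += p_array[i];
    }
    return sum;
}

/* Unrolled version: four elements per pass while a full group
   remains, then a cleanup loop for the last 0 to 3 elements. */
unsigned int Array_Sum_Unrolled(const unsigned int *p_array,
                                unsigned int quantity)
{
    unsigned int i = 0;
    unsigned int sum = 0;
    register unsigned int r1, r2, r3, r4;

    for (; i + ACCESSES_PER_LOOP <= quantity; i += ACCESSES_PER_LOOP)
    {
        r1 = p_array[i + 0];
        r2 = p_array[i + 1];
        r3 = p_array[i + 2];
        r4 = p_array[i + 3];
        sum += r1 + r2 + r3 + r4;
    }
    for (; i < quantity; ++i)
    {
        sum += p_array[i];
    }
    return sum;
}

/* Hypothetical self-check: returns 1 if both versions agree for
   every length from 0 to 33, covering every remainder modulo 4. */
int Sums_Agree(void)
{
    unsigned int data[33];
    unsigned int n;

    for (n = 0; n < 33; ++n)
    {
        data[n] = n * 3 + 1;
    }
    for (n = 0; n <= 33; ++n)
    {
        if (Array_Sum_Simple(data, n) != Array_Sum_Unrolled(data, n))
        {
            return 0;
        }
    }
    return 1;
}
```

Using "<=" in the main loop condition lets an array whose length is
an exact multiple of four be handled entirely by the unrolled loop.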
Hopefully, the compiler is smart enough to recognize the
reading pattern and use specialized processor instructions.
The loop unrolling serves two purposes: 1) more data
processing instructions are executed per branch instruction;
and 2) it allows multiple fetches from memory at one time.
Whether the compiler or processor takes advantage of 2)
depends on the compiler and the platform. This solution
works nicely on an ARM7 or ARM9 processor, which has a
special instruction for loading multiple registers from
memory with one instruction.
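To put purpose 1) in numbers: for the 102400-element array
above, the simple loop makes 102400 passes, while unrolling by
four needs 102400 / 4 = 25600 passes, each doing four additions.
The hypothetical helpers below just compute those pass counts;
they count loop passes, not actual machine branches, which
depend on the code the compiler emits.

```c
/* Hypothetical helper: passes made by the simple loop, one per
   element. */
unsigned int Simple_Loop_Passes(unsigned int quantity)
{
    return quantity;
}

/* Hypothetical helper: passes made by the unrolled version, i.e.
   full groups of "factor" elements plus one pass per leftover
   element in the cleanup loop. */
unsigned int Unrolled_Loop_Passes(unsigned int quantity,
                                  unsigned int factor)
{
    return quantity / factor + quantity % factor;
}
```

For example, 10 elements unrolled by four take two main-loop passes
plus two cleanup passes, four in total instead of ten.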
So the "register" keyword, together with loop unrolling,
helps form an optimization pattern. A good compiler will
take the hints and generate appropriate code. If the
compiler doesn't generate the optimized code, there is
always the option of writing the code in assembly language.
--
Thomas Matthews