Tim Prince wrote:
> mdh wrote:
>> The 3rd paragraph says: "Alignment requirements can generally be
>> satisfied easily, at the cost of some wasted space, by ensuring that
>> the allocator always returns a pointer that meets all (italicized)
>> alignment restrictions"
>> I have looked at the threads about alignment in this group, and most
>> **assume** that the writers/repliers fully understand what alignment
>> entails. There are some quite good explanations on the web, but none
>> really captures, for me at least, the essence of what alignment does
>> and what can go wrong if alignment is not adhered to.
>> So, could anyone very briefly explain the essence of alignment,
>> why the C programmer should worry about it, and why it seems to
>> come up with structs and pointers (if this last fact is indeed
>> true)? :-)
>> Thanks as usual.
> No, this is not a brief subject. I'll try to compromise.
> Most platforms have either a performance penalty or throw an address
> fault upon misaligned access. Aligned addresses are those which are a
> multiple of the size of the object (16 bytes for 128-bit parallel
> objects on several platforms).
We know (most of us, anyway) what is meant, but what is
said is wrong. For example, this explanation implies that
a `char array[137];', since it is 137 bytes in size, must be
aligned on an address divisible by 137.
"Alignment requirements" are C's recognition of, or perhaps
concession to, some realities of the ways computers are built.
Most machines view memory as a big array of N bytes, with numeric
addresses running from 0 through N-1. A size-1 object like a
char can reside at any of these addresses, and the machine can
fetch, inspect, and store the char value at any of them.
But when it comes to multi-byte objects, machines are often
less flexible. For example, on a machine whose ints are four
bytes long, it may turn out that an int value in memory can only
be fetched and stored from an address divisible by four. That is,
the instructions that move ints to and from memory might view that
memory not as an array of bytes numbered 0 through N-1, but as an
array of ints numbered 0 through N/4-1. If so, then the first few
ints are the 0th, 1st, 2nd, 3rd, which correspond to starting byte
addresses of 0, 4, 8, 12. That is, the machine's view of memory
as an array of four-byte ints implies that an int can only start
on a byte address divisible by four.
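To make the "array of ints" view concrete, here is a minimal sketch. The function names are made up for illustration, and the pointer-to-uintptr_t conversion assumes a flat address space, which the Standard does not promise:

```c
#include <assert.h>
#include <stdalign.h>   /* alignof, alignas (C11) */
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte address held in p is a multiple of `alignment`.
   Assumption: converting a pointer to uintptr_t yields its byte address,
   as on common flat-memory machines; the Standard does not guarantee it. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* Starting byte address of the i-th int when memory is viewed as an
   array of ints: 0, 4, 8, 12, ... on a machine with four-byte ints. */
size_t int_index_to_byte_address(size_t i)
{
    return i * sizeof(int);
}
```

A call such as is_aligned(&some_int, alignof(int)) should always yield 1 for an int the compiler laid out itself; it is only pointer arithmetic on char buffers and casts that can hand you an address for which it yields 0.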
Why this artificial view? Because it can streamline the
hardware. For example, consider a machine whose memory management
unit naturally divides memory into "pages" of, say, 8KB, each with
its own access protections and other attributes. When the CPU tries
to fetch something from or store something to this memory, it must
check that the memory is mapped and grants the appropriate access
permissions. If a four-byte int necessarily starts at an address
divisible by four, it's easy to see that any such int lies entirely
in one memory page, so the CPU need only check the accessibility of
one of its bytes: the other three will necessarily be the same. But
if an int could start just anywhere, it could start one byte before
the end of a page with the other three protruding into the following
page: two access tests take more time and/or hardware than one, so
CPU designers prefer to avoid making them.
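The page argument can be checked with a few lines of arithmetic. This is only a sketch: the 8KB page size is the example figure from above, and the names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 8192u   /* the 8KB example page size, an assumption */

/* Returns 1 if an access of `size` bytes starting at byte address `addr`
   touches more than one page, 0 if it lies entirely within one page.
   Ignores wrap-around at the very top of the address range. */
int crosses_page(uint64_t addr, uint64_t size)
{
    assert(size >= 1);
    uint64_t first_page = addr / EXAMPLE_PAGE_SIZE;
    uint64_t last_page  = (addr + size - 1) / EXAMPLE_PAGE_SIZE;
    return first_page != last_page;
}
```

With these definitions, a four-byte access at address 8191 (one byte before the end of the first page) crosses into the next page, while a four-byte access at any address divisible by four never does, because the page size is itself a multiple of four.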
Getting back to Tim Prince's mis-statement: If we view memory
as an indexed array of bytes, we can talk about the divisibility
of a particular byte's index by this or that number. The alignment
requirement for a type T is a number Tn such that a T instance can
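start at any byte whose index is divisible by Tn. The elements of
a C array are contiguous, so element i starts i*sizeof(T) bytes past
element 0, and every element must itself be properly aligned. It
follows that Tn must be a divisor of sizeof(T), not a multiple
thereof as Tim suggests. Thus, if an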
int is four bytes long, one machine might require that it start
on a byte address divisible by 4, another might permit any even
address, and yet another might require only divisibility by 1:
any address at all will do. But no machine will require that a
four-byte int start on an address divisible by 3, or 6, or 8.
Similarly, a twelve-byte long double might require 1-, 2-, 3-, 4-,
6-, or 12-byte alignment, but will never need alignment on a byte
address that is a multiple of 7 or 24.
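The divisor rule is easy to test mechanically. In the sketch below, alignment_fits_size and struct sample are hypothetical names of mine, not anything from the Standard; the static assertions should hold on any conforming C11 implementation, whatever the actual sizes turn out to be:

```c
#include <assert.h>     /* assert, static_assert (C11) */
#include <stdalign.h>   /* alignof (C11) */
#include <stddef.h>     /* size_t, offsetof */

/* Returns 1 if `alignment` could be the requirement of a type that is
   `size` bytes long, judged only by the array argument: element i starts
   at i * size, so every multiple of size must be a multiple of alignment. */
int alignment_fits_size(size_t size, size_t alignment)
{
    assert(alignment != 0);
    return size % alignment == 0;
}

/* A struct whose members have different requirements; the compiler
   inserts padding so that d is aligned and so that arrays of the
   struct keep every element aligned. */
struct sample { char c; double d; };

static_assert(sizeof(int) % alignof(int) == 0,
              "alignment of int divides its size");
static_assert(sizeof(long double) % alignof(long double) == 0,
              "alignment of long double divides its size");
static_assert(sizeof(struct sample) % alignof(struct sample) == 0,
              "alignment of the struct divides its size");
static_assert(offsetof(struct sample, d) % alignof(double) == 0,
              "member d sits at an offset suitable for a double");
```

The struct case is where the question about structs comes in: the padding after c exists purely to satisfy the alignment requirement of d.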
Finally, a word about the "necessity" of obeying alignment
requirements. When an instruction that expects Tn-byte alignment
encounters an operand that is not properly aligned, different
machines respond in different ways. The C Standard does not
prescribe any particular behavior, so this list of outcomes that
I have actually encountered may not be exhaustive:
1) The hardware simply pretends that the operand is aligned
and accesses a properly-aligned address near the one
actually specified. For example, a four-byte access to
address 0x87654321 might behave as if 0x87654320 had
been specified instead. This behavior is very fast, but
usually unwelcome.
2) The hardware detects the misalignment and generates some
kind of trap or fault, which the operating environment
then turns into a signal (or something like it) that
usually terminates the program abnormally. This behavior
is almost always unwelcome.
3) The hardware detects the misalignment and generates a trap
as above, but software in the trap handler figures out what
was attempted and then emulates it, using multiple memory
accesses along with shifting and masking to get at the
desired bytes. This usually produces a slowdown of a
hundred- to a thousand-fold; that is, your 3GHz machine
runs at an effective speed of 3-30MHz.
4) The hardware detects the misalignment but does a hardware-
assisted fixup, in the style of (3) but with much greater
speed. This usually produces a slowdown of two- to three-
fold; your 3GHz machine runs at 1-1.5GHz.
... and, as I mentioned, the C Standard does not limit the
outcome to any of these; as far as the Standard is concerned,
accessing a misaligned object may cause demons to fly out of
your misaligned nose.
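For the curious, here is a small sketch of two of the points above. The first function reproduces the address arithmetic of outcome (1), assuming the alignment is a power of two. The second shows one common way to sidestep the whole list when the bytes you want sit at an arbitrary offset in a char buffer: copy them into a properly aligned object with memcpy instead of dereferencing a misaligned pointer. Both function names are my own invention:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Outcome (1): the address the hardware would actually use if it simply
   ignored the low-order address bits. Assumes a power-of-two alignment. */
uint64_t address_if_low_bits_ignored(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Reads a 32-bit value, in the machine's native byte order, from any
   byte address. The copy goes into `value`, which the compiler has
   aligned properly, so no misaligned 32-bit access is ever requested
   by the program itself. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

With the example from (1), address_if_low_bits_ignored(0x87654321, 4) gives 0x87654320. Compilers for machines that tolerate misaligned loads will often turn the memcpy into a single load instruction, but that is an optimization you may or may not get, not something the Standard promises.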
--
Eric Sosman
es*****@ieee-dot-org.invalid