>Steve Kobes wrote:
Is this legal? Must it print 4?
int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));
In article <news:2r******* ******@uni-berlin.de>
Al Bowers <xa******@rapid sys.com> wrote:Yes it is legal and will print 4.
I believe there are those in comp.std.c, at least, who will disagree
with you.
This array describes a contiguously allocated nonempty set of objects of
type int.
b = a[0] positions the int *b to the beginning of the array object as
would b = &a[0][0].
In particular, "b" points to the first of the two consecutive "int"s
{1, 2}. These two consecutive "int"s are indeed followed immediately
in memory by the two consecutive "int"s {3, 4} -- but a compiler
is allowed, through arbitrarily magical means, to insert bounds-checking
on array subscripts and pointer arithmetic. (Remember that these
two are quite similar: subscripting is simply address arithmetic
followed by pointer indirection, and pointer indirection is simply
subscripting -- *p and p[0] are identical.[%] Note that we are allowed
to compute &array[N] but not follow the resulting pointer, so there
is a slight hitch in using the same code for both -- you also need
a boolean flag "this address arithmetic will be followed by
indirection, or not", to handle this in any bounds-checking code.)
Suppose that the compiler does in fact do this magical bounds-checking,
so that it "knows" that b points to the first of only *two* ints stored
in the array a[0]. In this case, b[0] and b[1] are allowed, but b[2]
is rejected with an "out of bounds array subscript" error.
(Personally, I find it all quite unlikely, but it does seem to be
allowed. I also think that, if one writes a compiler with the
decidedly non-magical bounds-checking done via either "fat pointers"
or some kind of runtime table lookups, the bounds for b should run
from 0 to 3 inclusive, in this case, rather than 0 to 1. But these
are my opinions of "the way it ought to work" rather than "what
the letter of the C standard says".)
In practice, I do not believe anyone has yet found an implementation
where b[3] (or equivalently, *(b + 3)) is caught as an error. The
most likely place to find this kind of runtime[%%] checking is in
C compilers for Lisp machines, and I have not had access to one of
those in years.
[% As a result, you can always replace (*p) with p[0], including
in expressions like (*p).field, which is more usually written as
p->field. Hence the -> operator is replaceable with "[0].", e.g.,
instead of:
if (a->b->c == 42)
you can write:
if (a[0].b[0].c == 42)
This example just goes to show that C has a plethora of operators. :-)]
[%% This particular case can be caught at compile time, but in
general one would want to test the subscript in b[i], or the offset
in (b+i), at runtime, both because "i" might be difficult or
impossible to predict, and also because "b" might also be difficult
or impossible to predict:
int a1[10], a2[2];
int *b;
...
if (user_input == 42)
b = a1;
else if (user_input == 6 * 9)
b = a2;
...
do_something(b[3]);
Here, a general-purpose bounds-checker might do something vaguely
like this:
struct bounds *bp;
bp = bounds_lookup(b );
if (bp == NULL)
__runtime_error ("invalid pointer");
if (bp->upper_limit < 3)
__runtime_error ("out of bounds array subscript");
tmp = b[3];
do_something(tm p);
to handle the call to do_something(). The bounds_lookup() function
compares the supplied pointer value against a table of all active
arrays -- all "global variables" that have array type, all outstanding
malloc()s, and all local variables that are still live on the current
stack frame, for instance -- and finds the corresponding array
bounds information. We can do this without concerning ourselves
with ordinary simple pointers (&localvar, where localvar is *not*
an array) because we have a nonzero subscript. If this were using
b[i] the code would have to do:
if (i != 0) {
struct bounds *bp;
... as before ...
} else
tmp = *b; /* cannot do bounds checking on zero subscripts */
do_something(tm p);
becaues each ordinary variable "acts like" an array of size 0.
Alternatively, bounds_lookup() might actually have bounds-table
information for *all* variables whose address is taken, instead
of just all arrays.
The bounds-checking above needs to handle pointer arithmetic as
well, so that if we do "b++", the upper limit is reduced by one
and the lower limit goes from 0 to -1. (There are other methods
for handling this that are probably more practical in real
bounds-checkers; this version is just for illustration.)
Bounds-checking tends to slow down programs quite a bit, and is an
area in which good compiler-level optimization is important. If
a loop is guaranteed to run from 0 to N-1 inclusive, a single
bounds-check on N-1 before entering the loop at all replaces many
runtime tests. If the bounds-check can be done at compile time,
then omitted entirely from the runtime code, that is even better.]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.