In article <re**********************************@news.supernews.com>
Tim Smith <re************@mouse-potato.com> wrote:
>I seem to have lost my copy of the ANSI C standard. Is this code legal?
Indeed, in the case you describe, it is not:
>    /* p points somewhere in a string that is known to contain
>       an 'a' at or before where p points. We want to change
>       this 'a' to a 'b'. */
>    while ( *p-- != 'a' )
>        ;
>    *++p = 'b';
>If the only 'a' at or before the initial place p points to is the first
>character of the string, the last p-- will be doing -- on a pointer to
>the first character of the string, but it is then incremented with ++
>before being used.
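This is technically undefined: the last p-- computes a pointer to
before the first element of the array, and merely computing such a
pointer is undefined, even if it is never dereferenced. It tends to
work in practice on "real systems".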
You can re-code the above as:
    while (*p != 'a')
        p--;
    *p = 'b';
to remove the bug.
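To make the edge case concrete, here is a minimal sketch of the re-coded loop wrapped in a function. The name change_prev_a and the return value are my own assumptions for illustration, not part of the original code; the precondition (an 'a' exists at or before p) is the one stated in the question.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (name and return value are illustrative only):
 * change the nearest 'a' at or before p to 'b' and return its position.
 * Precondition, as in the question: such an 'a' exists in the string.
 */
char *change_prev_a(char *p)
{
    assert(p != NULL);

    while (*p != 'a')   /* test first ...                              */
        p--;            /* ... and step back only while an 'a' is      */
                        /* still known to lie before p                 */
    *p = 'b';
    return p;
}
```

With char s[] = "abc", calling change_prev_a(s + 2) stops with p pointing at s[0] and never computes s - 1; afterwards s holds "bbc".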
>Another example. Again, p points to a string, this time known to
>contain an 'a', but this time at or after where p points. To change it
>to a 'b':
>    --p;
>    while ( *++p != 'a' )
>        ;
>    *p = 'b';
>If p started at the beginning of the string, this would --p to before
>the string, then immediately ++p back into the string.
Again, technically illegal.
>I seem to remember that you can't take the address of elements of an
>array before the 0th element, but I don't recall if using -- on a
>pointer counts as taking the address of something.
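It does: --p (or p--) computes the address of the element before the
one p points to, whether or not the result is ever dereferenced.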
The second example can be recoded as the obvious:
    while (*p != 'a')
        p++;
    *p = 'b';
or the less-obvious (but still legal, provided there is some 'a' in
range):
    while (*p++ != 'a')
        continue;
    *--p = 'b';
This second one is legal, despite going "one past the end" in some
cases, because "going one past the end" is explicitly allowed, so
that loops like:
    for (p = arr, i = 0; i < n; p++, i++)
        ... operate on *p ...
remained defined in C89. On those rare "real systems" on which
out-of-bounds pointer arithmetic is actually trapped, the cost of
making "one past the end" work is generally one byte (or one machine
word) of extra storage at the end of a segment, while the cost of
making "one before the start" work would have been "as many bytes
(or machine words) as needed based on sizeof *p".
(This falls out naturally since, if sizeof *p is 1000, "p++" turns
into "add #1000,reg" and "p--" turns into "sub #1000,reg", all
assuming the machine works in C-bytes natively of course. When
reg is near the end of the segment, pointing to the last valid
object, adding 1000 puts it one byte past the last valid object --
so the implementation has to sneak in an extra "pad" byte at the
end of the segment -- but when reg is right at the beginning,
pointing to the first valid object, subtracting 1000 puts it 1000
bytes before the start, so the implementation would have had to
insert 1000 pad bytes at the front of the segment.)
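The "one past the end" rule and the one-byte cost can be sketched in portable C. Everything named here is an assumption for illustration: struct big stands in for an object with sizeof *p around 1000, and change_next_a and gap_past_last_object are hypothetical helper names. The first function is the less-obvious forward scan from above; the second measures, using only legal pointer arithmetic, how far the one-past-the-end pointer sits beyond the last byte of the last object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1000-byte element type, standing in for "sizeof *p is 1000". */
struct big { char bytes[1000]; };

/*
 * The less-obvious forward scan. When the 'a' is the last element of
 * the array, p++ leaves p one past the end before *--p brings it back.
 * Computing that pointer is allowed; dereferencing it would not be.
 * Precondition: an 'a' exists at or after p.
 */
char *change_next_a(char *p)
{
    assert(p != NULL);

    while (*p++ != 'a')
        continue;
    *--p = 'b';
    return p;
}

/*
 * Byte distance from the last byte of arr[n - 1] to the
 * one-past-the-end pointer arr + n.
 */
size_t gap_past_last_object(struct big *arr, size_t n)
{
    char *last_byte;
    char *one_past;

    assert(arr != NULL && n >= 1);

    last_byte = (char *)&arr[n - 1] + sizeof arr[0] - 1;
    one_past = (char *)(arr + n);   /* legal to compute, not to dereference */

    return (size_t)(one_past - last_byte);
}
```

Calling change_next_a on char t[3] = { 'x', 'y', 'a' } (no terminating '\0') drives p to t + 3 and back, and gap_past_last_object returns 1 regardless of the element size, which is the single pad byte described above. No legal C computes the corresponding "one before the start" address, so its cost can only be described, not measured.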
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.