Out-of-bounds nonsense

Frederick Gotham

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

(best viewed with a monowidth font)

--------------------------------
| Memory Address | Object |
--------------------------------
| 0 | arr[0][0] |
| 1 | arr[0][1] |
| 2 | arr[1][0] |
| 3 | arr[1][1] |
--------------------------------
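
As a rough illustration of the layout in the table, the row-major position
of an element can be computed directly. This is only a sketch: flat_index,
layout_is_packed, ROWS and COLS are made-up names, not anything from the
Standard.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 2 };

/* Row-major position of arr[row][col] among the ROWS*COLS ints,
   counted from arr[0][0]. Both indices must be in range. */
static size_t flat_index(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* An array type has no padding between its elements, so an
   int[ROWS][COLS] is exactly ROWS*COLS ints wide. */
static int layout_is_packed(void)
{
    return sizeof(int[ROWS][COLS]) == ROWS * COLS * sizeof(int);
}
```

flat_index(1, 1) gives 3, matching the last row of the table. The sketch
says nothing about whether arr[0][3] is a permitted way to reach that
element.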

One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Does anyone make the same claim of undefined behaviour for C++?

If it is claimed that the snippet's behaviour is undefined because the
second subscript index is out of range of the dimension, then this
rationale can be brought into doubt by the following breakdown. First let's
look at the expression statement:

arr[0][3] = 9;

The compiler, both in C and in C++, must interpret this as:

*( *(arr+0) + 3 ) = 9;

In the inner-most set of parentheses, "arr" decays to a pointer to its
first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
added to this address, which has no effect. The address is then
dereferenced, yielding an L-value of the type int[2]. This expression then
decays to a pointer to its first element, yielding an R-value of the type
int*. The value 3 is then added to this address. (In terms of bytes, it's p
+= 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
of the type int. The L-value int is then assigned to.
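
The steps above can be spelled out with named intermediates. This is only a
sketch with in-range indices (store_stepwise is a hypothetical name), so it
shows the types involved at each step without touching the disputed
out-of-range case:

```c
#include <assert.h>

/* Perform arr[row][col] = value one pointer step at a time.
   The indices are asserted to be in range, so no bounds question arises. */
static void store_stepwise(int (*arr)[2], int row, int col, int value)
{
    assert(row >= 0 && row < 2 && col >= 0 && col < 2);

    int (*row_ptr)[2] = arr + row;   /* pointer to a row: int (*)[2]       */
    int *elem_ptr = *row_ptr + col;  /* *row_ptr is an int[2], which decays
                                        to int * before col is added       */
    *elem_ptr = value;               /* lvalue of type int, assigned to    */
}
```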

The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.

To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?

To the C programmers: How can you rationalise the assertion that it
actually does invoke undefined behaviour?

I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but
it should work perfectly on all implementations:

/* Assume the inclusion of all necessary headers */

void Output(int); /* Defined elsewhere */

int main(void)
{
assert( sizeof(double) > sizeof(int) );

{ /* Start */

double *p;
int *q;
char unsigned const *pover;
char unsigned const *ptr;

p = malloc(5 * sizeof*p);
q = (int*)p++;
pover = (char unsigned*)(p+4);
ptr = (char unsigned*)p;
p[3] = 2423.234;
*q++ = -9;
do Output(*ptr++);
while (pover != ptr);

return 0;

} /* End */
}

Another thing I would remind both camps of, is that we can access any
memory as if it were simply an array of unsigned char's. That means we can
access an "int[2][2]" as if it were simply an object of the type "char
unsigned[sizeof(int[2][2])]".
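
A minimal sketch of that byte-wise view, assuming a hypothetical helper
named zero_bytes. It walks the whole int[2][2] object through an unsigned
char pointer, which is the kind of access described in the paragraph above:

```c
#include <assert.h>
#include <stddef.h>

/* Set every byte of an object to zero through an unsigned char pointer. */
static void zero_bytes(void *object, size_t size)
{
    assert(object != NULL);
    unsigned char *bytes = object;
    for (size_t i = 0; i < size; ++i)
        bytes[i] = 0;
}
```

Called as zero_bytes(&arr, sizeof arr), it touches all sizeof(int[2][2])
bytes of the array object, and every int in it then reads as zero.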

The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.

I leave you with this:

int arr[2][2];

void *const pv = &arr;

int *const pi = (int*)pv; /* Cast used for C++ programmers! */

pi[3] = 8;

--

Frederick Gotham

Nov 1 '06 #1

Keith Thompson

Frederick Gotham <fg*******@SPAM.com> writes:

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range.

[snip]

This was multi-posted to at least two newsgroups, comp.std.c and
comp.lang.c. (Given the content, it may have been posted to one or
more C++ newsgroups as well, but I haven't checked.)

I mention this so that readers will be aware of it when deciding
whether and where to post a followup.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 1 '06 #2

Frederick Gotham

Keith Thompson:

This was multi-posted to at least two newsgroups, comp.std.c and
comp.lang.c. (Given the content, it may have been posted to one or
more C++ newsgroups as well, but I haven't checked.)

I mention this so that readers will be aware of it when deciding
whether and where to post a followup.

I wasn't sure how preferable it was over cross-posting, although I know that
my own newsreader makes a mess of cross-posts (...not to mention I don't
quite understand how they're supposed to work).

I have indeed posted to both C newsgroups and C++ newsgroups.

--

Frederick Gotham

Nov 1 '06 #3

Eric Sosman

Frederick Gotham wrote:

[...] but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:
[...]

Frederick, you are under no obligation to believe. But
if you choose to disbelieve, do the believers the courtesy of
leaving the temple quietly. Any door you like, just stop
the mewling. Please?

"The man convinced against his will
Is of the same opinion still."

--
Eric Sosman
es*****@acm-dot-org.invalid

Nov 1 '06 #4

Richard Heathfield

Frederick Gotham said:

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

The C++ parts are irrelevant here.

Over on comp.lang.c,

Huh? This *is* comp.lang.c.

we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range.

Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

<snip>

One can see plainly that there should be no problem with the little
snippet above

No, one can plainly see that the behaviour is undefined, and that's a
problem.

Does anyone make the same claim of undefined behaviour for C++?

Questions about C++ are off-topic here.

<snip>

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 1 '06 #5

Frederick Gotham

Richard Heathfield:

Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

Not ad nauseam enough.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

, then I don't see how my original snippet cannot be.

--

Frederick Gotham

Nov 1 '06 #6

Richard Heathfield

Frederick Gotham said:

Richard Heathfield:

Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

Not ad nauseam enough.

Maybe you have a higher nausea threshold than many of us.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

, then I don't see how my original snippet cannot be.

That's your problem, not ours. The Standard forbids access outside the
bounds of an array. If you wish to violate that prohibition, that's your
choice but, if you do so, the behaviour of the program is undefined. You
may not like the fact, but the ISO C Standard is not concerned with your
(or my) likes or dislikes.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 1 '06 #7

Flash Gordon

Frederick Gotham wrote:

Richard Heathfield:

Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

Not ad nauseam enough.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

It's allowed because the standard says it is allowed.

, then I don't see how my original snippet cannot be.

It is undefined behaviour because that is what the standards committee
decided. They even made it clear in one of the annexes (as someone else
pointed out to you), so even if you can't follow the reasoning from the
normative text you can see that it is what the committee intended. If
you cannot accept what the committee clearly state then perhaps you
should write your own language which is defined as you think it should
be and use that instead of C.
--
Flash Gordon

Nov 1 '06 #8

Pierre Asselin

Frederick Gotham <fg*******@spam.com> wrote:

int arr[2][2];
arr[0][3] = 7;

Yep, undefined behavior indeed. Surprising, but that's what the
standard says. Your code may break at the next compiler upgrade.

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7; /* ugly */
or
{
int * const tmp= arr[0]; /* wordy */
tmp[3]= 7;
}
--
pa at panix dot com

Nov 1 '06 #9

Frederick Gotham

Pierre Asselin:

That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
first element, and no cast is required.

Still though, people seem to think it invokes undefined behaviour.

--

Frederick Gotham

Nov 1 '06 #10

Chris Dollin

Frederick Gotham wrote:

Pierre Asselin:

That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
first element, and no cast is required.

Still though, people seem to think it invokes undefined behaviour.

int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`. Clearly such an object has no
element at index 3. BOOM.

(It doesn't matter that `arr[0]` then decays into a pointer-to-int.
That pointer only points to /2/ ints. That there are more ints
afterward, even that there are /surely/ more ints afterward, doesn't
stop it being undefined. Think of it as the Standard permitting an
implementation to do bounds-checking.)

(Similarly, if the Standard were to say that use of any identifier
ending in `kers` yielded undefined behaviour, then using
`bonkers` or `blinkers` in your code would yield undefined
behaviour, even if the implementation were unchanged from whatever
it now is. Implementations don't have to go out of their way to
make undefined constructs have bizarre behaviour. Of course the
Standard would never make such a generic constraint on names,
so you don't have to avoid `inkers` or `thankers` or `streakers`
as names in your code ...)

(fx:BOOM)

--
Chris "everyone knows it's flat" Dollin
"We did not have time to find out everything we wanted to know."
- James Blish, /A Clash of Cymbals/

Nov 1 '06 #11

Frederick Gotham

Chris Dollin:

int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

Clearly such an object has no
element at index 3. BOOM.

No, but it's part of a contiguous sequence of memory.

--

Frederick Gotham

Nov 1 '06 #12

Frederick Gotham

What ever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are laid out contiguously in memory.

Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the int's which are directly after it in
contiguous memory? Isn't that one of the fundamental faculties of pointers?

--

Frederick Gotham

Nov 1 '06 #13

Frederick Gotham

Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*p++ = 1;
*p++ = 2;
*p++ = 3;
*p++ = 4;

--

Frederick Gotham

Nov 1 '06 #14

Fred Kleinschmidt

"Frederick Gotham" <fg*******@SPAM.com> wrote in message
news:2q*******************@news.indigo.ie...

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

<snip>

Frederick Gotham

Consider what happens when you pass this to a function

foo( arr, 2, 2 );

and foo is defined as:

void foo( int **arr, int dim1, int dim2 ) {
/*
* you think this is OK as long as [0][3]
* is inside the bounds of [2][2] ?
*/
arr[0][3] = 7;
}

Now foo can't determine whether you passed a 2D array to it,
or a pointer to a pointer to int.

Now suppose somewhere else I write this code:
int **arr2;
arr2 = malloc( 2 * sizeof (*arr2) );
for ( i=0; i < 2; i++ ) {
arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
}
foo( arr2, 2, 2 );

What will happen in foo() ?
--
Fred L. Kleinschmidt
Boeing Associate Technical Fellow
Technical Architect, Software Reuse Project

Nov 1 '06 #15

Frederick Gotham

Fred Kleinschmidt:

Consider what happens when you pass this to a function

foo( arr, 2, 2 );

and foo is defined as:

void foo( int **arr, int dim1, int dim2 ) {
/*
* you think this is OK as long as [0][3]
* is inside the bounds of [2][2] ?
*/
arr[0][3] = 7;
}

Thankfully, there's no implicit conversion from int[2][2] to int**.

It would appear you have confused a multi-dimensional array with an array
of pointers to arrays... ?

Now suppose somewhere else I write this code:
int **arr2;

Here you define a pointer to a pointer to an int.

arr2 = malloc( 2 * sizeof (*arr2) );

Here you allocate enough memory for two int pointers.

for ( i=0; i < 2; i++ ) {
arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
}
foo( arr2, 2, 2 );

I think this confirms my suspicion that you're thinking of arrays of
pointers to arrays, rather than multi-dimensional arrays.

Oh, by the way, a multi-dimensonal array is merely an array of arrays.
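
A sketch of what "an array of arrays" means for a function parameter,
assuming a hypothetical function named sum_rows. An int[2][2] argument
decays to a pointer to its first row, int (*)[2], which is a different
type from int **:

```c
#include <assert.h>
#include <stddef.h>

/* Sum every element of an array of rows, each row being an int[2].
   The parameter type is "pointer to array of 2 int", not int **. */
static int sum_rows(int (*rows)[2], size_t row_count)
{
    assert(rows != NULL);
    int total = 0;
    for (size_t r = 0; r < row_count; ++r)
        for (size_t c = 0; c < 2; ++c)
            total += rows[r][c];   /* each index stays inside its own row */
    return total;
}
```

With int arr[2][2] in scope, the call is sum_rows(arr, 2). Passing an
int ** built from separate malloc calls would need a different parameter
type altogether.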

--

Frederick Gotham

Nov 1 '06 #16

Clever Monkey

Frederick Gotham wrote:

Pierre Asselin:

That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

I totally love this word you just created:

redudant
adj 1. More dude than is needed or required; "being that cool is
just redudant, dude"

Nov 1 '06 #17

Richard Heathfield

Frederick Gotham said:

Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*arr is equivalent to arr[0], which is an array of two int. It is acceptable
for p to point to the first element in this array, so the assignment is
fine.

*p++ = 1;

No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].

*p++ = 2;

No problem. Now arr[0][1] has the value 2, and p points one past the end of
the arr[0] array.

*p

Illegal dereference of p. The behaviour is undefined.

And it will remain undefined, no matter which way you cut it.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 1 '06 #18

Richard Heathfield

Frederick Gotham said:

What ever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are laid out contiguously in memory.

Therefore, if we take the address of the first int, why can't we add to
that address to yield the addresses of the int's which are directly after
it in contiguous memory?

You can, as long as you don't exceed the bounds of any array.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 1 '06 #19

Jordan Abel

2006-11-01 <qa******************************@bt.com>,
Richard Heathfield wrote:

Frederick Gotham said:

Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*arr is equivalent to arr[0], which is an array of two int. It is acceptable
for p to point to the first element in this array, so the assignment is
fine.

*p++ = 1;

No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].

*p++ = 2;

No problem. Now arr[0][1] has the value 2, and p points one past the end of
the arr[0] array.

*p

Illegal dereference of p. The behaviour is undefined.

And it will remain undefined, no matter which way you cut it.

ok. So how about if instead of int *p = *arr; you instead use this:
int *p;

p = (int *)(unsigned char *)arr;
p[0]=0; p[1]=1; /* no problems */
p[2]=2; p[3]=3; /* is this legal? */

/* assuming the above wasn't wrong, or if it was wrong, wasn't executed */
p = (int *)((unsigned char *)arr+2*sizeof(int));
p[0]=2; p[1]=3; /* is this legal? */

Nov 1 '06 #20

Rafael Almeida

On Wed, 01 Nov 2006 02:14:22 GMT
Frederick Gotham <fg*******@SPAM.com> wrote:

int arr[2][2];

arr[0][3] = 7;

<snip>

One can see plainly that there should be no problem with the little
snippet above because arr[0][3] should be the same as arr[1][1], but
I've had people over on comp.lang.c telling me that the behaviour of
the snippet is undefined because of an "out of bounds" array access.
They've even backed this up with a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

As far as I know, it's not guaranteed that

*(*(arr+0)+3)

should work. After all, *(arr+0) has only two elements, and there's
no guarantee that the first position past the end of *(arr+0) is the
first position of *(arr+1).

Moreover, if someone writes a compiler that might even work when you
write *(*(arr+0)+3), but when you write arr[0][3] it deletes all your
files, it would be standard compliant, but the behaviour of arr[0][3]
would be very different from what you wanted. That's why it's undefined
behaviour.

Nov 1 '06 #21

Default User

Jordan Abel wrote:

ok. So how about if instead of int *p = *arr; you instead use this:
int *p;

p = (int *)(unsigned char *)arr;
p[0]=0; p[1]=1; /* no problems */
p[2]=2; p[3]=3; /* is this legal? */

/* assuming the above wasn't wrong, or if it was wrong, wasn't executed */
p = (int *)((unsigned char *)arr+2*sizeof(int));
p[0]=2; p[1]=3; /* is this legal? */

You mentioned this before, and I'm not sure. The best I can find in the
standard (c99 draft) is:

[#7] A pointer to an object or incomplete type may be
converted to a pointer to a different object or incomplete
type. If the resulting pointer is not correctly aligned(50)
for the pointed-to type, the behavior is undefined.
Otherwise, when converted back again, the result shall
compare equal to the original pointer. When a pointer to an
object is converted to a pointer to a character type, the
result points to the lowest addressed byte of the object.
Successive increments of the result, up to the size of the
object, yield pointers to the remaining bytes of the object.

(50) In general, the concept ``correctly aligned'' is
transitive: if a pointer to type A is correctly aligned
for a pointer to type B, which in turn is correctly
aligned for a pointer to type C, then a pointer to type A
is correctly aligned for a pointer to type C.

So the question becomes one of alignment, I think. I'm fairly sure that
it would have to be aligned properly.

Brian

Nov 1 '06 #22

Jordan Abel

2006-11-01 <4q************@individual.net>,
Default User wrote:

So the question becomes one of alignment, I think. I'm fairly sure that
it would have to be aligned properly.

Well, it's guaranteed that it points at a place where an integer is
actually stored, so it would certainly have to be aligned properly.

Now, the second half of my question is - what if i skip the explicit
cast to (unsigned char *)? The point being i'm still starting with
a pointer that is to the first member of an array, correctly aligned for
an int, that is 4*sizeof(int) bytes wide, so i'm absolutely sure there's
no _real_ issue here - the question becomes one of whether an
implementation can decide to reject it just to be contrary.

starting with *arr is different because your pointer is then to the
first member of an array of two ints, regardless of the fact that
another identical array follows it in memory. arr[0][2] is right out,
i've been thoroughly convinced of this in comp.std.c

That is, in any "heavy pointer" [as discussed earlier in this thread, or
perhaps in comp.std.c - this is why multiposting is incorrect, by the
way] that you get from 'arr' instead of '*arr', the "how far can it go
before reaching the end" has to be 2*sizeof(int[2]), that is, 2*(2*
sizeof(int)) because it's the pointer-to-the-first-element of an array
of two elements of type int[2], whereas it's conceivable that *arr is
realized as the pointer-to-the-first-element of an array of two ints
(and therefore its "heavy pointer" parameter will be 2*sizeof(int)
instead)

I think that actually, the conclusion that must be reached is that this
is legal: int a[2][2]; int *p=(int *)a; p[2]=2; and this is not: int a[
2][2]; int *p=*a; p[2]=2; despite the fact that it makes more apparent
intuitive sense for a and *a to be equivalent pointers in all but type.

Nov 1 '06 #23

Peter Nilsson

Frederick Gotham wrote:

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between

One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1],

Frederick, in this and other related threads, you appear to be running
two arguments simultaneously:

1) The behaviour is defined; and
2) The behaviour should be defined.

On point 1, you are wrong and can look up the cited c&v any time you
like.

On point 2, no one is arguing that the concept cannot be made rigorous
and consistent. It's just that no one can see clear benefit to doing so.
[The struct hack problems were fixed through amended syntax and
semantics that didn't involve legalising out of bounds access.]

In contrast, the _disadvantages_ to the technique are well known and
documented, in particular, buffer overflow problems and optimisation
crimping.

What you suggest is unnecessary. To the programmers that want to do
that kind of thing, the blanket of undefined behaviour covers them as
it does for many other implementation specific techniques.

--
Peter

Nov 1 '06 #24

Mark McIntyre

On Wed, 01 Nov 2006 15:34:46 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:

Chris Dollin:

int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.

Clearly such an object has no
element at index 3. BOOM.

No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

What's so hard to understand here?
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Nov 1 '06 #25

Andrey Tarasevich

Frederick Gotham wrote:

...
The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.
...

That's exactly the point where you are both right and wrong at the same time. As
I said before, attempts to explain this from the committee's point of view have
already been made in the "struct hack" thread and "struct hack"-related defect
reports.

You are saying that "any dimension info should be lost by then". That's not
true. It has been stated here that the intended meaning of the pointer
arithmetic rules given in the standard allows for dimension info to be retained
inside the pointer itself. In other words, when initializing a pointer the
implementation is allowed to store the accessible memory range inside that
pointer. For example, a pointer value produced in this case

int a[100];
int* p = a;

can be internally represented as a range-and-address combination

<0> <100> <address of a>

and an attempt to perform index arithmetics on this pointer might intentionally
verify the range limitations and fail (produce UB) in out-of-range situations.
The range values can be inherited from pointer to pointer during address
arithmetic operations

int* q = p + 5; /* q is <-5> <95> <address of a + 5> */

and so on. Needless to say, the very same thing might apply to the implicit
pointer resulting from the implicit array-to-pointer conversion. That's the
reason why your example might fail.
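
The range-and-address idea above can be modelled in ordinary C. This is a
sketch under assumptions, not a description of any real implementation:
struct ranged_ptr, ranged_from_array, ranged_add and ranged_at are all
hypothetical names, and a checked access returns NULL where a real
implementation would be free to do anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "ranged pointer": an address plus the element offsets
   that may be applied to it, the half-open range [lo, hi). */
struct ranged_ptr {
    ptrdiff_t lo;
    ptrdiff_t hi;
    int *addr;
};

/* Pointer to the first of `count` ints: <0> <count> <first>. */
static struct ranged_ptr ranged_from_array(int *first, size_t count)
{
    struct ranged_ptr p = { 0, (ptrdiff_t)count, first };
    return p;
}

/* p + n, with the range inherited and shifted. n must satisfy
   lo <= n <= hi (hi itself is the one-past-the-end position). */
static struct ranged_ptr ranged_add(struct ranged_ptr p, ptrdiff_t n)
{
    assert(n >= p.lo && n <= p.hi);
    struct ranged_ptr q = { p.lo - n, p.hi - n, p.addr + n };
    return q;
}

/* Checked form of &p[i]: NULL when i is outside the recorded range. */
static int *ranged_at(struct ranged_ptr p, ptrdiff_t i)
{
    if (i < p.lo || i >= p.hi)
        return NULL;
    return p.addr + i;
}
```

Starting from a 100-element array, ranged_add(p, 5) produces the range
-5 to 95 described above, and ranged_at refuses any index outside it.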

Now, while the above is definitely implementable, the real question is whether
the standard actually allows this kind of (overly restrictive?) implementation.

Some posters insisted that this is immediately allowed by pointer arithmetic
rules described in the standard. Formally, this is not true. Standard pointer
arithmetic rules are indeed formulated in terms of "array size", but they do not
say that the aforementioned "size" is the _declared_ size of the array object
(as opposed to the actual size of the underlying memory block). In other words,
formally, as follows from the standard _document_ (disregarding any informal
additions distributed by word-of-mouth), the implementation that restricts this
kind of "out of bound" access is non-conforming. This also means that neither
C89/90 nor C99 _documents_ really outlaw the infamous "struct hack".

At the same time it is important to note that it is well-known that the
committee's position is that the real intent behind the current version of the
pointer arithmetic rules was to interpret the notion of "array size" as the
_declared_ size of the array object. _This_ is the reason why "struct hack" and
the out-of-bounds access from your example are considered illegal in C. They are
outlawed "semi-informally" by known authoritative word-of-mouth comments, which
nevertheless are not included in the standard document.

Also it is worth noting that the big intuitive problem with this kind of access
being illegal is that the "ranged-pointer" feature I described above is
definitely out of place in C. In stronger words, an artificial restriction like
this is completely unacceptable in C. Moreover, it is completely unacceptable in
C++ as well (it is something a 'std::vector<>' might do, but not a raw pointer).
And, as one would expect in case of such a random restriction, there's no
rationale behind it at all.

--
Best regards,
Andrey Tarasevich

Nov 1 '06 #26

Mark McIntyre

On Wed, 01 Nov 2006 15:38:17 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:

What ever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

No, that's its C declaration. Its type is array [2] of array[2] of
ints.

It consists of four int objects which are laid out contiguously in memory.

It consists of four int objects, yup. They may even be contiguous.
That's irrelevant.

Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the int's which are directly after it in
contiguous memory?

Because it says so in the Standard. This doesn't mean it's impossible,
or even impractical. Just that it's not allowed. If you really don't
like that, raise it as a DR with the committee.

Isn't that one of the fundamental faculties of pointers?

Yes, but it's still not relevant.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Nov 1 '06 #27

Keith Thompson

Mark McIntyre <ma**********@spamcop.net> writes:

On Wed, 01 Nov 2006 15:34:46 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:

[...]

No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

x and y aren't *necessarily* contiguous; there could be a gap between
them. In the array case being discussed, the representation is
specified by the standard, and there can be no gap.
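
Whether a given implementation leaves a gap between the two members can be
measured with offsetof. This is a small sketch: the struct tag pair and the
function gap_between_members are names invented here for illustration (the
struct in the post is untagged).

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    int x[2];
    int y[2];
};

/* Number of padding bytes between the end of x and the start of y.
   Zero on many implementations, but the Standard does not promise it. */
static size_t gap_between_members(void)
{
    /* Members are laid out in declaration order without overlap. */
    assert(offsetof(struct pair, y) >= sizeof(int[2]));
    return offsetof(struct pair, y)
         - (offsetof(struct pair, x) + sizeof(int[2]));
}
```

For int arr[2][2] no such measurement is needed, since sizeof arr is
required to be exactly 4 * sizeof(int).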

What's so hard to understand here?

The behavior is undefined because the standard says so. Anyone who's
unwilling or unable to accept that simple fact isn't going to be
persuaded by anything we say here.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 1 '06 #28

Joe Wright

Mark McIntyre wrote:

On Wed, 01 Nov 2006 15:34:46 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:

Chris Dollin:

int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.

Clearly such an object has no
element at index 3. BOOM.

No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

What's so hard to understand here?

If I had..
int a[2][2];
... and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
...and then treat p[0]..p[3]. That's legal isn't it?

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Nov 1 '06 #29

Frederick Gotham

Mark McIntyre:

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

No, it wouldn't. It's possible that you'd be writing to padding bytes, but
nonetheless it's perfectly OK:

struct { int x[2], y[2]; } obj;

int *p = (int*)&obj;
int const *const pover = p + sizeof obj / sizeof *p;

do *p++ = 0;
while (pover != p);

This code in and of itself is perfectly OK, but you'd want to watch out for
trapping if you go on to read from "x" or "y" within "obj".

If there were no padding, then everything would be fine and dandy.

--

Frederick Gotham

Nov 1 '06 #30
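
For comparison, the same "zero every byte of the struct" idea can be written with an unsigned char pointer, which the Standard permits for inspecting and modifying the bytes of any object. This is a sketch under that assumption; the tag two_arrays and the function name zero_bytes are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical tag for the struct used in the post above. */
struct two_arrays { int x[2], y[2]; };

/* Set every byte of *obj, padding included, to zero. */
void zero_bytes(struct two_arrays *obj)
{
    unsigned char *p = (unsigned char *)obj;
    unsigned char *const pover = p + sizeof *obj;  /* one past the last byte */

    while (p != pover)
        *p++ = 0;
}
```

Because p is an unsigned char pointer here, adding sizeof *obj to it counts bytes, so the end pointer is exactly one past the object.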

Frederick Gotham

Mark McIntyre:

>>, the type of the object "arr" is: int[2][2]

No, that's its C declaration. Its type is array [2] of array[2] of
ints.

Its type is int[2][2]. _You_ can call it:

Chocolate factory of arrays of length two with squidgy marshmallow bits
in between to two int's.

, but I think it's best that _we_ call it by its C name.

>>It consists of four int objects which are lain out contiguously in
memory.

It consists of four int objects, yup. They may even be contiguous.

They _are_ contiguous, despite your (misplaced) sarcasm.

That's irrelevant.

If it were irrelevant, I wouldn't be ranting on about this so much.

--

Frederick Gotham

Nov 1 '06 #31

Frederick Gotham

Thanks Andrey, I've finally gotten the response I was looking for.

Andrey Tarasevich:

Also it is worth noting that the big intuitive problem with this kind of
access being illegal is that the "ranged-pointer" feature I described
above is definitely out of place in C.

Yes. I like control. I _love_ control. That's why I opt for _proper_
programming languages like C and C++, and not mickey-mouse languages like
Java.

In stronger words, an artificial restriction like this is completely
unacceptable in C.

So how do we get our hands on a "Range-liberal pointer", a pointer without
armbands? Must we have an intermediate cast to something like a void* or a
char* in order to liberate the pointer from its range restriction?
Something like:

int arr[2][2];

int *const p = (int*)(char unsigned*)&arr;

p[3] = 5;

(I chose "char unsigned*" because the Standard explicitly allows us to
treat any object as though it were simply a sequence of bytes -- which they
are!)

See how I took the address of the entire array (i.e. &arr), well I wonder
if the code would be any less proper if I took the address of the first
element instead, i.e.:

int *const p = (int*)(char unsigned*)&**arr;

--

Frederick Gotham

Nov 2 '06 #32

Eric Sosman

Frederick Gotham wrote:

What ever happened to the idea of contiguous memory?

Nothing "happened to" it. It's still there. It's still
a useful notion for describing representations. But that's
all it is.

When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are lain out contiguously in memory.

"Laid." But so do all of

int brr[1][4];
int crr[4][1];
int drr[4];

If you cannot see that arr, brr, crr, and drr have four different
types despite their single common representation, you have not
grasped the notion of "type." Part of that notion is that different
types behave differently even if their representations are the same:

int i = 0;
unsigned int u = 0;
// Claim: i and u have identical representations
--i;
--u;
// Claim: i and u have behaved differently despite
// their identical representations, because they
// are of different types

Now: arr, brr, crr, drr have the same representation but different
types, therefore they can behave differently. In particular, they
can behave differently w.r.t. the [] operator, and there's an end on't.

Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the int's which are directly after it in
contiguous memory? Isn't that one of the fundamental faculties of pointers?

You are confusing representation with value, and ignoring type.

--
Eric Sosman
es*****@acm-dot-org.invalid

Nov 2 '06 #33
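
The point about four types sharing one representation can be made concrete with sizeof, which reports what a single application of [] yields without evaluating the subscript. A minimal sketch; the function names are invented for illustration, while the array names follow the post above.

```c
#include <limits.h>
#include <stddef.h>

int arr[2][2];
int brr[1][4];
int crr[4][1];
int drr[4];

/* Size of the object that one [] step designates for each array.
   sizeof does not evaluate its operand here, so nothing is accessed. */
size_t step_arr(void) { return sizeof arr[0]; }  /* int[2] */
size_t step_brr(void) { return sizeof brr[0]; }  /* int[4] */
size_t step_crr(void) { return sizeof crr[0]; }  /* int[1] */
size_t step_drr(void) { return sizeof drr[0]; }  /* int    */

/* Both start from the value 0, yet -- gives different results
   because the types differ. */
int      decrement_int(void)      { int i = 0;          --i; return i; }
unsigned decrement_unsigned(void) { unsigned int u = 0; --u; return u; }
```

All four arrays have the same total size, but the four step_ functions return different values, and decrement_unsigned() wraps to UINT_MAX while decrement_int() gives -1.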

Jordan Abel

2006-11-01 <vk********************************@4ax.com>,
Mark McIntyre wrote:

On Wed, 01 Nov 2006 15:34:46 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:
>>Chris Dollin:
>>`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.

Well, if you want to explain what it is, I'd do so in English rather
than some hitherto-unknown moon syntax. However, types have names, so why
not use them?

Nov 2 '06 #34

Jordan Abel

2006-11-02 <Ca******************************@comcast.com>,
Joe Wright wrote:

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

My conclusion, which I've articulated, is that it is. No-one else has
weighed in yet.

Nov 2 '06 #35

Jordan Abel

2006-11-01 <12*************@news.supernews.com>,
Andrey Tarasevich wrote:

For example, a pointer value produced in this case

int a[100];
int* p = a;

can be internally represented as a range-and-address combination

<0> <100> <address of a>

nit: 200 or 400 would be more likely, unless we have word-addressed
memory, in which case void*/char* would have a very different
representation.

Also it is worth noting that the big intuitive problem with this kind
of access being illegal is that the "ranged-pointer" feature
I described above is definitely out of place in C. In stronger words,
an artificial restriction like this is completely unacceptable in C.
Moreover, it is completely unacceptable in C++ as well (it is something
a 'std::vector<>' might do, but not a raw pointer). And, as one would
expect in case of such a random restriction, there's no rationale
behind it at all.

What if ranged pointers are provided in hardware, and accessing out of
bounds incurs a trap that must be handled? (incidentally,
base/max/offset would be an equivalent implementation, and may actually
exist in some hardware)

And there's a very good rationale for providing this in debug mode even
if it's not provided in production mode.

Nov 2 '06 #36
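
The "ranged pointer" being discussed can be modelled in plain C as an address paired with an element count. This is a sketch of the idea only, not a description of any real compiler or hardware: struct ranged_ptr, ranged_store and row_of are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical ranged pointer: a base address plus the number of
   elements that may legitimately be reached from it. */
struct ranged_ptr {
    int   *base;
    size_t count;
};

/* Store value at rp.base[index] when index is in range and return 1;
   otherwise leave memory untouched and return 0. */
int ranged_store(struct ranged_ptr rp, size_t index, int value)
{
    if (index >= rp.count)
        return 0;
    rp.base[index] = value;
    return 1;
}

/* Ranged pointer for one row of an int[2][2]: the range that
   arr[row] carries after decaying to int *. */
struct ranged_ptr row_of(int (*rows)[2], size_t row)
{
    struct ranged_ptr rp;
    rp.base = rows[row];
    rp.count = 2;
    return rp;
}
```

With this model, a store through row_of(arr, 0) at index 3 is refused even though arr[1][1] sits at that address, which is the kind of check a debugging implementation might perform.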

Old Wolf

Joe Wright wrote:

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

No, but:
int *p = (int *)&a;
would be fine.

&a points to an object containing 4 ints.
&a[0] points to an object containing 2 ints.

Writing "a" by itself has the same effect as &a[0],
in your snippet.

Nov 2 '06 #37

Jordan Abel

2006-11-02 <11**********************@k70g2000cwa.googlegroups.com>,
Old Wolf wrote:

Joe Wright wrote:
>If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

No, but:
int *p = (int *)&a;
would be fine.

&a points to an object containing 4 ints.
&a[0] points to an object containing 2 ints.

actually... &a[0] points to *two* objects containing 2 ints.

if we had
int a[2];
int *p = a; /* &a[0] */

no-one would be claiming we can't use p[1].

Writing "a" by itself has the same effect as &a[0],
in your snippet.

Nov 2 '06 #38

Flash Gordon

Joe Wright wrote:

<snip>

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of entire array of arrays and you are
not going outside the region a defines. The previous example using a[0],
on the other hand, does go outside the region a[0] is defined as referring to.
--
Flash Gordon

Nov 2 '06 #39

Flash Gordon

Frederick Gotham wrote:

Mark McIntyre:

>So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

No, it wouldn't. It's possible that you'd be writing to padding bytes, but
nonetheless it's perfectly OK:

<snip>

Wrong. As has been pointed out already look at the discussions on the
struct hack, also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.
--
Flash Gordon

Nov 2 '06 #40

Old Wolf

Frederick Gotham wrote:

Andrey Tarasevich:

>Also it is worth noting that the big intuitive problem with this kind of
access being illegal is that the "ranged-pointer" feature I described
above is definitely out of place in C.

Yes. I like control. I _love_ control. That's why I opt for _proper_
programming languages like C and C++, and not mickey-mouse
languages like Java.

What is mickey-mouse about Java ?

>Also it is worth noting that the big intuitive problem with this kind of access
being illegal is that the "ranged-pointer" feature I described above is
definitely out of place in C. In stronger words, an artificial restriction like
this is completely unacceptable in C. Moreover, it is completely unacceptable in
C++ as well (it is something a 'std::vector<>' might do, but not a raw pointer).
And, as one would expect in case of such a random restriction, there's no
rationale behind it at all.

I, for one, would find such a pointer very useful for debugging.

Currently my compiler includes a tool that will warn when I
step outside the bounds of an allocated block of memory.
But that won't help with code like:

struct S {
int x[4];
int y[4];
};
struct S s;

if I accidentally access s.x[5]. Which, I should add, I would
consider a bug (some of you would consider it a feature,
apparently).

So how do we get our hands on a "Range-liberal pointer", a pointer without
armbands? Must we have an intermediate cast to something like a void* or a
char* in order to liberate the pointer from its range restriction?

No, casts don't affect the pointer range. What do you mean by
"range-liberal pointer" ? The C standard is quite clear that you
cannot portably point outside the bounds of an object. The only
thing we are debating here is whether it is OK if the pointer
leaves the bounds of the object it was pointing to, but it is
still within the bounds of an object of which the original object
was a sub-object.

It is not a part of C that you can use pointer arithmetic on
a pointer to move it around any part of some flat address
space you might imagine.

int *const p = (int*)(char unsigned*)&arr;

The second cast is unnecessary; the expression "&arr" implies
a range of anywhere inside the object designated by "arr",
(and one-after-the-end of course).

Nov 2 '06 #41

Richard Heathfield

Flash Gordon said:

Joe Wright wrote:

<snip>

>If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of entire array of arrays

Well, a actually decays to &a[0], which is a pointer to the first element in
a, i.e. it is a pointer to an array of two int, and it has type int (*)[2].

Can I just ask that people not use a as an identifier in Usenet discussions?
One advantage of foo and bar is that they are trivial for the eye to
distinguish from indefinite articles without having to engage the conscious
brain.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 2 '06 #42
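
The three pointer types involved in this sub-thread can be told apart by the size of what each one points to. A small sketch using foo, as requested above; the function names are invented for illustration, and sizeof does not evaluate any of the dereferences.

```c
#include <stddef.h>

int foo[2][2];

/* &foo has type int (*)[2][2]: it points to the whole array. */
size_t pointee_of_address(void) { return sizeof *&foo; }

/* foo decays to &foo[0], type int (*)[2]: it points to one row. */
size_t pointee_of_decayed(void) { return sizeof *foo; }

/* foo[0] decays to &foo[0][0], type int *: it points to one int. */
size_t pointee_of_row(void)     { return sizeof **foo; }
```

All three pointers compare equal once converted to a common type such as void *, so the difference between them lies only in the type, which is what the disagreement in the thread turns on.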

Chris Dollin

Frederick Gotham wrote:

Chris Dollin:

> int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's one way of writing it, yes.

>Clearly such an object has no
element at index 3. BOOM.

No, but it's part of a contiguous sequence of memory.

So?

--
Chris "unhashedup hashed up hashing" Dollin
"Our future looks secure, but it's all out of our hands"
- Magenta, /Man and Machine/

Nov 2 '06 #43

pemo

Frederick Gotham wrote:

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to
both languages. ]

Over on comp.lang.c, we've been discussing the accessing of array
elements via subscript indices which may appear to be out of range.
In particular, accesses similar to the following:

<snip>

I've been following /some/ of this discussion, and I must admit that I was
surprised by the Standard's J.2 quote [An array subscript is out of range, even if
an object is apparently accessible].

So, I wonder if I could ask whether I've got this right - the first loop
below has UB, the second is fine, and the third is also ok according to
6.5.6 (8)?

#include <stdio.h>

int main(void)
{
int q[4][3][2] =
{
{
{1,},
},
{
{2, 3},
},
{
{4, 5},
{6},
},
};

int n;
int i;
int j;

int * p;

/* UB */
for(n = 0; n < 24; ++n)
{
printf("%d ", q[0][0][n]);
}

puts("");

/* OK */
for(n = 0; n < 4; ++n)
{
for(i = 0; i < 3; ++i)
{
for(j = 0; j < 2; ++j)
{
printf("%d ", q[n][i][j]);
}
}

}

puts("");

/* ?? from 6.5.6 - 8 */
for(n = 0, p = (int *)&q[0]; n < 24; ++n)
{
printf("%d ", *(p + n));
}

return 0;
}

--
==============
Not a pedant
==============

Nov 2 '06 #44
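
One way to visit all 24 ints without relying on int pointer arithmetic across row boundaries is to walk the object's bytes and copy each int out with memcpy. This is a sketch of that alternative, not an answer to whether the third loop above is defined; sum_flat is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Sum every int in an int[4][3][2] by copying each one out of the
   object's byte representation. */
long sum_flat(int (*q)[4][3][2])
{
    const unsigned char *bytes = (const unsigned char *)q;
    size_t count = sizeof *q / sizeof(int);   /* 24 elements */
    long total = 0;
    size_t n;

    for (n = 0; n < count; ++n) {
        int value;
        memcpy(&value, bytes + n * sizeof(int), sizeof value);
        total += value;
    }
    return total;
}
```

Called as sum_flat(&q) with the initializer from the post above, the non-zero elements 1 through 6 give a total of 21.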

Frederick Gotham

Flash Gordon:

Wrong. As has been pointed out already look at the discussions on the
struct hack, also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.

If memory is mine to play with, I'll play with it however I like.

--

Frederick Gotham

Nov 2 '06 #45

Richard Heathfield

Frederick Gotham said:

Flash Gordon:

>Wrong. As has been pointed out already look at the discussions on the
struct hack, also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.

If memory is mine to play with, I'll play with it however I like.

And C implementations often give you the power to do that, provided you are
prepared to pay the cost - i.e. that your program is not guaranteed to work
on /other/ C implementations.

Since memory is mine to play with, I can play with it like this:

void print(const char *s, int x, int y, unsigned char fg, unsigned char bg)
{
unsigned char attr = (bg << 4) | fg;
unsigned char *p = (unsigned char *)0xb8000000UL + 160 * y + 2 * x;
while(*s)
{
*p++ = *s++;
*p++ = attr;
}
}

and, provided my implementation plays ball, and provided I have a monitor
that supports, and is currently in, 80-column text mode, I have a
moderately fast way to write to the screen. But if it doesn't or I don't,
all I have is a moderately fast way to crash the program, or possibly the
entire machine.

Sure as eggs is eggs, the above code works just fine on my MS-DOS machine.
So can I complain when the mainframe barfs on it? Well, yeah, I can, but
the complaint is groundless, because I stepped outside the bounds of the
Standard. So if it works anyway, fabulous, but if it doesn't, that's my
problem, not ISO's or the implementor's.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 2 '06 #46

Frederick Gotham

Richard Heathfield:

>If memory is mine to play with, I'll play with it however I like.

And C implementations often give you the power to do that, provided you
are prepared to pay the cost - i.e. that your program is not guaranteed
to work on /other/ C implementations.

In saying that my memory is mine to play with, I'm saying:

If I allocate some memory for my own use, be it via static duration
objects, automatic objects, or via malloc, then I can do whatever I like
with that memory. That's the way C is supposed to be, right?

struct { int a; int b; int c; } obj;

int *p = (int*)&obj;

*p++ = 1;
*p++ = 2;
*p++ = 3;

if (sizeof obj >= 4*sizeof*p) *p++ = 4;
if (sizeof obj >= 5*sizeof*p) *p++ = 5;
if (sizeof obj >= 6*sizeof*p) *p++ = 6;
if (sizeof obj >= 7*sizeof*p) *p++ = 7;

Do you think the behaviour of the above code is undefined? Sure, it might
not write to a, b, and c respectively. And sure, it might write to padding
bytes... but it is still perfectly OK.

The following code should always be OK:

SomeType1 obj1;
SomeType2 obj2;

memcpy(&obj1,&obj2,sizeof obj1);

Sure, you'll probably end up with gibberish, but the code is perfectly OK.

--

Frederick Gotham

Nov 2 '06 #47
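
Note that memcpy(&obj1, &obj2, sizeof obj1) reads sizeof obj1 bytes from obj2, so it stays inside obj2 only when obj2 is at least as large as obj1. A minimal sketch of a copy that respects both sizes follows; copy_common_prefix is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Copy only as many bytes as exist in both objects.
   Returns the number of bytes copied. The objects must not overlap. */
size_t copy_common_prefix(void *dst, size_t dst_size,
                          const void *src, size_t src_size)
{
    size_t n = dst_size < src_size ? dst_size : src_size;
    memcpy(dst, src, n);
    return n;
}
```

It would be called as copy_common_prefix(&obj1, sizeof obj1, &obj2, sizeof obj2); the resulting value of obj1 may still be gibberish, or a trap representation for some types.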

Flash Gordon

Frederick Gotham wrote:

Flash Gordon:

>Wrong. As has been pointed out already look at the discussions on the
struct hack, also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.

If memory is mine to play with, I'll play with it however I like.

Well, don't go to India and try to get a job with the company my
employer has outsourced its development to.
--
Flash Gordon

Nov 2 '06 #48

Flash Gordon

Richard Heathfield wrote:

Flash Gordon said:

>Joe Wright wrote:

<snip>

>>If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?
I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of entire array of arrays

Well, a actually decays to &a[0], which is a pointer to the first element in
a, i.e. it is a pointer to an array of two int, and it has type int (*)[2].

You are quite correct, I was sloppy. It decays as you specified, but you
are allowed to use it to access all of the memory allocated by the 2D
array declaration. So, taking in to account what you say below, I
believe that:
int foo[2][2];
int *ptr = (int*)foo;
ptr[3];
is valid since:

the pointer that foo decays to is guaranteed to be correctly aligned

the pointer that foo decays to points to a 2 element array where each
element is of type int[2] so I'm not exceeding the bounds allowed.

Can I just ask that people not use a as an identifier in Usenet discussions?
One advantage of foo and bar is that they are trivial for the eye to
distinguish from indefinite articles without having to engage the conscious
brain.

Agreed. I just followed on what others had done without thinking, which
was bad of me.
--
Flash Gordon

Nov 2 '06 #49

Richard Heathfield

Frederick Gotham said:

Richard Heathfield:

>>If memory is mine to play with, I'll play with it however I like.

And C implementations often give you the power to do that, provided you
are prepared to pay the cost - i.e. that your program is not guaranteed
to work on /other/ C implementations.

In saying that my memory is mine to play with, I'm saying:

If I allocate some memory for my own use, be it via static duration
objects, automatic objects, or via malloc, then I can do whatever I like
with that memory.

....within the requirements of the Standard. (If you choose to violate those
requirements for your own reasons, that's fine, but at that point the C
Standard no longer defines the behaviour of your program, and we lose the
common ground essential to discussion.)

That's the way C is supposed to be, right?

The way you think C is supposed to be is not necessarily in line with the
way ISO think it is supposed to be.

<weird code snipped>

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 2 '06 #50
