Out-of-bounds nonsense

P: n/a

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

(best viewed with a monowidth font)

------------------------------
| Memory Address | Object    |
------------------------------
|       0        | arr[0][0] |
|       1        | arr[0][1] |
|       2        | arr[1][0] |
|       3        | arr[1][1] |
------------------------------
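
The layout in the table can be checked without any contested subscript. What follows is only a sketch: layout_is_contiguous is a hypothetical name, and the check takes addresses of in-range elements and compares them as unsigned char pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the four ints of an int[2][2] sit at byte offsets
   0, 1, 2 and 3 times sizeof(int) from the start of the array. */
int layout_is_contiguous(void)
{
    int arr[2][2];
    const unsigned char *base = (const unsigned char *)&arr;
    size_t i, j;

    /* No padding: the whole array is exactly four ints wide. */
    assert(sizeof arr == 4 * sizeof(int));

    for (i = 0; i < 2; ++i) {
        for (j = 0; j < 2; ++j) {
            /* Both subscripts are in range; only the address is taken. */
            const unsigned char *elem = (const unsigned char *)&arr[i][j];
            if ((size_t)(elem - base) != (i * 2 + j) * sizeof(int))
                return 0;
        }
    }
    return 1;
}
```

This confirms where the objects live; it says nothing about whether arr[0][3] is a permitted way to reach the last one.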

One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
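
For comparison, here is a minimal sketch of how the same store can be written so that each subscript stays within its own dimension, which avoids the J.2 wording altogether. The names ROWS, COLS and store_flat are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 2 };

/* Stores value at flat position k (0 .. ROWS*COLS-1) by splitting k
   into a row and a column, so neither subscript is out of range. */
void store_flat(int arr[ROWS][COLS], size_t k, int value)
{
    assert(k < (size_t)ROWS * COLS);
    arr[k / COLS][k % COLS] = value;
}
```

With this helper, flat position 3 lands in arr[1][1], the element the original snippet was aiming at.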

Does anyone make the same claim of undefined behaviour for C++?

If it is claimed that the snippet's behaviour is undefined because the
second subscript index is out of range of the dimension, then this
rationale can be brought into doubt by the following breakdown. First let's
look at the expression statement:

arr[0][3] = 9;

The compiler, both in C and in C++, must interpret this as:

*( *(arr+0) + 3 ) = 9;

In the inner-most set of parentheses, "arr" decays to a pointer to its
first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
added to this address, which has no effect. The address is then
dereferenced, yielding an L-value of the type int[2]. This expression then
decays to a pointer to its first element, yielding an R-value of the type
int*. The value 3 is then added to this address. (In terms of bytes, it's p
+= 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
of the type int. The L-value int is then assigned to.

The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.
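
The breakdown above can be spelled out step by step in code. This is only a sketch: read_expanded is a hypothetical name, and the indices are deliberately kept in range so that nothing contested is exercised.

```c
#include <assert.h>

/* Reads arr[i][j] using the expanded form *( *(arr + i) + j ). */
int read_expanded(int (*arr)[2], int i, int j)
{
    int (*row_ptr)[2];
    int *elem_ptr;

    assert(i >= 0 && i < 2 && j >= 0 && j < 2);

    row_ptr = arr + i;       /* R-value of type int(*)[2]             */
    elem_ptr = *row_ptr;     /* L-value of type int[2] decays to int* */
    return *(elem_ptr + j);  /* L-value of type int                   */
}
```

The disputed question is what happens when j exceeds 1 in the last step; the sketch stops short of that on purpose.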

To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?

To the C programmers: How can you rationalise the assertion that it
actually does invoke undefined behaviour?

I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but
it should work perfectly on all implementations:

/* Assume the inclusion of all necessary headers */

void Output(int); /* Defined elsewhere */

int main(void)
{
    assert( sizeof(double) > sizeof(int) );

    { /* Start */

        double *p;
        int *q;
        char unsigned const *pover;
        char unsigned const *ptr;

        p = malloc(5 * sizeof*p);
        q = (int*)p++;
        pover = (char unsigned*)(p+4);
        ptr = (char unsigned*)p;
        p[3] = 2423.234;
        *q++ = -9;
        do Output(*ptr++);
        while (pover != ptr);

        return 0;

    } /* End */
}

Another thing I would remind both camps of, is that we can access any
memory as if it were simply an array of unsigned char's. That means we can
access an "int[2][2]" as if it were simply an object of the type "char
unsigned[sizeof(int[2][2])]".
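
As a sketch of that byte-wise view, the function below clears an int[2][2] through an unsigned char pointer and then reads the ints back. The name zero_bytes_and_sum is hypothetical, and the result relies on all-bits-zero being a representation of the int value 0.

```c
#include <assert.h>
#include <stddef.h>

/* Zeroes every byte of the array through an unsigned char pointer,
   then returns the sum of the four ints read in the ordinary way. */
int zero_bytes_and_sum(int (*arr)[2][2])
{
    unsigned char *bytes;
    size_t k;

    assert(arr != NULL);
    bytes = (unsigned char *)arr;

    /* Walk the whole object as unsigned char[sizeof(int[2][2])]. */
    for (k = 0; k < sizeof *arr; ++k)
        bytes[k] = 0;

    return (*arr)[0][0] + (*arr)[0][1] + (*arr)[1][0] + (*arr)[1][1];
}
```

Note that the pointer is derived from the whole array (&arr), not from one of its rows.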

The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.

I leave you with this:

int arr[2][2];

void *const pv = &arr;

int *const pi = (int*)pv; /* Cast used for C++ programmers! */

pi[3] = 8;

--

Frederick Gotham
Nov 1 '06
63 Replies


P: n/a
Frederick Gotham wrote:
> ...
> The following code should always be OK:
>
> SomeType1 obj1;
> SomeType2 obj2;
>
> memcpy(&obj1,&obj2,sizeof obj1);
>
> Sure, you'll probably end up with gibberish, but the code is perfectly OK.

Not so:

/* in a system where sizeof long is 4 */
long obj1;
char obj2;
/* UB: attempts to copy 3 chars past obj2 */
memcpy(&obj1,&obj2,sizeof obj1);
Roberto Waltman
Nov 2 '06 #51

P: n/a
Frederick Gotham <fg*******@SPAM.com> writes:
> In saying that my memory is mine to play with, I'm saying:
>
> If I allocate some memory for my own use, be it via static duration
> objects, automatic objects, or via malloc, then I can do whatever I like
> with that memory. That's the way C is supposed to be, right?

No. That's the way C often is, but it's not guaranteed by the
standard and is likely to fail when you leave your cozy "all the world's
a Vax^WIntel x86 processor" enclave.

Do the wrong thing with your memory, and the computer crashes. How do
you know what the wrong thing is? The standard tells you.

Charlton
Nov 2 '06 #52

P: n/a
Roberto Waltman:
> Not so:
>
> /* in a system where sizeof long is 4 */
> long obj1;
> char obj2;
> /* UB: attempts to copy 3 chars past obj2 */
> memcpy(&obj1,&obj2,sizeof obj1);

Of course, you're right.

Type1 obj1;
Type2 obj2;

if (sizeof obj2 >= sizeof obj1)
    memcpy(&obj1,&obj2,sizeof obj1);
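
A variant of the guarded copy, as a sketch only: rather than skipping the copy when the source is too small, it copies the smaller of the two sizes. copy_bytes_bounded is a hypothetical helper, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies min(dst_size, src_size) bytes from src to dst and returns
   the number of bytes copied. It never reads past the end of the
   source object or writes past the end of the destination object. */
size_t copy_bytes_bounded(void *dst, size_t dst_size,
                          const void *src, size_t src_size)
{
    size_t n = dst_size < src_size ? dst_size : src_size;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    return n;
}
```

Called as copy_bytes_bounded(&obj1, sizeof obj1, &obj2, sizeof obj2) with a long obj1 and a char obj2, it copies a single byte instead of running off the end of obj2.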

--

Frederick Gotham
Nov 2 '06 #53

P: n/a
Charlton Wilbur said:
> Do the wrong thing with your memory, and the computer crashes.

Might crash. Or might do something more - um - creative. Or might do what
you expected. Or might do what you expected *and* something creative with
which to surprise you later on. Isn't programming exciting!?

> How do you know what the wrong thing is? The standard tells you.

Quite so.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Nov 2 '06 #54

P: n/a
Richard Heathfield <in*****@invalid.invalid> writes:
> Charlton Wilbur said:
>> Do the wrong thing with your memory, and the computer crashes.
>
> Might crash. Or might do something more - um - creative. Or might do what
> you expected. Or might do what you expected *and* something creative with
> which to surprise you later on. Isn't programming exciting!?

True. I learned C on MIPS machines with memory protection, and so
when I did the wrong thing with my memory, the program obligingly
crashed. As annoying as it was at the time, this was a great
pedagogical help.

Then I started working on Linux at home and MIPS or Alpha at school,
and learned what "portable" really meant -- and that was just among
different varieties of Unix.

Charlton
Nov 2 '06 #55

P: n/a
Frederick Gotham wrote:
> Thanks Andrey, I've finally gotten the response I was looking for.
>
> Andrey Tarasevich:
>> Also it is worth noting that the big intuitive problem with this kind of
>> access being illegal is that the "ranged-pointer" feature I described
>> above is definitely out of place in C.
>
> Yes. I like control. I _love_ control. That's why I opt for _proper_
> programming languages like C and C++, and not mickey-mouse languages like
> Java.

Oh please. Prefer one language over another all you like (heck, I
do), but blanket statements like this suggest you don't actually have
any significant experience with Java.

I hesitate to call any language a "Mickey-Mouse" language, including
venerable BASIC or Logo. Maybe Brainfuck or TRACEY can be considered
so, but the inventors of those languages are dead serious about what the
language is _for_, so even then I'm not so sure.

Having done more than my share of "porting" "portable" programs written
in C and C++, I'll take Java any day of the week for the sorts of
enterprise client-server development I do for a living.

C is a wonderful language exquisitely suitable for a great number of
purposes, but there is no way I want to maintain an internationalized,
multi-platform client-server app that requires (among other things)
robust UTF string handling, true exception handling and an easy way to
deliver fixes in the field. Been there, done that. Got the t-shirt.

Yes, all these things can be added to and approximated with C, but why
reinvent the wheel? Java is not just a language. It is a programming
environment for medium-to-large scale, multi-tiered computing. We get
paid to implement features and scale up to larger and larger iron.

Java allows that out of the box, without having to do stuff that
*doesn't* pay, like inventing specialized libraries or creating a
middle-tier. For this kind of computing no one cares whether or not a
multi-dimensional array decays to a pointer to contiguous memory or not.
It's just not important.

It's all about the right tool for the job. I like C. I'm the C guy
around here. But there is no way my company would be doing half the
business we are doing without Java (or some other similar toolset).

There are poorly written programs in any language, and there are poor
uses of a language. Sometimes, if you are extremely unlucky and the
gods hate you, there is a combination of the two.
Nov 2 '06 #56

P: n/a
On Thu, 02 Nov 2006 00:16:50 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:
> Mark McIntyre:
>>> , the type of the object "arr" is: int[2][2]
>>
>> No, that's its C declaration. Its type is array [2] of array[2] of
>> ints.
>
> Its type is int[2][2].

*sigh*.

No, that's its declaration.
Its type is array two of array two of int.

If you don't realise the difference, you need to go back to basics.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Nov 2 '06 #57

P: n/a
On Wed, 01 Nov 2006 23:58:09 GMT, in comp.lang.c , Keith Thompson
<ks***@mib.org> wrote:
> x and y aren't *necessarily* contiguous; there could be a gap between
> them. In the array case being discussed, the representation is
> specified by the standard, and there can be no gap.

Yeah, I was going to chuck in "assume word-aligned memory and no
struct packing" but couldn't be donkeyed.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Nov 2 '06 #58

P: n/a
On Thu, 2 Nov 2006 18:32:21 UTC, Frederick Gotham <fg*******@SPAM.com>
wrote:
> Roberto Waltman:
>> Not so:
>>
>> /* in a system where sizeof long is 4 */
>> long obj1;
>> char obj2;
>> /* UB: attempts to copy 3 chars past obj2 */
>> memcpy(&obj1,&obj2,sizeof obj1);
>
> Of course, you're right.
>
> Type1 obj1;
> Type2 obj2;
>
> if (sizeof obj2 >= sizeof obj1)
>     memcpy(&obj1,&obj2,sizeof obj1);

May result in undefined behavior when Type1 != Type2, as the
representations of different types are not required to have the same
padding bits and/or alignment requirements. memcpy can fail in the
lands of undefined behavior here. Accessing obj1 thereafter can end
in anything, and may not do what you think it should do.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
Nov 3 '06 #59

P: n/a
Herbert Rosenau wrote:
> On Thu, 2 Nov 2006 18:32:21 UTC, Frederick Gotham <fg*******@SPAM.com>
> wrote:
>> Roberto Waltman:
>>> Not so:
>>>
>>> /* in a system where sizeof long is 4 */
>>> long obj1;
>>> char obj2;
>>> /* UB: attempts to copy 3 chars past obj2 */
>>> memcpy(&obj1,&obj2,sizeof obj1);
>>
>> Of course, you're right.
>>
>> Type1 obj1;
>> Type2 obj2;
>>
>> if (sizeof obj2 >= sizeof obj1)
>>     memcpy(&obj1,&obj2,sizeof obj1);
>
> May result in undefined behavior when Type1 != Type2, as the
> representations of different types are not required to have the same
> padding bits and/or alignment requirements. memcpy can fail in the
> lands of undefined behavior here.

No, memcpy is required to treat the data as unsigned char so there are
no alignment issues, no padding and no trap representations. At least,
not during the memcpy.

> Accessing obj1 thereafter can end in anything, and may not do what
> you think it should do.

That is indeed where you get problems of trap representations.
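
The point about memcpy working on unsigned char can be illustrated with a round trip through a byte buffer. This is a hedged sketch, and round_trip is a hypothetical name: the bytes go out and come back unchanged, so the value read afterwards has the same type and representation it started with.

```c
#include <assert.h>
#include <string.h>

/* Copies a double into an unsigned char buffer and back again.
   The copy itself only moves bytes; the final read is safe because
   the bytes are a valid double representation (they came from one). */
double round_trip(double value)
{
    unsigned char bytes[sizeof(double)];
    double result;

    assert(sizeof bytes == sizeof result);

    memcpy(bytes, &value, sizeof value);    /* object to raw bytes */
    memcpy(&result, bytes, sizeof result);  /* raw bytes to object */
    return result;
}
```

The trouble described above starts only when the bytes are read back as a different type from the one that produced them.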
--
Flash Gordon
Nov 3 '06 #60

P: n/a
Herbert Rosenau:
>> Type1 obj1;
>> Type2 obj2;
>>
>> if (sizeof obj2 >= sizeof obj1)
>>     memcpy(&obj1,&obj2,sizeof obj1);
>
> May result in undefined behavior when Type1 != Type2, as the
> representations of different types are not required to have the same
> padding bits and/or alignment requirements.

Note that we don't access "obj1" subsequent to the copy. Therefore, all
behaviour is well-defined.

> memcpy can fail in the lands of undefined behavior here.

No, it can't.

> Accessing obj1 thereafter can end in anything, and may not do what you
> think it should do.

Indeed, one would exercise caution if going on to access "obj1".

--

Frederick Gotham
Nov 3 '06 #61

P: n/a
Mark McIntyre:
>> Its type is int[2][2].
>
> *sigh*.
>
> No, that's its declaration.
> Its type is array two of array two of int.

I prefer the name:

Eagar dhá dhúil, ar eagar dhá int gach dúil.

, but then again, we can call it whatever we like. The Standard seems to call
it an:

int[2][2]

--

Frederick Gotham
Nov 3 '06 #62

P: n/a
On Fri, 03 Nov 2006 15:11:43 GMT, in comp.lang.c , Frederick Gotham
<fg*******@SPAM.com> wrote:
> Mark McIntyre:
>>> Its type is int[2][2].
>>
>> *sigh*.
>
> The Standard seems to call
> it an:
>
> int[2][2]

Nope.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Nov 3 '06 #63

P: n/a
Mark McIntyre:
>> The Standard seems to call
>> it an:
>>
>> int[2][2]
>
> Nope.

Mark McIntyre, you're such a genius.

--

Frederick Gotham
Nov 3 '06 #64
