Out-of-bounds Nonsense

Frederick Gotham

[ This post deals with both C and C++, but does not alienate either language
because the language feature being discussed is common to both languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

(best viewed with a monowidth font)

--------------------------------
| Memory Address |   Object    |
--------------------------------
|       0        |  arr[0][0]  |
|       1        |  arr[0][1]  |
|       2        |  arr[1][0]  |
|       3        |  arr[1][1]  |
--------------------------------
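
As a rough sketch (not part of the original post), the layout in the table can
be checked using sizes and byte offsets only, without going near the disputed
subscript. The function name layout_is_contiguous is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ints of an int[2][2] sit at
   consecutive int-sized offsets from the start of the object. */
int layout_is_contiguous(void)
{
    int arr[2][2];
    const char *base = (const char *)&arr;        /* byte view of the whole object */
    const char *row1 = (const char *)&arr[1][0];  /* first int of the second row */
    const char *last = (const char *)&arr[1][1];  /* last int of the second row */

    /* An array of N elements occupies exactly N * sizeof(element) bytes */
    assert(sizeof arr == 4 * sizeof(int));

    return (size_t)(row1 - base) == 2 * sizeof(int)
        && (size_t)(last - base) == 3 * sizeof(int);
}
```

This only confirms where the objects sit. It says nothing about whether
arr[0][3] is a permitted way of reaching the last one, which is the actual
point of contention.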

One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had people
over on comp.lang.c telling me that the behaviour of the snippet is undefined
because of an "out of bounds" array access. They've even backed this up with
a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Does anyone make the same claim of undefined behaviour about C++?

If it is claimed that the snippet's behaviour is undefined because the second
subscript index is out of range of the dimension, then this rationale can be
brought into doubt by the following breakdown. First let's look at the
expression statement:

arr[0][3] = 9;

The compiler, both in C and in C++, must interpret this as:

*( *(arr+0) + 3 ) = 9;

In the inner-most set of parentheses, "arr" decays to a pointer to its first
element, i.e. an R-value of the type int(*)[2]. The value 0 is then added to
this address, which has no effect. The address is then dereferenced, yielding
an L-value of the type int[2]. This expression then decays to a pointer to
its first element, yielding an R-value of the type int*. The value 3 is then
added to this address. (In terms of bytes, it's p += 3 * sizeof(int)). This
address is then dereferenced, yielding an L-value of the type int. That int
L-value is then assigned to.
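
The uncontroversial part of that breakdown can be sketched in C as follows.
This is only an illustration under my own naming (decay_steps_hold, row and
elem are not from the thread), and it deliberately stops short of the
disputed "+ 3" step:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks the in-range part of the decay chain and
   returns 1 if each step behaves as the paragraph above describes. */
int decay_steps_hold(void)
{
    int arr[2][2] = { {1, 2}, {3, 4} };

    int (*row)[2] = arr + 0;  /* arr decays to int(*)[2]; adding 0 changes nothing */
    int *elem = *row;         /* *row is an int[2] L-value, which decays to int* */

    /* The intermediate L-value really has the size of int[2] */
    assert(sizeof *row == 2 * sizeof(int));

    /* elem + 1 is in range and names the same object as arr[0][1].
       elem + 3 is the disputed step and is not evaluated here. */
    return elem == &arr[0][0] && *(elem + 1) == 2;
}
```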

The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because the
L-value decays to a simple R-value int pointer prior to the accessing of the
int object, so any dimension info should be lost by then.

To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?

To the C programmers: How can you rationalise the assertion that it actually
does invoke undefined behaviour?

I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but it
should work perfectly on all implementations:

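#include <assert.h>
#include <stdlib.h>

void Output(int); /* Defined elsewhere */

int main(void)
{
    /* The int written through q must fit inside the first double */
    assert( sizeof(double) > sizeof(int) );

    { /* Start */

        double *p;
        int *q;
        char unsigned const *pover;
        char unsigned const *ptr;

        p = malloc(5 * sizeof*p);
        if (p == NULL) return EXIT_FAILURE;

        q = (int*)p++;                 /* q reuses the first double's storage */
        pover = (char unsigned*)(p+4); /* one past the end of the allocation */
        ptr = (char unsigned*)p;
        p[3] = 2423.234;
        *q++ = -9;
        do Output(*ptr++);             /* walk the last four doubles byte by byte */
        while (pover != ptr);

        return 0;

    } /* End */
}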

Another thing I would remind both camps of is that we can access any memory
as if it were simply an array of unsigned chars. That means we can access an
"int[2][2]" as if it were simply an object of the type
"char unsigned[sizeof(int[2][2])]".
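
A minimal sketch of that byte-wise view, assuming nothing beyond the standard
library: copy the bytes of the int[2][2] into a flat int[4] with memcpy and
read the last element of the copy. The names last_via_bytes and flat are
hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of an int[2][2]
   into an int[4] and returns the last element of the copy. */
int last_via_bytes(void)
{
    int arr[2][2] = { {1, 2}, {3, 4} };
    int flat[4];

    /* Both objects consist of four ints, so the sizes must agree */
    assert(sizeof flat == sizeof arr);

    memcpy(flat, arr, sizeof arr);  /* byte-wise copy of the whole object */
    return flat[3];                 /* in range for int[4] */
}
```

Here the subscript 3 is applied to a genuine int[4], so no bound is exceeded
at any point; the value read is the one stored in arr[1][1].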

The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.

I leave you with this:

int arr[2][2];

void *const pv = &arr;

int *const pi = (int*)pv; /* Cast used for C++ programmers! */

pi[3] = 8;

--

Frederick Gotham

Nov 1 '06 #1

Kai-Uwe Bux

Frederick Gotham wrote:

> [ This post deals with both C and C++, but does not alienate either language
> because the language feature being discussed is common to both languages. ]
>
> Over on comp.lang.c, we've been discussing the accessing of array elements
> via subscript indices which may appear to be out of range. In particular,
> accesses similar to the following:
>
> int arr[2][2];
>
> arr[0][3] = 7;
>
> Both the C Standard and the C++ Standard require that the four ints
> be laid out in memory in ascending order with no padding in between, i.e.:
>
> (best viewed with a monowidth font)
>
> --------------------------------
> | Memory Address |   Object    |
> --------------------------------
> |       0        |  arr[0][0]  |
> |       1        |  arr[0][1]  |
> |       2        |  arr[1][0]  |
> |       3        |  arr[1][1]  |
> --------------------------------
>
> One can see plainly that there should be no problem with the little
> snippet above because arr[0][3] should be the same as arr[1][1], but I've
> had people over on comp.lang.c telling me that the behaviour of the
> snippet is undefined because of an "out of bounds" array access. They've
> even backed this up with a quote from the C Standard:
>
> J.2 Undefined behavior:
> The behavior is undefined in the following circumstances:
> [...]
> - An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression
> a[1][7] given the declaration int a[4][5]) (6.5.6).
>
> Does anyone make the same claim of undefined behaviour about C++?

I think I have seen those claims in this news group with regard to C++.

> If it is claimed that the snippet's behaviour is undefined because the
> second subscript index is out of range of the dimension, then this
> rationale can be brought into doubt by the following breakdown. First
> let's look at the expression statement:
>
> arr[0][3] = 9;
>
> The compiler, both in C and in C++, must interpret this as:
>
> *( *(arr+0) + 3 ) = 9;
>
> In the inner-most set of parentheses, "arr" decays to a pointer to its
> first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
> added to this address, which has no effect. The address is then
> dereferenced, yielding an L-value of the type int[2]. This expression then
> decays to a pointer to its first element, yielding an R-value of the type
> int*. The value 3 is then added to this address. (In terms of bytes, it's
> p += 3 * sizeof(int)). This address is then dereferenced, yielding an
> L-value of the type int. That int L-value is then assigned to.
>
> The only thing that sounds a little dodgy in the above paragraph is that
> an L-value of the type int[2] is used as a stepping stone to access an
> element whose index is greater than 1 -- but this shouldn't be a problem,

I think it might be.

> because the L-value decays to a simple R-value int pointer prior to the
> accessing of the int object, so any dimension info should be lost by then.

Why is it necessarily true that the pointer decays to a "simple" int
pointer? Do you have a clause in the standard for this? Moreover, what is
so "simple" about pointers anyway? I think the standard allows for what I
like to call "decorated pointers" that have type and bounds information
attached to them, i.e., a pointer obtained from an int[2] could have
bounds-information built in that would trigger a segfault for out of bounds
access. In that case, the simple int* you mention could remember the bound
of the array that it is supposedly bound to. Where in the standard are the
provisions that prevent this type of overzealous bounds-checking?
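
To make the idea concrete, here is a rough sketch of such a "decorated
pointer", modelled by hand in C. It is purely hypothetical: the struct, the
function names and the choice of returning NULL instead of trapping are all
assumptions for illustration, not a description of any real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "decorated pointer": an address plus the bound of the
   array it was derived from. */
struct checked_int_ptr {
    int   *base;
    size_t count;
};

/* Returns the address of element i, or NULL where a bounds-checking
   implementation would trap instead. */
int *checked_at(struct checked_int_ptr p, size_t i)
{
    assert(p.base != NULL);
    return i < p.count ? p.base + i : NULL;
}

/* Returns 1 if index 1 is accepted and index 3 is rejected for a
   pointer derived from the int[2] L-value arr[0]. */
int decorated_pointer_demo(void)
{
    int arr[2][2] = { {1, 2}, {3, 4} };
    struct checked_int_ptr p = { arr[0], 2 };  /* bound of 2 comes from int[2] */

    return checked_at(p, 1) == &arr[0][1] && checked_at(p, 3) == NULL;
}
```

An implementation that carried this information along with every int* would
reject arr[0][3] even though arr[1][1] lives at that address.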

> To the C++ programmers: Is the snippet viewed as invoking undefined
> behaviour? If so, why?

Because you cannot deduce its behavior from the guarantees made by the
standard? I just note that you did not put any references into your
reasoning. That makes it very hard to check whether the standard actually
guarantees the things you need. Given that there is a prima-facie out of
bounds access, I think you carry the burden of proof.

> To the C programmers: How can you rationalise the assertion that it
> actually does invoke undefined behaviour?

I have no idea about C. Sorry.

[snip]

Nov 1 '06 #2

Frederick Gotham

Whatever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are laid out contiguously in memory.

Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the ints which are directly after it in
contiguous memory? Isn't that one of the fundamental faculties of pointers?

--

Frederick Gotham

Nov 1 '06 #3

Victor Bazarov

Frederick Gotham wrote:

> Whatever happened to the idea of contiguous memory? When I define the
> following object:
>
> int arr[2][2];
>
> , the type of the object "arr" is: int[2][2]
>
> It consists of four int objects which are laid out contiguously in
> memory.
>
> Therefore, if we take the address of the first int, why can't we add
> to that address to yield the addresses of the ints which are
> directly after it in contiguous memory? Isn't that one of the
> fundamental faculties of pointers?

I think the conflict here is between the habits of [some] programmers
and what the Standard actually can *guarantee*. To interpret an array
of 2 arrays of 2 ints (int[2][2]) as a single array of 4 ints (which
have the same memory layouts, supposedly), you need to use a cast (and
a nasty one, reinterpret_cast). It's fine (on most platforms), but
since there can exist platforms on which it isn't OK, the Standard,
trying to be as generic as possible, cannot define the behaviour, thus
prohibiting a C++ implementation from existing on such [rare] platforms
and chooses to leave the behaviour undefined.

Again, nothing is there on most implementations and hardware platforms
to stop you from doing

int *p = &arr[0][0];
int &arr_1_1 = *(p + 3);

except that in standard terms it's UB.

Do we really need to keep going about it?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Nov 1 '06 #4

Frederick Gotham

Victor Bazarov:

> I think the conflict here is between the habits of [some] programmers
> and what the Standard actually can *guarantee*. To interpret an array
> of 2 arrays of 2 ints (int[2][2]) as a single array of 4 ints (which
> have the same memory layouts, supposedly),

The Standard necessitates that they have the same layout.

A multi-dimensional array is merely an array of arrays. An array may have
no padding at its start or end, or between elements.

Therefore, even if we have an array of arrays of arrays of arrays of
arrays, all objects must be directly after one another with no padding in
between.

> you need to use a cast (and
> a nasty one, reinterpret_cast).

Indeed, one could write:

int arr[2][2];

int (&b)[4] = reinterpret_cast<int(&)[4]>(arr);

b[0] = 1;
b[1] = 2;
b[2] = 3;
b[3] = 4;

> It's fine (on most platforms), but
> since there can exist platforms on which it isn't OK, the Standard,
> trying to be as generic as possible, cannot define the behaviour, thus
> prohibiting a C++ implementation from existing on such [rare] platforms
> and chooses to leave the behaviour undefined.
>
> Again, nothing is there on most implementations and hardware platforms
> to stop you from doing
>
> int *p = &arr[0][0];
> int &arr_1_1 = *(p + 3);
>
> except that in standard terms it's UB.
>
> Do we really need to keep going about it?

Yes, because I think it's bullshit, and I think the Standard needs to
change.

--

Frederick Gotham

Nov 1 '06 #5

Victor Bazarov

Frederick Gotham wrote:

> Victor Bazarov:
>> [..]
>> Do we really need to keep going about it?
>
> Yes, because I think it's bullshit, and I think the Standard needs to
> change.

Then go argue your case in comp.std.c++. Otherwise it's a waste of
bandwidth.

Nov 1 '06 #6

Michiel.Salters

Victor Bazarov wrote:

> int arr[2][2];
> nothing is there on most implementations and hardware platforms
> to stop you from doing
>
> int *p = &arr[0][0];
> int &arr_1_1 = *(p + 3);
>
> except that in standard terms it's UB.

Is it? After all, int is a POD, and so is int [2][2]. I think that will
make it defined. But the wording that defines the behavior is POD specific
and won't work for std::string.

Regards,
Michiel Salters

Nov 2 '06 #7

Kai-Uwe Bux

Mi*************@tomtom.com wrote:

> Victor Bazarov wrote:
>> int arr[2][2];
>> nothing is there on most implementations and hardware platforms
>> to stop you from doing
>>
>> int *p = &arr[0][0];
>> int &arr_1_1 = *(p + 3);
>>
>> except that in standard terms it's UB.
>
> Is it? After all, int is a POD, and so is int [2][2]. I think that will
> make it defined. But the wording that defines the behavior is POD specific
> and won't work for std::string.

Could you provide chapter and verse for the language that saves the day for
PODs?

Best

Kai-Uwe Bux

Nov 2 '06 #8
