What does the standard say about array access wraparound?

David Mathog

If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array[i];
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arrl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl[i];
}

it should give the same result. But there are some funny
things that can happen. For instance, if &array is 1000 and
len is 100000. In that case arrl will hold an address
(1000-100000) which presumably wraps around since the
pointer should be an unsigned int (whatever size int is).
The address it points to will be MAX_POINTER - 100000 + 1000.
When the second form of the loop begins, i=len (100000), so
arrl[100000] will wrap back around and point to the same
place as array[0].

Or will it?

It seems possible that this sort of array access "off the top of
memory" could trigger a fault.

What does the C standard say about this (if anything)?

Thanks,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #1

Mike Wahler

"David Mathog" <ma****@caltech.edu> wrote in message
news:20040527080256.4fbb14a0.ma****@caltech.edu...

> If this:
>
> int i,sum;
> int *array;

Note that the object you've named 'array' is
*not* an array, it's a pointer. Also note that
you've not given this pointer a valid value,
so evaluating it will produce undefined behavior.

> for(sum=0, i=0; i<len; i++){

You haven't defined 'len'.
Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.

> sum += array[i];

Undefined behavior.

> }
>
> is converted to this (never mind why for the moment):
>
> int i,sum;
> int *array;
> int *arrl;

Two pointers. No arrays.

> arrl=&array[-len];

Undefined behavior. Even if the pointer named 'array'
had a valid value, the behavior is still undefined.
The only valid indices are those which refer within
the same object.

> for(sum=0,i=len; i<2*len; i++){
> sum += arrl[i];
> }
>
> it should give the same result.

Well, yes, undefined == undefined.

> But there are some funny
> things that can happen.

That's essentially what 'undefined behavior' is.
Another 'funny thing' that might happen is that
your keyboard could emit 100,000 volts.

> For instance, if &array is 1000 and
> len is 100000. In that case arrl will hold an address
> (1000-100000) which presumably wraps around

There is no such thing as 'wrapping around' of arrays in C.

> since the
> pointer should be an unsigned int (whatever size int is).

A pointer is *not* an integer. It's a pointer. Its
representation and structure are implementation-specific.

> The address it points to will be MAX_POINTER

There's no such thing as 'MAX_POINTER' in C.

> - 100000 + 1000.
> When the second form of the loop begins, i=len (100000), so
> arrl[100000] will wrap back around and point to the same
> place as array[0].
>
> Or will it?

Yes. No. Maybe. Sometimes. Never. Only on Wednesdays
when it rains in Tokyo. Your code produces undefined behavior.

> It seems possible that this sort of array access "off the top of
> memory" could trigger a fault.

The fault is with your code.

> What does the C standard say about this (if anything)?

It says your code is not valid C.

Which C book(s) are you reading?

-Mike

Nov 14 '05 #2

Case

Mike Wahler wrote:

"David Mathog" <ma****@caltech.edu> wrote in message
news:20040527080256.4fbb14a0.ma****@caltech.edu...
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array[i];

Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.

Case

Nov 14 '05 #3

Dan Pop

In <20****************************@caltech.edu> David Mathog <ma****@caltech.edu> writes:

If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array[i];
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arrl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl[i];
}

it should give the same result.

Do yourself a favour and read the FAQ. Don't come back until you've
finished it!

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #4

Default User

David Mathog wrote:

[a bunch of crazy bad code with undefined behavior]
I suggest you thoroughly learn the language before trying this sort of
whacky stuff. Once you become familiar with using arrays and pointers,
then you'll also find you don't need to ask these questions, you'll
already have the answers.

Brian Rodenborn

Nov 14 '05 #5

Stephen L.

David Mathog wrote:

If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array[i];
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arrl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl[i];
}

it should give the same result. But there are some funny
things that can happen. For instance, if &array is 1000 and
len is 100000. In that case arrl will hold an address
(1000-100000) which presumably wraps around since the
pointer should be an unsigned int (whatever size int is).
The address it points to will be MAX_POINTER - 100000 + 1000.
When the second form of the loop begins, i=len (100000), so
arrl[100000] will wrap back around and point to the same
place as array[0].

Or will it?

It seems possible that this sort of array access "off the top of
memory" could trigger a fault.

What does the C standard say about this (if anything)?

Thanks,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

This is a little bit better of a starting point...

int
main (void)
{
    int the_array[100] = { 0 }; /* initialized, so the reads below are well defined */
    int i, sum;
    int *array = the_array; /* same as ... = &the_array[ 0 ]; */
    int len = sizeof (the_array) / sizeof (the_array[ 0 ]);

    for (sum = 0, i = 0; i < len; i++) {
        sum += array[i];
    }

    return (0);
}

Now, the C language does _not_ keep track of
array bounds; arrays decay to a pointer (to
the 1st element of the object). As such, "indexing"
through an array in C will not "wrap-around"
to the beginning/end when either the end/beginning
is reached. It is the programmer's responsibility
to keep track of his array(s) and their bounds.

I believe the standard (I can't cite the exact paragraph,
so this is paraphrased) says that if the result of any
pointer calculation is not within the object, or one
past its end, the results are undefined.

For example, using the above -

array[ -1 ]

would be undefined. There are some languages
that would produce a nice run-time diagnostic
for such a reference; C is not one of them.
A reference like the above may do nothing
meaningful (it may access memory not part of the
object), or produce a "memory violation" error.
It could do both in the same program, at different
times. Those are just a couple of examples of
"undefined behavior".

Having said all of that, you can do something like -
int
main (void)
{
    int the_array[100] = { 0 }; /* initialized, so the reads below are well defined */
    int i, sum;
    int len = sizeof (the_array) / sizeof (the_array[ 0 ]);
    int *array = &the_array[ len - 1 ]; /* points at the last element */

    for (sum = 0, i = 0; i < len; i++) {
        sum += array[ -i ]; /* array[0] down to array[-(len - 1)], all inside the_array */
    }

    return (0);
}

Personally, I avoid constructs like the above
because it "looks" like I'm advancing through the
array instead of going from the end to the beginning.

I'm not sure how you arrived at the "translation"
you provided, though.
HTH,

Stephen

Nov 14 '05 #6

Dan Pop

In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Mike Wahler wrote:
"David Mathog" <ma****@caltech.edu> wrote in message
news:20040527080256.4fbb14a0.ma****@caltech.edu...
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array[i];

Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.

For indexing purposes, any integer type does the job:

6.5.2.1 Array subscripting

Constraints

1 One of the expressions shall have type ``pointer to object type'',
the other expression shall have integer type, and the result
has type ``type''.
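
The constraint quoted above can be illustrated with a small sketch. The helper name `sum_with_mixed_index_types` is made up for this example and is not from the thread; it only shows that subscripts of several integer types, including a negative one applied to a pointer into the middle of an array, are accepted as long as the resulting element lies inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads elements of a[] through subscripts of several
   integer types. Requires at least four elements. */
int sum_with_mixed_index_types(const int *a, size_t n)
{
    unsigned char uc = 1;
    long l = 2;
    size_t sz = 3;
    ptrdiff_t back = -2;     /* negative subscript, used only on mid below */
    const int *mid;

    assert(a != NULL && n >= 4);

    mid = &a[2];             /* points into the middle of the array */

    /* mid[back] is a[0], so every access stays inside the array */
    return a[uc] + a[l] + a[sz] + mid[back];
}
```

Which type is most convenient is a separate question from which types are permitted; the rest of the thread argues about the former.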

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #7

Mike Wahler

"Case" <no@no.no> wrote in message
news:40***********************@news.xs4all.nl...

Mike Wahler wrote:
"David Mathog" <ma****@caltech.edu> wrote in message
news:20040527080256.4fbb14a0.ma****@caltech.edu...
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array[i];

Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.

If the range is sufficient (and usage is valid), 'int'
will work. But 'size_t' is specifically guaranteed to
be able to represent the largest possible size object
(and by corollary, the largest possible number of (byte-sized)
objects.) Also, since 'size_t' is an unsigned type, sometimes
one might need a signed type like 'int', if one needs to
index 'backward' from a point other than the beginning of
an array. (I've never needed to do this, but I suppose
certain applications might).

I always use 'size_t', then I need not be concerned about
whether 'int' is (or will be) sufficient if my code changes
later.
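
One practical consequence of 'size_t' being unsigned, sketched here under the assumption of a hypothetical helper named `sum_backward`: a countdown loop written as `i >= 0` never terminates for an unsigned index, so the test has to happen before the decrement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a[0..n-1], visiting the last element first. */
long sum_backward(const int *a, size_t n)
{
    long sum = 0;
    size_t i;

    assert(a != NULL || n == 0);

    /* "i >= 0" is always true for an unsigned type. Testing "i-- > 0"
       makes i take the values n-1, n-2, ..., 0 inside the body. */
    for (i = n; i-- > 0; ) {
        sum += a[i];
    }
    return sum;
}
```

With a signed index such as 'int' the more familiar `for (i = n - 1; i >= 0; i--)` works, which is one reason some posters in this thread prefer 'int' when the range is known to be sufficient.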

-Mike

Nov 14 '05 #8

David Mathog

On 27 May 2004 16:25:23 GMT
Da*****@cern.ch (Dan Pop) wrote:

In <20****************************@caltech.edu> David Mathog <ma****@caltech.edu> writes:

Do yourself a favour and read the FAQ. Don't come back until you've
finished it!

Good advice. The answer is in 6.17 where it says:

(snip)
Although this technique is attractive (and was used in old editions of the book Numerical Recipes in C), it does not conform to the C standards. Pointer arithmetic is defined only as long as the pointer points within the same allocated block of memory, or to the imaginary ``terminating'' element one past it; otherwise, the behavior is undefined, even if the pointer is not dereferenced. The code above could fail if, while subtracting the offset, an illegal address were generated (perhaps because the address tried to ``wrap around'' past the beginning of some memory segment).

References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
ANSI Sec. 3.3.6
ISO Sec. 6.3.6
Rationale Sec. 3.2.2.3
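
A conforming way to get the effect of the shifted loop from the original question is to apply the offset to the subscript instead of to the pointer, so that no out-of-range pointer is ever formed. This is only a sketch; the function name `sum_with_shifted_index` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: i runs over len .. 2*len-1, as in the original
   question, but the offset is subtracted from the subscript. */
long sum_with_shifted_index(const int *array, size_t len)
{
    long sum = 0;
    size_t i;

    assert(array != NULL || len == 0);

    for (i = len; i < 2 * len; i++) {
        sum += array[i - len];   /* always an index in 0 .. len-1 */
    }
    return sum;
}
```

The pointer `array` is never adjusted, so the behavior does not depend on how the implementation represents addresses.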

By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end? Either way this would be
ok (calculates an address "one after it"):

int *p;
int *plast;
int sum;
p=malloc(100*sizeof(int));
plast=&(p[99]);
/* code which stores values into those 100 positions */
for(sum=0; p<=plast; p++){ sum += *p; }

but this might not be ok (calculates an address "one before it"),
change last line only of previous to:

for(sum=0; p<=plast; plast--){ sum += *plast; }

Thanks,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #9

Mark McIntyre

On Thu, 27 May 2004 18:35:37 +0200, in comp.lang.c , Case <no@no.no> wrote:

> Huh, why should a variable used as array index be of
> type 'size_t'?

Because size_t is designated to be large enough to hold the maximum size of
an object, hence you can guarantee to access every member of any
conceivable array. An int may be smaller than size_t (and is, on some
popular implementations).

> K&R-2nd/ANSI uses 'int' quite often to index arrays.

K&R != ISO Standard.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

Nov 14 '05 #10

Mark McIntyre

On Thu, 27 May 2004 14:56:52 -0700, in comp.lang.c , David Mathog
<ma****@caltech.edu> wrote:

By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end?

one past is (IMHO) unambiguous. If it included one before, it would say
"one past or before"...
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

Nov 14 '05 #11

Old Wolf

"Stephen L." <sd*********@cast-com.net> wrote:

Now, the C language does _not_ keep track of
array bounds;

For example, using the above -

array[ -1 ]

would be undefined. There are some languages
that would produce a nice run-time diagnostic
for such a reference; C is not one of them.

The standard permits implementations to bounds-check their arrays.
I have heard of some which offer this as a debug option.
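
Where the implementation offers no such option, the check can be written by hand. The accessor below is a hypothetical sketch (the name `checked_get` and its return convention are assumptions, not anything from the standard or from a particular compiler): it refuses an out-of-range index instead of evaluating the subscript.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked accessor: returns 1 and stores a[i] in *out when i is
   in range, returns 0 otherwise. The index is signed so that -1 can be
   rejected explicitly. */
int checked_get(const int *a, size_t len, long i, int *out)
{
    assert(a != NULL && out != NULL);

    if (i < 0 || (unsigned long)i >= len) {
        return 0;            /* out of bounds: no access is attempted */
    }
    *out = a[i];
    return 1;
}
```

This only helps for accesses routed through the accessor; a plain `array[ -1 ]` elsewhere in the program remains undefined.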

Nov 14 '05 #12

Mike Wahler

"David Mathog" <ma****@caltech.edu> wrote in message
news:20040527145652.1f44beb6.ma****@caltech.edu...

> On 27 May 2004 16:25:23 GMT
> Da*****@cern.ch (Dan Pop) wrote:
>> In <20040527080256.4fbb14a0.ma****@caltech.edu> David Mathog <ma****@caltech.edu> writes:
>> Do yourself a favour and read the FAQ. Don't come back until you've
>> finished it!
> Good advice. The answer is in 6.17 where it says:
>
> (snip)
> Although this technique is attractive (and was used in old editions of
> the book Numerical Recipes in C), it does not conform to the C standards.
> Pointer arithmetic is defined only as long as the pointer points within
> the same allocated block of memory, or to the imaginary ``terminating''
> element one past it; otherwise, the behavior is undefined, even if the
> pointer is not dereferenced. The code above could fail if, while
> subtracting the offset, an illegal address were generated (perhaps because
> the address tried to ``wrap around'' past the beginning of some memory
> segment).
> References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
> ANSI Sec. 3.3.6
> ISO Sec. 6.3.6
> Rationale Sec. 3.2.2.3
>
> By "one past it" is the FAQ referring to both ends of the
> memory block or just the "high" end?

Just the 'high' end. (That is, the highest address).

> Either way this would be
> ok (calculates an address "one after it"):
>
> int *p;
> int *plast;
> int sum;
> p=malloc(100*sizeof(int));

Nit:
More idiomatic is:

p = malloc(100 * sizeof *p);

If you later change the type of 'p', this line
need not be changed. Also don't forget to
check return value of 'malloc()'.
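
Put together, the allocation advice above might look like the following sketch. The helper name `fill_and_sum` and the choice to return -1 on allocation failure are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, fills them with 1..n, and sums them
   using a one-past-the-end limit pointer. Returns -1 if malloc fails. */
long fill_and_sum(size_t n)
{
    int *p, *cur, *end;
    long sum = 0;

    assert(n > 0);

    p = malloc(n * sizeof *p);   /* the size follows the type of *p */
    if (p == NULL) {
        return -1;               /* malloc is always allowed to fail */
    }

    end = p + n;                 /* one past the last element: valid to form and compare */
    for (cur = p; cur < end; cur++) {
        *cur = (int)(cur - p) + 1;
    }
    for (cur = p; cur < end; cur++) {
        sum += *cur;
    }

    free(p);
    return sum;
}
```

Keeping `p` unchanged and walking a second pointer also means the original pointer is still available for `free()`.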

> plast=&(p[99]);
> /* code which stores values into those 100 positions */
> for(sum=0; p<=plast; p++){ sum += *p; }
>
> but this might not be ok

Is definitely not "OK".

> (calculates an address "one before it"),
> change last line only of previous to:
>
> for(sum=0; p<=plast; plast--){ sum += *plast; }

Not only is pointing before the start of the array
invalid, even if it were valid, your loop would
be 'infinite' (p<=plast would never go false, barring
some platform-specific 'wrapping').

-Mike

Nov 14 '05 #13

Dan Pop

In <vE*****************@newsread2.news.pas.earthlink.net> "Mike Wahler" <mk******@mkwahler.net> writes:

I always use 'size_t', then I need not be concerned about
whether 'int' is (or will be) sufficient if my code changes
later.

1. Using size_t may be wasteful. Precisely because it is supposed to be
large enough to cover the largest object supported by the
implementation.

2. As you also pointed out, using an unsigned type may be occasionally
inconvenient. I prefer to use them exclusively for bit manipulation
purposes, unless I have a *real* need for the extended range.

3. Using an unknown type may also be occasionally inconvenient. You don't
even know whether size_t gets promoted to int by the integral
promotions ;-)

Far too often, it is possible to tell whether int will do the job or not.
If it does the job, there is NO point in using any other type.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #14

David Mathog

On Fri, 28 May 2004 04:47:41 GMT
"Mike Wahler" <mk******@mkwahler.net> wrote:

> "David Mathog" <ma****@caltech.edu> wrote in message
> news:20040527145652.1f44beb6.ma****@caltech.edu...
>> On 27 May 2004 16:25:23 GMT

<SNIP>

>> plast=&(p[99]);
>> /* code which stores values into those 100 positions */
>> for(sum=0; p<=plast; p++){ sum += *p; }
>>
>> but this might not be ok
>
> Is definitely not "OK".
>
>> (calculates an address "one before it"),
>> change last line only of previous to:
>>
>> for(sum=0; p<=plast; plast--){ sum += *plast; }

Interesting.

Can anybody explain the reason why the standard
makes 1 above allocated memory special but 1 below not?
Doing that destroys the symmetry of loops controlled by pointer
comparisons (as shown above). What is gained by this
restriction? Does this have something to do with the extra
data that (at least on some platforms) malloc stores just
before the allocated memory?

To make a legal decrementing loop controlled by pointer
comparison something like this must be used instead:

for(sum=0; ; plast--){
sum += *plast;
if(p==plast)break;
}

The incrementing case can be rewritten in this format too,
so the format is symmetric as long as the pointer test is
not inside the for().

Conversely, if an index is used instead the simple for() form
is symmetric, ie:

for(i=0; i<=99; i++){ sum += p[i]; };

and

for(i=99; i>=0; i--){ sum += p[i]; };

Thanks,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #15

CBFalconer

David Mathog wrote:

.... snip ...
> Can anybody explain the reason why the standard makes 1 above
> allocated memory special but 1 below not? Doing that destroys
> the symmetry of loops controlled by pointer comparisons (as
> shown above). What is gained by this restriction? Does this
> have something to do with the extra data that (at least on
> some platforms) malloc stores just before the allocated memory?

Because, for an array of large items, one before could be a
considerable distance (rather than one byte). Also, if the array
is the first item in a segment, any distance before can easily be
an illegal address and cause traps.

> To make a legal decrementing loop controlled by pointer
> comparison something like this must be used instead:
>
> for(sum=0; ; plast--){
> sum += *plast;
> if(p==plast)break;
> }

Assuming array a to be scanned, try:

/* tp is a pointer to the element type of a */
tp = &a[0] + sizeof(a)/sizeof(a[0]); /* one past */
sum = 0;
do {
    sum += *(--tp);
} while (tp > &a[0]);

--
fix (vb.): 1. to paper over, obscure, hide from public view; 2.
to work around, in a way that produces unintended consequences
that are worse than the original problem. Usage: "Windows ME
fixes many of the shortcomings of Windows 98 SE". - Hutchison

Nov 14 '05 #16

David Mathog

On Thu, 27 May 2004 23:28:27 +0100
Mark McIntyre <ma**********@spamcop.net> wrote:

On Thu, 27 May 2004 14:56:52 -0700, in comp.lang.c , David Mathog
<ma****@caltech.edu> wrote:
By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end?

one past is (IMHO) unambiguous. If it included one before, it would say
"one past or before"...

IMHO it is ambiguous. We can probably agree that "one above" and "one below" are unambiguous since these refer to specific memory locations.

However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. A pointer could
be either incrementing up through a memory block
or decrementing down through it. In the latter case "one past"
may still be applied (at least grammatically, if not in C code)
and in this case "one past" would generally be understood to
refer to the position immediately below the memory block being traversed.

Ie, if you were directed to "the house adjacent to and one past
the blue Victorian on Elm street" your direction of travel on that
street would determine which of the two possible houses fitting
this description is your destination.

Regards,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #17

Eric Sosman

David Mathog wrote:

However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. [...]

The direction *is* specified in the FAQ, Question 6.17.
The specification is implicit, true: 6.17 says that trying
to form a pointer to an imaginary [-1] element is unreliable,
so negative-going "past" is ruled out. What directions remain?
Sideways, at right angles to the progression of memory addresses?
(Anybody want to submit a C09 proposal for pointer-plus-complex?
Or pointer-times-pointer, yielding the cross product? ;-)

--
Er*********@sun.com

Nov 14 '05 #18

Mark McIntyre

On Tue, 1 Jun 2004 12:18:59 -0700, in comp.lang.c , David Mathog
<ma****@caltech.edu> wrote:

> On Thu, 27 May 2004 23:28:27 +0100
> Mark McIntyre <ma**********@spamcop.net> wrote:
>> On Thu, 27 May 2004 14:56:52 -0700, in comp.lang.c , David Mathog
>> <ma****@caltech.edu> wrote:
>>> By "one past it" is the FAQ referring to both ends of the
>>> memory block or just the "high" end?
>> one past is (IMHO) unambiguous. If it included one before, it would say
>> "one past or before"...
>
> IMHO it is ambiguous.
>
> .... "one past" is ambiguous since the definition

The definition of past is not ambiguous. Check a dictionary - the adverbial
and prepositional meanings of "past" all relate to after, beyond etc.

> of "past" is vector in nature

and an array is a vector. It points upwards. Hence it's easy to define
"past"

> Ie, if you were directed to "the house adjacent to and one past
> the blue Victorian on Elm street" your direction of travel on that
> street would determine which of the two possible houses fitting
> this description is your destination.

true, but irrelevant, as in either case it means the one beyond. Only a
pathological use would attempt to refer to the point before.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

Nov 14 '05 #19

David Mathog

On Tue, 01 Jun 2004 15:57:57 -0400
Eric Sosman <Er*********@sun.com> wrote:

David Mathog wrote:

However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. [...]

The direction *is* specified in the FAQ, Question 6.17.
The specification is implicit, true: 6.17 says that trying
to form a pointer to an imaginary [-1] element is unreliable,
so negative-going "past" is ruled out.

You're right, it does define the direction implicitly, both by
stating that the example, containing a [-1], violates the standard and later in "while subtracting the offset". Which still doesn't
negate the point that using "one above" in that sentence would
make it unambiguous without reference to anything else,
but "one past" is ambiguous without the context of
the implicit information elsewhere in the question.

The rest of that sentence ("perhaps because the address
tried to ``wrap around'' past the beginning of some memory segment")
raises another question. If a block of memory extends up to the
highest possible address in the system (for instance, memory location 65535 on a system with 16 bit memory space and 16 bit unsigned
pointers) then the pointer value "one past" the allocated block
would be 0, and would "wrap around" exactly as described in the
FAQ 6.17 for the other direction.

in ANSI C address 0 (NULL) is special, is address -1 (top of memory)
also special?

This must come up on microcontrollers and other similar small
computing devices. (Yes, those are usually programmed in assembler
but there are C compilers for them too.)

Regards,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #20

Keith Thompson

David Mathog <ma****@caltech.edu> writes:
[...]

in ANSI C address 0 (NULL) is special, is address -1 (top of memory)
also special?

A null pointer in C is not necessarily "address 0". It can be
represented as an integer constant 0 in C source, but the actual
address could be anything. See section 5 of the C FAQ.
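
The distinction can be shown in source form. In the sketch below (the helper name `is_null` is made up for this example), the `0` in `p == 0` is a null pointer constant, not a numeric address, so all three tests agree on any conforming implementation regardless of how a null pointer is represented internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when p is a null pointer. */
int is_null(const void *p)
{
    int by_zero  = (p == 0);     /* 0 is a null pointer constant here */
    int by_macro = (p == NULL);
    int by_not   = !p;

    /* the three spellings are required to give the same answer */
    assert(by_zero == by_macro && by_macro == by_not);
    return by_zero;
}
```

What is not guaranteed is the reverse direction: a pointer whose bytes are all zero (for example after `memset`) need not be a null pointer.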

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #21

Dan Pop

In <20****************************@caltech.edu> David Mathog <ma****@caltech.edu> writes:

> in ANSI C address 0 (NULL) is special,

Read the FAQ! NO address is special in C. The null pointer constant
need not correspond to any address.

> is address -1 (top of memory) also special?

NO address is special in C. The result of converting -1 to a pointer
value is implementation-defined.

> This must come up on microcontrollers and other similar small
> computing devices. (Yes, those are usually programmed in assembler
> but there are C compilers for them too.)

So what? Those C compilers provide all the extensions needed to access
all the underlying hardware features. And C code using them is inherently
non-portable. Furthermore, microcontrollers are notorious for having
multiple address spaces (e.g. internal ROM, internal RAM, external ROM,
external RAM).

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #22

David Mathog

On 3 Jun 2004 13:24:35 GMT
Da*****@cern.ch (Dan Pop) wrote:

> In <20****************************@caltech.edu> David Mathog <ma****@caltech.edu> writes:
>> in ANSI C address 0 (NULL) is special,
>
> Read the FAQ! NO address is special in C. The null pointer constant
> need not correspond to any address.
>
>> is address -1 (top of memory) also special?
>
> NO address is special in C. The result of converting -1 to a pointer
> value is implementation-defined.

So the standard says the following:

1. Pointer access to a memory block is valid when that pointer lies within that memory block or one address above it (but not one address
below it).

2. There is nothing special about either address 0 (typically used
as the NULL pointer, but not necessarily so) or the top of memory, or
any other memory location.

and real machines have the property:

3. Memory is finite.

So what exactly in the ANSI standard (as opposed to each compiler's implementation of it) guarantees that the following
code will work?

#define DTYPE int
#define ASIZE 100
DTYPE *pa;
DTYPE *pp;
DTYPE *plim;
pa=malloc(sizeof(DTYPE)*ASIZE);
if(pa){
plim = &pa[ASIZE];
for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
}
else {
(void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
exit(EXIT_FAILURE);
}

If malloc returns pa such that pa[ASIZE-1] is the last int at the
top of memory then the expression &pa[ASIZE] is going to resolve
to something peculiar (probably 0 in most implementations) and
the test

pp < plim

will fail on every iteration.

In other words, I don't see how the C standard reconciles statement 1
(a pointer value to a memory location one unit above the allocated
block is ok) and statement 2 (there are no special memory locations) with statement 3 (memory is finite).

In a particular implementation I can see that this problem can be avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory, or by using memory pointers with more range than exists in physical memory.

The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.

And yes, I do see that recoding to this:

/* check the allocated memory location, not one above it*/
plim = &(pa[ASIZE-1]);
for(pp=pa; pp<=plim; pp++){ /* some operation on *pp */}

avoids the test on a possibly whacky pointer value
no matter where pa falls in memory.

Statement 2 seems to not be entirely accurate in any case.
If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL the program would exit when it
checked the address returned, even though
memory was allocated at that location. Presumably no
extant malloc will return such a block. And that does make
the memory location corresponding to NULL "special"
at least to the extent that it cannot be returned by malloc,
nor released by free().
Regards,

David Mathog
ma****@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Nov 14 '05 #23

Chris Torek

In article <20****************************@caltech.edu>
David Mathog <ma****@caltech.edu> writes:

> So the standard says the following:
>
> 1. Pointer access to a memory block is valid when that pointer lies
> within that memory block or one address above it (but not one address
> below it).

Correct (although not in these words).

> 2. There is nothing special about either address 0 (typically used
> as the NULL pointer, but not necessarily so) or the top of memory, or
> any other memory location.

The C standards (C89/C90, "C95", C99) do not say this, but they do
not say there *is* something special about them, either. They
leave the details up to the implementor.

> and real machines have the property:
>
> 3. Memory is finite.

Yes. The Standards' concerns with real machines are somewhat
tangential, though.

> So what exactly in the ANSI standard (as opposed to each compiler's
> implementation of it) guarantees that the following
> code will work?
>
> #define DTYPE int
> #define ASIZE 100
> DTYPE *pa;
> DTYPE *pp;
> DTYPE *plim;
> pa=malloc(sizeof(DTYPE)*ASIZE);
> if(pa){
> plim = &pa[ASIZE];
> for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
> }
> else {
> (void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
> exit(EXIT_FAILURE);
> }

The wording in the standard.

Which wording? Well, you have to put a number of pieces together,
such as this key section on relational operators:

    [#5] When two pointers are compared, the result depends on
    the relative locations in the address space of the objects
    pointed to. ... If the
    expression P points to an element of an array object and the
    expression Q points to the last element of the same array
    object, the pointer expression Q+1 compares greater than P.

> If malloc returns pa such that pa[ASIZE-1] is the last int at the
> top of memory then the expression &pa[ASIZE] is going to resolve
> to something peculiar (probably 0 in most implementations) and
> the test
>
> pp < plim
>
> will fail on every iteration.

Yes, if malloc() returned such a "pa" and the machine worked in the
way you describe here, then "pp < plim" would fail. This would
contradict paragraph 5, rendering the implementation non-conforming.
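
The guarantee in paragraph 5 can be exercised directly. The following is a sketch under the assumption of a hypothetical helper named `count_up_to_limit`; on a conforming implementation the loop must visit every element exactly once, wherever the array happens to sit in the address space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the elements of a[0..n-1] by walking a pointer
   up to the one-past-the-end limit. */
size_t count_up_to_limit(const int *a, size_t n)
{
    const int *limit;
    const int *p;
    size_t count = 0;

    assert(a != NULL);

    limit = a + n;               /* plays the role of Q+1 in the quoted wording */
    for (p = a; p < limit; p++) {
        count++;                 /* p < limit must hold for every element */
    }
    assert(p == limit);          /* the loop ends exactly at the limit */
    return count;
}
```

If `limit` were allowed to "wrap" to a value that compares less than `a`, the count would be zero, which is exactly the outcome the standard's wording rules out.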
In a particular implementation I can see that this problem can be
avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory, or
by using memory pointers with more range than exists in physical memory.
Those are two methods by which the implementation can correct the
problem and become conforming.
The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.
The implementor can make use of implementation-specific tricks.
For instance, suppose that the absolute maximum alignment required
for any C code is 8 bytes (and the machine is a conventional 8-bit
byte-addressed one). Then malloc() need only avoid handing out the
"last 8" bytes of the total address space.
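The arithmetic behind this trick can be sketched with a toy 32-bit address space. Everything here is hypothetical: `ADDR_TOP`, `RESERVE`, `one_past_end`, `block_ok`, and `check_toy_model` are names invented for illustration, and the model treats addresses as plain integers, which the Standard does not require. It only shows that an unsigned "one past the end" address becomes 0 when a block ends exactly at the top, and that refusing the last 8 bytes prevents this:

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_TOP 0xFFFFFFFFu   /* highest byte address in the toy space */
#define RESERVE  8u            /* bytes at the top that are never handed out */

/* One-past-the-end address of a block, reduced modulo 2^32. */
uint32_t one_past_end(uint32_t start, uint32_t size)
{
    return (uint32_t)(start + size);
}

/* Returns 1 if the block [start, start+size) stays below the reserved
   bytes at the top of the toy address space, 0 otherwise. */
int block_ok(uint32_t start, uint32_t size)
{
    uint32_t limit = ADDR_TOP - RESERVE + 1u;   /* first reserved address */
    return size <= limit && start <= limit - size;
}

/* Sanity checks for the toy model; aborts if an expectation fails. */
void check_toy_model(void)
{
    /* A 16-byte block ending exactly at the top: its limit wraps to 0. */
    assert(one_past_end(0xFFFFFFF0u, 16u) == 0u);
    assert(!block_ok(0xFFFFFFF0u, 16u));

    /* The highest 16-byte block the toy allocator would accept. */
    assert(block_ok(0xFFFFFFE8u, 16u));
    assert(one_past_end(0xFFFFFFE8u, 16u) > 0xFFFFFFE8u);
}
```

In this model, any block that passes `block_ok` has a one-past-the-end address that is still numerically greater than its start, so a "pp < plim" style comparison behaves as expected.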
If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL ...

In this case, the implementation might fail to conform -- although
actually *deciding* this is another matter entirely, since the
observable behavior is the same as "malloc() was unable to get
memory". In other words, if malloc returns a value that compares
equal to NULL, malloc() has failed to obtain memory, even if the
implementor incorrectly thinks it has succeeded -- but malloc() is
*always* allowed to fail, so the implementor has simply produced
a poor implementation, rather than a non-conforming one.

In other words, if malloc() returns a pointer that compares equal
to NULL even though there is memory available and the memory has
been allocated, then the implementor has goofed. The malloc()
function has a bug. The bug does not make the implementation
non-conforming; it just reflects badly on the implementor. :-)

This is typical of a standard, or indeed any work that attempts to
describe "desired outcome" instead of "mechanism". One does not
prescribe how malloc() is supposed to work, or the bit patterns
for various null pointers; instead, one says "when malloc() succeeds,
the returned value compares unequal to NULL" or "if P points into
an array, and P+1 is `one-past-the-end', then computing P+1 is OK
and (P+1)>P produces the value 1" -- without saying *how* these
are to be achieved, so that implementors are free to come up with
new, wonderful ways of achieving them.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #24

Eric Sosman

David Mathog wrote: [long lines wrapped for legibility]

On 3 Jun 2004 13:24:35 GMT
Da*****@cern.ch (Dan Pop) wrote:

In <20****************************@caltech.edu> David Mathog <ma****@caltech.edu> writes:

in ANSI C address 0 (NULL) is special,
Read the FAQ! NO address is special in C. The null pointer constant
need not correspond to any address.

is address -1 (top of memory) also special?

NO address is special in C. The result of converting -1 to a pointer
value is implementation-defined.

So the standard says the following:

1. Pointer access to a memory block is valid when that pointer lies
within that memory block or one address above it (but not one address
below it).

Depends what you mean by "pointer access to a memory block." It's
legal to compute a pointer value designating any element of an array
(considering a free-standing object to be an array of one element),
and it's legal to use such a value to access the array element. It's
also legal to compute `&array[N]' in an N-element array, and it's legal
to use this value in comparisons and for further arithmetic, but it's
*not* legal to use this value to access that non-existent array element.
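A short sketch of these rules, assuming a five-element array; the names `sum_with_limit` and `demo_one_past_end` and the element values are invented for illustration. Each commented line corresponds to one of the operations described above:

```c
#include <assert.h>
#include <stddef.h>

#define N 5

/* Sums the elements in [first, limit). The limit pointer is used only
   in comparisons and is never dereferenced. */
int sum_with_limit(const int *first, const int *limit)
{
    int sum = 0;
    const int *p;

    for (p = first; p < limit; p++)   /* comparing against limit: legal */
        sum += *p;                    /* p always points at a real element */
    return sum;
}

int demo_one_past_end(void)
{
    int array[N] = { 1, 2, 3, 4, 5 };
    int *end = &array[N];             /* legal: address computed, not accessed */

    assert(end - array == N);         /* legal: pointer subtraction */
    assert(end > &array[N - 1]);      /* legal: relational comparison */
    assert(end - 1 == &array[N - 1]); /* legal: further arithmetic */
    /* Reading or writing *end would be undefined: there is no array[N]. */

    return sum_with_limit(array, end);
}
```

With the values shown, `demo_one_past_end()` returns 15.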
2. There is nothing special about either address 0 (typically used
as the NULL pointer, but not necessarily so) or the top of memory, or
any other memory location.
True. There is not even a requirement that memory addresses be
numbers.
and real machines have the property:

3. Memory is finite.
Also, a pointer value has a finite number of bits: Even if you
had infinite memory, a C program could use only a finite amount of it.
So what exactly in the ANSI standard (as opposed to each compiler's
implementation of it) guarantees that the following
code will work?

#define DTYPE int
#define ASIZE 100
DTYPE *pa;
DTYPE *pp;
DTYPE *plim;
pa=malloc(sizeof(DTYPE)*ASIZE);
if(pa){
plim = &pa[ASIZE];
for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
}
else {
(void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
exit(EXIT_FAILURE);
}

If malloc returns pa such that pa[ASIZE-1] is the last int at the
top of memory then the expression &pa[ASIZE] is going to resolve
to something peculiar (probably 0 in most implementations) and
the test

pp < plim

will fail on every iteration.
The implementation must make this work "somehow." The Standard
doesn't specify the "how," but it requires the "what."
In other words, I don't see how the C standard reconciles statement 1
(a pointer value to a memory location one unit above the allocated
block is ok) and statement 2 (there are no special memory locations)
with statement 3 (memory is finite).

In a particular implementation I can see that this problem can be
avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory,
or by using memory pointers with more range than exists in physical
memory.
The first of these stratagems is commonly used. In the second
I think you probably mean "virtual" instead of "physical;" I haven't
encountered an implementation that works this way, but such a thing
could certainly be done.
The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.
Doesn't matter. One unallocated byte suffices for the first
stratagem, and one unused pointer-value bit is enough for the
second. Remember, the "one past the end" pointer does not point
to an actual DTYPE object; there need not be sizeof(DTYPE) bytes
at that spot. All that's required is that the first byte of the
non-existent element be "addressable;" there's no need for any
additional bytes' addresses to make any sense.
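To illustrate with a deliberately large element type: in the sketch below, `struct big`, its 4096-byte payload, and the function name `bytes_to_limit` are all assumptions chosen for the example. The one-past-the-end pointer designates a single position just after the array. Nothing is read or written there, and no further `struct big` needs to fit at that spot:

```c
#include <assert.h>
#include <stddef.h>

struct big { char payload[4096]; };   /* hypothetical large element type */

/* Returns the distance in bytes from the start of a two-element array
   to its one-past-the-end pointer. */
size_t bytes_to_limit(void)
{
    static struct big pool[2];
    struct big *limit = pool + 2;     /* one past the end; never dereferenced */

    assert(limit > &pool[1]);         /* compares greater than the last element */

    /* The limit lies exactly sizeof pool bytes after the start. */
    return (size_t)((char *)limit - (char *)pool);
}
```

The result equals `2 * sizeof(struct big)` regardless of how large the element type is, which is why a single spare byte (or address value) beyond the block is enough.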
And yes, I do see that recoding to this:

/* check the allocated memory location, not one above it*/
plim = &(pa[ASIZE-1]);
for(pp=pa; pp<=plim; pp++){ /* some operation on *pp */}

avoids the test on a possibly whacky pointer value
no matter where pa falls in memory.
`plim' would not be "whacky," but `pp' becomes so on the
final iteration.
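A minimal sketch of that point, assuming the same ASIZE of 100; the function name `count_with_inclusive_limit` and the stored value 0 are invented for illustration. After the last pass through the body, the loop's `pp++` still advances `pp` to one past the end before the `pp <= plim` test fails:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ASIZE 100

/* Runs the "<=" form of the loop and returns how many elements it
   visited, or -1 if malloc fails. */
long count_with_inclusive_limit(void)
{
    int *pa = malloc(ASIZE * sizeof *pa);
    int *pp, *plim;
    long visited = 0;

    if (pa == NULL)
        return -1;

    plim = &pa[ASIZE - 1];          /* points at the last real element */
    for (pp = pa; pp <= plim; pp++) {
        *pp = 0;                    /* "some operation on *pp" */
        visited++;
    }

    /* The final pp++ moved pp to one past the end, so the recoded loop
       still relies on that pointer value being computable. */
    assert(pp == pa + ASIZE);
    assert(pp > plim);

    free(pa);
    return visited;
}
```

So the recoding changes which variable holds the one-past-the-end value, not whether that value is computed. Both forms are fine on a conforming implementation.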
Statement 2 seems to not be entirely accurate in any case.
If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL the program would exit when it
checked the address returned, even though
memory was allocated at that location. Presumably no
extant malloc will return such a block. And that does make
the memory location corresponding to NULL "special"
at least to the extent that it cannot be returned by malloc,
nor released by free().

The Standard does not require the existence of a "memory
location corresponding to NULL." It's true that on many machines
the representation of a null pointer would "work" as an address
if somehow fed into a load or store or other machine instruction.
But C does not require this, and (sez the FAQ) there have been
machines that implemented NULL values differently.

On "practical" machines, where NULL is "address zero" and an
"end of memory" exists, it usually turns out that keeping these
locations off-limits to C programs is no hardship. For instance,
some systems put a stack at the top of memory and let it grow
downward; if they can guarantee that the first thing pushed on
the stack is not a data object -- a return address, say -- then
there's no way a program can get a data object to butt against
the end of memory. The addresses starting at zero and working
upwards might be used for environment variables, or for data
exchange with the host system -- or simply made inaccessible
altogether, as a debugging aid. The upshot is that the program
and its data fit "between" the extremes of the hypothetical range
of addresses without coming "too close" to either end.

... and, of course, the Standard permits any other shenanigans
the implementation chooses to indulge in, provided the pointer
calculations produce the results they're supposed to.

--
Er*********@sun.com

Nov 14 '05 #25
