Simple assignment operator and object overlap

I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object that
overlaps in any way
the storage of the first object, then the overlap shall be exact and
the two objects shall
have qualified or unqualified versions of a compatible type; otherwise,
the behavior is
undefined.

The only conceivable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >= 2 */

int main(void)
{
    int i = 10;
    int *ip = &i;

    i = *ip; /* although pointless, this seems to be allowed
              * because although the storage of i and *ip
              * overlaps, the overlap is exact */

    ++ip;    /* get ready to (potentially) cause some UB :-) */

    i = *ip; /* from my understanding, this is UB
              * because the storage of i and *ip
              * overlaps (ip is pointing at a location
              * occupied by the storage of i)
              * but ip is -not- pointing at the lowest
              * addressable byte of i, thus the
              * overlap is not exact and it's undefined */
    return 0;
}

Did I read the Standard correctly and is my example illustrative of
what 6.5.16.1p3 intends to communicate? If not, please set me
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).

--
Mike S

Jul 7 '06 #1
13 Replies


Mike S wrote:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object that
overlaps in any way
the storage of the first object, then the overlap shall be exact and
the two objects shall
have qualified or unqualified versions of a compatible type; otherwise,
the behavior is
undefined.

The only concievable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

++ip; /* get ready to (potentially) cause some UB :-) */

i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */
return 0;
}
After the increment, the storage of i and *ip do not overlap,
but you've still managed to invoke undefined behavior. I think
declaring ip as a pointer to an unsigned char would be better
suited to your query.
DId I read the Standard correctly and is my example illustrative of
what the 6.5.16.1p3 intends to communicate? If not, please set my
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).
A union perhaps?
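
To make the union suggestion concrete, here is a minimal sketch of the "exact overlap" case. The union and function names are made up for illustration and do not come from the thread. Both members have the same type and start at the same address, so, as far as I can tell, assigning one from the other is the situation 6.5.16.1p3 explicitly permits:

```c
#include <assert.h>

/* Hypothetical union: both members are int, so their storage
 * coincides exactly. */
union same_type {
    int a;
    int b;
};

/* Stores v through member a, then assigns a from b. Source and
 * destination overlap exactly and have the same type. */
int exact_overlap_assign(int v)
{
    union same_type u;

    u.a = v;
    u.a = u.b;          /* exact overlap, compatible types */
    assert(u.a == v);   /* the value survives the self-assignment */
    return u.a;
}
```

Calling exact_overlap_assign(10) should simply hand back 10; the point is only that nothing about the assignment is suspect.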

Jul 7 '06 #2

"Mike S" <mg******@netscape.net> writes:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only concievable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */
So far, so good, I think.
++ip; /* get ready to (potentially) cause some UB :-) */
This causes ip to point to an int object just after i; if
sizeof(int)==4, it causes ip to advance by 4 bytes.
i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */
No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

You could make i an array of 2 ints, and do some pointer conversions
to increment ip by 1 byte rather than by sizeof(int) bytes, but then
dereferencing ip would invoke undefined behavior if it's not aligned
properly. If the implementation allows 1-byte alignment for ints, and
if sizeof(int) >= 2 as you mentioned above, then this would be a
(rather contrived) example of what 6.5.16.1p3 is talking about.
return 0;
}

DId I read the Standard correctly and is my example illustrative of
what the 6.5.16.1p3 intends to communicate? If not, please set my
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).
You can build a less contrived example using unions. Here's what I've
come up with:

#include <stddef.h> /* offsetof */
#include <stdio.h>  /* printf */
#include <string.h> /* memset */

int main(void)
{
    struct big_type {
        int arr[32];
    };
    struct s1 {
        char c;
        struct big_type bt1;
    };
    struct s2 {
        long long x;
        struct big_type bt2;
    };
    union u {
        struct s1 sub1;
        struct s2 sub2;
    } obj;

    printf("obj.sub1.bt1, offset = %d, size = %d\n",
           (int)offsetof(union u, sub1.bt1),
           (int)sizeof obj.sub1.bt1);
    printf("obj.sub2.bt2, offset = %d, size = %d\n",
           (int)offsetof(union u, sub2.bt2),
           (int)sizeof obj.sub2.bt2);

    /*
     * Initialize bt1 so we can refer to it without invoking UB.
     */
    memset(&obj.sub1.bt1, 0, sizeof obj.sub1.bt1);
    /*
     * Now assign its value to bt2. If bt1 and bt2 overlap, this
     * invokes undefined behavior.
     */
    obj.sub2.bt2 = obj.sub1.bt1;

    return 0;
}

The output I get is:

obj.sub1.bt1, offset = 4, size = 128
obj.sub2.bt2, offset = 8, size = 128

so obj.sub1.bt1 and obj.sub2.bt2 overlap, but not completely, and
they're both properly aligned. (An implementation could assign them
both the same offset, so the program doesn't *unconditionally* invoke
undefined behavior.)

Assignment can be done using the equivalent of memcpy(); it doesn't
have to use the equivalent of memmove().
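
A hedged sketch of the practical consequence: when two regions might overlap partially, copy the bytes with memmove(), which is specified to handle overlap, instead of relying on assignment. The helper names and the 33-element buffer below are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Copies n bytes from src to dst. Unlike plain assignment (or
 * memcpy), memmove is defined even when the regions overlap. */
void copy_possibly_overlapping(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memmove(dst, src, n);
}

/* Fills raw[0..32] with 0..32, then copies the 32 ints starting at
 * raw[0] onto the 32 ints starting at raw[1]. The two regions share
 * 31 ints, so this is a partial overlap. Returns the final raw[32]. */
int demo_shifted_copy(void)
{
    int raw[33];
    int i;

    for (i = 0; i < 33; i++)
        raw[i] = i;
    copy_possibly_overlapping(&raw[1], &raw[0], 32 * sizeof raw[0]);
    return raw[32];     /* holds the old raw[31] */
}
```

With memmove the shifted copy behaves as if the source had first been copied to a temporary buffer, so the last element ends up holding the value that used to sit one slot earlier.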

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jul 7 '06 #3


Keith Thompson wrote:
"Mike S" <mg******@netscape.net> writes:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only concievable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

So far, so good, I think.
++ip; /* get ready to (potentially) cause some UB :-) */

This causes ip to point to an int object just after i; if
sizeof(int)==4, it causes ip to advance by 4 bytes.
Oops...Of course it does. My brain was thinking in assembly language
when I wrote that little gem ;-) As Dingo mentioned, I should have used
an unsigned char* to make my case. Even in doing so, since *ip might
afterwards not be aligned properly, the resulting undefined behavior
from that also invalidates my experiment, now that I think about it...

i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */

No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

You could make i an array of 2 ints, and do some pointer conversions
to increment ip by 1 byte rather than by sizeof(int) bytes, but then
dereferencing ip would invoke undefined behavior if it's not aligned
properly. If the implementation allows 1-byte alignment for ints, and
if sizeof(int) >= 2 as you mentioned above, then this would be a
(rather contrived) example of what 6.5.16.1p3 is talking about.
return 0;
}

DId I read the Standard correctly and is my example illustrative of
what the 6.5.16.1p3 intends to communicate? If not, please set my
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).

You can build a less contrived example using unions. Here's what I've
come up with:

#include <stddef.h> /* offsetof */
#include <stdio.h>  /* printf */
#include <string.h> /* memset */

int main(void)
{
struct big_type {
int arr[32];
};
struct s1 {
char c;
struct big_type bt1;
};
struct s2 {
long long x;
struct big_type bt2;
};
union u {
struct s1 sub1;
struct s2 sub2;
} obj;

printf("obj.sub1.bt1, offset = %d, size = %d\n",
(int)offsetof(union u, sub1.bt1),
(int)sizeof obj.sub1.bt1);
printf("obj.sub2.bt2, offset = %d, size = %d\n",
(int)offsetof(union u, sub2.bt2),
(int)sizeof obj.sub2.bt2);

/*
* Initialize bt1 so we can refer to it without invoking UB.
*/
memset(&obj.sub1.bt1, 0, sizeof obj.sub1.bt1);
/*
* Now assign its value to bt2. If bt1 and bt2 overlap, this
* invokes undefined behavior.
*/
obj.sub2.bt2 = obj.sub1.bt1;

return 0;
}

The output I get is:

obj.sub1.bt1, offset = 4, size = 128
obj.sub2.bt2, offset = 8, size = 128
[...]

Again, as Dingo mentioned and as you show here, unions are probably the
best way to illustrate the kind of scenario the Standard is referring
to. That should've been obvious, seeing as how the entire concept of
union relies on objects overlapping one another ;-)

--
Mike S

Jul 8 '06 #4


Keith Thompson wrote:
"Mike S" <mg******@netscape.net> writes:
int i = 10;
int *ip = &i;
++ip; /* get ready to (potentially) cause some UB :-) */
i = *ip; /* from my understanding, this is UB \
No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.
ip refers to an unknown object after being incremented. Writing to *ip
at that point causes undefined behavior, but using the value of *ip
just reads some garbage data of the type of *ip and will not cause
undefined behavior. Is that right?

lovecreatesbeauty

Jul 8 '06 #5

Mike S wrote:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object that
overlaps in any way
the storage of the first object, then the overlap shall be exact and
the two objects shall
have qualified or unqualified versions of a compatible type; otherwise,
the behavior is
undefined.

The only concievable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */
Why would you want or need to assume that?!
int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

++ip; /* get ready to (potentially) cause some UB :-) */

i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
No, it's UB because ip is not guaranteed to point to an object.
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */
return 0;
}

DId I read the Standard correctly and is my example illustrative of
what the 6.5.16.1p3 intends to communicate?
The more likely scenario is along the lines...

struct X { int x; };
struct Y { struct X x; int y; };

void foo(struct Y *yp, const struct X *xp)
{
yp->x = *xp;
}

Potentially, xp points to yp's struct X member. But the assignment
is valid in the same sense that i = i; is generally a valid assignment
(so long as i has a legitimate value).
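
As a small illustration of that valid case, the sketch below reuses the struct X, struct Y and foo definitions from above and calls foo with xp pointing at yp's own member. The wrapper function name is hypothetical.

```c
#include <assert.h>

struct X { int x; };
struct Y { struct X x; int y; };

/* The function from the post: *xp may be yp's own struct X member. */
void foo(struct Y *yp, const struct X *xp)
{
    yp->x = *xp;
}

/* Calls foo with xp aliasing yp->x exactly, then returns the member. */
int call_with_exact_alias(int value)
{
    struct Y y;

    y.x.x = value;
    y.y = 0;
    foo(&y, &y.x);          /* source and destination are one object */
    assert(y.x.x == value); /* unchanged, just like i = i; */
    return y.x.x;
}
```

Here the overlap is exact and both sides have type struct X, so the assignment inside foo has the same standing as i = i;.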

The invalid case is something like...

struct X { unsigned char x[3]; };
unsigned char data[1024]; /* data read from a binary file, perhaps */
struct X *x1 = (struct X *) &data[1];
struct X *x2 = (struct X *) &data[0];
*x1 = *x2;

Here the assignment is not valid because x2 partially overlaps x1.
Of course, there are alignment issues with this example, but that's a
separate matter.

The point is that implementations have greater freedom to optimise
assignments of larger objects more aggressively if they can assume
that objects either overlap exactly, or are mutually exclusive.
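
The three situations (disjoint, exact, partial) can be sketched as a small classifier over byte offsets within a single buffer. This is only an illustration under stated assumptions: the enum and function names are invented here, and both regions are assumed to have the same size.

```c
#include <assert.h>
#include <stddef.h>

enum overlap { OVERLAP_NONE, OVERLAP_EXACT, OVERLAP_PARTIAL };

/* Classifies two regions of `size` bytes each, located at byte
 * offsets a and b within the same buffer. */
enum overlap classify_overlap(size_t a, size_t b, size_t size)
{
    size_t lo = a < b ? a : b;
    size_t hi = a < b ? b : a;

    if (size == 0)
        return OVERLAP_NONE;
    if (a == b)
        return OVERLAP_EXACT;
    return (hi - lo < size) ? OVERLAP_PARTIAL : OVERLAP_NONE;
}

/* The invalid example above: 3-byte structs at offsets 1 and 0. */
void check_invalid_example(void)
{
    assert(classify_overlap(1, 0, 3) == OVERLAP_PARTIAL);
}
```

Only the OVERLAP_PARTIAL case is the one 6.5.16.1p3 leaves undefined for assignment; in that case a byte copy with memmove() is the usual way out.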

--
Peter

Jul 8 '06 #6

lovecreatesbeauty wrote:
Keith Thompson wrote:
"Mike S" <mg******@netscape.net> writes:
int i = 10;
int *ip = &i;
++ip; /* get ready to (potentially) cause some UB :-) */
i = *ip; /* from my understanding, this is UB \
No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

ip refers to an unknown object after self-increasing.
No, ip needn't refer to any object.
To write to *ip at that time causes undefined behavior,
Yes.
to use the value of the *ip is using some garbage data of type of *ip,
and will not cause undefined behavior. Is it?
No. If there is an object at ip, then it may be a trap representation.
But there needn't be _any_ object at ip.

C allows pointers to point one element past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

Consider a 32-bit processor on a machine that doesn't actually
have 4GB of memory. A C implementation may put an object
on the edge of a real memory boundary. Moving the pointer one
byte beyond the object is usually safe on such machines because
it's just simple arithmetic. Dereferencing such a pointer though may
crash the system because the processor will attempt to retrieve
memory that doesn't exist, usually causing a hardware interrupt.
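
A minimal sketch of the legitimate use of such a pointer: it serves as an end marker that is compared against but never dereferenced. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the ints in [first, last). last may be a one-past-the-end
 * pointer: it is only compared against, never dereferenced. */
long sum_range(const int *first, const int *last)
{
    long total = 0;

    assert(first != NULL && last != NULL);
    while (first != last)
        total += *first++;
    return total;
}

/* For pointer arithmetic a single int behaves like an array of one
 * element, so &i + 1 is a valid pointer value. */
long sum_single(int i)
{
    return sum_range(&i, &i + 1);
}

/* The same idea with a real array: v + 3 is one past the end. */
long sum_three(int a, int b, int c)
{
    int v[3];

    v[0] = a;
    v[1] = b;
    v[2] = c;
    return sum_range(v, v + 3);
}
```

Computing &i + 1 or v + 3 is fine; evaluating *(&i + 1) or v[3] is what invokes undefined behavior.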

--
Peter

Jul 8 '06 #7

On 7 Jul 2006 22:20:28 -0700, "Peter Nilsson" <ai***@acay.com.au>
wrote:
lovecreatesbeauty wrote:
Keith Thompson wrote:
"Mike S" <mg******@netscape.net> writes:
int i = 10;
int *ip = &i;
++ip; /* get ready to (potentially) cause some UB :-) */
i = *ip; /* from my understanding, this is UB \
No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

ip refers to an unknown object after self-increasing.

No, ip needn't refer to any object.
To write to *ip at that time causes undefined behavior,

Yes.
Not only storing in *ip but any attempt to evaluate *ip (reading
also).
to use the value of the *ip is using some garbage data of type of *ip,
and will not cause undefined behavior. Is it?

No. If there is an object at ip, then it may be a trap representation.
But there needn't be _any_ object at ip.
It doesn't matter whether there is an object at that address or not or
what value the object may have. It is a constraint violation to
evaluate *ip. From n1123, para 6.5.6-8: "If the result points one
past the last element of the array object, it shall not be used as the
operand of a unary * operator that is evaluated." Footnote 89
provides the intuitive extension of "last element" to a scalar object.
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.
It's not a question of portability. It is a constraint violation.

Remove del for email
Jul 8 '06 #8

Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
...
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability.
Yes it is.
It is a constraint violation.
Quite so Barry, but the fact that it's a constraint violation does not
say _why_ it's a constraint violation. I was trying to illustrate how
on real machines, it may be possible to reference one-past-the-end
pointers, but not to dereference them. Thus, the standard allows
flexibility but provides a constraint against one form of misuse.

--
Peter

Jul 9 '06 #9

On 9 Jul 2006 15:25:24 -0700, "Peter Nilsson" <ai***@acay.com.au>
wrote:
Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
...
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability.

Yes it is.
You said such a pointer could not be portably dereferenced. This
implies that such a dereference is no worse than assuming integers are
little-endian, which is in fact something that cannot be done
portably.

But it is worse. It is a constraint violation and therefore invokes
undefined behavior. It is in the same vein as referencing allocated
memory after freeing it.
It is a constraint violation.

Quite so Barry, but the fact that it's a constraint violation does not
say _why_ it's a constraint violation. I was trying to illustrate how
on real machines, it may be possible to reference one-past-the-end
pointers, but not to dereference them. Thus, the standard allows
flexibility but provides a constraint against one form of misuse.
The standard does not allow you to reference one past the end. It
allows you to calculate that address and use that address in the
intuitively obvious manner. But any attempt, or if you prefer all
attempts, to reference the memory is/are undefined.

Remove del for email
Jul 10 '06 #10

Barry Schwarz wrote:
"Mike S" <mg******@netscape.net> writes:
int i = 10;
int *ip = &i;
++ip; /* get ready to (potentially) cause some UB :-) */
i = *ip; /* from my understanding, this is UB \
It doesn't matter whether there is an object at that address or not or
what value the object may have. It is a constraint violation to
evaluate *ip. From n1123, para 6.5.6-8: "If the result points one
past the last element of the array object, it shall not be used as the
operand of a unary * operator that is evaluated."
Did you mean n1124? 6.5.6#8 is in the Semantics section, not the
Constraints section, so no, it's "just" undefined behaviour. A
constraint violation is stronger than that: it requires a diagnostic
(which is pretty much impossible for this in the general case), and may
cause the program to fail to compile even if the code would otherwise
never be reached.

Jul 10 '06 #11

Barry Schwarz <sc******@doezl.net> writes:
On 7 Jul 2006 22:20:28 -0700, "Peter Nilsson" <ai***@acay.com.au>
wrote:
[...]
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability. It is a constraint violation.
No, it's not a constraint violation; it's undefined behavior.

An implementation is required to issue a compile-time diagnostic when
a constraint is violated (e.g., for a type mismatch in an assignment
statement). Attempting to dereference a pointer just past the end of
an object cannot in general be detected at compile time.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jul 10 '06 #12

Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
...
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.
It's not a question of portability.
Yes it is.

You said such a pointer could not be portably dereferenced.
This implies that such a dereference is no worse than assuming
integers are little-endian, which is in fact something that cannot
be done portably.
In some cases, dereferencing one past the end pointers (and beyond)
is considerably less bad than assuming little-endian.

[Think struct hack.]

--
Peter

Jul 11 '06 #13

On 10 Jul 2006 20:42:44 -0700, "Peter Nilsson" <ai***@acay.com.au>
wrote:
Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
Barry Schwarz wrote:
"Peter Nilsson" <ai***@acay.com.au> wrote:
...
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability.

Yes it is.

You said such a pointer could not be portably dereferenced.
This implies that such a dereference is no worse than assuming
integers are little-endian, which is in fact something that cannot
be done portably.

In some cases, dereferencing one past the end pointers (and beyond)
is considerably less bad than assuming little-endian.

[Think struct hack.]
But the struct hack requires you to allocate more space than sizeof
struct so you are still in the area of memory allocated for you.
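
For reference, C99 gives the struct hack a sanctioned form, the flexible array member. The sketch below is an illustration only, with invented type and function names; it shows the point made above, that the extra elements live inside the block that was allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* C99 flexible array member: data has no declared size of its own. */
struct int_vec {
    size_t len;
    int data[];
};

/* Allocates the header plus room for n ints, so data[0..n-1] lies
 * inside the allocated block. Returns NULL on allocation failure. */
struct int_vec *int_vec_new(size_t n)
{
    struct int_vec *v = malloc(sizeof *v + n * sizeof v->data[0]);
    size_t i;

    if (v == NULL)
        return NULL;
    v->len = n;
    for (i = 0; i < n; i++)
        v->data[i] = (int)i;
    return v;
}

/* Returns the last element (len must be nonzero) and frees the block. */
int int_vec_last_and_free(struct int_vec *v)
{
    int last;

    assert(v != NULL && v->len > 0);
    last = v->data[v->len - 1];
    free(v);
    return last;
}
```

Accessing data[n - 1] here stays within the memory malloc returned, which is why it is a different matter from dereferencing a pointer one past the end of an ordinary object.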
Remove del for email
Jul 11 '06 #14
