I am trying to understand the strict aliasing rules in the C Standard.
As gcc applies these rules by default, I want to be sure that I fully
understand the issue.
For questions (1), (2) and (3), I think that the answers are all "yes",
but I would be glad to have strong confirmation.
About questions (4), (5) and (6), I really don't know. Please help!
--------
The Standard says ( http://www.open-std.org/jtc1/sc22/wg...docs/n1124.pdf chapter 6.5 ):
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
  the object,
- a type that is the signed or unsigned type corresponding to the
  effective type of the object,
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union), or
- a character type.
***** Question (1) *****
Consider two structs with different tag names, like:
struct s1 {int i;};
struct s2 {int i;};
struct s1 *p1;
struct s2 *p2;
The compiler is free to assume that p1 and p2 point to different memory
locations and don't alias.
Two structs with different tag names are considered to be different types.
In the standard, we read the wording "effective type of the object"
many times.
This "effective type of the object" may be an "int", "double", etc, but
may also be a "struct" type, right ???
And I suppose it may also be an "array" type or a "union" type as
well, is that correct ???
***** Question (2) *****
In the little program that follows, the line "printf("%d\n", *x);"
normally prints 123,
but an optimizing compiler can print garbage instead of 123.
Is my reasoning correct ???
On the other hand, the line "printf("%d\n", p1->i);" always prints 999
as expected, right ???
----
#include <stdio.h>
#include <stdlib.h>

struct s1 { int i; double f; };

int main(void)
{
    struct s1* p1;
    int* x;

    p1 = malloc(sizeof(*p1));
    p1->i = 123;            // object of type 'struct s1' contains 123
    x = &(p1->i);
    printf("%d\n", *x);     // I try to access a value stored in an object
                            // of type 'struct s1' through *x, which is of
                            // type 'int'.
                            // I think this is not allowed by the standard !
    *x = 999;               // I store 999 in *x, which is of type 'int'
    printf("%d\n", p1->i);  // I access a value stored in *x, which is of
                            // type 'int', by *p1 ( as p1->i is a shortcut
                            // for (*p1).i ), which is of type 'struct s1'
                            // but contains a member of type 'int'.
                            // I think this is allowed by the standard.
    return 0;
}
***** Question (3) *****
The Standard forbids ( if I am not mistaken ) a pointer of type "struct A
*" from accessing data written through a pointer of type "struct B *", as
they are different types.
This means that the common practice of faking inheritance in C, as in
this code snippet, is now utterly wrong, is that correct ???
--- myfile.c ---
#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
struct Color_Point2 { struct Point point; Color color; };

int main(int argc, char* argv[])
{
    struct Point* p;

    struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
    my_color_point->x = 10;
    my_color_point->y = 20;
    my_color_point->color = GREEN;
    p = (struct Point*)my_color_point;
    // trying to access data stored in a "struct Color_Point" object using
    // a "struct Point*" pointer is forbidden by the Standard ???
    printf("x:%d, y:%d\n", p->x, p->y);

    struct Color_Point2* my_color_point2 = malloc(sizeof(struct Color_Point2));
    my_color_point2->point.x = 100;
    my_color_point2->point.y = 200;
    my_color_point2->color = RED;
    p = (struct Point*)my_color_point2;
    // trying to access data stored in a "struct Color_Point2" object using
    // a "struct Point*" pointer is forbidden by the Standard ???
    printf("x:%d, y:%d\n", p->x, p->y);

    p = &my_color_point2->point;
    printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???
    return 0;
}
Is the line "p = (struct Point*)my_color_point" also a case of what is
called "type-punning" ???
***** Question (4) *****
In the Standard, chapter 6.5.2.3, it is written:
One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one of
these structures, it is permitted to inspect the common initial part of
any of them anywhere that a declaration of the complete type of the
union is visible. Two structures share a common initial sequence if
corresponding members have compatible types (and, for bit-fields, the
same widths) for a sequence of one or more initial members.
I find this statement completely obscure.
Let's have:
struct s1 {int i;};
struct s2 {int i;};
struct s1 *p1;
struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.
union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};
There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else.
I mean, it is possible to access data using directly p1 and p2.
Do you agree, everybody ???
***** Question (5) *****
This question is really hard.
Consider this code snippet:
---------
#include <stdio.h>

int main (void)
{
    struct s1 {int i;
              };
    struct s1 s = {77};
    unsigned char* x = (unsigned char*)&s;

    // Standard says data stored in "struct s1" type can be read by
    // pointer to "char"
    printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);

    x[0] = 100; // here, I write data in "char" objects !!!
    x[1] = 101;
    x[2] = 102;
    x[3] = 103;

    // but data stored in "char" objects cannot be read by pointer to
    // "struct s1" ???
    printf("%d\n", s.i);
    return 0;
}
-----------
For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
(int)x[3]);", I can rewrite the Standard clause like this:
An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of
the following types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
it is our case, so everything is OK so far !
But what about the line "printf("%d\n", s.i);" ??????
I read the Standard again and again, but I cannot work out how it can
work.
If I rewrite the Standard clause, it gives:
An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
stored value accessed only by an lvalue expression that has one of
the following types:
- a type compatible with the effective type of the object, [ this is
not our case ]
- a qualified version of a type compatible with the effective type
of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to the
effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object, [ still not our
case ]
- an aggregate or union type that includes one of the aforementioned
types among its members [ we read through "s" which is of type "struct
s1", but it does not contain a member of type "char" ]
(including, recursively, a member of a subaggregate or contained
union), or
- a character type. [ definitely not our case ]
We see that none of these conditions applies in our case.
Where is the flaw in my reasoning ???
Does the last "printf" line of this code snippet work or not ??? and
why ???
***** Question (6) *****
I often see this code used with socket programming:
struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
The function bind(...) needs a pointer to "struct sockaddr", but
my_addr is a "struct sockaddr_in".
So, in my opinion, the function bind is not guaranteed to safely access
the content of object my_addr.
Does anyone know why this code is not broken ( or whether it is ) ???
In article <11**********************@f14g2000cwb.googlegroups .com>, ni************@genevoise.ch wrote:
***** Question (2) *****
In the little program that follows, the line "printf("%d\n", *x);" normally prints 123, but an optimizing compiler can print garbage instead of 123. Is my reasoning correct ???
On the other hand, the line "printf("%d\n", p1->i);" always prints 999 as expected, right ???
----
#include <stdio.h>
#include <stdlib.h>
struct s1 { int i; double f; };
int main(void)
{
    struct s1* p1;
    int* x;
    p1 = malloc(sizeof(*p1));
    p1->i = 123;            // object of type 'struct s1' contains 123
    x = &(p1->i);
    printf("%d\n", *x);     // I try to access a value stored in an object
                            // of type 'struct s1' through *x, which is of
                            // type 'int'.
                            // I think this is not allowed by the standard !
    *x = 999;               // I store 999 in *x, which is of type 'int'
    printf("%d\n", p1->i);  // I access a value stored in *x, which is of
                            // type 'int', by *p1 ( as p1->i is a shortcut
                            // for (*p1).i ), which is of type 'struct s1'
                            // but contains a member of type 'int'.
                            // I think this is allowed by the standard.
    return 0;
}
This is all ok. The only unusual thing with structs is that there can be
padding, and that storing into any struct member could modify any
padding in the struct. If there is padding between int i and double f,
then p1->i = 123 could modify the padding, while *x = 999 couldn't.
***** Question (3) *****
The Standard forbids ( if I am not mistaken ) a pointer of type "struct A *" from accessing data written through a pointer of type "struct B *", as they are different types.
This means that the common practice of faking inheritance in C, as in this code snippet, is now utterly wrong, is that correct ???
--- myfile.c ---
#include <stdio.h>
#include <stdlib.h>
typedef enum { RED, BLUE, GREEN } Color;
struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
struct Color_Point2{ struct Point point; Color color; };
int main(int argc, char* argv[])
{
    struct Point* p;
    struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
    my_color_point->x = 10;
    my_color_point->y = 20;
    my_color_point->color = GREEN;
    p = (struct Point*)my_color_point;
This is undefined behavior. There is no guarantee that my_color_point is
correctly aligned for a pointer of type (struct Point *).
printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in a "struct Color_Point" object using a "struct Point*" pointer is forbidden by the Standard ???
Yes. There is an exception: if the compiler has seen a declaration of a
union with members of type "struct Point" and "struct Color_Point", then
accessing the common initial members of both structs is legal,
even writing to a member of one struct and reading it as a member of
the other struct.
    struct Color_Point2* my_color_point2 = malloc(sizeof(struct Color_Point2));
    my_color_point2->point.x = 100;
    my_color_point2->point.y = 200;
    my_color_point2->color = RED;
    p = (struct Point*)my_color_point2;
Yes, you can always cast a pointer to a struct to a pointer to its first
member.
printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in a "struct Color_Point2" object using a "struct Point*" pointer is forbidden by the Standard ???
That's fine.
p = &my_color_point2->point;
printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???
return 0; }
Is the line "p = (struct Point*)my_color_point" also a case of what is called "type-punning" ???
***** Question (4) *****
In the Standard, chapter 6.5.2.3, it is written:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
I find this statement completely obscure.
Let's have:
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
Exactly.
If we just put a union declaration like this before this code, then it acts like a flag to the compiler, indicating that pointers to "struct s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point to the same location.
union p1_p2_alias_flag { struct s1 st1; struct s2 st2; };
There is no need to use "union p1_p2_alias_flag" for accessing data, and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used anywhere else. I mean, it is possible to access data using directly p1 and p2.
Yes, that is right.
***** Question (5) *****
This question is really hard.
Consider this code snippet:
---------
#include <stdio.h>
int main (void)
{
    struct s1 {int i; };
    struct s1 s = {77};
    unsigned char* x = (unsigned char*)&s;
    printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    // Standard says data stored in "struct s1" type can be read by pointer to "char"
That is if sizeof (int) >= 4, which is nowhere guaranteed.
    x[0] = 100; // here, I write data in "char" objects !!!
    x[1] = 101;
    x[2] = 102;
    x[3] = 103;
    printf("%d\n", s.i); // but data stored in "char" objects cannot be read by pointer to "struct s1" ???
Assuming that sizeof (int) == 4, you have changed every byte in
the representation of s.i. If the resulting representation is not a trap
representation, you are fine. And it is even ok if, for example, the
result after storing three bytes, combined with the last remaining byte
of the number 77, were a trap representation, because you never access
that value.
return 0; }
For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);", I can rewrite the Standard clause like this:
An object [ here, s of type "struct s1" ] shall have its stored value accessed only by an lvalue expression that has one of the following types: [ blah blah blah ] - a character type [ in our example, x[0], x[1], x[2], x[3] ]. // it is our case, so everything is OK so far !
But what about the line "printf("%d\n", s.i);" ?????? I read the Standard again and again, but I cannot work out how it can work.
If the bytes stored are a valid representation of an int, then that is
what it prints. If not, it is undefined behavior. A specific compiler
might guarantee that ints have no trap representations.
***** Question (6) *****
I often see this code used with socket programming:
struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
The function bind(...) needs a pointer to "struct sockaddr", but my_addr is a "struct sockaddr_in". So, in my opinion, the function bind is not guaranteed to safely access the content of object my_addr.
Does anyone know why this code is not broken ( or whether it is ) ???
It depends on the declarations of the types involved. And remember that
the C Standard is not the only standard. For example, the C Standard
doesn't guarantee that 'a' + 1 == 'b', but if your C implementation uses
ASCII or Unicode for its character set, then the ASCII standard or the
Unicode standard would give you that guarantee.
In your case, it could be that POSIX guarantees that the code is
correct. So it will work on any implementation that conforms to the
POSIX standard (no matter whether it conforms to the C Standard or not),
even though it might not work on an implementation that conforms to the
C Standard but not to POSIX.
On 13 Oct 2005 07:39:48 -0700, ni************@genevoise.ch wrote in comp.lang.c:
I am trying to understand the strict aliasing rules in the C Standard. As gcc applies these rules by default, I want to be sure that I fully understand the issue.
For questions (1), (2) and (3), I think that the answers are all "yes", but I would be glad to have strong confirmation.
About questions (4), (5) and (6), I really don't know. Please help!
--------
The Standard says ( http://www.open-std.org/jtc1/sc22/wg...docs/n1124.pdf chapter 6.5 ):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
***** Question (1) *****
Consider two structs with different tag names, like:
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
The compiler is free to assume that p1 and p2 point to different memory locations and don't alias. Two structs with different tag names are considered to be different types.
In the standard, we read the wording "effective type of the object" many times.
This "effective type of the object" may be an "int", "double", etc, but may also be a "struct" type, right ???
And I suppose it may also be an "array" type or a "union" type as well, is that correct ???
Yes.
***** Question (2) *****
In the little program that follows, the line "printf("%d\n", *x);" normally prints 123, but an optimizing compiler can print garbage instead of 123.
No, an optimizing compiler must still output "123" for this line.
Is my reasoning correct ???
On the other hand, the line "printf("%d\n", p1->i);" always prints 999 as expected, right ???
----
#include <stdio.h>
#include <stdlib.h>
struct s1 { int i; double f; };
int main(void)
{
    struct s1* p1;
    int* x;
    p1 = malloc(sizeof(*p1));
    p1->i = 123;            // object of type 'struct s1' contains 123
    x = &(p1->i);
    printf("%d\n", *x);     // I try to access a value stored in an object
                            // of type 'struct s1' through *x, which is of
                            // type 'int'.
                            // I think this is not allowed by the standard !
The effective type of *p1 is 'struct s1'. The effective type of p1->i
is 'int'. 'x' is a pointer to int, and you have initialized it with a
pointer to an int. This is perfectly legal.
Since the int contains the value 123, and 'x' quite properly points to
that int, *x must retrieve the int value 123. It can't do anything
else.
    *x = 999;               // I store 999 in *x, which is of type 'int'
    printf("%d\n", p1->i);  // I access a value stored in *x, which is of
                            // type 'int', by *p1 ( as p1->i is a shortcut
                            // for (*p1).i ), which is of type 'struct s1'
                            // but contains a member of type 'int'.
                            // I think this is allowed by the standard.
    return 0;
}
***** Question (3) *****
The Standard forbids ( if I am not mistaken ) a pointer of type "struct A *" from accessing data written through a pointer of type "struct B *", as they are different types.
This means that the common practice of faking inheritance in C, as in this code snippet, is now utterly wrong, is that correct ???
--- myfile.c ---
#include <stdio.h>
#include <stdlib.h>
typedef enum { RED, BLUE, GREEN } Color;
struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
struct Color_Point2{ struct Point point; Color color; };
int main(int argc, char* argv[])
{
    struct Point* p;
    struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
    my_color_point->x = 10;
    my_color_point->y = 20;
    my_color_point->color = GREEN;
    p = (struct Point*)my_color_point;
printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
This is undefined behavior, pure and simple. It works on many
implementations, but is not guaranteed at all.
[snip]
Is the line "p = (struct Point*)my_color_point" also a case of what is called "type-punning" ???
Type punning is not a term defined by the standard, but I would say
that the act of assigning the pointer via a cast is not type punning.
Accessing a member of the foreign structure type through the pointer
is.
***** Question (4) *****
In the Standard, chapter 6.5.2.3, it is written:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
I find this statement completely obscure.
Let's have:
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
If we just put a union declaration like this before this code, then it acts like a flag to the compiler, indicating that pointers to "struct s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point to the same location.
union p1_p2_alias_flag { struct s1 st1; struct s2 st2; };
There is no need to use "union p1_p2_alias_flag" for accessing data, and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used anywhere else. I mean, it is possible to access data using directly p1 and p2.
It seems unlikely that a compiler could find a way to prevent it from
working in general, even if the implementer tried, but such behavior
would not render the compiler non-conforming.
On the other hand, since your structure only contains a single member,
and the first member always begins at the same address as the
structure itself, this particular usage can't fail.
Still, the behavior is undefined, which means the language standard
places no requirements on it at all.
Do you agree, everybody ???
***** Question (5) *****
This question is really hard.
Consider this code snippet:
---------
#include <stdio.h>
int main (void)
{
    struct s1 {int i; };
    struct s1 s = {77};
    unsigned char* x = (unsigned char*)&s;
    printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    // Standard says data stored in "struct s1" type can be read by pointer to "char"
    x[0] = 100; // here, I write data in "char" objects !!!
    x[1] = 101;
    x[2] = 102;
    x[3] = 103;
The standard does not say that you can do this. You are assuming that
sizeof(int) is at least 4, and there are implementations where that is
not true. Accessing, let alone writing to, x[1], x[2], or x[3] might
be outside the bounds of the int and the struct, producing undefined
behavior.
printf("%d\n", s.i); // but data stored in "char" objects cannot be read by pointer to "struct s1" ???
return 0; }
No, the point is that accessing s.i, an int, after storing data into
that memory using a different object type, is undefined. You might
have created a bit pattern that does not represent a valid value for
the int, called a trap representation.
-----------
For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);", I can rewrite the Standard clause like this:
An object [ here, s of type "struct s1" ] shall have its stored value accessed only by an lvalue expression that has one of the following types: [ blah blah blah ] - a character type [ in our example, x[0], x[1], x[2], x[3] ]. // it is our case, so everything is OK so far !
I have worked on a platform where sizeof(int) is 1, and several where
sizeof(int) is 2. I have never worked on a platform where sizeof(int)
is 3, but C allows it. On any of these platforms you would be
invoking undefined behavior.
But what about the line "printf("%d\n", s.i);" ??????
Even assuming that sizeof(int) >= 4 on your implementation, you have
to understand that all types, other than unsigned char, can have trap
representations, that is bit patterns that do not represent a valid
value for the type. By writing arbitrary bit patterns into an int,
you may have created an invalid bit pattern in that int. When you
access that invalid bit pattern as an int, the behavior is undefined.
I read the Standard again and again, but I cannot work out how it can work. If I rewrite the Standard clause, it gives:
An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object, [ this is not our case ]
- a qualified version of a type compatible with the effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to the effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, [ still not our case ]
- an aggregate or union type that includes one of the aforementioned types among its members [ we read through "s" which is of type "struct s1", but it does not contain a member of type "char" ] (including, recursively, a member of a subaggregate or contained union), or
- a character type. [ definitely not our case ]
We see that none of these conditions applies in our case.
The standard provides a specific list of what is allowed. Lists like
this are always exhaustive. That means anything not on the list is
undefined.
Where is the flaw in my reasoning ???
There is no flaw in your reasoning, the code produces undefined
behavior.
Does the last "printf" line of this code snippet work or not ??? and why ???
There is no question of "work". Whatever it does is just as right or
wrong as anything else that might happen as far as the language is
concerned. That's what undefined behavior means. The C standard does
not know or care what happens.
***** Question (6) *****
I often see this code used with socket programming:
struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
The function bind(...) needs a pointer to "struct sockaddr", but my_addr is a "struct sockaddr_in". So, in my opinion, the function bind is not guaranteed to safely access the content of object my_addr.
Does anyone know why this code is not broken ( or whether it is ) ???
That depends on the definition of 'struct sockaddr_in'. If its first
member is a 'struct sockaddr', the code is legal and well defined
because a pointer to a structure can always be converted to a pointer
to its first member. If not, then the code produces undefined
behavior if the called function actually uses the pointer to access
members of a 'struct sockaddr'.
You use terms like "broken" and "work", which do not really apply as
far as undefined behavior in C is concerned. They are subjective
terms at best. Code is "broken" if it does not do what you want, you
consider it to "work" if it does. If it produces undefined behavior,
it may "work" on one compiler but be "broken" on another, and both
compilers can be standard conforming.
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Christian Bau <ch***********@cbau.freeserve.co.uk> wrote:
In article <11**********************@f14g2000cwb.googlegroups .com>, ni************@genevoise.ch wrote:
[snip]
***** Question (5) *****
This question is really hard.
Consider this code snippet:
---------
#include <stdio.h>
int main (void)
{
    struct s1 {int i; };
    struct s1 s = {77};
    unsigned char* x = (unsigned char*)&s;
    printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    // Standard says data stored in "struct s1" type can be read by pointer to "char"
That is if sizeof (int) >= 4, which is nowhere guaranteed.
    x[0] = 100; // here, I write data in "char" objects !!!
    x[1] = 101;
    x[2] = 102;
    x[3] = 103;
Let's suppose that we copy the value from another int:
int i = 42;
unsigned char *y = (void*)&i;
assert(sizeof(int) == 4);
x[0] = y[0];
//...etc.
printf("%d\n", s.i); // but data stored in "char" objects cannot be read by pointer to "struct s1" ???
Storing values through character lvalues did not change the effective
type of the struct, or its member, therefore it's okay (the compiler must
reread the value from memory).
The effective type of a declared object is always the declared type.
The effective type of an allocated object is the type last imprinted by
storing a value, by copying (memcpy, memmove, char array), or, if
none, the type of the lvalue it is accessed with.
Assuming that sizeof (int) == 4, you have changed every byte in the representation of s.i. If the resulting representation is not a trap representation, you are fine. And it is even ok if, for example, the result after storing three bytes, combined with the last remaining byte of the number 77, were a trap representation, because you never access that value.
(all agreed)
[snip]
For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);", I can rewrite the Standard clause like this:
An object [ here, s of type "struct s1" ] shall have its stored value accessed only by an lvalue expression that has one of the following types: [ blah blah blah ] - a character type [ in our example, x[0], x[1], x[2], x[3] ]. // it is our case, so everything is OK so far !
But what about the line "printf("%d\n", s.i);" ?????? I read the Standard again and again, but I cannot work out how it can work.
It means this: struct s1 object can be legally accessed with a character
lvalue (including writing data to the struct). Since it's legal,
the compiler must take it into consideration when later accessing
struct s1. Either it can prove that character lvalues did not refer
to the struct object, or it must re-read the struct value from memory.
This is not the case with other types:
assert(sizeof(int) == sizeof(short));
int i = 42;
short *ps = (short *)&i; //assume that alignment is the same
*ps = 54; //this access is UB; since it is not legal to access an int object
          //with a short lvalue, the compiler need not assume that object `i'
          //was actually changed
printf("%d\n", i); //may print cached value 42
                   //(the Std says it can do or not do virtually anything)
For another example: when a value is stored through `short' lvalue,
the compiler need not assume that `struct s1' object was changed,
because `struct s1' does not contain a `short' member.
--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
Christian Bau <ch***********@cbau.freeserve.co.uk> wrote:
In article <11**********************@f14g2000cwb.googlegroups .com>, ni************@genevoise.ch wrote:
***** Question (4) *****
In the Standard, chapter 6.5.2.3, it is written:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
I find this statement completely obscure.
Let's have:
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
Exactly.
What's more important: `p1->i' and `p2->i' don't alias, even though they
have the same type!
However p1 and p2 _may_ point at the same object.
((char*)p1)[0] = 0;
At this point the compiler cannot blindly assume that `*p2' wasn't modified.
If we just put a union declaration like this before this code, then it acts like a flag to the compiler, indicating that pointers to "struct s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point to the same location.
(As I said above, they may point to the same location.)
union p1_p2_alias_flag { struct s1 st1; struct s2 st2; };
There is no need to use "union p1_p2_alias_flag" for accessing data, and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used anywhere else.
(I don't quite understand what you mean here.)
I mean, it is possible to access data using directly p1 and p2.
After the compiler sees the union declaration, it is obliged to assume
that `p1->i' and `p2->i' may refer to (alias) the same object.
(However, it still need not assume that expressions `*p1' and `*p2' alias
the same object, since they are incompatible types).
--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
In article <3r************@individual.net> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
.... struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias. Exactly.
With a caveat. It is free to assume that as long as nothing is assigned
to either p1 or p2.
However p1 and p2 _may_ point at the same object.
In that case the compiler can not assume that *p1 and *p2 don't alias.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Dik T. Winter <Di********@cwi.nl> wrote:
In article <3r************@individual.net> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
...
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
[snip] However p1 and p2 _may_ point at the same object.
In that case the compiler can not assume that *p1 and *p2 don't alias.
I don't agree, otherwise aliasing rules would have no purpose.
Since `*p1' and `*p2' have incompatible types, the compiler may assume
(act as if) they don't refer to the same object, it doesn't have to prove
that both pointers don't point at the same location.
I believe that the compiler even needn't assume that these two alias
the same object:
*p1
*(struct s2 *)p1
The decision whether to alias or not to alias can be based on
the type of lvalue (mainly).
Can you give an example where `*p1' and `*p2' alias the same object
while the behaviour is defined? (...And where the aliasing is actually
relevant, eg.: `&*p1' and `&*p2' doesn't count.)
Perhaps reading from allocated and separately initialized object, but
this is not a situation when aliasing rules are very important.
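A small sketch of what this assumption looks like in code (my own illustration, not from the posts above; the function name store_both is hypothetical). Since `struct s1' and `struct s2' are incompatible types, a compiler is allowed to return the value 1 without re-reading p1->i after the store through p2. Whether a given compiler actually does so depends on the compiler and its options. The sketch is only meant to be called with pointers to two distinct objects, where the behaviour is defined either way.

```c
#include <assert.h>
#include <stddef.h>

struct s1 { int i; };
struct s2 { int i; };

/* The store through p2 uses an lvalue of a struct type that is incompatible
   with struct s1, so the compiler need not assume that p1->i changed. */
int store_both(struct s1 *p1, struct s2 *p2)
{
    assert(p1 != NULL && p2 != NULL);
    p1->i = 1;
    p2->i = 2;
    return p1->i; /* 1 when p1 and p2 point to distinct objects */
}
```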
--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
In article <3r************@individual.net> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Dik T. Winter <Di********@cwi.nl> wrote:
In article <3r************@individual.net> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
...
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
[snip]
However p1 and p2 _may_ point at the same object.
In that case the compiler can not assume that *p1 and *p2 don't alias.
I don't agree,
Sorry, I missed that p1 and p2 have different types. Indeed, p1 and p2
_may_ point at the same object, but the only way to let that happen is
by either undefined or implementation defined behaviour. So you were
right.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Christian Bau wrote:
In article <11**********************@f14g2000cwb.googlegroups.com>, ni************@genevoise.ch wrote:
--- myfile.c ---
#include <stdio.h>
#include <stdlib.h>
typedef enum { RED, BLUE, GREEN } Color;
struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
struct Color_Point2{ struct Point point; Color color; };
int main(int argc, char* argv[]) {
struct Point* p;
struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;
p = (struct Point*)my_color_point;
This is undefined behavior. There is no guarantee that my_color_point is correctly aligned for a pointer of type (struct Point *).
Doesn't the fact that the value of my_color_point was returned by malloc
guarantee correct alignment?
Thad
In article <43***********************@auth.newsreader.octanews.com>,
Thad Smith <Th*******@acm.org> wrote:
Christian Bau wrote:
In article <11**********************@f14g2000cwb.googlegroups.com>, ni************@genevoise.ch wrote:
Doesn't the fact that the value of my_color_point was returned by malloc guarantee correct alignment?
In this case, yes.
If you use
struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point) * 2);
++my_color_point;
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;
p = (struct Point*)my_color_point;
you get undefined behavior.
Christian Bau wrote:
ni************@genevoise.ch wrote:
struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
int main(int argc, char* argv[]) {
struct Point* p;
p = (struct Point*)my_color_point;
This is undefined behavior. There is no guarantee that my_color_point is correctly aligned for a pointer of type (struct Point *).
I think all structs must have the same alignment requirements.
However there is UB because one struct might have different
padding to the other.
Jack Klein <ja*******@spamcop.net> writes: On 13 Oct 2005 07:39:48 -0700, ni************@genevoise.ch wrote in comp.lang.c:
[snip] ***** Question (4) *****
In the Standard, chapter 6.5.2.3, it is written:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
I find this statement completely obscure.
Let's have:
struct s1 {int i;}; struct s2 {int i;};
struct s1 *p1; struct s2 *p2;
A compiler is free to assume that *p1 and *p2 don't alias.
If we just put a union declaration like this before this code, then it acts like a flag to the compiler, indicating that pointers to "struct s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point to the same location.
union p1_p2_alias_flag { struct s1 st1; struct s2 st2; };
There is no need to use "union p1_p2_alias_flag" for accessing data, and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used anywhere else. I mean, it is possible to access data using directly p1 and p2.
It seems unlikely that a compiler could find a way to prevent it from working in general, even if the implementer tried, but such behavior would not render the compiler non-conforming.
On the other hand, since your structure only contains a single member, and the first member always begins at the same address as the structure itself, this particular usage can't fail.
Still, the behavior is undefined. Which means the language standard places no requirements on it at all.
It isn't clear what behavior you think is undefined, since
what is supposed to be executed is stated only approximately.
However, let's consider a particular example:
struct s1 {int i; int j;};
struct s2 {int x; int y;};
union p1_p2_alias_flag {
struct s1 st1;
struct s2 st2;
};
int
affected_function( struct s1 *p1, struct s2 *p2 ){
p1->j = 3;
p2->y = 4;
return p1->j;
}
There is no undefined behavior in 'affected_function'.
Moreover, there are legal calls to the function that must
return '4' as a value.
Of course, it is possible to choose argument values (such as
NULL) for calls to the function that result in undefined
behavior; but the function must work for the legal cases
when the two pointers point to the same address. And I
think that's what the OP was asking about.
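As a side note, here is a sketch (my own, not from the posts above) that keeps both accesses visibly inside the union object, so it does not depend on how a particular compiler treats the mere visibility of the union declaration. The union name s1_or_s2 and the function name store_through_union are hypothetical. It relies on the common initial sequence guarantee quoted earlier: `j' and `y' are corresponding members of the shared initial part.

```c
#include <assert.h>
#include <stddef.h>

struct s1 { int i; int j; };
struct s2 { int x; int y; };

/* Hypothetical union; both members share the initial sequence (int, int). */
union s1_or_s2 {
    struct s1 st1;
    struct s2 st2;
};

/* Both stores and the final read go through the union object itself. */
int store_through_union(union s1_or_s2 *u)
{
    assert(u != NULL);
    u->st1.j = 3;
    u->st2.y = 4;      /* same storage as st1.j */
    return u->st1.j;   /* inspects the common initial part: 4 */
}
```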
Thank you very much, all of you, for having taken the time to answer my
quite confused questions.
I understand now that my interpretation of the standard was totally
wrong.
For those who will have problems with these aliasing rules and will
read this thread, this is my final interpretation of the standard.
I hope this time, I have made no mistake ( but else, tell me ).
The Standard says ( http://www.open-std.org/jtc1/sc22/wg...docs/n1124.pdf chapter 6.5
):
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
the object,
- a type that is the signed or unsigned type corresponding to the
effective type of the object,
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
types among its members
(including, recursively, a member of a subaggregate or contained
union), or
- a character type.
The wording of these rules is not very clear, but here is my attempt
at an explanation.
Vocabulary:
an object is a memory location.
an aggregate is a struct or an array.
a character type can be "char", "signed char", or "unsigned char".
Let's have this code:
struct s1 {int i; double d;};
struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag
// names s1 and s2 are different.
int *pi;
struct s1 *p1;
struct s1 *p1_a;
struct s2 *p2;
1) The "objects" which are mentioned in the Standard are really just
memory locations.
So far, there is NO NOTION OF POINTERS at all.
Pointers are just a means of accessing the objects, no more.
You just take a sheet of paper ( which represents your computer's
memory ), and draw rectangles symbolizing all the objects you work with
in your code.
Let's suppose that in our computer memory, we have one location
containing an int, three instances of struct s1 and two of struct s2.
You should obtain something like this ( rectangles are represented
here by pairs of brackets [...] ) :
[int]
[ struct s1 [int] [double] ]
[ struct s1 [int] [double] ]
[ struct s1 [int] [double] ]
[ struct s2 [int] [double] ]
[ struct s2 [int] [double] ]
So far, we have a visual representation of all the objects we work
with.
We have 16 objects on our paper:
- one "int" object
- three "struct s1" objects
- each struct s1 object contains an "int" object
- each struct s1 object contains a "double" object
- two "struct s2" objects
- each struct s2 object contains an "int" object
- each struct s2 object contains a "double" object
For accessing these objects, we use these pointers in our code:
pi, p1, p1_a, p2
Our work will be now to find for each object (=location) which
pointers may access it.
2) I take the visual representation above, and I label the objects
(obj1), (obj2), ... so that I can refer to them more easily.
[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]
Now, let's take each location one after another and see which
pointers may also access them.
The object (=memory location) obj1 is of type "int".
It can be accessed (= read or modified) by *pi, which is an lvalue of
type "int".
It can also be accessed by p1->i, which is a shortcut for (*p1).i,
and *p1 is of type "struct s1 containing "int" as a member".
It can similarly be accessed by p1_a->i and p2->i.
The object obj2 is of type "struct s1".
It can be accessed by *p1_a which is also of type "struct s1".
It cannot be accessed by *pi which is of type "int".
It cannot be accessed by *p2 which is of type "struct s2".
The object obj3 is of type "int".
It can be accessed by *pi which is of type "int".
It can be accessed by p1->i, which is a shortcut for (*p1).i, and
*p1 is of type "struct s1 containing "int" as a member".
The same way, it can be accessed by p1_a->i.
But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
because *p2 is of type "struct s2 containing "int" as a member".
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.
We can do similar analysis for all the remaining locations obj4,
obj5 ...
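To make the obj3 case concrete, here is a minimal sketch (the function name write_member_read_int is made up for this illustration). The same "int" member is written through p1->i and then read through *pi, an lvalue of type "int"; both accesses are allowed by the rules, so the read must see the stored value.

```c
#include <assert.h>
#include <stddef.h>

struct s1 { int i; double d; };

/* Write the int member through the struct pointer, then read the same
   object through a plain int lvalue. */
int write_member_read_int(struct s1 *p1)
{
    int *pi;
    assert(p1 != NULL);
    pi = &p1->i;   /* points to the int object inside *p1 */
    p1->i = 123;   /* access via (*p1).i */
    return *pi;    /* access via an lvalue of type int: 123 */
}
```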
Just one word about my misunderstanding of the Standard as I first read
it.
At first, I tried to find directly if two pointers may alias, but that
was the wrong approach and it leads to a dead end.
I understand now that it is easier to think first about MEMORY
LOCATIONS (=objects), AND ONLY THEN think about which pointers may
access this location, by seeing if they comply with the rules of the
Standard, as I just did above.
This gives for each location a set of pointers that may access it, and
the compiler considers each of these sets as pointers that may alias
and access the same object.
This way, the Standard becomes more readable and logical.
In practice, the problem is often not to do this thorough analysis for
each object in memory.
It is more of the kind "I work with this object, can I access it with
this pointer ? and can I also access it with this other pointer ?".
"In particular, if I write data in this object using this pointer, can
this other pointer read these data ?"
*** about type-punning ***
double d = 1.234;
int* i = &d;
printf("%d\n", *i); // WRONG
1.234 is stored in an object of type "double".
We try to access it through *i, which is of type "int".
The result is undefined.
If you want to inspect the content of d ( assuming that a double is 4
bytes long and being aware of possible trap representations ), you
can do this:
unsigned char* c = (unsigned char*)&d;
and you can access the data with c[0], c[1], c[2] and c[3].
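Here is a slightly more general sketch of the same idea that does not assume a particular size for double: it uses sizeof instead of a fixed count of 4. The helper names copy_double_bytes and rebuild_double are hypothetical. The first reads the representation through unsigned char lvalues, the second rebuilds the double from those bytes with memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the object representation of *d into out (which must have room for
   sizeof(double) bytes), reading it through unsigned char lvalues. */
size_t copy_double_bytes(const double *d, unsigned char *out)
{
    const unsigned char *c;
    size_t k;
    assert(d != NULL && out != NULL);
    c = (const unsigned char *)d;
    for (k = 0; k < sizeof *d; k++)
        out[k] = c[k];
    return sizeof *d;
}

/* Rebuild a double from bytes previously produced by copy_double_bytes. */
double rebuild_double(const unsigned char *in)
{
    double d;
    assert(in != NULL);
    memcpy(&d, in, sizeof d);
    return d;
}
```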
*** about pointer to char ***
Besides, don't forget that as the Standard rule says, a pointer to char
can access any object of any type !
When the location referenced by a pointer to char is updated, the
compiler must assume that any data stored in any type may have been
modified.
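A small sketch of that rule (the function name clear_via_bytes is made up for illustration): the int is modified only through lvalues of type unsigned char, so the final read of *pi has to observe the change. It relies on all-bits-zero being a representation of the int value 0.

```c
#include <assert.h>
#include <stddef.h>

/* Store 42 as an int, then zero every byte through unsigned char lvalues.
   The compiler must assume the int may have changed and re-read it. */
int clear_via_bytes(int *pi)
{
    unsigned char *bytes;
    size_t k;
    assert(pi != NULL);
    *pi = 42;
    bytes = (unsigned char *)pi;
    for (k = 0; k < sizeof *pi; k++)
        bytes[k] = 0;
    return *pi; /* 0, not the previously stored 42 */
}
```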
But don't think that this kind of code allows you to bypass the
aliasing rules:
struct A *a;
struct B *b;
b = (struct B*)(char*)a;
This won't make "*b" able to access data in "struct A", because "*b" is
of type "struct B".
It is the type of the dereferenced pointer that matters. The
intermediate casting to "char*" is thus totally useless and won't give
"b" more access possibilities.
*** about inheritance ***
struct Point { int x;
int y;
};
struct Color_Point { int x;
int y;
Color color;
};
struct Color_Point2{ struct Point point;
Color color;
};
struct Point* p;
struct Color_Point* my_color_point;
struct Color_Point2* my_color_point2;
my_color_point = malloc(sizeof(struct Color_Point));
my_color_point2 = malloc(sizeof(struct Color_Point2));
p = (struct Point*)my_color_point; // WRONG
// *p, which is of type "struct Point", cannot access data stored at
// location *my_color_point, which is an object of type "struct
// Color_Point".
p = &my_color_point2->point; // GOOD
// *p, which is of type "struct Point", can access data stored at
// location (*my_color_point2).point, which is also of type "struct
// Point".
p = (struct Point*)my_color_point2; // GOOD
// *p, which is of type "struct Point", can access data stored at
// location (*my_color_point2).point, which is also of type "struct
// Point".
// We see that in fact, this is exactly the same case as the
// previous one !
// C gives the guarantee that we can cast the pointer to a struct to
// the type of its first member, it gives a pointer to this first member
// object.
// Just notice that this guarantee is about alignment, and that the
// fact that we can access data stored in an object is granted to us by
// the aliasing rules, exactly as in the previous example.
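A compact sketch of the "GOOD" case, with hypothetical function names move_right and shifted_x. The struct pointer is converted to a pointer to its first member, and the Point is then modified through that pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x; int y; };
struct Color_Point2 { struct Point point; Color color; };

/* Works on any struct Point, including one embedded as a first member. */
void move_right(struct Point *p, int dx)
{
    assert(p != NULL);
    p->x += dx;
}

int shifted_x(struct Color_Point2 *cp, int dx)
{
    /* A pointer to a struct, suitably converted, points to its first member. */
    struct Point *p = (struct Point *)cp;
    assert(p == &cp->point);
    move_right(p, dx);
    return cp->point.x;
}
```

In real code, passing &cp->point directly avoids the cast altogether, as noted above.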
*** final word ***
When working with pointers, there seems to be no need to cast pointers.
( I don't speak here of casting objects, like casting a "double" to an
"int" for instance, which is of course allowed.
It is casting pointers, like "double*" to "int*" or "struct s1*" to
"struct s2*" which is dangerous. )
In fact, every time a pointer is cast to point to a different type, the
alias rules interfere and lead to undefined behaviour.
So, to avoid any aliasing problem, the best way seems never to cast
pointers, with these two exceptions:
a) cast a pointer to char*, so that it can access the byte
representation of the object ( cast to unsigned char* is best ), as
allowed by the aliasing rules.
b) cast of a pointer to struct to a pointer to its first member type,
like in the last example "p = (struct Point*)my_color_point2;".
( but this one is not really necessary, as we can just pass the
address of the first member as in the last example "p =
&my_color_point2->point;", so that a cast is avoided ).
As for pointers to void, such as those returned by malloc, there is no
need to cast them, as pointers to void may be assigned to and from
pointers to any type.
Any suggestion about something I could have missed or misunderstood ?
Best regards
On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:
A few corrections but generally what you wrote was accurate.
struct s1 {int i; double d;}; struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag names s1 and s2 are different.
They are of the same "effective type".
.... struct s2 *p2;
.... [int (obj1)] [ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
.... The object obj2 is of type "struct s1".
.... It cannot be accessed by *p2 which is of type "struct s2".
Actually it can be, since s2 has "a type compatible with the effective
type of" s1.
The object obj3 is of type "int".
.... But it cannot be accessed by p2->i, which is a shorcut for (*p2).i, because *p2 is of type "struct s2 containing "int" as a member"."
Same applies here.
[...] *** about inheritance ***
struct Point { int x; int y; };
struct Color_Point { int x; int y; Color color; };
struct Color_Point2{ struct Point point; Color color; };
struct Point* p; struct Color_Point* my_color_point; struct Color_Point2* my_color_point2;
my_color_point = malloc(sizeof(struct Color_Point)); my_color_point2 = malloc(sizeof(struct Color_Point2));
Your analysis being based on malloc'd memory is flawed - malloc'd memory
is properly aligned for any object and until it is written to, it has no
effective type. So let's assume instead that you'd caused the pointers to
reference static objects of the type that they point to or that your code
has written such an object into the malloc'd memory to establish its
effective type.
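A minimal sketch of the second option, i.e. writing an object into the malloc'd memory so that it gets an effective type. The helper names new_point and point_sum are hypothetical. The store is done through an lvalue of type struct Point, so the allocated object takes that effective type for later reads.

```c
#include <assert.h>
#include <stdlib.h>

struct Point { int x; int y; };

/* Allocate storage and store a whole struct Point into it. */
struct Point *new_point(int x, int y)
{
    struct Point *p = malloc(sizeof *p);
    if (p != NULL) {
        struct Point init;
        init.x = x;
        init.y = y;
        *p = init; /* store through an lvalue of type struct Point */
    }
    return p;
}

/* Later reads use the same type as the stored object. */
int point_sum(const struct Point *p)
{
    assert(p != NULL);
    return p->x + p->y;
}
```

The caller is responsible for checking the result for NULL and for calling free.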
p = (struct Point*)my_color_point; // WRONG // *p, which is of type "struct Point", cannot access data stored at location *my_color_point, which is an object of type "struct Color_Point".
But this code is not accessing data, it's setting a pointer. Since struct
Color_Point's initial elements are those of struct Point in the same
order, it can't have stricter alignment requirements. There's nothing
wrong with the code.
With the above assumption that my_color_point points to an object with the
effective type struct Color_Point, it is "wrong" to try to access p->y,
but not to access p->x. This is because the first member of a structure
is always at the same (initial, unpadded) location, but the second member
may be preceded by an arbitrary amount of padding. In practice it's
unlikely that a compiler would precede y with different amounts of
padding in each structure type, but in theory it's possible.
[...] the best way seems never to cast pointers, with these two exceptions:
[to char* so as to access an object's bytes; to a pointer type compatible
with the first member(s) of a struct]
That's good practice (there are occasional other exceptions).
[...]
-- http://members.dodo.com.au/~netocrat
ni************@genevoise.ch wrote:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
This one means that an object of type int may be accessed through
a (bigger) struct type that contains an int member.
struct s { double d; int i; };
void f(int *pi, struct s *ps)
{
*pi;
*ps = /*...*/;
*pi; /* must be re-read from memory */
}
- a character type.
Yes, but remember that this is not the whole story. They are type-based
aliasing rules, and there are other (expression-based) rules, too.
For example:
struct sx { int x; } *px;
struct sy { int y; } *py;
void *pv = malloc(...);
px = pv; py = pv;
*px = ...;
py->y; //BAD, object does not have `y' member
Example 2:
int ai[2][2];
ai[0][2]; //BAD, this is not the same as ai[1][1]
.... struct s1 {int i; double d;}; struct s2 {int i; double d;}; // struct s1 and struct s2 are different types, because their tag names s1 and s2 are different.
int *pi; struct s1 *p1; struct s1 *p1_a; struct s2 *p2;
For accessing these objects, we use these pointers in our code: pi, p1, p1_a, p2
Our work will be now to find for each object (=location) which pointers may access it.
You're hooked on bad terminology. What matters is the type of the
lvalue. An lvalue is like a window through which you access an object,
a pointer is like an arrow. You don't access objects with pointers.
Pointers merely may be part of expressions that eventually may
be lvalues. Objects are not locations, but are memory ranges.
[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]
Now, let's take each location one after another and see which pointers may also access them.
The object (=memory location) obj1 is of type "int". It can be accessed (= read or modified) by *pi, which is a lvalue of type "int".
Yes.
It can also be accessed by p1->i, which is a shortcut for (*p1).i, and *p1 is of type "struct s1 containing "int" as a member".
No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).
It can similarly be accessed by p1_a->i and p2->i.
Idem.
The object obj2 is of type "struct s1". It can be accessed by *p1_a which is also of type "struct s1".
Right.
It cannot be accessed by *pi which is of type "int".
It can. One of its members (obj3) has `int' type, so that one can
be accessed, which means that the containing object can be accessed
as well (when you access a member, you access the whole object, too).
It cannot be accessed by *p2 which is of type "struct s2".
Indeed, it can't.
The object obj3 is of type "int". It can be accessed by *pi which is of type "int". It can be accessed by p1->i, which is a shortcut for (*p1).i, and *p1 is of type "struct s1 containing "int" as a member". The same way, it can be accessed by p1_a->i.
Yes. Moreover, it can be accessed with an expression `*p1' (IOW, that
expression may read value, or change the subobject), provided that
`p1' points at the right location (obj2).
But it cannot be accessed by p2->i, which is a shortcut for (*p2).i, because *p2 is of type "struct s2 containing "int" as a member".
[assuming that `p2' may point to obj2]
No, this is because `struct s1' (which is the type of obj2) does not have
`s2::i' member (sorry for C++ notation; struct members have their own
namespace for each struct type).
It is not explicitly mentioned in the standard, but if access is done through a struct, its type must match the type of the container of the object we want to access.
It is mentioned at the member access operators. If it weren't, nobody
would argue this.
Just one word about my misunderstanding of the Standard as I first read it. At first, I tried to find directly if two pointers may alias, but I was the wrong way to do and leads to a dead end.
Again: pointers don't alias, lvalues may...
I understand now that it is easier to think first about MEMORY LOCATIONS (=objects), AND ONLY THEN think about which pointers may access this location, by seeing if they comply with the rules of the Standard, as I just did hereabove.
Pointers may or may not point to locations, which is covered by
different rules.
This gives for each location a set of pointers that may access it, and the compiler considers each of these sets as pointers that may alias and access the same object. This way, the Standard becomes more readable and logical.
In practice, the problem is often not to do this thorough analysis for each object in memory. It is more of the kind "I work with this object, can I access it with this pointer ? and can I also access it with this other pointer ?". "In particular, if I write data in this object using this pointer, can this other pointer read these data ?"
Again: what matters is the EXPRESSION, not pointers that may be
one of its components.
*** about type-punning ***
double d = 1.234; int* i = &d;
The last one is suspicious.
printf("%d\n", *i); // WRONG
Right. Technically, it's UB.
unsigned char* c = (unsigned char*)&d; and you can access the data with c[0], c[1], c[2] and c[3].
Right
*** about pointer to char ***
Besides, don't forget that as the Standard rule says, a pointer to char can access any object of any type !
A pointer to character type may *point to* any object (of any type).
So can a pointer to void.
When the location referenced by a pointer to char is updated, the compiler must assume that any data stored in any type may have been modified.
No, when an object is modified through an *lvalue of character type*,
then the compiler must assume anything might have been modified (unless
it can prove otherwise).
But don't think that this kind of code allows you to bypass the aliasing rules:
struct A *a; struct B *b;
b = (struct B*)(char*)a;
The struct cast is suspicious.
This won't make "*b" able to access data in "struct A", because "*b" is of type "struct B". It is the type of the dereferenced pointer that matters.
More-or-less, yes.
*** final word ***
When working with pointers, there seems to be no need to cast pointers. ( I don't speak here of casting objects, like casting a "double" to an "int" for instance, which is of course allowed. It is casting pointers, like "double*" to "int*" or "struct s1*" to "struct s2*" which is dangerous. )
No, casting is sometimes necessary (where there's no implicit conversion),
and is always safe where conversion is well defined.
In fact, every time a pointer is cast to point to a different type, the alias rules interfere and lead to undefined behaviour.
No, aliasing rules have to do with lvalues. Period. End of story.
Pointers (in the way you talk about them) are subject to conversion rules.
--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
On Wed, 19 Oct 2005 09:05:33 +0000, Netocrat wrote: On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:
A few corrections but generally what you wrote was accurate.
struct s1 {int i; double d;}; struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag names s1 and s2 are different.
They are of the same "effective type".
OK my reading of the standard was incomplete - they're not of the same
effective type after all. Your original statement and the follow-ons that
I mistakenly corrected stand.
[...] p = (struct Point*)my_color_point; // WRONG // *p, which is of type "struct Point", cannot access data stored at location *my_color_point, which is an object of type "struct Color_Point".
But this code is not accessing data, it's setting a pointer. Since struct Color_Point's initial elements are those of struct Point in the same order, it can't have stricter alignment requirements. There's nothing wrong with the code.
With the above assumption that my_color_point points to an object with the effective type struct Color_Point, it is "wrong" to try to access p->y, but not to access p->x.
....but in the context of aliasing, yes, it's not guaranteed that you will
get the expected value when reading p->x.
This is because the first member of a structure is always at the same (initial, unpadded) location, but the second member may be preceded by an arbitrary amount of padding. In practice it's unlikely that a compiler that would precede y with different amounts of padding in each structure type, but in theory it's possible.
-- http://members.dodo.com.au/~netocrat
S. Tobias wrote:
[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]
The object (=memory location) obj1 is of type "int". It can be accessed (= read or modified) by *pi, which is a lvalue of type "int". It can also be accessed by p1->i, which is a shortcut for (*p1).i, and *p1 is of type "struct s1 containing "int" as a member".
No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).
I was having this example in mind, in fact:
struct s1 mys1;
struct s1* p1 = &mys1;
int* pi = &mys1.i;
(*p1).i = 123;
printf("%d\n", *pi); // we read here the value of *pi, which is of
// type "int",
// which has been written in the previous line
// by using *p1 which is of type "struct s1"
Again: pointers don't alias, lvalues may...
Absolutely, I must never forget that.
Pointer to character type may *point to* any object (of any type). So can pointer to void.
Yes, pointer to void can *point to* any object, but it cannot be
dereferenced, so it cannot *access* it.
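A tiny sketch of that distinction (the function name read_int is made up). The pointer to void only carries the address; the access itself happens through an lvalue of type "int" after the pointer has been converted.

```c
#include <assert.h>
#include <stddef.h>

/* pv may point to any object, but it cannot be dereferenced as it is.
   It is converted (implicitly, no cast needed in C) before the access. */
int read_int(const void *pv)
{
    const int *pi = pv;
    assert(pi != NULL);
    return *pi; /* the access is done through an int lvalue */
}
```

Calling read_int is only defined when pv really points to an int object; the conversion alone does not make other accesses legal.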
When the location referenced by a pointer to char is updated, the compiler must assume that any data stored in any type may have been modified.
No, when an object is modified though an *lvalue of character type*, then compiler must assume anything might have been modified (unless it can prove otherwise).
Expressing it this way is better, yes.
And thank you very much for your comment.
I still must read it carefully until I am sure to understand
everything.
On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote: ni************@genevoise.ch wrote:
[...] It is not explicitly mentioned in the standard, but if access is done through a struct, its type must match the type of the container of the object we want to access. It is mentioned at the member access operators. If it weren't, nobody whould argure this.
Is this an area where the draft and final version differ? I see no
mention of it in N869's "6.5.2.3 Structure and union members" which is the
section to which I presume you're referring.
[...]
-- http://members.dodo.com.au/~netocrat
On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
nicolas.rie...@genevoise.ch wrote:
[...]
It is not explicitly mentioned in the standard, but if access is done through a struct, its type must match the type of the container of the object we want to access. It is mentioned at the member access operators. If it weren't, nobody would argue this.
I see my explanation was unclear.
Here is the reason why obj3 cannot be accessed by p2->i :
p2->i is shorthand for (*p2).i, and *p2 is of type "struct s2",
a struct that contains an "int" member.
So far, one could think that *p2 could access obj3, as no rule seems to
forbid it.
But the Standard doesn't say that an lvalue complying with these rules CAN
also access the object (it only MAY, and sometimes it even CANNOT, for
other reasons).
Here, the answer is that obj3 is included in a "struct s1", which is in
a different location from any "struct s2" object because they are
different types.
So, a pointer to any "struct s2" OR TO ANY OF ITS MEMBERS cannot access
any location of a "struct s1".
Netocrat <ne******@dodo.com.au> wrote:
On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
ni************@genevoise.ch wrote:
[...]
It is not explicitly mentioned in the standard, but if access is done through a struct, its type must match the type of the container of the object we want to access. It is mentioned at the member access operators. If it weren't, nobody would argue this.
Is this an area where the draft and final version differ?
In the relevant parts - no. (In p.5 the first sentence has been dropped,
and the rest differs by one letter.)
I see no mention of it in N869's "6.5.2.3 Structure and union members" which is the section to which I presume you're referring.
My bad, sorry. It's not explicitly mentioned, but can be derived.
Pp. 3 and 4 refer to a "member of a structure or union object"; it means
the operator (and behaviour) is defined iff the _object_ has the specified
member. (However, I decline to explain what exactly it should be; I think
the Std means the effective type of the object; it's one of the questions
on my list to c.s.c.)
Anyway, if the Std text is not enough, then at least Example 3 shows
the intention; if it were allowed (in the example) to access `t1::m'
with `p2->m' (or vv.), then the second part of the example would
be moot, as well as the "special guarantee" of p. 5 would.
--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
On Mon, 24 Oct 2005 13:40:55 +0000, S.Tobias wrote:
Netocrat <ne******@dodo.com.au> wrote:
On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
ni************@genevoise.ch wrote:
[...]
It is not explicitly mentioned in the standard, but if access is done through a struct, its type must match the type of the container of the object we want to access. It is mentioned at the member access operators. If it weren't, nobody would argue this.
[...]
I see no mention of it in N869's "6.5.2.3 Structure and union members" which is the section to which I presume you're referring.
My bad, sorry. It's not explicitly mentioned, but can be derived. Pp. 3 and 4 refer to a "member of a structure or union object"; it means the operator (and behaviour) is defined iff the _object_ has the specified member.
OK - I was expecting something more explicit - the "iff" is implied - but
this seems to be a correct interpretation of the document's intent.
(However, I decline to explain what exactly it should be; I think the Std means the effective type of the object; it's one of the questions on my list to c.s.c.)
What else would it be?[*]
Anyway, if the Std text is not enough, then at least Example 3 shows the intention; if it were allowed (in the example) to access `t1::m' with `p2->m' (or vv.), then the second part of the example would be moot, as well as the "special guarantee" of p. 5 would.
Sure - that makes the intent plain, even if the prior wording is not.
[*] My reaction to your comment in another thread - that you have your own
idea of what it means to "complete a type" - was similar: how many
alternative interpretations could there be?
-- http://members.dodo.com.au/~netocrat