Questions about pointers to objects and pointers to functions

Marc Thrun

Hello,

I've got a few questions:

1) Given the two structs
struct A {
int x;
};

and

struct B {
struct A y;
int z;
};

is it ok to treat a "pointer to an object of type struct B" as a
"pointer to an object of type struct A"?
(I think someone asked something like this some time ago, but
unfortunately I can't find the article anymore)

2) When I now have a function pointer of type
void (*fpa)(struct A *, int);
and a function pointer of type
void (*fpb)(struct B *, int);

and the corresponding functions
void fa(struct A*,int);
and
void fb(struct B*,int);

is it ok to assign a "pointer to fa" to fpb and call the function
through fpb with a "pointer to an object of type struct B" as the first
parameter?

Thanks in advance
Marc Thrun

Nov 15 '05 #1

SM Ryan

Marc Thrun <Te********@gmx.de> wrote:
# Hello,
#
# I've got a few questions:
#
# 1) Given the two structs
# struct A {
# int x;
# };
#
# and
#
# struct B {
# struct A y;
# int z;
# };
#
# is it ok to treat a "pointer to an object of type struct B" as a
# "pointer to an object of type struct A"?

Given
struct B sample
you are guaranteed
(struct A*)(&sample) == &(sample.y)
that is, a pointer to a struct is also a pointer to the
first field.
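A minimal C sketch of that guarantee (C99 6.7.2.1p13: a pointer to a structure object, suitably converted, points to its initial member, and vice versa). The helper get_x and the function first_member_alias_holds are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct A { int x; };
struct B { struct A y; int z; };

/* Hypothetical helper: works on anything that begins with a struct A. */
static int get_x(const struct A *a)
{
    return a->x;
}

/* Returns 1 if every check of the first-member rule passes. */
int first_member_alias_holds(void)
{
    struct B sample = { { 42 }, 7 };
    struct A *pa = (struct A *)&sample;  /* points to sample.y */
    struct B *pb = (struct B *)pa;       /* ... and converts back */

    return pa == &sample.y
        && pb == &sample
        && get_x(pa) == 42
        && offsetof(struct B, y) == 0;   /* no padding before y */
}
```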
--
SM Ryan http://www.rawbw.com/~wyrmwif/
Who's leading this mob?

Nov 15 '05 #2

Tim Rentsch

Marc Thrun <Te********@gmx.de> writes:

Hello,

I've got a few questions:

1) Given the two structs
struct A {
int x;
};

and

struct B {
struct A y;
int z;
};

is it ok to treat a "pointer to an object of type struct B" as a
"pointer to an object of type struct A"?
(I think someone asked something like this some time ago, but
unfortunately I can't find the article anymore)
This question might mean a couple of different things. If you're
asking about converting (casting) a 'struct B *' to a 'struct A *'
then that has to work, eg

struct B *b = ...;
struct A *a;

a = (struct A*) b;
if( a == &b->y ) /* this 'if' will always be taken */

You might be asking about converting the object representation
directly, eg, with one of

memcpy( &a, &b, sizeof a ); /* 1 */

a = * (struct A**) &b; /* 2 */

Either /*1*/ or /*2*/ should also result in a usable pointer that
passes the 'if' test above. That is to say, to the best of my
understanding that is how the language in the Standard should be
understood. It's possible to debate the point; the language used
in talking about such things is not completely clear cut. As a
practical matter, however, it's reasonable to expect that this
result will hold in any actual implementation.
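Variant /*1*/ leans on the rule that all pointers to structure types have the same representation and alignment requirements (C99 6.2.5p26). Under the reading given above, copying the bytes of a struct B * into a struct A * yields the same value as the cast. A minimal sketch; the function name copy_matches_cast is made up for illustration.

```c
#include <string.h>

struct A { int x; };
struct B { struct A y; int z; };

/* Returns 1 if the memcpy'd pointer equals both the cast and &b->y. */
int copy_matches_cast(void)
{
    struct B obj = { { 1 }, 2 };
    struct B *b = &obj;
    struct A *a;

    /* Same representation implies sizeof a == sizeof b (6.2.5p26). */
    memcpy(&a, &b, sizeof a);

    return a == (struct A *)b && a == &b->y && a->x == 1;
}
```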

2) When I now have a function pointer of type
void (*fpa)(struct A *, int);
and a function pointer of type
void (*fpb)(struct B *, int);

and the corresponding functions
void fa(struct A*,int);
and
void fb(struct B*,int);

is it ok to assign a "pointer to fa" to fpb and call the function
through fpb with a "pointer to an object of type struct B" as the first
parameter?

Technically such a call is illegal, since the two types are not
compatible. But, even though it's technically illegal, it's
almost certainly going to work in any actual implementation.
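If the call through fpb has to be well defined, two conforming alternatives are sketched below; neither comes from the post itself, and the body of fa (adding n to a->x) is hypothetical. The first is a thin adapter whose type matches fpb exactly. The second stores fa in fpb but converts it back to fa's real type before calling, which C99 6.3.2.3p8 permits.

```c
struct A { int x; };
struct B { struct A y; int z; };

typedef void (*fpa_t)(struct A *, int);
typedef void (*fpb_t)(struct B *, int);

/* Hypothetical body for fa. */
void fa(struct A *a, int n)
{
    a->x += n;
}

/* Adapter with exactly fpb's type; forwards to fa. */
static void fa_for_b(struct B *b, int n)
{
    fa(&b->y, n);
}

int call_through_adapter(void)
{
    fpb_t fpb = fa_for_b;          /* types match, no cast needed */
    struct B obj = { { 1 }, 0 };

    fpb(&obj, 41);
    return obj.y.x;                /* 42 */
}

int call_after_round_trip(void)
{
    fpb_t fpb = (fpb_t)fa;         /* the conversion alone is fine ... */
    struct B obj = { { 1 }, 0 };

    ((fpa_t)fpb)(&obj.y, 41);      /* ... provided we convert back to call */
    return obj.y.x;                /* 42 */
}
```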

Nov 15 '05 #3

Marc Thrun

Tim Rentsch wrote:

Marc Thrun <Te********@gmx.de> writes:

Hello,

I've got a few questions:

1) Given the two structs
struct A {
int x;
};

and

struct B {
struct A y;
int z;
};

is it ok to treat a "pointer to an object of type struct B" as a
"pointer to an object of type struct A"?
(I think someone asked something like this some time ago, but
unfortunately I can't find the article anymore)

This question might mean a couple of different things. If you're
asking about converting (casting) a 'struct B *' to a 'struct A *'
then that has to work, eg

I admit that it was rather unclear; sorry for my not-so-good English
;-). But you are right, I meant casting (I just wonder why I did not
use that term in the first place).

struct B *b = ...;
struct A *a;

a = (struct A*) b;
if( a == &b->y ) /* this 'if' will always be taken */

You might be asking about converting the object representation
directly, eg, with one of

memcpy( &a, &b, sizeof a ); /* 1 */

a = * (struct A**) &b; /* 2 */

Either /*1*/ or /*2*/ should also result in a usable pointer that
passes the 'if' test above. That is to say, to the best of my
understanding that is how the language in the Standard should be
understood. It's possible to debate the point; the language used
in talking about such things is not completely clear cut. As a
practical matter, however, it's reasonable to expect that this
result will hold in any actual implementation.

2) When I now have a function pointer of type
void (*fpa)(struct A *, int);
and a function pointer of type
void (*fpb)(struct B *, int);

and the corresponding functions
void fa(struct A*,int);
and
void fb(struct B*,int);

is it ok to assign a "pointer to fa" to fpb and call the function
through fpb with a "pointer to an object of type struct B" as the first
parameter?

Technically such a call is illegal, since the two types are not
compatible. But, even though it's technically illegal, it's
almost certainly going to work in any actual implementation.

I thought the same, but was not sure. Suppose I used a void * for
the first parameter in both functions and cast it to "struct A*" or
"struct B*" inside each function; would that be valid then?

PS: Where can I get a copy of the standard (maybe a draft version)?
Might help me next time ;-).

Nov 15 '05 #4

Tim Rentsch

Marc Thrun <Te********@gmx.de> writes:

Tim Rentsch wrote:
Marc Thrun <Te********@gmx.de> writes: [snip]
2) When I now have a function pointer of type
void (*fpa)(struct A *, int);
and a function pointer of type
void (*fpb)(struct B *, int);

and the corresponding functions
void fa(struct A*,int);
and
void fb(struct B*,int);

is it ok to assign a "pointer to fa" to fpb and call the function
through fpb with a "pointer to an object of type struct B" as the first
parameter?

Technically such a call is illegal, since the two types are not
compatible. But, even though it's technically illegal, it's
almost certainly going to work in any actual implementation.

I thought the same, but was not sure. Suppose I used a void * for
the first parameter in both functions and cast it to "struct A*" or
"struct B*" inside each function; would that be valid then?

Using

void fa( void *pv, int n ){ ... }
void fb( void *pv, int n ){ ... }
void (*fpa)( void *, int ) = fa;
void (*fpb)( void *, int ) = fb;

makes assignment of the function pointers, and also the resultant
calls, legal. As you point out, it's then necessary to convert the
void* parameters inside the actual function bodies to be of the
appropriate type.
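A minimal sketch of the void * version. The bodies of fa and fb are hypothetical (fa stores n into the struct A, fb into z), and dispatch is a made-up driver.

```c
struct A { int x; };
struct B { struct A y; int z; };

/* Both functions now share the type void (void *, int). */
void fa(void *pv, int n)
{
    struct A *a = pv;      /* implicit conversion from void * */
    a->x = n;
}

void fb(void *pv, int n)
{
    struct B *b = pv;
    b->z = n;
}

int dispatch(void)
{
    void (*fpa)(void *, int) = fa;
    void (*fpb)(void *, int) = fb;
    struct B obj = { { 0 }, 0 };

    fpa(&obj.y, 1);        /* hand fa the struct A member explicitly */
    fpb(&obj, 2);
    return obj.y.x * 10 + obj.z;   /* 12 */
}
```

Note that nothing now stops a caller from handing fpa an object of the wrong type; the compiler cannot check it, which is the trade-off weighed in the recommendation that follows.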

If I were asked, I'd be inclined to recommend the first approach
(the one that uses 'struct A*' and 'struct B*' types rather than 'void*'),
because the stronger type checking is more likely to catch errors,
and that outweighs any theoretical advantage that might come from using
'void*'. But that could depend on the local situation and on what
tradeoffs are considered important in the context of the particular
project.

PS: Where can I get a copy of the standard (maybe a draft version)?
Might help me next time ;-).

I'm sorry, I don't have a URL handy; if you do a google search
I expect you'll find something without too much difficulty.

Nov 15 '05 #5

S.Tobias

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

Marc Thrun <Te********@gmx.de> writes:
struct A {
int x;
};

and

struct B {
struct A y;
int z;
};

[snip]
a = (struct A*) b;
if( a == &b->y ) /* this 'if' will always be taken */

You might be asking about converting the object representation
directly, eg, with one of

memcpy( &a, &b, sizeof a ); /* 1 */

a = * (struct A**) &b; /* 2 */

Either /*1*/ or /*2*/ should also result in a usable pointer that
passes the 'if' test above. That is to say, to the best of my
understanding that is how the language in the Standard should be
understood.

Generally, I agree, with a warning: /*2*/ is meant to express
reinterpretation (which is basically what /*1*/ does as well),
but is technically UB (you're accessing the value of `b' with
an incompatible type lvalue), and might cause real trouble in
real world (esp. when compiler optimization is turned on).

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`

Nov 15 '05 #6

Tim Rentsch

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
Marc Thrun <Te********@gmx.de> writes:

struct A {
int x;
};

and

struct B {
struct A y;
int z;
};

[snip]

a = (struct A*) b;
if( a == &b->y ) /* this 'if' will always be taken */

You might be asking about converting the object representation
directly, eg, with one of

memcpy( &a, &b, sizeof a ); /* 1 */

a = * (struct A**) &b; /* 2 */

Either /*1*/ or /*2*/ should also result in a usable pointer that
passes the 'if' test above. That is to say, to the best of my
understanding that is how the language in the Standard should be
understood.

Generally, I agree, with a warning: /*2*/ is meant to express
reinterpretation (which is basically what /*1*/ does as well),
but is technically UB (you're accessing the value of `b' with
an incompatible type lvalue), and might cause real trouble in
real world (esp. when compiler optimization is turned on).

Right, both on the technical UB and on the real potential
for problems. Thank you for pointing this out.

Rather than /*2*/ we might consider /*2'*/:

a = * (struct A *volatile *) &b; /* 2' */

Of course, this access still technically results in UB, but
it's unlikely that the access here will result in any real
world difficulties.

Nov 15 '05 #7

S.Tobias

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
> Marc Thrun <Te********@gmx.de> writes:
>> struct A {
>> int x;
>> };
>>
>> and
>>
>> struct B {
>> struct A y;
>> int z;
>> };

[snip]
>
> a = (struct A*) b;
> if( a == &b->y ) /* this 'if' will always be taken */
>
> You might be asking about converting the object representation
> directly, eg, with one of
>
> memcpy( &a, &b, sizeof a ); /* 1 */
>
> a = * (struct A**) &b; /* 2 */
>
> Either /*1*/ or /*2*/ should also result in a usable pointer that
> passes the 'if' test above. That is to say, to the best of my
> understanding that is how the language in the Standard should be
> understood.

Generally, I agree, with a warning: /*2*/ is meant to express
reinterpretation (which is basically what /*1*/ does as well),
but is technically UB (you're accessing the value of `b' with
an incompatible type lvalue), and might cause real trouble in
real world (esp. when compiler optimization is turned on).

Right, both on the technical UB and on the real potential
for problems. Thank you for pointing this out.

I think /*1*/ was OK (all pointers to structs have the same
representation); what might cause problems is dereferencing the result,
but it won't in this case, since A is the first member of B.

Rather than /*2*/ we might consider /*2'*/:

a = * (struct A *volatile *) &b; /* 2' */

Of course, this access still technically results in UB, but
it's unlikely that the access here will result in any real
world difficulties.

Still wrong, look what a compiler might "think":

TYPEA a;
TYPEB b;
//both types are incompatible
b = something_b;
//put something into b
/*...*/
b = something_else_b;
//put something else into b... but wait, let's cache
//it in a register for now, and see what's next
a = *(TYPEA volatile*)&b;
//take the address of b, convert it to ptr to `volatile TYPEA',
//dereference, take value and put into `a'
//lvalue is type `volatile TYPEA'... hmm... nooooo, of course
//it can't mean b here, let's read what that object contains,
//and update b later...
/* a contains something_b */

Perhaps `b' itself should be volatile in this case.

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`

Nov 15 '05 #8

Tim Rentsch

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
> Marc Thrun <Te********@gmx.de> writes:

>> struct A {
>> int x;
>> };
>>
>> and
>>
>> struct B {
>> struct A y;
>> int z;
>> };
[snip]
>
> a = (struct A*) b;
> if( a == &b->y ) /* this 'if' will always be taken */
>
> You might be asking about converting the object representation
> directly, eg, with one of
>
> memcpy( &a, &b, sizeof a ); /* 1 */
>
> a = * (struct A**) &b; /* 2 */
>
> Either /*1*/ or /*2*/ should also result in a usable pointer that
> passes the 'if' test above. That is to say, to the best of my
> understanding that is how the language in the Standard should be
> understood.

Generally, I agree, with a warning: /*2*/ is meant to express
reinterpretation (which is basically what /*1*/ does as well),
but is technically UB (you're accessing the value of `b' with
an incompatible type lvalue), and might cause real trouble in
real world (esp. when compiler optimization is turned on).

Right, both on the technical UB and on the real potential
for problems. Thank you for pointing this out.

[snip]

Rather than /*2*/ we might consider /*2'*/:

a = * (struct A *volatile *) &b; /* 2' */

Of course, this access still technically results in UB, but
it's unlikely that the access here will result in any real
world difficulties.

Still wrong, look what a compiler might "think":

TYPEA a;
TYPEB b;
//both types are incompatible
b = something_b;
//put something into b
/*...*/
b = something_else_b;
//put something else into b... but wait, let's cache
//it in a register for now, and see what's next
a = *(TYPEA volatile*)&b;
//take the address of b, convert it to ptr to `volatile TYPEA',
//dereference, take value and put into `a'
//lvalue is type `volatile TYPEA'... hmm... nooooo, of course
//it can't mean b here, let's read what that object contains,
//and update b later...
/* a contains something_b */

It's an interesting argument, but the reasoning is not quite sound.

During code generation, the compiler keeps track (in the data flow
sense) of where the value of 'b' is held at any given moment. When
the address is taken ('&b'), the compiler is going to use the address
of the location where 'b' is currently held. If 'b' is currently held
in a register, and the address value might escape the context that the
compiler can analyze, the register value of 'b' will be written back
to its regular memory location so that the address will point to the
currently meaningful value. Of course, what's going to happen in a
real compiler in this situation is that the compiler will know that
the (casted) address refers to 'b', and the register holding 'b' will
simply be stored into 'a'.

Nov 15 '05 #9

S.Tobias

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
TYPEA a;
TYPEB b;
//both types are incompatible
b = something_b;
//put something into b
/*...*/
b = something_else_b;
//put something else into b... but wait, let's cache
//it in a register for now, and see what's next
a = *(TYPEA volatile*)&b;
//take the address of b, convert it to ptr to `volatile TYPEA',
//dereference, take value and put into `a'
//lvalue is type `volatile TYPEA'... hmm... nooooo, of course
//it can't mean b here, let's read what that object contains,
//and update b later...
/* a contains something_b */

It's an interesting argument, but the reasoning is not quite sound.

During code generation, the compiler keeps track (in the data flow
sense) of where the value of 'b' is held at any given moment. When
the address is taken ('&b'), the compiler is going to use the address
of the location where 'b' is currently held.

But taking the address of `b' does not automatically mean that
the object `b' is going to be accessed. The decision which object
_may_ be accessed can be based only on the type of lvalue and
effective type of the object, and I believe that this optimization
above is allowed. That's how I imagine the aliasing rule works.
But I'd still appreciate others' comments on this.

If 'b' is currently held
in a register, and the address value might escape the context that the
compiler can analyze, the register value of 'b' will be written back
to its regular memory location so that the address will point to the
currently meaningful value. Of course, what's going to happen in a
real compiler in this situation is that the compiler will know that
the (casted) address refers to 'b', and the register holding 'b' will
simply be stored into 'a'.

But my point is that the context does *not* escape, it's just that
the compiler determines that `b' cannot be accessed because
the lvalue type doesn't match its effective type.

Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`

Nov 15 '05 #10

Tim Rentsch

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
TYPEA a;
TYPEB b;
//both types are incompatible
b = something_b;
//put something into b
/*...*/
b = something_else_b;
//put something else into b... but wait, let's cache
//it in a register for now, and see what's next
a = *(TYPEA volatile*)&b;
//take the address of b, convert it to ptr to `volatile TYPEA',
//dereference, take value and put into `a'
//lvalue is type `volatile TYPEA'... hmm... nooooo, of course
//it can't mean b here, let's read what that object contains,
//and update b later...
/* a contains something_b */

It's an interesting argument, but the reasoning is not quite sound.

During code generation, the compiler keeps track (in the data flow
sense) of where the value of 'b' is held at any given moment. When
the address is taken ('&b'), the compiler is going to use the address
of the location where 'b' is currently held.

But taking the address of `b' does not automatically mean that
the object `b' is going to be accessed.

That's true. But a compiler must assume that taking an address
is going to result in accessing the object, unless the compiler
can "prove" otherwise.

The decision which object
_may_ be accessed can be based only on the type of lvalue and
effective type of the object, and I believe that this optimization
above is allowed.

What I think you're saying is that, even though the compiler
knows that the casted address refers to 'b', the rule in the
Standard about effective type allows the compiler to forget
that it does. And that's right; the Standard does allow
that.

But no actual compiler is going to do that. When optimizing, a
compiler always wants to use the best information available,
because that information might enable further optimization. So
the compiler is going to remember that the casted address points
to 'b' even though it's "gone through" the cast.

That's how I imagine the aliasing rule works.
But I'd still appreciate others' comments on this.

The reason for the effective type rule has to do with alias
analysis, which happens earlier in the compilation process. The
question is, when you have a pointer and it isn't known where it
points, what assumptions can you make when making an access
through that pointer? A typical case for this to happen is when
a function has a pointer parameter; suppose for example the
parameter has type 'float *'. Then any access through that
pointer can be assumed to access floats and not (for example)
ints.
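A small illustration of that assumption, using made-up function names. Because int and float are incompatible types, a compiler applying the effective-type rule may assume the store through f cannot change *i, and so may return 1 without reloading it. Calling store_then_read with two pointers into the same object would have undefined behavior; calling it with two distinct objects, as distinct_objects does, is fine.

```c
/* Under strict aliasing, *f is assumed not to overlap *i. */
int store_then_read(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;    /* may be compiled as "return 1" */
}

int distinct_objects(void)
{
    int n = 0;
    float g = 0.0f;

    return store_then_read(&n, &g);   /* 1 */
}
```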

But in this case the compiler knows where the pointer points.
It's to the optimizer's advantage to keep track; furthermore, it
had to have kept track in order to know whether the address was
used to access the object (which of course it was in this case).
If we imagine that the compiler "forgot" where the pointer
points, then it wouldn't have been able to prove to itself that
the pre-cast address '&b' wasn't used to access 'b'; that means
it would have had to store the register value of 'b' back in its
memory location. So either way, the right thing happens.

If 'b' is currently held
in a register, and the address value might escape the context that the
compiler can analyze, the register value of 'b' will be written back
to its regular memory location so that the address will point to the
currently meaningful value. Of course, what's going to happen in a
real compiler in this situation is that the compiler will know that
the (casted) address refers to 'b', and the register holding 'b' will
simply be stored into 'a'.

But my point is that the context does *not* escape, it's just that
the compiler determines that `b' cannot be accessed because
the lvalue type doesn't match its effective type.

Please read the above again. Both cases are covered.

Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

Not a relevant question, since we agree on what judgment the
Standard renders in such cases. Rather, the question is what do
actual compilers do. So if you want to make your case, look for
an actual compiler where the generated code for an assignment
statement

a = * (TYPEA volatile*) &b;

is along the lines you suggest.

Nov 15 '05 #11

S.Tobias

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
....
>> a = *(TYPEA volatile*)&b;
>> //take the address of b, convert it to ptr to `volatile TYPEA',
>> //dereference, take value and put into `a'
>> //lvalue is type `volatile TYPEA'... hmm... nooooo, of course
>> //it can't mean b here, let's read what that object contains,
>> //and update b later...
....
The decision which object
_may_ be accessed can be based only on the type of lvalue and
effective type of the object, and I believe that this optimization
above is allowed.

What I think you're saying is that, even though the compiler
knows that the casted address refers to 'b', the rule in the
Standard about effective type allows the compiler to forget
that it does. And that's right; the Standard does allow
that.

But no actual compiler is going to do that. When optimizing, a
compiler always wants to use the best information available,
because that information might enable further optimization. So
the compiler is going to remember that the casted address points
to 'b' even though it's "gone through" the cast.

[snip]

Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

Not a relevant question, since we agree on what judgment the
Standard renders in such cases. Rather, the question is what do
actual compilers do. So if you want to make your case, look for
an actual compiler where the generated code for an assignment
statement

a = * (TYPEA volatile*) &b;

is along the lines you suggest.

Okay, I guess I'm talking just theory, you're making a practical point.

I don't really know what compilers do (and I don't especially want to).
I once had this code, where I unwisely tried to save on temporary
variables:

int Mem_dosomething(const void *s1, const void *s2) {
for (; *(char*)s1 == *(char*)s2; ++*(char**)&s1, ++*(char**)&s2) ;

and gcc gave the warning (in -O2 mode):
t.c:3: warning: dereferencing type-punned pointer will break strict-aliasing rules

What this says to me is that the implementors have done something
"clever" and warn me that the "lvalue cast" construct is not
safe, even though char* and void* have the same representation.
I don't actually know if the warning is really applicable at that
particular point, or is merely printed by default before the compiler
even considers whether it's going to do an optimization at all.
Better safe than sorry. I corrected the above code to:

const unsigned char *c1 = s1, *c2 = s2;
for (; *c1 == *c2; ++c1, ++c2) ;

which is always portable.
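For reference, a complete version of that loop. The original fragment has no length limit and no return statement, so the size_t n parameter, the memcmp-style return values and the name mem_compare are assumptions made for this sketch.

```c
#include <stddef.h>

/* Compares n bytes as unsigned char; returns -1, 0 or 1. */
int mem_compare(const void *s1, const void *s2, size_t n)
{
    const unsigned char *c1 = s1, *c2 = s2;

    for (; n > 0; --n, ++c1, ++c2) {
        if (*c1 != *c2)
            return *c1 < *c2 ? -1 : 1;
    }
    return 0;
}
```

Walking through unsigned char pointers is always permitted by the aliasing rules, so no lvalue cast on s1 or s2 is needed.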

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`

Nov 15 '05 #12

Flash Gordon

S.Tobias wrote:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes: ...
a = *(TYPEA volatile*)&b;
> //take the address of b, convert it to ptr to `volatile TYPEA',
> //dereference, take value and put into `a'
> //lvalue is type `volatile TYPEA'... hmm... nooooo, of course
> //it can't mean b here, let's read what that object contains,
> //and update b later...
...
The decision which object
_may_ be accessed can be based only on the type of lvalue and
effective type of the object, and I believe that this optimization
above is allowed.
What I think you're saying is that, even though the compiler
knows that the casted address refers to 'b', the rule in the
Standard about effective type allows the compiler to forget
that it does. And that's right; the Standard does allow
that.

But no actual compiler is going to do that. When optimizing, a
compiler always wants to use the best information available,
because that information might enable further optimization. So
the compiler is going to remember that the casted address points
to 'b' even though it's "gone through" the cast.

Based on the gcc info pages I think gcc might...

Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

Not a relevant question, since we agree on what judgment the
Standard renders in such cases. Rather, the question is what do
actual compilers do. So if you want to make your case, look for
an actual compiler where the generated code for an assignment
statement

a = * (TYPEA volatile*) &b;

is along the lines you suggest.

Okay, I guess I'm talking just theory, you're making a practical point.

I don't really know what compilers do (and I don't especially want to).
I once had this code, where I unwisely tried to save on temporary
variables:

int Mem_dosomething(const void *s1, const void *s2) {
for (; *(char*)s1 == *(char*)s2; ++*(char**)&s1, ++*(char**)&s2) ;

and gcc gave the warning (in -O2 mode):
t.c:3: warning: dereferencing type-punned pointer will break strict-aliasing rules

What this says to me is that the implementors have done something
"clever" and warn me that the "lvalue cast" construct is not
safe, even though char* and void* have the same representation.
I don't actually know if the warning is really applicable at that
particular point, or is merely printed by default before the compiler
even considers whether it's going to do an optimization at all.

If you read the information on that warning
http://gcc.gnu.org/onlinedocs/gcc-4....arning-Options
you will find it says "It warns about code which might break the
strict aliasing rules that the compiler is using for optimization."

If you read the info on -fstrict-aliasing
http://gcc.gnu.org/onlinedocs/gcc-4....timize-Options
you will then find that it activates optimisations based on the type of
the expression and "In particular, an object of one type is assumed
never to reside at the same address as an object of a different type,
unless the types are almost the same."

The example it gives of something that might not work is:
union a_union {
int i;
double d;
};

int f() {
union a_union t;
int* ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}

With the following being explicitly allowed even though it is undefined
behaviour in the standard:
int f() {
union a_union t;
t.d = 3.0;
return t.i;
}

Better safe than sorry. I corrected the above code to:

const unsigned char *c1 = s1, *c2 = s2;
for (; *c1 == *c2; ++c1, ++c2) ;

which is always portable.

Indeed.

Of course, the specifics of what gcc does are not really topical, but as
an example of a compiler that might produce code that does not do what
is expected I think it is valid.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 15 '05 #13

Friedhelm Usenet Waitzmann

Marc Thrun <Te********@gmx.de>:

PS: Where can I get a copy of the standard (maybe a draft version)?
Might help me next time ;-).

Draft version:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n869/

Nov 15 '05 #14

Tim Rentsch

"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
> "S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:
...
>> a = *(TYPEA volatile*)&b;
>> //take the address of b, convert it to ptr to `volatile TYPEA',
>> //dereference, take value and put into `a'
>> //lvalue is type `volatile TYPEA'... hmm... nooooo, of course
>> //it can't mean b here, let's read what that object contains,
>> //and update b later... ...
The decision which object
_may_ be accessed can be based only on the type of lvalue and
effective type of the object, and I believe that this optimization
above is allowed.
What I think you're saying is that, even though the compiler
knows that the casted address refers to 'b', the rule in the
Standard about effective type allows the compiler to forget
that it does. And that's right; the Standard does allow
that.

But no actual compiler is going to do that. When optimizing, a
compiler always wants to use the best information available,
because that information might enable further optimization. So
the compiler is going to remember that the casted address points
to 'b' even though it's "gone through" the cast.

[snip]
Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

Not a relevant question, since we agree on what judgment the
Standard renders in such cases. Rather, the question is what do
actual compilers do. So if you want to make your case, look for
an actual compiler where the generated code for an assignment
statement

a = * (TYPEA volatile*) &b;

is along the lines you suggest.

Okay, I guess I'm talking just theory, you're making a practical point.

That seems like a fair summary.

I don't really know what compilers do (and I don't especially want to).
I once had this code, where I unwisely tried to save on temporary
variables:

int Mem_dosomething(const void *s1, const void *s2) {
for (; *(char*)s1 == *(char*)s2; ++*(char**)&s1, ++*(char**)&s2) ;

and gcc gave the warning (in -O2 mode):
t.c:3: warning: dereferencing type-punned pointer will break strict-aliasing rules

What this says to me is that the implementors have done something
"clever" and warn me that the "lvalue cast" construct is not
safe, even though char* and void* have the same representation.

That's possible. I think it's more likely that the warning reflects an
aggressive warning check than a judgment by the compiler that the
generated code won't match the intention.

I don't actually know if the warning is really applicable at that
particular point, or is merely printed by default before the compiler
even considers whether it's going to do an optimization at all.
Better safe than sorry. I corrected the above code to:

const unsigned char *c1 = s1, *c2 = s2;
for (; *c1 == *c2; ++c1, ++c2) ;

which is always portable.

It seems clear that the second writing is preferable to the first.

Nov 15 '05 #15

Tim Rentsch

Flash Gordon <sp**@flash-gordon.me.uk> writes:

S.Tobias wrote:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:

>"S.Tobias" <si***@FamOuS.BedBuG.pAlS.INVALID> writes:

...
>> a = *(TYPEA volatile*)&b;
>> //take the address of b, convert it to ptr to `volatile TYPEA',
>> //dereference, take value and put into `a'
>> //lvalue is type `volatile TYPEA'... hmm... nooooo, of course
>> //it can't mean b here, let's read what that object contains,
>> //and update b later...

...
The decision about which object
_may_ be accessed can be based only on the type of the lvalue and the
effective type of the object, and I believe that the optimization
above is allowed.

What I think you're saying is that, even though the compiler
knows that the cast address refers to 'b', the rule in the
Standard about effective type allows the compiler to forget
that it does. And that's right; the Standard does allow
that.

But no actual compiler is going to do that. When optimizing, a
compiler always wants to use the best information available,
because that information might enable further optimization. So
the compiler is going to remember that the cast address points
to 'b' even though it's "gone through" the cast.
Based on the gcc info pages, I think gcc might...
Since all information is in one expression, it's easy to infer that
in this case object `b' is going to be accessed. However, a compiler
need not be that wise. Have you tried this with DS9000?

Not a relevant question, since we agree on what judgment the
Standard renders in such cases. Rather, the question is what
actual compilers do. So if you want to make your case, look for
an actual compiler where the generated code for an assignment
statement

a = * (TYPEA volatile*) &b;

is along the lines you suggest.

Okay, I guess I'm just talking theory; you're making a practical point.

I don't really know what compilers do (and I don't especially want to).
I once had this code, where I unwisely tried to save on temporary
variables:

int Mem_dosomething(const void *s1, const void *s2) {
for (; *(char*)s1 == *(char*)s2; ++*(char**)&s1, ++*(char**)&s2) ;

and gcc gave the warning (in -O2 mode):
t.c:3: warning: dereferencing type-punned pointer will break strict-aliasing rules
What this says to me is that the implementors have done something
"clever" and are warning me that the "lvalue cast" construct is not
safe, even though char* and void* have the same representation.
I don't actually know if the warning is really applicable at that
particular point, or is merely printed by default before the compiler
even considers whether it's going to do an optimization at all.

If you read the information on that warning
http://gcc.gnu.org/onlinedocs/gcc-4....arning-Options
you will find it says "It warns about code which might break the
strict aliasing rules that the compiler is using for optimization."

If you read the info on -fstrict-aliasing
http://gcc.gnu.org/onlinedocs/gcc-4....timize-Options
you will then find that it activates optimisations based on the type of
the expression and "In particular, an object of one type is assumed
never to reside at the same address as an object of a different type,
unless the types are almost the same."

The example it gives of something that might not work is:
union a_union {
int i;
double d;
};

int f() {
union a_union t;
int* ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}

The following, by contrast, is explicitly allowed, even though it is
undefined behaviour according to the standard:
int f() {
union a_union t;
t.d = 3.0;
return t.i;
}

The information in the gcc pages was interesting reading.
Thank you for posting this.

Despite what the gcc documentation currently says, I'd be surprised if
this example stays apropos for long. The reason for having
aliasing rules is to allow a compiler to generate better code, not to
excuse it for generating bad code. In cases where both goals can be
met (generating good code and generating correct code), that's what
compiler writers will want to do. That's the case here. So it will
be interesting to see how the compilers and documentation evolve as
these kinds of efforts go forward.

I'm still interested to hear about any actual compiler where code
like this:

... any calculation(s) of b ...

a = * (TYPEA volatile *) &b;

... any further calculation(s) of b ...

generates code that gets "the wrong value" into the variable 'a'.

Nov 15 '05 #16
