Accessing alternate union members

Given a union of the form
union {
T1 m1;
T2 m2;}obj;
where T1 and T2 are different scalar (non-aggregate) types.

The C99 standard states that
obj.m1 = value;
if (obj.m2 ...
invokes undefined behavior because my reference to the union is via a
member different than the last one stored into.

My question is, what about the following?
memcpy(&obj, &data, sizeof data);
if (obj.m1 ...

Ignoring the pathological cases such as sizeof data > sizeof obj or
sizeof data < sizeof (T1), is this valid?

If so and if I replace m1 with m2 above (thereby accessing something
other than the first member), is it still valid?
<<Remove the del for email>>
Nov 14 '05 #1
2 Replies


In article <news:bu**********@216.39.143.103>
Barry Schwarz <sc******@deloz.net> writes:
> Given a union of the form
> union {
> T1 m1;
> T2 m2;}obj;
> where T1 and T2 are different scalar (non-aggregate) types.
>
> The C99 standard states that
> obj.m1 = value;
> if (obj.m2 ...
> invokes undefined behavior because my reference to the union is via a
> member different than the last one stored into.
Right. Note that on "real world" systems (as opposed to Deathstations
or some such :-) ) the problem is most likely to occur when T1 is
some sort of integral type and T2 is some sort of floating-point
type, and you have managed to store a reserved or signalling-NaN
bit pattern into the bytes that will be examined for obj.m2. For
instance, it is easy enough to come up with bit patterns that result
in "floating point exception" crashes on Intel CPUs (provided
signalling NaNs are not being ignored) when T1 is int and T2 is
float, or when T1 is long long and T2 is double.
> My question is, what about the following?
> memcpy(&obj, &data, sizeof data);
> if (obj.m1 ...
>
> Ignoring the pathological cases such as sizeof data > sizeof obj or
> sizeof data < sizeof (T1), is this valid?
Since this copies bytes (what C99 calls "object representations")
from "data" to "obj", it is valid if and only if those bytes are
those resulting from storing a valid value to an obj.m1 or equivalent.
One obvious problem here is that "obj" has an unnamed union type,
so that it is impossible for "data" to have the same type unless
"data" is declared and defined in a separate translation unit --
but in that separate translation unit it is at least difficult, if
not impossible, to declare "obj" correctly.

If we give the union type a name so that we can consistently refer
to it:

union U { T1 m1; T2 m2; };
union U obj;
union U data;

then we can be sure about what is in "data" if, e.g., we do this:

obj.m1 = value;
memcpy(&data, &obj, sizeof data);

Now "data" is a copy of "obj", so that data.m1 is valid because
obj.m1 is valid. A subsequent memcpy() back to &obj leaves obj.m1
valid again.
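
A minimal sketch of the round trip described above. It picks int and float for T1 and T2, which is an assumption (the thread leaves both types open), and the helper name round_trip_m1 is hypothetical. The member m1 is the one last stored into at every point, so nothing here depends on the question of reading a different member.

```c
#include <assert.h>
#include <string.h>

/* Assumed concrete types; the thread only says T1 and T2 are distinct scalars. */
union U { int m1; float m2; };

/* Hypothetical helper: store through m1, copy the union out and back
   with memcpy, and read m1 again. */
static int round_trip_m1(int value)
{
    union U obj;
    union U data;

    obj.m1 = value;
    memcpy(&data, &obj, sizeof data);   /* data now holds the same bytes as obj */
    assert(memcmp(&data, &obj, sizeof obj) == 0);

    memset(&obj, 0, sizeof obj);        /* overwrite obj so the copy back is not a no-op */
    memcpy(&obj, &data, sizeof obj);    /* restore every byte of obj from data */
    return obj.m1;
}
```

Because both objects have the same union type and every byte is copied, the value read back is the value that was stored.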
> If so and if I replace m1 with m2 above (thereby accessing something
> other than the first member), is it still valid?


The conditions for whether obj.m2 is valid are basically the same as
those for whether obj.m1 is valid -- the bytes copied from &data to
&obj must be those making up a valid "object representation".

A somewhat trickier question (and the one I suspect you are really
asking) is: suppose we have union U as above, but we then do
something like this:

union U obj;
T1 data;
...
data = some_valid_value_of_type_t1;
memcpy(&obj, &data, sizeof data);
... now refer to obj.m1 ...

I think it is safe to say that most real-world C implementations
will have no problem with this; but without careful scrutiny of
the C99 standard to prove otherwise, I would assume that
Deathstation-like "evil" C implementations would be allowed to fail
if "unused" bytes of the union were not properly set. For instance,
suppose T1 is int and T2 is double, and sizeof(int) is 4 while
sizeof(double) is 8. Suppose further that the Evil Implementation
handles the union by storing a checksummed copy of the four bytes
making up the "int" in a fifth byte in the space that would otherwise
be occupied by the double. If the checksum fails to match, the
implementation delivers a runtime exception. As far as I can tell
(without careful study of the C99 wording) this is allowed.
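
To make the two alternatives concrete, here is a sketch using the int and double from the example above. The function names are hypothetical. The first function is the pattern under discussion, which most implementations are expected to accept but which leaves the remaining bytes of the union untouched; the second is the conservative form that stores through the member it later reads.

```c
#include <assert.h>
#include <string.h>

/* The sizes from the example above are assumed: int smaller than double. */
union U { int m1; double m2; };

/* The pattern under discussion: only the first sizeof(int) bytes of obj are
   written, and no member of the union is ever stored into. */
static int read_m1_after_memcpy(int value)
{
    union U obj;
    int data = value;

    assert(sizeof data <= sizeof obj);  /* rules out the "pathological" size case */
    memcpy(&obj, &data, sizeof data);
    return obj.m1;
}

/* The conservative alternative: store through the member that is read back. */
static int read_m1_after_store(int value)
{
    union U obj;

    obj.m1 = value;
    return obj.m1;
}
```

On a friendly implementation both functions return the value passed in; only the second avoids relying on that friendliness.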

In other words, unless you want to depend on the friendliness of
your implementation, Don't Do That. :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #2

On 18 Jan 2004 00:35:12 GMT, Chris Torek <no****@torek.net> wrote:
> In article <news:bu**********@216.39.143.103>
> Barry Schwarz <sc******@deloz.net> writes:
>> Given a union of the form
>> union {
>> T1 m1;
>> T2 m2;}obj;
>> where T1 and T2 are different scalar (non-aggregate) types.
>>
>> The C99 standard states that
>> obj.m1 = value;
>> if (obj.m2 ...
>> invokes undefined behavior because my reference to the union is via a
>> member different than the last one stored into.
> Right. Note that on "real world" systems (as opposed to Deathstations
> or some such :-) ) the problem is most likely to occur when T1 is
> some sort of integral type and T2 is some sort of floating-point
> type, and you have managed to store a reserved or signalling-NaN
> bit pattern into the bytes that will be examined for obj.m2. For
> instance, it is easy enough to come up with bit patterns that result
> in "floating point exception" crashes on Intel CPUs (provided
> signalling NaNs are not being ignored) when T1 is int and T2 is
> float, or when T1 is long long and T2 is double.


All true but only tangentially related to my question. The situation
you describe can be produced just as easily with code of the form
int i = ...
float f;
memcpy(&f, &i, sizeof i);
if (f ...
yet the language does not *require* this to be undefined as it does my
first sample.
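
A compilable sketch of that memcpy form, for reference. It uses uint32_t rather than int so the bit pattern is predictable, and it assumes float is 32 bits wide; both are assumptions, and the name float_from_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 32 bits as a float by copying bytes. */
static float float_from_bits(uint32_t bits)
{
    float f;

    assert(sizeof f == sizeof bits);    /* assumed: float is 32 bits wide */
    memcpy(&f, &bits, sizeof f);
    return f;   /* whether this value is usable depends on the bit pattern */
}
```

On an implementation with IEEE 754 single-precision floats, float_from_bits(0x3F800000u) yields 1.0f. Whether a given pattern is safe to evaluate is the run-time question; no union is involved at all.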
>> My question is, what about the following?
>> memcpy(&obj, &data, sizeof data);
>> if (obj.m1 ...
>>
>> Ignoring the pathological cases such as sizeof data > sizeof obj or
>> sizeof data < sizeof (T1), is this valid?
> Since this copies bytes (what C99 calls "object representations")
> from "data" to "obj", it is valid if and only if those bytes are
> those resulting from storing a valid value to an obj.m1 or equivalent.
> One obvious problem here is that "obj" has an unnamed union type,
> so that it is impossible for "data" to have the same type unless
> "data" is declared and defined in a separate translation unit --
> but in that separate translation unit it is at least difficult, if
> not impossible, to declare "obj" correctly.


I realize that copying an invalid bit pattern to an object and then
attempting to evaluate the object is a no-no, but it is basically a
run time problem. If we make T2 in my question unsigned char, then no
matter what value is stored in m1, m2 can never have any invalid or
trap representation. However, code of the form
obj.m1 = value;
if (obj.m2 ...
still invokes undefined behavior simply because the standard says so,
not for any practical reason.
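
For comparison, a sketch of looking at the same byte without naming m2 at all. The standard permits reading the bytes of any object through an lvalue of character type, so this form does not raise the union question. The int member type and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* T1 is assumed to be int here; T2 is unsigned char as in the text above. */
union U { int m1; unsigned char m2; };

/* Hypothetical helper: return byte i of obj's object representation by
   reading it through a pointer to unsigned char. */
static unsigned char byte_of(const union U *obj, size_t i)
{
    assert(i < sizeof *obj);
    return ((const unsigned char *)obj)[i];
}

/* Store through m1, then inspect the first byte without touching m2. */
static unsigned char first_byte_after_store(int value)
{
    union U obj;

    obj.m1 = value;
    return byte_of(&obj, 0);
}
```

Which byte of the int comes first depends on the implementation's byte order, so only values whose bytes are all alike give a portable answer.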

So my real question is, ignoring pathological cases (to also include
invalid bit patterns) and considering that I do not store into a
member of the union, does my second example involve a priori undefined
behavior the way my first does?

> If we give the union type a name so that we can consistently refer
> to it:
>
> union U { T1 m1; T2 m2; };
> union U obj;
> union U data;
>
> then we can be sure about what is in "data" if, e.g., we do this:
>
> obj.m1 = value;
> memcpy(&data, &obj, sizeof data);
>
> Now "data" is a copy of "obj", so that data.m1 is valid because
> obj.m1 is valid. A subsequent memcpy() back to &obj leaves obj.m1
> valid again.
>> If so and if I replace m1 with m2 above (thereby accessing something
>> other than the first member), is it still valid?
> The conditions for whether obj.m2 is valid are basically the same as
> those for whether obj.m1 is valid -- the bytes copied from &data to
> &obj must be those making up a valid "object representation".

> A somewhat trickier question (and the one I suspect you are really
> asking) is: suppose we have union U as above, but we then do
> something like this:
>
> union U obj;
> T1 data;
> ...
> data = some_valid_value_of_type_t1;
> memcpy(&obj, &data, sizeof data);
> ... now refer to obj.m1 ...
>
> I think it is safe to say that most real-world C implementations
> will have no problem with this; but without careful scrutiny of
> the C99 standard to prove otherwise, I would assume that
> Deathstation-like "evil" C implementations would be allowed to fail
> if "unused" bytes of the union were not properly set. For instance,
> suppose T1 is int and T2 is double, and sizeof(int) is 4 while
> sizeof(double) is 8. Suppose further that the Evil Implementation
> handles the union by storing a checksummed copy of the four bytes
> making up the "int" in a fifth byte in the space that would otherwise
> be occupied by the double. If the checksum fails to match, the
> implementation delivers a runtime exception. As far as I can tell
> (without careful study of the C99 wording) this is allowed.


Again no disagreement. And I like your example of why it should be
undefined. However, if T1 is the same type as T2, my first example
still invokes undefined behavior by definition (or is it by
specification) while the problem you describe cannot occur in my
second example.

> In other words, unless you want to depend on the friendliness of
> your implementation, Don't Do That. :-)


Maybe if I phrased the question as: "A really clever lint program
would be correct to generate a diagnostic that my first example must
invoke undefined behavior. Would it be correct to do so, according to
the standard, for my second example?"
<<Remove the del for email>>
Nov 14 '05 #3
