Is this legal? Aesthetically acceptable?

Dave Vandervies

I'm writing an abstraction layer over an OS's ability to do non-blocking
reads and writes on various things. The basic idea is that the caller
will be able to let the I/O happen in the background, and will poll
for completion using something along the lines of *nix's select() or
Win32's WaitForMultipleObjects, and when the OS signals that a read has
completed or a write will be accepted the abstraction layer will be called
to actually do the I/O call and call any code that's interested in it.

The read and write handling have some basic asymmetry, so the abstraction
layer's data and the interface to the code that requests reads and writes
will be different depending on which direction the data is going, but
the basic poll-and-call-handler pattern in the main loop will be the
same either way.

What I'd like to do is something like this:
--------
/*in overlapped.h*/
struct overlapped_cookie
{
sync_type need_attention;
struct overlapped_data *internal;
};
--------
/*in overlapped-read.c*/
struct overlapped_data
{
/*Internal data for read bookkeeping*/
};

struct overlapped_cookie *setup_read(/*args*/)
{
struct overlapped_cookie *ret=malloc(sizeof *ret);
if(!ret)
return NULL;
ret->internal=malloc(sizeof ret->internal);
if(!ret->internal)
{
free(ret);
return NULL;
}

/*Continue with initialization*/
}
--------
/*in overlapped-write.c*/
struct overlapped_data
{
/*Internal data for write bookkeeping*/
};

struct overlapped_cookie *setup_write(/*args*/)
{
struct overlapped_cookie *ret=malloc(sizeof *ret);
if(!ret)
return NULL;
ret->internal=malloc(sizeof ret->internal);
if(!ret->internal)
{
free(ret);
return NULL;
}

/*Continue with initialization*/
}
--------

The idea is that the main loop can do something like:
--------
/*These arrays are populated by other code to keep an up-to-date list of
reads and writes (possibly to be extended to other events) that are
being waited for
*/
sync_type sync[N];
int (*handler[N])(struct overlapped_cookie *);
struct overlapped_cookie *all_overlapped[N];

/*main loop, minus error handling:*/
while(!are_we_done_yet())
{
int which_one=do_wait(sync,num_filled);
handler[which_one](all_overlapped[which_one]);
}
--------
I have two questions about this:

(1) Is it legal and well-defined for external code to use a pointer to
the same incomplete struct type for what is actually different types
in different translation units, as long as no read struct ever gets
passed to the write functions and no write struct ever gets passed to
the read functions?
(I'm fairly sure, but not certain, that the struct pointer representation
requirements make this valid.)

(2) Is this considered acceptable, maintainable, and not aesthetically
unpleasant by other C programmers? Or, at least, no worse than the
alternative of forcing the main poll loop to handle reads and writes
separately?
(This one I'm not so sure about.)

(2b) Is there a better way to do this?
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
I _am_ consistent - if one of those other pointer guide writers came
here and asked for comments, they'd get chewed out just as badly.
--Richard Bos in comp.lang.c

May 29 '07 #1

Subscribe Post Reply

1526

Eric Sosman

Dave Vandervies wrote On 05/29/07 13:33,:

[code and exposition snipped for brevity; see up-thread]
I have two questions about this:

(1) Is it legal and well-defined for external code to use a pointer to
the same incomplete struct type for what is actually different types
in different translation units, as long as no read struct ever gets
passed to the write functions and no write struct ever gets passed to
the read functions?
(I'm fairly sure, but not certain, that the struct pointer representation
requirements make this valid.)

As far as I can see, it's legal. The incomplete type is
completed differently in the read and write implementations,
but as long as the barrier between them remains unbroken I
don't see how any trouble can arise.

(2) Is this considered acceptable, maintainable, and not aesthetically
unpleasant by other C programmers? Or, at least, no worse than the
alternative of forcing the main poll loop to handle reads and writes
separately?
(This one I'm not so sure about.)

It's perhaps a matter of taste, but I'd consider the
"identifier overloading" to be Asking For Trouble, hence
UNmaintainable and UNacceptable. A programmer encounters
a `struct overlapped_data', educates himself about what it
is and how it's used, and then runs into another `struct
overlapped_data' that looks and behaves differently. That
would be unremarkable if the two divergent struct types
came from independent libraries being integrated into the
same program, but to find the conflict within one single
package of related functions ... Ugh. Double ugh.

(2b) Is there a better way to do this?

One possibility would be to use a `void*' as the
"externally visible" pointer to the internal struct, and
to convert it to a `struct overlapped_read_data*' or to a
`struct overlapped_write_data*' internally. Completely
gives up on type safety, but does little or no violence to
your framework as it stands.

Another approach might be to rearrange things a bit.
Instead of having the overlapped_cookie struct point at
different varieties of overlapped_data structures, make
the overlapped_cookie be the first element of both flavors
of overlapped_data:

struct overlapped_cookie {
sync_type need_attention;
/* any other externally-visible stuff
* that you always want
*/
};

In the internals of the read and write implementations:

struct overlapped_read_data {
struct overlapped_cookie cookie;
/* internal stuff specific to read */
};

and

struct overlapped_write_data {
struct overlapped_cookie cookie;
/* internal stuff specific to write */
};

Your external interface would traffic in pointers to
struct overlapped_cookie objects, which are elements of
the struct overlapped_*_data objects allocated internally.
Thanks to 6.7.2.1p13 it is permissible to convert a struct
pointer to a pointer to its first element and vice versa,
so an internal function can convert a cookie pointer to
a pointer to the struct that contains it.

It's still a matter of taste, but I find the second
approach far more palatable than the first.

--
Er*********@sun.com

May 29 '07 #2

Hallvard B Furuseth

Eric Sosman writes:

>Dave Vandervies wrote On 05/29/07 13:33,:
>(1) Is it legal and well-defined for external code to use a pointer to
the same incomplete struct type for what is actually different types
in different translation units, as long as no read struct ever gets
passed to the write functions and no write struct ever gets passed to
the read functions?
(I'm fairly sure, but not certain, that the struct pointer representation
requirements make this valid.)

As far as I can see, it's legal. The incomplete type is
completed differently in the read and write implementations,
but as long as the barrier between them remains unbroken I
don't see how any trouble can arise.

What barrier? Google "link time optimization".
I don't really expect trouble in this case either, but...

(...)
>(2b) Is there a better way to do this?

One possibility would be to use a `void*' as the
"externally visible" pointer to the internal struct, and
to convert it to a `struct overlapped_read_data*' or to a
`struct overlapped_write_data*' internally. Completely
gives up on type safety, but does little or no violence to
your framework as it stands.

Or point to a union which contains both structs. Wastes
memory for the smallest struct though.

Another approach might be to rearrange things a bit.
Instead of having the overlapped_cookie struct point at
different varieties of overlapped_data structures, make
the overlapped_cookie be the first element of both flavors
of overlapped_data:
(...)
Thanks to 6.7.2.1p13 it is permissible to convert a struct
pointer to a pointer to its first element and vice versa,
so an internal function can convert a cookie pointer to
a pointer to the struct that contains it.

Alternatively, declare both structs in a header file and
make them members of a union in that header. Then thanks
to C99 6.5.2.3p5 it's OK to use the common initial members
of either of the struct types - as long as the union is
in scope (even if you do not use it).

--
Regards,
Hallvard

May 30 '07 #3

Eric Sosman

Hallvard B Furuseth wrote On 05/30/07 14:55,:

Eric Sosman writes:

>>Dave Vandervies wrote On 05/29/07 13:33,:

>>>(2b) Is there a better way to do this?

One possibility would be to use a `void*' as the
"externally visible" pointer to the internal struct, and
to convert it to a `struct overlapped_read_data*' or to a
`struct overlapped_write_data*' internally. Completely
gives up on type safety, but does little or no violence to
your framework as it stands.

Or point to a union which contains both structs. Wastes
memory for the smallest struct though.

It also exposes the details of the no-longer-opaque
structs, making it difficult to change those details at
a later date. (The O.P. did not say in so many words that
opacity was a goal, so I may be reading more than is there.)

> Another approach might be to rearrange things a bit.
Instead of having the overlapped_cookie struct point at
different varieties of overlapped_data structures, make
the overlapped_cookie be the first element of both flavors
of overlapped_data:
(...)
Thanks to 6.7.2.1p13 it is permissible to convert a struct
pointer to a pointer to its first element and vice versa,
so an internal function can convert a cookie pointer to
a pointer to the struct that contains it.

Alternatively, declare both structs in a header file and
make them members of a union in that header. Then thanks
to C99 6.5.2.3p5 it's OK to use the common initial members
of either of the struct types - as long as the union is
in scope (even if you do not use it).

Again, opacity is lost. Also, I don't think this quite
obeys the letter of the law: 6.5.2.3p5 specifically talks
about accessing struct elements inside union objects, not
about free-standing structs of the same types.

I once fixed a bug that had to do with exactly this
distinction. As in 6.5.2.3, we had a bunch of structs with
a common initial subsequence, and a union containing all of
them. Vastly simplified and violently paraphrased:

struct One {
int type; /* always equal to 1 */
int payload;
};

struct Two {
int type; /* always equal to 2 */
double data;
};

union All {
struct One eins;
struct Two zwei;
};

The bug arose when a pointer to a free-standing struct One
instance was passed to a function expecting a union pointer:

void printValue(const union All *ptr) {
switch (ptr->eins.type) {
case 1:
printf ("one: %d\n", ptr->eins.payload);
break;
case 2:
printf ("two: %g\n", ptr->zwei.data);
break;
...
}
}

...

struct One instance = { 1, 42 };
printValue ((union All*) &instance);

At first glance this might look all right, but what
it actually produced was the platform's equivalent of a
bus error. The problem was that the alignment requirement
for a struct Two was stricter than that for a struct One,
so the alignment for a union All was also stricter than
for a struct One. The free-standing struct One instance
was properly aligned for a struct One (of course), but
happened not satisfy the stricter union All requirement.
Unfortunately, the compiler "knew" that its argument would
point to strictly-aligned memory, and generated instructions
that faulted when the actual argument turned out to be too
leniently aligned.

To put it another way: every valid union All pointer
was a valid struct One pointer, but the converse didn't
hold.

Now, this might not be an issue in the O.P.'s case,
because he's using malloc() to obtain memory for instances
of his "variant" structs and malloc()'s alignment suffices
for anything at all. But that doesn't strike me as a safe
practice for the long haul: Someday, somebody may embed a
struct One in a struct BigBox and not put it at the very
start, or somebody may mallocate an array of struct One
instances -- the [0] element will be splendidly aligned,
but what about the [1] element?

From 6.5.2.3p5 and from other parts of the Standard,
we can deduce that the common initial subsequences of
the various structs that appear together in unions must
be arranged identically: You could walk through them with
offsetof() and get the same answers from all variants.
But identical layout isn't in itself enough to avoid
undefined behavior in type punning. And when there's a
perfectly clean (and opacity-preserving) method available,
there seems little incentive to juggle hand grenades.

--
Er*********@sun.com

May 30 '07 #4

Hallvard B Furuseth

Eric Sosman writes:

Hallvard B Furuseth wrote On 05/30/07 14:55,:
>Alternatively, declare both structs in a header file and
make them members of a union in that header. Then thanks
to C99 6.5.2.3p5 it's OK to use the common initial members
of either of the struct types - as long as the union is
in scope (even if you do not use it).

Again, opacity is lost. Also, I don't think this quite
obeys the letter of the law: 6.5.2.3p5 specifically talks
about accessing struct elements inside union objects, not
about free-standing structs of the same types.

Oops, you are right.

--
Regards,
Hallvard

May 31 '07 #5

Dave Vandervies

In article <1180559022.950171@news1nwk>,
Eric Sosman <Er*********@Sun.COMwrote:

>Hallvard B Furuseth wrote On 05/30/07 14:55,:

>Or point to a union which contains both structs. Wastes
memory for the smallest struct though.

It also exposes the details of the no-longer-opaque
structs, making it difficult to change those details at
a later date. (The O.P. did not say in so many words that
opacity was a goal, so I may be reading more than is there.)

It was, but more as a matter of good habits than for any specific reason.
(I've seen too much code that was made into a crawling horror by intricate
cross-linking of dependencies between different parts, so I like to make
it as difficult as possible to introduce dependencies on anything other
than a published interface.)

After some further thought, hiding the opaque structs behind a void
pointer seems to have been what I was after. (I was clearly too close
to the initial code that I was trying to abstract to actually see that;
it's probably what I would have suggested had somebody else asked the
question.)
I ended up deciding to make my abstraction layer even more abstract as
well, and having done that it makes even more sense to hide as many of
the details as possible behind a void *.
dave
--
Dave Vandervies dj******@csclub.uwaterloo.ca

I sense some Perl coming on. --Mike Andrews and Steve VanDevender

Be sure to [...] have someone nearby if a particularly in the scary
severe attack requires restraint or sedation. devil monastery

Jun 5 '07 #6

Eric Sosman

Dave Vandervies wrote:

In article <1180559022.950171@news1nwk>,
Eric Sosman <Er*********@Sun.COMwrote:
>Hallvard B Furuseth wrote On 05/30/07 14:55,:

>>Or point to a union which contains both structs. Wastes
memory for the smallest struct though.
It also exposes the details of the no-longer-opaque
structs, making it difficult to change those details at
a later date. (The O.P. did not say in so many words that
opacity was a goal, so I may be reading more than is there.)

It was, but more as a matter of good habits than for any specific reason.
(I've seen too much code that was made into a crawling horror by intricate
cross-linking of dependencies between different parts, so I like to make
it as difficult as possible to introduce dependencies on anything other
than a published interface.)

After some further thought, hiding the opaque structs behind a void
pointer seems to have been what I was after. (I was clearly too close
to the initial code that I was trying to abstract to actually see that;
it's probably what I would have suggested had somebody else asked the
question.)
I ended up deciding to make my abstraction layer even more abstract as
well, and having done that it makes even more sense to hide as many of
the details as possible behind a void *.

An advantage of using a pointer to an incomplete struct
type instead of a pointer to void is that you retain a little
bit of type safety. Consider the scenarios:

/* in the header: */
void doSomething(void *); /* pass a Gizmo pointer */
/* stupid caller: */
doSomething ("Gizmo"); /* no murmur from compiler */

vs.

/* in the header: */
void doSomething(struct gizmo*);
/* stupid caller: */
doSomething ("Gizmo"); /* compile-time diagnostic */

Now, it's clear from the outset that users of your software
are of above average intelligence, nowhere near stupid enough to
pass a string when something else is required. But if there are
a lot of different but related types floating about -- struct
gizmo, struct pseudoGizmo, struct gizmoExtended -- even an elite
audience may suffer occasional confusion. If that should happen,
it's usually better to rescue them with the compiler than with
the debugger ...

The cost of providing such extra safety is usually small:
One new identifier pollutes the user's name space for each type
you need to invent. That cost is usually tolerable, considering
that doSomething() and its brethren are even more polluting (they
probably have external linkage). A handful of no-linkage type
names will not usually inconvenience the user who's already made
up his mind to accept your package's more numerous (usually)
external identifiers.

Treasure my free advice: It's worth every cent you pay ...

--
Eric Sosman
es*****@acm-dot-org.invalid

Jun 5 '07 #7

Is this legal? Aesthetically acceptable?

Similar topics