Thank you for taking the time to write this very helpful and
enlightening response. May I suggest that it be added to the FAQ in
some form?
Chris Torek wrote:[color=blue]
> In article <1143293442.573269.244570@v46g2000cwv.googlegroups.com>[/color]
[...][color=blue]
> Let us assume, for the sake of argument, that "const void *" is a
> special datatype that uses 16 bytes to hold lots of extra information
> about the target, while other pointers (where the compiler can make
> more assumptions about the target) use just 4 bytes. So a *pointer
> to* "const void *" is only 4 bytes long but it points to 16 bytes.
> Meanwhile let us assume that a "double" is 8 bytes. Then:
>
> double x = 3.141592653589793238462;
> const void *p = &x;
> const void **q = &p;
>
> might look like this in memory:
>
>         x                        p
> +---------------+   +------------------------------------------+
> |  3.141592...  | <-|---------------------*                    |
> +---------------+   +------------------------------------------+
>                                 ^
>                                 |
>                                 |
>                             +-------+
>                           q |   *   |
>                             +-------+[/color]
Do any systems exist that actually _do_ this? (I want to fix the
problem regardless; I'm just curious whether any system actually uses
this extra freedom provided by C.)
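
One rough way to see what a particular implementation does is to compare
the sizes of a few pointer types. The following is only a minimal sketch:
the function names are made up for this example, and equal sizes do not
prove equal representations, so it can show that a system differs but not
that the cast is safe.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the pointer types sampled here all
 * have the same size on this implementation, 0 otherwise. */
int pointer_sizes_match(void)
{
    return sizeof(void *) == sizeof(double *)
        && sizeof(void *) == sizeof(const void **);
}

/* Hypothetical helper: prints the sizes so they can be inspected. */
void print_pointer_sizes(void)
{
    /* The standard does promise that void * and char * share a
     * representation, so their sizes must agree. */
    assert(sizeof(void *) == sizeof(char *));

    printf("sizeof(void *)        = %zu\n", sizeof(void *));
    printf("sizeof(double *)      = %zu\n", sizeof(double *));
    printf("sizeof(const void **) = %zu\n", sizeof(const void **));
}
```

Calling print_pointer_sizes() from main() prints the three sizes; what
they are depends entirely on the implementation.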
[color=blue]
>
> Now we have this:
>[color=green]
> >void function_remove( const wchar_t *name ) {
> >    void *key;
> >    function_data_t *d;
> >    event_t ev;
> >
> > hash_remove( &function,[/color]
>
> Presumably "&function" is the address of a hash table, and OK (although
> we have not seen a declaration for the identifier "function").[/color]
My bad. Your assumption is correct.
[color=blue]
>[color=green]
> > name,[/color]
>
> Here "name" has type "const wchar_t *", but this is converted to a
> 16-byte "const void *" and the resulting value is passed in. This
> seems fine.
>[color=green]
> > (const void **) &key,[/color]
>
> Here we have a possible problem. The variable "key" is just "void *",
> not "const void *". If "key" were "const void *", &key would have
> type "const void **" and be a 4-byte pointer to a 16-byte pointer.
> The 16-byte pointer currently contains garbage (is uninitialized)
> but this would presumably be OK -- hash_remove is going to fill it
> in.
>
> As it happens, the C standard has some fuzzy wording about how
> const qualifiers should not affect the underlying representation,
> so if "const void *" is a 16-byte thing, plain "void *" should
> also be a 16-byte thing. So &key should be a 4-byte pointer to
> a 16-byte pointer, and this should actually work. It would be
> better to make "key" a "const void *" and remove the cast, though.
>[color=green]
> > (const void **)&d );[/color]
>
> Here, on the other hand, we have a definite problem.
>
> Remember that, except for "void *", we said that pointers were all
> four bytes. Here "d" has type "function_data_t *", so it is a
> 4-byte pointer. &d is a 4-byte pointer to a 4-byte pointer:
>
>    &d            d
> +-------+    +-------+
> |   *---|--> |   *---|-----? [we have no idea where this points]
> +-------+    +-------+
>
> but hash_remove is expecting a 4-byte pointer to a 16-byte pointer:
>
>              old_data
> +-------+    +-------+.....................+
> |   *---|--> |   *---|
> +-------+    +-------+.....................+
>               \_____/ \____________________/
>                 OK    expected, but not there
>[color=green]
> >Compiling it using GCC 4.1 results in the following warning:
> >
> >gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
> >-c -o function.o function.c
> >function.c: In function 'function_remove':
> >function.c:204: warning: dereferencing type-punned pointer will break
> >strict-aliasing rules
> >
> >The call to hash_remove is what is generating the warning.
> >
> >My understanding about why this is a problem may be a bit blurry. In
> >general, using typecasting and pointer dereferencing does not seem to
> >be proper C99, and may end up breaking the compiler's assumptions about
> >(non)aliasing of pointers of different types, and I suppose that is why
> >GCC is a bit unhappy here.[/color]
>
> Yes. Or, if pointers to different data types have different sizes,
> you could run into a situation in which hash_remove() updates all
> 16 bytes of your 4-byte pointer.
>[color=green]
> >I can get around this by using a union of a const void * and the real
> >datatype, but this seems like a hackish solution. Any pointers to the
> >proper way to do this type of typecasting, so as to avoid aliasing
> >problems would be welcome.[/color]
>
> In general, the "right way to cast pointers" is not to do it at all.
>
> Consider the following small rewrite of function_remove():
>
>     void function_remove(const wchar_t *name) {
>         const void *key;
>         const void *old_data;
>         const function_data_t *d;
>         event_t ev;
>
>         /* this fills in "key" and "old_data" */
>         hash_remove(&function, name, &key, &old_data);
>
>         /*
>          * We do not use "key" but we do use "old_data".
>          * The "const void *" value that hash_remove() filled
>          * in is actually a read-only pointer to function_data_t
>          * (after being converted to "void *"), so we convert
>          * it back from "void *" to "function_data_t *", all
>          * const-qualified, i.e., read-only.
>          */
>         d = old_data;
>
>         ... more code here, presumably ...
>     }
>
> It is probably the case that hash_remove() does not modify the
> data, but the data themselves are not necessarily read-only. If
> this is true, you will have fewer headaches if you remove most of
> the "const"s. Otherwise you will have places at which you are
> forced to use casts to cast away the const-ness. This means that
> "const" is not giving you any protection after all, and is therefore
> useless. (This is usually true in C, which gets "const" quite
> wrong. C++ is better about it, although still "not quite right".
> It would all work right, and I think be more in the "spirit of C",
> if const had been a storage-class modifier instead of a type-qualifier.
> The other alternative is to have dynamic typing, as in real OO
> languages, so that the type of a return parameter corresponds to
> the type of the input parameter. This is sort of like C++ templates,
> except not ugly :-) . But oh well.)[/color]
Ok, that makes sense. Thanks!
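
A self-contained toy version of this rewrite might look like the
following. It is only a sketch: the one-entry hash_table_t, the body of
hash_remove(), the contents of function_data_t and the return value of
function_remove() are all made up here so that the example compiles on
its own. The point is just that &key and &old_data really are
"const void **", and that d is obtained by converting a value rather
than by casting a pointer to a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Stand-in for the real type; its contents are invented for this sketch. */
typedef struct { int id; } function_data_t;

/* Toy one-entry "hash table", invented for this sketch. */
typedef struct {
    const void *key;
    const void *data;
} hash_table_t;

/* Toy hash_remove(): removes the single entry if its key matches and
 * reports the removed key and data through real const void * objects. */
void hash_remove(hash_table_t *h, const void *key,
                 const void **old_key, const void **old_data)
{
    const wchar_t *stored = h->key;   /* conversion, not a cast */
    const wchar_t *wanted = key;

    if (stored != NULL && wcscmp(stored, wanted) == 0) {
        *old_key = h->key;
        *old_data = h->data;
        h->key = NULL;
        h->data = NULL;
    } else {
        *old_key = NULL;
        *old_data = NULL;
    }
}

hash_table_t function;   /* the table that "&function" refers to */

/* Returns the removed data (an assumption of this sketch; the original
 * function returns void) so that the result can be checked. */
const function_data_t *function_remove(const wchar_t *name)
{
    const void *key;
    const void *old_data;
    const function_data_t *d;

    hash_remove(&function, name, &key, &old_data);
    (void)key;       /* unused, as in the rewrite above */

    d = old_data;    /* convert the value; no cast needed */
    return d;
}
```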
[color=blue]
>
> Retaining just the one "useful or at least harmless" const gives:
>
>     void hash_remove(hash_table_t *h, const void *key,
>                      void **old_key, void **old_data);
>
>     void function_remove(const wchar_t *name) {
>         void *key;
>         void *old_data;
>         function_data_t *d;
>         event_t ev;
>
>         hash_remove(&function, name, &key, &old_data);
>         d = old_data;
>
>         ... more code here, presumably ...
>     }
>
> In either case, though, the heart of the matter is to supply a
> correct "&old_data", then derive the new value for "d" by *converting*
> the value that hash_remove() fills in. It is true that, on many
> (or perhaps even most) machines, this "conversion" is just "take
> the bits as-is", but by doing an actual conversion -- even if the
> compiler uses a total of 0 (zero) machine instructions to implement
> that conversion -- you get guaranteed-to-work code. On machines
> where the "conversion" is a no-op, if the compiler is half-decent
> (and gcc is), you get the zero-extra-instructions you would if you
> wrote non-portable, not-guaranteed code. Thus, there is no penalty
> for doing it right.[/color]
Yes, I've noticed that the const-ness doesn't buy you much. The
disadvantages are pretty clearly outlined in your response; the
advantage is that you never get a non-const pointer to a const memory
region, which is something that always makes me a bit uneasy. I added
it to err on the side of caution. You may be right that the
disadvantage outweighs the advantage, so I might remove the consts.
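
One possible middle ground is to keep the generic interface const-free
and add the const back in a typed wrapper, since converting "void *" to
"const function_data_t *" needs no cast. A minimal sketch, with made-up
names throughout (slot, slot_take() and function_take_readonly() are
not part of the real code):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real type; its contents are invented for this sketch. */
typedef struct { int id; } function_data_t;

/* Hypothetical const-free storage, standing in for the hash table. */
void *slot;

/* Hypothetical generic interface in the style of the const-free
 * hash_remove(): hands back the stored pointer and empties the slot. */
void slot_take(void **old_data)
{
    *old_data = slot;
    slot = NULL;
}

/* Typed wrapper: fills in a real void * object, then converts the value
 * to a read-only typed pointer. Adding const requires no cast. */
const function_data_t *function_take_readonly(void)
{
    void *old_data;

    slot_take(&old_data);
    return old_data;
}
```

Callers that only read the data then keep the compiler's protection,
while the table code itself stays free of casts.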
[color=blue]
> --
> In-Real-Life: Chris Torek, Wind River Systems
> Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
> email: forget about it                 http://web.torek.net/torek/index.html
> Reading email is like searching for food in the garbage, thanks to spammers.[/color]
--
Axel