Few doubts (wrong behaviour or correct)

Arthur J. O'Dwyer wrote:

>well here are a few questions
1) Where does it fail. I mean is there any exception where it will not
work assuming malloc works fine.

Technically, it doesn't have to work according to any standard, because
you'll be accessing the array 'b' beyond its end. ('b' has one element,
according to its definition, so accessing 'b[1]' is an error.)

(You probably meant "array obj". 'b' is the type. 'obj' is the array's name.)

What you are saying is not true.

There's nothing in the standard that says that the limitation you are
talking about applies specifically to array 'obj' as it is declared
(i.e. with the size specified in the declaration). Note, that when we
use the '[]' operator, array type immediately decays to pointer type,
meaning that 'obj[1]' is equivalent to '*((b*) obj + 1)', i.e. where
'obj' value is no longer an array, but a mere pointer. How far we are
allowed to step away from that pointer depends only on the size of the
_actual_ array pointed by decayed 'obj', not by the declared size of
array 'obj'. That actual size is defined by one and only one thing - how
much memory we actually allocated for the struct.

The incorrect assumption that "struct hack" (with nonzero array size) is
somehow illegal is normally a consequence of one of two popular mistakes:

1) People mistake it with a zero-size "struct hack" (which is indeed a
"hack" and is indeed illegal)

2) People are used to the fact that in most practical cases the actual
size of the array in C is identical to its declared size. In such cases
access beyond the declared size is indeed illegal. However, "struct
hack" is a completely different beast.

In practice, this is historically how a lot of things were implemented
in Unix and so on, so many compilers go out of their way to make sure
it works the way you expected.

That specifically refers to using a zero-sized array in the declaration.
That would be illegal. And that's exactly what many compilers "went out
of their way" to allow as an extension.

As for the struct hack with non-zero-sized array (as in OP's example)
there's nothing illegal about it and absolutely no additional effort is
required from the compiler in order to support it.

>4) is b *ptr instead of b obj[1] a better way. Why should one use b
obj[1] instead of b *ptr.

It is a better way. The "struct hack" is a historical oddity, but
you shouldn't use it in new code just for the sake of using it.

Absolutely not. For a beginner who has a hard time understanding the
"struct hack" this might be a good temporary technique. However, it
comes at the cost of unnecessary level of indirection, which is a design
issue and a performance issue: deep-copy required, indirect access
required, memory fragmentation increased, locality of the access reduced
etc. In mature code "struct hack" technique is the correct way to go in
situations when the variable-size array is a natural aggregate member of
the struct (according to the program's design).

--
Best regards,
Andrey Tarasevich

Oct 12 '06 #3

Andrey Tarasevich <an**************@hotmail.com> writes:
[...]

As for the struct hack with non-zero-sized array (as in OP's example)
there's nothing illegal about it and absolutely no additional effort is
required from the compiler in order to support it.

It's true that no additional effort is required to support the struct
hack, at least for most compilers. But if a compiler goes to the
considerable extra effort to implement array bounds checking, it could
break code that uses the struct hack.

For example:

int arr[5];
arr[5] = 42;

In most implementations, the compiler won't complain about this, and
the program will happily write the value 42 over some chunk of memory.
But it's undefined behavior, which means a compiler is *allowed* to do
anything it likes, including catching the error.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 12 '06 #4

temper3243

Arthur J. O'Dwyer wrote:

On Wed, 11 Oct 2006 te********@gmail.com wrote:

(I cleaned up your code a bit. I hope you don't write code like that
in Real Life.)

#include <stdio.h>
#include <stdlib.h>

typedef struct b {
    int b1;
    char bc;
} b;

typedef struct a {
    char a;
    int a1;
    b obj[1];
} a;

int main(void)
{
    a *ptr;
    ptr = malloc(sizeof(struct a) + 5*sizeof(struct b));
    [...]
    return 0;
}

well here are a few questions
1) Where does it fail. I mean is there any exception where it will not
work assuming malloc works fine.

Technically, it doesn't have to work according to any standard, because
you'll be accessing the array 'b' beyond its end. ('b' has one element,
according to its definition, so accessing 'b[1]' is an error.)

In the standard i have seen that last member of structure can have
incomplete type.
What exactly is an incomplete type and if arr[1] is an incomplete type
then isn't this valid. I am confused about this or i must have
misunderstood this.

In practice, this is historically how a lot of things were implemented
in Unix and so on, so many compilers go out of their way to make sure
it works the way you expected.

2) Someone told me that in C99 we can declare b obj[], so that its size
can be decided at runtime. How do we do that? Do we have to malloc again?
Does this also have unwanted effects?

See http://david.tribble.com/text/cdiffs.htm#C99-fam
You'd write exactly what you already have, except you'd write 'b[]'
instead of 'b[1]'. Since 'b' now no longer has only one element, it's
acceptable, according to the C99 standard, to refer to b[0], b[1], b[2],
and so on.
The only "unwanted effect" of this approach --- and it's a big one ---
is that it won't work on pre-C99 C compilers.
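
For illustration, here is a minimal sketch of the C99 flexible array member
version, assuming a C99 compiler. The 'count' field and the helper names
'a_create' and 'a_at' are hypothetical additions, not part of the code under
discussion, and the size computation does not guard against overflow.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct b {
    int b1;
    char bc;
} b;

/* C99 flexible array member: 'obj' has the incomplete type 'b[]'
   and must be the last member of the struct. */
typedef struct a {
    char a;
    int a1;
    size_t count;   /* hypothetical field: elements allocated in obj */
    b obj[];
} a;

/* Allocate one 'a' with room for 'count' trailing 'b' elements.
   Returns NULL if malloc fails. */
a *a_create(size_t count)
{
    a *ptr = malloc(sizeof *ptr + count * sizeof ptr->obj[0]);
    if (ptr != NULL) {
        ptr->a = 0;
        ptr->a1 = 0;
        ptr->count = count;
        for (size_t i = 0; i < count; i++) {
            ptr->obj[i].b1 = (int)i;
            ptr->obj[i].bc = 0;
        }
    }
    return ptr;
}

/* Checked access to element i of the trailing array. */
b *a_at(a *ptr, size_t i)
{
    assert(ptr != NULL && i < ptr->count);
    return &ptr->obj[i];
}
```

A caller would write 'a *p = a_create(6);', reach the elements through
'a_at(p, i)' for i from 0 to 5, and release everything with one 'free(p)'.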

3) Assuming the below code is a hack and it works , how do i make it
work for n objects of type a with each a having k objects of type b.

You don't. You rewrite the code to use a pointer to a separately
malloc'ed array, which is the /real/ portable solution --- it works
in both C90 and C99.
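
A rough sketch of that approach for n objects of type a, each owning k objects
of type b. The function names 'a_array_create' and 'a_array_destroy' are
hypothetical, the sketch assumes n and k are both nonzero, and it uses C99
loop declarations for brevity (hoist them for a C90 compiler).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct b {
    int b1;
    char bc;
} b;

struct a {
    char a;
    int a1;
    b *obj;     /* points to a separately allocated array of k elements */
};

/* Free an array of n 'struct a' built by a_array_create (NULL is allowed). */
void a_array_destroy(struct a *arr, size_t n)
{
    if (arr == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(arr[i].obj);           /* free(NULL) is a no-op */
    free(arr);
}

/* Allocate n 'struct a', each owning k zeroed 'b'. Returns NULL on failure. */
struct a *a_array_create(size_t n, size_t k)
{
    assert(n > 0 && k > 0);
    struct a *arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        arr[i].obj = NULL;          /* keeps cleanup safe after a partial failure */
    for (size_t i = 0; i < n; i++) {
        arr[i].a = 0;
        arr[i].a1 = 0;
        arr[i].obj = malloc(k * sizeof *arr[i].obj);
        if (arr[i].obj == NULL) {
            a_array_destroy(arr, n);
            return NULL;
        }
        for (size_t j = 0; j < k; j++) {
            arr[i].obj[j].b1 = 0;
            arr[i].obj[j].bc = 0;
        }
    }
    return arr;
}
```

Element j of object i is then 'arr[i].obj[j]', at the cost of one extra
allocation per object.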

4) is b *ptr instead of b obj[1] a better way. Why should one use b
obj[1] instead of b *ptr.

It is a better way. The "struct hack" is a historical oddity, but
you shouldn't use it in new code just for the sake of using it. Use
the correct, portable method, even if it takes two more lines.

struct a {
    char a;
    int a1;
    b *obj;
};

int main(void)
{
    struct a *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return EXIT_FAILURE;    /* nothing to free yet */
    ptr->obj = malloc(6*sizeof *ptr->obj);
    [...]
    free(ptr->obj);
    free(ptr);
    return 0;
}

HTH,
-Arthur

Oct 12 '06 #5

CBFalconer

Andrey Tarasevich wrote:

Arthur J. O'Dwyer wrote:

.... snip ...

>>
It is a better way. The "struct hack" is a historical oddity, but
you shouldn't use it in new code just for the sake of using it.

Absolutely not. For a beginner who has a hard time understanding the
"struct hack" this might be a good temporary technique. However, it
comes at the cost of unnecessary level of indirection, which is a
design issue and a performance issue: deep-copy required, indirect
access required, memory fragmentation increased, locality of the
access reduced etc. In mature code "struct hack" technique is the
correct way to go in situations when the variable-size array is a
natural aggregate member of the struct (according to the program's
design).

I can't remember an occasion where I have needed to use a "struct
hack". I usually use a pointer to something of the appropriate
type, and if a need for copies is anticipated provide a deep-copy
routine associated with the type. This usually also involves
providing an associated deep-free routine. The original
constructor mechanism is needed in any case.
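
As a sketch of what such a trio of routines might look like for the
pointer-based 'struct a' from earlier in the thread. The 'count' field and the
names 'a_new', 'a_copy', and 'a_free' are assumptions added for illustration,
and the sketch assumes a nonzero element count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct b {
    int b1;
    char bc;
} b;

struct a {
    char a;
    int a1;
    size_t count;   /* hypothetical: number of elements obj points to */
    b *obj;
};

/* Constructor: one 'struct a' owning 'count' zeroed elements, or NULL. */
struct a *a_new(size_t count)
{
    assert(count > 0);
    struct a *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->obj = malloc(count * sizeof *p->obj);
    if (p->obj == NULL) {
        free(p);
        return NULL;
    }
    p->a = 0;
    p->a1 = 0;
    p->count = count;
    for (size_t i = 0; i < count; i++) {
        p->obj[i].b1 = 0;
        p->obj[i].bc = 0;
    }
    return p;
}

/* Deep-free: releases the element array, then the struct itself. */
void a_free(struct a *p)
{
    if (p != NULL) {
        free(p->obj);
        free(p);
    }
}

/* Deep-copy: the copy gets its own element array. Returns NULL on failure. */
struct a *a_copy(const struct a *src)
{
    assert(src != NULL);
    struct a *dst = a_new(src->count);
    if (dst == NULL)
        return NULL;
    dst->a = src->a;
    dst->a1 = src->a1;
    memcpy(dst->obj, src->obj, src->count * sizeof *src->obj);
    return dst;
}
```

A plain struct assignment would copy only the pointer, leaving both structs
sharing one array, which is why the copy routine is needed here.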

--
Some informative links:
<news:news.announce.newusers>
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/>

Oct 12 '06 #6

Keith Thompson wrote:

Andrey Tarasevich <an**************@hotmail.com> writes:
[...]
>As for the struct hack with non-zero-sized array (as in OP's example)
there's nothing illegal about it and absolutely no additional effort is
required from the compiler in order to support it.

It's true that no additional effort is required to support the struct
hack, at least for most compilers. But if a compiler goes to the
considerable extra effort to implement array bounds checking, it could
break code that uses the struct hack.

For example:

int arr[5];
arr[5] = 42;

In most implementations, the compiler won't complain about this, and
the program will happily write the value 42 over some chunk of memory.
But it's undefined behavior, which means a compiler is *allowed* to do
anything it likes, including catching the error.
...

Yes, but, once again, that's a completely different case. Note the important
detail here: this is undefined not because the static type of 'arr' is 'int[5]'
before decay, but because the _dynamic_ type of 'arr' is 'int[5]'. (I use the
notions of "static" and "dynamic type" borrowed from C++ terminology, but I hope
it is clear what I mean). With arrays declared as in your example it is easy to
determine because their static type is always the same as their dynamic type.

Note also that the very same compiler will probably not catch the error in

int arr[5];
int* const p = arr;
p[5] = 42;

meaning that it treats these two situations differently, although from the point
of view of language specification they are essentially the same.

In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to violate the
array bound of static type of '*parr'. A compiler that issues an error in this
case (i.e. refuses to compile the code) would be non-compliant.

--
Best regards,
Andrey Tarasevich

Oct 12 '06 #7

Andrey Tarasevich <an**************@hotmail.com> writes:

Keith Thompson wrote:
>Andrey Tarasevich <an**************@hotmail.com> writes:
[...]
>>As for the struct hack with non-zero-sized array (as in OP's example)
there's nothing illegal about it and absolutely no additional effort is
required from the compiler in order to support it.

It's true that no additional effort is required to support the struct
hack, at least for most compilers. But if a compiler goes to the
considerable extra effort to implement array bounds checking, it could
break code that uses the struct hack.

For example:

int arr[5];
arr[5] = 42;

In most implementations, the compiler won't complain about this, and
the program will happily write the value 42 over some chunk of memory.
But it's undefined behavior, which means a compiler is *allowed* to do
anything it likes, including catching the error.
...

Yes, but, once again, that's a completely different case. Note the
important detail here: this is undefined not because the static type
of 'arr' is 'int[5]' before decay, but because the _dynamic_ type of
'arr' is 'int[5]'. (I use the notions of "static" and "dynamic type"
borrowed from C++ terminology, but I hope it is clear what I
mean). With arrays declared as in your example it is easy to
determine because their static type is always the same as their
dynamic type.

Sorry, I don't know what you mean by "static type" vs. "dynamic type".
Can you make the argument in terms of C?

Note also that the very same compiler will probably not catch the error in

int arr[5];
int* const p = arr;
p[5] = 42;

meaning that it treats these two situations differently, although
from the point of view of language specification they are
essentially the same.

Agreed.

In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I'm not convinced of that.

A compiler could implement "fat pointers" that incorporate bounds
information into each pointer along with the address. Such an
implementation could catch things like "p[5] = 42", because it could
tell from the value of p that it points to the beginning of an array
of only 5 elements.
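
A rough sketch of the idea, written as ordinary C rather than compiler
internals. The 'struct fat_int_ptr' type and both function names are
hypothetical; a real checking implementation would carry this information
behind the scenes and would probably trap rather than return false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": an address plus the number of
   elements that may legitimately be reached through it. */
struct fat_int_ptr {
    int *base;
    size_t len;
};

/* Build a fat pointer to the first element of an array of 'len' ints. */
struct fat_int_ptr fat_from_array(int *base, size_t len)
{
    assert(base != NULL || len == 0);
    struct fat_int_ptr fp = { base, len };
    return fp;
}

/* Store 'value' at index i only if i is in bounds.
   Returns true if the store happened, false if it was rejected. */
bool fat_store(struct fat_int_ptr fp, size_t i, int value)
{
    if (fp.base == NULL || i >= fp.len)
        return false;   /* a checking implementation might trap here */
    fp.base[i] = value;
    return true;
}
```

With 'int arr[5]' wrapped by 'fat_from_array(arr, 5)', a store at index 4
succeeds and a store at index 5 is rejected, which is the "p[5] = 42" case.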

There's been considerable debate on comp.std.c about whether indexing
beyond the bounds of an array causes undefined behavior even if it's
within some declared array object. I'm not sure that the question was
ever resolved. For example, given:

int arr[10][10];

it's not clear whether

arr[0][20]

invokes undefined behavior. This also affects whether the struct hack
is legal.

And recall what FAQ 2.6 says about the struct hack:

Despite its popularity, the technique is also somewhat notorious:
Dennis Ritchie has called it ``unwarranted chumminess with the C
implementation,'' and an official interpretation has deemed that
it is not strictly conforming with the C Standard, although it
does seem to work under all known implementations. (Compilers
which check array bounds carefully might issue warnings.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 12 '06 #8

Arthur J. O'Dwyer

On Thu, 12 Oct 2006 te********@gmail.com wrote:

Arthur J. O'Dwyer wrote:
>On Wed, 11 Oct 2006 te********@gmail.com wrote:
>>>
1) Where does it fail. I mean is there any exception where it will not
work assuming malloc works fine.

Technically, it doesn't have to work according to any standard, because
you'll be accessing the array 'b' beyond its end. ('b' has one element,
according to its definition, so accessing 'b[1]' is an error.)

In the standard i have seen that last member of structure can have
incomplete type.
What exactly is an incomplete type and if arr[1] is an incomplete type
then isn't this valid. I am confused about this or i must have
misunderstood this.

In the C99 standard, the last member of a structure may have incomplete
type, yes. (But not in the older, more popular C90 standard.)

int arr[1];

defines 'arr' to have the type "array of one int," a complete type.
An "incomplete" type is one such as "array of int":

int arr[];

Here, we do not say /anything/ about how many elements the array has,
so the type is "incomplete."
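
A minimal sketch of the difference, using a hypothetical file-scope array named
'table' (the function names are also made up for the example). While the type
is incomplete the array can be indexed but 'sizeof' cannot be applied to it;
once a later declaration supplies the size, the type is complete.

```c
#include <assert.h>
#include <stddef.h>

/* Incomplete type "array of int": the number of elements is unknown here,
   so 'sizeof table' would not compile at this point. */
extern int table[];

/* Indexing does not need the size, so this is fine. */
int first_entry(void)
{
    return table[0];
}

/* This definition completes the type: "array of 4 int". */
int table[4] = { 10, 20, 30, 40 };

/* Now that the type is complete, sizeof is allowed. */
size_t table_len(void)
{
    return sizeof table / sizeof table[0];
}

/* Checked read of element i. */
int table_at(size_t i)
{
    assert(i < table_len());
    return table[i];
}
```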

HTH,
-Arthur

Oct 12 '06 #9

Chris Torek

>Keith Thompson wrote:

>It's true that no additional effort is required to support the struct
hack, at least for most compilers. But if a compiler goes to the
considerable extra effort to implement array bounds checking, it could
break code that uses the struct hack.

For example:

int arr[5];
arr[5] = 42;

In most implementations, the compiler won't complain about this, and
the program will happily write the value 42 over some chunk of memory.
But it's undefined behavior, which means a compiler is *allowed* to do
anything it likes, including catching the error.

In article <12*************@news.supernews.com>
Andrey Tarasevich <an**************@hotmail.com> wrote:

>Yes, but, once again, that's a completely different case.

Not necessarily. You might *like* it to be, but Standard C (either
C89 or C99) does not say so.

>Note the important detail here: this is undefined not because the
static type of 'arr' is 'int[5]' before decay, but because the
_dynamic_ type of 'arr' is 'int[5]'. (I use the notions of "static"
and "dynamic type" borrowed from C++ terminology, but I hope
it is clear what I mean).

It is to me. The problem is that C does not require compilers to
use "dynamic types". Within any other limits of the Standard, they
can apply "static types" to array bounds-checking. Given:

/* for some valid data-type T and constant K */
struct S {
size_t n;
T obj[K]; /* actually size n */
};

and:

struct S *p = malloc(sizeof *p + (n - K) * sizeof *p->obj);
...
use(p->obj[i]);

a C compiler may legitimately insert the equivalent of:

assert((size_t)i < K);

in front of the use() call, because the only valid bounds for i
*could* be [0..K), based strictly on the type of p->obj (which
is "array K of T", whatever that constant K and type T are).
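
To make the two views concrete, here is a small sketch with T fixed to int and
K fixed to 1 (both choices are assumptions for illustration, as are the helper
names). It never indexes past the declared bound itself; it only computes what
a checker based on the declared type and a checker based on the allocation
would each accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define K 1     /* declared bound of the trailing array */

struct S {
    size_t n;       /* number of elements actually allocated */
    int obj[K];     /* declared as K elements, allocated as n */
};

/* Allocate a struct S with room for n elements (n must be at least K). */
struct S *s_create(size_t n)
{
    assert(n >= K);
    struct S *p = malloc(sizeof *p + (n - K) * sizeof *p->obj);
    if (p != NULL)
        p->n = n;
    return p;
}

/* The test a checker using the declared ("static") type would apply. */
bool within_declared_bound(size_t i)
{
    return i < K;
}

/* The test a checker using the allocation ("dynamic") size would apply. */
bool within_allocated_bound(const struct S *p, size_t i)
{
    return i < p->n;
}
```

For an object created with 's_create(6)', index 5 passes the allocation-based
test and fails the declared-type test, which is exactly the disagreement in
this thread.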

>With arrays declared as in your example it is easy to
determine because their static type is always the same as their dynamic type.

Note also that the very same compiler will probably not catch the error in

int arr[5];
int* const p = arr;
p[5] = 42;

meaning that it treats these two situations differently,

Some bounds-checking systems will not catch this, and some will.
The "best" ones (where the degree of "better" or "worse" is based
on my personal opinion :-) ) happen to implement this such that
pointers pointing to malloc()ed areas have bounds determined
dynamically. That is, they will use the "dynamic type" as you
have suggested. But Standard C allows them to use the "static
type", and if they do, the code will abort at runtime.

>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Oct 13 '06 #10

Arthur J. O'Dwyer

On Fri, 13 Oct 2006, Chris Torek wrote:

Andrey Tarasevich <an**************@hotmail.com> wrote:
>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)

I think Chris is right-as-usual. In this particular case, comp.std.c
would probably point out that the cast from (int(*)[100]) to (int(*)[5])
may invoke undefined behavior per 6.3.2.3#7, "if the resulting pointer
is not correctly aligned for the pointed-to type, the behavior is
undefined." Technically, there's no reason int[5] and int[100] have
to have the same alignment requirements.
(And then they'd get into a big flamewar over whether this matters. :)

Anyway, the point of this thread was "don't use the struct hack,
unless it's in C99 and done properly and for a very good reason,"
and I don't think an intimate knowledge of array alignments is
required in order to concur with /that/.

HTH,
-Arthur

Oct 13 '06 #11

Richard Heathfield

Arthur J. O'Dwyer said:

<snip>

Anyway, the point of this thread was "don't use the struct hack,
unless it's in C99 and done properly and for a very good reason,"
and I don't think an intimate knowledge of array alignments is
required in order to concur with /that/.

I agree, Arthur. So does Dennis Ritchie, who once labelled the struct hack
as "unwarranted chumminess with the implementation".

But on the other hand, as far as I'm aware nobody - including the ISO guys -
has /ever/ encountered a real world implementation where the struct hack
didn't work. No, I'm not advocating it, but I think we'd be hard-pressed to
demonstrate that it would break anything in the real world.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Oct 13 '06 #12

"Arthur J. O'Dwyer" <aj*******@andrew.cmu.edu> writes:

On Fri, 13 Oct 2006, Chris Torek wrote:
>Andrey Tarasevich <an**************@hotmail.com> wrote:
>>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)

I think Chris is right-as-usual. In this particular case, comp.std.c
would probably point out that the cast from (int(*)[100]) to
(int(*)[5]) may invoke undefined behavior per 6.3.2.3#7, "if the
resulting pointer
is not correctly aligned for the pointed-to type, the behavior is
undefined." Technically, there's no reason int[5] and int[100] have
to have the same alignment requirements.
(And then they'd get into a big flamewar over whether this matters. :)

I'm not sure about that. I'd normally expect both int[5] and int[100]
to have the same alignment as int. I'm not sure whether this can be
inferred from the standard, but consider this:

void func(int *param, size_t len)
{
...
}

int array_object[6];
func(array_object + 1, 5);

I presume that func can treat param as a pointer to the first element
of an int[5] array. More generally, I presume that any array of foo
has the same alignment requirements as foo (though a compiler might,
for example, choose to align an int[2] more strictly than an int, even
though it doesn't have to).

At least, that's consistent with my mental model of the C abstract
machine. Quite possibly my mental model makes some assumptions that
aren't actually supported by the standard.
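
As a side note for readers with a C11 compiler (which postdates this thread):
C11 defines '_Alignof' applied to an array type as yielding the alignment
requirement of the element type. The sketch below only reports what that
operator returns; it does not settle what C90 or C99 guarantee. The function
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Checked at compile time: in C11, _Alignof of an array type is
   defined to be the alignment requirement of its element type. */
static_assert(_Alignof(int[5]) == _Alignof(int),
              "int[5] should report the alignment of int");

/* The same comparison at run time, for both array sizes from the thread. */
bool array_alignment_matches_element(void)
{
    return _Alignof(int[5]) == _Alignof(int)
        && _Alignof(int[100]) == _Alignof(int);
}
```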

Anyway, the point of this thread was "don't use the struct hack,
unless it's in C99 and done properly and for a very good reason,"
and I don't think an intimate knowledge of array alignments is
required in order to concur with /that/.

I'd argue that it's ok to use the struct hack if you're willing to
accept the fact that it's not strictly portable. I suspect that it
will work on all existing compilers (the DS9K doesn't count).

I'm not advocating the use of non-portable constructs, just arguing
that they can be ok if you're aware of the risks.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 13 '06 #13

Flash Gordon

Chris Torek wrote:

>Keith Thompson wrote:
>>It's true that no additional effort is required to support the struct
hack, at least for most compilers. But if a compiler goes to the
considerable extra effort to implement array bounds checking, it could
break code that uses the struct hack.

For example:

int arr[5];
arr[5] = 42;

In most implementations, the compiler won't complain about this, and
the program will happily write the value 42 over some chunk of memory.
But it's undefined behavior, which means a compiler is *allowed* to do
anything it likes, including catching the error.

In article <12*************@news.supernews.com>
Andrey Tarasevich <an**************@hotmail.com> wrote:
>Yes, but, once again, that's a completely different case.

Not necessarily. You might *like* it to be, but Standard C (either
C89 or C99) does not say so.

>Note the important detail here: this is undefined not because the
static type of 'arr' is 'int[5]' before decay, but because the
_dynamic_ type of 'arr' is 'int[5]'. (I use the notions of "static"
and "dynamic type" borrowed from C++ terminology, but I hope
it is clear what I mean).

It is to me. The problem is that C does not require compilers to
use "dynamic types". Within any other limits of the Standard, they
can apply "static types" to array bounds-checking. Given:

/* for some valid data-type T and constant K */
struct S {
size_t n;
T obj[K]; /* actually size n */
};

and:

struct S *p = malloc(sizeof *p + (n - K) * sizeof *p->obj);
...
use(p->obj[i]);

a C compiler may legitimately insert the equivalent of:

assert((size_t)i < K);

in front of the use() call, because the only valid bounds for i
*could* be [0..K), based strictly on the type of p->obj (which
is "array K of T", whatever that constant K and type T are).

In support of the struct hack and similar constructs invoking undefined
behaviour...

How about the C99 Rationale saying that the committee had, in a response
to a defect report, said the struct hack invoked undefined behaviour and
going on to say that "there was no way to implement the “struct hack” in
C89"?

<snip>

>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)

In support of Chris I would say it falls foul of exactly the same
undefined behaviour of the struct hack. The memory is there but the type
used to access it does not allow indexing that far.

Oct 13 '06 #14

Chris Torek wrote:

...
In article <12*************@news.supernews.com>
Andrey Tarasevich <an**************@hotmail.com> wrote:
>>Yes, but, once again, that's a completely different case.

Not necessarily. You might *like* it to be, but Standard C (either
C89 or C99) does not say so.

Well, it doesn't say otherwise. Which means that the only way to solve this is
to find an indirect explanation in the Standard.

>>Note the important detail here: this is undefined not because the
static type of 'arr' is 'int[5]' before decay, but because the
_dynamic_ type of 'arr' is 'int[5]'. (I use the notions of "static"
and "dynamic type" borrowed from C++ terminology, but I hope
it is clear what I mean).

It is to me. The problem is that C does not require compilers to
use "dynamic types".

The same can be said about static types. You seem to be missing the most
important detail here: the indexing operation involves a _pointer_ and an
integer. Pointers have no size information associated with them, which
automatically means that any size-based restrictions imposed by the standard
apply to the run-time properties of the array pointed by the pointer. There's no
semantic difference between a pointer pointing to a 'malloc'ed array and a
pointer pointing to an automatic or static array.

Within any other limits of the Standard, they
can apply "static types" to array bounds-checking.

_That_ is what needs to be supported by a quote from the standard. Where exactly
does it say anything like this?

Given:

/* for some valid data-type T and constant K */
struct S {
size_t n;
T obj[K]; /* actually size n */
};

and:

struct S *p = malloc(sizeof *p + (n - K) * sizeof *p->obj);
...
use(p->obj[i]);

a C compiler may legitimately insert the equivalent of:

assert((size_t)i < K);

in front of the use() call, because the only valid bounds for i
*could* be [0..K), based strictly on the type of p->obj (which
is "array K of T", whatever that constant K and type T are).

No, the compiler cannot do that. It can issue a non-mandatory compile-time
diagnostic if it has a reason to believe that the run-time value of 'i' will get
out of the declared bounds of the array, but that is as much as it can do. Pointer
arithmetic, as described in the standard, is limited solely by the run-time
properties of the underlying array.

>>With arrays declared as in your example it is easy to
determine because their static type is always the same as their dynamic type.

Note also that the very same compiler will probably not catch the error in

int arr[5];
int* const p = arr;
p[5] = 42;

meaning that it treats these two situations differently,

Some bounds-checking systems will not catch this, and some will.
The "best" ones (where the degree of "better" or "worse" is based
on my personal opinion :-) ) happen to implement this such that
pointers pointing to malloc()ed areas have bounds determined
dynamically. That is, they will use the "dynamic type" as you
have suggested.

That would be the only correct way to implement this. Of course, that would be a
run-time test, which is not exactly what we are talking about.

But Standard C allows them to use the "static
type", and if they do, the code will abort at runtime.

No. There's no place in the standard that would explicitly allow the use of the
"static type". The wording in the pointer arithmetic section applies, of course,
to the "dynamic type", since it is the general case.

>>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)

My example indeed suffers from the bad conversion. Otherwise, forgetting about
the conversion issue for a second, the indexing is valid simply because it is
applied to a pointer of type 'int*' that points to the beginning of an array of
100 'int's. It is actually you, who has to identify the section of the standard
that would show that a non-accepting compiler is compliant. So far you said that
standard allows compilers to use the "static type" in this case. I don't see it
in the standard. At least, the pointer arithmetic section doesn't say anything
like this.

--
Best regards,
Andrey Tarasevich

Oct 17 '06 #15

Flash Gordon wrote:

...

>>In the following example

int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'. A compiler that
issues an error in this case (i.e. refuses to compile the code)
would be non-compliant.

I disagree; you need to identify the section(s) of the Standard
that would cause that compiler to be non-compliant. (This is
probably better discussed in comp.std.c, though.)

In support of Chris I would say it falls foul of exactly the same
undefined behaviour of the struct hack. The memory is there but the type
used to access it does not allow indexing that far.
...

No, being worded like that it is entirely incorrect. The _type_ that
participates in the indexing operation in place of the actual array is a
_pointer_ type. Pointer types cannot allow or disallow indexing. Whatever
limitations are imposed on indexing (or, more precisely, binary + and -
operations involving pointers and integers) are based solely on run-time
properties of the underlying memory block, which, of course, cannot be described
through the notion of 'type'.

--
Best regards,
Andrey Tarasevich

Oct 17 '06 #16

Chris Torek wrote:

...

I just did some research on the "struct hack" and found the following DR

http://www.open-std.org/jtc1/sc22/wg...cs/dr_051.html

(also take a look at http://www.open-std.org/jtc1/sc22/wg...cs/dr_178.html)

According to the DR51 the problem with the "struct hack" is not the violation of
the addressing arithmetic rules by themselves. No, it is noticeably different.

Consider the 'ptr->obj[5]' expression (in terms of OP's example). This expression
uses pointer 'ptr' to access the contents of the struct object of type 'a' and
goes outside the declared bounds of that struct object. This is the _real_
problem. It exists at the entire 'struct' level only, but not at the member
array level.

Now, just to verify one thing let's take a look at the end of 6.1.2.5 in C89/90.
People familiar with C99 probably already realized what I'm looking for. I'm
looking for the

"All pointers to structure types shall have the same representation and
alignment requirements as each other."

clause present in C99 (6.2.5/26). Is it in C89/90? No. It is not in C89/90. OK,
that explains everything. C89/90 allowed pointers to different struct types to
have different representations. This means that a pointer of type 'struct a*' is
not required to provide access "that far" from the beginning of the structure.
Note, once again, the issue here is not with the decayed array pointer and the
actual array indexing attempt, but with the struct pointer and struct member
access attempt. It exists, once again, only at the entire 'struct' level, but
not at the member array level.

Also note that nothing in these DRs says that the following

b* pb = ptr->ob;
pb[5] = <whatever>;

is illegal, since now the struct pointer is not involved in the access operation
anymore. Also, note that what was added to C99 in 6.2.5/26 (see quote above)
effectively removes the "different pointer representations" issue mentioned in
DR#51, which means that in C99 the "struct hack" becomes legal even when the
trailing array size is explicitly specified (unless there's some other clause in
C99 that prohibits it in some other way).
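
Independently of how the fixed-size variant is judged, C99 also defines a flexible array member, which gives the same layout without relying on any of the arguments above. A minimal sketch follows. The names 'a', 'b' and 'ob' follow the ones used in this thread, while the 'count' member and the helpers a_alloc and a_set are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int b;            /* assumed element type, stand-in for the OP's 'b' */

struct a {
    size_t count;         /* assumed bookkeeping member, not from the thread */
    b ob[];               /* C99 flexible array member: no declared bound */
};

/* Hypothetical helper: allocate a struct with room for 'count' elements. */
struct a *a_alloc(size_t count)
{
    struct a *ptr = malloc(sizeof *ptr + count * sizeof ptr->ob[0]);
    if (ptr != NULL) {
        size_t i;
        ptr->count = count;
        for (i = 0; i < count; i++)
            ptr->ob[i] = 0;
    }
    return ptr;
}

/* Hypothetical helper: store through the struct pointer, with the bound
   taken from the allocation rather than from the declared type. */
void a_set(struct a *ptr, size_t i, b value)
{
    assert(i < ptr->count);
    ptr->ob[i] = value;
}
```

With this form, ptr->ob[5] on a struct allocated for 10 elements is an access the standard describes explicitly, so the question of the declared size 1 does not arise.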

--
Best regards,
Andrey Tarasevich

Oct 17 '06 #17

Andrey Tarasevich <an**************@hotmail.com> writes:
[...]

The same can be said about static types. You seem to be missing the
most important detail here: the indexing operation involves a
_pointer_ and an integer. Pointers have no size information
associated with them, which automatically means that any size-based
restrictions imposed by the standard apply to the run-time
properties of the array pointed by the pointer. There's no semantic
difference between a pointer pointing to a 'malloc'ed array and a
pointer pointing to an automatic or static array.

How do you know that there's no size information associated with
pointers?

Certainly there doesn't have to be. The common implementation model
is one in which pointers are just machine addresses, with no additional
information about what they point to, but that's not the only
possibility.

Consider a simple example:

int *ptr = malloc(10 * sizeof *ptr);
if (ptr != NULL) {
    int i = ptr[20];
}

In a typical implementation where pointers are simple addresses, this
error will not be directly detected. Depending on what happens to be
at ptr+20, it might crash, or it might set i to some arbitrary value,
among infinitely many other possibilities. It's undefined behavior.

But since it's undefined behavior, an implementation *could* represent
ptr as a structure consisting of:
the base address of the object into which it points;
an offset; and
the size of the object
and use that information to perform bounds checking.

So ptr itself would contain the address returned by malloc(), an
offset of 0, and a size of 10 (or perhaps 10*sizeof(int)). ptr[20] is
equivalent to *(ptr+20). ptr+20 would consist of the same base
address as ptr, an offset of 20, and a size of 10. Applying "*" to
this value violates the bounds check. All perfectly legal, since the
check can be triggered only in cases where the behavior is already
undefined.
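
The representation described above can be sketched in plain C. This is only an illustration under stated assumptions: the type checked_ptr and the functions checked_malloc, checked_add, checked_deref and bounds_demo are hypothetical names, the size is counted in int elements, and a failed check returns NULL where a real implementation would more likely trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "fat pointer" to int: base address, offset, object size. */
typedef struct {
    int   *base;    /* address returned by malloc() */
    size_t offset;  /* element offset from base */
    size_t size;    /* number of int elements in the object */
} checked_ptr;

checked_ptr checked_malloc(size_t count)
{
    checked_ptr p;
    p.base = malloc(count * sizeof *p.base);
    p.offset = 0;
    p.size = (p.base != NULL) ? count : 0;
    return p;
}

/* ptr + n: same base and size, larger offset. Nothing is checked yet. */
checked_ptr checked_add(checked_ptr p, size_t n)
{
    p.offset += n;
    return p;
}

/* Applying "*": yields NULL when the bounds check fails. */
int *checked_deref(checked_ptr p)
{
    if (p.base == NULL || p.offset >= p.size)
        return NULL;
    return p.base + p.offset;
}

/* Mirrors the example above: 10 elements, so index 20 is rejected.
   Returns 1 if the checks ran, 0 if malloc() failed. */
int bounds_demo(void)
{
    checked_ptr ptr = checked_malloc(10);
    int ok = 0;
    if (ptr.base != NULL) {
        assert(checked_deref(checked_add(ptr, 9)) != NULL);
        assert(checked_deref(checked_add(ptr, 20)) == NULL);
        ok = 1;
    }
    free(ptr.base);
    return ok;
}
```

Note that the sketch has to pick a value for 'size' at allocation time, which is precisely the choice the following question is about.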

The question is, where does the size information come from? In the
case of the struct hack, is the size 1 (as specified in the member
array declaration), or is it the size allocated by malloc()?

I don't know whether the standard really answers this question one way
or the other.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 17 '06 #18

Andrey:

> int arr[100];
int (*parr)[5] = (int(*)[5]) &arr;

(*parr)[5] = 42;

the array access is also perfectly legal even though it seems to
violate the array bound of static type of '*parr'.

Andrey (later):

No, being worded like that it is entirely incorrect. The _type_ that
participates in the indexing operation in place of the actual array is a
_pointer_ type. Pointer types cannot allow or disallow indexing.

Indeed. Given int *ptr = &i for some integer i, ptr+0 and ptr+1 may be
calculated, but all other calculations are disallowed, as is access
to ptr[1]. It is not the *type* of ptr that disallows indexing, it is the
actual object pointed to. In this case, ptr points to a non-array object,
so it is seen as an array object of size 1.
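
A small sketch of that rule, with the hypothetical function name one_past_distance chosen for illustration. It forms only the two pointer values that are allowed and dereferences neither beyond the object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance between '&i + 1' and '&i'. */
ptrdiff_t one_past_distance(void)
{
    int i = 0;
    int *ptr = &i;        /* points to a non-array object */
    int *end = ptr + 1;   /* allowed: one past the "array of size 1" */

    /* Forming ptr + 2, or evaluating *end, would be undefined behaviour,
       so neither is attempted here. */
    assert(ptr + 0 == &i);
    return end - ptr;
}
```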

I think I agree with Andrey. Within the context of pointer arithmetic the
standard talks only about array objects. And as far as I can see, the only
way to get array objects is through array declaration (more about malloc a
bit later). And, an array object remains fixed in size, whatever
intermediate type transformations are present.
So also (*parr)[42] = 42 is valid.

Malloc *is* a bit different. But also here, the standard (at least C99)
pretty much excludes the struct hack (with a trailing array of fixed size).
Malloc is described as returning an object or an array of objects of the
type of the pointer the result is assigned to. In this case, the *type*
defines the size of the trailing array object; not the memory allocated.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 18 '06 #19

Dietmar Schindler

Dik T. Winter wrote:

Malloc is described as returning an object or an array of objects of the
type of the pointer the result is assigned to. In this case, the *type*
defines the size of the trailing array object; not the memory allocated.

N1124 7.20.3.3 The malloc function
....
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is
specified by size ...

--
Dietmar Schindler

Oct 20 '06 #20

In article <45***********@arcor.de> Dietmar Schindler <dS***@arcor.de> writes:

Dik T. Winter wrote:
Malloc is described as returning an object or an array of objects of the
type of the pointer the result is assigned to. In this case, the *type*
defines the size of the trailing array object; not the memory allocated.

N1124 7.20.3.3 The malloc function
...
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is
specified by size ...

7.20.3:
"The pointer returned if the allocation succeeds is suitably aligned
so that it may be assigned to a pointer to any type of object and
then used to access such an object or an array of such objects in
the space allocated..."
I love such inconsistencies. Yes, I should have looked a bit further.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 20 '06 #21

"Dik T. Winter" <Di********@cwi.nl> writes:

In article <45***********@arcor.de> Dietmar Schindler
<dS***@arcor.de> writes:
Dik T. Winter wrote:
Malloc is described as returning an object or an array of objects of the
type of the pointer the result is assigned to. In this case, the *type*
defines the size of the trailing array object; not the memory allocated.
>
N1124 7.20.3.3 The malloc function
...
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is
specified by size ...

7.20.3:
"The pointer returned if the allocation succeeds is suitably aligned
so that it may be assigned to a pointer to any type of object and
then used to access such an object or an array of such objects in
the space allocated..."
I love such inconsistencies. Yes, I should have looked a bit further.

What inconsistency are you referring to? You wrote above that:

Malloc is described as returning an object or an array of objects
of the type of the pointer the result is assigned to.

Where did you get that description? (malloc() doesn't return an
object or an array of objects; it returns a pointer value of type
void*.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 21 '06 #22

In article <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

"Dik T. Winter" <Di********@cwi.nl> writes:
In article <45***********@arcor.de> Dietmar Schindler
<dS***@arcor.de> writes:
Dik T. Winter wrote:
Malloc is described as returning an object or an array of objects of the
type of the pointer the result is assigned to. In this case, the *type*
defines the size of the trailing array object; not the memory allocated.
>
N1124 7.20.3.3 The malloc function
...
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is
specified by size ...
7.20.3:
"The pointer returned if the allocation succeeds is suitably aligned
so that it may be assigned to a pointer to any type of object and
then used to access such an object or an array of such objects in
the space allocated..."
I love such inconsistencies. Yes, I should have looked a bit further.

What inconsistency are you referring to? You wrote above that:
Malloc is described as returning an object or an array of objects
of the type of the pointer the result is assigned to.
Where did you get that description? (malloc() doesn't return an
object or an array of objects; it returns a pointer value of type
void*.)

Well, I thought the description was a bit inconsistent. But that is just
my opinion.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 21 '06 #23

Some additional thoughts about this:

In article <J7********@cwi.nl> "Dik T. Winter" <Di********@cwi.nl> writes:

Andrey:

> int arr[100];
> int (*parr)[5] = (int(*)[5]) &arr;
>
> (*parr)[5] = 42;

Note that if this were not permitted, then given
int *qarr = (int *)&arr;
qarr[5] would not be permitted either.
The only difference is that parr is a pointer to array 5 of int, and qarr
is a pointer to int (which, when indexing, would be assumed to point to an
array 1 of int). So when indexing, the actual *type* of the pointer is not relevant.

This does *not* destroy the possibility of bounds checking. When an object
is created, remember first address and size of the object and transfer that
to every pointer pointing into the object (or one past the object).

I see however one problem. Consider:
typedef struct {int p, q;} structure;
structure x;
is:
(&(x.p))[1]
correct? You would not like that, but it is nevertheless allowed
(arithmetic is still within the main object). Unless you consider that
x.p denotes a member object (of type int), and use that in the description
of valid indexing. But if you do that you get into problems with:
int a[5];
(&(a[3]))[1];
because now a[3] denotes an element object, and this indexing would also
not be allowed (the element object is also of type int).

The only solution (in my opinion) is to consider member objects (of structs
and unions) as true objects, while element objects are not so considered,
in the context of indexing.
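
The two cases can be put side by side in a short sketch. The function names element_index_demo and member_adjacent_demo are hypothetical. The member case only forms and compares addresses, so it does not depend on which reading of the standard is right, and its result is deliberately not assumed.

```c
#include <assert.h>

typedef struct { int p, q; } structure;

/* Element case: (&(a[3]))[1] designates a[4]; the arithmetic stays
   inside the array object 'a'. */
int element_index_demo(void)
{
    int a[5] = { 10, 11, 12, 13, 14 };
    assert(&(a[3]) + 1 == &a[4]);
    return (&(a[3]))[1];
}

/* Member case: forms '&x.p + 1' (one past the member object, valid as a
   pointer value) and compares it with '&x.q'. Nothing is dereferenced.
   The result depends on the implementation (padding, and how the
   compiler treats such comparisons), so callers should not rely on it. */
int member_adjacent_demo(void)
{
    structure x = { 1, 2 };
    int *after_p = &x.p + 1;
    return after_p == &x.q;
}
```

Even when member_adjacent_demo reports that the addresses coincide, that says nothing about whether (&(x.p))[1] may be evaluated, which is the open question here.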
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Oct 21 '06 #24