Bytes | Software Development & Data Engineering Community

Making Fatal Hidden Assumptions

We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and pdf format, by searching for N869
and N1124. [1] The latter does not have a text version, but is
more up-to-date.

We will always have innocent appearing code with these kinds of
assumptions built-in. However it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused under other systems.

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' line-number references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
   const char *p;                    /* 5 */
   int d;

   p = s - 1;
   do {
      p++;                           /* 10 */
      if ((((int) p) & (SW - 1)) == 0) {
         do {
            d = *((int *) p);
            p += SW;
         } while (!hasNulByte (d));  /* 15 */
         p -= SW;
      }
   } while (*p != 0);
   return p - s;
}                                    /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually sizeof(int) later. No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

Line 4. Since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.

In line 8 we come to a biggie. The standard specifically does not
guarantee the action of a pointer below an object. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer like information embedded, such as segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.

Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, and of type ptrdiff_t; here it
is always non-negative, so it will fit into a size_t). It then does
a concealed cast
of this into an int, which could cause undefined or implementation
defined behaviour if the value exceeds what will fit into an int.
This one is also unnecessary, since it is trivial to define the
return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.

Would you believe that so many hidden assumptions can be embedded
in such innocent looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I have cross-posted this without setting follow-ups, because I
believe that discussion will be valid in all the newsgroups posted.

[1] The draft C standards can be found at:
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/>

--
"If you want to post a followup via groups.google.c om, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell. org/google/>
Also see <http://www.safalra.com/special/googlegroupsrep ly/>

Mar 6 '06
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
On Wed, 08 Mar 2006 00:56:03 +0000, Robin Haigh wrote:
It's not always equivalent. The trouble starts with

char a[8];
char *p;

for ( p = a+1 ; p < a+8 ; p += 2 ) {}

intending that the loop terminates on p == a+9 (since it skips a+8). But
how do we know that a+9 > a+8 ? If the array is right at the top of some
kind of segment, the arithmetic might have wrapped round.


a+9 > a+8 because a + 9 - (a + 8) == 1, which is > 0. Doesn't matter if
the signed or unsigned pointer value wrapped around in an intermediate
term. On many machines that's how the comparison is done anyway. You're
suggesting that having the compiler ensure that a+8 doesn't wrap around
wrt a is OK, but a+9 is too hard. I don't buy it.


How would you guarantee that a+(i+1) > a+i for all arbitrary values of
i? It's easy enough to do this when the addition doesn't go beyond
the end of the array (plus the case where it points just past the end
of the array), but if you want to support arbitrary arithmetic beyond
the array bounds, it's going to take some extra work, all for the sake
of guaranteeing properties that have *never* been guaranteed by the C
language. (Don't confuse what happens to have always worked for you
with what's actually guaranteed by the language itself.)

[...]
Ints have the nice property that 0 is in the middle and we know how much
headroom we've got either side. So it's easy for the compiler to make
the int version work (leaving it to the programmer to take
responsibility for avoiding overflow, which is no big deal).


Unsigned ints have the nice property that (a + 1) - 1 == a for all a, even
if a + 1 == 0. Overflow is generally no big deal in any case. (Other
than the object larger than half the address space issue.)


But unsigned ints *don't* have the property that a + 1 > a for all a.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 8 '06 #51
Ben Pfaff wrote:
Andrew Reilly <an*************@areilly.bpc-users.org> writes:
So the standards body broke decades of practice and perfectly safe
and reasonable code to support a *hypothetical* implementation
that was so stupid that it checked pointer values, rather than
pointer *use*?


You can still write your code to make whatever assumptions you
like. You just can't assume that it will work portably. If,
for example, you are writing code for a particular embedded
architecture with a given compiler, then it may be reasonable
to make assumptions beyond those granted by the standard.


Which was the point of my little article in the first place. If,
for one reason or another, you need to make assumptions, document
what they are. To do this you first need to be able to recognize
those assumptions. You may easily find you could avoid making the
assumption in the first place without penalty, as in this
particular "p = s - 1;" case.

Why is everybody concentrating on the one real error in the example
code, and not on the hidden assumptions? Errors are correctable;
assumptions need to be recognized.

Mar 8 '06 #52
msg
OK, I've made enough of a fool of myself already. I'll go and have that
second cup of coffee for the morning, before I start going on about having
the standard support non-2's complement integers, or machines that have no
arithmetic right shifts...


I get queasy reading the rants against 1's complement architectures; I
wish Seymour Cray were still around to address this.

Michael Grigoni
Cybertheque Museum
Mar 8 '06 #53
Andrew Reilly wrote:
On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
K&R answers your question. If pa points to some element of an array,
then pa-1 points to the /previous element/. But what's the "previous
element" relative to the first element in the array? It doesn't exist. So
we have undefined behavior.


Only because the standard says so. Didn't have to be that way. There are
plenty of logically correct algorithms that could exist that involve
pointers that point somewhere outside of a[0..N]. As long as there's no
de-referencing, no harm, no foul. (Consider the simple case of iterating
through an array at a non-unit stride, using the normal p < s + N
termination condition. The loop finishes with p > s + N and the standard
says "pow, you're dead", when the semantically identical code written with
integer indexes has no legal problems.)


I have encountered situations where
free(p);
....
if (p == q)
leads to the platform's equivalent of the much beloved
"segmentation fault". Your theory means that this should
have worked. Assigning NULL or a valid address to p after
freeing avoids the error.

<OT>Incidentally, in gnu.gcc.help there is a discussion about
much the same situation in C++ where someone gets in trouble
for delete a; .... if (a == b) ...
Happens only for multiple inheritance and only for gcc.
Thread starts at <Eg0Pf.110735$sa3.34921@pd7tw1no>
</OT>

[snip!]

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Mar 8 '06 #54
msg

Since most coding for AS/400s was (is still?) done in COBOL and PL/1
Most coding was in flavors of RPG.

since AS/400s are actually Power processors with a JIT over the top now


There is a large installed base of CISC (non Power-PC) AS/400s; they use
a flat memory model reminiscent of the Multics design.

Michael Grigoni
Cybertheque Museum

Mar 8 '06 #55
On Wed, 08 Mar 2006 03:33:10 +0000, Keith Thompson wrote:
But unsigned ints *don't* have the property that a + 1 > a for all a.


My last comment on the thread, hopefully:

No, they don't, but when you're doing operations on pointer derivations
that are all in some sense "within the same object", even if hanging
outside it, (i.e., by dint of being created by adding integers to a single
initial pointer), then the loop termination condition is, in a very real
sense, a ptrdiff_t, and *should* be computed that way. The difference can
be both positive and negative.

The unsigned comparison a + n > a fails for some values of a, but the
ptrdiff_t (signed) comparison a + n - a > 0 is indeed true for all a and
n > 0, so that's what should be used. And it *is* what is used on most
processors that do comparison by subtraction (even if that's wrapped in a
non-destructive cmp).

I actually agree completely with the piece of K&R that you posted a few
posts ago, where it was pointed out that pointer arithmetic only makes
sense for pointers within the same object (array). Since C doesn't tell
you that the pointer that your function has been given isn't somewhere in
the middle of a real array, my aesthetic sense is that conceptually,
arrays (as pointers, within functions) extend infinitely (or at least to
the range of int) in *both* directions, as far as pointer arithmetic
within a function is concerned. Actually accessing values outside of the
bounds of the real array that has been allocated somewhere obviously
contravenes the "same object" doctrine, and it's up to the logic of the
caller and the callee to avoid that.

Now it has been amply explained that my conception of how pointer
arithmetic ought to work is not the way the standard describes, even
though it *is* the way I have experienced it in all of the C
implementations that it has obviously been my good fortune to encounter.
I consider that to be a pity, and obviously some of my code wouldn't
survive a translation to a Burroughs or AS/400 machine (or perhaps even to
some operating environments on 80286es). Oh, well. I can live with that.
It's not really the sort of code that I'd expect to find there, and I
don't expect to encounter such constraints in the future, but I *will* be
more careful, and will keep my eyes more open.

Thanks to all,

--
Andrew

Mar 8 '06 #56
Randy Howard wrote:
Andrew Reilly wrote
On Tue, 07 Mar 2006 15:59:37 -0700, Al Balmer wrote:

An implementation might choose, for valid reasons, to prefetch
the data that pointer is pointing to. If it's in a segment not
allocated ...


Hypothetical hardware that traps on *speculative* loads isn't
broken by design? I'd love to see the initialization sequences,
or the task switching code that has to make sure that all pointer
values are valid before they're loaded. No, scratch that. I've
got better things to do.


This is a lot of whining about a specific problem that can
easily be remedied just by changing the loop construction. The
whole debate is pretty pointless in that context, unless you
have some religious reason to insist upon the method in the
original.


Er, didn't I point that fix out in the original article? That was
the only error in the original sample code, all other problems can
be tied to assumptions, which may be valid on any given piece of
machinery. The point is to avoid making such assumptions, which
requires recognizing their existence in the first place.

Mar 8 '06 #57
On Tue, 07 Mar 2006 19:10:10 -0500, CBFalconer wrote:
Why is everybody concentrating on the one real error in the example code,
and not on the hidden assumptions? Errors are correctable; assumptions
need to be recognized.


Well I, for one, commented on the hidden assumption that must be made for
what you call "the one real error" to actually be an error -- but it was
not recognised! ;-)

[Your original post did not, in fact, claim at the top that this was
an error, but you call it a "real error" later on.]

I feel that your points would have been better made using other examples.
The context of the code made me read the C as little more than pseudo-code
with the added advantage that a C compiler might, with a following wind,
produce something like the assembler version (which in turn has its own
assumptions but you were not talking about that).

I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
example more convincing, because I have seen that in programs that are
intended to be portable -- I am pretty sure I have written such things
myself in my younger days. Have you other more general examples of
dangerous assumptions that can sneak into code? A list of the "top 10
things you might be assuming" would be very interesting.

--
Ben.
Mar 8 '06 #58
CBFalconer wrote
(in article <44***************@yahoo.com>):
Randy Howard wrote:
This is a lot of whining about a specific problem that can
easily be remedied just by changing the loop construction. The
whole debate is pretty pointless in that context, unless you
have some religious reason to insist upon the method in the
original.


Er, didn't I point that fix out in the original article?


Yes, which is precisely why I'm surprised at the ensuing debate
over the original version, as my comments should reflect.

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Mar 8 '06 #59
Rod Pemberton wrote
(in article <du***********@news3.infoave.net>):

"Gerry Quinn" <ge****@DELETETHISindigo.ie> wrote in message
news:MP************************@news1.eircom.net...
In article <44***************@yahoo.com>, cb********@yahoo.com says...
These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason.


About as good a reason as the term niggardly, as far as I can tell.
Perhaps the words are appropriate in a post relating to fatal
assumptions.


I didn't know what he meant either. Not being racist (at least I hope not),
I went GIYF'ing. I think it might be a reference to some Dungeons and
Dragons persona or something. Unfortunately, he'd need to clarify...


Oh come on. Doesn't anyone own a dictionary anymore, or have a
vocabulary which isn't found solely on digg, slashdot or MTV?

blackguard:
A thoroughly unprincipled person; a scoundrel.
A foul-mouthed person.
Does everything have to become a racism experiment?


Mar 8 '06 #60
