Bounds checked arrays

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

This situation is intolerable for people who know
that errors are easy to make, and that having today's
powerful microprocessors execute a few more
instructions at each array access will make no
noticeable difference as far as speed is concerned.

Not all C applications are real-time apps.

Besides, there are the viruses
and other malicious software that are using
this problem in the C language to do their dirty
work.

Security means that we avoid the consequences
of mistakes and expose them as soon as possible.

It would be useful then, if we introduced into C

#pragma STDC bounds_checking (ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests
that each index is well formed.
Each index runs from zero to n-1, i.e. it
must be greater than or equal to zero and
less than "n".

In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.
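
To make this concrete, here is a rough sketch of what the
emitted checks could amount to for the array[2][3] declaration
above, written by hand in today's C. The names ROWS, COLS,
checked_at and index_error are made up for illustration; they
are not part of the proposal.

```c
#include <stdio.h>
#include <stdlib.h>

enum { ROWS = 2, COLS = 3 };
static int array[ROWS][COLS];

/* Hypothetical stand-in for the default action: report and stop. */
static void index_error(const char *what, int index, int bound)
{
    fprintf(stderr, "index error: %s index %d not in 0..%d\n",
            what, index, bound - 1);
    abort();
}

/* Two dimensions, so two index tests before the element is touched. */
static int *checked_at(int i, int j)
{
    if (i < 0 || i >= ROWS)
        index_error("row", i, ROWS);
    if (j < 0 || j >= COLS)
        index_error("column", j, COLS);
    return &array[i][j];
}
```

With this, *checked_at(1, 2) = 42; stores into array[1][2],
while checked_at(2, 0) stops the program instead of touching
whatever lies behind the array.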

Obviously, optimizations are possible, and
good compilers will optimize away many tests,
especially in loops. This is left unspecified.

What matters is that array updates
can't overflow into neighboring memory areas.

How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:
    test if index >= 0
    jump if not "indexerror"
    test if index < "n"
    jump if not "indexerror"

where "n" is a compile time constant.
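
As a sketch of the same test in C (function names are
mine, chosen for illustration): the first function is the
literal two-comparison form, the second uses the common trick
of comparing as unsigned, so a negative index wraps to a very
large value and a single comparison rejects both cases. It
assumes "n" is not negative.

```c
#include <stdbool.h>

/* The two tests from the text: index >= 0 and index < n. */
static bool in_bounds_two_tests(int index, int n)
{
    return index >= 0 && index < n;
}

/* One comparison: converting a negative int to unsigned yields a
   value above INT_MAX, which can never be less than a valid n.
   Assumes n >= 0. */
static bool in_bounds_one_test(int index, int n)
{
    return (unsigned)index < (unsigned)n;
}
```

A compiler is free to pick whichever form is cheaper on the
target; both accept exactly the indices 0 to n-1.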

We have something like 4 cycles then, which
a 2 GHz machine does in 0.000 000 002 seconds
(one cycle takes 0.5 nanoseconds).

Yes, table access is a common operation, but
even a million of those checks add only about
2 milliseconds to the running time. We are not on
the PDP-11 any more.

This would make C a little bit easier to program,
and the resulting programs of better quality.
Buffer overflows happen of course, but the language
limits the consequences by enforcing limits.

By default the behavior is to stop the program.
The user can override this and specify different
schemes for the action to take when
a buffer overflow happens.

A simple strategy is to just do nothing.

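#include <stdbool.h>
#include <stdio.h>   /* BUFSIZ */

bool fn(char *input)
{
    char tmpbuf[BUFSIZ];
    int i = 0;
    bool result = false;

    while (*input) {
        /* with bounds checking ON, a bad index jumps to indexerror */
        tmpbuf[i++] = *input++;
    }
    // Do things with the input
    // set result
    return result;
indexerror:
    return false;
}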

This function uses the built-in error checking
to avoid any bad consequence of an overflow.
If the input data is too long, it is malformed
input that should be discarded.

This frees the programmer from the tedious task
of writing
if (i >= sizeof(tmpbuf)) goto indexerror;

at EACH array access. This can be done better
by a machine and the compiler.
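
For comparison, a sketch of the same function with the test
written out by hand in standard C, which is roughly what the
compiler would be generating on the programmer's behalf. The
name fn_checked and the placeholder result ("non-empty input
is accepted") are assumptions for the sake of a complete
example.

```c
#include <stdbool.h>
#include <stdio.h>   /* BUFSIZ, size_t */

bool fn_checked(const char *input)
{
    char tmpbuf[BUFSIZ];
    size_t i = 0;

    while (*input) {
        if (i >= sizeof tmpbuf)   /* the test written by hand */
            goto indexerror;
        tmpbuf[i++] = *input++;
    }
    /* Placeholder for "do things with the input":
       here any non-empty input counts as success. */
    return i > 0 && tmpbuf[0] != '\0';
indexerror:
    return false;   /* over-long input is treated as malformed */
}
```

An input of up to BUFSIZ characters is copied and accepted;
anything longer returns false without writing past tmpbuf.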

A program like fn() written in today's C
***assumes*** that the input length
can't be bigger than BUFSIZ.

This is always *implicitly* assumed and
nowhere *enforced*, by the way. The current
state implies that catastrophic errors can happen
if the index starts overwriting separate memory
areas like the return address...

Everyone knows this. Let's do something to
stop it. Something simple, without too much
fuss.

In the fn() example, the compiler generates code
that, in case of an index error,
jumps to the indexerror label and does what the
programmer specifies there.

The motto of C is: Trust the programmer.

We just have to allow him/her to specify what to do
in case of overflow.

Trust the programmer doesn't mean that we trust
that he never makes a mistake, of course. It means
that the programmer can specify what actions
to take in case of error, and that the language
provides sensible defaults.

The default, then, is to terminate the program, as
the assert() macro does, another useful construct.
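
One possible shape for "stop by default, let the programmer
override" is sketched below in plain C. Everything here
(bounds_handler, set_bounds_handler, CHECK_INDEX,
counting_handler) is a hypothetical name invented for this
sketch; the proposal itself does not define any such
interface.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler type: called when an index test fails. */
typedef void (*bounds_handler)(const char *file, int line,
                               long index, long bound);

/* Default: report and terminate, much like a failed assert(). */
static void default_bounds_handler(const char *file, int line,
                                   long index, long bound)
{
    fprintf(stderr, "%s:%d: index %ld out of range 0..%ld\n",
            file, line, index, bound - 1);
    abort();
}

static bounds_handler current_handler = default_bounds_handler;

/* Install a new handler (NULL restores the default); returns the old one. */
static bounds_handler set_bounds_handler(bounds_handler h)
{
    bounds_handler old = current_handler;
    current_handler = h ? h : default_bounds_handler;
    return old;
}

/* Returns 1 for a valid index; otherwise calls the handler and,
   if the handler returns, yields 0 so the caller can bail out. */
static int check_index(long index, long bound, const char *file, int line)
{
    if (index >= 0 && index < bound)
        return 1;
    current_handler(file, line, index, bound);
    return 0;
}

#define CHECK_INDEX(i, n) check_index((long)(i), (long)(n), __FILE__, __LINE__)

/* Example override: count errors instead of stopping. */
static int bounds_errors;
static void counting_handler(const char *file, int line,
                             long index, long bound)
{
    (void)file; (void)line; (void)index; (void)bound;
    bounds_errors++;
}
```

After set_bounds_handler(counting_handler), a failing
CHECK_INDEX(3, 3) returns 0 and bumps bounds_errors instead
of aborting; without the override, the program stops at the
first bad index.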

Note that this proposal doesn't change anything
in the language. No new constructs, even if
compilers could provide arrangements like the
one proposed above.

I propose then:

#pragma STDC bounds_checking (ON/OFF)

that should be written outside a function scope.

That's all.

This proposal is an invitation to
brain-storming..:-)

I know that anyone using C is aware of this.
So, let's fix it.

jacob
Nov 14 '05
On 16 Feb 2004 14:38:33 GMT, Da*****@cern.ch (Dan Pop) wrote:
> Bound checking is well defined on most languages which either don't
> support pointers at all (Fortran <= F77) or have a very restricted notion
> of pointers (Fortran >= F90 and Pascal).

Although many, probably most, F77 *implementations* do as extensions,
and I believe at least some Pascal implementations relax the language
restrictions. Raising exactly the same problems, of course.

PL/I has unrestricted pointers, and Ada has both (restricted and un),
which can't be checked except for special cases; but they (both) have
first-class arrays, at least w.r.t. passing as arguments, which thus
*can* be and are checked, unlike C.

In the no-pointers-at-all category you could add APL, BASIC, and LISP.
And if you can suppress your gag reflex long enough, COBOL.

(other problems snipped)

> Sorry, but C isn't the language for people needing bound checking.
> Unfortunately, far too many such people do program in C... And even if
> bound checking is eventually introduced, most of those people would
> be the last ones to enable it in their code ;-)

(Sadly) Agree. Unless it were mandatory. Even if it was the default,
many people who need it would go to the trouble of *disabling* it.
- David.Thompson1 at worldnet.att.net
Nov 14 '05 #51
