Bounds checked arrays

jacob navia

As everybody knows, the C language lacks
a way of specifying bounds-checked arrays.

This situation is intolerable for people who know
that errors are easy to make. Having today's
powerful microprocessors execute a few more
instructions at each array access will not make any
difference as far as speed is concerned.

Not all C applications are real-time apps.

Besides, there are the viruses
and other malicious software that are using
this problem in the C language to do their dirty
work.

Security means that we avoid the consequences
of mistakes and expose them as soon as possible.

It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests
that each index is well formed.
Each index runs from zero to n-1, i.e. it
must be greater than or equal to zero and less than
"n".

In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

Obviously, optimizations are possible, and
good compilers will optimize away many tests,
especially in loops. This is left unspecified.
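
To illustrate what such an optimization might look like (a hand-written sketch in plain C, not the output of any particular compiler): when every index in a loop stays below the loop count, one test before the loop can stand in for the test at each access. The names sum_checked and sum_hoisted and the -1 error return are made up for the example.

```c
#include <stddef.h>

#define N 8

/* Naive form: one bounds test per access. */
long sum_checked(const int table[N], size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (i >= N)        /* test repeated on every iteration */
            return -1;     /* hypothetical "indexerror" outcome */
        sum += table[i];
    }
    return sum;
}

/* Hoisted form: inside the loop i < count always holds, so a single
   test of count against N before the loop covers every access. */
long sum_hoisted(const int table[N], size_t count)
{
    if (count > N)
        return -1;
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += table[i];
    return sum;
}
```

Both versions reject a count larger than the array; the second one simply pays for the test once instead of once per element.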

What matters is that array updates
can't overflow into neighboring memory areas.

How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:
test if index >= 0
jump if not "indexerror"
test if index < "n"
jump if not "indexerror"

where "n" is a compile time constant.
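
Written out in C, the four instructions correspond to something like the first function below. The second shows a well-known way to fold both tests into a single unsigned comparison, under the assumption that "n" is never negative. The function names are invented for this sketch.

```c
#include <stdbool.h>

/* The two tests from the abstract assembly, written out in C. */
bool index_ok_two_tests(long index, long n)
{
    if (index < 0)       /* test if index >= 0, else "indexerror" */
        return false;
    if (index >= n)      /* test if index < n, else "indexerror" */
        return false;
    return true;
}

/* After conversion to an unsigned type a negative index becomes a
   very large value, so one comparison covers both tests.
   Assumes n >= 0. */
bool index_ok_one_test(long index, long n)
{
    return (unsigned long)index < (unsigned long)n;
}
```

With the second form the check is one comparison and one conditional jump per index.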

We have something like 4 cycles then, which
a 2 GHz machine does in 0.000 000 002 seconds.

Yes, table access is a common operation, but
it would take millions of those to slow the program
by even a negligible amount of time. We are not on the
PDP-11 any more.

This would make C a little bit easier to program,
and the resulting programs of better quality.
Buffer overflows would still happen of course, but the
language would limit the consequences by enforcing limits.

By default the behavior is to stop the program.
The user can override this and specify different
schemes for the action to take when
a buffer overflow happens.

A simple strategy is to just do nothing.

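#include <stdbool.h>   /* bool, false */
#include <stdio.h>     /* BUFSIZ */

bool fn(const char *input)
{
    char tmpbuf[BUFSIZ];
    int i = 0;
    bool result = false;

    while (*input) {
        /* With bounds checking ON, an out-of-range i would
           transfer control to indexerror (proposed behavior). */
        tmpbuf[i++] = *input++;
    }
    // Do things with the input
    // set result
    return result;
indexerror:            /* reached only through a failed bounds check */
    return false;
}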

This function uses the built-in error checking
to avoid any bad consequence of an overflow.
If the input data is too long, it is malformed
input that should be discarded.

This frees the programmer from the tedious task
of writing
if (i >= sizeof(tmpbuf)) goto indexerror;

at EACH array access. This can be done better
by a machine and the compiler.
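
For comparison, here is roughly what the hand-written version looks like in today's C. This is only a sketch: fn_manual and TMPBUF_SIZE are invented names, the buffer is kept small so the limit is easy to see, and returning the length stands in for "do things with the input".

```c
#include <stddef.h>
#include <string.h>

#define TMPBUF_SIZE 16   /* illustrative; the example above uses BUFSIZ */

/* Returns the length of the accepted input, or -1 if it does not fit. */
int fn_manual(const char *input)
{
    char tmpbuf[TMPBUF_SIZE];
    size_t i = 0;

    while (*input) {
        if (i >= sizeof(tmpbuf) - 1)   /* keep one byte for '\0' */
            goto indexerror;
        tmpbuf[i++] = *input++;
    }
    tmpbuf[i] = '\0';
    /* Do things with the input; here we only report its length. */
    return (int)strlen(tmpbuf);

indexerror:
    return -1;   /* input too long: treated as malformed */
}
```

The explicit test has to be repeated, and kept correct, at every place the buffer is indexed.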

A program like that today
***assumes*** that the input length
can't be bigger than BUFSIZ.

This is always *implicitly* assumed and
nowhere *enforced*, by the way. The current
state implies that catastrophic errors can happen
if the index starts overwriting separate memory
areas like the return address...

Everyone knows this. Let's do something to
stop it. Something simple, without too much
fuss.

In this case the compiler generates code that,
in case of an index error,
jumps to this label and does what the programmer
specifies.

The motto of C is: Trust the programmer.

We just have to allow him/her to specify what to do
in case of overflow.

Trusting the programmer doesn't mean that we trust
that he never makes a mistake, of course. It means
that the programmer can specify what actions
to take in case of error, and that the language
provides sensible defaults.

The default is then to terminate the program, like the
assert() macro, another useful construct.

Note that this proposal doesn't change anything
in the language. No new constructs, even if
compilers could provide arrangements like the
one proposed above.

I propose then:

#pragma STDC bounds_checking(ON/OFF)

that should be written outside a function scope.

That's all.

This proposal is an invitation to
brain-storming..:-)

I know that anyone using C is aware of this.
So, let's fix it.

jacob

Nov 14 '05 #1


osmium

jacob navia writes:

In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.

Nov 14 '05 #2

jacob navia

"osmium" <r1********@comcast.net> wrote in message
news:c0*************@ID-179017.news.uni-berlin.de...

jacob navia writes:
In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.

This must be changed. You write:

int fn(int array[2][3]);

Meaning that this function accepts only arrays of
2 lines by 3 columns. Inside the
function the dimensions are known.

Or you use *bounded* pointers. You write:

int fn(size_t lines, size_t cols, int array[lines][cols]);

This information must be passed around.
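
A minimal sketch of how the second form could be used, assuming a compiler that supports C99 variable length array parameters (they are optional in C11). The function get_cell and its 0 / -1 return convention are hypothetical; the test is written by hand here, whereas under the proposal the compiler would emit it.

```c
#include <stddef.h>

/* The dimensions travel with the array, so each index can be tested
   against them. Returns 0 on success, -1 on an out-of-range index. */
int get_cell(size_t lines, size_t cols, int array[lines][cols],
             size_t line, size_t col, int *out)
{
    /* size_t is unsigned, so only the upper limit needs a test. */
    if (line >= lines || col >= cols)
        return -1;
    *out = array[line][col];
    return 0;
}
```

A caller with int a[2][3] would write get_cell(2, 3, a, 1, 2, &value).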

Nov 14 '05 #3

Martin Dickopp

"jacob navia" <ja***@jacob.remcomp.fr> writes:

As everybody knows, the C language lacks a way of specifying bounds
checked arrays.

Yes, but I don't think anything in the C standard /forbids/ an
implementation from checking array bounds. From the point of view of the
standard, accessing an out of bounds array element causes undefined
behavior, so the implementation is free to (e.g.) terminate the program.

<OT>
FWIW, there is or was an attempt to implement bounds checking in the
GNU C compiler. I don't know what the current state is.
</OT>
It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

I disagree. While I would find bounds checking very useful, I consider
it a quality of implementation issue, not something which should be
standardized. Remember that there are also C compilers for embedded
devices, which often operate under severe memory constraints (the
devices, not the compilers) and cannot (and need not) afford the
additional bounds checking overhead. Mandatory bounds checking in the
standard would force these compilers to implement it nevertheless.

Martin

Nov 14 '05 #4

jacob navia

"Martin Dickopp" <ex****************@zero-based.org> wrote in message
news:cu*************@zero-based.org...

"jacob navia" <ja***@jacob.remcomp.fr> writes:
As everybody knows, the C language lacks a way of specifying bounds
checked arrays.

Yes, but I don't think anything in the C standard /forbids/ an
implementation from checking array bounds. From the point of view of the
standard, accessing an out of bounds array element causes undefined
behavior, so the implementation is free to (e.g.) terminate the program.

Yes. That's precisely my proposal.

<OT>
FWIW, there is or was an attempt to implement bounds checking in the
GNU C compiler. I don't know what the current state is.
</OT>
It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

I disagree. While I would find bounds checking very useful, I consider
it a quality of implementation issue, not something which should be
standardized. Remember that there are also C compilers for embedded
devices, which often operate under severe memory constraints (the
devices, not the compilers) and cannot (and need not) afford the
additional bounds checking overhead. Mandatory bounds checking in the
standard would force these compilers to implement it nevertheless.

Very easy. In that case, the user just does NOT write

#pragma ...

in the code. That is all.

Normally, in embedded devices the code is in flash/EPROM, and
memory constraints are loose for code, but not for data (RAM).

This means that 4 more assembler instructions at each access
do not produce a memory starvation situation.

But if the code size is critical, or performance constraints
make this #pragma impossible, those implementations
simply do not accept this pragma.

As I said before, not ALL applications in C are running
in 2K RAM. Let's not impose a lowest common denominator
on all programs.

Programs that need this feature should be able to use it
in *standard* C.

Nov 14 '05 #5

Malcolm

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

There's a sub-thread on this (size of a sizeof(pointer)).

An implementation is allowed to create safe pointers that contain bounds
information. For complete safety it also has to put memory access through
gyristics to prevent corruption of the pointer itself. However very few
implementations do so, presumably for efficiency reasons.
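
As a rough sketch of the idea (hand-rolled, not how any particular implementation does it), a "safe pointer" can be modelled as a struct carrying the base address and the element count, with every access going through a checked helper. The names int_span, span_get and span_set are invented for the example.

```c
#include <stddef.h>

/* A hypothetical "safe pointer": the address plus the bounds it covers. */
struct int_span {
    int   *base;     /* first element */
    size_t length;   /* number of elements */
};

/* Checked read: returns 0 and stores the element, or -1 if out of bounds. */
int span_get(struct int_span s, size_t index, int *out)
{
    if (index >= s.length)
        return -1;
    *out = s.base[index];
    return 0;
}

/* Checked write: returns 0 on success, or -1 if out of bounds. */
int span_set(struct int_span s, size_t index, int value)
{
    if (index >= s.length)
        return -1;
    s.base[index] = value;
    return 0;
}
```

Such a struct is larger than a bare pointer, which is one place the efficiency cost shows up.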

Nov 14 '05 #6

Martin Dickopp

"jacob navia" <ja***@jacob.remcomp.fr> writes:

"Martin Dickopp" <ex****************@zero-based.org> wrote in message
news:cu*************@zero-based.org...
"jacob navia" <ja***@jacob.remcomp.fr> writes:
> It would be useful then, if we introduced into C
>
> #pragma STDC bounds_checking(ON/OFF)
I disagree. While I would find bounds checking very useful, I consider
it a quality of implementation issue, not something which should be
standardized. Remember that there are also C compilers for embedded
devices, which often operate under severe memory constraints (the
devices, not the compilers) and cannot (and need not) afford the
additional bounds checking overhead. Mandatory bounds checking in the
standard would force these compilers to implement it nevertheless.

Very easy. In that case, the user just does NOT write

#pragma ...

in the code. That is all.

Yes, but the /compiler/ would still have to implement it.

Programs that need this feature should be able to use it
in *standard* C.

You can always try to lobby the committee members. However, according to
my understanding of the standardization process, it is unlikely that the
committee will consider something unless at least some implementations
already offer it as an extension.

Martin

Nov 14 '05 #7

Nick Landsberg

jacob navia wrote:

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

This situation is intolerable for people who know
that errors are easy to make. Having today's
powerful microprocessors execute a few more
instructions at each array access will not make any
difference as far as speed is concerned.

Not all C applications are real-time apps.

But for those applications which are real-time
apps, the overhead for the bounds checking may
well be intolerable.

Besides, there are the viruses
and other malicious software that are using
this problem in the C language to do their dirty
work.

Actually, they are using the undisciplined coding
practices of dilettante C coders to do their
dirty work.

Security means that we avoid the consequences
of mistakes and expose them as soon as possible.

Yes, if possible at compile time. Otherwise, institute
real coding practices (rather than bogus ones which
say "use fixed arrays rather than malloc") and have
the code inspected by experts. A flag in the compiler
to do as much strict bounds checking as possible at
compile time would go part of the way to this end.

It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests
that each index is well formed.
Each index runs from zero to n-1, i.e. it
must be greater than or equal to zero and less than
"n".

In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

Obviously, optimizations are possible, and
good compilers will optimize away many tests,
especially in loops. This is left unspecified.

What matters is that array updates
can't overflow into neighboring memory areas.

As someone else pointed out, the calling sequences
for all subroutine calls may have to change
to pass the limits. If not, then the implementation
may have to pass these without the programmer knowing
about it. The C language has a long-standing tradition
that there is a well-known "sentinel", i.e. the null character,
which indicates the end of the array for character types.

Of more importance: what should the behaviour be
when array bounds are exceeded? You are proposing
a new standard behaviour which includes bounds checking.
(This, as someone else pointed out, would be an extension
to the language and would probably not be considered
unless there was at least one existing implementation,
an "existence proof.")

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing appropriate behaviour
when something like that happens, is, IMO, half
a solution.

There are many languages out there which perform
bounds checking. Elsethread, there are many languages
which do not have pointers. There is a need for
such languages, otherwise they would not be there,
but they are NOT C.

How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:
test if index >= 0
jump if not "indexerror"
test if index < "n"
jump if not "indexerror"

where "n" is a compile time constant.

We have something like 4 cycles then, which
a 2 GHz machine does in 0.000 000 002 seconds.

Yes, table access is a common operation, but
it would take millions of those to slow the program
by even a negligible amount of time. We are not on the
PDP-11 any more.

I differ with your analysis of the number of assembly
instructions, but that's a nit-pick. I work on systems
which need to do upwards of 10,000 database lookups
per second and the same order of magnitude of parsing
strings, etc. They involve copying information from one
memory space to another. For those applications we
use C. For applications with less stringent requirements,
e.g. "only" 1,000 database accesses per second, we use Java.

C has its place, Java has its place, other languages
have their place.

On the C applications, we take great care NOT to use
dubious constructs and code reviews by the lead
developers are required before the code even gets
to system test. (No, it does not catch all the
problems.) There is a discipline involved.
If you don't have that discipline, use another
language. (This last was not meant as a flame,
rather a simple statement of fact.)

This would make C a little bit easier to program,
and the resulting programs of better quality.

[Much Snipped]

jacob

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #8

Nick Landsberg

Nick Landsberg wrote:

Apology below.

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing appropriate behaviour
when something like that happens, is, IMO, half
a solution.

Whoops ... my apologies ... I just checked your original
post and you do propose a solution to "stop the
program" when this check fails (or "do nothing",
whatever that means).

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot-error,
software upgrade, is not to exceed 53 minutes a year
which is colloquially quoted as 4-9's or 99.99%. For
installations which require higher availability,
we duplex everything.)
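
For reference, the arithmetic behind that figure can be sketched as follows (the function name is invented, and a 365-day year is assumed):

```c
/* Allowed downtime per year, in minutes, for a given availability
   expressed as a fraction (e.g. 0.9999 for "four nines"). */
double downtime_minutes_per_year(double availability)
{
    const double minutes_per_year = 365.0 * 24.0 * 60.0;   /* 525600 */
    return (1.0 - availability) * minutes_per_year;
}
```

For 0.9999 this comes to roughly 52.6 minutes per year, in line with the 53 minutes quoted above.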

Thus, given the option of "stopping" the program
when array bounds were exceeded
as the proposed behaviour, it would not help me
in my projects. I would still have to enforce the
coding discipline a-priori in order to ensure
that no bounds checks were ever exceeded.
This is what we do now.

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #9

nrk

jacob navia wrote:

"osmium" <r1********@comcast.net> wrote in message
news:c0*************@ID-179017.news.uni-berlin.de...
jacob navia writes:
> In arrays of dimension "n", the compiler would
> emit code that tests "n" indices, before using
> them.

n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.

This must be changed. You write:

int fn(int array[2][3]);

Meaning that this function accepts only arrays of
2 lines by 3 columns. Inside the
function the dimensions are known.

Or you use *bounded* pointers. You write:

int fn(size_t lines, size_t cols, int array[lines][cols]);

This information must be passed around.

The proposal is worthy and I am sure has been considered by wise folks in the
past. IMHO, the only realistic way of doing this is to ensure that pointer
types somehow implicitly contain all the relevant information needed for
bounds checking. However, this automatically means that all pointer
accesses will slow down. In the naive implementation, all memory access
through pointers might slow down by approximately 50% on average
(neglecting cache effects). Even if you optimize your checks smartly, I
doubt if the hit you take in performance is going to come down by too much.
C has traditionally been the language that's close to the "metal" and more
and more is becoming the language you use when performance becomes
critical. IMO, under these situations, you might be better off using
languages such as Java that implicitly provide such protected environments
for you.

If you want some degree of fast+safe, C++ is turning out to be a better
alternative here. If one uses the STL wisely and extensively, the need for
manual dynamic memory management comes down drastically. Additionally, the
provision of a good in-built string type reduces a lot of common errors
that you find in C programs to non-issues.

I think it is always a bad idea to try and enforce safety (or other
non-functional properties) by relying on what the user must do (for
instance, your suggestion that the size must be passed around is a
monstrosity in my view). The best solutions are those that are invisible
and omni-present.

Or to para-phrase the Isha Upanishad to fit the bottom-line :-)

It moves, and it does not move
It is far, and it is near
It is within all this, and it is also outside all this.

Just my $0.02.

-nrk.

--
Remove devnull for email

Nov 14 '05 #10

Sidney Cadot

nrk wrote:

The proposal is worthy and I am sure has been considered by wise folks in the
past. IMHO, the only realistic way of doing this is to ensure that pointer
types somehow implicitly contain all the relevant information needed for
bounds checking. However, this automatically means that all pointer
accesses will slow down. In the naive implementation, all memory access
through pointers might slow down by approximately 50% on average
(neglecting cache effects).

<OT>

I did some benchmarking on the Java/HotSpot JIT compiler compared to
equivalent C code, and the amazing thing was that the bounds-checked
Java code (which compiled to quite decent assembly on the x86 platform,
anyway) was only marginally slower than the C code (5-10%) for tight
inner loops.

It appeared that this was for two reasons. First, these array-accessing
loops are bound by memory access throughput (the index check is very
cheap, as it resides in a register); second, on superscalar processors
such as the x86, the bounds-check and actual operation can be performed
basically in parallel.

In short, I think we should be more concerned about semantics than about
performance impact.

Best regards,

Sidney

Nov 14 '05 #11

Nick Landsberg

Sidney Cadot wrote:

[snip]

<OT>

I did some benchmarking on the Java/HotSpot JIT compiler compared to
equivalent C code, and the amazing thing was that the bounds-checked
Java code (which compiled to quite decent assembly on the x86 platform,
anyway) was only marginally slower than the C code (5-10%) for tight
inner loops.

It appeared that this was for two reasons. First, these array-accessing
loops are bound by memory access throughput (the index check is very
cheap, as it resides in a register); second, on superscalar processors
such as the x86, the bounds-check and actual operation can be performed
basically in parallel.

Please explain further, Sidney, possibly in an Email. I fail to
grasp the concept of bounds checks and actual operations
being performed in parallel. It's late at night and I need
sleep to function well. :)

Something must be linear, somewhere?

(yes, it's OT, that's why the request for Email)

In short, I think we should be more concerned about semantics than about
performance impact.

Disagree, but it's almost a religious argument. See my posts regarding
high-volume systems elsethread. Even 10% may price me out of the
market. Then again, if everyone used the same implementation, we'd
be on a level playing field again (but the customers would not like
it! :)

Best regards,

Sidney

Resident skeptic and professional paranoid.

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #12

Malcolm

"Nick Landsberg" <hu*****@att.net> wrote in message

Whoops ... my apologies ... I just checked your original
post and you do propose a solution to "stop the
program" when this check fails (or "do nothing",
whatever that means).

"do nothing" would be replacing an attempt to write outside an array with a
nop. An attempt to read outside of bounds would return a set value.
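
A small sketch of what that "do nothing" behaviour would amount to if written by hand (quiet_store, quiet_load, TABLE_SIZE and the fallback value 0 are all invented for the example):

```c
#include <stddef.h>

#define TABLE_SIZE 4
#define OUT_OF_BOUNDS_VALUE 0   /* the "set value"; 0 is an arbitrary choice */

/* Out-of-bounds writes are silently dropped (the "nop"). */
void quiet_store(int table[TABLE_SIZE], size_t index, int value)
{
    if (index < TABLE_SIZE)
        table[index] = value;
}

/* Out-of-bounds reads yield the fixed fallback value. */
int quiet_load(const int table[TABLE_SIZE], size_t index)
{
    return index < TABLE_SIZE ? table[index] : OUT_OF_BOUNDS_VALUE;
}
```

Note that the caller of quiet_load cannot tell a stored 0 from the fallback value.
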
On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot-error,
software upgrade, is not to exceed 53 minutes a year
which is colloquially quoted as 4-9's or 99.99%.

For most apps I don't see that you've got a choice. If there is a software
error causing an array overflow you've got a choice between no results or
wrong results.
My own field of games is an exception, since the worst thing that is likely
to happen is that you spoil someone's enjoyment of a video game, you can
plough on and hope the program recovers.

Nov 14 '05 #13

Mark McIntyre

On Sun, 15 Feb 2004 08:02:40 -0000, in comp.lang.c , "Malcolm"
<ma*****@55bank.freeserve.co.uk> wrote:

"do nothing" would be replacing an attempt to write outside an array with a
nop. An attempt to read outside of bounds would return a set value.

wof!! That would be disastrous. Imagine trying to find someone's bank
balance, the height to fly an aircraft at, or the dosage of a medicine that
they needed, and getting some compiler-defined default value....

"Your balance is 10,000, so we can pay that bad cheque "
"Fly at 10,000 feet. Don't worry about the Andes just in front of you"
"give em 10,000 ml of morphine, that ought to do"

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot-error,
software upgrade, is not to exceed 53 minutes a year
which is colloquially quoted as 4-9's or 99.99%.

For most apps I don't see that you've got a choice. If there is a software
error causing an array overflow you've got a choice between no results or
wrong results.

Sure but the discussion here is between the compiler doing these checks
magically for you, and you doing them yourself. If your system is mission
critical then frankly you should do them, not rely on the compiler
implementor's quality control.

My own field of games is an exception, since the worst thing that is likely
to happen is that you spoil someone's enjoyment of a video game, you can
plough on and hope the program recovers.

ah, *that* explains a lot ! <gd&r>

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

Nov 14 '05 #14

nrk

Nick Landsberg wrote:

Nick Landsberg wrote:

Apology below.

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing appropriate behaviour
when something like that happens, is, IMO, half
a solution.

Whoops ... my apologies ... I just checked your original
post and you do propose a solution to "stop the
program" when this check fails (or "do nothing",
whatever that means).

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot-error,
software upgrade, is not to exceed 53 minutes a year
which is colloquially quoted as 4-9's or 99.99%. For
installations which require higher availability,
we duplex everything.)

Thus, given the option of "stopping" the program
when array bounds were exceeded
as the proposed behaviour, it would not help me
in my projects. I would still have to enforce the
coding discipline a-priori in order to ensure
that no bounds checks were ever exceeded.
This is what we do now.

Yes, the array bounds check is not going to be useful for what you describe
in your *production* environment. But in all your testing phases I would
think it would add significant value. After thinking about it a bit, I
feel the proposal indeed is worthy even if you can never use it in
production environments ever for performance or other reasons. The simple
fact that I don't have to spend ages trying to track down an off-by-one
error somewhere in my code during my testing phase is enough of an
incentive to support such an initiative. As Jacob suggests, it might be
possible to build such a feature so that it can be toggled at compile time.

Those who suggest that overflowing array bounds happens only to "bad"
programmers (actually most of these kind of suggestions are in the other
"Bounds checked string library" thread)... well, what can I say?

"He that is without sin among you, let him first cast a stone at her."

-nrk.

<snip>

--
Remove devnull for email

Nov 14 '05 #15

Nick Landsberg

nrk wrote:

Nick Landsberg wrote:

Nick Landsberg wrote:
Apology below.

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing appropriate behaviour
when something like that happens, is, IMO, half
a solution.

Whoops ... my apologies ... I just checked your original
post and you do propose a solution to "stop the
program" when this check fails (or "do nothing",
whatever that means).

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot-error,
software upgrade, is not to exceed 53 minutes a year
which is colloquially quoted as 4-9's or 99.99%. For
installations which require higher availability,
we duplex everything.)

Thus, given the option of "stopping" the program
when array bounds were exceeded
as the proposed behaviour, it would not help me
in my projects. I would still have to enforce the
coding discipline a-priori in order to ensure
that no bounds checks were ever exceeded.
This is what we do now.

Yes, the array bounds check is not going to be useful for what you describe
in your *production* environment. But in all your testing phases I would
think it would add significant value. After thinking about it a bit, I
feel the proposal indeed is worthy even if you can never use it in
production environments ever for performance or other reasons. The simple
fact that I don't have to spend ages trying to track down an off-by-one
error somewhere in my code during my testing phase is enough of an
incentive to support such an initiative. As Jacob suggests, it might be
possible to build such a feature so that it can be toggled at compile time.

Those who suggest that overflowing array bounds happens only to "bad"
programmers (actually most of these kind of suggestions are in the other
"Bounds checked string library" thread)... well, what can I say?

"He that is without sin among you, let him first cast a stone at her."

-nrk.

<snip>

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through. Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile." This would help during individual
module development and the initial system test phase. Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just turn the diagnostics
off. This is for performance reasons stated elsethread.

We'd still run a full regression test of the "production compile"
anyway, because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on, thus a full regression test is still
necessary.
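
One way to approximate such a mode in today's C is to route the check through assert(), which the standard defines to expand to a no-op expression when NDEBUG is defined. This is only a sketch; CHECK_INDEX and element_at are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Debug build: aborts with a diagnostic on a bad index.
   Build with NDEBUG defined: assert() expands to a no-op, so no
   bounds-checking code is emitted at all. */
#define CHECK_INDEX(i, n) assert((size_t)(i) < (size_t)(n))

int element_at(const int *table, size_t n, size_t i)
{
    CHECK_INDEX(i, n);
    return table[i];
}
```

Defining NDEBUG for the "production compile" (for example with -DNDEBUG on compilers that accept -D) then yields the second, check-free executable.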

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #16

Nick Landsberg

A couple of points I forgot in the previous followup
(hit send too soon):

Nick Landsberg wrote:

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through. Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile." This would help during individual
module development and the initial system test phase. Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just turn the diagnostics
off. This is for performance reasons stated elsethread.

We'd still run a full regression test of the "production compile"
anyway, because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on, thus a full regression test is still
necessary.

One could even argue that compiling differently (with and
without bounds checks) means that a different language is
being compiled, with bounds checks being "not C"
(or is that !C ? ... is !C == D ?, ...
or should that be P instead of D ?, but I digress. :)

I wouldn't go quite as far as saying it's a different
language, but I am sure there are people who would.

If it (bounds checking) is the default setting, then it would
benefit the uninitiated by finding their bugs earlier.
If it is not, then those same uninitiated (uninitialized? :)
programmers would never use it and there's no benefit for them.

Additional developer discipline would still have to be added
to compile "with" for their functional tests and "without"
for their performance tests. This could be done with "make"
parameters, but that's OT here, and I won't get into that.
--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #17

jacob navia

"Nick Landsberg" <hu*****@att.net> a écrit dans le message de
news:ZI*******************@bgtnsc04-news.ops.worldnet.att.net...

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through.

Nobody is perfect. Let's accept this. Nobody is perfect and
only the machine has the patience to verify things.

Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile."

If you do not write
#pragma STDC bounds_checking(ON)

no bounds checking is injected in the code.

In assert.h we could add that line if NDEBUG
is not defined. Etc.

This would help during individual
module development and the initial system test phase.

I wouldn't turn it off in many programs. Normal PC
programs are completely bound by OS calls and other
stuff. The index checking overhead is so small that
nobody will notice.

Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just turn the diagnostics
off. This is for performance reasons stated elsethread.

The code generator doesn't emit anything new if
there is no bounds check specified.

Even where bounds check is ON, there is no need to
check more than once an input array that is not "resized"
within the body of the function.

We'd still run a full regression test of the "production compile"
anyway, because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on, thus a full regression test is still
necessary.

Obvious. That is not paranoid but necessary. 4 instructions
more increase the size, lower the speed of the program. It
is another executable.

A finer grained decision can be made if you turn
bounds checking off only in time critical parts of the
program and leave it on in the rest.

The big problem is the managing of size information.

An obstacle to that is C's lack of array assignment,
together with array "decay".

--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #18

Nick Landsberg

jacob navia wrote:

"Nick Landsberg" <hu*****@att.net> a écrit dans le message de
news:ZI*******************@bgtnsc04-news.ops.worldnet.att.net...

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through.

Nobody is perfect. Let's accept this. Nobody is perfect and
only the machine has the patience to verify things.

But the "machine" is also programmed by humans. In this
case the compiler is programmed by humans.

Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile."

If you do not write
#pragma STDC bounds_checking(ON)

no bounds checking is injected in the code.

In assert.h we could add that line if NDEBUG
is not defined. Etc.

I would not support this proposal with the #pragma
because this means modifying the file which contains
the code. I would be more willing to support it if
it was a flag to the compiler. The reasons have
to do with code maintenance and paranoia on my part.
If the source file was modified, how does one guarantee
that the #pragma directive was the ONLY one modified?
How does one guarantee that every developer properly
modified the #pragma directive prior to the "production
build?" One could write various scripts to do this,
I suppose, but that's outside of the language.
This issue transcends the language but is legitimate
to address when dealing with a software system having
several thousand directories and several tens of
thousands of source files. From my perspective,
if this is adopted, it should be a compile time
flag, rather than a pre-processor or compiler
directive in the source file.

Trial implementations may well decide to use the
#pragma, but from a code maintenance standpoint
I feel it needs to be a compile-time flag.
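
To make the compile-time-flag idea concrete, here is a minimal sketch that stays within today's C. The names `AT`, `COUNT_OF`, `checked_index` and the flag `NO_BOUNDS_CHECK` are all made up for illustration; nothing like them is in the standard or in the proposal. It only works on real arrays (where sizeof still knows the size), not on pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Element count of a real array (not a pointer). */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

#ifdef NO_BOUNDS_CHECK
/* "Production compile": no checking code is emitted at all. */
#define AT(arr, i) ((arr)[(i)])
#else
/* Hypothetical helper: returns i if it is in range, otherwise aborts.
   A negative index converts to a huge size_t and is rejected too. */
static size_t checked_index(size_t i, size_t n, const char *file, int line)
{
    if (i >= n) {
        fprintf(stderr, "%s:%d: index %zu out of range [0, %zu)\n",
                file, line, i, n);
        abort();
    }
    return i;
}
#define AT(arr, i) \
    ((arr)[checked_index((size_t)(i), COUNT_OF(arr), __FILE__, __LINE__)])
#endif

/* Sums a small table through the checked accessor. */
static int sum_table(void)
{
    int table[4] = {1, 2, 3, 4};
    int total = 0;
    for (size_t i = 0; i < COUNT_OF(table); i++)
        total += AT(table, i);
    return total;
}
```

Built normally, every `AT()` access is checked; built with something like `cc -DNO_BOUNDS_CHECK`, the same source compiles to plain indexing, so no source file has to be edited between the two builds.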

This would help during individual
module development and the initial system test phase.

I wouldn't turn it off in many programs. Normal PC
programs are completely bound by OS calls and other
stuff. The index checking overhead is so small that
nobody will notice.

Agreed. I am just viewing it from the rather
parochial standpoint of the kinds of projects
I have been working on for the past 10 years or so.

Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just turn the diagnostics
off. This is for performance reasons stated elsethread.

The code generator doesn't emit anything new if
there is no bounds check specified.

Even where bounds check is ON, there is no need to
check more than once an input array that is not "resized"
within the body of the function.

That's tricky code within the compiler. A braindead
implementation could choose to apply the bounds checks
to every access. Since the standard has NEVER provided
performance guidelines, there is nothing to prevent
a compiler vendor from emitting that code for every array
access, other than a better compiler vendor doing
the very optimization you are speaking of. If the
subscript is computed "on the fly" I presume a bounds
check would be necessary, even if the size of the array
did not change.
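
The difference between the two strategies can be sketched by hand. The function names below are invented for illustration, and returning -1 is only a stand-in for whatever the "indexerror" action would be. For a non-empty range both functions give the same answer; the second one does the comparison once instead of once per element. A subscript fetched from another table at run time could not be hoisted this way.

```c
#include <assert.h>
#include <stddef.h>

/* Per-access check: one comparison on every iteration. */
static long sum_checked_each(const int *a, size_t n, size_t first, size_t count)
{
    long total = 0;
    for (size_t k = 0; k < count; k++) {
        size_t i = first + k;
        if (i >= n)
            return -1;          /* stand-in for the "indexerror" action */
        total += a[i];
    }
    return total;
}

/* Hoisted check: the whole range is validated once, before the loop. */
static long sum_checked_once(const int *a, size_t n, size_t first, size_t count)
{
    if (first > n || count > n - first)
        return -1;              /* range [first, first + count) does not fit */
    long total = 0;
    for (size_t k = 0; k < count; k++)
        total += a[first + k];
    return total;
}
```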

We'd still run a full regression test of the "production compile"
anyway, because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on, thus a full regression test is still
necessary.

Obvious. That is not paranoid but necessary. 4 instructions
more increase the size, lower the speed of the program. It
is another executable.

It's actually 6 or more instructions in most assemblers.
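
How many instructions the check really costs depends on the target and the compiler, so neither figure should be taken as exact. One well-known trick is worth noting, though: the two tests can be folded into a single unsigned comparison. The function names below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two comparisons, as in the "abstract assembly" of the original post. */
static bool in_range_two_tests(long index, long n)
{
    return index >= 0 && index < n;
}

/* One comparison: a negative index converts to a huge unsigned value,
   so it fails the same test as an index that is too large.
   Assumes n is non-negative. */
static bool in_range_one_test(long index, long n)
{
    return (unsigned long)index < (unsigned long)n;
}
```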

A finer grained decision can be made if you turn
bounds checking off only in time critical parts of the
program and leave it on in the rest.

The big problem is the managing of size information.

An obstacle to that is C's lack of array assignment,
together with array "decay".

And "C" has always let the programmer shoot themselves in
the foot. Once upon a time, someone mentioned to me
that "Pascal is like a nanny. If you want to go out and
play in the street, she insists that you put your galoshes
on in case it should rain. C, on the other hand, has the
attitude of 'don't blame me if you catch your death of cold!'"

Later.
--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #19

Keith Thompson

"jacob navia" <ja***@jacob.remcomp.fr> writes:
[...]

It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests
each index for a well formed index.
Each index runs from zero to n-1, i.e.
must be greater than or equal to zero and less than
"n".

[...]

To be useful, such bounds checking cannot be limited to explicitly
declared arrays. You've mentioned requiring explicit bounds on
function arguments, but that's only one special case. For example:

int arr[10];
int *ptr = arr;
...
int x = ptr[15];

The declaration of arr and ptr, the initialization of ptr, and the
evaluation of ptr[15] can be almost arbitrarily complex and can be
widely separated, even in separate source files. If you want to do
reliable bounds checking on ptr[15], you have to carry the bounds
information along with the pointer.

In other words, you need fat pointers.

The type of a pointer is known at compilation time, so you don't need
to store, for example, the fact that ptr points to, say, a 4-byte
type. You do need to store the lower and upper bounds of the object
into which it points. For example (let's call the raw address
returned by malloc() M):

char *ptr = malloc(100); /* ptr contains (M, 0, 100) */
ptr += 20; /* ptr contains (M+20, -20, 80) */
char c1 = ptr[-10]; /* ok, -10 is within the bounds */
char c2 = ptr[50]; /* ok, 50 is within the bounds */
char c3 = ptr[80]; /* failure, 80 is just beyond the upper bound */

Every pointer arithmetic operation has to update the bounds
information as well as the address, and can fail if the resulting
address points outside the original object. Every pointer dereference
operation has to check the bounds. The bounds information could be
stored either as byte offsets or as a count of the pointed-to type; I
have a hunch byte offsets would work better, but I'm not certain.
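
As a rough illustration of the bookkeeping described above, here is a sketch of a fat pointer to char written as an ordinary struct. The type and function names are hypothetical; a real implementation would live inside the compiler and apply to every pointer type. Bounds are kept as element offsets relative to the current address, matching the (M+20, -20, 80) notation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: address plus the valid index range
   relative to that address. */
struct fat_ptr {
    char *addr;
    ptrdiff_t lo;   /* lowest valid index, inclusive */
    ptrdiff_t hi;   /* highest valid index, exclusive */
};

/* Wraps a block of the given size: (base, 0, size). */
static struct fat_ptr fat_from_block(char *base, size_t size)
{
    struct fat_ptr p = { base, 0, (ptrdiff_t)size };
    return p;
}

/* Pointer arithmetic: move the address and shift both bounds.
   Returns false (leaving *p untouched) if the result would leave the
   object; pointing one past the end is allowed, as in C. */
static bool fat_add(struct fat_ptr *p, ptrdiff_t n)
{
    if (n < p->lo || n > p->hi)
        return false;
    p->addr += n;
    p->lo -= n;
    p->hi -= n;
    return true;
}

/* Checked read of p[i]; stores the value in *out on success. */
static bool fat_read(const struct fat_ptr *p, ptrdiff_t i, char *out)
{
    if (i < p->lo || i >= p->hi)
        return false;
    *out = p->addr[i];
    return true;
}
```

Starting from a 100-byte block and adding 20 leaves the bounds at (-20, 80), so reads at -10 and 50 succeed and a read at 80 is refused.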

Finally, you still need to have a way to represent a pointer with no
associated bounds information, on which no checking can be done.
Unless you're designing a whole new system from scratch, some code in
C-with-bounds-checking ("C[]"?) will still have to deal with pointers
from system interfaces and from code written in C-without-bounds-checking.

Perhaps, rather than a "#pragma", bounds-checked pointers should be
specified with a new type qualifier.

Note that the presence or absence of bounds-checking, however it's
specified, will change the sizes of pointer objects, which will affect
memory layouts, which will inevitably reveal or hide subtle bugs.
There will be programs that work correctly (and don't trigger any
checks) with bounds-checking enabled, but will fail mysteriously when
it's disabled.

Another possibility might be to store the bounds information
elsewhere, rather than in each pointer object. The system could keep
a table of all known objects, including base address and size, that's
updated whenever an object is created or destroyed. But then each
pointer operation would have to do a (hash?) lookup on the table,
which would slow things down even more.
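
A toy version of such an object table might look like the following. Everything here is an assumption made for illustration: the names, the fixed table size, and the linear scan in place of a hash lookup. Comparing addresses as integers also assumes a flat address space, which the standard does not promise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 16

struct object_entry {
    uintptr_t base;   /* start address of the object */
    size_t size;      /* size in bytes */
    bool live;        /* slot currently in use */
};

static struct object_entry objects[MAX_OBJECTS];

/* Records an object when it is created. Returns false if the table is full. */
static bool register_object(const void *base, size_t size)
{
    for (size_t k = 0; k < MAX_OBJECTS; k++) {
        if (!objects[k].live) {
            objects[k].base = (uintptr_t)base;
            objects[k].size = size;
            objects[k].live = true;
            return true;
        }
    }
    return false;
}

/* Forgets an object when it is destroyed. */
static void unregister_object(const void *base)
{
    for (size_t k = 0; k < MAX_OBJECTS; k++)
        if (objects[k].live && objects[k].base == (uintptr_t)base)
            objects[k].live = false;
}

/* True if the len bytes starting at p lie inside one known object. */
static bool access_ok(const void *p, size_t len)
{
    uintptr_t addr = (uintptr_t)p;
    for (size_t k = 0; k < MAX_OBJECTS; k++) {
        if (!objects[k].live || addr < objects[k].base)
            continue;
        uintptr_t offset = addr - objects[k].base;
        if (offset <= objects[k].size && len <= objects[k].size - offset)
            return true;
    }
    return false;
}
```

Every checked access pays for the scan, which is the slowdown mentioned above; the pointers themselves stay the ordinary size.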

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 14 '05 #20

Ben Pfaff

Keith Thompson <ks***@mib.org> writes:

The declaration of arr and ptr, the initialization of ptr, and the
evaluation of ptr[15] can be almost arbitrarily complex and can be
widely separated, even in separate source files. If you want to do
reliable bounds checking on ptr[15], you have to carry the bounds
information along with the pointer.

In other words, you need fat pointers.

An interesting paper on this subject was recently published as
Ruwase and Lam, "A Practical Dynamic Buffer Overflow Detector",
at NDSS 2004. It's available from
http://suif.stanford.edu/papers/tunji04.pdf
Typical performance penalty appears to be about 25% and it only
handles string buffers (because those are typical sources of
security problems) not general arrays.
--
"The fact that there is a holy war doesn't mean that one of the sides
doesn't suck - usually both do..."
--Alexander Viro

Nov 14 '05 #21

Martin Dickopp

Keith Thompson <ks***@mib.org> writes:

Perhaps, rather than a "#pragma", bounds-checked pointers should be
specified with a new type qualifier.

That implies that the programmer is careful to apply the new qualifier.
But if the programmer is that careful, why doesn't s/he write correct
code in the first place?

IOW, I wouldn't find bounded pointers useful in new, carefully designed
code, but I would rather like to be able to apply them to all the sloppy
code already out there.

Martin

Nov 14 '05 #22

Keith Thompson

Martin Dickopp <ex****************@zero-based.org> writes:

Keith Thompson <ks***@mib.org> writes:
Perhaps, rather than a "#pragma", bounds-checked pointers should be
specified with a new type qualifier.

That implies that the programmer is careful to apply the new qualifier.
But if the programmer is that careful, why doesn't s/he write correct
code in the first place?

IOW, I wouldn't find bounded pointers useful in new, carefully designed
code, but I would rather like to be able to apply them to all the sloppy
code already out there.

Hmm. I tend to think that all software has bugs. If you've found a
way to eliminate all bugs by being careful, I'm very impressed.

What I'd like to suggest is adding a qualifier that specifies a
pointer type is *not* bounds-checked (probably with a #pragma and/or
command-line option to override all bounds-checking), but I suspect it
can't be done compatibly.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 14 '05 #23

Martin Dickopp

Keith Thompson <ks***@mib.org> writes:

Martin Dickopp <ex****************@zero-based.org> writes:
Keith Thompson <ks***@mib.org> writes:
> Perhaps, rather than a "#pragma", bounds-checked pointers should be
> specified with a new type qualifier.
That implies that the programmer is careful to apply the new qualifier.
But if the programmer is that careful, why doesn't s/he write correct
code in the first place?

IOW, I wouldn't find bounded pointers useful in new, carefully designed
code, but I would rather like to be able to apply them to all the sloppy
code already out there.

Hmm. I tend to think that all software has bugs. If you've found a
way to eliminate all bugs by being careful, I'm very impressed.

Unfortunately, I haven't. :)

(Well, I could argue that it is trivial to eliminate bugs by being
careful, but impossible to always be that careful...)
What I'd like to suggest is adding a qualifier that specifies a
pointer type is *not* bounds-checked

I'd find that much more useful than /enabling/ bound checkings by a new
qualifier. Primarily, my point is that I want to be able to recompile
existing code and have bounds checking.

Martin

Nov 14 '05 #24

Peter Ammon

jacob navia wrote:

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

So use an implementation that performs bounds checking.

For example: <http://fabrice.bellard.free.fr/tcc/>

-Peter
--
Pull out a splinter to reply.

Nov 14 '05 #25

Joona I Palaste

Nick Landsberg <hu*****@att.net> scribbled the following:

One could even argue that compiling differently (with and
without bounds checks) that it is a different language
being compiled, with bounds checks being "not C"
(or is that !C ? ... is !C == D ?, ...
or should that be P instead of D ?, but I digress. :)

I thought we weren't supposed to talk about those nasty off-topic !C's.
<g, d & r>

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"It's time, it's time, it's time to dump the slime!"
- Dr. Dante

Nov 14 '05 #26

Phil Tregoning

Martin Dickopp <ex****************@zero-based.org> wrote in
news:cu*************@zero-based.org:

"jacob navia" <ja***@jacob.remcomp.fr> writes:
As everybody knows, the C language lacks a way of specifying bounds
checked arrays.

Yes, but I don't think anything in the C standard /forbids/ an
implementation to check array bounds. From the point of view of the
standard, accessing an out of bounds array element causes undefined
behavior, so the implementation is free to (e.g.) terminate the program.

<OT>
FWIW, there is or was an attempt to implement bounds checking in the
GNU C compiler. I don't know what the current state is.
</OT>

FWIW, the VMS C compiler offers bounds checking as a compiler
option. It only works on real arrays (not pointers). There is
a description of usage and limitations here:

http://h71000.www7.hp.com/commercial...unds_check_sec

They can be summed up as:

o Only works on real arrays.
o Allows address one-past-the-end to be taken.
o Disables checks on arrays in a struct of size one (to allow the
"struct hack").
o Each separate subscript is checked in multidimensional arrays
(so "int a[10][10]; a[0][12] = 0;" counts as out-of-bounds).

If an out-of-bounds access is discovered during compilation the
compiler emits a warning and continues.

If an out-of-bounds access is discovered during run-time the
program exits with a "SYSTEM-F-SUBRNG, arithmetic trap, subscript
out of range at PC..." error (which counts as a SIGFPE signal and
can be trapped).
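
The point about checking each subscript separately can be shown with two hand-written predicates. This is only an illustration of the rule as summarized above, not of how the VMS compiler implements it; the names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ROWS = 10, COLS = 10 };   /* matches "int a[10][10]" */

/* Checks each subscript against its own dimension. */
static bool subscripts_ok(size_t row, size_t col)
{
    return row < ROWS && col < COLS;
}

/* Checks only that the flattened offset stays inside the whole array.
   Ignores arithmetic overflow for absurdly large row values. */
static bool offset_ok(size_t row, size_t col)
{
    return row * COLS + col < (size_t)ROWS * COLS;
}
```

For a[0][12] the flattened offset is 12, well inside the 100 elements, so a whole-object check would let it through; the per-subscript check rejects it because 12 is not a valid column.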

Because it doesn't work on pointers, I don't find it very useful.

Phil T

Nov 14 '05 #27

Dan Pop

In <c0**********@news-reader5.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

And there is a fundamental reason for that: C's ability to alias
everything with character pointers renders the problem practically
insoluble.

Consider the trivial example:

char a[5][3], *p = (char *)a + 10;

What should be the limits of p and why? Things get even cloudier when
dealing with pointers that alias arrays embedded into structures:

struct foo {
    int i;
    char a[10];
    double x;
} bar;

char *p = (char *)&bar + offsetof(struct foo, a) + 5;

Is p bounded by the array a or by the struct foo? If the compiler doesn't
read the programmer's mind properly, it will either generate an undesired
alert or quietly allow an out of bounds access.
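
To put numbers on the ambiguity, the sketch below computes how far p could still advance under each reading. The two function names are invented for illustration; the struct is the one from the example. Neither answer is "the" right one, which is exactly the problem.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int i;
    char a[10];
    double x;
};

static struct foo bar;

/* Bytes left before p runs off the end of the member array bar.a. */
static size_t room_if_bounded_by_array(const char *p)
{
    const char *end = bar.a + sizeof bar.a;
    return (size_t)(end - p);
}

/* Bytes left before p runs off the end of the whole struct bar. */
static size_t room_if_bounded_by_struct(const char *p)
{
    const char *end = (const char *)&bar + sizeof bar;
    return (size_t)(end - p);
}
```

With p pointing at bar.a[5], the first reading allows 5 more bytes and the second allows everything up to the end of the struct, including the double that follows the array.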

Or how about unions:

union foo {
    char a[10];
    int i[10];
    float x[10];
} bar;

char *p = (char *)&bar + 5;

Bound checking is well defined in languages that either don't
support pointers at all (Fortran <= F77) or have a very restricted notion
of pointers (Fortran >= F90 and Pascal).

In C, limiting the bound checking to expressions containing the array
identifier itself is far from providing any kind of safety (think buffer
overflows a la gets) and going beyond this is incredibly difficult for
reasons partly described above.

There is also the library issue: each implementation would have to come
with two versions of the libraries: with bound checking and without.
Otherwise, inserting checks only in the user code is not enough (again,
think gets).

Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

But feel free to experiment with it in your compiler and see what
happens.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #28

CBFalconer

osmium wrote:

jacob navia writes:
In arrays of dimension "n", the compiler would emit code that
tests "n" indices, before using them.

n is, in general, unknown in functions that have arrays passed
to them. This presents serious problems to your scheme.

This can be cured, with heavy run time expense, by making all
pointers indirect such that they describe an area, its limits, and
an offset within that area. The pointers themselves need not grow
excessively, since one field can index a description array that
need not be unique to that pointer, and another field can hold the
offset. All this requires no changes in the standard. I have
outlined such a system before, with no overwhelming response.
Something recent, either here or in comp.arch.embedded in the past
week or so, indicated that IBM created some such machine named
???400???.

For example, a stack oriented machine will have an entry for the
stack as a whole, for each stack frame, and for each declared
object in any frame. Things will grow because even single byte
objects must have such an entry in order to be able to create and
pass pointers to them. Similarly anything created by malloc will
require at least one entry.

Exit from a stack frame (assuming again a stack machine) will
probably need to destroy entries, or at least mark them invalid.
This brings up the problem of dangling references, and detection
thereof.
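
A very small model of that scheme, with made-up names and a fixed-size description array, could look like this. Each "pointer" is just a slot number plus an offset, and marking a slot invalid is what catches a dangling reference. This is a sketch of the idea only, not of any system previously outlined in this group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_AREAS 8

/* One entry of the description array: where an area is and how big. */
struct area {
    char *base;
    size_t size;
    bool valid;
};

static struct area areas[MAX_AREAS];

/* Indirect pointer: which area, and how far into it. */
struct ind_ptr {
    size_t area;
    size_t offset;
};

/* Describes a new area in the given slot. Fails if the slot is in use. */
static bool area_create(size_t slot, char *base, size_t size)
{
    if (slot >= MAX_AREAS || areas[slot].valid)
        return false;
    areas[slot] = (struct area){ base, size, true };
    return true;
}

/* Marks the area invalid, e.g. on exit from a stack frame. */
static void area_destroy(size_t slot)
{
    if (slot < MAX_AREAS)
        areas[slot].valid = false;
}

/* Checked read through an indirect pointer. */
static bool ind_read(struct ind_ptr p, char *out)
{
    if (p.area >= MAX_AREAS)
        return false;
    const struct area *a = &areas[p.area];
    if (!a->valid || p.offset >= a->size)
        return false;
    *out = a->base[p.offset];
    return true;
}
```

Several pointers can share one slot, so the pointer itself stays at two words no matter how much is recorded about the area.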

Wherever you look, the C practice of brandishing pointers
indiscriminately poses apparently insoluble barriers to the
creation of accurate self-checking code.

My conclusion is that the language should be used in its original
mode - as structured assembly - and that critical code should be
written in better suited languages that have been designed for the
task.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #29

CBFalconer

jacob navia wrote:

"osmium" <r1********@comcast.net> a écrit dans le message de
jacob navia writes:
In arrays of dimension "n", the compiler would emit code
that tests "n" indices, before using them.

n is, in general, unknown in functions that have arrays passed
to them. This presents serious problems to your scheme.

This must be changed. You write:

int fn(int array[2][3]);

Meaning that this function accepts only arrays of
dimensions 2 lines three columns. Inside the
function the dimensions are known.
Or you use *bounded* pointers.
You write:
int fn(size_t lines, size_t cols,int array[lines][cols]);

This information must be passed around.
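
With the dimensions passed as parameters, a check can at least be written by hand today. The sketch below uses the C99 variably modified parameter form quoted above (optional in C11); the function name `set_cell` is invented for illustration. Note that nothing verifies that the caller passed the true dimensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores value at array[line][col] if both subscripts are in range.
   Returns false, without writing, if either one is out of bounds. */
static bool set_cell(size_t lines, size_t cols, int array[lines][cols],
                     size_t line, size_t col, int value)
{
    if (line >= lines || col >= cols)
        return false;
    array[line][col] = value;
    return true;
}
```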

This is built into the language, if you simply use Pascal, Modula,
or Ada. C will never be altered in directions that seriously
break existing code.

Halfway improvements are probably even more dangerous than no
improvements, because they give programmers a false sense of
security.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #30

Malcolm

"CBFalconer" <cb********@yahoo.com> wrote in message

My conclusion is that the language should be used in its original
mode - as structured assembly - and that critical code should be
written in better suited languages that have been designed for the
task.

Or maybe run the bounded pointer implementation for a debug and testing
session.
The problem you face is that if you have a buffer overrun then an error has
already occurred, because buffer overflows happen for a reason.

Nov 14 '05 #31

Keith Thompson

Da*****@cern.ch (Dan Pop) writes:
[...]

Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 14 '05 #32

Keith Thompson

CBFalconer <cb********@yahoo.com> writes:
[...]

This can be cured, with heavy run time expense, by making all
pointers indirect such that they describe an area, its limits, and
an offset within that area. The pointers themselves need not grow
excessively, since one field can index a description array that
need not be unique to that pointer, and another field can hold the
offset. All this requires no changes in the standard. I have
outlined such a system before, with no overwhelming response.
Something recent, either here or in comp.arch.embedded in the past
week or so, indicated that IBM created some such machine named
???400???.

AS/400?

Can you provide a pointer to your proposal? I'd be interested in
seeing it.

Indexing into a description array saves space in each pointer, but
since each pointer dereference has to refer to the bounds information,
putting it in the pointer itself might make for faster execution.

[...] Wherever you look, the C practice of brandishing pointers
indiscriminately poses apparently insoluble barriers to the
creation of accurate self-checking code.

My conclusion is that the language should be used in its original
mode - as structured assembly - and that critical code should be
written in better suited languages that have been designed for the
task.

It might be an interesting exercise to design a new language (as close
to C as possible, but no closer) that incorporates this kind of bounds
checking. If it can't be done without breaking existing code, then
there's little chance of the new language being called C, but there
seems to be a market advantage in designing new languages with C-like
syntax, even if they're incompatible (see Java and Perl, for example).
If a lot of existing C code can be ported to the new language without
too much effort, it might even catch on. It would probably need to
have a mechanism for working with C-style skinny pointers so it could
interface to existing C code (OS interfaces, for example); such a
mechanism should be very explicit so programmers aren't tempted to use
it by default (perhaps a type qualifier called "dangerous").

And I think I've just jumped off the cliff of topicality into the icy
fjord of speculative language design.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 14 '05 #33

Rob Thorpe

Da*****@cern.ch (Dan Pop) wrote in message news:<c0***********@sunnews.cern.ch>...

In <c0**********@news-reader5.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:
As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

And there is a fundamental reason for that: C's ability to alias
everything with character pointers renders the problem practically
insoluble.

Consider the trivial example:

char a[5][3], *p = (char *)a + 10;

What should be the limits of p and why? Things get even cloudier when
dealing with pointers that alias arrays embedded into structures:

struct foo {
    int i;
    char a[10];
    double x;
} bar;

char *p = (char *)&bar + offsetof(struct foo, a) + 5;

Is p bounded by the array a or by the struct foo? If the compiler doesn't
read the programmer's mind properly, it will either generate an undesired
alert or quietly allow an out of bounds access.

Or how about unions:

union foo {
    char a[10];
    int i[10];
    float x[10];
} bar;

char *p = (char *)&bar + 5;

Bound checking is well defined on most languages which either don't
support pointers at all (Fortran <= F77) or have a very restricted notion
of pointers (Fortran >= F90 and Pascal).

In C, limiting the bound checking to expressions containing the array
identifier itself is far from providing any kind of safety (think buffer
overflows a la gets) and going beyond this is incredibly difficult for
reasons partly described above.

There is also the library issue: each implementation would have to come
with two versions of the libraries: with bound checking and without.
Otherwise, inserting checks only in the user code is not enough (again,
think gets).

Just because it is impossible to make bounds checking bulletproof
doesn't mean it is not a good idea. I think most of us who write C
would be thankful for it. I would be, even if it only caught a
few mistakes during coding.

The problem is explaining to people that the bounds checking is not
watertight. Even this shouldn't pose too much of a problem.

Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

But feel free to experiment with it in your compiler and see what
happens.

Exactly, let's wait and see what the problems turn out to be.

Nov 14 '05 #34

Rob Thorpe

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message news:<c0**********@news-reader5.wanadoo.fr>...

As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

This situation is intolerable for people that know
that errors are easy to make, and putting today's
powerful microprocessors to do a few instructions
more at each array access will not make any
difference as far as speed is concerned.

Not all C applications are real-time apps.

Besides, there are the viruses
and other malicious software that are using
this problem in the C language to do their dirty
work.

Security means that we avoid the consequences
of mistakes and expose them as soon as possible.

It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests
each index for a well formed index.
Each index runs from zero to n-1, i.e.
must be greater than or equal to zero and less than
"n".

In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

Obviously, optimizations are possible, and
good compilers will optimize away many tests
specially in loops. This is left unspecified.

Important is to know that the array updates
can't overflow in neighboring memory areas.

How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:
test if index >= 0
jump if not "indexerror"
test if index < "n"
jump if not "indexerror"

where "n" is a compile time constant.

We have something like 4 cycles then, which
a 2 GHz machine does in 0.000 000 002 seconds.

Yes, table access is a common operation but
it would take millions of those to slow the program
by even a negligible amount of time. We are not in the
PDP-11 era any more.

This would make C a little bit easier to program,
and the resulting programs of better quality.
Buffer overflows happen of course, but the language
limits the consequences by enforcing limits.

By default the behavior is to stop the program.
The user can override this, and different schemas
can be specified by him/her to take actions when
a buffer overflow happens.

A simple strategy is to just do nothing.

int fn(char *input)
{
    char tmpbuf[BUFSIZ];
    int i=0;
    bool result = false;

    while (*input) {
        tmpbuf[i++] = *input++;
    }
    // Do things with the input
    // set result
    return result;
indexerror:
    return false;
}

This function uses the built-in error checking
to avoid any bad consequence for an overflow.
If the input data is too long, it is a mal-formed
input that should be discarded.

This frees the programmer from the tedious task
of writing
if (i >= sizeof(tmpbuf)) goto indexerror;

at EACH array access. This can be done better
by a machine and the compiler.
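
For comparison, this is roughly what the hand-written version looks like in today's C. The name `fn_checked` and the small `TMPBUF_SIZE` (used in place of BUFSIZ so an overflow is easy to provoke) are assumptions for illustration. Using size_t for the index avoids comparing a signed int against the unsigned result of sizeof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TMPBUF_SIZE = 16 };   /* small stand-in for BUFSIZ */

/* Copies input into a fixed buffer with an explicit check before every
   store. Returns false if the input does not fit, true otherwise. */
static bool fn_checked(const char *input)
{
    char tmpbuf[TMPBUF_SIZE];
    size_t i = 0;

    while (*input) {
        if (i >= sizeof tmpbuf)
            return false;        /* the "indexerror" action */
        tmpbuf[i++] = *input++;
    }
    (void)tmpbuf;                /* real code would use the copy here */
    return true;
}
```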

Because a program like that today
***assumes*** the input length
can't be bigger than BUFSIZ.

This is always *implicitly* assumed and
nowhere *enforced* by the way. The current
state implies that catastrophic errors can happen
if the index starts overwriting separate memory
areas like the return address...

Everyone knows this. Let's do something to
stop it. Something simple, without too much
fuss.

In the example above the compiler generates code that,
in case of an index error,
jumps to the indexerror label and does what the programmer
specifies there.

The motto of C is: trust the programmer.

We just have to allow him/her to specify what to do
in case of overflow.

Trusting the programmer doesn't mean that we trust
that he never makes a mistake, of course. It means
that the programmer can specify what actions
to take in case of error, and that we provide sensible
defaults.

The default, then, is to terminate the program, as the
assert() macro (another useful construct) does.
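
A library-level sketch of that policy in standard C might look as follows. Everything here (bounds_check, set_bounds_handler, bounds_count and the handler type) is hypothetical and only illustrates "terminate by default, let the programmer override"; the proposal itself leaves the mechanism to the compiler.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hook type: called when an index is out of range. */
typedef void (*bounds_handler)(const char *file, int line, long index, long n);

/* Default action: report and terminate, much as assert() does. */
static void bounds_abort(const char *file, int line, long index, long n)
{
    fprintf(stderr, "%s:%d: index %ld out of range [0, %ld)\n",
            file, line, index, n);
    abort();
}

static bounds_handler current_handler = bounds_abort;

/* Install another policy (NULL restores the default); returns the old one. */
bounds_handler set_bounds_handler(bounds_handler h)
{
    bounds_handler old = current_handler;
    current_handler = h ? h : bounds_abort;
    return old;
}

/* Example of a non-terminating policy: just count the errors. */
long bounds_error_count = 0;

void bounds_count(const char *file, int line, long index, long n)
{
    (void)file; (void)line; (void)index; (void)n;
    bounds_error_count++;
}

/* Returns true if index is valid. Otherwise calls the handler and,
   if the handler returns, reports false so the caller skips the access. */
bool bounds_check(long index, long n, const char *file, int line)
{
    if (index >= 0 && index < n)
        return true;
    current_handler(file, line, index, n);
    return false;
}
```

A caller would write something like "if (bounds_check(i, 10, __FILE__, __LINE__)) a[i] = x;". With the default handler the program stops at the first bad index; after set_bounds_handler(bounds_count) it keeps running and the access is skipped.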

Note that this proposal doesn't change anything
in the language. No new constructs, even if
compilers could provide arrangements like the
one proposed above.

I propose then:

#pragma STDC bounds_checking(ON/OFF)

that should be written outside a function scope.

That's all.

This proposal is an invitation to
brain-storming..:-)

I know that anyone using C is aware of this.
So, let's fix it.

Sounds like a good idea. Since nothing gets standardised without
someone doing it first, why not implement it in LCC, then see how many
problems are encountered.

Before you do, read:
http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

This is how it was done in TCC.

Perhaps try to make it work the same way as TCC, to get things started
without initial compatibility problems.

Nov 14 '05 #35

Nick Landsberg

Rob Thorpe wrote:

The problem is explaining to people that the bounds checking is not
watertight. Even this shouldn't pose too much of a problem.

I was with you until this statement.

The problem is explaining this to Managementcritters,
who don't know one language from another, have never
written a line of code in their lives, and latch on
to the latest buzzword as a panacea for everything
that's wrong with software development.

"We're gonna have bounds checking in C! That means
we can save time by eliminating the code reviews!
We can get the product out the door faster!"

And yes, the innuendo that they are not really people was
intentional. :)

--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #36

Joe Wright

Keith Thompson wrote:

Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

Dan can speak for himself, of course, but I don't need it and I suppose
you don't need it. We know what C is and we do our own bounds checking
when we write our programs. People *need* bounds checking by the
Language/Compiler presumably because they are unwilling or unable to do
it themselves.
--
Joe Wright http://www.jw-wright.com
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Nov 14 '05 #37

Nick Landsberg

Joe Wright wrote:

Keith Thompson wrote:

Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

Dan can speak for himself, of course, but I don't need it and I suppose
you don't need it. We know what C is and we do our own bounds checking
when we write our programs. People *need* bounds checking by the
Language/Compiler presumably because they are unwilling or unable to do
it themselves.

Or have not been trained in the school of professional paranoia. :)

As I mentioned upthread, on the systems on which I work, it is
unacceptable to have the implementation take what I would
presume to be the default action of stopping the program
if a compiler generated bounds check were exceeded.
This may not be your environment, but it is mine.

Even when a language that has bounds checking is used in
our stuff, care is taken to ensure that no run-time
bounds checks are exceeded. The program logs an
error, ditches the transaction, and goes on to
the next. I choose to call this "programming discipline."
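
The discipline described here can be sketched as below. The transaction layout, the NUM_ACCOUNTS table size and the sample data are all invented for illustration; the point is only the shape of the loop: check explicitly, log, drop the bad transaction, carry on.

```c
#include <stddef.h>
#include <stdio.h>

#define NUM_ACCOUNTS 4   /* hypothetical table size */

struct transaction {
    long account;        /* index into balances[], comes from outside */
    long amount;
};

long balances[NUM_ACCOUNTS];

/* Hypothetical input: two good transactions and two with bad indices. */
const struct transaction sample[] = {
    { 0, 10 }, { 7, 5 }, { 3, -2 }, { -1, 1 }
};

/* Applies each transaction; returns how many were rejected. */
size_t process_all(const struct transaction *tx, size_t count)
{
    size_t rejected = 0;

    for (size_t i = 0; i < count; i++) {
        if (tx[i].account < 0 || tx[i].account >= NUM_ACCOUNTS) {
            /* Log the error, ditch the transaction, go on to the next. */
            fprintf(stderr, "transaction %zu: bad account %ld, discarded\n",
                    i, tx[i].account);
            rejected++;
            continue;
        }
        balances[tx[i].account] += tx[i].amount;
    }
    return rejected;
}
```

Here a bad index never reaches the array access, so a compiler-generated check at that access would never fire in production; it would only catch the case where the explicit test had been forgotten.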

Thus, I see no real need for compiler-generated bounds
checking in production code. I see it as a very
useful debugging tool, though (e.g. during my testing,
I discover that I have exceeded array bounds; I will
then put in the code I forgot, which checks the
array bounds).

--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #38

Rob Thorpe

Nick Landsberg <hu*****@att.net> wrote in message news:<vX*******************@bgtnsc05-news.ops.worldnet.att.net>...

Rob Thorpe wrote:

The problem is explaining to people that the bounds checking is not
watertight. Even this shouldn't pose too much of a problem.

I was with you until this statement.

The problem is explaining this to Managementcritters,
who don't know one language from another, have never
written a line of code in their lives, and latch on
to the latest buzzword as a panacea for everything
that's wrong with software development.

"We're gonna have bounds checking in C! That means
we can save time by eliminating the code reviews!
We can get the product out the door faster!"

And yes, the innuendo that they are not really people was
intentional. :)

You are undoubtedly right that many people will jump to incorrect
conclusions. Probably some stupider managers will misunderstand it.
However, if your managers jump on a minor programming language
development as a reason to change the development process then they
are idiots. And if that is the case you have bigger fish to fry.

Nov 14 '05 #39

Richard Bos

Keith Thompson <ks***@mib.org> wrote:

Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

People who have already done explicit bounds checking on their data the
moment they get it from the outside, and know that all data that's
passed the tests can be trusted.
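
One way to read this is "validate once, at the boundary". The sketch below assumes a hypothetical table of TABLE_SIZE entries and an index arriving as text; parse_index is an invented name. Code further in can then use the returned value without re-testing it, provided nobody makes a mistake in between.

```c
#include <errno.h>
#include <stdlib.h>

#define TABLE_SIZE 8   /* hypothetical size of the table being indexed */

/* Converts outside text to a table index.
   Returns the index (0 .. TABLE_SIZE-1), or -1 if the text is not a
   valid in-range decimal number. */
long parse_index(const char *text)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (errno != 0 || end == text || *end != '\0')
        return -1;                 /* overflow, empty, or trailing junk */
    if (value < 0 || value >= TABLE_SIZE)
        return -1;                 /* well-formed number, but out of bounds */
    return value;
}
```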

Richard

Nov 14 '05 #40

Nick Landsberg

Rob Thorpe wrote:

Nick Landsberg <hu*****@att.net> wrote in message news:<vX*******************@bgtnsc05-news.ops.worldnet.att.net>...
[snip]

You are undoubtedly right that many people will jump to incorrect
conclusions. Probably some stupider managers will misunderstand it.
However, if your managers jump on a minor programming language
development as a reason to change the development process then they
are idiots. And if that is the case you have bigger fish to fry.

And you are absolutely correct, we have bigger fish to fry.

--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #41

Keith Thompson

rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:

Keith Thompson <ks***@mib.org> wrote:
Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

People who have already done explicit bounds checking on their data the
moment they get it from the outside, and know that all data that's
passed the tests can be trusted.

And who never make mistakes.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 14 '05 #42

CBFalconer

Keith Thompson wrote:

CBFalconer <cb********@yahoo.com> writes:
[...]
This can be cured, with heavy run time expense, by making all
pointers indirect such that they describe an area, its limits, and
an offset within that area. The pointers themselves need not grow
excessively, since one field can index a description array that
need not be unique to that pointer, and another field can hold the
offset. All this requires no changes in the standard. I have
outlined such a system before, with no overwhelming response.
Something recent, either here or in comp.arch.embedded in the past
week or so, indicated that IBM created some such machine named
???400???.
AS/400?

Can you provide a pointer to your proposal? I'd be interested in
seeing it.

I have just thrown such ideas out upon the fallow ground of
usenet. Nothing complete or thought out exists that originated
with me.

Indexing into a description array saves space in each pointer, but
since each pointer dereference has to refer to the bounds information,
putting it in the pointer itself might make for faster execution.
The pointer has to hold an offset. For an array the bounds info
specifies the ends of that array, and its space (which stack
frame, which mallocation, etc). But a walking pointer, as used in
"while (*d++ = *s++);" has the same bounds, with the offset being
diddled.
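
A minimal sketch of such a pointer, written as an ordinary C struct, is shown below. It stores the bounds in the pointer itself (the variant suggested in the quoted text) rather than indexing a shared description array, and all names (fat_ptr, fat_make, fat_strcpy, the demo buffers) are invented for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat" pointer: the area it may touch plus an offset. */
struct fat_ptr {
    char  *base;     /* start of the area */
    size_t size;     /* number of bytes in the area */
    size_t offset;   /* current position within the area */
};

struct fat_ptr fat_make(char *base, size_t size)
{
    struct fat_ptr p = { base, size, 0 };
    return p;
}

/* Checked counterpart of "while (*d++ = *s++);".
   The bounds stay the same while only the offsets walk.
   Returns false instead of leaving either area. */
bool fat_strcpy(struct fat_ptr d, struct fat_ptr s)
{
    for (;;) {
        char c;

        if (s.offset >= s.size || d.offset >= d.size)
            return false;            /* next access would be out of bounds */
        c = s.base[s.offset++];
        d.base[d.offset++] = c;
        if (c == '\0')
            return true;
    }
}

/* Demo buffers for trying the sketch out. */
char demo_src[] = "hello";
char demo_big[16];
char demo_small[3];
```

Copying demo_src into demo_big succeeds, while copying it into demo_small stops after three bytes and reports failure. The two extra comparisons on every step are the run time expense mentioned above.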

[...]
Wherever you look, the C practice of brandishing pointers
indiscriminately poses apparently insoluble barriers to the
creation of accurate self-checking code.

My conclusion is that the language should be used in its original
mode - as structured assembly - and that critical code should be
written in better suited languages that have been designed for the
task.

It might be an interesting exercise to design a new language (as close
to C as possible, but no closer) that incorporates this kind of bounds
checking. If it can't be done without breaking existing code, then
there's little chance of the new language being called C, but there
seems to be a market advantage in designing new languages with C-like
syntax, even if they're incompatible (see Java and Perl, for example).
If a lot of existing C code can be ported to the new language without
too much effort, it might even catch on. It would probably need to
have a mechanism for working with C-style skinny pointers so it could
interface to existing C code (OS interfaces, for example); such a
mechanism should be very explicit so programmers aren't tempted to use
it by default (perhaps a type qualifier called "dangerous").

Why bother? Accurate languages already exist, including Ada and
Pascal. Since Ada is specifically intended to call routines in
other languages (e.g. C) there is no reason to try to defeat its
restrictions. To my mind the trouble with C++ and Java is that
they are so closely tied to C phraseology.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #43

CBFalconer

Richard Bos wrote:

Keith Thompson <ks***@mib.org> wrote:
Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound
checking. Unfortunately, far too many such people do program
in C... And even if bound checking is eventually introduced,
most of those people would be the last ones to enable it in
their code ;-)

I'm curious: which people *don't* need bounds checking?

People who have already done explicit bounds checking on their
data the moment they get it from the outside, and know that all
data that's passed the tests can be trusted.

Unfortunately the design of C (and of any language without
subranges) is such that such pre-testing cannot be described to
the remainder of the program. This is one more reason that
checking, besides being impossible to fully implement, must also
be highly inefficient in C.

This whole discussion reminds me of wives trying to remake
husbands, or vice-versa. Let's just accept the language for what
it is, and possibly try to ameliorate a few rough spots.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #44

Dan Pop

In <1a**************************@posting.google.com > ro***********@antenova.com (Rob Thorpe) writes:

Just because it is impossible to make bounds checking bulletproof
doesn't mean it is not a good idea. I think most of us who write C
would be thankful for it. I would be even if it only catches a
few mistakes during coding.

You either do something or you don't. Halfway solutions are simply not
good enough. You'd get a false sense of safety that's much worse than
knowing that you're fully on your own and have to rely on your own
checking.

AFAIK, there is a version of gcc with bound checking support. It doesn't
seem to be particularly popular even among gcc users.

And I have seen a beginner confused by a C compiler with bound checking
support. His code was correct, but the bound checker didn't realise it,
for one of the reasons I explained in my previous post.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #45

Dan Pop

In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

Da*****@cern.ch (Dan Pop) writes:
[...]
Sorry, but C isn't the language for people needing bound checking.
Unfortunately, far too many such people do program in C... And even if
bound checking is eventually introduced, most of those people would
be the last ones to enable it in their code ;-)

I'm curious: which people *don't* need bounds checking?

Those who know what they're doing.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #46

Dan Pop

In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:

rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
Keith Thompson <ks***@mib.org> wrote:
> Da*****@cern.ch (Dan Pop) writes:
> [...]
> > Sorry, but C isn't the language for people needing bound checking.
> > Unfortunately, far too many such people do program in C... And even if
> > bound checking is eventually introduced, most of those people would
> > be the last ones to enable it in their code ;-)
>
> I'm curious: which people *don't* need bounds checking?

People who have already done explicit bounds checking on their data the
moment they get it from the outside, and know that all data that's
passed the tests can be trusted.

And who never make mistakes.

Or who take additional trouble to avoid certain classes of mistakes.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #47

Rob Thorpe

Da*****@cern.ch (Dan Pop) wrote in message news:<c0**********@sunnews.cern.ch>...

In <1a**************************@posting.google.com > ro***********@antenova.com (Rob Thorpe) writes:
Just because it is impossible to make bounds checking bulletproof
doesn't mean it is not a good idea. I think most of us who write C
would be thankful for it. I would be even if it only catches a
few mistakes during coding.

You either do something or you don't. Halfway solutions are simply not
good enough. You'd get a false sense of safety that's much worse than
knowing that you're fully on your own and have to rely on your own
checking.

No, I don't agree. Halfway solutions are useful in engineering. In
many ways the C type system is rather a halfway solution in the first
place, since you can break through it.

Read the paper I referred to further down this thread. In this case
the solution is probably a lot less partial than you imagine. It
doesn't work if you link to libraries that don't use it, or if you
write your own memory allocation code. For instance
this horrible thing:

char *p = (char *)&bar + offsetof(struct foo, a) + 5;

would be handled sanely.

AFAIK, there is a version of gcc with bound checking support. It doesn't
seem to be particularly popular even among gcc users.

I would use it, but unfortunately it wasn't updated for modern GCCs;
the last version that had patches available was some egcs version.

And I have seen a beginner confused by a C compiler with bound checking
support. His code was correct, but the bound checker didn't realise it,
for one of the reasons I explained in my previous post.

I think that is a worthwhile price to pay for the benefits of bounds
checking.

Nov 14 '05 #48

James Rogers

Da*****@cern.ch (Dan Pop) wrote in news:c0**********@sunnews.cern.ch:

In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org>
writes:
I'm curious: which people *don't* need bounds checking?

Those who know what they're doing.

And those who never need to use code developed by somebody else.

You may know what you are doing but the person who developed the
library you are using may not be as good as you, or may have
applied different assumptions about the code and data than you
have. As mentioned elsethread, C offers precious little assistance
to the effort of making your assumptions visible to other parts
of a program. This is particularly true when dealing with issues
of separate compilation.

Jim Rogers

Nov 14 '05 #49

Dan Pop

In <Xn******************************@204.127.36.1> James Rogers <ji**************@att.net> writes:

Da*****@cern.ch (Dan Pop) wrote in news:c0**********@sunnews.cern.ch:
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org>
writes:
I'm curious: which people *don't* need bounds checking?

Those who know what they're doing.

And those who never need to use code developed by somebody else.

I am not responsible for bugs in code I haven't developed myself. And
since any program written for a hosted implementation has to rely on
external code, your point is moot.

BTW, if the bound checker was developed by someone else, how do you know
whether you can rely on it? Maybe your code was actually correct, but
there are buffer overruns in the bound checking code itself...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #50