Bytes | Developer Community

Bounds checking and safety in C

We hear very often in this discussion group that bounds
checking, or safety tests, are too expensive to be used in C.

Several researchers at UCSD have published an interesting
paper about this problem.

http://www.jilp.org/vol9/v9paper10.pdf

Specifically, they measured the overhead of a bounds-checking
implementation compared to a normal one, and found that in some
cases the overhead can be reduced to a mere 8.3%...

I quote from that paper

< quote >
To summarize, our meta-data layout coupled with meta-check instruction
reduce the average overhead of bounds checking to 21% slowdown which is
a significant reduction when compared to 81% incurred by current
software implementations when providing complete bounds checking.
< end quote>

This 21% slowdown is the overhead of checking EACH POINTER
access, and each (possible) dangling pointer dereference.
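To make concrete what "checking each pointer access" costs, here is a minimal sketch in C of a software bounds check (the struct and names are illustrative only, not the paper's actual meta-data scheme):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the raw pointer plus the bounds of the
   object it points into. Real implementations store this meta-data
   out of line; this only illustrates the check itself. */
typedef struct {
    char *ptr;   /* current position     */
    char *base;  /* start of the object  */
    char *limit; /* one past its end     */
} checked_ptr;

/* Every dereference pays for one comparison pair: the slowdowns quoted
   above are the accumulated cost of checks like this one. */
static char checked_read(checked_ptr p)
{
    assert(p.ptr >= p.base && p.ptr < p.limit); /* the bounds check */
    return *p.ptr;
}
```

The paper's contribution is precisely about making this kind of check cheap via meta-data layout and hardware support, rather than paying the naive cost shown here.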

If we extrapolate from this to the alleged overhead of adding some
extra arguments to strcpy to allow for safer functions (the "evil
empire" proposal), the overhead should be practically ZERO.

Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we spend behind the debugger when developing
software. A balance should be sought that improves the safety
of the language without overly compromising the speed of the
generated code.

I quote again from that paper:

< quote >
As high GHZ processors become prevalent, adding hardware support to
ensure the correctness and security of programs will be just as
important, for the average user, as further increases in processor
performance. The goal of our research is to focus on developing
compiler and hardware support for efficiently performing software checks
that can be left on all of the time, even in production code releases,
to provide a significant increase in the correctness and security of
software.

< end quote >

The C language, as it is perceived by many people here, seems
frozen in the past without any desire to incorporate the changing
hardware/software relationship into the language itself.

When these issues are raised, the "argument" most often presented is
"Efficiency", or just "it is like that".

This has led to the language being perceived as backward and error
prone, only good for outdated software or "legacy" systems.

This pleases the C++ people again, who insist on seeing their language
as the "better C"; and obviously, C++ is much better than C in some
ways, especially string handling, the common algorithms in the STL, and
many other advances.

What strikes me is that this need not be, since C could with minimal
improvements be a much safer and general purpose language than it is
now.

Discussion about this possibility is nearly impossible, since a widely
read forum about C (besides this newsgroup) does not exist.

Hence this message.

To summarize:

o Bounds checking and safer, language-supported constructs are NOT
impossible; the overhead need not be too great.
o Constructs like a better run-time library could be implemented in a
much safer manner if we redesigned the library from scratch,
without any effective run-time cost.
jacob

P.S. If you think this article is off topic, please just ignore it.
I am tired of these stupid polemics.

Jul 29 '07
On Jul 30, 5:40 am, jacob navia <ja...@jacob.remcomp.fr> wrote:
tell me then, how can I know if a program is
"strictly conforming" then?

In general you cannot. However, examples of strictly
conforming programs can be given and some properties
of strictly conforming programs can be listed.
One property of all strictly conforming programs
is that they do not have bounds violations.

This (theoretical) property is enough to
conclude that a C implementation may
include a bounds checker without violating the
C standard.

So we have three questions:

i: Q: Is a C implementation allowed to have bounds checking?
A: Yes (follows from considerations of strictly conforming
programs)

ii: Q: Is it practical for a C implementation to have bounds checking?
A: Yes, examples exist.

iii: Q: Should changes be made to C so that implementing a bounds
checker is easier?
A: Disputed. However, your claim that a lot of people answer "no"
with the justification "bounds checking carries too large of
a performance penalty" is dubious at best.

- William Hughes

Jul 30 '07 #51
William Hughes wrote:
On Jul 30, 5:40 am, jacob navia <ja...@jacob.remcomp.fr> wrote:
>tell me then, how can I know if a program is
>"strictly conforming" then?


In general you cannot. However, examples of strictly
conforming programs can be given and some properties
of strictly conforming programs can be listed.
One property of all strictly conforming programs
is that they do not have bounds violations.

This (theoretical) property is enough to
conclude that a C implementation may
include a bounds checker without violating the
C standard.

So we have three questions:

i: Q: Is a C implementation allowed to have bounds checking?
A: Yes (follows from considerations of strictly conforming
programs)

ii: Q: Is it practical for a C implementation to have bounds checking?
A: Yes, examples exist.

iii: Q: Should changes be made to C so that implementing a bounds
checker is easier?
A: Disputed. However, your claim that a lot of people answer "no"
with the justification "bounds checking carries too large of
a performance penalty" is dubious at best.

- William Hughes
That is the justification I hear most often.

The second is the "spirit of C". C is for macho programmers
that do not need bounds checking because they never make mistakes.

And there are others.

With my post I wanted to address the first one. Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.

I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.

jacob
Jul 30 '07 #52
JT
On Jul 30, 4:38 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.
Yes.
I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.
You "would" say?

What sample of code have you converted into
the new SuperSafeC dialect and then compared
the performance? etc. etc.

(By the way, you might also want to look
at other ways of making C safer.
For example, CCured is a dialect of C
developed at UC Berkeley that forces
the programmer to add suitable
annotation, but then guarantees type safety)

You need to do due diligence and research first.

Jul 30 '07 #53
On Jul 29, 9:49 pm, Keith Thompson <ks...@mib.org> wrote:
Richard Heathfield <r...@see.sig.invalid> writes:
Keith Thompson said:
[...]
On the other hand, there are some presumably valid C constructs that
could break in the presence of bounds checking, such as the classic
"struct hack" and code that assumes two-dimensional arrays can be
accessed as one-dimensional arrays.
Do you have any examples of valid C constructs that are actually valid?

I don't think that's *quite* what you meant.
Both the struct hack and the 2d/1d hack are actually invalid C. And
before you start: the codification of the struct hack in C99 involved a
syntax change, so you can put /that/ Get Out Of Jail Free card back in
the pack! :-)

I actually can't think of any realistic examples off the top of my
head. But there are plenty of programs that aren't strictly
conforming but that a conforming implementation must accept.
Indeed, but this just means that an executable must
be produced.
For example, a program that prints the value of INT_MAX is not strictly
conforming.

What I have in mind in general is that a bounds-checking
implementation might make incorrect assumptions about when checks can
be removed or, more relevantly, when they can be proven during
compilation to fail.
Even if there is a conforming program that can be proven during
compilation to have bounds violations, all this means is that the
compiler may have to produce an executable. (A reasonable behaviour
would be to output a warning that there is a bounds violation
and let the dynamic bounds checking stuff deal with it.)
With certain sets of assumptions, no strictly
conforming programs would be affected, but some correct programs that
depend on implementation-defined behavior could be.

It seems fairly clear to me that such examples are theoretically
possible. If you're still not convinced, I can try to come up with
something more concrete.
I am not convinced that an example of a conforming program that
could be shown at compile time to violate bounds exists.

- William Hughes
Jul 30 '07 #54
JT wrote:
On Jul 30, 4:38 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
>Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.

Yes.
>I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.

You "would" say?

What sample of code have you converted into
the new SuperSafeC dialect and then compared
the performance? etc. etc.
The most obvious example is the development of
length delimited strings.

strlen becomes just a memory read with those strings.
Much faster *and safer* than an UNBOUNDED memory scan!

Other functions like strcat that implicitly call
strlen are FASTER.
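A minimal sketch of this idea (type and function names illustrative, not jacob's actual library):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-delimited string: the count travels with the data. */
typedef struct {
    size_t len;
    char  *data;  /* still NUL-terminated, for compatibility */
} String;

/* O(1): one memory read, no scan, no dependence on finding a terminator. */
static size_t Strlen(String s) { return s.len; }

/* Concatenation needs no implicit strlen scan of the destination,
   and the stored lengths make an up-front bounds check trivial. */
static void Strcat(String *dst, String src, size_t cap)
{
    assert(dst->len + src.len < cap);  /* bounds are checkable */
    memcpy(dst->data + dst->len, src.data, src.len + 1);
    dst->len += src.len;
}
```

The `cap` parameter is an assumption for the sketch; a fuller design would carry capacity inside the String as well.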

I have been promoting this change (without obsoleting
zero terminated strings of course for legacy code)
for several years. Maybe you are new here and did not see
my other posts.

(By the way, you might also want to look
at other ways of making C safer.
For example, CCured is a dialect of C
developed at UC Berkeley that forces
the programmer to add suitable
annotation, but then guarantees type safety)
The annotations chapter is a huge issue in itself.
But I can't say everything in one post, please.
You need to do due diligence and research first.
Thanks for the advice but what makes you think I haven't done it?

Jul 30 '07 #55
Richard Heathfield wrote, On 30/07/07 14:32:
Flash Gordon said:

<snip>
>Even more important is what is the implementation going to do if it
detects an out-of-bounds access?

Well, presumably that feature would only be used during debugging.
In that case I would like it to cause the ICE to break the program so it
can be debugged :-)
>I certainly don't want to have to
switch off my car when doing 70MPH in the fast lane of the M25!

You won't have to - the M25 doesn't /have/ a fast lane.
You forget that a program invoking undefined behaviour can cause
anything* to happen! Anyway, I've been known to be driving on the M25
after midnight when it can actually be fast.
--
Flash Gordon
Jul 30 '07 #56
JT
On Jul 30, 4:53 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
The most obvious example is the development of
length delimited strings [strlen] [strcat]
I have been promoting this change [..] for several years.
First of all, it was thought of long ago [by OTHER people]
Second of all, its overall efficiency is still debated[*]
Third of all, companies such as Microsoft already REQUIRE it
internally
>
Thanks for the advice but what makes you think I haven't done it?
Because you appear painfully naive in thinking
that you have the answer, and others are just too
stubborn to see it.

* I'll leave the literature search and other
examples for others to cite. One quick example is
the tokenization and manipulation of an input text block.
With NUL-terminated string, you can tokenize
the input buffer in-place (by replacing whitespace
with \0), but with length-denoted strings you need
to always malloc a new space since there's no room
for the length field.
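JT's in-place tokenization point can be sketched concretely (a minimal illustration, not taken from any cited source):

```c
#include <string.h>

/* In-place tokenization: strtok overwrites each separator with '\0'
   and we record where each token starts. No malloc; every token
   aliases the original buffer. With a counted-string representation
   there is no spare byte to claim for a terminator, so each token
   would need its own allocation (or a separate (ptr,len) slice). */
static int tokenize(char *buf, char *tokens[], int max)
{
    int n = 0;
    for (char *p = strtok(buf, " \t"); p != NULL && n < max;
         p = strtok(NULL, " \t"))
        tokens[n++] = p;
    return n;
}
```

Note that strtok mutates its input, which is exactly the property being traded on here.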

** For a few years now, Microsoft has internally required that
EVERY METHOD in their internal code (eg. Office, Windows...)
always pass a length argument for every variable-length
argument. So an internal method that accepts
a string arg must also have an arg that contains its size.

You did NOT come up with the idea.

And researchers at Microsoft have already tried
and evaluated the practice of always storing/passing
the string length AT ALL TIME.

I leave it up to you to look at their whitepaper
(you know, real research) and see what their evaluation
of the performance penalty, programmer productivity penalty,
and percent of bug reduction is.


Jul 30 '07 #57
[snips]

On Mon, 30 Jul 2007 13:08:34 +0200, Richard wrote:
>"A strictly conforming program shall use only those features of
the language and library specified in this Standard. It shall
not produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit."
So theory then, and almost impossible to achieve in real life, since
it's impossible, IMO, to predetermine that your program will react
properly under all variations of potential user input.
Actually, no, it's not all that difficult to achieve. It does require
some skill and some patience, but it's not all that difficult, actually.

Perhaps the most obvious example of this is gets vs fgets. If you use
gets, forget it, your code is virtually guaranteed, at some point, to
invoke UB. Using fgets, on the other hand, allows you to prevent buffer
overflows on input and thus avoid UB as a result of the overflow.
Jul 30 '07 #58
On 30 Jul 2007 16:12:10 GMT, ri*****@cogsci.ed.ac.uk (Richard Tobin)
wrote:
>In article <jn********************************@4ax.com>, a\\/b <al@f.g> wrote:
>>>Specifically, they measured the overhead of a bounds
checking implementation compared to a normal one, and
found that in some cases the overhead can be reduced
to a mere 8.3% in some cases...
>>For me it is all wrong:
the % of slowdown should be 0%

Yes, but unlike you the authors of the paper actually did some research.
I say that a program that uses bounds checking should be near 0% slower
than one that does not use it
>-- Richard
Jul 30 '07 #59
[snips]

On Mon, 30 Jul 2007 18:38:06 +0200, jacob navia wrote:
The second is the "spirit of C". C is for macho programmers
that do not need bounds checking because they never make mistakes.
Er... no. C programmers can make mistakes, but that's not relevant here;
all that's relevant here is whether an accomplished C programmer is liable
to make significant (UB-inducing) errors as they pertain to buffer/array
management - the stuff which bounds checking helps with.

Check the code produced by most of the seasoned regulars around here; see
how much of it is prone to such things. Then stop and consider, of that,
how much makes it past review and into "gold" status.
Jul 30 '07 #60
JT wrote:
On Jul 30, 4:53 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
>The most obvious example is the development of
length delimited strings [strlen] [strcat]
I have been promoting this change [..] for several years.

First of all, it was thought of long ago [by OTHER people]
Obviously. Did you see any copyright in my message?
Second of all, its overall efficiency is still debated[*]
I presented you with obvious examples.
Third of all, companies such as Microsoft already REQUIRE it
internally
Yes, this is maybe a hint that it is not so BAD as it seems.
>Thanks for the advice but what makes you think I haven't done it?

Because you appear painfully naive in thinking
that you have the answer, and others are just too
stubborn to see it.
I am proposing a change in the handling of strings, using
Strcmp, Strlen, and other functions.
I have developed an implementation that I distribute (with the
source code) with my compiler system.

Maybe I am "naive" as you say, but I believe that discussing
this proposal and related items is worthwhile.

* I'll leave the literature search and other
examples for others to cite. One quick example is
the tokenization and manipulation of an input text block.
With NUL-terminated string, you can tokenize
the input buffer in-place (by replacing whitespace
with \0), but with length-denoted strings you need
to always malloc a new space since there's no room
for the length field.
Yes, in that example zero terminated strings take less space
and do not require an allocation.
** For a few years now, Microsoft internally requires
EVERY METHOD in their internal code (eg. Office, Windows...)
must always pass a length argument for every variable
length argument. So an internal method that accepts
a string arg must also have an arg that contains its size.
That is the problem. The programmer should NOT pass
the length. Strings should carry the length field,
so programmers just pass a string and
do not have to figure out what the length is!

You did NOT come up with the idea.
I feel that the only objective of your posts is to make me feel
bad, as if I was caught in the act of stealing Strings at the
grocery store.
And researchers at Microsoft have already tried
and evaluated the practice of always storing/passing
the string length AT ALL TIME.
Ditto. It is a BAD idea to pass that length.
I leave it up to you to look at their whitepaper
(you know, real research) and see what their evaluation
of the performance penalty, programmer productivity penalty,
and percent of bug reduction is.
Maybe I will have time to look at their research. Since it would
take long to search, I suggest you stop these games and
give your reference.

I gave the reference of the article I cited.
Jul 30 '07 #61
JT wrote:
On Jul 30, 4:53 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
>The most obvious example is the development of
length delimited strings [strlen] [strcat]
I have been promoting this change [..] for several years.

First of all, it was thought of long ago [by OTHER people]
Second of all, its overall efficiency is still debated[*]
Third of all, companies such as Microsoft already REQUIRE it
internally
[ ... ]
** For a few years now, Microsoft internally requires
EVERY METHOD in their internal code (eg. Office, Windows...)
must always pass a length argument for every variable
length argument. So an internal method that accepts
a string arg must also have an arg that contains its size.

You did NOT come up with the idea.
Neither did Microsoft. The concept of length delimited strings is many
decades old.
Jul 30 '07 #62
[snips]

On Mon, 30 Jul 2007 01:41:25 +0200, jacob navia wrote:
>No, it doesn't say that. There is nothing to stop a tool vendor
providing some form of access and leak checking tool. From my
perspective, such a feature is an indication of the quality of their tools.

Yes, there are a lot of tools, and their existence PROVES the gaping
hole in the language.
Thus Intercal's "come from" proves the need for such a construct? I think
not.

What it proves, if anything, is that diagnostic tools are useful
particularly in the early development phase.
Jul 30 '07 #63
On Mon, 30 Jul 2007 14:06:12 +0200, Richard wrote:
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
>Bjoern Vian <Bj*********@gmx.li> wrote:
>>Richard Heathfield schrieb:

I actually said "strictly conforming program". A strictly conforming
program does not contain any instances of undefined behaviour. (If it
did, it would not be strictly conforming.) Therefore, it cannot violate
any bounds.

Ok, but that is completely irrelevant for programming practice;
it's pure theory.

No, it isn't. If it _were_ possible for a strictly conforming program to
violate object bounds, a bounds checking implementation would be
legal.

So "if conforming program violates" then bounds checking is legal.
Correct, as if it violates object bounds, it invokes undefined behaviour
and thus *any* result is allowable.
>Since it is _not_ possible, a bounds checking implementation is legal
and, get this, on occasion very _practical_ to discover where your

Since "its not possible for conforming program to violate", checking is
legal.
Correct, because a strictly conforming program *cannot* tell the
difference between a bounds-checking implementation and one that does not
do bounds-checking.
I do have my thick head on. I don't understand anything of the past few
points about "conforming" programs. It sounds like a load of mumbo
jumbo.
It is not terribly difficult, once you understand the import of undefined
behaviour.
Jul 30 '07 #64
Kelsey Bjarnason wrote:
[snips]

On Mon, 30 Jul 2007 13:08:34 +0200, Richard wrote:
>>"A strictly conforming program shall use only those features of
the language and library specified in this Standard. It shall
not produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit."
>So theory then, and almost impossible to achieve in real life, since
it's impossible, IMO, to predetermine that your program will react
properly under all variations of potential user input.

Actually, no, it's not all that difficult to achieve. It does require
some skill and some patience, but it's not all that difficult, actually.

Perhaps the most obvious example of this is gets vs fgets. If you use
gets, forget it, your code is virtually guaranteed, at some point, to
invoke UB. Using fgets, on the other hand, allows you to prevent buffer
overflows on input and thus avoid UB as a result of the overflow.
I think he was talking about whether a given program, that took user input,
could be determined as strictly conforming or not. Not just the I/O
functions, but the entire program.

Jul 30 '07 #65
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:
>Impossible to use because the program will slow down for a factor
of 1,000 at least...
It's nowhere near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?

As to speed...

void mycpy( char *d, char *s )
{
  while( *s )
    *d++ = *s++;
  *d = '\0';
}

How's your bounds-checking implementation going to handle that? Is it
going to try to figure out where d was defined (or allocated), track the
space involved... okay, assume it does, then what? Calculate length of s,
add to start of d, compare to size of buffer and go "Okay, carry on"?
Or is it going to trap the pointer increments and compare them to the
buffer?

One involves a single set of calculations and almost zero impact; the
other involves potentially tens of thousands of operations and may well
have serious impact. If I - as the coder - am particularly worried about
this, I can pretty trivially code in the passing of the buffer sizes and a
singular up-front check which tells me whether I have enough room or not.

So how does your implementation work to avoid the test-every-pointer-op
mode and the subsequent performance hit?
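The "singular up-front check" Kelsey describes can be sketched like this (function name and error convention hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* One check before the loop instead of one per pointer operation:
   the caller passes the destination capacity explicitly. Returns 0
   on success, -1 if the copy would overflow the destination. */
static int mycpy_checked(char *d, size_t dsize, const char *s)
{
    size_t need = strlen(s) + 1;  /* single O(n) scan of the source  */
    if (need > dsize)
        return -1;                /* report, rather than overflow    */
    memcpy(d, s, need);           /* the copy itself runs unchecked  */
    return 0;
}
```

This trades one extra scan of the source for the per-dereference checks a naive bounds checker would insert inside the loop.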
Jul 30 '07 #66
Kelsey Bjarnason wrote:
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:
>>Impossible to use because the program will slow down for a factor
of 1,000 at least...
It's nowhere near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?

As to speed...

void mycpy( char *d, char *s )
{
  while( *s )
    *d++ = *s++;
  *d = '\0';
}

How's your bounds-checking implementation going to handle that? Is it
going to try to figure out where d was defined (or allocated), track the
space involved... okay, assume it does, then what? Calculate length of s,
add to start of d, compare to size of buffer and go "Okay, carry on"?
Or is it going to trap the pointer increments and compare them to the
buffer?

One involves a single set of calculations and almost zero impact; the
other involves potentially tens of thousands of operations and may well
have serious impact. If I - as the coder - am particularly worried about
this, I can pretty trivially code in the passing of the buffer sizes and a
singular up-front check which tells me whether I have enough room or not.

So how does your implementation work to avoid the test-every-pointer-op
mode and the subsequent performance hit?
Please read the article I mentioned in my first message.
There it is explained how to do it precisely.

Jul 30 '07 #67
jacob navia wrote, On 30/07/07 15:42:
Ian Collins wrote:
>jacob navia wrote:
>>Ian Collins wrote:
<snip>
>>>>What I am aiming at is a general language construct that would
allow more easy compiler checking in the existing toolset,
i.e. the compiler and the linker.
Why impose the extra burden on the vendor when they can provide
optional tools?

Because not every vendor can provide such tools, and a small language
modification would suffice to provide for most bounds
checking applications.

But they are not required everywhere. On many embedded targets, there
simply would not be space for the extra code.

Maybe, maybe not, it depends. In any case nobody is advocating making
it mandatory.

The real problem behind this is the difficulty with the standard library:
if you store the meta-data with the object, the object layout
changes slightly, and all library routines not compiled with bounds
checking will not work.
So do what MS do and build different versions of the libraries for
different purposes.
That is why a standard procedure would be much better.
The "standard" procedure would be the same as the procedure for
selecting any other compiler mode and version of the C library.
It would be possible to build compatible libraries.
So why complain when you already know the solution? Since the C standard
does not specify how to do linking why should it specify how to enable
bounds checking?

<OT>
MS already have a system for specifying calling conventions, they also
have (or had last I looked) different versions of the C library for use
with different build options.
</OT>
--
Flash Gordon
Jul 30 '07 #68
William Hughes wrote:
[snip]
I use a common approach: use bounds
checking tools for development and remove
them for production.
What a BRIGHT idea.

So, in production, when your code needs the most
security you have none.

You say:

A factor of ten slowdown is no problem, since I use it only
when debugging.

The objective is to use it at runtime since the speed penalty is
not great.

Another point is that you do not explain why the counted strings
would not work well.

In my implementation:

char *str = (char *)String;

and that is it.

It is designed to have almost 100% compatibility with the old zero
terminated strings.
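One layout that makes such a cast work (hypothetical; not necessarily jacob's actual implementation) stores the length in a hidden header just before the characters, so a String value is itself a valid char pointer:

```c
#include <stdlib.h>
#include <string.h>

/* A String points directly at NUL-terminated characters; the count
   lives in a size_t header allocated immediately before them. Legacy
   code that casts to char * sees an ordinary C string. */
typedef char *String;

static String String_new(const char *src)
{
    size_t  len = strlen(src);
    size_t *hdr = malloc(sizeof *hdr + len + 1);
    *hdr = len;                       /* hidden length field        */
    String s = (String)(hdr + 1);     /* data starts after header   */
    memcpy(s, src, len + 1);
    return s;
}

static size_t String_len(String s)    /* one memory read, no scan   */
{
    return ((size_t *)s)[-1];
}
```

The cost of this design is that such Strings must come from the matching allocator, and writing a '\0' into the data silently desynchronizes the two length notions.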
Jul 30 '07 #69
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
Bjoern Vian <Bj*********@gmx.li> wrote:
>Richard Heathfield schrieb:
I actually said "strictly conforming program". A strictly conforming
program does not contain any instances of undefined behaviour. (If it
did, it would not be strictly conforming.) Therefore, it cannot violate
any bounds.

Ok, but that is completely irrelevant for programming practice;
it's pure theory.

No, it isn't. If it _were_ possible for a strictly conforming program to
violate object bounds, a bounds checking implementation would be legal.
You mean "would be illegal", of course.
Since it is _not_ possible, a bounds checking implementation is legal
and, get this, on occasion very _practical_ to discover where your
program is not strictly conforming in a bounds-violating way.
As I've mentioned, an implementation that correctly handles all
strictly conforming programs is not necessarily a conforming
implementation. The standard requires more of implementations than
that. An implementation that rejects or mishandles this:

#include <limits.h>
#include <stdio.h>
int main(void)
{
printf("INT_MAX = %d\n", INT_MAX);
return 0;
}

is not conforming.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #70
On Jul 30, 2:44 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
Another point is that you do not explain why the counted strings
would not work well.

In my implementation:

char *str = (char *)String;

and that is it.

It is designed to have almost 100% compatibility with the old zero
terminated strings.
No, you have a simple way of converting from the new
strings to the old strings. In particular, you can use
the new string as an argument to a function call
that expects an old string.
This is nice but it is a long way from 100% compatibility.

For example, strings and Strings do not have the same
size. Consider a set of strings
in an array. With the old zero terminated strings
I can find the next string by

  next_string = string_var + strlen(string_var) + 1;

This will not work with the new strings. Yes, the fix is
simple, but a fix is needed.

Another example. Suppose I have a String_var of length 100.
I pass it to my error output routine; during
the function call String_var is converted to string_var.
The first thing that happens is I check the length
of the string, and if necessary I truncate to length
30 by string_var[30] = '\0'. String_var now contains an embedded null,
so strlen((char*)String_var) and new_Strlen(String_var) are different.
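That divergence can be demonstrated in a few lines (struct layout hypothetical, standing in for any counted-string design):

```c
#include <stddef.h>
#include <string.h>

/* A counted string: the stored count and the legacy scan-for-NUL
   measurement are maintained independently. */
typedef struct {
    size_t len;
    char   data[100];
} String;

/* The two length notions agree only while no '\0' has been written
   into the middle of the data. */
static int lengths_agree(const String *s)
{
    return strlen(s->data) == s->len;
}
```

Legacy-style truncation (writing a '\0' into the buffer) silently breaks the invariant, which is exactly William's maintenance worry.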

Will these be a problem in real programs? I don't know.
(though I have seen both lists of strings and truncation)
Are there other similar problems? I don't know.
In maintaining a program
you would have to stick with the old strings unless you were
*SURE* that converting would not cause a problem, or that
you had made all the fixes necessary.
In practice you will end up with two parallel string
implementations.

- William Hughes

Jul 30 '07 #71
jacob navia <ja***@jacob.remcomp.fr> writes:
[...]
I never said that I wanted to make zero terminated strings illegal.
Good.
I just propose that OTHER types of strings could be as well supported by
the language, nothing else.
Other types of strings can already be well supported by the language.
The only current limitation is that some operations require a bit more
syntax than you might like. For example, if you have a type String
that's really a structure, you can't use a cast to 'char*' to convert
a String to a classic C string -- but if one of the members is a char*
pointing to a C string, you can just use something like "obj.str".
You can't use string literals for String values, but you can use a
function call.

For example:

...
String s = Str("hello");
s = append(s, Str(", world"));
printf("%s\n", s.str);
...

If you want the convenience of using string literals and so forth, I
think that only a few minor changes to the language would be required.
I haven't thought this through, but I suspect that most or all of
these changes could be implemented as conforming extensions (i.e.,
extensions that don't alter the behavior of any strictly conforming
code; see C99 4p6). Any programs that depend on such extensions would
of course be restricted to implementations that support them, but it
could be the first step in establishing existing practice and possibly
getting the extensions adopted in a future C standard.
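A sketch of the String type Keith describes, fleshed out under the stated assumptions (all names hypothetical, and error handling omitted):

```c
#include <stdlib.h>
#include <string.h>

/* A structure String whose .str member is an ordinary NUL-terminated
   C string, so printf("%s", s.str) and legacy interfaces still work. */
typedef struct {
    size_t len;
    char  *str;
} String;

/* Function call in place of a string-literal conversion. */
static String Str(const char *s)
{
    String r = { strlen(s), NULL };
    r.str = malloc(r.len + 1);
    memcpy(r.str, s, r.len + 1);
    return r;
}

/* Returns a freshly allocated concatenation of a and b. */
static String append(String a, String b)
{
    String r = { a.len + b.len, malloc(a.len + b.len + 1) };
    memcpy(r.str, a.str, a.len);
    memcpy(r.str + a.len, b.str, b.len + 1);
    return r;
}
```

Note this is implementable today with no language change at all; the extensions Keith mentions would only remove the syntactic overhead of calls like Str("hello").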

Incidentally, depending on how this hypothetical String type is
implemented, aliasing could be an issue. For example;

String s1, s2;
s1 = "hello";
s2 = s1;
s2.str[0] = 'j';

s2 is now equal to Str("jello"). Is s1 equal to Str("jello"), or to
Str("hello")? In other words, does assignment of Strings copy the
entire string value, or does it just create a new reference to the
same string value?

Certainly classic C strings have the same issue, but there's bound to
be considerable work to be done in deciding how a new (standard?)
String type will deal with it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #72
William Hughes <wp*******@hotmail.com> writes:
On Jul 29, 9:49 pm, Keith Thompson <ks...@mib.org> wrote:
>Richard Heathfield <r...@see.sig.invalid> writes:
[...]
>I actually can't think of any realistic examples off the top of my
head. But there are plenty of programs that aren't strictly
conforming but that a conforming implementation must accept.

Indeed, but this just means that an executable must
be produced.
And often that it must execute correctly. For example:

#include <limits.h>
#include <stdio.h>
int main(void)
{
printf("INT_MAX = %d\n", INT_MAX);
return 0;
}

This example isn't relevant to bounds checking, but it is an example
of a non-strictly-conforming program that must compile and execute
correctly under any conforming implementation.

See C99 4p3:

A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a correct
program and act in accordance with 5.1.2.3.
>example, a program that prints the value of INT_MAX is not strictly
conforming.

What I have in mind in general is that a bounds-checking
implementation might make incorrect assumptions about when checks can
be removed or, more relevantly, when they can be proven during
compilation to fail.

Even if there is a conforming program that can be proven during
compilation to have bounds violations, all this means is that the compiler
may have to produce an executable (A reasonable behaviour
would be to output a warning that there is a bounds violation
and let the dynamic bounds checking stuff deal with it).
>With certain sets of assumptions, no strictly
conforming programs would be affected, but some correct programs that
depend on implementation-defined behavior could be.

It seems fairly clear to me that such examples are theoretically
possible. If you're still not convinced, I can try to come up with
something more concrete.

I am not convinced that an example of a conforming program that
could be shown at compile time to violate bounds exists.
That wasn't my claim. A program that can be shown at compile time to
violate bounds invokes undefined behavior, so it's neither strictly
conforming (C99 4p5) nor "correct" (C99 4p3). An implementation can do
anything it likes with such a program.

My argument is that a bounds-checking implementation that doesn't
affect any strictly conforming program (except perhaps for
performance), but that does break some "correct" programs (i.e.,
programs that do not invoke UB but that depend on unspecified
behavior) is not a conforming implementation. In other words, it's
not the effect on strictly conforming programs we have to worry about;
it's the effect on the much larger set of "correct" programs.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #73
In article <h3********************************@4ax.com>, a\/b <al@f.g> wrote:
>>>>Specifically, they measured the overhead of a bounds
checking implementation compared to a normal one, and
found that in some cases the overhead can be reduced
to a mere 8.3% in some cases...
>>>for me it is all wrong
the % of slowness should be 0%
>>Yes, but unlike you the authors of the paper actually did some research.
>i say that a programme that use bound check whould be near 0% slower
than one that not use it
Yes, we know you say it, but we don't take any notice of you because you
just make it up. If you can do bounds checking in C at zero cost, then
do it and we'll all be impressed. But we won't be holding our breath.

I believe the Americans have a phrase "put up or shut up" which seems
to fit.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Jul 30 '07 #74
I've quoted jacob's entire article for a reason. Skip down to see my
comments.

jacob navia <ja***@jacob.remcomp.fr> writes:
We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

Several researchers of UCSD have published an interesting
paper about this problem.

http://www.jilp.org/vol9/v9paper10.pdf

Specifically, they measured the overhead of a bounds
checking implementation compared to a normal one, and
found that in some cases the overhead can be reduced
to a mere 8.3% in some cases...

I quote from that paper

< quote >
To summarize, our meta-data layout coupled with meta-check instruction
reduce the average overhead of bounds checking to 21% slowdown which is
a significant reduction when compared to 81% incurred by current
software implementations when providing complete bounds checking.
< end quote>

This 21% slowdown is the overhead of checking EACH POINTER
access, and each (possible) dangling pointer dereference.

If we extrapolate to the alleged overhead of using some extra
arguments to strcpy to allow for safer functions (the "evil
empire" proposal) the overhead should be practically ZERO.

Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we stay behind the debugger when developing
software. A balance should be sought for improving the safety
of the language without overly compromising the speed of the
generated code.

I quote again from that paper:

< quote >
As high GHZ processors become prevalent, adding hardware support to
ensure the correctness and security of programs will be just as
important, for the average user, as further increases in processor
performance. The goal of our research is to focus on developing
compiler and hardware support for efficiently performing software checks
that can be left on all of the time, even in production code releases,
to provide a significant increase in the correctness and security of
software.

< end quote >

The C language, as it is perceived by many people here, seems
frozen in the past without any desire to incorporate the changing
hardware/software relationship into the language itself.

When these issues are raised, the "argument" most often presented is
"Efficiency" or just "it is like that".

This has led to the language being perceived as backward and error
prone, only good for outdated software or "legacy" systems.

This again pleases the C++ people, who insist on seeing their language
as the "better C", and obviously C++ is much better than C in some
ways, especially string handling, the common algorithms in the STL, and
many other advances.

What strikes me is that this need not be, since C could with minimal
improvements be a much safer and general purpose language than it is
now.

Discussion about this possibility is nearly impossible, since a widely
read forum about C (besides this newsgroup) is non-existent.

Hence this message.

To summarize:

o Bounds checking and safer, language-supported constructs are NOT
impossible because of too much overhead
o Constructs like a better run-time library could be implemented in a
much safer manner if we redesigned the library from scratch,
without any effective run-time cost.
jacob

P.S. If you think this article is off topic, please just ignore it.
I am tired of this stupid polemics.
jacob, the paper does look interesting. It's 26 pages long, and I
haven't yet been able to set aside the time to read the whole thing,
but it's on my to-do list.

This is *in spite of* your article recommending it. You spent most of
your article attempting to refute arguments that nobody has actually
made. And in the ensuing discussion you have ignored comments
demonstrating that bounds-checking can already be implemented without
changing the language at all.

A great many of the responses, including some from me and from Richard
Heathfield, have been *in support of* the idea of bounds checking in C.
We could have had an interesting and useful discussion if you hadn't
assumed throughout that we're all out to get you. If you attempt to
conduct both sides of a flame war yourself, it shouldn't be surprising
that the result is a flame war.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #75
jacob navia wrote:
Ian Collins wrote:
>jacob navia wrote:
>>Ian Collins wrote:
o They are NOT available in many embedded platforms
You can test the code off-target.
No. The code will run on target and you would have to
simulate the conditions on target, not always an easy task,
mind you.
Well I still support a couple of embedded projects and I can't remember
the last time I did any debugging on the targets. Everything is
developed and unit tested on the host, even the acceptance test suite
can run against both the target and the host simulation.


In most cases this is a recipe for disaster.
Not if you know what you are doing and understand the limitations of the
simulation.
Only the most perfect emulators can REALLY
reproduce 100% of the features of the target, either
because it is too slow/too fast for the same real
time conditions, or because the simulated input stream
doesn't correspond 100% with the actual input stream,
and a thousand other reasons.

The emulator is never the REAL THING, it is an emulator!
That doesn't matter when you are unit testing code. The unit test
framework is as good a place as any to run any bounds checking.
>
OK. Then you would agree with me that this feature

#pragma STDC_BOUNDS_CHECK(ON)

would be much better since it wouldn't be constrained
to just dbx...
Not really, you lose the ability to select what you want to check at
run time.

--
Ian Collins.
Jul 30 '07 #76
Kelsey Bjarnason wrote:
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:
>>Impossible to use because the program will slow down for a factor
of 1,000 at least...
It's no where near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

--
Ian Collins.
Jul 30 '07 #77
Ian Collins <ia******@hotmail.com> writes:
Kelsey Bjarnason wrote:
>[snips]
On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:
>>>Impossible to use because the program will slow down for a factor
of 1,000 at least...

It's no where near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build you application from a set of libraries, you don't
have to bounds check every library every run.
Why not? Well, I agree that you don't *have* to, but there could be
some benefit in doing so. Testing can never (well, hardly ever) be
100% exhaustive. The fact that you've thoroughly tested your
application and/or library with bounds checking doesn't necessarily
mean that no bounds errors are possible during production runs.

The usefulness of bounds checking in production code depends on what
happens when a check fails. If a failed check causes the application
to terminate immediately, that might or might not be better than
allowing the application to continue running; it depends very much on
the context in which the application is used. If it allows the
application to catch the error, perhaps via some sort of exception
handling mechanism, then it could be advantageous *if* the
exception-handling code is correct.

Also, an application with bounds checking and the same application
without bounds checking are, in a sense, two different applications,
and *both* should be tested just as thoroughly.

If bounds checking were completely free of cost, I might advocate
requiring it in the language. If it always caused code to be slower
by a factor of 10, I wouldn't suggest it except during testing or in
safety-critical code. The truth is somewhere in between.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #78

"jacob navia" <ja***@jacob.remcomp.frwrote in message
news:46***********************@news.orange.fr...
JT wrote:
>On Jul 30, 4:38 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
>>Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.

Yes.
>>I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.

You "would" say?

What sample of code have you converted into
the new SuperSafeC dialect and then compared
the performance? etc. etc.

The most obvious example is the development of
length delimited strings.

strlen becomes just a memory read with those strings.
Much faster *and safer* than an UNBOUNDED memory scan!

Other functions like strcat that implicitly call
strlen are FASTER.

I have been promoting this change (without obsoleting
zero terminated strings of course for legacy code)
for several years. Maybe you are new here and did not see
my other posts.

>(By the way, you might also want to look
at other ways of making C safer.
For example, CCured is a dialect of C
developed at UC Berkeley that forces
the programmer to add suitable
annotation, but then guarantees type safety)

The annotations chapter is a huge issue in itself.
But I can't say everything in one post please.
>You need to do due diligence and research first.

Thanks for the advice but what makes you think I haven't done it?
why not generalise it more and add an "array" datatype instead of a string
datatype? Strings are only useful for, well, strings. And you'd have to add
wchar_t * strings too, with their casts and support functions. Much better to
have an array type, then, that knows its size, and strings can be implemented
on top of the array type.
Jul 30 '07 #79
Serve Lau wrote:
"jacob navia" <ja***@jacob.remcomp.frwrote in message
news:46***********************@news.orange.fr...
>JT wrote:
>>On Jul 30, 4:38 pm, jacob navia <ja...@jacob.remcomp.fr> wrote:
Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.
Yes.

I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.
You "would" say?

What sample of code have you converted into
the new SuperSafeC dialect and then compared
the performance? etc. etc.
The most obvious example is the development of
length delimited strings.

strlen becomes just a memory read with those strings.
Much faster *and safer* than an UNBOUNDED memory scan!

Other functions like strcat that implicitly call
strlen are FASTER.

I have been promoting this change (without obsoleting
zero terminated strings of course for legacy code)
for several years. Maybe you are new here and did not see
my other posts.

>>(By the way, you might also want to look
at other ways of making C safer.
For example, CCured is a dialect of C
developed at UC Berkeley that forces
the programmer to add suitable
annotation, but then guarantees type safety)
The annotations chapter is a huge issue in itself.
But I can't say everything in one post please.
>>You need to do due diligence and research first.
Thanks for the advice but what makes you think I haven't done it?

why not generalise it more and add an "array" datatype instead of a string
datatype. Strings are only useful for well strings. And you'd have to add
wchar_t * strings to with their casts and support functions. Much better to
have an array type then that knows its size and strings can be implemented
on top of the array type.

Obvious, but that is *much* more complicated.
I programmed a generalized array that knows the size of the stored
elements too.

But it needs finishing. I am planning a general array package,
with optimized array operations.

jacob
Jul 30 '07 #80
Keith Thompson wrote:
Ian Collins <ia******@hotmail.com> writes:
>>
Assuming you build you application from a set of libraries, you don't
have to bounds check every library every run.

Why not? Well, I agree that you don't *have* to, but there could be
some benefit in doing so. Testing can never (well, hardly ever) be
100% exhaustive. The fact that you've thoroughly tested your
application and/or library with bounds checking doesn't necessarily
mean that no bounds errors are possible during production runs.
I agree, there is nothing to stop something from passing a bad pointer
to a tested library, or even the standard library.

My comment and practice is based on past experience, bounds errors tend
to show up in the user code that originates them so selective testing
during development is useful. One of the reasons I build applications
from a set of dynamic libraries is to make the access checking easier
(there used to be problems on Sparc with modules over a certain size)
and faster, a case of the tools shaping the process. I would use the
feature more if it had less of a performance hit (bear in mind that this
tool also performs access checking).
The usefulness of bounds checking in production code depends on what
happens when a check fails. If a failed check causes the application
to terminate immediately, that might or might not be better than
allowing the application to continue running; it depends very much on
the context in which the application is used. If it allows the
application to catch the error, perhaps via some sort of exception
handling mechanism, then it could be advantageous *if* the
exception-handling code is correct.
The only time I have used bounds checking in production code was an
embedded product based on the 386, I used a local descriptor table
entry for each allocation. This deferred all of the checking to the
hardware, any out of bounds (including use of a freed pointer) access
resulted in a trap and reboot. We decided that any out of bounds access
would leave the system in an unsafe state.

--
Ian Collins.
Jul 30 '07 #81
In article <46***********************@news.orange.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
>
Consider this program:
int fn(int *p,int c)
{
return p[c];
}

int main(void)
{
int tab[3];

int s = fn(tab,3);
}

Please tell me a compiler system where this program generates an
exception.
gcc -fmudflap

(If optimizing, the whole thing becomes a noop, but adding "return s;" at the
end of main takes care of that.)

--
Alan Curry
pa****@world.std.com
Jul 30 '07 #82
On Jul 30, 4:45 pm, Keith Thompson <ks...@mib.org> wrote:
William Hughes <wpihug...@hotmail.com> writes:
On Jul 29, 9:49 pm, Keith Thompson <ks...@mib.org> wrote:
Richard Heathfield <r...@see.sig.invalid> writes:
[...]
I actually can't think of any realistic examples off the top of my
head. But there are plenty of programs that aren't strictly
conforming but that a conforming implementation must accept.
Indeed, but this just means that an executable must
be produced.

And often that it must execute correctly. For example:

#include <limits.h>
#include <stdio.h>
int main(void)
{
printf("INT_MAX = %d\n", INT_MAX);
return 0;

}

This example isn't relevant to bounds checking, but it is an example
of a non-strictly-conforming program that must compile and execute
correctly under any conforming implementation.

See C99 4p3:

A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a correct
program and act in accordance with 5.1.2.3.
example, a program that prints the value of INT_MAX is not strictly
conforming.
What I have in mind in general is that a bounds-checking
implementation might make incorrect assumptions about when checks can
be removed or, more relevantly, when they can be proven during
compilation to fail.
Even if there is a conforming program that can be proven during
compilation to have bounds violations, all this means is that the compiler
may have to produce an executable (A reasonable behaviour
would be to output a warning that there is a bounds violation
and let the dynamic bounds checking stuff deal with it).
With certain sets of assumptions, no strictly
conforming programs would be affected, but some correct programs that
depend on implementation-defined behavior could be.
It seems fairly clear to me that such examples are theoretically
possible. If you're still not convinced, I can try to come up with
something more concrete.
I am not convinced that an example of a conforming program that
could be shown at compile time to violate bounds exists.

That wasn't my claim. A program that can be shown at compile time to
violate bounds invokes undefined behavior, so it's neither strictly
conforming (C99 4p5) nor "correct" (C99 4p3). An implementation can do
anything it likes with such a program.

My argument is that a bounds-checking implementation that doesn't
affect any strictly conforming program (except perhaps for
performance), but that does break some "correct" programs (i.e.,
programs that do not invoke UB but that depend on unspecified
behavior) is not a conforming implementation. In other words, it's
not the effect on strictly conforming programs we have to worry about;
it's the effect on the much larger set of "correct" programs.
Yes, I concede the point.

Do you have an example of a "correct" program that has a
bounds violation?

The examples you gave, a two dimensional array accessed as a one
dimensional
array and the struct hack, are examples of undefined behaviour so are
not part of a "correct" program.

- William Hughes


Jul 30 '07 #83
William Hughes <wp*******@hotmail.com> writes:
On Jul 30, 4:45 pm, Keith Thompson <ks...@mib.org> wrote:
[...]
>My argument is that a bounds-checking implementation that doesn't
affect any strictly conforming program (except perhaps for
performance), but that does break some "correct" programs (i.e.,
programs that do not invoke UB but that depend on unspecified
behavior) is not a conforming implementation. In other words, it's
not the effect on strictly conforming programs we have to worry about;
it's the effect on the much larger set of "correct" programs.

Yes, I concede the point.

Do you have an example of a "correct" program that has a
bounds violation?
By definition, no. A program that has a bounds violation invokes
undefined behavior, and is therefore not "correct".

Imagine, though, a hypothetical bounds-checking implementation that
operates on the *assumption* that the code being processed is strictly
conforming. This could be an unintentional implicit assumption;
perhaps the folks who wrote the bounds-checking subsystem didn't think
to deal with unspecified behavior, and were too aggressive in their
assumptions. Such an implementation would correctly handle any
strictly conforming program, but could break some correct programs.

I didn't mean to suggest that this kind of thing is likely to be a
concern in the real world (and perhaps I haven't been sufficiently
clear on that point). I merely meant to point out that a
bounds-checking implementation must work correctly for all *correct*
programs, not just for all strictly conforming programs.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 30 '07 #84
we******@gmail.com wrote:
On Jul 29, 4:47 pm, William Hughes <wpihug...@hotmail.com> wrote:
>On Jul 29, 7:17 pm, Ian Collins <ian-n...@hotmail.com> wrote:
>>There are alternatives to C if you want performance and
better memory safety.
I am not sure what you are saying here. Are you claiming
that among the existing implementations there is an
implementation of another language that gives you
performance and better memory safety than any existing
implementation of C, or are you claiming that there is
another language which gives performance and better
memory safety than any possible implementation of C (in
this case is the claim that the performance is comparable
to that of C)? Or do you mean something else?

Well, whichever of those he means, he's just wrong. C (and C++) enjoy
a very solitary place as fast practical low level languages, which are
really not generally matched in performance by any widely usable
language except sometimes in narrow applications.
I didn't spot William's post, but I was referring to C++ as the
alternative. There you have the option of the raw C seat-of-your-pants
style of programming or, through another layer of indirection, the
Pascal bounds-checking style. The choice is up to the developer.

--
Ian Collins.
Jul 31 '07 #85
On 2007-07-29 23:21, Guillaume <"grsNOSPAM at NOTTHATmail dot com"> wrote:
Bounds checking is nice and all, but it certainly is no panacea.
It may even not be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception.
This IS bounds checking.
2. Bounds checking. You read or write data outside bounds. It generates
an 'out of bounds' exception.

Not that much different.
Right. There is much difference between bounds checking and bounds
checking.
(All implementations where it doesn't always generate an exception, or
worse, where it can lead to code execution, is brain-dead IMO, but
that's another story. Thus, it's not a problem of bounds checking or
not.)
But it is. If a bounds violation doesn't generate an exception, the
implementation obviously doesn't do bounds checking. If a bounds
violation does generate an exception the implementation does check
bounds (at least in some cases).

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Jul 31 '07 #86
[snips]

On Mon, 30 Jul 2007 20:44:56 +0200, jacob navia wrote:
A factor of ten slowdown is no problem, since I use it only when
debugging.

The objective is to use it at runtime since the speed penalty is not
great.
An impact of almost 10% would be intolerable in many situations, thank you
very much.
In my implementation:

char *str = (char *)String;
Oh goody - modifiable, directly in the object. Now you have to trap every
single pointer operation I might ever choose to do to ensure I don't
modify, say, the length. Or free the buffer. Or whatever.
Jul 31 '07 #87
On Mon, 30 Jul 2007 01:37:09 +0200, jacob navia wrote:
Guillaume wrote:
>Bounds checking is nice and all, but it certainly is no panacea.
It may even not be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception.

NO, in most cases writing beyond a variable's specified length doesn't
produce any exception.

Consider this program:
int fn(int *p,int c)
{
return p[c];
}

int main(void)
{
int tab[3];

int s = fn(tab,3);
}

Please tell me a compiler system where this program generates an
exception.
$gcc -fmudflap -lmudflap test.c
$./a.out
*******
mudflap violation 1 (check/read): time=1185910311.158143 ptr=0xbfe5f050 size=4
pc=0xb7e9f20d location=`test.c:3 (fn)'
/usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7e9f20d]
./a.out(fn+0x80) [0x80487d4]
./a.out(main+0x47) [0x8048826]
Nearby object 1: checked region begins 1B after and ends 4B after
mudflap object 0x80cb110: name=`test.c:8 (main) tab'
bounds=[0xbfe5f044,0xbfe5f04f] size=12 area=stack check=0r/0w liveness=0
alloc time=1185910311.158136 pc=0xb7e9ec4d
number of nearby objects: 1
$
Jul 31 '07 #88
On Tue, 31 Jul 2007 09:16:15 +1200, Ian Collins wrote:
Kelsey Bjarnason wrote:
>[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:
>>>Impossible to use because the program will slow down for a factor
of 1,000 at least...

It's no where near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build you application from a set of libraries, you don't
have to bounds check every library every run.
You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?
Jul 31 '07 #89
On 2007-07-30 10:06, jacob navia <ja***@jacob.remcomp.fr> wrote:
dan wrote:
>if you read their conclusions carefully. You totally missed the point
that the authors said hardware changes would be needed to make the
bounds checking not use excessive resources.

They did not say that. They say that hardware support would be
better to have, but that other things, like storing the meta-data
with the object instead of in the pointer, would speed things up
according to their simulations.
From the conclusion:

| storing the required meta-data with the object scales better in terms of
| performance. Incorporating both bounds and dangling pointer checks using
| this approach results in an average slowdown of 63.9%.
| This slowdown is still too large for the checks to be used in released
| software. We therefore propose an ISA and architecture extension using
| the meta-check instruction.

What is unclear about "This slowdown is still too large for the checks
to be used in released software"? The authors are clearly of the opinion
that the speedup from using OMD instead of PMD is not enough. (I don't
share that opinion: For most software an overhead of 63.9% won't matter
at all).

hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Jul 31 '07 #90
Kelsey Bjarnason <kb********@gmail.com> writes:
[snips]
On Mon, 30 Jul 2007 20:44:56 +0200, jacob navia wrote:
>In my implementation:

char *str = (char *)String;

Oh goody - modifiable, directly in the object. Now you have to trap every
single pointer operation I might ever choose to do to ensure I don't
modify, say, the length. Or free the buffer. Or whatever.
I suspect that jacob has overloaded the cast operator to allow this;
the cast probably extracts the information from the String object
(which might be a struct) rather than doing an actual pointer
conversion.

This kind of overloading is non-standard, of course, but it's a
permitted extension as long as it doesn't change the behavior of any
strictly conforming program.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 31 '07 #91
Kelsey Bjarnason wrote:
On Tue, 31 Jul 2007 09:16:15 +1200, Ian Collins wrote:
>Kelsey Bjarnason wrote:
>>[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:

Impossible to use because the program will slow down by a factor
of 1,000 at least...
>
It's nowhere near that bad. Yes, there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?
No, where did I say that?

--
Ian Collins.
Jul 31 '07 #92
On 2007-07-30 16:38, jacob navia <ja***@jacob.remcomp.fr> wrote:
William Hughes wrote:
>>
iii: Q: Should changes be made to C so that implementing a bounds
checker is easier.
A: Disputed. However, your claim that a lot of people answer "no"
with the justification "bounds checking carries too large of
a performance penalty" is dubious at best.

That is the justification I hear most often.

The second is the "spirit of C". C is for macho programmers
that do not need bounds checking because they never make mistakes.

And there are others.
Indeed. You mentioned one of these others in a different post yourself:

It breaks the ABI.

C compilers aren't developed in a vacuum. For almost any platform, there
are already C compilers, and there are libraries compiled with these C
compilers. To be able to use these libraries, the compiler must use the
same sizes and alignments for all types or know that it needs to
interface with a different ABI. The first is clearly incompatible with
fat pointers: If a library function expects to get 4 byte pointers, but
is passed 12 byte pointers instead, it will produce garbage. The second
is possible, but requires a lot of housekeeping, especially if bounds
checking is optional.

I think lack of interoperability with existing compilers and libraries
has a lot more to do with the scarcity of bounds checking C compilers
than performance, machismo, or the spirit of C.
>With my post I wanted to address the first one. Those researchers
>prove that a FAST implementation of bounds checking is
>feasible even without language support.
Right. I'm not surprised, though. I believe that the overhead figures
I've seen more than 10 years ago weren't much worse (but I don't have
any papers at hand, so don't ask me for references).
>I would say that with language support, the task would be much easier
>AND much faster, so fast that it could be done at run time without
>any crushing overhead.
Possible. But none of the proposed language changes I've seen from
you has that effect. Since you are the author of a compiler, you can
just implement that change in your compiler and then publish benchmark
results.

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Jul 31 '07 #93
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
>Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
>Bounds checking is nice and all, but it certainly is no panacea.
>It may even not be *that* useful IMO. Here is why:
>
>1. No bounds checking. You read or write data outside bounds. It
>generates an exception. (Any implementation where it doesn't always
>generate an exception, or worse, where it can lead to code execution,
>is brain-dead IMO, but that's another story. Thus, it's not a problem
>of bounds checking or not.)
>[...]

Are you under the impression that attempts to violate bounds in C
typically trigger an exception? In my experience, attempting to
access memory just beyond the bounds of an array *usually* results in
the program silently accessing some other memory, perhaps part of
another object.
This is true, and I have numbers to back it up. Vaguely remembered
numbers, but numbers based on direct experience nonetheless.
<c6**********@rumours.uwaterloo.ca> describes an out-of-bounds read
bug that triggered an exception something like three or four times in
the equivalent of a year of running time for the program it was in.
This is definitely not "always".

A bounds-checked build is a Rather Useful debugging tool (if we'd been
able to build with something like GCC's mudflap checking mentioned
elsethread we'd've found this bug the first time we ran the checked
build), but if you write code that works by design, it's only an
effort-saving tool, and won't catch anything that careful reviews and
testing wouldn't.
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
Sadly, the books-i've-never-read pile is already at height that would turn
a health and safety nazi white. --Geoff Lane and Howard
Fix your priorities. This one is important. S Shubs in the SDM
Jul 31 '07 #94
On 2007-07-29 23:21, Guillaume <"grsNOSPAM at NOTTHATmail dot com"> wrote:
>Bounds checking is nice and all, but it certainly is no panacea.
>It may even not be *that* useful IMO. Here is why:
>
>1. No bounds checking. You read or write data outside bounds. It
>generates an exception.
This IS bounds checking.
>2. Bounds checking. You read or write data outside bounds. It generates
>an 'out of bounds' exception.
>
>Not that much different.
Right. There is not much difference between bounds checking and bounds
checking.
>(Any implementation where it doesn't always generate an exception, or
>worse, where it can lead to code execution, is brain-dead IMO, but
>that's another story. Thus, it's not a problem of bounds checking or
>not.)
But it is. If a bounds violation doesn't generate an exception, the
implementation obviously doesn't do bounds checking. If a bounds
violation does generate an exception the implementation does check
bounds (at least in some cases).

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Jul 31 '07 #95
Peter J. Holzer wrote:
On 2007-07-29 23:21, Guillaume <"grsNOSPAM at NOTTHATmail dot com"> wrote:
>>Bounds checking is nice and all, but it certainly is no panacea.
>>It may even not be *that* useful IMO. Here is why:
>>
>>1. No bounds checking. You read or write data outside bounds. It
>>generates an exception.
>
>This IS bounds checking.

This is the case where the operating system traps access to addresses not
owned by the process.

>>2. Bounds checking. You read or write data outside bounds. It generates
>>an 'out of bounds' exception.
>>
>>Not that much different.
>
>Right. There is not much difference between bounds checking and bounds
>checking.

Bounds checking would also trap access to memory not a part of the concerned
object, but still within the process's writable address space.

[snip]

Jul 31 '07 #96
In article <f8**********@registered.motzarella.org>,
santosh <sa*********@gmail.com> wrote:
>>1. No bounds checking. You read or write data outside bounds. It
generates an exception.
>This IS bounds checking.
>This is the case where the operating system traps access to addresses not
owned by the process.
It might be or it might not be. On a processor that can use a
different segment for each object (including the x86 in theory, though
as far as I'm aware it's never done) you can get hardware bounds
checking on individual objects.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Jul 31 '07 #97
Richard Tobin wrote:
In article <f8**********@registered.motzarella.org>,
santosh <sa*********@gmail.com> wrote:
>>>1. No bounds checking. You read or write data outside bounds. It
generates an exception.
>>This IS bounds checking.
>This is the case where the operating system traps access to addresses not
owned by the process.

It might be or it might not be. On a processor that can use a
different segment for each object (including the x86 in theory, though
as far as I'm aware it's never done) you can get hardware bounds
checking on individual objects.
It can be done; I have used this feature in an embedded allocator. The
problem for the general case is that there is a finite number (4096 on
the 386) of descriptor table entries.

--
Ian Collins.
Jul 31 '07 #98
On Wed, 01 Aug 2007 09:00:27 +1200, Ian Collins wrote:
Kelsey Bjarnason wrote:
>On Tue, 31 Jul 2007 09:16:15 +1200, Ian Collins wrote:
>>Kelsey Bjarnason wrote:
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:

>Impossible to use because the program will slow down by a factor
>of 1,000 at least...
>>
It's nowhere near that bad. Yes, there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?

I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?

No, where did I say that?
The subject is whether to check everything or not. I maintain that if
you're not checking everything, why check anything, as by definition
you've determined which parts of the code are unsafe - so fix 'em.

The alternative is to treat all memory manipulation as suspect, shy of
proving that some function or library is incapable of such flaws.

The response to this was if you build from a set of libraries, you don't
need to check every library every run - and the obvious question is, why
not? If you've proven it safe, you *never* need to check it; if you
haven't, then the argument for bounds checking at all applies and it must
(by the logic behind the checking) be checked every run, every time -
other than the degenerate cases where the input state and system state are
identical on subsequent runs.
Aug 1 '07 #99
Kelsey Bjarnason wrote:
On Wed, 01 Aug 2007 09:00:27 +1200, Ian Collins wrote:
>Kelsey Bjarnason wrote:
>>On Tue, 31 Jul 2007 09:16:15 +1200, Ian Collins wrote:

Kelsey Bjarnason wrote:
[snips]
>
Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
>
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.
You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?
No, where did I say that?

The subject is whether to check everything or not. I maintain that if
you're not checking everything, why check anything, as by definition
you've determined which parts of the code are unsafe - so fix 'em.
See my response to Keith.

--
Ian Collins.
Aug 1 '07 #100
