Safer and Better C

bazad wrote:

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

Do not start a new thread without a reason. This should have been
a reply to something in some other thread, with sufficient material
quoted and attributed for us to put things in context.

C is inherently unsafe. By monitoring this newsgroup you will now
and then find out about ways of appeasing the lurking tigers.
Beyond that you just have to be aware of what is going on.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #2

E. Robert Tisdale

bazad wrote:

I am not using C all the time.
I have a general understanding of C and nothing else.
The recent reply to use strlcpy and strlcat showed me that
I am not aware of the best and safe techniques.
Is there any place
where I could learn more about safer and better C (on FreeBSD)?

Type

man strlcpy

or

man strlcat

at your FreeBSD prompt.

Nov 14 '05 #3

Minti

"bazad" <no****@noreply.not> wrote in message
news:1097780068.uMbmVua0IQ4XTixLRe9hpg@teranews...

Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?
Thank you

Read this thread from a couple of days back:

http://books.slashdot.org/article.pl...102&tid=190&tid=130&tid=6

It mentions a complete chapter on secure C.

HTH

Nov 14 '05 #4

William Ahern

bazad <no****@noreply.not> wrote:

Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

Read the FAQ--http://www.eskimo.com/~scs/C-faq/top.html--twice. You can't go
wrong there. You're likely better off using the existing interfaces properly
than looking for "safer" interfaces.

On a related note, Theo and Company of OpenBSD fame--arguably the ones who
most popularized the functions--will admit that strlcpy() and strlcat() are
_not_ the preferred solutions. memcpy() is even better, because the
occasions when you do not know the length of your source string should be
few and far between. strlcpy() and strlcat() should be a last resort. It's
also worth noting that the C99 semantics of snprintf() are very similar and
more widely available (FreeBSD's snprintf() is one such implementation, I
believe).
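
The snprintf() and memcpy() points above can be sketched roughly as follows. This is only an illustration, assuming a C99-conforming snprintf(); the helper names copy_checked and copy_known_len are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * copy_checked() is a hypothetical helper, not a standard function.
 * It copies src into dst (capacity dstsize) with C99 snprintf() and
 * returns 0 on success, -1 on truncation or error.
 * dst is always NUL-terminated when dstsize > 0.
 */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL);
    if (dstsize == 0)
        return -1;

    /* C99: the return value is the length the full output would have. */
    n = snprintf(dst, dstsize, "%s", src);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;              /* encoding error or truncation */
    return 0;
}

/*
 * When the source length is already known, memcpy() does the job
 * without rescanning the string. Also a hypothetical helper.
 */
int copy_known_len(char *dst, size_t dstsize,
                   const char *src, size_t srclen)
{
    assert(dst != NULL && src != NULL);
    if (srclen >= dstsize)
        return -1;              /* no room for the terminator */
    memcpy(dst, src, srclen);
    dst[srclen] = '\0';
    return 0;
}
```

Either way the caller has to look at the return value; a silently truncated string can be a bug of its own.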

strlcpy() and strlcat() are fairly unique in that they're additions to
C--albeit platform specific extensions and not very portable--which play
fair with and generally fit in well amongst the wider body of C code. Using
fancy libraries can often create more problems than they solve, because they
don't fit well with the existing corpus of C source and the points of
contact require considerable attention to detail.

For more secure applications overall--like chroot() and privilege revocation
techniques--in FreeBSD, comp.unix.programmer is probably a better bet.

Nov 14 '05 #5

Mark F. Haigh

bazad wrote:

Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?
Thank you

The most common security problems are buffer overflows. Simply put,
this means writing more data into a buffer than there's space for.
You'd do yourself a favor by learning how some of these exploits work.
I know there are a couple of old Phrack articles around, as well as an
article over at SecuriTeam, entitled 'Writing Buffer Overflow Exploits -
a Tutorial for Beginners':

http://www.securiteam.com/securityre...OP0B006UQ.html

However, note that discussions of the information in that article are
off topic here.
Mark F. Haigh
mf*****@sbcglobal.net

Nov 14 '05 #6

William Ahern wrote:

.... snip ...
strlcpy() and strlcat() are fairly unique in that they're additions
to C--albeit platform specific extensions and not very portable--
which play fair with and generally fit in well amongst the wider
body of C code. Using fancy libraries can often create more
problems than they solve, because they don't fit well with the
existing corpus of C source and the points of contact require
considerable attention to detail.

Their implementation is NOT platform specific and totally portable,
and thus they can be used anywhere by supplying an implementation.
I have done so, written in purely standard C. See my page in sig.
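
For readers wondering what such a portable version might look like, here is a minimal sketch in standard C. It is not the code from the page mentioned above, nor OpenBSD's; the name my_strlcpy is chosen here only to avoid clashing with a system-provided strlcpy(). It follows the documented strlcpy() contract: copy at most size - 1 characters, always terminate when size > 0, and return strlen(src).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative re-implementation; my_strlcpy is a made-up name. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t i = 0;

    assert(src != NULL);
    assert(dst != NULL || size == 0);

    /* Copy while there is room, leaving one byte for the terminator. */
    if (size > 0) {
        while (i + 1 < size && src[i] != '\0') {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';
    }

    /* Finish measuring src so the return value is its full length. */
    while (src[i] != '\0')
        i++;

    return i;
}
```

Truncation happened exactly when the return value is >= size, so the check is a single comparison at the call site.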

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #7

jo*******@my-deja.com (John Bode) writes:
[...]

6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #8

Mike Wahler

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

jo*******@my-deja.com (John Bode) writes:
[...]
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)

#define equals ==

if(x equals y)
;

:-)

-Mike

Nov 14 '05 #9

William Ahern

CBFalconer <cb********@yahoo.com> wrote:

William Ahern wrote:
... snip ...

strlcpy() and strlcat() are fairly unique in that they're additions
to C--albeit platform specific extensions and not very portable--
which play fair with and generally fit in well amongst the wider
body of C code. Using fancy libraries can often create more
problems than they solve, because they don't fit well with the
existing corpus of C source and the points of contact require
considerable attention to detail.

Their implementation is NOT platform specific and totally portable,
and thus they can be used anywhere by supplying an implementation.
I have done so, written in purely standard C. See my page in sig.

Ah, yes. That statement was poorly worded. I include OpenBSD's strlcpy() and
strlcat() code in many of my projects. I just meant that it's not available
on many platforms--e.g. Linux--and if you don't want to go through the
trouble of including it yourself snprintf() often suffices.

FWIW, the OpenBSD crowd writes very portable code (not fans of GCC'isms). I
keep a compat library around which I reuse for most of my development (I
especially like Niels Provos' sys/tree.h header for easy-peasy splay and
red-black trees).

Nov 14 '05 #10

John Bode wrote:

.... snip ...
1. Initialize all variables to a known value.
2. Check all return values from library functions.
3. Don't use gets().
4. During development, set the warning level on the compiler to
its highest setting. Review and eliminate each warning.
5. Don't cast an expression *just* to eliminate a warning.
6. When comparing against a constant expression for equality, put
the constant on the LHS (i.e., if (SOME_CONSTANT == x)); this
will catch any problems where you typed "=" when you meant "==".
7. Abstract out tedious, repetitive, and/or low-level tasks. IOW,
don't call malloc() directly from your application code, but
wrap it in a function that performs error checking and
initialization of the memory being returned.

I agree with all except #1, which can mask a failure to suitably
initialize later.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #11

Kenny McCormack

In article <g0*****************@newsread3.news.pas.earthlink.net>,
Mike Wahler <mk******@mkwahler.net> wrote:

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
jo*******@my-deja.com (John Bode) writes:
[...]
> 6. When comparing against a constant expression for equality, put the
> constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
> any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)

#define equals ==

if(x equals y)
;

Heh.

But don't most compilers catch (warn about) this anyway, these days?

That is, they want you to change:

if (x = 5)
to:
if ((x = 5))

Nov 14 '05 #12

Guillaume

> But don't most compilers catch (warn about) this anyway, these days?

That is, they want you to change:

if (x = 5)
to:
if ((x = 5))

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

Nov 14 '05 #13

Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #14

Kelsey Bjarnason

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:

Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

int main()
{
    int x = 3, y = 4;

    if ( y = x )
        ;

    return 0;
}

gcc -Wall test.c
test.c: In function `main':
test.c:5: warning: suggest parentheses around assignment used as truth value

Apparently, it does. Just not with the default warning levels... but
you'd never fail to use at least -Wall during development, would you?

Nov 14 '05 #15

Arthur J. O'Dwyer

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))
gcc doesn't.

[...] Apparently, it does.

Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the redundant
parentheses in Guillaume's example. Which some people might see as an
advantage to leaving them off (my preferred style in many cases as it
reduces clutter), but really I don't consider "mistyping == as = or
vice versa" to be a statistically significant problem in the first place.

-Arthur

Nov 14 '05 #16

Jonathan Adams

In article
<pa****************************@xxnospamyy.lightspeed.bc.ca>,
Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

int main()
{
    int x = 3, y = 4;

    if ( y = x )
        ;

    return 0;
}

gcc -Wall test.c
test.c: In function `main':
test.c:5: warning: suggest parentheses around assignment used as truth value

I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).

Cheers,
- jonathan

Nov 14 '05 #17

Ben Pfaff

Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> writes:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

if ( y = x )
;

Are you paying attention?
--
Ben Pfaff
email: bl*@cs.stanford.edu
web: http://benpfaff.org

Nov 14 '05 #18

"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.

Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the
redundant parentheses in Guillaume's example. Which some people
might see as an advantage to leaving them off (my preferred style in
many cases as it reduces clutter), but really I don't consider
"mistyping == as = or vice versa" to be a statistically significant
problem in the first place.

The parentheses aren't redundant (if that's really supposed to be "="
rather than "=="). If you leave them out:

if (x = 5 && y == 6)

is equivalent to

if (x = (5 && y == 6))

Of course if you correctly use "==" rather than "=", they are redundant:

if (x == 5 && y == 6)
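
The difference between these forms can be checked directly. The sketch below wraps each one in a small function (the function names are invented for this example) so the value left in x can be inspected. Expect a compiler to complain about some of the deliberate assignments.

```c
#include <assert.h>

/* (x = 5) && (y == 6): x becomes 5; result is true only when y == 6. */
int paren_form(int *x, int y)
{
    return (*x = 5) && (y == 6);
}

/* x = 5 && y == 6 parses as x = (5 && (y == 6)): x becomes 0 or 1. */
int bare_form(int *x, int y)
{
    return *x = 5 && y == 6;
}

/* x == 5 && y == 6: a pure comparison, x is left untouched. */
int intended_form(const int *x, int y)
{
    return *x == 5 && y == 6;
}

/* Exercises the three forms; every assertion below holds. */
void check_forms(void)
{
    int x = 0;

    assert(paren_form(&x, 6) == 1 && x == 5);
    x = 0;
    assert(bare_form(&x, 6) == 1 && x == 1);    /* x is 1, not 5 */
    x = 7;
    assert(intended_form(&x, 6) == 0 && x == 7);
}
```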

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #19

Kelsey Bjarnason

On Sat, 16 Oct 2004 00:12:55 -0400, Arthur J. O'Dwyer wrote:

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.

Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

Actually, it does. Note that the (x=5) is included in the extra layer of
parentheses, which is the _fix_ to allow such a situation to occur without
the warning. Trying it in the context of the original actual problem -
without the extra parentheses - it does, indeed, complain.

One can hardly say "X doesn't do this" when it does _unless_ one takes
steps to prevent it... and then test with code which has, in fact, taken
those steps. Might as well compile with all warnings disabled and then
complain the compiler doesn't detect any of a thousand or more things.

Nov 14 '05 #20

Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> writes:

On Sat, 16 Oct 2004 00:12:55 -0400, Arthur J. O'Dwyer wrote:

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
>
> What will the compiler catch if you have a multiple test, like:
>
> if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.

Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

Actually, it does. Note that the (x=5) is included in the extra layer of
parentheses, which is the _fix_ to allow such a situation to occur without
the warning. Trying it in the context of the original actual problem -
without the extra parentheses - it does, indeed, complain.

Without the "extra" parentheses, it's a different expression.
if (x = 5 && y == 6)
is equivalent to
if (x = (5 && y == 6))
and gcc complains because the assignment is at the top level of the
expression.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #21

Keith Thompson wrote:

"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:
On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
>
> What will the compiler catch if you have a multiple test, like:
>
> if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.

Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the
redundant parentheses in Guillaume's example. Which some people
might see as an advantage to leaving them off (my preferred style
in many cases as it reduces clutter), but really I don't consider
"mistyping == as = or vice versa" to be a statistically
significant problem in the first place.

The parentheses aren't redundant (if that's really supposed to be
"=" rather than "=="). If you leave them out:

if (x = 5 && y == 6)

is equivalent to

if (x = (5 && y == 6))

Of course if you correctly use "==" rather than "=", they are redundant:

if (x == 5 && y == 6)

Yet, of the 4 statements above, only the first is unequivocally
clear to any reader with any level of understanding of C's weird
and wonderful precedence system. Any decent optimizer should be
able to figure out that no code need be generated.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #22

Jonathan Adams wrote:

.... snip ...
I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).

What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))

which will squawk loudly on any compiler.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #23

John Bode

Keith Thompson <ks***@mib.org> wrote in message news:<ln************@nuthaus.mib.org>...

jo*******@my-deja.com (John Bode) writes:
[...]
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)

Yeah. I don't use it myself for the same reason; it's just too
mentally jarring. And it's not a mistake I make very often. But it
does provide some level of safety (of course, if both expressions are
variables (lvalues), it doesn't help much).

Nov 14 '05 #24

Minti

"CBFalconer" <cb********@yahoo.com> wrote in message
news:41**************@yahoo.com...

Jonathan Adams wrote:

... snip ...

I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).

What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))

which will squawk loudly on any compiler.

When someone first suggested this to me I really liked it, and in
fact started using it. For equality it may well be fine. But when I
see code like

if ( 0 >= c )
;

I always have to convert it, either mentally or in my notes, to

if ( c <= 0 )
;

Of course YMMV. Just like MMV.

--
Imanpreet Singh Arora
Zmoc.Zliamg@Zteerpnami
Remove Z to mail
"Things may come to those who wait, but only the things left by those who
hustle."
Abraham Lincoln

Nov 14 '05 #25

jo*******@my-deja.com (John Bode) wrote:

bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

strlcpy and strlcat just transform the way in which a buffer overflow
can happen. They don't address the cause (human error w.r.t. length
calculations.) The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length into
account automatically with each operation:

http://bstring.sf.net/

This can bring the safety of string operations essentially up to the
level found in other, higher-level languages.

> I don't know of any specific resources, but here are some personal
> guidelines in no particular order (note that I don't necessarily
> follow these *all* the time, depending on the situation and how
> deluded^H^H^H^H^H^H^Hconfident I am in my own abilities that day):
>
> 1. Initialize all variables to a known value.

Hmm ... well, so for pointers do you initialize them to NULL? That's
fine, but it's not much of a safety parachute if you have an accidental
"use before proper initialize" error.

> 2. Check all return values from library functions.

Well, except in bstrlib, where it's semantically optional. You can
usually just check dependent return values at the end and still know
an error has occurred without suffering from UB.

> 3. Don't use gets().

Better yet, define gets() to emit an error or do something like stop
the program in its tracks.

> 4. During development, set the warning level on the compiler to its
> highest setting. Review and eliminate each warning.

Right. The point is to recognize that even if you don't agree with a
warning, the effort you put into eliminating it is worth it for all
the other hints the compiler gives you through its warnings.

While not practical for everyone, these days I also try to ensure that
my code compiles with multiple compilers. I have found that different
compilers have vastly different safety coverage with their warnings --
complying with all of them helps make code truly bulletproof and
maintainable.

> 5. Don't cast an expression *just* to eliminate a warning.

I'm not sure what scenario you are talking about here. I would rather
say that you should cast *correctly*. I.e., clearly there are ways in
which casting numerics incorrectly can get the right type but the
wrong/inaccurate result.

> 6. When comparing against a constant expression for equality, put the
> constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
> any problems where you typed "=" when you meant "==".

Of course. People who have problems with this are somehow letting
some neurosis in their brain dominate over recognizing the law of
commutativity for this operator. The safety benefit for doing this is
obvious.

> 7. Abstract out tedious, repetitive, and/or low-level tasks. IOW,
> don't call malloc() directly from your application code, but wrap it
> in a function that performs error checking and initialization of the
> memory being returned.

Well, the new/delete paradigm of Pascal or C++ is usually a lot safer
and readable than C's crazy mallocing. So for ADT's, I usually have
creation functions that I name with a "New" in them, and destruction
function names with a "Destroy" in them.
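
Both ideas, a checked allocation wrapper and paired creation/destruction functions, can be sketched as below. The names alloc_zeroed, PointNew, PointDestroy and the struct point type are invented for this illustration. Returning NULL on failure is just one possible policy (some code bases prefer to abort instead).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate count * size zeroed bytes; NULL on failure or overflow. */
void *alloc_zeroed(size_t count, size_t size)
{
    if (count == 0 || size == 0)
        return NULL;            /* avoid implementation-defined zero-size requests */
    if (count > SIZE_MAX / size)
        return NULL;            /* count * size would overflow */
    return calloc(count, size); /* calloc zero-fills the memory */
}

struct point {
    int x;
    int y;
};

/* Creation function: returns a fully initialized object or NULL. */
struct point *PointNew(int x, int y)
{
    struct point *p = alloc_zeroed(1, sizeof *p);

    if (p == NULL)
        return NULL;
    p->x = x;
    p->y = y;
    return p;
}

/* Destruction function: accepts NULL, like free(). */
void PointDestroy(struct point *p)
{
    free(p);
}

/* Usage: every PointNew() is paired with exactly one PointDestroy(). */
int point_demo(void)
{
    struct point *p = PointNew(3, 4);
    int sum;

    if (p == NULL)
        return -1;              /* allocation failed */
    assert(p->x == 3 && p->y == 4);
    sum = p->x + p->y;
    PointDestroy(p);
    return sum;
}
```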

These are all reasonable ideas. But certainly I would add to them:

8. Use const *maximally*, and never cast away or work around const
semantics. Using const will typically make it obvious what parameters
to a function are inputs and what are outputs.

9. Always include error paths out of every function. (This goes with
2. above.) Without exception handling in C, your choices are either
to exit immediately (not recommended) or return with some kind of
erroneous return status. For ADTs that cannot be constructed, I
usually return NULL, and for just general errors, I return some
negative value under the assumption that normal operations always
return with 0 or a positive number. For debugging purposes -__LINE__
is a typical value that I return as an error.
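
A minimal sketch of this convention, using the made-up functions parse_digit and sum_two_digits: success returns a non-negative value, and each failure path returns the negated source line number, so a log or debugger session shows where the function bailed out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * parse_digit() is a hypothetical example function.
 * Returns the digit value (0..9) on success, or a negative value
 * identifying the failing source line on error.
 */
int parse_digit(const char *s)
{
    if (s == NULL)
        return -__LINE__;       /* null input */
    if (s[0] < '0' || s[0] > '9')
        return -__LINE__;       /* not a decimal digit */
    if (s[1] != '\0')
        return -__LINE__;       /* trailing characters */
    return s[0] - '0';
}

/* Callers only need to test the sign to detect failure. */
int sum_two_digits(const char *a, const char *b)
{
    int x = parse_digit(a);
    int y;

    if (x < 0)
        return x;               /* propagate the error code */
    y = parse_digit(b);
    if (y < 0)
        return y;
    assert(x <= 9 && y <= 9);   /* guaranteed by parse_digit() */
    return x + y;
}
```

The line numbers are only meaningful against the exact source revision that was compiled, so they suit debugging better than a stable public error API.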

10. Program for thread safety and reentrancy. strtok() is an example
of how *NOT* to design a function. Modifying what should obviously be
a source parameter, and then storing away the result in some single
focus, static way makes strtok non-reentrant. There is nothing in the
desired functionality of such a function that demands such bad
properties. Think of trying to do a simple thing like iterating
through substrings of a string in an outer loop, and then doing the
same on each substring in the inner loop -- strtok cannot be used in
something even as simple as that. If you are on Linux, look at
strtok_r() in the man pages for an example of a superior design which
has essentially the same functionality of strtok without its
weaknesses.
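
The nested-loop case described above can be sketched with strtok_r(). This assumes a POSIX system (strtok_r() is not part of ISO C); the function count_fields and its input format are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/*
 * Counts the comma-separated fields across all semicolon-separated
 * records in buf, e.g. "a,b;c" has 3 fields. buf is modified in place,
 * as with strtok(). Each loop keeps its own state pointer, so the
 * inner scan does not disturb the outer one.
 */
int count_fields(char *buf)
{
    char *outer_state;
    char *inner_state;
    char *record;
    char *field;
    int count = 0;

    assert(buf != NULL);

    for (record = strtok_r(buf, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        for (field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            count++;
        }
    }
    return count;
}
```

With plain strtok() the inner loop would overwrite the single hidden state and the outer loop would stop after the first record.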

Ok, although ANSI C says nothing about multithreading, there is hardly
any modern implementation that does not expose platform-specific or
POSIX multithreading functionality. Statics, globals, and
side-effects are the kinds of things that work against race condition
safety, and so it pays to minimize them in *all* of your code.

11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, scan the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions have
non-obvious parameter restrictions, which require that developers
read through documentation (which may or may not exist, may be of
poor quality, have errors in it, etc.) to figure out what is
going on.

There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)

12. Avoid the C library for string manipulation. Use
http://bstrlib.sf.net/ or something in which memory and length
semantics are automatically managed.

--
Paul Hsieh
http://www.pobox.com/~qed/

Nov 14 '05 #26

Paul Hsieh wrote:

jo*******@my-deja.com (John Bode) wrote:
bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)

Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.

The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length
into account automatically with each operation:

Your string system stands (or fails) by itself. strlcpy and
strlcat work with the existing standardized string system.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html

Nov 14 '05 #27

Richard Bos

CBFalconer <cb********@yahoo.com> wrote:

Jonathan Adams wrote:
I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).

What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))

....and be prepared to spend a lot of time trying to track the bug next
time you accidentally write if (y=x) instead of if (x==y), because
you've grown out of the habit of paying attention.

Richard

Nov 14 '05 #28

CBFalconer <cb********@yahoo.com> wrote:

Paul Hsieh wrote:
jo*******@my-deja.com (John Bode) wrote:
bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)

Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.

These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it has
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
which cause the memory limit to be inadequate for the desired string
operation -- and from there follows the buffer overrun.

The documentation seems to suggest that sizeof() is the best way of
tracking the length of strings. In any event, max presized strings is
the *real* problem. One is left either preallocating too much if the
input is small, or too little if the input is large, and avoiding
buffer overflows is still up to programmer diligence. There's a
reason why other languages have dynamically sized strings.
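
To illustrate what "dynamically sized" means here, the sketch below tracks length and capacity alongside the data and grows the buffer on demand. It is a deliberately tiny example with invented names (struct dstr, dstr_append, dstr_free); it is not Bstrlib's API, and a real library handles many more cases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct dstr {
    char  *data;    /* NUL-terminated contents, or NULL when empty */
    size_t len;     /* current string length */
    size_t cap;     /* allocated bytes, including the terminator */
};

/*
 * Appends s to d, growing the buffer as needed.
 * Returns 0 on success, -1 on failure (d is then left unchanged).
 * s must not point into d's own buffer in this simple sketch.
 */
int dstr_append(struct dstr *d, const char *s)
{
    size_t add, need;

    assert(d != NULL && s != NULL);

    add = strlen(s);
    if (add > SIZE_MAX - d->len - 1)
        return -1;                      /* length would overflow */
    need = d->len + add + 1;

    if (need > d->cap) {
        size_t newcap = d->cap ? d->cap : 16;
        char *p;

        while (newcap < need) {
            if (newcap > SIZE_MAX / 2) {
                newcap = need;          /* cannot double any further */
                break;
            }
            newcap *= 2;
        }
        p = realloc(d->data, newcap);
        if (p == NULL)
            return -1;
        d->data = p;
        d->cap = newcap;
    }

    memcpy(d->data + d->len, s, add + 1);   /* copies the terminator too */
    d->len += add;
    return 0;
}

/* Releases the buffer and resets d to the empty state. */
void dstr_free(struct dstr *d)
{
    assert(d != NULL);
    free(d->data);
    d->data = NULL;
    d->len = d->cap = 0;
}
```

Start from a zeroed struct (for example `struct dstr d = { NULL, 0, 0 };`). The caller never computes a buffer size, which is the point being argued.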

The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length
into account automatically with each operation:

Your string system stands (or fails) by itself. strlcpy and
strlcat work with the existing standardized string system.

Bstrlib also works with the standardized string system (in the most
natural way possible) without increasing the burden of programmer
considerations (tracking the dynamic memory length along with the
string.) This is one of its most important features -- you can still
continue to use char * libraries and interfaces while using Bstrlib
without any conversion penalties.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #29

Paul Hsieh wrote:

CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
.... snip ...

strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)

Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.

These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it's
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
which cause the memory limit to be inadequate for the desired string
operation -- and the buffer overrun follows from there.

C in general cannot perform such protection, and the cause is
rooted in the flagrant bandying about of pointers, pointer
arithmetic, transformations such as casts, the use of variadic
functions, and more. Strings are simply one aspect of this. We
can all easily avoid these problems by switching to a language
designed with correctness in mind, such as Pascal, Modula, Ada.
Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security, and
encourages those to make even graver errors elsewhere. Note that
providing routines that can be controlled is not such an avoidance.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan"
"I listened to Toronto come back from 3:0 in '42, I plan to
watch Boston come back from 3:0 in 04"

Nov 14 '05 #30

CBFalconer <cb********@yahoo.com> wrote:

Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)

Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.

These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it's
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
which cause the memory limit to be inadequate for the desired string
operation -- and the buffer overrun follows from there.

C in general cannot perform such protection, and the cause is
rooted in the flagrant bandying about of pointers, pointer
arithmetic, transformations such as casts, the use of variadic
functions, and more. Strings are simply one aspect of this. We
can all easily avoid these problems by switching to a language
designed with correctness in mind, such as Pascal, Modula, Ada.

Pascal has the same problem -- trying to dereference a nil pointer,
for example (I am not an expert in Modula-2 or Ada, but as I recall
neither is GC based, so are likely to have the same problem.) You
have to go to Java/Python/Perl if you want to bring yourself into a
more totally insulated programming environment. Another very
interesting approach is the Cyclone programming language -- though it
might be a little syntactically annoying, it gives near C level of
program control while being completely safe. These languages do total
abstraction for you. The question is -- can you add just enough
abstraction to C to gain enough safety that dealing with the
weaknesses of the C language doesn't become burdensome or unmanageable
in the long run? One way to do this is to develop ADTs with full
closure. Bstrlib is about as close as you can come to an ADT with
full closure in C.

You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib then is the switch to
other programming languages really necessary?

Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.

I don't think programming in C is about machismo. That would be
programming in ASM.

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security, and
encourages those to make even graver errors elsewhere.

What basis is there for this outrageous statement? All programming
abstractions are basically schemes for managing complexity which are
supposed to improve productivity or safety -- including switching to
other languages. Some of the *typical* problems of C are repetitive,
and solving them over and over again has no value whatsoever. I
reject the notion that allowing beginner programmers to make mistakes
with strings will help them make less grave errors in other
programming.

In fact Bstrlib includes a module called bsafe which forcibly
deprecates some of C's unsafe functionality -- so it can be argued
that it reduces some errors outside of Bstrlib.

Note that providing routines that can be controlled is not such an avoidance.

Routines like strlcat, and strlcpy don't solve the real problem. All
of the exact same buffer overflow scenarios are still present in the
same way and same sense, just with a somewhat reduced likelihood of
happening.

A direct comparison with Bstrlib shows that so many of the typical
buffer overflow scenarios are just not possible using Bstrlib. But it
is not just some safe programming library. It's an *EXAMPLE* (and
its source code is available) of how to improve safety through proper
abstraction without significantly compromising on anything.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #31

Paul Hsieh wrote:

CBFalconer <cb********@yahoo.com> wrote:
.... snip ...
You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib then is the switch to
other programming languages really necessary?

I readily concede that point. The fact that so far I have had no
noticeable problems using the existing system has something to do
with it.

Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.

I don't think programming in C is about machismo. That would be
programming in ASM.

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.

What basis is there for this outrageous statement? ... snip ...

My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan" - "I listened to Toronto come back
from 3:0 in '42, I watched Boston come back from 3:0 in '04"

Nov 14 '05 #32

CBFalconer <cb********@yahoo.com> wrote:

Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib then is the switch to
other programming languages really necessary?

I readily concede that point. The fact that so far I have had no
noticeable problems using the existing system has something to do
with it.

It means you don't measure performance, and you don't measure the
danger or effort required to deal with buffer overruns. You do it for
ego, yet you don't realize that you are implicitly subordinating your
ego to Thompson, Kernighan and Ritchie and their vision for how strings
should be implemented.

The C language and library implementation of strings is really poor
from every angle of consideration except for extremely small systems
(like 8-bit systems with < 64K, or ROM programming.) Bstrlib makes the
effort to leave all the worst aspects of '\0' terminated char *
strings behind while retaining just the right amount of
interoperability to leverage the incumbency of them.

You've let K&R&T tell you that string and binary buffers are
necessarily distinct. It's insane things like this which make binary
preserving text or hex editors all the more complicated to implement.

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.

What basis is there for this outrageous statement? ... snip ...

My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.

But what you don't realize is that you've also taken off the gears
with the greatest torque. The C way of doing things is not just
dangerous but it's also *SLOWER* (there is an example included in the
Bstrlib downloads which has a benchmark -- see for yourself.) So
you've accepted a more difficult challenge in order to achieve an
inferior result.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #33

Paul Hsieh wrote:

CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
.... snip ...

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.

What basis is there for this outrageous statement? ... snip ...

My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.

But what you don't realize is that you've also taken off the gears
with the greatest torque. The C way of doing things is not just
dangerous but it's also *SLOWER* (there is an example included in the
Bstrlib downloads which has a benchmark -- see for yourself.) So
you've accepted a more difficult challenge in order to achieve an
inferior result.

No, when I want a better, more secure language I simply use
Pascal. When I want to stay very close to the machine I use
assembly. When I am willing to compromise or want to maximize
practical portability I use C. Apart from the assembly I am
usually on very firm ISO standardized ground.

Bear in mind that I have no objection to you, or anyone else, using
your Bstrlib system. I have simply found no need so far, and
consider such efforts better applied to languages with a secure
foundation. I reserve the right to change my mind in future.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan" - "I listened to Toronto come back
from 3:0 in '42, I watched Boston come back from 3:0 in '04"

Nov 14 '05 #34

Richard Bos

we******@gmail.com (Paul Hsieh) wrote:

interoperability to leverage the incumbency of them.

You have just told us that you're a manager, and can conveniently be
ignored in a technical discussion. Thanks for the frankness.

Richard

Nov 14 '05 #35

Alan Balmer

On Fri, 22 Oct 2004 07:16:56 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:

we******@gmail.com (Paul Hsieh) wrote:
interoperability to leverage the incumbency of them.

You have just told us that you're a manager, and can conveniently be
ignored in a technical discussion. Thanks for the frankness.

I think he's an academic. No difference to your conclusion, of course.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #36

Dave Thompson

On 18 Oct 2004 13:57:17 -0700, qe*@pobox.com (Paul Hsieh) wrote:
<snip>

11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions have
non-obvious parameter restrictions, which require that
developers read through documentation (which may or may not exist, may
be of poor quality, have errors in it, etc) to figure out what is
going on.

I'm not sure what you mean by "mode" of a parameter. I have seen it
used (in COBOL, Pascal, and Ada, and informally in F9X) to mean the
parameter-passing mechanism or form (value, reference, in, out, etc.)
Since C only has by-value-initialized, you can't mean that. It sounds
like you mean values, or ranges, or combinations of such, that are
representable in the declared types but not valid for the callee.

There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)

Note that 'restrict' even when implemented does not ENFORCE anything,
or at least is not required to, and actual checking in nontrivial cases
would be costly, so implementors are unlikely to do it. This is unlike
the other/preexisting qualifiers const and volatile which are "safe"
(and sometimes annoying!) in that they cannot be "removed" from the
type without casting or cheating. All 'restrict' does for the caller
is DOCUMENT the requirement of nonaliasing, in a standardized (and
conceivably tool-processable) way, and in a place (the prototype)
where the programmer using it is almost certain to need to look. The
real benefit is supposed to be on the callee side, for optimization.
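
A small sketch of what that means in practice. `add_into` and `ranges_overlap` are hypothetical names. The `restrict` qualifiers only state the no-overlap promise; the debug-only assertion is an addition of this sketch, not something the language provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: do two byte ranges of equal length overlap?
 * Addresses are compared as integers (implementation-defined). */
static int ranges_overlap(const void *p, const void *q, size_t bytes)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;

    return (a < b) ? (b - a < bytes) : (a - b < bytes);
}

/* restrict promises the compiler that dst and src do not overlap.
 * The compiler is not required to check the promise; a call that
 * breaks it has undefined behaviour. The assert is a hand-written
 * debug check and disappears under NDEBUG. */
static void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    assert(!ranges_overlap(dst, src, n * sizeof *dst));
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The prototype documents the requirement for the caller, and the optimizer may use it on the callee side. Anything resembling enforcement has to be written by hand, as above.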

- David.Thompson1 at worldnet.att.net

Nov 14 '05 #37

Dave Thompson <da*************@worldnet.att.net> wrote:

On 18 Oct 2004 13:57:17 -0700, qe*@pobox.com (Paul Hsieh) wrote:
11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions have
non-obvious parameter restrictions, which require that
developers read through documentation (which may or may not exist, may
be of poor quality, have errors in it, etc) to figure out what is
going on.

I'm not sure what you mean by "mode" of a parameter.

Actually what I mean is things like "this integer parameter cannot be
negative" or "this pointer cannot be NULL" or "this integer parameter
must correspond to a lower bound for the space available for a given
buffer parameter" etc. Those are ok for module-internal statically
declared functions. But for stuff you expose via extern, you should
just accept any combination of whatever parameters so long as they are
legal with respect to their type (for example pointers should either
point to something valid or be NULL -- not just be randomly
uninitialized.)
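
As a sketch of that split (the names `sum_checked` and `buffer_sum` are hypothetical): the static helper merely asserts its constraints, while the exposed function accepts any int and any valid-or-NULL pointer and rejects what it cannot use.

```c
#include <assert.h>
#include <stddef.h>

/* Internal helper: callers inside this file guarantee buf != NULL
 * and len >= 0, so the constraints are only asserted. */
static long sum_checked(const int *buf, int len)
{
    long total = 0;

    assert(buf != NULL && len >= 0);
    for (int i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Exposed entry point: any combination of type-legal arguments is
 * handled. Returns 0 and stores the sum in *out on success, or -1
 * (leaving *out untouched) when the arguments are rejected. */
int buffer_sum(const int *buf, int len, long *out)
{
    if (buf == NULL || out == NULL || len < 0)
        return -1;
    *out = sum_checked(buf, len);
    return 0;
}
```

A caller reading only the prototype of `buffer_sum` cannot crash it with a NULL or a negative length; the worst outcome is an error return.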

There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)

Note that 'restrict' even when implemented does not ENFORCE anything,
or at least is not required to, and actual checking in nontrivial cases
would be costly, so implementors are unlikely to do it.

I agree that enforcing it completely and pervasively is basically
infeasible. But obviously the compiler can and should check the most
obvious cases from the call sites (i.e., I'd at least like a warning
for strcat(p,p).)

[...] This is unlike
the other/preexisting qualifiers const and volatile which are "safe"
(and sometimes annoying!) in that they cannot be "removed" from the
type without casting or cheating. All 'restrict' does for the caller
is DOCUMENT the requirement of nonaliasing, in a standardized (and
conceivably tool-processable) way, and in a place (the prototype)
where the programmer using it is almost certain to need to look. The
real benefit is supposed to be on the callee side, for optimization.

I understand all this. But since "restrict" is not very good at
*enforcing* behavior it makes it significantly less useful than const
or volatile. Errors that arise from incorrect aliasing handling can
be difficult to debug, so without enforcement, we are left with a
performance hack (like register or inline) that is inevitably
superseded by better technology (like cross-file-inlining or similar
techniques from which true non-Aliasing properties can be sussed out.)

I don't think being in this situation of ambiguity is ultimately
productive, so I instead take the opposite tack. What if instead we
demand proper support for aliasing in our libraries/modules? This
leads us to consider the following questions about
implementation:

1) Can we detect aliasing at runtime with high performance?
2) Is there a proper interpretation for functions that take aliased
parameters?
3) Can algorithms be written that are aliasing neutral?
4) Can we implement our ADTs to restrict aliasing to only trivial
kinds of aliasing? (i.e., identical -- but without partial
overlapping.)

The memmove() library call in the latest x86 compilers actually follows
these principles in enacting its solution. The idea is that the
memmove() function tries to switch to memcpy() in a maximal number of
cases, before using a slower aliasing neutral algorithm for
implementation. So memmove() is usually not really slower than
memcpy(), while being somewhat safer.
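
The strategy described above might look roughly like this. `move_bytes` is a hypothetical name and this is not any vendor's actual memmove(); it only illustrates "detect at runtime, take the fast path when the ranges are disjoint, otherwise use an alias-neutral loop".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alias-neutral copy. Addresses are compared as
 * integers, which is implementation-defined but works on the usual
 * flat-memory platforms. */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst, sa = (uintptr_t)src;
    uintptr_t gap = (da > sa) ? da - sa : sa - da;

    assert(dst != NULL && src != NULL);

    if (gap >= n)
        return memcpy(dst, src, n);       /* disjoint: fast path */

    if (da < sa) {
        for (size_t i = 0; i < n; i++)    /* dst below src: go forward */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)    /* dst above src: go backward */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

The detection costs one subtraction and one comparison, which is why the overlap-safe call need not be noticeably slower than the unsafe one.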

For bignum libraries like gmp, rather than having a myriad of
functions for implementing what seem like separate operations: A = B +
C, A += B, A += A, the simpler solution is only to support A = B + C,
where any of A, B, or C can be aliased. This keeps the API manageable,
while not seriously impacting performance (detection and branch is
very low overhead in comparison to the inner loops of any bignum
function.)
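
For illustration only, a toy fixed-width adder (`limbs_add` is a hypothetical name and has nothing to do with gmp's real interface). For plain addition no detection is needed at all: each limb of B and C is read before the same limb of A is written, so a single A = B + C routine also serves as A += B and A += A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a = b + c over n little-endian 32-bit limbs; returns the carry out.
 * a may be the same array as b, c, or both. Partially overlapping
 * arrays are not supported by this sketch. */
static uint32_t limbs_add(uint32_t *a, const uint32_t *b,
                          const uint32_t *c, size_t n)
{
    uint64_t carry = 0;

    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)b[i] + c[i] + carry;

        a[i] = (uint32_t)sum;     /* written only after b[i], c[i] are read */
        carry = sum >> 32;
    }
    return (uint32_t)carry;
}
```

Operations such as multiplication, where an output limb depends on many input limbs, are where an explicit alias check and a temporary would come in.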

For my string library "the better string library"
(http://bstring.sf.net/) what I found is that under the assumption
that memmove() has nearly identical performance to memcpy(), there is
no performance difference in implementing a completely alias-safe
library versus one that is not. Having aliased parameters does not
lead to any ambiguous interpretation of what any Bstrlib function
*should* do semantically. This also means I don't have to document
caveats like "Don't ever write bconcat(b,b) or binsert(b,0,b)",
because they work exactly as expected.
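
A sketch of what alias safety involves for a concatenation, using a hypothetical `struct dstr` and `dstr_concat` rather than Bstrlib's real code: the source length is captured before the destination changes, and the source pointer is read only after any reallocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string; not Bstrlib's actual type. */
struct dstr { size_t mlen; size_t slen; char *data; };

/* Append src to dst. dst and src may be the same object. A src that
 * is a different struct pointing into dst's buffer is NOT handled. */
static int dstr_concat(struct dstr *dst, const struct dstr *src)
{
    size_t add = src->slen;              /* capture before dst changes */
    size_t need = dst->slen + add + 1;

    if (need > dst->mlen) {
        char *p = realloc(dst->data, need);
        if (p == NULL)
            return -1;                   /* dst is left unchanged */
        dst->data = p;                   /* if src == dst, src->data moves too */
        dst->mlen = need;
    }
    /* src->data is read only now, after any reallocation. */
    memmove(dst->data + dst->slen, src->data, add);
    dst->slen += add;
    dst->data[dst->slen] = '\0';
    assert(dst->slen < dst->mlen);
    return 0;
}
```

With this ordering, `dstr_concat(&b, &b)` simply doubles the string, and no caveat is needed in the documentation for that case.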

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #38

Chris Torek

(Almost-off-topic drift, and I am only going to respond to this
one item...)

Dave Thompson <da*************@worldnet.att.net> wrote:
I'm not sure what you mean by "mode" of a parameter.

In article <news:d0**************************@posting.google. com>
Paul Hsieh <we******@gmail.com> wrote:

Actually what I mean is things like "this integer parameter cannot be
negative" or "this pointer cannot be NULL" or "this integer parameter
must correspond to a lower bound for the space available for a given
buffer parameter" etc. ...

A better word for this -- or at least one more commonly used -- is
"constraints". This also happens to be the word used in the C
standards (not entirely coincidentally) for its own requirements
upon the programmer.

Those are ok for module-internal statically
declared functions. But for stuff you expose via extern, you should
just accept any combination of whatever parameters so long as they are
legal with respect to their type (for example pointers should either
point to something valid or be NULL -- not just be randomly
uninitialized.)

While I agree that, in general, weaker constraints are "better"
for exposed interfaces than stronger ones, sometimes strong(ish)
constraints seem to make sense. An example we had earlier (though
I have no idea whether it was in this same thread) occurs with
strlen(NULL): while the C standards could require that strlen()
return 0 in this case, and perhaps that strcpy() do nothing if
either of its operands is NULL, and so on, NULL is not actually a
string, and claiming that it *is* a string of length 0 is clearly
not entirely correct either. I would not object to strlen(NULL)
returning 0, but I do not object to its being considered a
dreadful mistake either (as C works today). I find neither one
"clearly superior" to the other: there are tradeoffs either way.

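The two choices can be put side by side in a short sketch (both function names are hypothetical wrappers; standard strlen() itself has undefined behaviour for NULL):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Weak constraint: NULL is accepted and treated as an empty string. */
static size_t strlen_lenient(const char *s)
{
    return (s != NULL) ? strlen(s) : 0;
}

/* Strong constraint: NULL is a caller bug, caught in debug builds. */
static size_t strlen_strict(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

The lenient version never crashes but can hide a missing value further up the call chain; the strict version surfaces the bug early but only when assertions are enabled.
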
Languages with exceptions (Ada, C++, and Eiffel all come to mind)
can handle this by rejecting the attempt at runtime with an error;
and indeed, in Eiffel one can even express many constraints directly
in a function interface, so that compilers can catch some of these
errors at compile time. Once one buys into the "exceptions" model,
it becomes clear what to do with constraint violations: instead
of allowing all possible inputs (and generating "the least garbagey
possible output" for garbage input), just throw an exception.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #39