Safer and Better C

Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?
Thank you
Nov 14 '05 #1
bazad wrote:

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?


Do not start a new thread without a reason. This should have been
a reply to something in some other thread, with sufficient material
quoted and attributed for us to put things in context.

C is inherently unsafe. By monitoring this newsgroup you will now
and then find out about ways of appeasing the lurking tigers.
Beyond that you just have to be aware of what is going on.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #2
bazad wrote:
I am not using C all the time.
I have a general understanding of C and nothing else.
The recent reply to use strlcpy and strlcat showed me that
I am not aware of the best and safe techniques.
Is there any place
where I could learn more about safer and better C (on FreeBSD)?


Type

man strlcpy

or

man strlcat

at your FreeBSD prompt.
Nov 14 '05 #3
"bazad" <no****@noreply.not> wrote in message
news:1097780068.uMbmVua0IQ4XTixLRe9hpg@teranews...
Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?
Thank you


Read this thread

http://books.slashdot.org/article.pl...102&tid=190&tid=130&tid=6

from a couple of days back. It says there is a complete chapter on secure C.

HTH
Nov 14 '05 #4
bazad <no****@noreply.not> wrote:
Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?


Read the FAQ--http://www.eskimo.com/~scs/C-faq/top.html--twice. You can't go
wrong there. You're likely better off using the existing interfaces properly
than looking for "safer" interfaces.

On a related note, Theo and Company of OpenBSD fame--arguably the ones who
most popularized the functions--will admit that strlcpy() and strlcat() are
_not_ the preferred solutions. memcpy() is even better, because the
occasions when you do not know the length of your source string should be
few and far between. strlcpy() and strlcat() should be a last resort. It's
also worth noting that the C99 semantics of snprintf() are very similar and
more widely available (FreeBSD's snprintf() is one such implementation, I
believe).

strlcpy() and strlcat() are fairly unique in that they're additions to
C--albeit platform specific extensions and not very portable--which play
fair with and generally fit in well amongst the wider body of C code. Using
fancy libraries can often create more problems than they solve, because they
don't fit well with the existing corpus of C source and the points of
contact require considerable attention to detail.

For more secure applications overall--like chroot() and privilege revocation
techniques--in FreeBSD, comp.unix.programmer is probably a better bet.

Nov 14 '05 #5
bazad wrote:
Hi,

I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?
Thank you

The most common security problems are buffer overflows. Simply put,
this means writing more data into a buffer than there's space for.
You'd do yourself a favor by learning how some of these exploits work.
I know there are a couple of old Phrack articles around, as well as an
article over at SecuriTeam, entitled 'Writing Buffer Overflow Exploits -
a Tutorial for Beginners':

http://www.securiteam.com/securityre...OP0B006UQ.html

However, note that discussions of the information in that article are
off topic here.
Mark F. Haigh
mf*****@sbcglobal.net
Nov 14 '05 #6
William Ahern wrote:
.... snip ...
strlcpy() and strlcat() are fairly unique in that they're additions
to C--albeit platform specific extensions and not very portable--
which play fair with and generally fit in well amongst the wider
body of C code. Using fancy libraries can often create more
problems than they solve, because they don't fit well with the
existing corpus of C source and the points of contact require
considerable attention to detail.


Their implementation is NOT platform specific and totally portable,
and thus they can be used anywhere by supplying an implementation.
I have done so, written in purely standard C. See my page in sig.

Nov 14 '05 #7
jo*******@my-deja.com (John Bode) writes:
[...]
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #8

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
jo*******@my-deja.com (John Bode) writes:
[...]
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)


#define equals ==

if(x equals y)
;

:-)

-Mike
Nov 14 '05 #9
CBFalconer <cb********@yahoo.com> wrote:
William Ahern wrote:
... snip ...

strlcpy() and strlcat() are fairly unique in that they're additions
to C--albeit platform specific extensions and not very portable--
which play fair with and generally fit in well amongst the wider
body of C code. Using fancy libraries can often create more
problems than they solve, because they don't fit well with the
existing corpus of C source and the points of contact require
considerable attention to detail.

Their implementation is NOT platform specific and totally portable,
and thus they can be used anywhere by supplying an implementation.
I have done so, written in purely standard C. See my page in sig.

Ah, yes. That statement was poorly worded. I include OpenBSD's strlcpy() and
strlcat() code in many of my projects. I just meant that it's not available
on many platforms--e.g. Linux--and if you don't want to go through the
trouble of including it yourself snprintf() often suffices.

FWIW, the OpenBSD crowd writes very portable code (not fans of GCC'isms). I
keep a compat library around which I reuse for most of my development (I
especially like Niels Provos' sys/tree.h header for easy-peasy splay and
red-black trees).
Nov 14 '05 #10
John Bode wrote:
.... snip ...
1. Initialize all variables to a known value.
2. Check all return values from library functions.
3. Don't use gets().
4. During development, set the warning level on the compiler to
its highest setting. Review and eliminate each warning.
5. Don't cast an expression *just* to eliminate a warning.
6. When comparing against a constant expression for equality, put
the constant on the LHS (i.e., if (SOME_CONSTANT == x)); this
will catch any problems where you typed "=" when you meant "==".
7. Abstract out tedious, repetitive, and/or low-level tasks. IOW,
don't call malloc() directly from your application code, but
wrap it in a function that performs error checking and
initialization of the memory being returned.


I agree with all except #1, which can mask a failure to suitably
initialize later.

Nov 14 '05 #11
In article <g0*****************@newsread3.news.pas.earthlink. net>,
Mike Wahler <mk******@mkwahler.net> wrote:

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
jo*******@my-deja.com (John Bode) writes:
[...]
> 6. When comparing against a constant expression for equality, put the
> constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
> any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)


#define equals ==

if(x equals y)
;


Heh.

But don't most compilers catch (warn about) this anyway, these days?

That is, they want you to change:

if (x = 5)
to:
if ((x = 5))

Nov 14 '05 #12
> But don't most compilers catch (warn about) this anyway, these days?

That is, they want you to change:

if (x = 5)
to:
if ((x = 5))


What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))
Nov 14 '05 #13
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))


What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))


gcc doesn't.

Nov 14 '05 #14
On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))


What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))


gcc doesn't.


int main()
{
int x = 3, y = 4;

if ( y = x )
;

return 0;
}

gcc -Wall test.c
test.c: In function `main':
test.c:5: warning: suggest parentheses around assignment used as truth value
Apparently, it does. Just not with the default warning levels... but
you'd never fail to use at least -Wall during development, would you?
Nov 14 '05 #15

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))
gcc doesn't.

[...] Apparently, it does.


Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the redundant
parentheses in Guillaume's example. Which some people might see as an
advantage to leaving them off (my preferred style in many cases as it
reduces clutter), but really I don't consider "mistyping == as = or
vice versa" to be a statistically significant problem in the first place.

-Arthur
Nov 14 '05 #16
In article
<pa****************************@xxnospamyy.lightsp eed.bc.ca>,
Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> wrote:
On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
But don't most compilers catch (warn about) this anyway, these days?
That is, they want you to change:
if (x = 5)
to:
if ((x = 5))

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))


gcc doesn't.


int main()
{
int x = 3, y = 4;

if ( y = x )
;

return 0;
}

gcc -Wall test.c
test.c: In function `main':
test.c:5: warning: suggest parentheses around assignment used as truth value


I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).

Cheers,
- jonathan
Nov 14 '05 #17
Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> writes:
On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))


gcc doesn't.


if ( y = x )
;


Are you paying attention?
--
Ben Pfaff
email: bl*@cs.stanford.edu
web: http://benpfaff.org
Nov 14 '05 #18
"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.


Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the
redundant parentheses in Guillaume's example. Which some people
might see as an advantage to leaving them off (my preferred style in
many cases as it reduces clutter), but really I don't consider
"mistyping == as = or vice versa" to be a statistically significant
problem in the first place.


The parentheses aren't redundant (if that's really supposed to be "="
rather than "=="). If you leave them out:

if (x = 5 && y == 6)

is equivalent to

if (x = (5 && y == 6))

Of course if you correctly use "==" rather than "=", they are redundant:

if (x == 5 && y == 6)

Nov 14 '05 #19
On Sat, 16 Oct 2004 00:12:55 -0400, Arthur J. O'Dwyer wrote:

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:

What will the compiler catch if you have a multiple test, like:

if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.


Try again, this time with the line Guillaume asked about. Keith's
absolutely right.


Actually, it does. Note that the (x=5) is included in the extra layer of
parentheses, which is the _fix_ to allow such a situation to occur without
the warning. Trying it in the context of the original actual problem -
without the extra parentheses - it does, indeed, complain.

One can hardly say "X doesn't do this" when it does _unless_ one takes
steps to prevent it... and then test with code which has, in fact, taken
those steps. Might as well compile with all warnings disabled and then
complain the compiler doesn't detect any of a thousand or more things.
Nov 14 '05 #20
Kelsey Bjarnason <ke*****@xxnospamyy.lightspeed.bc.ca> writes:
On Sat, 16 Oct 2004 00:12:55 -0400, Arthur J. O'Dwyer wrote:

On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:

On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
>
> What will the compiler catch if you have a multiple test, like:
>
> if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.


Try again, this time with the line Guillaume asked about. Keith's
absolutely right.


Actually, it does. Note that the (x=5) is included in the extra layer of
parentheses, which is the _fix_ to allow such a situation to occur without
the warning. Trying it in the context of the original actual problem -
without the extra parentheses - it does, indeed, complain.


Without the "extra" parentheses, it's a different expression.
if (x = 5 && y == 6)
is equivalent to
if (x = (5 && y == 6))
and gcc complains because the assignment is at the top level of the
expression.

Nov 14 '05 #21
Keith Thompson wrote:
"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
On Fri, 15 Oct 2004, Kelsey Bjarnason wrote:
On Sat, 16 Oct 2004 01:26:14 +0000, Keith Thompson wrote:
Guillaume <"grsNOSPAM at NOTTHATmail dot com"> writes:
>
> What will the compiler catch if you have a multiple test, like:
>
> if ((x = 5) && (y == 6))

gcc doesn't.

[...]
Apparently, it does.


Try again, this time with the line Guillaume asked about. Keith's
absolutely right.

On the other hand, gcc /will/ warn you if you leave off the
redundant parentheses in Guillaume's example. Which some people
might see as an advantage to leaving them off (my preferred style
in many cases as it reduces clutter), but really I don't consider
"mistyping == as = or vice versa" to be a statistically
significant problem in the first place.


The parentheses aren't redundant (if that's really supposed to be
"=" rather than "=="). If you leave them out:

if (x = 5 && y == 6)

is equivalent to

if (x = (5 && y == 6))

Of course if you correctly use "==" rather than "=", they are redundant:

if (x == 5 && y == 6)


Yet, of the 4 statements above, only the first is unequivocally
clear to any reader with any level of understanding of C's weird
and wonderful precedence system. Any decent optimizer should be
able to figure out that no code need be generated.

Nov 14 '05 #22
Jonathan Adams wrote:
.... snip ...
I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).


What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))

which will squawk loudly on any compiler.

Nov 14 '05 #23
Keith Thompson <ks***@mib.org> wrote in message news:<ln************@nuthaus.mib.org>...
jo*******@my-deja.com (John Bode) writes:
[...]
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".

[...]

This one is controversial. Personally, I find the (5 == x) form
grating; I'd rather use (x == 5) and just make sure I get the operator
right. (This has been discussed to death here before.)


Yeah. I don't use it myself for the same reason; it's just too
mentally jarring. And it's not a mistake I make very often. But it
does provide some level of safety (of course, if both expressions are
variables (lvalues), it doesn't help much).
Nov 14 '05 #24
"CBFalconer" <cb********@yahoo.com> wrote in message
news:41**************@yahoo.com...
Jonathan Adams wrote:

... snip ...

I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).


What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))

which will squawk loudly on any compiler.

When someone first suggested this to me I really, really liked it and in
fact started using it. It may sound good for equality, but when I
see code like

if ( 0 >= c )
;

I always have to convert it either mentally or in my notes to
if ( c <= 0 )
;

Of course YMMV. Just like MMV.

--
Imanpreet Singh Arora
Zmoc.Zliamg@Zteerpnami
Remove Z to mail
"Things may come to those who wait, but only the things left by those who
hustle."
Abraham Lincoln


Nov 14 '05 #25
jo*******@my-deja.com (John Bode) wrote:
bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?

strlcpy and strlcat just transform the way in which a buffer overflow
can happen. They don't address the cause (human error w.r.t. length
calculations.) The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length into
account automatically with each operation:

http://bstring.sf.net/

This can bring the safety level essentially up to the same as is found
in other higher level languages in string operations.
I don't know of any specific resources, but here are some personal
guidelines in no particular order (note that I don't necessarily
follow these *all* the time, depending on the situation and how
deluded^H^H^H^H^H^H^Hconfident I am in my own abilities that day):

1. Initialize all variables to a known value.
Hmm ... well so for pointers do you initialize them to NULL? That's
fine, but it's not much of a safety parachute if you have an accidental
"use before proper initialize" error.
2. Check all return values from library functions.
Well, except in Bstrlib, where it's semantically optional. You can
usually just check dependent return values at the end and still know
an error has occurred without suffering from UB.
3. Don't use gets().
Better yet define gets() to emit an error or do something like stop
the program in its tracks.
4. During development, set the warning level on the compiler to its
highest setting. Review and eliminate each warning.
Right. The point is to recognize that even if you don't agree with a
warning, the effort you put into eliminating it is worth it for all
the other hints the compiler gives you through its warnings.

While not practical for everyone, these days I also try to ensure that
my code compiles with multiple compilers. I have found that different
compilers have vastly different safety coverage with their warnings --
complying with all of them helps make code truly bulletproof and
maintainable.
5. Don't cast an expression *just* to eliminate a warning.
I'm not sure what scenario you are talking about here. I would rather
say that you should cast *correctly*. I.e., clearly there are ways in
which casting numerics incorrectly can get the right type but the
wrong/inaccurate result.
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".
Of course. People who have problems with this are somehow letting
some neurosis in their brain dominate over recognizing the law of
commutativity for this operator. The safety benefit for doing this is
obvious.
7. Abstract out tedious, repetitive, and/or low-level tasks. IOW,
don't call malloc() directly from your application code, but wrap it
in a function that performs error checking and initialization of the
memory being returned.


Well, the new/delete paradigm of Pascal or C++ is usually a lot safer
and more readable than C's crazy mallocing. So for ADTs, I usually have
creation functions that I name with a "New" in them, and destruction
function names with a "Destroy" in them.

These are all reasonable ideas. But certainly I would add to them:

8. Use const *maximally*, and never cast away or work around const
semantics. Using const will typically make it obvious what parameters
to a function are inputs and what are outputs.

9. Always include error paths out of every function. (This goes with
2. above.) Without exception handling in C, your choices are either
to exit immediately (not recommended) or return with some kind of
erroneous return status. For ADTs that cannot be constructed, I
usually return NULL, and for just general errors, I return some
negative value under the assumption that normal operations always
return with 0 or a positive number. For debugging purposes -__LINE__
is a typical value that I return as an error.

10. Program for thread safety and reentrancy. strtok() is an example
of how *NOT* to design a function. Modifying what should obviously be
a source parameter, and then storing away the result in some single
focus, static way makes strtok non-reentrant. There is nothing in the
desired functionality of such a function that demands such bad
properties. Think of trying to do a simple thing like iterating
through substrings of a string in an outer loop, and then doing the
same on each substring in the inner loop -- strtok cannot be used in
something even as simple as that. If you are on Linux, look at
strtok_r() in the man pages for an example of a superior design which
has essentially the same functionality of strtok without its
weaknesses.

Ok, although ANSI C says nothing about multithreading, there is hardly
any modern implementation that does not expose platform specific, or
POSIX multithreading functionality. Statics, globals, and
side-effects are the kinds of things that work against race condition
safety, and so it pays to minimize them in *all* of your code.

11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that usage of
functions have non-obvious parameter restrictions which requires that
developers read through documentation (which may or may not exist, may
be of poor quality, have errors in it, etc) to figure out what is
going on.

There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)

12. Avoid the C library for string manipulation. Use
http://bstrlib.sf.net/ or something in which memory and length
semantics are automatically managed.

--
Paul Hsieh
http://www.pobox.com/~qed/
Nov 14 '05 #26
Paul Hsieh wrote:
jo*******@my-deja.com (John Bode) wrote:
bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?


strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)


Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.
The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length
into account automatically with each operation:


Your string system stands (or fails) by itself. strlcpy and
strlcat work with the existing standardized string system.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
Nov 14 '05 #27
CBFalconer <cb********@yahoo.com> wrote:
Jonathan Adams wrote:
I believe they were referring to the latter construction:

if ((x = 5) && (y == 6))

which is not caught (at least not with -Wall on gcc 3.4.2).


What's to be caught? It is perfectly valid. If you want it to
catch failure to type the second '=' simply get in the habit of
putting the constant first:

if ((5 = x) && (6 == y))


...and be prepared to spend a lot of time trying to track down the bug next
time you accidentally write if (y=x) instead of if (x==y), because
you've grown out of the habit of paying attention.

Richard
Nov 14 '05 #28
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
jo*******@my-deja.com (John Bode) wrote:
bazad <no****@noreply.not> wrote:
I am not using C all the time. I have a general understanding of C
and nothing else. The recent reply to use strlcpy and strlcat showed
me that I am not aware of the best and safe techniques. Is there any
place where I could learn more about safer and better C (on FreeBSD)?


strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)


Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying supplying the appropriate parameter
values. Read the bloody documentation.


These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it's
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
that cause the memory limit to be inadequate for the desired string
operation -- and what follows is the buffer overrun.

The documentation seems to suggest that sizeof() is the best way of
tracking the length of strings. In any event, strings presized to a
fixed maximum are the *real* problem. One is left either preallocating
too much if the input is small, or too little if the input is large, and
avoiding buffer overflows is still up to programmer diligence. There's a
reason why other languages have dynamically sized strings.
The way I avoid buffer overflows in strings is to use
a string ADT which [sic] takes memory length and string length
into account automatically with each operation:


Your string system stands (or fails) by itself. strlcpy and
strlcat work with the existing standardized string system.


Bstrlib also works with the standardized string system (in the most
natural way possible) without increasing the burden of programmer
considerations (tracking the dynamic memory length along with the
string.) This is one of its most important features -- you can still
continue to use char * libraries and interfaces while using Bstrlib
without any conversion penalties.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #29
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
.... snip ...

strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)


Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying the task of supplying the appropriate
parameter values. Read the bloody documentation.


These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it's
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
which cause the memory limit to be inadequate for the desired string
operation -- and what follows is the buffer overrun.


C in general cannot perform such protection, and the cause is
rooted in the flagrant bandying about of pointers, pointer
arithmetic, transformations such as casts, the use of variadic
functions, and more. Strings are simply one aspect of this. We
can all easily avoid these problems by switching to a language
designed with correctness in mind, such as Pascal, Modula, Ada.
Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.

I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security, and
encourages those to make even graver errors elsewhere. Note that
providing routines that can be controlled is not such an avoidance.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan"
"I listened to Toronto come back from 3:0 in '42, I plan to
watch Boston come back from 3:0 in '04"
Nov 14 '05 #30
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
strlcpy and strlcat just transform the way in which a buffer
overflow can happen. They don't address the cause (human error
w.r.t. length calculations.)

Please don't give such misinformation. Those routines prevent
buffer overflows, and report the condition. They DO address the
human cause by simplifying the task of supplying the appropriate
parameter values. Read the bloody documentation.
These functions don't magically associate legal memory size with the
string. That's still the responsibility of the programmer. So it's
changed an implicit requirement to an explicit one. But it doesn't
address the real problem, which is that programmers make mistakes
which cause the memory limit to be inadequate for the desired string
operation -- and what follows is the buffer overrun.


C in general cannot perform such protection, and the cause is
rooted in the flagrant bandying about of pointers, pointer
arithmetic, transformations such as casts, the use of variadic
functions, and more. Strings are simply one aspect of this. We
can all easily avoid these problems by switching to a language
designed with correctness in mind, such as Pascal, Modula, Ada.


Pascal has the same problem -- trying to dereference a nil pointer,
for example (I am not an expert in Modula-2 or Ada, but as I recall
neither is GC based, so both are likely to have the same problem.) You
have to go to Java/Python/Perl if you want a more thoroughly
insulated programming environment. Another very
interesting approach is the Cyclone programming language -- though it
might be a little syntactically annoying, it gives near-C-level
program control while being completely safe. These languages do total
abstraction for you. The question is -- can you add just enough
abstraction to C to gain enough safety that dealing with the
weaknesses of the C language doesn't become burdensome or unmanageable
in the long run. One way to do this is develop ADTs with full
closure. Bstrlib is about as close as you can come to an ADT with
full closure in C.

You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib, then is the switch to
other programming languages really necessary?
Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.
I don't think programming in C is about machismo. That would be
programming in ASM.
I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security, and
encourages those to make even graver errors elsewhere.
What basis is there for this outrageous statement? All programming
abstractions are basically schemes for managing complexity which are
supposed to improve productivity or safety -- including switching to
other languages. Some of the *typical* problems of C are repetitive,
and solving them over and over again has no value whatsoever. I
reject the notion that allowing beginner programmers to make mistakes
with strings will help them make less grave errors in other
programming.

In fact Bstrlib includes a module called bsafe which forcibly
deprecates some of C's unsafe functionality -- so it can be argued
that it reduces some errors outside of Bstrlib.
Note that providing routines that can be controlled is not such an avoidance.


Routines like strlcat, and strlcpy don't solve the real problem. All
of the exact same buffer overflow scenarios are still present in the
same way and same sense, just with a somewhat reduced likelihood of
happening.

A direct comparison with Bstrlib shows that so many of the typical
buffer overflow scenarios are just not possible using Bstrlib. But it
is not just some safe programming library. It's an *EXAMPLE* (and
its source code is available) of how to improve safety through proper
abstraction without significantly compromising on anything.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #31
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
.... snip ...
You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib, then is the switch to
other programming languages really necessary?
I readily concede that point. The fact that so far I have had no
noticeable problems using the existing system has something to do
with it.
Unfortunately the world is full of macho programmers who seem to
feel they can handle an error-prone language, such as C, without
ever getting burnt. I am often among them.


I don't think programming in C is about machismo. That would be
programming in ASM.
I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.


What basis is there for this outrageous statement? ... snip ...


My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan" - "I listened to Toronto come back
from 3:0 in '42, I watched Boston come back from 3:0 in '04"
Nov 14 '05 #32
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
You speak in these generalities with your opinions about these things
yet you demonstrate so clearly that you have not honestly examined
Bstrlib. You don't understand the interesting line in the sand that I
have drawn -- the real question I have put forth to the programming world.
If you can develop ADTs with the safety, speed, functionality,
portability and interoperability of Bstrlib, then is the switch to
other programming languages really necessary?


I readily concede that point. The fact that so far I have had no
noticeable problems using the existing system has something to do
with it.


It means you don't measure performance, and you don't measure the
danger or effort required to deal with buffer overruns. You do it for
ego, yet you don't realize that you are implicitly subordinating your
ego to Thompson, Kernighan and Ritchie and their vision for how strings
should be implemented.

The C language and libraries implementation of strings is really poor
from every angle of consideration except for extremely small systems
(like 8-bit systems with < 64K, or ROM programming.) Bstrlib makes the
effort to leave all the worst aspects of '\0' terminated char *
strings behind while retaining just the right amount of
interoperability to leverage the incumbency of them.

You've let K&R&T tell you that string and binary buffers are
necessarily distinct. It's insane things like this that make binary-
preserving text or hex editors all the more complicated to implement.
I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.


What basis is there for this outrageous statement? ... snip ...


My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.


But what you don't realize is that you've also taken off the gears
with the greatest torque. The C way of doing things is not just
dangerous but it's also *SLOWER* (there is an example included in the
Bstrlib downloads which has a benchmark -- see for yourself.) So
you've accepted a more difficult challenge in order to achieve an
inferior result.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #33
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
Paul Hsieh wrote:
CBFalconer <cb********@yahoo.com> wrote:
.... snip ...
I think it is not a good idea to supply areas that avoid the
typical C problems, without handling the overall (impossible)
problem. It only gives the unwashed a false sense of security,
and encourages those to make even graver errors elsewhere.

What basis is there for this outrageous statement? ... snip ...


My own opinion. I got rid of the training wheels on my children's
bicycles at the earliest opportunity.


But what you don't realize is that you've also taken off the gears
with the greatest torque. The C way of doing things is not just
dangerous but it's also *SLOWER* (there is an example included in the
Bstrlib downloads which has a benchmark -- see for yourself.) So
you've accepted a more difficult challenge in order to achieve an
inferior result.


No, when I want a better, more secure language I simply use
Pascal. When I want to stay very close to the machine I use
assembly. When I am willing to compromise or want to maximize
practical portability I use C. Apart from the assembly I am
usually on very firm ISO standardized ground.

Bear in mind that I have no objection to you, or anyone else, using
your Bstrlib system. I have simply found no need so far, and
consider such efforts better applied to languages with a secure
foundation. I reserve the right to change my mind in future.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan" - "I listened to Toronto come back
from 3:0 in '42, I watched Boston come back from 3:0 in '04"
Nov 14 '05 #34
we******@gmail.com (Paul Hsieh) wrote:
interoperability to leverage the incumbency of them.


You have just told us that you're a manager, and can conveniently be
ignored in a technical discussion. Thanks for the frankness.

Richard
Nov 14 '05 #35
On Fri, 22 Oct 2004 07:16:56 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
we******@gmail.com (Paul Hsieh) wrote:
interoperability to leverage the incumbency of them.


You have just told us that you're a manager, and can conveniently be
ignored in a technical discussion. Thanks for the frankness.

I think he's an academic. No difference to your conclusion, of course.

--
Al Balmer
Balmer Consulting
re************************@att.net
Nov 14 '05 #36
On 18 Oct 2004 13:57:17 -0700, qe*@pobox.com (Paul Hsieh) wrote:
<snip>
11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions
have non-obvious parameter restrictions which require that
developers read through documentation (which may or may not exist, may
be of poor quality, have errors in it, etc) to figure out what is
going on.
I'm not sure what you mean by "mode" of a parameter. I have seen it
used (in COBOL, Pascal, and Ada, and informally in F9X) to mean the
parameter-passing mechanism or form (value, reference, in, out, etc.)
Since C only has pass-by-value (each parameter an initialized copy), you can't mean that. It sounds
like you mean values, or ranges, or combinations of such, that are
representable in the declared types but not valid for the callee.
There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)


Note that 'restrict' even when implemented does not ENFORCE anything,
or at least is not required to and actual checking in nontrivial cases
would be costly so implementors are unlikely to do it. This is unlike
the other/preexisting qualifiers const and volatile which are "safe"
(and sometimes annoying!) in that they cannot be "removed" from the
type without casting or cheating. All 'restrict' does for the caller
is DOCUMENT the requirement of nonaliasing, in a standardized (and
conceivably tool-processable) way, and in a place (the prototype)
where the programmer using it is almost certain to need to look. The
real benefit is supposed to be on the callee side, for optimization.
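Dave's point in code, as a hypothetical C99 function (add_vec() is made up for illustration) whose prototype uses restrict. The compiler may assume out does not overlap a or b, but nothing stops a caller from passing overlapping arrays; such a call is simply undefined behavior. C99's own memcpy() and strcat() prototypes carry restrict the same way, and overlapping calls such as strcat(p,p) are likewise undefined rather than required to be diagnosed.

```c
#include <stddef.h>

/* restrict documents, in the prototype, that out must not overlap
 * a or b. It is a promise from the caller, not a checked rule: the
 * payoff is that the loop can be optimized without reload checks. */
void add_vec(size_t n, int *restrict out,
             const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```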

- David.Thompson1 at worldnet.att.net
Nov 14 '05 #37
Dave Thompson <da*************@worldnet.att.net> wrote:
On 18 Oct 2004 13:57:17 -0700, qe*@pobox.com (Paul Hsieh) wrote:
11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions
have non-obvious parameter restrictions which require that
developers read through documentation (which may or may not exist, may
be of poor quality, have errors in it, etc) to figure out what is
going on.
I'm not sure what you mean by "mode" of a parameter.


Actually what I mean is things like "this integer parameter cannot be
negative" or "this pointer cannot be NULL" or "this integer parameter
must correspond to a lower bound for the space available for a given
buffer parameter" etc. Those are ok for module-internal statically
declared functions. But for stuff you expose via extern, you should
just accept any combination of whatever parameters so long as they are
legal with respect to their type (for example pointers should either
point to something valid or be NULL -- not just be randomly
uninitialized.)
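A sketch of the split Paul describes, with hypothetical names: the static helper relies on a constraint (s is not NULL) that its in-file callers guarantee, while the exported function accepts every value that is legal for its parameter types and turns the problematic one into an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Internal helper: its constraint (s != NULL) is guaranteed by the
 * callers in this file; the assert only documents it in debug builds. */
static int count_unchecked(const char *s, char c)
{
    int n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Exported entry point: any value legal for the parameter types is
 * accepted; NULL gets an error code (-1) rather than a crash. */
int count_char(const char *s, char c)
{
    if (s == NULL)
        return -1;
    return count_unchecked(s, c);
}
```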
There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (gmp is an example of a library which
takes my point of view, for functionality reasons -- but you can see
how supporting aliasing can be very well motivated.) The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor.)


Note that 'restrict' even when implemented does not ENFORCE anything,
or at least is not required to and actual checking in nontrivial cases
would be costly so implementors are unlikely to do it.


I agree that enforcing it completely and pervasively is basically
infeasible. But obviously the compiler can and should check the most
obvious cases from the call sites (i.e., I'd at least like a warning
for strcat(p,p).)
[...] This is unlike
the other/preexisting qualifiers const and volatile which are "safe"
(and sometimes annoying!) in that they cannot be "removed" from the
type without casting or cheating. All 'restrict' does for the caller
is DOCUMENT the requirement of nonaliasing, in a standardized (and
conceivably tool-processable) way, and in a place (the prototype)
where the programmer using it is almost certain to need to look. The
real benefit is supposed to be on the callee side, for optimization.


I understand all this. But since "restrict" is not very good at
*enforcing* behavior it makes it significantly less useful than const
or volatile. Errors that arise from incorrect aliasing handling can
be difficult to debug, so without enforcement, we are left with a
performance hack (like register or inline) that is inevitably
superseded by better technology (like cross-file-inlining or similar
techniques from which true non-aliasing properties can be sussed out.)

I don't think being in this situation of ambiguity is ultimately
productive, so I instead take the opposite tack. What if instead we
demand proper support for aliasing in our libraries/modules? This
leads us to consider the following questions about
implementation:

1) Can we detect aliasing at runtime with high performance?
2) Is there a proper interpretation for functions that take aliased
parameters?
3) Can algorithms be written that are aliasing neutral?
4) Can we implement our ADTs to restrict aliasing to only trivial
kinds of aliasing? (i.e., identical -- but without partial
overlapping.)

The memmove() implementations in the latest x86 compilers actually
follow these principles. The idea is that memmove() tries to switch
to memcpy() in as many cases as possible, before falling back to a
slower aliasing-neutral algorithm. So memmove() is usually not really
slower than memcpy(), while being somewhat safer.
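One way the memcpy()-first strategy could be sketched (copy_bytes() is hypothetical, not any compiler vendor's actual memmove()). The overlap test converts the pointers to uintptr_t because relational comparison of pointers into different objects is undefined in ISO C; that conversion assumes a flat address space and an implementation that provides uintptr_t.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+n) and [b, b+n) overlap. Assumes addresses compare
 * meaningfully as integers (a flat address space). */
static int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t) a, y = (uintptr_t) b;
    return n != 0 && x < y + n && y < x + n;
}

/* memmove-like copy: takes the memcpy() fast path whenever the buffers
 * are disjoint, and falls back to memmove() only when they overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (!ranges_overlap(dst, src, n))
        return memcpy(dst, src, n);
    return memmove(dst, src, n);
}
```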

For bignum libraries like gmp, rather than having a myriad of
functions for implementing what seem like separate operations:
A = B + C, A += B, A += A, the simpler solution is only to support
A = B + C, where any of A, B, or C can be aliased. This keeps the API manageable,
while not seriously impacting performance (detection and branch is
very low overhead in comparison to the inner loops of any bignum
function.)
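A minimal sketch of an aliasing-neutral A = B + C over fixed-width limbs (big_add() is a hypothetical function, not gmp's interface). Each limb of the inputs is read before the same position of the result is written, and earlier positions are never read again, so exact aliasing (A = A + B, A = A + A) needs no special case; partially overlapping arrays are not supported.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 32-bit limbs, least significant first; returns the
 * final carry. r may be the same array as a, b, or both. */
uint32_t big_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t) a[i] + b[i] + carry;  /* read first */
        r[i] = (uint32_t) sum;                          /* then write */
        carry = (uint32_t) (sum >> 32);
    }
    return carry;
}
```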

For my string library "the better string library"
(http://bstring.sf.net/) what I found is that under the assumption
that memmove() has nearly identical performance to memcpy(), there is
no performance difference in implementing a completely alias-safe
library versus one that is not. Having aliased parameters does not
lead to any ambiguous interpretation of what any Bstrlib function
*should* do semantically. This also means I don't have to document
caveats like "Don't ever write bconcat(b,b) or binsert(b,0,b)",
because they work exactly as expected.
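How bconcat(b,b)-style self-append can be made safe, sketched with a hypothetical counted-string struct and concat_sketch() (not Bstrlib's real code): the source length is captured before anything changes, and the source bytes are read through the struct only after the destination pointer has been updated, since realloc() may have moved the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string, for illustration only. */
struct bstr_sketch {
    size_t slen;   /* length in bytes */
    char *data;    /* slen bytes plus a '\0' terminator */
};

/* Append t to s; works when t and s are the same object. */
int concat_sketch(struct bstr_sketch *s, const struct bstr_sketch *t)
{
    size_t tlen;
    char *p;
    if (s == NULL || t == NULL || s->data == NULL || t->data == NULL)
        return -1;
    tlen = t->slen;                       /* capture before s changes */
    p = realloc(s->data, s->slen + tlen + 1);
    if (p == NULL)
        return -1;                        /* s is left unchanged */
    s->data = p;                          /* if t == s, t->data is now p */
    memmove(p + s->slen, t->data, tlen);
    s->slen += tlen;
    p[s->slen] = '\0';
    return 0;
}
```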

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #38
(Almost-off-topic drift, and I am only going to respond to this
one item...)
Dave Thompson <da*************@worldnet.att.net> wrote:
I'm not sure what you mean by "mode" of a parameter.

In article <news:d0**************************@posting.google.com>
Paul Hsieh <we******@gmail.com> wrote:
Actually what I mean is things like "this integer parameter cannot be
negative" or "this pointer cannot be NULL" or "this integer parameter
must correspond to a lower bound for the space available for a given
buffer parameter" etc. ...
A better word for this -- or at least one more commonly used -- is
"constraints". This also happens to be the word used in the C
standards (not entirely coincidentally) for its own requirements
upon the programmer.
Those are ok for module-internal statically
declared functions. But for stuff you expose via extern, you should
just accept any combination of whatever parameters so long as they are
legal with respect to their type (for example pointers should either
point to something valid or be NULL -- not just be randomly
uninitialized.)


While I agree that, in general, weaker constraints are "better"
for exposed interfaces than stronger ones, sometimes strong(ish)
constraints seem to make sense. An example we had earlier (though
I have no idea whether it was in this same thread) occurs with
strlen(NULL): while the C standards could require that strlen()
return 0 in this case, and perhaps that strcpy() do nothing if
either of its operands is NULL, and so on, NULL is not actually a
string, and claiming that it *is* a string of length 0 is clearly
not entirely correct either. I would not object to strlen(NULL)
returning 0, but I do not object to its being considered a
dreadful mistake either (as C works today). I find neither one
"clearly superior" to the other: there are tradeoffs either way.

Languages with exceptions (Ada, C++, and Eiffel all come to mind)
can handle this by rejecting the attempt at runtime with an error;
and indeed, in Eiffel one can even express many constraints directly
in a function interface, so that compilers can catch some of these
errors at compile time. Once one buys into the "exceptions" model,
it becomes clear what to do with constraint violations: instead
of allowing all possible inputs (and generating "the least garbagey
possible output" for garbage input), just throw an exception.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #39
Chris Torek <no****@torek.net> wrote:
Paul Hsieh <we******@gmail.com> wrote:
Those are ok for module-internal statically
declared functions. But for stuff you expose via extern, you should
just accept any combination of whatever parameters so long as they are
legal with respect to their type (for example pointers should either
point to something valid or be NULL -- not just be randomly
uninitialized.)
While I agree that, in general, weaker constraints are "better"
for exposed interfaces than stronger ones, sometimes strong(ish)
constraints seem to make sense. An example we had earlier (though
I have no idea whether it was in this same thread) occurs with
strlen(NULL): while the C standards could require that strlen()
return 0 in this case, and perhaps that strcpy() do nothing if
either of its operands is NULL, and so on, NULL is not actually a
string, and claiming that it *is* a string of length 0 is clearly
not entirely correct either. I would not object to strlen(NULL)
returning 0, but I do not object to its being considered a
dreadful mistake either (as C works today). I find neither one
"clearly superior" to the other: there are tradeoffs either way.


Ok, but that's because you are assuming the limits of size_t as output
and are just hacking in this extra condition. You see? You haven't
*DESIGNED* the right answer, you are just seeing if hacking in this
NULL <-> "" equivalence would work and are evaluating it from there.

Look, what happens if you change the definition of strlen like this:

int strlen (const char * s);

Which returns -1 if s is NULL, otherwise the same as strlen. But, now
we've limited the size of our strings to INT_MAX in length, rather
than the maximum of size_t. So if we think this is acceptable or can
ignore this for a second, then we have the return value telling us
about a typical error condition or else giving us the length in one
shot. This allows the user to treat it as the strlen they are used
to, or otherwise having at least some way of putting a low level
sanity check in there. More sophisticated platform specific debug
libraries could put more work into determining if the pointer s is
really a valid readable memory location or not and pile onto the -1
error condition.
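The int-returning strlen() described above might look like this. It is named int_strlen() here (hypothetical) because the name strlen is reserved for the standard library.

```c
#include <limits.h>
#include <stddef.h>

/* Returns the length of s, or -1 if s is NULL or the string is longer
 * than INT_MAX (a length an int cannot represent). */
int int_strlen(const char *s)
{
    size_t n = 0;
    if (s == NULL)
        return -1;
    while (s[n] != '\0') {
        if (n == INT_MAX)
            return -1;          /* too long to report as an int */
        n++;
    }
    return (int) n;
}
```

Callers can still write int_strlen(s) + 1 in an expression, and a single `< 0` test catches the error case.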

Then of course we could change strcpy() to return NULL, if one of the
parameters is NULL as another indication of error.

Now, giving up half your integer range might be a
difficult pill to swallow, so as an alternative you can return
UINT_MAX (or whatever size_t's maximum value is -- why the hell isn't
there a SIZE_T_MAX in limits.h?!?! -- anyone who continues to put
K&R&T or any of the C standards committee up on a pedestal needs a
serious lobotomy) as the error value (thus eliminating only one
possible length.) If this is unsatisfactory then we always have:

int strlen (size_t * sz, const char * s);

but then it can't be used in an arithmetic expression.
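The out-parameter alternative, sketched as a hypothetical checked_strlen(): the length keeps the full size_t range, and the int return value is left free for error reporting.

```c
#include <stddef.h>

/* Stores the length of s in *sz and returns 0, or returns -1 (leaving
 * *sz untouched) if either pointer is NULL. */
int checked_strlen(size_t *sz, const char *s)
{
    size_t n = 0;
    if (sz == NULL || s == NULL)
        return -1;
    while (s[n] != '\0')
        n++;
    *sz = n;
    return 0;
}
```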

If you haven't seen the punchline coming from a mile away yet, I'll be
very disappointed. Go look at http://bstring.sf.net/ to see how I
dealt with this exact issue. Strings in the bstring library can only
legally have lengths between 0 and INT_MAX. So my blength() function
returns an integer and happily returns -1 if you pass in NULL, or
other easily detectable flawed input that doesn't otherwise lead to
UB. So I give up the possibility of using monster long strings beyond
the size of INT_MAX -- and I don't have to worry about closure issues,
because it's not feasible to construct a bstring that is longer in the
same way that a malloc'ed char * could be. I don't even consider that
a trade-off -- for all the functionality, safety and speed I obtain, I
don't miss the possibility of generating certain incredibly long
strings that I have never encountered in my life of programming so
far.
Languages with exceptions (Ada, C++, and Eiffel all come to mind)
can handle this by rejecting the attempt at runtime with an error;
and indeed, in Eiffel one can even express many constraints directly
in a function interface, so that compilers can catch some of these
errors at compile time. Once one buys into the "exceptions" model,
it becomes clear what to do with constraint violations: instead
of allowing all possible inputs (and generating "the least garbagey
possible output" for garbage input), just throw an exception.


Dude -- you know the only reason C++ has exceptions is because there
is no program control for constructors or destructors (and therefore
nowhere for them to return an error code.) Anyhow, yes other
languages have other ways of obtaining closure. But now tell me who's
being off topic in c.l.c? So can I go back to posting how I think C
should be able to assume 2s complement arithmetic, have a widening
multiply, bit-scan, coroutines, and a seriously programmatically
enabled preprocessor?

Anyhow, if you view putting in code to handle NULL pointers as being
"garbage moderation" then I think you've missed the point. As you
point out, C doesn't have exception handling. So the next best thing
you've got is to just return error codes out of everything that might
screw up.

And you should be viewing this as a means of obtaining operational
closure, not just some interesting feature-add. Look, malloc can
return NULL, so pointers can be filled with NULL instead of pointing
to well-formed blocks of memory. If you claim that you should always
test malloc for returning NULL and deal with failure cases, then why
not just stick with the typical malloc wrapper that exits the program
whenever NULL is returned? The easy answer: malloc() failing is not
necessarily fatal to the rest of your application. This is of
particular importance to Bstrlib, which tries to allocate memory in
powers of 2; if it finds it cannot, then instead of failing
immediately, it tries to malloc the tightest possible fitting boundary
instead.
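A sketch of the allocation policy described above (alloc_roomy() is hypothetical, not Bstrlib's code): round the request up to a power of two to leave room for growth, and if that allocation fails, retry with exactly the bytes needed before giving up.

```c
#include <stdlib.h>

/* Returns a block of at least need bytes and stores its actual size in
 * *got, or returns NULL (leaving *got alone) if nothing can be had. */
void *alloc_roomy(size_t need, size_t *got)
{
    size_t want = 1;
    void *p;
    if (need == 0 || got == NULL)
        return NULL;
    while (want < need && want <= ((size_t) -1) / 2)
        want *= 2;
    if (want < need)
        want = need;            /* no power of two that large fits in size_t */
    p = malloc(want);
    if (p == NULL && want != need) {
        want = need;            /* fall back to the tightest fit */
        p = malloc(want);
    }
    if (p != NULL)
        *got = want;
    return p;
}
```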

Just as malloc doesn't, similarly bstrlib doesn't set policy about
failed bstring construction -- if it fails (because there is no
memory) then it doesn't try to exit your program. It just returns
NULL then lets the programmer notice this or not. Then by having
closure (i.e., allowing parameters to be NULL, but returning an error
in such cases, so that the error continues to propagate) it gives the
programmer a myriad of options for dealing with such cases.
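A small sketch of "closure" in this sense, using hypothetical str_dup() and str_join(): every operation accepts NULL and answers with NULL, so a failed allocation anywhere in a chain surfaces as a single NULL at the end, where the programmer can decide on policy.

```c
#include <stdlib.h>
#include <string.h>

/* Copy of s, or NULL if s is NULL or allocation fails. */
char *str_dup(const char *s)
{
    char *p;
    if (s == NULL)
        return NULL;
    p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* New string a followed by b, or NULL if either input is NULL or
 * allocation fails. The inputs are not freed. */
char *str_join(const char *a, const char *b)
{
    size_t la, lb;
    char *p;
    if (a == NULL || b == NULL)
        return NULL;
    la = strlen(a);
    lb = strlen(b);
    p = malloc(la + lb + 1);
    if (p == NULL)
        return NULL;
    memcpy(p, a, la);
    memcpy(p + la, b, lb + 1);  /* includes the terminator */
    return p;
}
```

Since free(NULL) is a no-op, cleanup after such a chain needs no extra branches either.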

And of course to complete the picture, I also support parameter
aliasing, for a fairly generous definition of what aliasing could
legally mean.

This isn't just a "nice to have" -- it's a powerful concept that takes
less code than you think and has scarcely any performance impact. The
reason why proponents of other languages poo poo C programmers is
because we don't have built-in safety nets in our code that keep us
from killing ourselves on UB at every turn. But my claim is that
this might be more a matter of culture than of the real technical
limitations of C.

People don't try to write such safe code in C because they think C is
supposed to be lean and mean and that doing so will bloat their code
or something. The minimal Bstrlib module is like 8K of footprint and
the C library has no chance of matching up to it in terms of
performance. People are just wrong. You can make libraries or just
code in general in C that are safe, fast, and as functional as any
higher level language.

You have to stop letting the shortsightedness of Ritchie, Kernighan,
Thompson and the C standards committee keep you from seeing C as a
language in which you can design sophisticated structures. Bstrlib
is not just some weird case where everything seems to work out. Last
month I posted some abstracted linked list code which (while probably
not very fast) has a lot of the same safety, power, and simplicity of
Bstrlib. C is not just a fancy assembly language. At least I try not
to treat it so simply.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #40

0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.