Two Questions about "strlen", "strcat" and "strcpy"

Matt

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Thanks,
Matt
--
comp.lang.c.moderated - moderation address: cl**@plethora.net

Nov 14 '05 #1


Mike Wahler

"Matt" <ma********@hotmail.com> wrote in message
news:cl****************@plethora.net...

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value?
Signed types are for representing values which could be negative
or positive (or of course zero).

How could the size of something (in this case a count of characters)
be negative? Also, a signed value would halve the largest possible
value.
Why is an unsigned value less
appropriate?
unsigned is *more* appropriate.
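
As a side note on the "less appropriate" half of the question, here is a minimal sketch of the one place where an unsigned length tends to surprise people: subtracting two lengths. The function names length_difference and shorter_than are made up for this illustration and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: subtracts two string lengths in size_t.
   Because size_t is unsigned, a "negative" difference wraps around
   to a very large positive value instead. */
size_t length_difference(const char *a, const char *b)
{
    return strlen(a) - strlen(b);
}

/* Hypothetical helper: returns 1 if a is strictly shorter than b.
   Comparing the lengths directly avoids the wraparound entirely. */
int shorter_than(const char *a, const char *b)
{
    return strlen(a) < strlen(b);
}
```

So a test such as "strlen(a) - strlen(b) < 0" can never be true, while comparing the two lengths directly behaves as expected.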

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

None that I'm aware of. What 'advantages' would you think it would have?

-Mike

Nov 14 '05 #2

Alan Balmer

On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

None that I'm aware of. What 'advantages' would you think it would have?

It would be useful to have *different* functions which do this - in
fact I've written them a few times ;-)

Handy when you're putting items consecutively into a buffer.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #3

Mike Wahler

"Alan Balmer" <al******@att.net> wrote in message
news:dr********************************@4ax.com...

On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

None that I'm aware of. What 'advantages' would you think it would have?

It would be useful to have *different* functions which do this - in
fact I've written them a few times ;-)

Handy when you're putting items consecutively into a buffer.

'strcat()' handles that just fine.

-Mike

Nov 14 '05 #4

Alan Balmer

On Thu, 26 Aug 2004 21:55:59 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:

"Alan Balmer" <al******@att.net> wrote in message
news:dr********************************@4ax.com...
On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:
>>
>> 2. Would there be any advantage in having strcat and strcpy return a
>> pointer to the "end" of the destination string rather than returning a
>> pointer to its beginning?
>
>None that I'm aware of. What 'advantages' would you think it would have?
>

It would be useful to have *different* functions which do this - in
fact I've written them a few times ;-)

Handy when you're putting items consecutively into a buffer.

'strcat()' handles that just fine.

But rather inefficiently, due to the need to find the end of the
target string.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #5

Ben Pfaff

Alan Balmer <al******@att.net> writes:

On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

None that I'm aware of. What 'advantages' would you think it would have?

It would be useful to have *different* functions which do this - in
fact I've written them a few times ;-)

There's always
p += sprintf (p, "%s", string);
--
"Given that computing power increases exponentially with time,
algorithms with exponential or better O-notations
are actually linear with a large constant."
--Mike Lee

Nov 14 '05 #6

Alan Balmer

On Thu, 26 Aug 2004 15:39:56 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:

Alan Balmer <al******@att.net> writes:
On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote:

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

None that I'm aware of. What 'advantages' would you think it would have?

It would be useful to have *different* functions which do this - in
fact I've written them a few times ;-)

There's always
p += sprintf (p, "%s", string);

Yes, a very useful idiom (and, of course, it can do data conversions
at the same time.) My solution of choice for desktop or server
programs.

For embedded systems, I've written more specialized functions which do
the equivalent for specific data types, sometimes passing a pointer to
a pointer to the "next available" position, leaving the return value
for error reporting.
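
A rough sketch of that pointer-to-pointer style, for plain strings only. The name append_str and its error convention (0 for success, -1 for "does not fit") are assumptions made for this example, not code from any of the posters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src at *next, where end points one
   past the last byte of the buffer. On success it returns 0 and
   advances *next to the new terminating null character. If src plus
   its terminator does not fit, it returns -1 and writes nothing. */
int append_str(char **next, const char *end, const char *src)
{
    size_t len = strlen(src);

    if (*next >= end || len >= (size_t)(end - *next))
        return -1;

    memcpy(*next, src, len + 1);   /* copy the terminator as well */
    *next += len;                  /* leave *next on the terminator */
    return 0;
}
```

The caller keeps one "next available" pointer for the whole buffer, so no call ever has to rescan what has already been written, and the return value stays free for error reporting.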

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #7

Mike Wahler

"Alan Balmer" <al******@att.net> wrote in message
news:lr********************************@4ax.com...

Handy when you're putting items consecutively into a buffer.

'strcat()' handles that just fine.

But rather inefficiently, due to the need to find the end of the
target string.

char buffer[100] = {0};
char *p = buffer;
p += sprintf(p, "%s", "Hello");
p += sprintf(p, "%s", " world\n");

-Mike

Nov 14 '05 #8

Joona I Palaste

Jason Lee <ja*******@operamail.com> scribbled the following
on comp.lang.c:

Alan Balmer <al******@att.net> wrote in message news:<bi********************************@4ax.com>...
On Thu, 26 Aug 2004 15:39:56 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>Alan Balmer <al******@att.net> writes:
>> On Thu, 26 Aug 2004 21:13:51 GMT, "Mike Wahler"
>> <mk******@mkwahler.net> wrote:
>>>> 2. Would there be any advantage in having strcat and strcpy return a
>>>> pointer to the "end" of the destination string rather than returning a
>>>> pointer to its beginning?
>>>
>>>None that I'm aware of. What 'advantages' would you think it would have?
>>>
>> It would be useful to have *different* functions which do this - in
>> fact I've written them a few times ;-)
>
>There's always
> p += sprintf (p, "%s", string);

Yes, a very useful idiom (and, of course, it can do data conversions
at the same time.) My solution of choice for desktop or server
programs.

abandonware slandered indepth gawdamned acquintance inbuilt chegwin

(snip about fifty lines of this crap)

*PLONK*

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo

Nov 14 '05 #9

Dan Pop

In <qE****************@newsread1.news.pas.earthlink.net> "Mike Wahler" <mk******@mkwahler.net> writes:

"Alan Balmer" <al******@att.net> wrote in message
news:lr********************************@4ax.com...
>> Handy when you're putting items consecutively into a buffer.
>
>'strcat()' handles that just fine.
> ^^^^^^

But rather inefficiently, due to the need to find the end of the
target string.

char buffer[100] = {0};
char *p = buffer;
p += sprintf(p, "%s", "Hello");
p += sprintf(p, "%s", " world\n");

I'm missing the usage of strcat in your code, despite your previous claim
that it handles the job "just fine".

For all you know, sprintf might implement %s with a strcpy and a strlen
call, so the sprintf solution need not be as efficient as it looks.

Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like sprintf.
So, a strcpy returning a pointer to the terminating null character would
be, possibly by far, the best solution for multiple string concatenation.
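
For illustration, a minimal sketch of such a function in portable C. The names copy_to_end and join3 are invented for this example; a real library version would presumably be hand-optimised rather than a byte loop.

```c
#include <stddef.h>

/* Hypothetical helper (not a standard C function): copies src to dst
   like strcpy, but returns a pointer to the terminating null
   character written into dst. The caller must guarantee that dst
   has room for strlen(src) + 1 bytes. */
char *copy_to_end(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* Joins three pieces without ever rescanning the destination:
   each copy starts where the previous one stopped. */
char *join3(char *dst, const char *a, const char *b, const char *c)
{
    char *p = dst;

    p = copy_to_end(p, a);
    p = copy_to_end(p, b);
    p = copy_to_end(p, c);
    return p;   /* points at the final terminating null */
}
```

With strcat, each call would first walk the whole destination to find its end; here the end is handed from one call to the next.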

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #10

Alan Balmer

On 27 Aug 2004 08:10:17 GMT, Joona I Palaste <pa*****@cc.helsinki.fi>
wrote:

programs.

abandonware slandered indepth gawdamned acquintance inbuilt chegwin

(snip about fifty lines of this crap)

Hmm... I never saw this. Maybe my cross-post filter got it?

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #11

CBFalconer

Dan Pop wrote:

.... snip ...
Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like
sprintf. So, a strcpy returning a pointer to the terminating null
character would be, possibly by far, the best solution for multiple
string concatenation.

This is one more reason that strlcpy (and strlcat) are the right
answer. They return the length of the resultant string, after
which locating the end is a single indexing operation, but
probably not needed. My implementation (public domain) is
available at:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

and a version of the original Miller & de Raadt paper follows:

This is a text version of the original at:
<http://www.courtesan.com/todd/papers/strlcpy.html>
=================================================================

strlcpy and strlcat
consistent, safe, string copy and concatenation.

Todd C. Miller
University of Colorado, Boulder

Theo de Raadt
OpenBSD project

Abstract

As the prevalence of buffer overflow attacks has increased,
more and more programmers are using size or length-bounded
string functions such as strncpy() and strncat(). While this is
certainly an encouraging trend, the standard C string functions
generally used were not really designed for the task. This
paper describes an alternate, intuitive, and consistent API
designed with safe string copies in mind.

There are several problems encountered when strncpy() and
strncat() are used as safe versions of strcpy() and strcat().
Both functions deal with NUL-termination and the length
parameter in different and non-intuitive ways that confuse even
experienced programmers. They also provide no easy way to
detect when truncation occurs. Finally, strncpy() zero-fills
the remainder of the destination string, incurring a
performance penalty. Of all these issues, the confusion caused
by the length parameters and the related issue of NUL-
termination are most important. When we audited the OpenBSD
source tree for potential security holes we found rampant
misuse of strncpy() and strncat(). While not all of these
resulted in exploitable security holes, they made it clear that
the rules for using strncpy() and strncat() in safe string
operations are widely misunderstood. The proposed replacement
functions, strlcpy() and strlcat(), address these problems by
presenting an API designed for safe string copies (see Figure 1
for function prototypes). Both functions guarantee NUL-
termination, take as a length parameter the size of the string
in bytes, and provide an easy way to detect truncation. Neither
function zero-fills unused bytes in the destination.

Introduction

In the middle of 1996, the authors, along with other members of
the OpenBSD project, undertook an audit of the OpenBSD source
tree looking for security problems, starting with an emphasis
on buffer overflows. Buffer overflows [1] had recently gotten a
lot of attention in forums such as BugTraq [2] and were being
widely exploited. We found a large number of overflows due to
unbounded string copies using sprintf(), strcpy() and strcat(),
as well as loops that manipulated strings without an explicit
length check in the loop invariant. Additionally, we also found
many instances where the programmer had tried to do safe string
manipulation with strncpy() and strncat() but failed to grasp
the subtleties of the API.

Thus, when auditing code, we found that not only was it
necessary to check for unsafe usage of functions like strcpy()
and strcat(), we also had to check for incorrect usage of
strncpy() and strncat(). Checking for correct usage is not
always obvious, especially in the case of ``static'' variables
or buffers allocated via calloc(), which are effectively pre-
terminated. We came to the conclusion that a foolproof
alternative to strncpy() and strncat() was needed, primarily to
simplify the job of the programmer, but also to make code
auditing easier.

-----------------------------------------------------------
size_t strlcpy(char *dst, const char *src, size_t size);
size_t strlcat(char *dst, const char *src, size_t size);

Figure 1: ANSI C prototypes for strlcpy() and strlcat()
------------------------------------------------------------

Common Misconceptions

The most common misconception is that strncpy() NUL-terminates
the destination string. This is only true, however, if the length
of the source string is less than the size parameter. This can
be problematic when copying user input that may be of arbitrary
length into a fixed size buffer. The safest way to use
strncpy() in this situation is to pass it one less than the
size of the destination string, and then terminate the string
by hand. That way you are guaranteed to always have a NUL-
terminated destination string. Strictly speaking, it is not
necessary to hand-terminate the string if it is a ``static''
variable or if it was allocated via calloc() since such strings
are zeroed out when allocated. However, relying on this feature
is generally confusing to those persons who must later maintain
the code.
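
The "one less, then terminate by hand" rule from the paragraph above can be sketched as a small wrapper. The name bounded_copy and its return convention are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dst, whose capacity is
   dstsize bytes (dstsize must be greater than zero), using strncpy
   and then terminating by hand so the result is always a string.
   Returns 1 if the source had to be truncated, 0 otherwise. */
int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    strncpy(dst, src, dstsize - 1);   /* may leave dst unterminated */
    dst[dstsize - 1] = '\0';          /* so terminate it explicitly */
    return strlen(src) >= dstsize;
}
```

Without the explicit assignment, copying "abcdef" into a 4-byte buffer would leave no NUL anywhere in the buffer.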

There is also an implicit assumption that converting code from
strcpy() and strcat() to strncpy() and strncat() causes
negligible performance degradation. While this is true of
strncat(), the same cannot be said for strncpy() since it zero-
fills the remaining bytes not used to store the string being
copied. This can lead to a measurable performance hit when the
size of the destination string is much greater than the length
of the source string. The exact penalty for using strncpy() due
to this behavior varies by CPU architecture and implementation.
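
The zero-filling itself is easy to observe. The following sketch uses a made-up helper name, zero_bytes_after_strncpy, and pre-fills the buffer with 'x' so that every zero byte found afterwards must have been written by strncpy().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf (n bytes) with 'x', copies src into
   it with strncpy, and counts how many bytes ended up as zero. */
size_t zero_bytes_after_strncpy(char *buf, size_t n, const char *src)
{
    size_t i;
    size_t zeros = 0;

    memset(buf, 'x', n);
    strncpy(buf, src, n);   /* pads the rest of buf with NULs */

    for (i = 0; i < n; i++) {
        if (buf[i] == '\0')
            zeros++;
    }
    return zeros;
}
```

For a 19-character source and a 1024-byte buffer, 1005 of the bytes written are padding.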

The most common mistake made with strncat() is to use an
incorrect size parameter. While strncat() does guarantee to
NUL-terminate the destination, you must not count the space for
the NUL in the size parameter. Most importantly, this is not
the size of the destination string itself, rather it is the
amount of space available. As this is almost always a value
that must be computed, as opposed to a known constant, it is
often computed incorrectly.
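
As a sketch of the computation this paragraph describes, the wrapper below works out the space remaining before calling strncat(). The name bounded_append is an assumption for this example, and it presumes dst already holds a NUL-terminated string within dstsize bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: appends src to the string in dst, whose total
   capacity is dstsize bytes. The third argument to strncat is the
   room left for characters, NOT counting the terminating NUL and NOT
   the size of the whole buffer. Returns 1 if src was truncated. */
int bounded_append(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t room;

    if (used + 1 >= dstsize)          /* buffer already full */
        return src[0] != '\0';

    room = dstsize - used - 1;        /* leave one byte for the NUL */
    strncat(dst, src, room);
    return strlen(src) > room;
}
```

Passing sizeof(buffer) directly as the third argument is the classic mistake: strncat would then be allowed to write past the end.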

How do strlcpy() and strlcat() help things?

The strlcpy() and strlcat() functions provide a consistent,
unambiguous API to help the programmer write more bullet-proof
code. First and foremost, both strlcpy() and strlcat()
guarantee to NUL-terminate the destination string for all
strings where the given size is non-zero. Secondly, both
functions take the full size of the destination string as a
size parameter. In most cases this value is easily computed at
compile time using the sizeof operator. Finally, neither
strlcpy() nor strlcat() zero-fill their destination strings
(other than the compulsory NUL to terminate the string).

The strlcpy() and strlcat() functions return the total length
of the string they tried to create. For strlcpy() that is
simply the length of the source; for strlcat() that means the
length of the destination (before concatenation) plus the
length of the source. To check for truncation, the programmer
need only verify that the return value is less than the size
parameter. Thus, if truncation has occurred, the number of
bytes needed to store the entire string is now known and the
programmer may allocate more space and re-copy the strings if
he or she wishes. The return value has similar semantics to the
return value of snprintf() as implemented in BSD and as
specified by the upcoming C9X specification [4] (note that not
all snprintf() implementations currently comply with C9X). If
no truncation occurred, the programmer now has the length of
the resulting string. This is useful since it is common
practice to build a string with strncpy() and strncat() and
then to find the length of the result using strlen(). With
strlcpy() and strlcat() the final strlen() is no longer
necessary.
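
To make the return-value semantics concrete, here is a minimal sketch written from the description above. It is not the OpenBSD source; the names sketch_strlcpy and sketch_strlcat are deliberately different so they cannot clash with a system-provided strlcpy() or strlcat().

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: copies at most size - 1 bytes of src, always
   NUL-terminates when size is non-zero, and returns strlen(src),
   i.e. the length of the string it tried to create. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size != 0) {
        size_t n = srclen < size ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Sketch only: appends src to dst, where size is the FULL size of
   dst. Returns the initial length of dst plus strlen(src). */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dstlen = 0;

    /* Find the end of dst without reading beyond size bytes. */
    while (dstlen < size && dst[dstlen] != '\0')
        dstlen++;

    if (dstlen == size)               /* no NUL found: append nothing */
        return size + strlen(src);

    return dstlen + sketch_strlcpy(dst + dstlen, src, size - dstlen);
}
```

In both cases truncation happened exactly when the return value is greater than or equal to the size argument.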

Example 1a is a code fragment with a potential buffer overflow
(the HOME environment variable is controlled by the user and
can be of arbitrary length).

------------------------------------------------------------
strcpy(path, homedir);
strcat(path, "/");
strcat(path, ".foorc");
len = strlen(path);

Example 1a: Code fragment using strcpy() and strcat()
------------------------------------------------------------

Example 1b is the same fragment converted to safely use
strncpy() and strncat() (note that we have to terminate the
destination by hand).

------------------------------------------------------------
strncpy(path, homedir, sizeof(path) - 1);
path[sizeof(path) - 1] = '\0';
strncat(path, "/",
    sizeof(path) - strlen(path) - 1);
strncat(path, ".foorc",
    sizeof(path) - strlen(path) - 1);
len = strlen(path);

Example 1b: Converted to strncpy() and strncat()
------------------------------------------------------------

Example 1c is a trivial conversion to the strlcpy()/strlcat()
API. It has the advantage of being as simple as Example 1a, but
it does not take advantage of the new API's return value.

------------------------------------------------------------
strlcpy(path, homedir, sizeof(path));
strlcat(path, "/", sizeof(path));
strlcat(path, ".foorc", sizeof(path));
len = strlen(path);

Example 1c: Trivial conversion to strlcpy()/strlcat()
------------------------------------------------------------

Since Example 1c is so easy to read and comprehend, it is
simple to add additional checks to it. In Example 1d, we check
the return value to make sure there was enough space for the
source string. If there was not, we return an error. This is
slightly more complicated but in addition to being more robust,
it also avoids the final strlen() call.

------------------------------------------------------------
len = strlcpy(path, homedir, sizeof(path));
if (len >= sizeof(path))
    return (ENAMETOOLONG);
len = strlcat(path, "/", sizeof(path));
if (len >= sizeof(path))
    return (ENAMETOOLONG);
len = strlcat(path, ".foorc", sizeof(path));
if (len >= sizeof(path))
    return (ENAMETOOLONG);

Example 1d: Now with a check for truncation
------------------------------------------------------------

Design decisions

A great deal of thought (and a few strong words) went into
deciding just what the semantics of strlcpy() and strlcat()
would be. The original idea was to make strlcpy() and strlcat()
identical to strncpy() and strncat() with the exception that
they would always NUL-terminate the destination string.
However, looking back on the common use (and misuse) of
strncat() convinced us that the size parameter for strlcat()
should be the full size of the string and not just the number
of characters left unallocated. The return values started out
as the number of characters copied, since this was trivial to
get as a side effect of the copy or concatenation. We soon
decided that a return value with the same semantics as
snprintf()'s was a better choice since it gives the programmer
the most flexibility with respect to truncation detection and
recovery.

Performance

Programmers are starting to avoid strncpy() due to its poor
performance when the target buffer is significantly larger than
the length of the source string. For instance, the apache group
[6] replaced calls to strncpy() with an internal function and
noticed a performance improvement [7]. Also, the ncurses [8]
package recently removed an occurrence of strncpy(), resulting
in a factor of four speedup of the tic utility. It is our hope
that, in the future, more programmers will use the interface
provided by strlcpy() rather than using a custom interface.

To get a feel for the worst-case scenario in comparing
strncpy() and strlcpy(), we ran a test program that copies the
string ``this is just a test'' 1000 times into a 1024 byte
buffer. This is somewhat unfair to strncpy(), since by using a
small string and a large buffer strncpy() has to fill most of
the buffer with NUL characters. In practice, however, it is
common to use a buffer that is much larger than the expected
user input. For instance, pathname buffers are MAXPATHLEN long
(1024 bytes), but most filenames are significantly shorter than
that. The average run times in Table 1 were generated on an
HP9000/425t with a 25MHz 68040 CPU running OpenBSD 2.5 and a
DEC AXPPCI166 with a 166MHz alpha CPU also running OpenBSD 2.5.
In all cases, the same C versions of the functions were used
and the times are the ``real time'' as reported by the time
utility.

cpu architecture   function   time (sec)
m68k               strcpy     0.137
m68k               strncpy    0.464
m68k               strlcpy    0.14
alpha              strcpy     0.018
alpha              strncpy    0.10
alpha              strlcpy    0.02

Table 1: Performance timings in seconds

As can be seen in Table 1, the timings for strncpy() are far
worse than those for strcpy() and strlcpy(). This is probably
due not only to the cost of NUL padding but also because the
CPU's data cache is effectively being flushed by the long
stream of zeroes.

What strlcpy() and strlcat() are not

While strlcpy() and strlcat() are well-suited for dealing with
fixed-size buffers, they cannot replace strncpy() and strncat()
in all cases. There are still times where it is necessary to
manipulate buffers that are not true C strings (the strings in
struct utmp for instance). However, we would argue that such
``pseudo strings'' should not be used in new code since they
are prone to misuse, and in our experience, a common source of
bugs. Additionally, the strlcpy() and strlcat() functions are
not an attempt to ``fix'' string handling in C, they are
designed to fit within the normal framework of C strings. If
you require string functions that support dynamically
allocated, arbitrary sized buffers you may wish to examine the
``astring'' package from mib software [9].

Who uses strlcpy() and strlcat()?

The strlcpy() and strlcat() functions first appeared in OpenBSD
2.4. The functions have also recently been approved for
inclusion in a future version of Solaris. Third-party packages
are starting to pick up the API as well. For instance, the
rsync [5] package now uses strlcpy() and provides its own
version if the OS does not support it. It is our hope that
other operating systems and applications will use strlcpy() and
strlcat() in the future, and that it will receive standards
acceptance at some time.

What's Next?

We plan to replace occurrences of strncpy() and strncat() with
strlcpy() and strlcat() in OpenBSD where it is sensible to do
so. While new code in OpenBSD is being written to use the new
API, there is still a large amount of code that was converted
to use strncpy() and strncat() during our original security
audit. To this day, we continue to discover bugs due to
incorrect usage of strncpy() and strncat() in existing code.
Updating older code to use strlcpy() and strlcat() should serve
to speed up some programs and uncover bugs in others.

Availability

The source code for strlcpy() and strlcat() is available free
of charge and under a BSD-style license as part of the OpenBSD
operating system. You may also download the code and its
associated manual pages via anonymous ftp from ftp.openbsd.org
in the directory /pub/OpenBSD/src/lib/libc/string. The source
code for strlcpy() and strlcat() is in strlcpy.c and strlcat.c.
The documentation (which uses the tmac.doc troff macros) may be
found in strlcpy.3.

Author Information

Todd C. Miller has been involved in the free software community
since 1993 when he took over maintenance of the sudo package.
He joined the OpenBSD project in 1996 as an active developer.
Todd belatedly received a BS in Computer Science in 1997 from
the University of Colorado, Boulder (after years of prodding).
Todd has so far managed to avoid the corporate world and
currently works as a Systems Administrator at the University of
Colorado, Boulder blissfully ensconced in academia. He may be
reached via email at <Todd dot Miller at cs dot colorado.edu>.

Theo de Raadt has been involved with free Unix operating
systems since 1990. Early developments included porting Minix
to the sun3/50 and amiga, and PDP-11 BSD 2.9 to a 68030
computer. As one of the founders of the NetBSD project, Theo
worked on maintaining and improving many system components
including the sparc port and a free YP implementation that is
now in use by most free systems. In 1995 Theo created the
OpenBSD project, which places focus on security, integrated
cryptography, and code correctness. Theo works full time on
advancing OpenBSD. He may be reached via email at
<deraadt at openbsd dot org>.

References

[1] Aleph One. ``Smashing The Stack For Fun And Profit.''
Phrack Magazine Volume Seven, Issue Forty-Nine.
[2] BugTraq Mailing List Archives.
http://www.geek-girl.com/bugtraq/. This web page contains
searchable archives of the BugTraq mailing list.
[3] Brian W. Kernighan, Dennis M. Ritchie. The C Programming
Language, Second Edition. Prentice Hall, PTR, 1988.
[4] International Standards Organization. ``C9X FCD,
Programming languages - C''
http://wwwold.dkuug.dk/jtc1/sc22/open/n2794/ This web page
contains the current draft of the upcoming C9X standard.
[5] Andrew Tridgell, Paul Mackerras. The rsync algorithm.
http://rsync.samba.org/rsync/tech_report/. This web page
contains a technical report describing the rsync program.
[6] The Apache Group. The Apache Web Server.
http://www.apache.org. This web page contains information
on the Apache web server.
[7] The Apache Group. New features in Apache version 1.3.
http://www.apache.org/docs/new_features_1_3.html. This web
page contains new features in version 1.3 of the Apache
web server.
[8] The Ncurses (new curses) home page.
http://www.clark.net/pub/dickey/ncurses/. This web page
contains Ncurses information and distributions.
[9] Forrest J. Cavalier III. ``Libmib allocated string
functions.'' http://www.mibsoftware.com/libmib/astring/.
This web page contains a description and implementation of
a set of string functions that dynamically allocate memory
as necessary.

--
"The most amazing achievement of the computer software industry
is its continuing cancellation of the steady and staggering
gains made by the computer hardware industry..." - Petroski

Nov 14 '05 #12

Mike Wahler

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cg**********@sunnews.cern.ch...

In <qE****************@newsread1.news.pas.earthlink.net> "Mike Wahler" <mk******@mkwahler.net> writes:
"Alan Balmer" <al******@att.net> wrote in message
news:lr********************************@4ax.com...
>> Handy when you're putting items consecutively into a buffer.
>
>'strcat()' handles that just fine.
> ^^^^^^
But rather inefficiently, due to the need to find the end of the
target string.
char buffer[100] = {0};
char *p = buffer;
p += sprintf(p, "%s", "Hello");
p += sprintf(p, "%s", " world\n");

I'm missing the usage of strcat in your code, despite your previous claim
that it handles the job "just fine".

I responded to Alan's objection about finding the end of the string
by suggesting the 'sprintf()' alternative. It might or might
not be 'faster'. By 'just fine', I meant simply 'it will work'
not that it's 'better' than some other way.

For all you know, sprintf might implement %s with a strcpy and a strlen
call, so the sprintf solution need not be as efficient as it looks.
Right.

Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like sprintf.
So, a strcpy returning a pointer to the terminating null character would
be, possibly by far, the best solution for multiple string concatenation.

imo that should be "...would possibly be..."

The only way to know is to measure.

-Mike

Nov 14 '05 #13

jacob navia

CBFalconer wrote:

Dan Pop wrote:

... snip ...
Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like
sprintf. So, a strcpy returning a pointer to the terminating null
character would be, possibly by far, the best solution for multiple
string concatenation.

This is one more reason that strlcpy (and strlcat) are the right
answer. They return the length of the resultant string, after
which locating the end is a single indexing operation, but
probably not needed. My implementation (public domain) is
available at:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

and a version of the original Miller & de Raadt paper follows:

This is a text version of the original at:
<http://www.courtesan.com/todd/papers/strlcpy.html>

[snip]

That paper doesn't address the fundamental problem of C strings:
an inefficient and unchecked data representation.

Strings (mainly text) are represented as sequences of bytes followed
by a terminating zero.

Changing that to length prefixed strings would improve both
security and efficiency.

A standard representation for length prefixed strings is
possible in C. In the implementation of length prefixed
strings in lcc-win32, I propose a similar syntax, but with
improved data representation.

A string is prefixed with the bounds of the enclosed data.
Very simple. A 32 bit representation allows lcc-win32 to
get 4GB maximum length strings, but we could agree that the
specific length of the maximum string is implementation defined.

I tried to keep a compatibility library with Strcpy Strcat, etc.

All those functions work with length prefixed strings. I used
operator overloading to do that, since it is a more general
solution, but it can be hardwired in the compiler in the
same way as the handling of zero terminated strings is hard
wired now.
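
To show the general idea in plain standard C, here is a rough sketch of a length-prefixed string with a fixed capacity. This is only an illustration and not the lcc-win32 representation; the type lstr, the capacity LSTR_CAP, and the functions lstr_set and lstr_cat are all made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define LSTR_CAP 64   /* arbitrary capacity chosen for the sketch */

/* Hypothetical length-prefixed string: the length is stored up
   front, so no terminating zero is needed to find the end. */
typedef struct {
    size_t len;
    char   data[LSTR_CAP];
} lstr;

/* Initialises s from a C string. Returns 0, or -1 if it is too long. */
int lstr_set(lstr *s, const char *cstr)
{
    size_t n = strlen(cstr);

    if (n > LSTR_CAP)
        return -1;
    memcpy(s->data, cstr, n);
    s->len = n;
    return 0;
}

/* Appends src to dst. Both the bounds check and locating the end of
   dst are constant-time because the lengths are already known. */
int lstr_cat(lstr *dst, const lstr *src)
{
    if (src->len > LSTR_CAP - dst->len)
        return -1;
    memcpy(dst->data + dst->len, src->data, src->len);
    dst->len += src->len;
    return 0;
}
```

The cost is that such strings cannot be handed directly to functions that expect a terminating zero.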

jacob

Nov 14 '05 #14

CBFalconer

jacob navia wrote:

CBFalconer wrote:
Dan Pop wrote:

... snip ...
Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like
sprintf. So, a strcpy returning a pointer to the terminating null
character would be, possibly by far, the best solution for multiple
string concatenation.

This is one more reason that strlcpy (and strlcat) are the right
answer. They return the length of the resultant string, after
which locating the end is a single indexing operation, but
probably not needed. My implementation (public domain) is
available at:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

and a version of the original Miller & de Raadt paper follows:

This is a text version of the original at:
<http://www.courtesan.com/todd/papers/strlcpy.html>

[snip]

That paper doesn't address the fundamental problem of C strings:
an inefficient and unchecked data representation.

Strings (mainly text) are represented as sequences of bytes
followed by a terminating zero.

Changing that to length prefixed strings would improve both
security and efficiency.

A standard representation for length prefixed strings is
possible in C. In the implementation of length prefixed
strings in lcc-win32, I propose a similar syntax, but with
improved data representation.

However in this newsgroup we deal with things found in standard C,
which includes a whole library of functions to deal with the 'string'
format, which in turn is detailed in the standard. Any compiler
dealing with your format as a built-in is not following the
standard, and code written for it is non-portable. As you may
have noticed we deal only with portable code here, and have no
wish to alter that focus.

Note that strlcpy and strlcat are perfectly standard, apart from
their names. They are implemented (at least my implementation) in
purely standard C, and their purpose is to live with the existing
string representation and greatly reduce the prevalence of errors.

If the user wants to take my published code and change the names
to say str_lcpy and str_lcat he is free to do so, and he can use
the result in completely conformant code. He doesn't have to
learn anything new.
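
For readers who have not seen the interface, here is a minimal sketch of a strlcpy-style copy written in standard C. It is not the code from the zip file mentioned above; the name sketch_lcpy is hypothetical, and the contract (always terminate when size > 0, return the length of the source) is assumed from the Miller & de Raadt description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Copies at most size-1 characters of src into dst,
 * terminates dst whenever size > 0, and returns the full length of src
 * so the caller can detect truncation. */
size_t sketch_lcpy(char *dst, const char *src, size_t size)
{
    size_t n = 0;

    assert(src != NULL);
    assert(size == 0 || dst != NULL);

    while (src[n] != '\0') {
        if (n + 1 < size)       /* leave room for the terminator */
            dst[n] = src[n];
        n++;
    }
    if (size > 0)
        dst[n < size ? n : size - 1] = '\0';
    return n;                   /* length of src, not of dst */
}
```

If the return value is >= size the copy was truncated; otherwise dst + result is the end of the string, which is the "single indexing operation" referred to above.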

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Nov 14 '05 #15

Keith Thompson

Joona I Palaste <pa*****@cc.helsinki.fi> writes:

Jason Lee <ja*******@operamail.com> scribbled the following
on comp.lang.c:

[...]

abandonware slandered indepth gawdamned acquintance inbuilt chegwin

(snip about fifty lines of this crap)

*PLONK*

I think you just plonked a spammer. The previous article was posted
only to comp.lang.c; the followup was cross-posted to comp.lang.c,
alt.algebra.help, sci.agriculture, alt.sports.basketball.nba, and
alt.autos.classic-trucks. Killfiling is certainly appropriate, but
announcing it seems superfluous.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #16

jacob navia

CBFalconer wrote:

[The user] doesn't have to
learn anything new.

Yes, but there is no free lunch. If you do
not learn anything, the old stuff continues
to produce the same bugs.

As you know Chuck, I do not use a 486 any more.

String representation is a problem that
needs fixing. Only a change of representation
can *really* take the burden of making a
mistake in text handling away from the programmer
at each step.

Mistakes will not disappear magically of course.
But we must address this problem. I am convinced
that C has a *future* and that means innovation.

jacob

Nov 14 '05 #17

Kenny McCormack

In article <cg**********@news-reader5.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

CBFalconer wrote:
[The user] doesn't have to
learn anything new.

Yes, but there is no free lunch. If you do
not learn anything, the old stuff continues
to produce the same bugs.

As you know Chuck, I do not use a 486 any more.

String representation is a problem that
needs fixing. Only a change of representation
can *really* take the burden of making a
mistake in text handling away from the programmer
at each step.

Mistakes will not disappear magically of course.
But we must address this problem. I am convinced
that C has a *future* and that means innovation.

jacob

Innovation is OT here (*). I thought you understood that by now.

(*) "here" being clc.

Nov 14 '05 #18

red floyd

Matt wrote:

[redacted]

You forgot to give us your professor's email address. How are we
supposed to send him our answers?

Nov 14 '05 #19

Default User

Matt wrote:

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

How can a length be negative?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

No. It returns the pointer to the start so you can chain calls:
char str[80];

strcat(strcpy(str, "Hello "), "World!");

Brian Rodenborn

Nov 14 '05 #20

Keith Thompson

ma********@hotmail.com (Matt) writes:
[...]

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Possibly a slight one. Returning a pointer to the beginning doesn't
give you any information you didn't already have. Returning a pointer
to the end (presumably to the trailing '\0') could, for example, let
you catenate more characters onto the end of the string without having
to scan the whole string again to find the end of it.

The disadvantage is that it would break code that depends on the
current behavior.
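
A minimal sketch of the idea, assuming a separate helper rather than a changed strcpy. The names copy_to_end and join3 are hypothetical, and no bounds checking is done, exactly as with strcpy itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src (terminator included) to dst and
 * returns a pointer to the '\0' just stored in dst.
 * The caller must guarantee that dst is large enough. */
char *copy_to_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* Joins three strings into out without ever rescanning out.
 * Returns the length of the result. */
size_t join3(char *out, const char *a, const char *b, const char *c)
{
    char *end = out;

    end = copy_to_end(end, a);
    end = copy_to_end(end, b);   /* starts where the last copy stopped */
    end = copy_to_end(end, c);
    return (size_t)(end - out);
}
```

With strcat each call would first walk the whole destination to find its end; here every call starts at the pointer returned by the previous one.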

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #21

Douglas A. Gwyn

Matt wrote:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

It can't simultaneously be both.
This sounds like a homework question. The idea of
homework is to get you to do your own analysis.

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

How would you use such a pointer if it did,
and what must you do instead to achieve a
similar goal using the existing standard
functions?

Nov 14 '05 #22

Brian Inglis

On 26 Aug 2004 19:44:06 GMT in comp.lang.c.moderated,
ma********@hotmail.com (Matt) wrote:

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

What does the standard say about size_t? What do you think?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

The non-standard but commonly available stpcpy/stpncpy functions do
just that.

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Br**********@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply

Nov 14 '05 #23

Brian Raiter

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

I also have 2 questions:

1. Posting homework questions to usenet frequently returns
misinformation and insults. Why is this more appropriate than actual
answers?

2. Would there be any advantage in having you do your own homework?

b

Nov 14 '05 #24

Dan Pop

In <cl****************@plethora.net> ma********@hotmail.com (Matt) writes:

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value?

o Strings can't have negative sizes.

o size_t is also capable of representing the size of the largest object
supported by the implementation. If you fill this object with a string,
strlen will be able to correctly return its size.

Why is unsigned value less appropriate?

Unsigned and signed types don't mix well in C. Ideally, unsigned types
should be used only for bit manipulation and modulo arithmetic purposes.
In real life, however, their ability to represent larger positive
values than their signed counterparts imposes their usage for normal
computational purposes.
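
A small sketch of the kind of surprise meant here, assuming a common implementation where size_t is at least as wide as unsigned int. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Intended meaning: "is s longer than n characters?"
 * In `strlen(s) > n` the int n is converted to size_t before the
 * comparison; the cast below only spells out that conversion.
 * A negative n therefore becomes a huge positive value. */
int longer_than_naive(const char *s, int n)
{
    assert(s != NULL);
    return strlen(s) > (size_t)n;
}

/* Handling the negative case before converting gives the
 * mathematically expected answer. */
int longer_than_checked(const char *s, int n)
{
    assert(s != NULL);
    return n < 0 || strlen(s) > (size_t)n;
}
```

For "abc" and n = -1 the naive version reports that the string is not longer than -1 characters, while the checked version reports that it is.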
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Most definitely. It would render multiple string concatenation faster.
fgets would also benefit from such a behaviour, since the end of its
string *must* be examined in order to determine if a complete line has
been read.
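
To illustrate the fgets point: with the standard interface the caller has to find the end of the string again before it can test for the newline. A sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if buf (as filled by a successful fgets) ends in '\n',
 * i.e. a complete line was read, and 0 otherwise. A 0 can also mean
 * the input ended without a final newline. */
int line_is_complete(const char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);   /* the extra scan an end pointer would avoid */
    return len > 0 && buf[len - 1] == '\n';
}

/* Reads one line from fp into buf and reports whether it was complete.
 * Returns buf, or NULL on end of file or error (as fgets does). */
char *read_line(char *buf, int size, FILE *fp, int *complete)
{
    assert(buf != NULL && fp != NULL && complete != NULL);
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    *complete = line_is_complete(buf);
    return buf;
}
```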

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #25

Karthiik Kumar

Matt wrote:

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

I cannot think of a string whose length is negative. What would
a string of length -1 look like (as if that makes sense!)?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Since the whole lot of string manipulation functions expect a
pointer to the beginning of the string, having strcpy / strcat return a
pointer to the beginning of the string is consistent with the rest of the
API, unlike alternatives such as pointing to its 'end'.

--
Karthik

Nov 14 '05 #26

Paul Hsieh

ma********@hotmail.com (Matt) wrote:

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more approprate than a signed value? Why is unsighned value less
appropriate?

The reasoning is that unsigned values have a larger maximum value and
that negative lengths don't have any meaning. So under the assumption
that strlen can never fail and only needs to represent all possible
outputs of a string length, size_t is the most appropriate output.

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

As others have posted, incremental string concatenation is simplified
by such a scheme. The claims of superior performance are kind of funny
though -- while technically true, they miss the greater point that in
fact calls to strlen(), implicit or not, are the real performance
problem.

People may write strcpy() in "hand coded assembly language" if they
want, but the semantics for it limit the advantage one can gain from
doing this on most platforms (on modern x86 compilers you will gain
nothing by doing this.) memcpy() on the other hand, can be
*dramatically* improved in performance using assembly language on
most platforms -- the reason is that aligned block copying is
something most hardware has good support for that is far superior to
char by char copying.

So returning an "end pointer" helps half of the problem by implicitly
tracking the end-address for the destination, but does nothing about
the other half of the problem of not knowing the length of the source
and thus still doing the *implicit strlen()* as part of the strcat or
strcpy. If, on the other hand, the length of your source and
destination strings is known beforehand, then one could use memcpy()
instead, which would lead to a *true* performance boost.
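
A minimal sketch of what "lengths known beforehand" buys, assuming the caller tracks both lengths itself. The helper name append_known is hypothetical and this is not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src_len characters of src to dst, whose
 * current length is dst_len, and terminates the result. No scan for
 * '\0' is needed on either side. Returns the new length.
 * The caller guarantees room for dst_len + src_len + 1 bytes and that
 * the two buffers do not overlap. */
size_t append_known(char *dst, size_t dst_len,
                    const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst + dst_len, src, src_len);   /* block copy, length known */
    dst[dst_len + src_len] = '\0';
    return dst_len + src_len;
}
```

Each call returns the updated length, so a chain of appends never calls strlen() at all.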

People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #27

jacob navia

Kenny McCormack wrote:

Innovation is OT here (*). I thought you understood that by now.

Well, that is a sentence that needs to be framed, and
right away kept in the museum.

Down with innovation!

Operator overloading is an accepted way of working with
numerical quantities that has gotten accepted even in
traditional languages like FORTRAN.

I like C. I do not want it to disappear as a computer
language that can provide simplicity and power.

jacob

Nov 14 '05 #28

jacob navia

Paul Hsieh wrote:

People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

The approach you take is the good one. Length delimited strings!

The string library lcc-win32 proposes is the same idea but with some
syntactic sugar around it:
1) You can index Strings like char *:
String S1 = "abc";
S1[1] // yields 'b'
S1[1] = 'X' // String is now "aXc"

2) You can assign them to local variables in the normal way as shown above

3) All strings are garbage collected. No more "free" problems.

4) Function names are cloned from the C library:
Strcpy
Strcat
etc

Easy to use, easy to learn.

Lcc-win32:
http://www.cs.virginia.edu/~lcc-win32

jacob

Nov 14 '05 #29

CBFalconer

jacob navia wrote:

Paul Hsieh wrote:
People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.

The approach you take is the good one. Length delimited strings!

The string library lcc-win32 proposes is the same idea but with
some syntactic sugar around it:
1) You can index Strings like char *:
String S1 = "abc";
S1[1] // yields 'b'
S1[1] = 'X' // String is now "aXc"

2) You can assign them to local variables in the normal way as
shown above

3) All strings are garbage collected. No more "free" problems.

4) Function names are cloned from the C library:
Strcpy
Strcat
etc

Easy to use, easy to learn.

The problem is that you are incorporating this into the
compiler/library, instead of a separate module available with
source and written in standard C, as does Paul. Things like your
indexing operations above are not possible without an extension in
the compiler, and thus are inherently non-portable. You could
make something that accepted:

s1.body[i] = 'a';

etc. in a fully portable manner, and this would be on-topic here.
Meanwhile your system runs only under Windoze (and even only a
subset of that), and is very far from portable.
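
A sketch of what such a portable, library-only counted string could look like. The type and function names (struct lstr, lstr_from, lstr_set) and the fixed capacity are assumptions made for the illustration; they are not taken from lcc-win32 or from any existing library.

```c
#include <assert.h>
#include <stddef.h>

#define LSTR_CAP 64              /* arbitrary capacity for the sketch */

struct lstr {
    size_t len;                  /* number of characters in use */
    char   body[LSTR_CAP];       /* not necessarily '\0'-terminated */
};

/* Initialises s from a C string, truncating at LSTR_CAP characters. */
void lstr_from(struct lstr *s, const char *src)
{
    size_t n = 0;

    assert(s != NULL && src != NULL);
    while (n < LSTR_CAP && src[n] != '\0') {
        s->body[n] = src[n];
        n++;
    }
    s->len = n;
}

/* Bounds-checked store: returns 1 on success, 0 if i is out of range. */
int lstr_set(struct lstr *s, size_t i, char c)
{
    assert(s != NULL);
    if (i >= s->len)
        return 0;
    s->body[i] = c;
    return 1;
}
```

Direct access through s.body[i] remains available to code that does not want the check; lstr_set is the checked alternative that needs no compiler extension.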

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Nov 14 '05 #30

Dan Pop

In <_x**************@newsread1.news.pas.earthlink.net > "Mike Wahler" <mk******@mkwahler.net> writes:

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cg**********@sunnews.cern.ch...

Small things like strcpy usually get implemented in hand optimised
assembly, which is virtually never the case with monsters like sprintf.
So, a strcpy returning a pointer to the terminating null character would
be, possibly by far, the best solution for multiple string concatenation.

imo that should be "...would possibly be..."

The only way to know is to measure.

To measure what? The behaviour of one specific implementation?

The point is that such a function solves the problem with NO overhead,
while both strcat and sprintf have overheads. One doesn't have to
measure anything to realise this.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #31

jacob navia

CBFalconer wrote:

The problem is that you are incorporating this into the
compiler/library, instead of a separate module available with
source and written in standard C, as does Paul. Things like your
indexing operations above are not possible without an extension in
the compiler, and thus are inherently non-portable. You could
make something that accepted:

s1.body[i] = 'a';

etc. in a fully portable manner, and this would be on-topic here.
Meanwhile your system runs only under Windoze (and even only a
subset of that), and is very far from portable.

That is *still* possible OF COURSE.

Nothing prevents you from writing
s1.content[1] = 'a';
instead of
s1[1] = 'a';

Lcc-win32 is *still* a C compiler and will swallow that
in no time. The second form will be slightly faster since
it avoids bounds checking that is performed automatically
with length prefixed strings.

The structure is defined in the header file, and you
can avoid any dependencies with lcc-win32 by sticking
to a standard notation if you feel like.

The point here is that we need a portable way of using these
length delimited strings in C. Not a particular
implementation such as mine.

Important is that we have in the standard language a way of using
length prefixed strings in the same way as now we have zero terminated
strings.

The decision as to which strings are more convenient and appropriate
to the task at hand should be left to the C programmer. There is NO
CHOICE now. That is what bothers me.

jacob

P.S. Most languages accept operator overloading now, even
FORTRAN. (Fortran 90 standard), including C#, Perl, Ruby,
and many others. It is a proved compiler technology with
few mysteries left...

Operator overloading makes it possible to write such libraries without
too much pain.

Nov 14 '05 #32

Alan Balmer

On Mon, 30 Aug 2004 09:36:56 +0200, jacob navia
<ja***@jacob.remcomp.fr> wrote:

Kenny McCormack wrote:

Innovation is OT here (*). I thought you understood that by now.

Well, that is a sentence that needs to be framed, and
right away kept in the museum.

Down with innovation!

Operator overloading is an accepted way of working with
numerical quantities that has gotten accepted even in
traditional languages like FORTRAN.

I like C. I do not want it to disappear as a computer
language that can provide simplicity and power.

There's another newsgroup, just down the hall, which discusses the C
standard and things which should go in its next version.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #33

CBFalconer

jacob navia wrote:

.... snip ...
Important is that we have in the standard language a way of using
length prefixed strings in the same way as now we have zero
terminated strings.

The decision as to which strings are more convenient and
appropiate to the task at hand should be left to the C programmer.
There is NO CHOICE now. That is what bothers me.

Then present an appropriate module in source form, together with
suggested extensions to the language to ease use, IN THE
comp.std.c newsgroup. The source module can be advertised here,
as does Paul. If the technique is useful, simple, safe, and in
use it will be considered for a future version. And don't call
them strings, that already has a defined implementation in C.

--
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.
"We have always known that heedless self-interest was bad
morals. We now know that it is bad economics" - FDR

Nov 14 '05 #34

Michael Wojcik

In article <cl****************@plethora.net>, Default User <fi********@boeing.com.invalid> writes:

Matt wrote:
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?
No. It returns the pointer to the start so you can chain calls:

Which makes it so much easier to overflow buffers, fail to check
results, and so forth - without which software would not be the
great success that it is today.

This reminds me of Mike Wahler's construction from earlier in this
thread:
char *p = buffer;
p += sprintf(p, "%s", "Hello");
p += sprintf(p, "%s", " world\n");

Certainly in a trivial case there's no danger, but in real code this
sort of thing is a Bad Idea. It's worse with fprintf, since there's
a wider range of error conditions, but *I* don't want to bet that I
never mistype a format string, just so I can write terse code.
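
For comparison, a sketch of the same append step with the result checked before the pointer is advanced. It assumes C99 snprintf; the helper name append_str is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends s at *pos, where *left bytes (including
 * room for the terminator) remain in the buffer.
 * Returns 0 on success and advances *pos / shrinks *left.
 * Returns -1 on an output error or if s does not fit; *pos and *left
 * are then left unchanged (a truncated tail may have been written
 * at *pos). */
int append_str(char **pos, size_t *left, const char *s)
{
    int n;

    assert(pos != NULL && *pos != NULL && left != NULL && s != NULL);
    n = snprintf(*pos, *left, "%s", s);
    if (n < 0 || (size_t)n >= *left)
        return -1;
    *pos += n;
    *left -= (size_t)n;
    return 0;
}
```

The unchecked `p += sprintf(...)` form would add a negative value to p after an error; here the pointer only moves once the return value is known to be good.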
--
Michael Wojcik mi************@microfocus.com

The penance was not building the field and bringing back Shoeless Joe
Jackson, but rather tossing on the field with his father. -- Kevin Aug

Nov 14 '05 #35

Default User

Michael Wojcik wrote:

In article <cl****************@plethora.net>, Default User <fi********@boeing.com.invalid> writes:
Matt wrote:
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

No. It returns the pointer to the start so you can chain calls:

Which makes it so much easier to overflow buffers, fail to check
results, and so forth - without which software would not be the
great success that it is today.

No more so than any other use of those functions. Whether they're
chained or not is irrelevant.

Brian Rodenborn

Nov 14 '05 #36

Kenny McCormack

In article <b6********************************@4ax.com>,
Alan Balmer <al******@spamcop.net> wrote:

On Mon, 30 Aug 2004 09:36:56 +0200, jacob navia
<ja***@jacob.remcomp.fr> wrote:
Kenny McCormack wrote:

Innovation is OT here (*). I thought you understood that by now.

Well, that is a sentence that needs to be framed, and
right away kept in the museum.

Down with innovation!

Operator overloading is an accepted way of working with
numerical quantities that has gotten accepted even in
traditional languages like FORTRAN.

I like C. I do not want it to disappear as a computer
language that can provide simplicity and power.

There's another newsgroup, just down the hall, which discusses the C
standard and things which should go in its next version.

I think we all understand that (certainly Jacob & I do). And I think we
understand the reasons why it is so. I think we all understand the context
of my little jibe.

It is just that, and this is a point made several times, through the years,
by so many different people, it is so counter-intuitive that a newsgroup
named simply comp.lang.c would be about something so abstract and unrelated
to what most people consider C and for something like comp.lang.std.c (or
whatever that thing to which you obliquely refer is named) would:
a) Not be what clc in fact is (i.e., the home of the abstraction to
which I allude earlier)
b) Actually be closer (according to your statement) to what most
people would intuitively think clc would be.

Nov 14 '05 #37

Alan Balmer

On Mon, 30 Aug 2004 23:04:10 GMT, ga*****@yin.interaccess.com (Kenny
McCormack) wrote:

There's another newsgroup, just down the hall, which discusses the C
standard and things which should go in its next version.

I think we all understand that (certainly Jacob & I do). And I think we
understand the reasons why it is so. I think we all understand the context
of my little jibe.

I'm sure that Jacob understands it, but he seems reluctant to accept
it. Personally, I weary of the constant advertisements for his "better
than C" language.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #38

Paul Hsieh

jacob navia <ja***@jacob.remcomp.fr> wrote:

The point here is that we need a portable way of using this
length delimited strings in C. Not a particular
implementation such as mine.

Uhh ... that's exactly what bstrlib does. bstrlib is extremely
portable -- semantically speaking, I argue it's even more portable than
CLIB itself since its defined usage is maximal and necessarily
identical over all platforms.

Important is that we have in the standard language a way of using
length prefixed strings in the same way as now we have zero terminated
strings.

No. Zero terminated strings are the whole problem in the first place.
This is where both buffer overflows and retrograde performance comes
from. The whole CLIB style of '\0' terminated strings forces the
programmer to think in terms of implementation and constantly respin
solutions to common problems. Every other modern language in
existence basically lets the programmer think about strings
just as strings. So what we want is something closer to how other
programming languages deal with strings (like Python, Perl, Java,
Pascal, etc), while supersetting the CLIB char * nonsense. Guess what
-- bstrlib *does* this.

Operator overloading makes it possible to write such libraries without
too much pain.

Ok, this is a different thing, and it solves a different problem.
This is basically a focus on syntactical concerns more than anything
else -- and it's not a very good solution to it either. The set of
operators available are dictated by the syntax/considerations for
arithmetic on scalars. For example if you want a comprehensive vector
library, which operator are you proposing to assign for cross-product,
dot-product, SIMD-product, tensor and trace, while retaining sum and
difference? Ask for enough operations and you will eventually just
run out of operators.

As I have posted before, it would make more sense if there were
*additional* operators (using multiple operator characters and some
previously unused characters such as @, $ to create things that look
like: @+, $*, @==, $^^, etc) that basically had empty definitions
unless actually defined. There are other operator overloading things
such as:

X"..."
"..."X
3.75X

Where the X could be other letters. This gives programmers a way of
using operator overloading without suffering from the confusion of
using previous operators that someone reading the code can easily
mistake for having different semantics from what they think. See?
It's still explicit (something C is useful for) while providing for
more modern functionality.

The C standards people have the opportunity to truly move the language
forward, and in fact get a small leg up over other languages if they
would consider things like this. But they are clearly
retrograde, short-sighted and, as we now see from the C99 debacle,
ultimately ineffective. The standards committee is stuck in their old
"we have to make sure the VAX people, the embedded people, the super
computer people, the DOS people and the UNIX people can all adopt our
standard while actually doing completely different things in their
compiler" mantra.

The computing universe has changed. People don't want just the minor
silly things in C99 over C89. C still represents the lowest level
language, but there still remain several real problems not solved in
the language.

Jacob, people like you and the guy that is making "D" (Walter B---?)
make me sad. You both seem overly concerned with such
ridiculously superficial aspects of the language -- yet both of you
two are the only ones putting your money where your mouth is and
converting your capability to make *compilers* to demonstrate the
possibility of evolution of C. Yet you haven't figured out that you
haven't sold your ideas to any significant population of programmers.

There are ideas in programming languages, especially the C-class of
programming languages that I *desperately* want to see:

1. A preprocessor with "Turing complete" (or close enough) power.
The point is that the LISP people continue to laugh at C
programmers who have no code generation or "lambda"
programming ability.
a) Compile time type assertions to allow for type safe macros.
2. Exposing more important hardware functions such as a widening
multiply, bit scan and bit count operations. Many CPUs have
instructions for accelerating these operations, and it's
otherwise fairly straightforward to simulate these operations.
a) Include things like round() and trunc() as separate
functions.
3. For more powerful and useful C library.
a) Fix all the stupid problems like strtok(), gets(), etc.
b) More powerful heap API (freeall(), memsize(), sub-heaps,
totalAllocCount(), allAllocsIterate() etc).
4. Some way of performing a restartable vsnprintf (), or
generally va_* reuse (according to the standard, not the
implementation).
5. Real co-routines -- none of this setjmp/longjmp nonsense.
6. A scope specific "goto case ____". This kind of functionality
is *mandatory* for implementing fast state machines. Today
programmers are either forced to have redundant case ____ and
labels that are not scope protected or they do it the slow way
(a switch statement wrapped in a while loop). Think about it -- there is
literally *NO* programming language with control mechanisms
that perfectly match the very common programming idiom of
state machines (except assembly language!)
7. Create an API for making "virtual file" objects. I.e., memory
files, network streams, algorithmic fractal strings, etc. could
be fopen(*,"r")'ed, fread()'ed, etc.
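
For readers unfamiliar with the idiom that item 6 calls the slow way, here is a minimal sketch in standard C of a state machine written as a switch inside a loop. The states and the word-counting task are chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum state { BETWEEN_WORDS, IN_WORD };

/* Counts space-separated words in s with a two-state machine. */
size_t count_words(const char *s)
{
    enum state st = BETWEEN_WORDS;
    size_t words = 0;

    assert(s != NULL);
    while (*s != '\0') {
        switch (st) {       /* every character dispatches on st again */
        case BETWEEN_WORDS:
            if (*s != ' ') {
                words++;
                st = IN_WORD;
            }
            break;
        case IN_WORD:
            if (*s == ' ')
                st = BETWEEN_WORDS;
            break;
        }
        s++;
    }
    return words;
}
```

Each transition goes back through the loop test and the switch dispatch instead of jumping straight to the next state's code, which is the overhead a scoped "goto case" would be meant to remove.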

Think about it. This list is a set of *REAL* programming language
improvements, that are not really duplicated by any other language
(well ok, except for CPP improvements, which are duplicated with LISP
lambdas and MASM's preprocessor, or m4 or whatever). If you add these
things I don't see how anyone could *deny* that you were moving the
language forward in a significant way without just trying to be "me
too" with other languages.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #39

Paul Hsieh

ja***@jacob.remcomp.fr says...

Paul Hsieh wrote:
Important is that we have in the standard language a way of using
length prefixed strings in the same way as now we have zero terminated
strings.
No. Zero terminated strings are the whole problem in the first place.
This is where both buffer overflows and retrograde performance comes
from. The whole CLIB style of '\0' terminated strings forces the
programmer to think in terms of implementation and constantly respin
solutions to common problems. Every other modern language in
existence basically lets the programmer think about strings
just as strings. So what we want is something closer to how other
programming languages deal with strings (like Python, Perl, Java,
Pascal, etc), while supersetting the CLIB char * nonsense. Guess what
-- bstrlib *does* this.

By giving the programmer the choice of zero terminated strings
OR length prefixed strings the language would retain compatibility
and at the same time allow the development of more robust
applications.

Bstrlib is directly compatible with '\0' terminated strings (it appends
'\0's automatically.) Bstrlib doesn't make it an either-or choice
-- it supports both *simultaneously* while internally only using the length
based mechanism (since it's faster, safer and more functional.) Bstrlib is a
*NO COMPROMISE* solution -- it's truly portable, totally compatible with char *,
it outperforms everything (except Vstr), it's safer than everything, and it's
more functional than everything (except for missing Unicode support). Go
download it and study it. You're not going to be able to just poke holes in it
especially if you don't actually get it and understand how it works.

Operator overloading makes it possible to write such libraries without
too much pain.

Ok, this is a different thing, and it solves a different problem.
This is basically a focus on syntactical concerns more than anything
else [...]

[...] but syntax sugar *is* important to ease the use of the new data
type. If not, we would all program in assembler.

We have C++ for this. (Fortunately Bstrlib has a C++ API for people who want
this!)

As I have posted before, it would make more sense if there were
*additional* operators (using multiple operator characters and some
previously unused characters such as @, $ to create things that look
like: @+, $*, @==, $^^, etc) that basically had empty definitions
unless actually defined. There are other operator overloading things
such as:

X"..."
"..."X
3.75X

Where the X could be other letters. This gives programmers a way of
using operator overloading without suffering from the confusion of
using previous operators that someone reading the code can easily
mistake for having different semantics from what they think. See?
It's still explicit (something C is useful for) while providing for
more modern functionality.

This is quite correct Paul. But I have an incredibly difficult time
trying to convince people about the need of evolution in C, and by the
other answers in this thread you can immediately see that the
conservative mood (innovation is off topic here) prevents any discussion
about improvements like the one you propose. I wanted to use greek
letters for new operators like
SIGMA j=1,inf (expression in j)

Look -- it's because you are not thinking in the right terms. If you extend the
language in which it is *CLEAR* that you have truly made forward progress that
is not easily duplicated with the old standard then it will foster interest.
Using greek letters in the source code -- that's just going to create more
problems for source analysis tools and text editors.

Adding an infinite number of operators (via a grammar of operators) that can
all be defined arbitrarily -- now *THAT* would be innovation. It would deliver
something that mathematicians would be really interested in -- add *more*
operators on top of an ordinary mathematical system than can be accommodated by
just the ones available by the default language. And you don't see anything
comparable in other languages.

Just adding operator overloading alone? Nobody is going to care -- C++ already
does this. There's barely a person anywhere who has access to a C compiler
that isn't also capable of C++.

You want your audience to be captivated with the thought that they can *ONLY*
accomplish certain programming tasks in your language. There's so much room
left in the C language unexplored and unaddressed.

The standards committee refuses any change, since change should not exist
in C. Change and improvements are only allowed in C++. The committee
has refused until now to correct even outright *bugs* like the buffer
overflow specified in the asctime function. I pointed to that bug, and
was told that the committee rejected any correction in 2001.
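
For reference, the overflow in question comes from the sample asctime implementation printed in the standard, which uses sprintf() to fill a 26-byte static buffer and so overruns it once the year needs more than four digits. A minimal sketch of a bounded alternative built on strftime() follows; the name format_time_bounded is made up for illustration and is not part of any standard or of Lcc-win32.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not a standard function): writes an asctime-style
   line into buf without ever storing more than bufsize bytes.
   Returns the number of characters written (excluding the NUL),
   or 0 if the result would not fit. */
size_t format_time_bounded(const struct tm *tp, char *buf, size_t bufsize)
{
    /* %e is the space-padded day of month (C99), which lines up with
       the "%3d" field of the standard's sample asctime. */
    return strftime(buf, bufsize, "%a %b %e %H:%M:%S %Y\n", tp);
}
```

With a 26-byte buffer a five- or six-digit year simply makes the call return 0 instead of writing past the end.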

You don't get it. The C standards committee is dead. They committed suicide
with the C99 standard. Even if they *were* to listen to you or me, it would
not matter. They will not issue any more C standards -- they *CAN'T*. Since
C99 has been such an abysmal failure, they have nothing to build upon anymore --
their credibility is completely gone. That *DE FACTO* standard is no longer in
their hands. It's in *YOURS*.

It's over for them. The future of C can *ONLY* come from people like you. This
is why I am so frustrated that you (and Walter, the D guy) are not leveraging
this opportunity correctly.

Jacob, people like you and the guy that is making "D" (Walter B---?)
make me sad. You both seem overly concerned with such
ridiculously superficial aspects of the language -- yet the two of
you are the only ones putting your money where your mouth is and
converting your capability to make *compilers* to demonstrate the
possibility of evolution of C. Yet you haven't figured out that you
haven't sold your ideas to any significant population of programmers.

I spoke with Walter about D. He has overall good ideas, and his
language/compiler *is* an improvement. The problem with it is that it is
object oriented, i.e. it provides support for a specific way of
organizing data. I think that C should be paradigm neutral, without
forcing *any* preconceived schema down the user's throat.

Monolithic ivory towers ... who uses D?

There are ideas in programming languages, especially the C-class of
programming languages that I *desperately* want to see:

1. A preprocessor with "Turing complete" (or close enough) power.
The point is that the LISP people continue to laugh at C
programmers who have no code generation or "lambda"
programming ability.

Can you give an example?
You mean anonymous code blocks?

I mean integers, arithmetic and control structures that execute at the
*PREPROCESSOR* level. MASM has had this since its inception and it's a very
powerful technique for building massive amounts of code or tables based on any
sort of algorithm you like. Something like this for example:

#define glue3(x,y,z) x ## y ## z
#TEXTEQUATE FNLIST =
#FOR IDX0 in (0,1,2,3,4)
# FOR IDX1 in #RANGE(0,3)
void glue3(output,IDX0,IDX1) (void) {
printf (#IDX0 #IDX1 "\n");
}
# IF #LENGTH(FNLIST) > 0
# TEXTEQUATE FNLIST = FNLIST ,
# ENDIF
# TEXTEQUATE FNLIST = FNLIST glue3(output,IDX0,IDX1)
# ENDFOR
#ENDFOR
void (*fns[5][4])(void) = { FNLIST };
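
For comparison, the closest thing standard C offers is the "X-macro" idiom, where the index lists still have to be spelled out by hand rather than generated by a loop. A rough sketch that produces the same twenty functions and the same table as the hypothetical directives above (FOR_ROW, FOR_ALL, DEFINE_FN and TABLE_ENTRY are names invented here):

```c
#include <stdio.h>

/* Index lists written out by hand: standard cpp has no loops or counters. */
#define FOR_ROW(X, i) X(i, 0) X(i, 1) X(i, 2) X(i, 3)
#define FOR_ALL(X) \
    FOR_ROW(X, 0) FOR_ROW(X, 1) FOR_ROW(X, 2) FOR_ROW(X, 3) FOR_ROW(X, 4)

/* Pass 1: define output00() ... output43(), each printing its own indices. */
#define DEFINE_FN(i, j) \
    static void output##i##j(void) { printf(#i #j "\n"); }
FOR_ALL(DEFINE_FN)

/* Pass 2: list the same functions, in the same order, as table entries. */
#define TABLE_ENTRY(i, j) output##i##j,
static void (*fns[5][4])(void) = { FOR_ALL(TABLE_ENTRY) };
```

Calling fns[2][3]() prints 23. The hand-maintained FOR_ROW and FOR_ALL lists are exactly the part that a looping preprocessor would make unnecessary.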

a) Compile time type assertions to allow for type safe macros.

typeof() ?

Yeah. You need this to go with the functionality I am talking about above.
But really you only need something like assertTypeEqual(*,*) or
assertTypeCompatible(*,*) something like that.
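
Something close to assertTypeEqual can be sketched with two features that were later standardized in C11, _Generic and _Static_assert. The macro names below are invented for illustration, and the check compares an expression against a named type rather than two arbitrary expressions against each other.

```c
/* Requires a C11 compiler. ASSERT_TYPE_IS and SWAP_INT are hypothetical names. */

/* Fails to compile unless expr has type T (qualifiers on an lvalue
   are ignored, because _Generic looks at the converted value). */
#define ASSERT_TYPE_IS(expr, T) \
    _Static_assert(_Generic((expr), T: 1, default: 0), \
                   "argument does not have type " #T)

/* A macro that swaps two ints and rejects anything else at compile time. */
#define SWAP_INT(a, b)              \
    do {                            \
        ASSERT_TYPE_IS(a, int);     \
        ASSERT_TYPE_IS(b, int);     \
        int swap_tmp_ = (a);        \
        (a) = (b);                  \
        (b) = swap_tmp_;            \
    } while (0)
```

Using SWAP_INT on two doubles stops the build with the message above instead of silently truncating through the int temporary.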

2. Exposing more important hardware functions such as a widening
multiply, bit scan and bit count operations. Many CPUs have
instructions for accelerating these operations, and it's
otherwise fairly straightforward to simulate these operations.
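
As a rough illustration of "fairly straightforward to simulate", here are portable C99 fallbacks for the three operations on 32-bit values. The function names are invented here and do not come from any compiler or library.

```c
#include <stdint.h>

/* Widening multiply: 32 x 32 bits giving the full 64-bit product. */
uint64_t widening_mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Bit count: number of set bits. Each pass clears the lowest set bit. */
int bit_count32(uint32_t x)
{
    int n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Bit scan: index of the lowest set bit, or -1 if x is zero. */
int bit_scan_forward32(uint32_t x)
{
    int i = 0;

    if (x == 0)
        return -1;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}
```

For example, bit_scan_forward32(0x50) returns 4 and bit_count32(0xF0F0) returns 8.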

Lcc-win32 introduces intrinsics like _overflow(), MMX intrinsics, etc.

And what use are these MMX intrinsics to compilers for the PowerPC or ARM or
Alpha processors? I'm not talking about esoteric functions that are very CPU
specific. I am talking about mechanisms that clearly the majority of modern
processors have decided to implement. Remember you are trying to establish a
*STANDARD*, not just an implementation.

The Intel C compiler's vector-loop detection mechanism is probably the best way
to incorporate SIMD instruction sets -- not exposing intrinsics (which is not
really any better than assembly.)

a) Include things like round() and trunc() as separate
functions.

They are separate functions now in C99.

I was unaware of this.

3. A more powerful and useful C library.
a) Fix all the stupid problems like strtok(), gets(), etc.

I have been saying this for quite a long time but nobody wants to
change anything.

The GNU toolchain (glibc, strictly speaking) provides strtok_r() and has a link-time
warning for use of gets(). I wouldn't say that nobody wants to change anything -- it's
just that the standards committee have their heads up their asses and have put far more weight
into the concerns of compiler vendors (who don't want customer support calls
wondering why they don't support gets() anymore) over the needs of actual
developers (who will benefit in the long run, if they are forced to not use
these stupid functions.)
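
To make the strtok() problem concrete: strtok() keeps its position in hidden static state, so a nested tokenizing loop clobbers the outer one. strtok_r() (specified by POSIX, not by ISO C) takes that state as an explicit argument. A small sketch, with count_fields as a made-up example function:

```c
/* strtok_r() is POSIX, not ISO C; request it before including any header. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <string.h>

/* Hypothetical example: counts comma-separated fields across all
   semicolon-separated records. buf is modified in place. */
int count_fields(char *buf)
{
    char *record_state;   /* position of the outer (record) scan */
    char *field_state;    /* position of the inner (field) scan */
    char *record;
    char *field;
    int total = 0;

    assert(buf != NULL);
    for (record = strtok_r(buf, ";", &record_state);
         record != NULL;
         record = strtok_r(NULL, ";", &record_state)) {
        for (field = strtok_r(record, ",", &field_state);
             field != NULL;
             field = strtok_r(NULL, ",", &field_state)) {
            total++;
        }
    }
    return total;
}
```

With plain strtok() the inner loop overwrites the saved position of the outer loop, so the outer loop typically stops after the first record.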

b) More powerful heap API (freeall(), memsize(), sub-heaps,
totalAllocCount(), allAllocsIterate() etc).

Ditto.

I have working code for all of this -- the kind of difference it makes cannot
be overstated. Leaks and memory corruptions are tracked down far more
effectively -- and the performance gain of performing freeall()'s is pretty
incredible for some applications.
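
As an illustration of the sub-heap and freeall() idea (this is not Paul's code, which is not shown in the thread; every name below is invented), a sub-heap can be as simple as a linked list of malloc() blocks that is released in one sweep:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The max_align_t member (C11)
   keeps the user memory that follows suitably aligned for any object. */
typedef union SubHeapHeader {
    union SubHeapHeader *next;
    max_align_t align;
} SubHeapHeader;

typedef struct SubHeap {
    SubHeapHeader *head;   /* most recent allocation, or NULL */
    size_t alloc_count;    /* number of live allocations */
} SubHeap;

/* Allocates size bytes owned by heap; returns NULL on failure. */
void *subheap_alloc(SubHeap *heap, size_t size)
{
    SubHeapHeader *hdr;

    if (size > SIZE_MAX - sizeof *hdr)
        return NULL;                    /* size + header would overflow */
    hdr = malloc(sizeof *hdr + size);
    if (hdr == NULL)
        return NULL;
    hdr->next = heap->head;
    heap->head = hdr;
    heap->alloc_count++;
    return hdr + 1;                     /* user memory starts after header */
}

/* Releases every allocation made from heap in one pass. */
void subheap_freeall(SubHeap *heap)
{
    SubHeapHeader *hdr = heap->head;

    while (hdr != NULL) {
        SubHeapHeader *next = hdr->next;
        free(hdr);
        hdr = next;
    }
    heap->head = NULL;
    heap->alloc_count = 0;
}
```

Here alloc_count plays the role of totalAllocCount(), and walking the list from head would give something like allAllocsIterate().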

7. Create an API for making "virtual file" objects. I.e., memory
files, network streams, algorithmic fractal strings, etc. could
be fopen(*,"r")'ed, fread()'ed, etc.

These things should be done in a library. A low level language can be
improved with powerful libraries.

No. Some modules/libraries have been precanned to use FILE I/O as their only
interface (the official JPEG source library comes to mind.) I don't *WANT* to
change the JPEG library code -- I want to change the behaviour of file
functions underneath it, so I can feed it JPEG data from sources other than
files without changing any of their code (I don't want to maintain changes from
having hacked in everywhere where they have used a file function as the JPEG
group updates their code).
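
One partial answer to this wish is fmemopen(), a glibc extension that was standardized in POSIX.1-2008 (it is not part of ISO C); it wraps a memory buffer in a FILE *. A sketch, where count_bytes stands in for library code that only accepts a FILE * (both function names are invented here):

```c
/* fmemopen() is POSIX.1-2008, not ISO C; request it before any header. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for third-party code that can only read from a FILE *. */
long count_bytes(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (fgetc(fp) != EOF)
        n++;
    return n;
}

/* Feeds an in-memory buffer to count_bytes() without touching its code.
   Returns -1 if the stream cannot be created. */
long count_bytes_in_memory(const unsigned char *data, size_t size)
{
    /* The cast drops const; the stream is opened read-only ("r"). */
    FILE *fp = fmemopen((void *)data, size, "r");
    long n;

    if (fp == NULL)
        return -1;
    n = count_bytes(fp);
    fclose(fp);
    return n;
}
```

Arbitrary sources such as network streams need callback-based extensions like glibc's fopencookie() or BSD's funopen(), which are not portable either.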

Think about it. This list is a set of *REAL* programming language
improvements, that are not really duplicated by any other language
(well ok, except for CPP improvements, which are duplicated with LISP
lambdas and MASM's preprocessor, or m4 or whatever). If you add these
things I don't see how anyone could *deny* that you were moving the
language forward in a significant way without just trying to be "me
too" with other languages.

Well "me too" is not intrinsically bad. Better to improve C than leave
it as it is now.

C++ already does this. Look, small incremental changes to languages have
always been rejected by the programming community. This is why Scheme has no
more adoption than Lisp, it's why Objective-C never really took off, that's why
nobody cares about C99.

C is becoming obsolete, as FORTRAN did. Of course there are still places
where FORTRAN is good today, and it is still used.

Yes, thanks to the C standards committee! Look, we all have the C89 standard,
and compiler vendors aren't really moving from it. The C standard committee
has had nothing to offer us even with the myriad of remaining problems that are
in the language. Under the current standard, yes, C *will* become the next
COBOL. For the simple reason that no effort is being made to revitalize it.

The one advantage the C std committee has over you (and Walter) is that they
*DISCUSSED* the proposed changes in a large group (with their retrograde
agenda). The C++ committee, of course, has the same advantage. If you simply
stay in your cubbyhole making your own changes to LCC-Win32, you're not going
to get widespread acceptance of what you are doing. In the end your only
customer will be yourself. You need to put together an "innovators committee"
of your own. Do it and drive the final nail in the coffin of the C std
committee. Don't do it and we might as well write the eulogy for the C
language itself.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #40

beliavsky

jacob navia <ja***@jacob.remcomp.fr> wrote in message news:<ch**********@news-reader2.wanadoo.fr>...

C is becoming obsolete, as FORTRAN did. Of course there are still places
where FORTRAN is good today, and it is still used.

Since Fortran has not been spelled with all caps as of the 1990
standard, you probably don't know much about the current features and
usage of Fortran. People are writing NEW code in Fortran 90 and 95 --
browse comp.lang.fortran. If Fortran were dead, there would not be
about 10 Fortran 95 compiler vendors, including relatively recent
entries like Intel and Pathscale (see
http://www.dmoz.org/Computers/Progra...ran/Compilers/
).

Fortran market share would be higher if the 1990 standard, which fixed
most of Fortran's deficiencies, had appeared sooner. I agree that even
an old language can evolve, but for C that role may be filled by C++.

Nov 14 '05 #41

Dan Pop

In <30**************************@posting.google.com > be*******@aol.com writes:

jacob navia <ja***@jacob.remcomp.fr> wrote in message news:<ch**********@news-reader2.wanadoo.fr>...
C is becoming obsolete, as FORTRAN did. Of course there are still places
where FORTRAN is good today, and it is still used.

Since Fortran has not been spelled with all caps as of the 1990
standard, you probably don't know much about the current features and
usage of Fortran. People are writing NEW code in Fortran 90 and 95 --
browse comp.lang.fortran. If Fortran were dead, there would not be

FORTRAN and Fortran are two fairly different programming languages.
Most FORTRAN features still supported by Fortran are considered either
obsolete or obsolescent: properly written Fortran code hardly resembles
FORTRAN.

about 10 Fortran 95 compiler vendors,

Which strongly suggests that Fortran is no longer the mainstream
programming language FORTRAN used to be until the late seventies, when no
general purpose computer architecture would have been viable without a
FORTRAN compiler.

Fortran market share would be higher if the 1990 standard, which fixed
most of Fortran's deficiencies, had appeared sooner.

The 1990 standard defined a different programming language, that supported
the old FORTRAN features mostly for backward compatibility purposes.
Someone even defined and implemented F, which is F90 with no backward
compatibility features.

I agree that even
an old language can evolve, but for C that role may be filled by C++.

C's evolution seems to have taken a different path than C++'s.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #42

beliavsky

Da*****@cern.ch (Dan Pop) wrote:

In <30**************************@posting.google.com > be*******@aol.com writes:
jacob navia <ja***@jacob.remcomp.fr> wrote in message news:<ch**********@news-reader2.wanadoo.fr>...
C is becoming obsolete, as FORTRAN did. Of course there are still places where FORTRAN is good today, and it is still used.

Since Fortran has not been spelled with all caps as of the 1990
standard, you probably don't know much about the current features and
usage of Fortran. People are writing NEW code in Fortran 90 and 95 --
browse comp.lang.fortran. If Fortran were dead, there would not be

FORTRAN and Fortran are two fairly different programming languages.
Most FORTRAN features still supported by Fortran are considered either
obsolete or obsolescent: properly written Fortran code hardly resembles
FORTRAN.

Few, not "most" features have been deleted or declared obsolescent. In the
700-page Fortran 95 handbook, the discussion of deleted and obsolescent features
takes about 5 pages.

Free-format Fortran code does look different from the fixed format of Fortran
77 and earlier versions, but that does not make it a new language.

about 10 Fortran 95 compiler vendors,

Which strongly suggests that Fortran is no longer the mainstream
programming language FORTRAN used to be until the late seventies, when no
general purpose computer architecture would have been viable without a
FORTRAN compiler.

Can you name a general-purpose hardware/OS platform without a Fortran compiler?
How many vendors have implemented full C99 compilers?

I know of only one textbook that covers C99, Stephen Prata's C Primer Plus
4th ed. At present neither Fortran 95 nor C99 appear to be mainstream languages.
Of course, C89 is a very different story.

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

Nov 14 '05 #43

jacob navia

In any case, Fortran people accept operator overloading,
which the C crowd here seems to abhor because of "heresy" :-)

Nov 14 '05 #44

Dan Pop

In <ch**********@news-reader2.wanadoo.fr> jacob navia <ja***@jacob.remcomp.fr> writes:

In any case, Fortran people accept operator overloading,
what the C crowd here seems abhor because of "heresy" :-)

That's simply because operator overloading has been a standard Fortran
feature for the last 13 years, but it has never been a standard C feature.

Then again, you're too much of an idiot to be able to understand such
arguments...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #45

Dan Pop

In <41********@127.0.0.1> "be*******@aol.com" <be*******@127.0.0.1:7501> writes:

Da*****@cern.ch (Dan Pop) wrote:
In <30**************************@posting.google.com > be*******@aol.com writes:
jacob navia <ja***@jacob.remcomp.fr> wrote in message news:<ch**********@news-reader2.wanadoo.fr>...

C is becoming obsolete, as FORTRAN did. Of course there are still places where FORTRAN is good today, and it is still used.

Since Fortran has not been spelled with all caps as of the 1990
standard, you probably don't know much about the current features and
usage of Fortran. People are writing NEW code in Fortran 90 and 95 --
browse comp.lang.fortran. If Fortran were dead, there would not be

FORTRAN and Fortran are two fairly different programming languages.
Most FORTRAN features still supported by Fortran are considered either
obsolete or obsolescent: properly written Fortran code hardly resembles
FORTRAN.

Few, not "most" features have been deleted or declared obsolescent. In the

Nevertheless, most of the surviving ones *are* obsolescent. Again, have
a look at F, which is F90 without backward compatibility features.

700-page Fortran 95 handbook, the discussion of deleted and obsolescent features
takes about 5 pages.

So what?

Free-format Fortran code does look different from the fixed format of Fortran
77 and earlier versions, but that does not make it a new language.

I was, obviously, not talking about the free format vs fixed format
differences. Almost every FORTRAN feature has been replaced in F90, even
if the old form is still supported for backward compatibility. It is
perfectly possible (and even recommended) to write code not using the
old forms. Such code would be complete gibberish to a F77 programmer who
has never read a F90 book. Hence my claim that F90 is a different
programming language.

about 10 Fortran 95 compiler vendors,

Which strongly suggests that Fortran is no longer the mainstream
programming language FORTRAN used to be until the late seventies, when no
general purpose computer architecture would have been viable without a
FORTRAN compiler.

Can you name a general-purpose hardware/OS platform without a Fortran compiler?

PalmOS on all supported hardware. Plenty of C implementations for it.
BASIC and Java, too.

Can you name a GNU F90 or later compiler? They even implemented Ada95...

How many vendors have implemented full C99 compilers?

Who cares? The C language in use today is certainly not C99.

I know of only one textbook that covers C99, Stephen Prata's C Primer Plus
4th ed. At present neither Fortran 95 nor C99 appear to be mainstream languages.

What is the mainstream Fortran flavour today? I have a sneaky suspicion
that it's still something like F77 plus the VAX FORTRAN extensions and
Cray pointers...

Note that F95 is marginally different from F90 (mostly bug fixes, rather
than new features) and most F90 vendors started to support F95 long ago.
So, comparing F95 with C99 is bullshit.

Of course, C89 is a very different story.

For all *practical* intents and purposes, C89 is the current definition of
the C programming language. If you want to compare your favourite Fortran
flavour with C, you have to compare it with C89 (which is only two years
older than F90).

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #46

beliavsky

Da*****@cern.ch (Dan Pop) wrote:

Can you name a GNU F90 or later compiler? They even implemented Ada95...

The G95 project at http://www.g95.org/ is well along and compiles many large
Fortran 95 codes. According to its author Andy Vaught, its speed is now competitive
with commercial compilers on some programs. The gfortran project at http://gcc.gnu.org/fortran/ ,
which forked from g95, is at an earlier stage. I believe both use gcc
as a back-end and should eventually be usable wherever gcc is.

I have probably worn out my welcome in comp.lang.c debating Fortran. If you
want to continue the thread let's do it in comp.lang.fortran .


Nov 14 '05 #47

Kenny McCormack

In article <ch***********@sunnews.cern.ch>, Dan Pop <Da*****@cern.ch> wrote:
....

For all *practical* intents and purposes, C89 is the current definition of
the C programming language.

"practical" is OT here. I thought you understood that by now.

And, lest you think I'm just throwing stones, understand that I agree with
you - I think C89 *is* what we should understand as being "C". But some
here persist in thinking that C99 is actually the current standard.

As they say, there are standards and there are standards ...

Nov 14 '05 #48

Dan Pop

In <41********@127.0.0.1> "be*******@aol.com" <be*******@127.0.0.1:7501> writes:

Da*****@cern.ch (Dan Pop) wrote:
Can you name a GNU F90 or later compiler? They even implemented Ada95...

The G95 project at http://www.g95.org/ is well along and compiles many large

According to the web page, it's not a conforming F95 compiler, it's
merely a project. The list of stuff that happens to compile is entirely
irrelevant.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #49

Dan Pop

In <ch**********@yin.interaccess.com> ga*****@yin.interaccess.com (Kenny McCormack) writes:

In article <ch***********@sunnews.cern.ch>, Dan Pop <Da*****@cern.ch> wrote:
...
For all *practical* intents and purposes, C89 is the current definition of
the C programming language.

"practical" is OT here. I thought you understood that by now.

:-)

And, lest you think I'm just throwing stones, understand that I agree with
you - I think C89 *is* what we should understand as being "C". But some
here persist in thinking that C99 is actually the current standard.

They are right: C99 *is* the current standard, this is why I prefaced my
statement above by "For all *practical* intents and purposes", with an
emphasis on "practical". The value of a standard ignored by its intended
audience at large is purely academic.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #50
