Bug/Gross InEfficiency in HeathField's fgetline program

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if(t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while(*u)
        {
            if(*u == '.')
            {
                *u = '_';
            }
            ++u;
        }
    }
    return t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
    char *t, *u;
    if(t=u=malloc(strlen(s)+1))
        while(*u++=(*s=='.' ? s++, '_' : *s++));
    return t;
}
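
For comparison, a more conventional spelling of the same two-pass idea (one
strlen() for the allocation, then a single copy-and-translate loop). This is
only a sketch assuming the same semantics as the original; the name
dot_to_underscore2 is purely illustrative.

#include <stdlib.h>
#include <string.h>

/* Sketch: copy and translate in one pass after the strlen() for malloc(). */
char *dot_to_underscore2(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t != NULL)
    {
        char *u = t;
        do
        {
            *u++ = (*s == '.') ? '_' : *s;  /* translate dots as we copy */
        } while (*s++ != '\0');             /* copies the terminator too */
    }
    return t;
}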

Oct 7 '07
Richard Heathfield wrote:
Tor Rustad said:
[...]
>What has assets of national importance, in common with *apples*???

Okay, let's try something oh so very different.
Why not try something computer related?
>I'd be very surprised if UK or US security professionals these days
will hire people with such a complete lack of understanding of basic
security principles.

You have not demonstrated such a lack in your correspondents.
Amazing. I was *not* hiring someone to protect *apples* or the *crown*,
nor looking for someone clueless about security, unable to identify *common* errors.

For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."
The strncpy function does a simple task reasonably well. Yes, we all know
it has a lousy name, but apart from that it's a simple function, easy to
use properly. Yes, it's easy to use improperly too, but then so are lots
of C functions.
The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.

[1] NIST Special Publication 800-27, "Engineering Principles for
Information Technology Security".

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/
Oct 18 '07 #151
Malcolm McLean wrote:
"Tor Rustad" <to********@hotmail.comwrote in message
>In science, making statements that cannot be falsified has *no
value*. So, how exactly can your "no usage pitfalls with strncpy" be
falsified by measurement?
A hypothesis which is empirical in nature is stronger if it survives an
attempt at falsification. That's not the same thing as saying that every
statement in science must be falsifiable, or it has no value. A theorem,
as opposed to theory, cannot be falsified by experiment, for example,
but theorems are very useful to scientists.
I didn't say the only way to falsify statements was by experiment. In
mathematics, we usually falsify incorrect conclusions or theorems by
logic. However, I do admit that definitions and axioms are useful, but
can't be falsified.

We have lots of data now on strncpy usage, so the fact that Richard
didn't clarify the conditions under which I could falsify his "no usage
pitfalls with strncpy" shows he was just word twisting.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/
Oct 18 '07 #152
Richard Heathfield wrote:
Tor Rustad said:
>>I'd be very surprised if UK or US security professionals these days
will hire people with such a complete lack of understanding of basic
security principles.

You have not demonstrated such a lack in your correspondents. I'm not quite
sure what you /have/ demonstrated (other than, perhaps, your own lack of
understanding of analogies).
Richard, Tor is by his own words in charge of recruitment. Wherever
I looked, such people are called "managers". Managers do not have to
demonstrate anything. They are omniscient by definition. You have lost
before you even knew there was a contest.
Oct 18 '07 #153
Peter Pichler wrote:
>
Richard, Tor is by his own words in charge of recruitment. Wherever
I looked, such people are called "managers".
FYI, people other than managers can disqualify candidates from being hired,
especially during the technical session of the interview.

--
Tor <torust [at] online [dot] no>
Oct 18 '07 #154
On Wed, 17 Oct 2007 17:00:13 +0200, ¬a\/b wrote:
/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
/* assume a pointer has the same size of an int and one unsigned */
/* return 1 if error or overlap 0 otherwise */
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0 || n<0 || m<0) return 0;
^^^^^^^^^^^^^

return 1
/* no array can have the address 0
if( ((int)s)>=0 &&((int)(s+n-1))<= 0) return 1;
if( ((int)t)>=0 &&((int)(t+n-1))<= 0) return 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if( ((int)t)>=0 &&((int)(t+m-1))<= 0) return 1;
if( ((int)s)<=0 &&((int)(s+n-1))>= 0) return 1;
if( ((int)t)<=0 &&((int)(t+n-1))>= 0) return 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if( ((int)t)<=0 &&((int)(t+m-1))>= 0) return 1;
/* s----- t------ || t----- s------*/
if( (unsigned)(s+n-1) < (unsigned) t) return 0;
if( (unsigned)(t+m-1) < (unsigned) s) return 0;
return 1;
}

not tested
how many errors do you see?
Oct 18 '07 #155
¬a\/b wrote:
On Wed, 17 Oct 2007 17:00:13 +0200, ¬a\/b wrote:
> /* test: if n bytes starting at s
// overlaps n bytes starting at t
// overlaps m bytes starting at t

This is why I prefer to put each parameter on a separate line, and to
add a short comment describing that parameter on the same line. That
makes it easier to keep the comments consistent with the parameters.
Oct 18 '07 #156
Richard Heathfield wrote:
Tor Rustad said:
(b) if you think I'm clueless, why bother to continue this discussion?
So, you wouldn't have replied then, given the same conditions.

if ( tor_eval(richard) == clueless )
goto no_reply;

(c) what if I think /you/ are clueless, unable to recognise *common* sense?
This didn't make any sense, you did reply, didn't you?

Either "I think /you/ are clueless" is *false*, or you are posting
nonsense. You cannot have it both ways:

if ( richard_eval(tor) == clueless )
goto post_reply;

:)
>For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."

In my experience, the following bug is far more common than strncpy:

char *t;

strcpy(t, s);

The bug here is in failing to allocate *any storage at all* for t.
Such an error is detectable by static checking tools; if the compiler
doesn't catch it, lint tools like e.g. splint do.

$ cat -n main.c
1 #include <stdio.h>
2 #include <string.h>
3
4 int main(void)
5 {
6
7 const char *s = "Hello";
8 char *t;
9 char d[5];
10
11 /* BUG 1 - storage not allocated */
12 (void)strcpy(t, s);
13 (void)printf("'%s'\n", t);
14
15 /* BUG 2 - destination buffer not null terminated */
16 strncpy(d, s, sizeof d);
17 (void)printf("'%.*s'\n", (int)sizeof d, d);
18
19 /* BUG 3 - destination buffer truncated */
20 strncpy(d, s, sizeof d - 1);
21 (void)printf("'%.*s'\n", (int)sizeof d, d);
22
23 return 0;
24 }
$ gcc -ansi -pedantic -W -Wall main.c
$ splint main.c
Splint 3.1.1 --- 20 Jun 2006

main.c: (in function main)
main.c:12:15: Unallocated storage t passed as out parameter to strcpy: t
An rvalue is used that may not be initialized to a value on some
execution
path. (Use -usedef to inhibit warning)

Finished checking --- 1 code warning
$

Note that the bugs of class 2 and 3 give no warning above.
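
For reference, the usual workaround for bug class 2, assuming silent
truncation of s is acceptable in context (a sketch only):

/* copy at most sizeof d - 1 characters ...            */
strncpy(d, s, sizeof d - 1);
/* ... and terminate the destination buffer explicitly */
d[sizeof d - 1] = '\0';
(void)printf("'%s'\n", d);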

>The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.

So ban pointers. They cause far more trouble than strncpy.
To be honest, in safety-critical or security-critical software, I can't
ever remember being hit by buffer overflow or unallocated storage, in
production. Before someone is allowed to do the real thing, they need to
master the basics.

However, I have looked at alternatives, for example cyclone:

http://www.cs.umd.edu/~mwh/papers/cyclone-cuj.pdf
FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired by
witless buffoons because they have good hair and a good CV.
The main risk where I work, is rather the "clever" insiders.

--
Tor <torust [at] online [dot] no>
Oct 18 '07 #157
Tor Rustad said:
Richard Heathfield wrote:
>Tor Rustad said:
>(b) if you think I'm clueless, why bother to continue this discussion?

So, you wouldn't have replied then, given the same conditions.

if ( tor_eval(richard) == clueless )
goto no_reply;

>(c) what if I think /you/ are clueless, unable to recognise *common*
sense?

This didn't make any sense, you did reply, didn't you?
Look up "if".
Either "I think /you/ are clueless" is *false*, or you are posting
nonsense. You cannot have it both ways:

if ( richard_eval(tor) == clueless )
goto post_reply;
See? You do understand about "if".
>>For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors
such as buffer overflows, race conditions, format string errors,
failing to check input for validity, and programs being given excessive
privileges. Learning from the past will improve future results."

In my experience, the following bug is far more common than strncpy:

char *t;

strcpy(t, s);

The bug here is in failing to allocate *any storage at all* for t.

Such an error is detectable by static checking tools; if the compiler
doesn't catch it, lint tools like e.g. splint do.
Try it on this:

/* foo.h */
#ifndef H_FOO_H
#define H_FOO_H 1
void build_foo(char *foo, int bar, char *baz);
#endif

/* foo.c */
#include <string.h>
#include "foo.h"

void build_foo(char *foo, int bar, char *baz)
{
sprintf(foo, "<h%d>%s</h%d>", bar, baz);
}

What warnings do you get? Compilation need not be done all at one time.

>>The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.

So ban pointers. They cause far more trouble than strncpy.

To be honest, in safety-critical or security-critical software, I can't
ever remember being hit by buffer overflow or unallocated storage, in
production. Before someone is allowed to do the real thing, they need to
master the basics.
Fine, so you agree that the best solution is to hire good people and make
sure they know what they're doing?
>FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired
by witless buffoons because they have good hair and a good CV.

The main risk where I work, is rather the "clever" insiders.
What do you mean by "clever"? Where I come from, it means "smart, bright,
intelligent" and is considered a compliment. If you mean someone who tries
to write difficult code to show off how well he can write difficult code
(instead of writing easy code that anyone could maintain), I wouldn't call
that "clever".

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Oct 19 '07 #158
On Thu, 18 Oct 2007 08:07:58 +0200, ¬a\/b wrote:
>On Wed, 17 Oct 2007 17:00:13 +0200, ¬a\/b wrote:
> /* test: if n bytes starting at s
// overlaps n bytes starting at t
// overlaps m bytes starting at t

> */
>>/* assume a pointer has the same size of an int and one unsigned */
/* return 1 if error or overlap 0 otherwise */
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0 || n<0 || m<0) return 0;
^^^^^^^^^^^^^

return 1
> /* no array can have the address 0
if( ((int)s)>=0 &&((int)(s+n-1))<= 0) return 1;
>if( ((int)t)>=0 &&((int)(t+n-1))<= 0) return 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^

if( ((int)t)>=0 &&((int)(t+m-1))<= 0) return 1;
>if( ((int)s)<=0 &&((int)(s+n-1))>= 0) return 1;
if( ((int)t)<=0 &&((int)(t+n-1))>= 0) return 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^

if( ((int)t)<=0 &&((int)(t+m-1))>= 0) return 1;
there is another error
the 'singular' points are two: 0 and INT_MAX
--0 == -1, ++INT_MAX != 0
> /* s----- t------ || t----- s------*/
if( (unsigned)(s+n-1) < (unsigned) t) return 0;
if( (unsigned)(t+m-1) < (unsigned) s) return 0;
return 1;
}

not tested
how many errors do you see?
Oct 19 '07 #159
On Fri, 19 Oct 2007 08:49:43 +0200, ¬a\/b wrote:

/* test: if n bytes starting at s
// overlaps m bytes starting at t
*/

#define uns unsigned

/* assume a pointer has the same size of one unsigned */
/* return 1 if error or overlap 0 otherwise
if you like uns == size_t
*/
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0) return 1;

/* s and t can not go through the 0 address */
/* s----- t------- */
if((uns)(s+n-1) < (uns) s) return 1;
if((uns)(t+m-1) < (uns) t) return 1;
/* overlap */
/* s----- t------ || t----- s------*/
if( (uns)(s+n-1) < (uns) t) return 0;
if( (uns)(t+m-1) < (uns) s) return 0;

return 1;
}

Oct 19 '07 #160
On Fri, 19 Oct 2007 09:19:27 +0200, ¬a\/b wrote:
>On Fri, 19 Oct 2007 08:49:43 +0200, ¬a\/b wrote:

/* test: if n bytes starting at s
// overlaps m bytes starting at t
*/

#define uns unsigned

/* assume a pointer has the same size of one unsigned */
/* return 1 if error or overlap 0 otherwise
if you like uns == size_t
*/
int mem_overlap123(char* s, int n, char* t, int m)
int mem_overlap123(char* s, uns n, char* t, uns m)
>{if(s==0|| t==0) return 1;

/* s and t can not go through the 0 address */
/* s----- t------- */
if((uns)(s+n-1) < (uns) s) return 1;
if((uns)(t+m-1) < (uns) t) return 1;
/* overlap */
/* s----- t------ || t----- s------*/
if( (uns)(s+n-1) < (uns) t) return 0;
if( (uns)(t+m-1) < (uns) s) return 0;

return 1;
}
so nobody has any interest in doing an overlap-check function
in less than O(n+m)?
Oct 19 '07 #161
Richard Heathfield wrote:
Tor Rustad said:
....
>The main risk where I work, is rather the "clever" insiders.

What do you mean by "clever"? Where I come from, it means "smart, bright,
intelligent" and is considered a compliment.
He's not using the word with a different meaning; he's using it
sarcastically, to describe someone who's just clever enough to create
some very complicated problems, without being clever enough to avoid or
escape those problems.
Oct 19 '07 #162
¬a\/b wrote:
....
int mem_overlap123(char* s, uns n, char* t, uns m)
>{if(s==0|| t==0) return 1;

/* s and t can not go through the 0 address */
/* s----- t------- */
if((uns)(s+n-1) < (uns) s) return 1;
if((uns)(t+m-1) < (uns) t) return 1;
/* overlap */
/* s----- t------ || t----- s------*/
if( (uns)(s+n-1) < (uns) t) return 0;
if( (uns)(t+m-1) < (uns) s) return 0;

return 1;
}

so nobody has any interest in doing an overlap-check function
in less than O(n+m)?
Not unless it's portable, and I'm not aware of any portable algorithm
for checking overlap that meets that specification. I've already pointed
out why your algorithm is not portable.
Oct 19 '07 #163
"James Kuyper Jr." <ja*********@verizon.netwrites:
Richard Heathfield wrote:
>Tor Rustad said:
...
>>The main risk where I work, is rather the "clever" insiders.

What do you mean by "clever"? Where I come from, it means "smart,
bright, intelligent" and is considered a compliment.

He's not using the word with a different meaning; he's using it
sarcastically, to describe someone who's just clever enough to create
some very complicated problems, without being clever enough to avoid
or escape those problems.
It's a euphemism for "cocky smartass".
Oct 19 '07 #164
Richard Heathfield wrote:
Tor Rustad said:
>Richard Heathfield wrote:
>>Tor Rustad said:
[pissing match snipped, even if I really need to blow off more steam
after a hell of a month!]
Try it on this:

/* foo.h */
#ifndef H_FOO_H
#define H_FOO_H 1
void build_foo(char *foo, int bar, char *baz);
#endif

/* foo.c */
#include <string.h>
???

#include <stdio.h>
#include "foo.h"

void build_foo(char *foo, int bar, char *baz)
???

broken API design, remember gets()?
{
sprintf(foo, "<h%d>%s</h%d>", bar, baz);
???

sprintf(foo, "<h%d>%s</h%d>", bar, baz, bar);
}

What warnings do you get? Compilation need not be done all at one time.
I don't see the point with analyzing the broken code above.

Not all kinds of buffer overruns can be detected by static analysis, but
I have used splint to check private APIs for overruns before. The
idea in splint is to use annotations, the *requires* and *ensures*
clauses, to express pre- and postconditions, respectively.

Constraints on lvalues can be set via maxSet() and minSet(), while
constraints on rvalues can be set via maxRead() and minRead(). Those
constraints set the bounds of legal memory access.

Example:

char *my_strcpy(char *s1, const char *s2)
/*@requires maxSet(s1) >= maxRead(s2)@*/
/*@ensures maxRead(s1) == maxRead(s2)
/\ result == s1 @*/;
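
For illustration, a hypothetical caller that such annotations are meant to
catch; with bounds checking enabled, splint should reject the second call
because maxSet(small) is smaller than maxRead(src). This is a sketch, not
verified splint output:

void demo(void)
{
    char big[32];
    char small[4];
    const char *src = "longer than three chars";

    my_strcpy(big, src);    /* fine: maxSet(big) >= maxRead(src)  */
    my_strcpy(small, src);  /* should violate the requires clause */
}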

>>>The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.
So ban pointers. They cause far more trouble than strncpy.
To be honest, in safety-critical or security-critical software, I can't
ever remember being hit by buffer overflow or unallocated storage, in
production. Before someone is allowed to do the real thing, they need to
master the basics.

Fine, so you agree that the best solution is to hire good people and make
sure they know what they're doing?
There isn't a single *best solution* in security engineering, just as
there isn't in good software engineering:

if you have bad functional specs, if you screw up design, if you screw up
version control, if you screw up testing ...

the resulting software isn't trustworthy.
>>FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired
by witless buffoons because they have good hair and a good CV.
The main risk where I work, is rather the "clever" insiders.

What do you mean by "clever"? Where I come from, it means "smart, bright,
intelligent" and is considered a compliment. If you mean someone who tries
to write difficult code to show off how well he can write difficult code
(instead of writing easy code that anyone could maintain), I wouldn't call
that "clever".
A common mistake among the "clever" is that they believe they are
better than their own constraints. Not knowing your own limitations is a
major security risk IMO. I by far prefer humble smartness.

There is a second dimension to this: the vast majority of fraud is done
by insiders, IIRC ca. 60%-70%.

--
Tor <torust [at] online [dot] no>
Oct 19 '07 #165
Tor Rustad said:
I don't see the point with analyzing the broken code above.
Neither do I. In an argument about correctness, I should have taken the
time to compile the example code rather than trust my fingers to get it
right on auto-pilot. But, *had I done so*, it would have been a good
example!
Not all kinds of buffer overruns can be detected by static analysis,
Indeed. This is kind of my point, really. If you could detect all errors
automatically, you wouldn't need bright programmers. But since you can't,
you do.

<snip>
There isn't a single *best solution* in security engineering,
Agreed. Nevertheless, the best solution is to hire bright people. Bright
people should be able to work out how not to abuse strncpy, right?
A common mistake among the "clever" is that they believe they are
better than their own constraints. Not knowing your own limitations is a
major security risk IMO. I by far prefer humble smartness.
That's "clever" as in "dumb", right? Just checking.

The risk of not knowing your own limitations and weaknesses is precisely
the reason that clever people use coding conventions/standards/style
guides, and ask for code reviews by their peers. Indeed, it's why we
bother to do testing. I fail to see how this says anything about strncpy,
though.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Oct 20 '07 #166

"Richard Heathfield" <rj*@see.sig.invalidwrote in message
>
>There isn't a single *best solution* in security engineering,

Agreed. Nevertheless, the best solution is to hire bright people. Bright
people should be able to work out how not to abuse strncpy, right?
That's one of the few immutable rules of engineering. The better the people
you can hire, the less chance of mistakes.
The problem is that salaries are serious money, and even if you rack them
up, good people are not always easy to obtain. So the non-ideal reality is
that you might get a Java man, two years out of college, who did a
short course on C, fixing up your C program. So he sees a strcpy() and knows
that Microsoft has deprecated the call. Ha ha, he thinks, security loophole
here. Let's fix it up. Oh, the MS strcpy_s() doesn't seem to be available on
this installation. However, they've got an equivalent called strncpy(). Let's
just drop that in instead.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 20 '07 #167

"James Kuyper Jr." <ja*********@verizon.netwrote in message
Imposing the requirements you suggest would make efficient implementation
of C on many platforms more difficult. I've used machines where 16 bits
was the most reasonable size for 'int', which had a LOT more than 65536
bytes of memory installed. I'm certain that there will be machines in the
future (I wouldn't be surprised if they already exist) where the natural
size for an integer is 32 bits, which have far more than 4GB of memory
installed.
Here I don't agree.

Basically you are saying the paradigm

int i;

for(i=0;i<N;i++)
array[i] = x;

ought to be allowed to break down if 16 or 32 bit operations are faster on
machines with 32 or 64 bit address spaces.
I'd say that the burden should be on the micro-optimiser to say

short i; /* inner loop, use 16 bit type */

That does lead to the problem of what type to use for 32 bit register, 64
bit address space machines.

There was also the problem of the old x86 segmented machines, where arrays
of more than 64K were highly inefficient but sometimes needed. Generally
however, if you've got 64 bits in the address bus, you've got to ask what
the designer is doing by not providing an efficient 64 bit integer.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 20 '07 #168
Malcolm McLean wrote, On 20/10/07 06:56:
>
"James Kuyper Jr." <ja*********@verizon.netwrote in message
>Imposing the requirements you suggest would make efficient
implementation of C on many platforms more difficult. I've used
machines where 16 bits was the most reasonable size for 'int', which
had a LOT more than 65536 bytes of memory installed. I'm certain that
there will be machines in the future (I wouldn't be surprised if they
already exist) where the natural size for an integer is 32 bits, which
have far more than 4GB of memory installed.
Here I don't agree.

Basically you are saying the paradigm

int i;

for(i=0;i<N;i++)
array[i] = x;

ought to be allowed to break down if 16 or 32 bit operations are faster
on machines with 32 or 64 bit address spaces.
Only if N is large. Or do you really want to increase the cost of most
of the appliances in your house, the cost of your car, the cost of
planes (and thus flying) etc.
I'd say that the burden should be on the micro-optimiser to say

short i; /* inner loop, use 16 bit type */
Your suggestion would mean that a lot of embedded code would have to use
short almost exclusively instead of int. Are you going to pay for all
the rewriting?
That does lead to the problem of what type to use for 32 bit register,
64 bit address space machines.

There was also the problem of the old x86 segmented machines, where
arrays of more than 64K were highly inefficient but sometimes needed.
Generally however, if you've got 64 bits in the address bus, you've got
to ask what the designer is doing by not providing an efficient 64 bit
integer.
Perhaps the designer is saving $25 on a $100 product. High speed memory
(e.g. cache) is expensive, so however fast operations are on 64 bit
integers you can massively increase your costs, or slow things down
massively, by doubling the size of your basic integer type.

You seem to have forgotten that all this and more has already been
pointed out to you. If you think the decision is wrong start by taking
it up with Intel, AMD, the Posix standard group and MS. Most people will
have some respect for the abilities of at least one of these groups,
although which group will depend on the person.

BTW, I happen to know that there are still a number of processors with
the Z80 instruction set flying around and a number of processors early
in the 80x86 range as well. When I say flying around I mean that they
are part of avionics systems on current aircraft.
--
Flash Gordon
Oct 20 '07 #169
"Tor Rustad" <to********@hotmail.coma écrit dans le message de news:
cf*********************@telenor.com...
kuyper wrote:
>Richard wrote:
...
>>I am astonished that people claiming to be professional programmers
could be in any way "confused" or "unsure" about your concise
replacement for Heathfield's version.

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

I simply can not see the complication or the need to "analyse" this. Yes,
if we were training in lesson 2 we might expand it out a little, but
only to the extent of rewriting the ?: usage as an if-then-else.

I have interviewed several dozen people for C programming positions
over the past 15 years. I've given every single one of them a simple
program to understand and explain and suggest improvements for. The
heart of that program was the following loop:

while(*p++ = *q++);

Only about half of them could even tell me correctly what that loop
does; not a single one has ever correctly explained how it does it.
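
For anyone following along, that loop is the classic strcpy core: it copies
bytes from q to p, up to and including the terminating '\0', using the value
of the assignment as the loop condition. A long-hand sketch of the same thing:

for (;;)
{
    char c = *q;    /* fetch the next source byte               */
    *p = c;         /* store it at the destination              */
    p++;
    q++;
    if (c == '\0')  /* the copied byte (the assignment's value) */
        break;      /* ends the loop after it has been copied   */
}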

Recently I was involved in hiring a security officer with C programming
skills. Those who passed the initial screening had at least a master's
degree, but their experience varied a lot.

I ended up recommending the only one, with 0 work experience, who admitted
he didn't know C well. The seniors failed big time implementing strncpy()
on the blackboard. Very embarrassing.
You should see the horrors I get for atoi, from seniors and juniors!

--
Chqrlie.
Oct 20 '07 #170
"Richard Heathfield" <rj*@see.sig.invalida écrit dans le message de news:
qd******************************@bt.com...
Tor Rustad said:

<snip>
>I ended up recommending the only one, with 0 work experience, who
admitted he didn't know C well. The seniors failed big time
implementing strncpy() on the blackboard. Very embarrassing.


Well, I'm game. Is this a blackboard? Why, yes, it is (although it's
actually white, but never mind).

Okay, it's an interview, so I'm not allowed to look stuff up. So, off the
top of my head, strncpy copies no more than n characters from s to t,
stopping at a null terminator if present, and zero-padding t. It then
returns t. I can't actually remember whether n is size_t or int. (It ought
to be size_t, of course, but then so ought the n in fgets.) So I'll risk
embarrassment by plumping for size_t.

#include <stddef.h>

char *strncpy(char *t, const char *s, size_t n)
{
    char *u = t;
    while(n > 0 && *s != '\0')
    {
        *t++ = *s++;
        --n;
    }
    while(n-- > 0)
    {
        *t++ = '\0';
    }
    return u;
}

How did I do? Should I start blushing yet?
Why do you include <stddef.h> instead of <string.h>?

--
Chqrlie.
Oct 20 '07 #171
"Tor Rustad" <to********@hotmail.coma écrit dans le message de news:
XJ*********************@telenor.com...
Richard Heathfield wrote:
>Tor Rustad said:
>(b) if you think I'm clueless, why bother to continue this discussion?

So, you wouldn't have replied then, given the same conditions.

if ( tor_eval(richard) == clueless )
goto no_reply;

>(c) what if I think /you/ are clueless, unable to recognise *common*
sense?

This didn't make any sense, you did reply, didn't you?

Either "I think /you/ are clueless" is *false*, or you are posting
nonsense. You cannot have it both ways:

if ( richard_eval(tor) == clueless )
goto post_reply;

:)
>>For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."

In my experience, the following bug is far more common than strncpy:

char *t;

strcpy(t, s);

The bug here is in failing to allocate *any storage at all* for t.

Such an error is detectable by static checking tools; if the compiler
doesn't catch it, lint tools like e.g. splint do.

$ cat -n main.c
1 #include <stdio.h>
2 #include <string.h>
3
4 int main(void)
5 {
6
7 const char *s = "Hello";
8 char *t;
9 char d[5];
10
11 /* BUG 1 - storage not allocated */
12 (void)strcpy(t, s);
13 (void)printf("'%s'\n", t);
14
15 /* BUG 2 - destination buffer not null terminated */
16 strncpy(d, s, sizeof d);
17 (void)printf("'%.*s'\n", (int)sizeof d, d);
18
19 /* BUG 3 - destination buffer truncated */
20 strncpy(d, s, sizeof d - 1);
21 (void)printf("'%.*s'\n", (int)sizeof d, d);
22
23 return 0;
24 }
$ gcc -ansi -pedantic -W -Wall main.c
$ splint main.c
Splint 3.1.1 --- 20 Jun 2006

main.c: (in function main)
main.c:12:15: Unallocated storage t passed as out parameter to strcpy: t
An rvalue is used that may not be initialized to a value on some
execution
path. (Use -usedef to inhibit warning)

Finished checking --- 1 code warning
$

Note that the bugs of class 2 and 3 give no warning above.
Whether 2 and 3 are bugs depends on how ``d'' will be used and on the
programmer's intent regarding truncation. But I agree with you, I would
advocate warnings on all strncpy instances.

Why do (void) strcpy and not strncpy?
That is an ugly convention anyway, and splint would be silly to complain
about the return value of strcpy or strncpy not being used.

--
Chqrlie
Oct 20 '07 #172
Malcolm McLean wrote:
>
"James Kuyper Jr." <ja*********@verizon.netwrote in message
>Imposing the requirements you suggest would make efficient
implementation of C on many platforms more difficult. I've used
machines where 16 bits was the most reasonable size for 'int', which
had a LOT more than 65536 bytes of memory installed. I'm certain that
there will be machines in the future (I wouldn't be surprised if they
already exist) where the natural size for an integer is 32 bits, which
have far more than 4GB of memory installed.
Here I don't agree.

Basically you are saying the paradigm

int i;

for(i=0;i<N;i++)
array[i] = x;

ought to be allowed to break down if 16 or 32 bit operations are faster
on machines with 32 or 64 bit address spaces.
I'm confused by your example, and its supposed connection to what I
said. Without definitions for N, array, and x, I'm left to assume that
they all have reasonable definitions. As long as N <= INT_MAX, and
assuming that array is defined as having at least N elements, and 'x'
has a value that can safely be converted to the type of array[i], I see
no way to interpret what I said as endorsing failure of that loop on
such machines.

I didn't say so, but my argument does imply an endorsement of the fact
that the standard does not define the value of INT_MAX, but only sets a
lower limit on that value. Is that what you're talking about? Are you
complaining about the need to compare N with INT_MAX, instead of being
able to rely on INT_MAX being the same on all implementations?

If the language were re-defined from scratch, I'd endorse making the
size-named types which were added in C99 fundamental types, preferably
with a nicer naming convention. For such a loop counter, I'd use the
equivalent of "int_fastN_t", choosing a value for N that is as small as
possible while still being big enough to prevent the loop counter from
overflowing. In C99, that's inconveniently complicated to write, and I
would normally use 'signed char', 'int' and 'long' as rough synonyms for
'int_fast8_t', 'int_fast16_t', and 'int_fast32_t', respectively.
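
A minimal sketch of that C99 spelling, assuming <stdint.h> is available; each
int_fastN_t is whatever width the implementation considers fastest at or
above N bits:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    double array[100];
    int_fast16_t i;  /* at least 16 bits, whatever width is fastest here */

    for (i = 0; i < 100; i++)
        array[i] = 0.0;

    printf("int_fast16_t is %u bytes on this implementation\n",
           (unsigned)sizeof i);
    return 0;
}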
Oct 20 '07 #173
Richard Heathfield wrote:
Tor Rustad said:
....
>A common mistake among the "clever" is that they believe they are
better than their own constraints. Not knowing your own limitations is a
major security risk IMO. I by far prefer humble smartness.

That's "clever" as in "dumb", right? Just checking.
No, there's a key difference between the sarcastic use of "clever" and
ordinary misuse of "dumb" as an insensitive synonym for stupid.

A "clever" person is sufficiently intelligent and bright to get himself
into trouble in ways that are far too complicated for a stupid person to
duplicate. Someone who is clever, rather than "clever" is sufficiently
intelligent to avoid getting into such trouble in the first place.
Oct 20 '07 #174
Charlie Gordon said:
"Richard Heathfield" <rj*@see.sig.invalida écrit dans le message de
news: qd******************************@bt.com...
>>
<strncpy implementation>
>>
How did I do? Should I start blushing yet?

Why do you include <stddef.h> instead of <string.h>?
I needed a definition for size_t. This is defined in <stddef.h> (as well as
in other places), which is why I included it. I didn't see any particular
need to include <string.h>, since it contained nothing I remembered
needing. On reflection, it would have been useful to pick up the
<string.h> prototype of strncpy, just on the off-chance that I
misremembered the n type. (And yes, had I done so, I could have omitted
the <stddef.h> header completely.)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Oct 20 '07 #175
James Kuyper Jr. said:
Richard Heathfield wrote:
>Tor Rustad said:
...
>>A common mistake among the "clever" is that they believe they are
better than their own constraints. Not knowing your own limitations is
a major security risk IMO. I by far prefer humble smartness.

That's "clever" as in "dumb", right? Just checking.

No, there's a key difference between the sarcastic use of "clever" and
ordinary misuse of "dumb" as an insensitive synonym for stupid.
I'm looking at his usage, and he's saying that "clever" people believe they
are better than their own constraints, by which I take him to mean that
such people think that they can get away with writing code in a way that
they would criticise if other people were to write it in that way.

If I have interpreted him correctly, the behaviour he describes is properly
labelled as "stupid" or "dumb".

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Oct 20 '07 #176
Charlie Gordon wrote:
"Tor Rustad" <to********@hotmail.coma écrit dans le message de news:
[...]
Why do (void) strcpy and not strncpy ?
No reason really. The strncpy() code, was added to an existing source.
That is an ugly convention anyway, and splint would be silly to complain
about the return value of strcpy or strncpy not being used.
The cast was put there, before I knew what lint level I wanted. When the
default level was sufficient, I dropped the cast, but copied the
printf() line already there. :)

--
Tor <torust [at] online [dot] no>

"I have stopped reading Stephen King novels. Now I just read C code instead"
Oct 20 '07 #177
Richard Heathfield wrote:
Tor Rustad said:
[...]
>There isn't a single *best solution* in security engineering,

Agreed. Nevertheless, the best solution is to hire bright people.
The candidate will be evaluated on a number of different things; the
brightest may not be the best candidate for a position. I'd rather say it
this way: being bright above a certain level is a precondition, and I
have no interest in training someone clueless.

Bright people should be able to work out how not to abuse strncpy,
right?
I wouldn’t bet my life on it.

Why not use both bright people and Occam's razor?

>A common mistake among the "clever" is that they believe they are
better than their own constraints. Not knowing your own limitations
is a major security risk IMO. I by far prefer humble smartness.

That's "clever" as in "dumb", right? Just checking.
Not really, their IQ can still be very high. Some of the worst C code I
have audited has been written by "clever" people. It appears to me, the
smarter they are, the more of a mess they can create before being detected.

During audits, I have seen "prototype", "student code" or "good weather
code". Those programmers have not complied with good software
engineering principles, perhaps because they never learned them or are too
lazy. The code audit was forced upon them, likely for the first time in
their lives.
I fail to see how this says anything about strncpy, though.
I suggest you re-read my initial post in this sub-thread then, which you
replied to by posting a strncpy() implementation.

--
Tor <torust [at] online [dot] no>

"I have stopped reading Stephen King novels. Now I just read C code instead"
Oct 20 '07 #178
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
Malcolm McLean wrote, On 20/10/07 06:56:
>>
I'd say that the burden should be on the micro-optimiser to say

short i; /* inner loop, use 16 bit type */

Your suggestion would mean that a lot of embedded code would have to use
short almost exclusively instead of int. Are you going to pay for all the
rewriting?
It only applies to embedded machines where the address space is greater than
the register size. I can think of an obvious example - the typical 8 bit
scene. However here the standard has spoken, and in my favour. int must be
16 bits on such machines.

However, granted that there are some such machines - int needs replacing with
short if one of these two holds: either the extra instructions to calculate a 32
bit result produce unacceptable performance, or the space taken is
unacceptable. There will be a few cases.

However if I am to pay for the rewriting I also want the economic benefits
of decreased costs and easier integration and reuse.

My namesake, Malcom (sic) McLean introduced containerised shipping. You
would have been the first to say "but Mr McLean, not all goods fit easily
into containers. Are you going to pay for all that hold space wasted as
ships sail around with half-filled containers?". It is an inefficiency, but
actually he revolutionised the cargo transport industry, simply by
increasing the ease of handling. Every container fits every crane, every
lorry and every railway truck, because there is only one size.
Perhaps the designer is saving $25 on a $100 product. High speed memory
(e.g. cache) is expensive, so however fast operations are on 64 bit
integers you can massively increase your costs, or slow things down
massively, by doubling the size of your basic integer type.
Assuming that you have a pattern of uniform cache usage. If you use 20% of
the cache 80% of the time, you will slow things down by about 10% by halving
effective cache size. It's a cost, but not the massive hit you are
suggesting.
>
You seem to have forgotten that all this and more has already been pointed
out to you. If you think the decision is wrong start by taking it up with
Intel, AMD, the Posix standard group and MS. Most people will have some
respect for the abilities of at least one of these groups, although which
group will depend on the person.
It is no part of my case that the people who disagree with me are stupid.
>
BTW, I happen to know that there are still a number of processors with the
Z80 instruction set flying around and a number of processors early in the
80x86 range as well. When I say flying around I mean that they are part of
avionics systems on current aircraft.
As I said, Z80s have 8 bit registers but int must be 16 bits. Let's keep
that convention. The segmented 80x86, where non-address size ints do make
some sort of sense, was an atrocious design for that very reason. In
practise the problem was solved on PCs at least by extending the language.
It is now widely acknowledged that flat architectures are better.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 21 '07 #179

"James Kuyper Jr." <ja*********@verizon.netwrote in message
Malcolm McLean wrote:
>>
Basically you are saying the paradigm

int i;

for(i=0;i<N;i++)
array[i] = x;

ought to be allowed to break down if 16 or 32 bit operations are faster
on machines with 32 or 64 bit address spaces.

I'm confused by your example, and its supposed connection to what I said.
Without definitions for N, array, and x, I'm left to assume that they all
have reasonable definitions. As long as N <= INT_MAX, and assuming that
array is defined as having at least N elements, and 'x' has a value that
can safely be converted to the type of array[i], I see no way to interpret
what I said as endorsing failure of that loop on such machines.
The point is that N isn't something we can control. It is given to us as "an
integral type" that counts the number of items in the array. However we know
that N must fit in memory, or else our computer isn't powerful enough to
handle the problem.
So really N needs to be a size_t, and i needs to be a size_t as well, and
vast swathes of C code are obsolete or subtly broken. Unless you specify
that int shall be able to address any array.
>
If the language were re-defined from scratch, I'd endorse making the
size-named types which were added in C99 fundamental types, preferably
with a nicer naming convention.
Exactly. We need a size_t. But it needs to be signed - the advantages
outweigh that extra bit. And it needs a nicer name. Preferably a
three-letter one that suggests an arbitrary integer.
And we don't even need to change a word of the standard to achieve this.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 21 '07 #180
Malcolm McLean wrote, On 21/10/07 08:07:
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
>Malcolm McLean wrote, On 20/10/07 06:56:
>>>
I'd say that the burden should be on the micro-optimiser to say

short i; /* inner loop, use 16 bit type */

Your suggestion would mean that a lot of embedded code would have to
use short almost exclusively instead of int. Are you going to pay for
all the rewriting?
It only applies to embedded machines where the address space is greater
than the register size. I can think of an obvious example - the typical
8 bit scene. However here the standard has spoken, and in my favour. int
must be 16 bits on such machines.
I was not. It applies to plenty of 16 bit processors as well.
However, granted that there are some such machines - int needs replacing
with short if one of these two holds: either the extra instructions to
calculate a 32 bit result produce unacceptable performance, or the space
taken is unacceptable. There will be a few cases.
The space can be unacceptably large on the latest 64 bit computers as
has been mentioned to you already.
However if I am to pay for the rewriting I also want the economic
benefits of decreased costs and easier integration and reuse.
You can have all the benefit it would give me. I've included the exact
amount of money in this post. Yes, that is correct, there is exactly no
money in this post.
My namesake, Malcom (sic) McLean introduced containerised shipping. You
would have been the first to say "but Mr McLean, not all goods fit
easily into containers. Are you going to pay for all that hold space
wasted as ships sail around with half-filled containers?". It is an
inefficiency, but actually he revolutionised the cargo transport
industry, simply by increasing the ease of handling. Every container
fits every crane, every lorry and every railway truck, because there is
only one size.
Now try and put one of those containers in the back of a transit or on
the back of a courier's bike. In either case you will find it does not work.
>Perhaps the designer is saving $25 on a $100 product. High speed
memory (e.g. cache) is expensive, so however fast operations are on 64
bit integers you can massively increase your costs, or slow things
down massively, by doubling the size of your basic integer type.
Assuming that you have a pattern of uniform cache usage. If you use 20%
of the cache 80% of the time, you will slow things down by about 10% by
halving effective cache size. Its a cost, but not the massive hit you
are suggesting.
No, it is not a massive cost like I suggested. Sometimes it is far
larger. For example, you have to go to the next smaller gate size which
costs 4 times as much or more because it is new. Or the higher capacity
may simply not be available thus killing the project.

Just because *you* do not approach the limits does not mean that others
are not. There are plenty of situations where people are working at the
limits.

Oh, and as has also been pointed out, the increased power consumption or
heat dissipation may not be acceptable (important considerations in
*lots* of products, including notebook PCs). That reminds me, we should
set the greens on you, since you are advocating needlessly increasing
energy consumption.
>You seem to have forgotten that all this and more has already been
pointed out to you. If you think the decision is wrong start by taking
it up with Intel, AMD, the Posix standard group and MS. Most people
will have some respect for the abilities of at least one of these
groups, although which group will depend on the person.
It is no part of my case that the people who disagree with me are stupid.
Then go and try and convince them. If you succeed in convincing us it
will not change the situation, if you convince them it will.
>BTW, I happen to know that there are still a number of processors with
the Z80 instruction set flying around and a number of processors early
in the 80x86 range as well. When I say flying around I mean that they
are part of avionics systems on current aircraft.
As I said, Z80s have 8 bit registers but int must be 16 bits. Let's keep
that convention. The segmented 80x86, where non-address size ints do
make some sort of sense, was an atrocious design for that very reason.
Or the 68000, which was an excellent design with a flat memory space, a 16
bit ALU and a larger than 16 bit address space. There are plenty of other
examples.
In practise the problem was solved on PCs at least by extending the
language. It is now widely acknowledged that flat architectures are better.
You hit the same problem with flat architectures. It has not been
uncommon for address registers to be wider than the ALU.
--
Flash Gordon
Oct 21 '07 #181
Malcolm McLean wrote:
>
"James Kuyper Jr." <ja*********@verizon.netwrote in message
>Malcolm McLean wrote:
>>>
Basically you are saying the paradigm

int i;

for(i=0;i<N;i++)
array[i] = x;

ought to be allowed to break down if 16 or 32 bit operations are
faster on machines with 32 or 64 bit address spaces.

I'm confused by your example, and it's supposed connection to what I
said. Without definitions for N, array, and x, I'm left to assume that
they all have reasonable definitions. As long as N <= INT_MAX, and
assuming that array is defined as having at least N elements, and 'x'
has a value that can safely be converted to the type of array[i], I
see no way to interpret what I said as endorsing failure of that loop
on such machines.
The point is that N isn't something we can control. It is given to us as
"an integral type" that counts the number of items in the array. ...
As I said, you didn't define N, so I had no idea whether or not it was
under your control. Even if N is not under your control, whether your
code enters that loop with a dangerous value of N is under your control:

if(N > INT_MAX || N > ELEMENTS(array))
{
// Error handling
}
else
{
// Loop code
}

There are, of course, other, more elegant ways of ensuring that the loop
is not entered with a dangerous value of N (for instance, the array
could be declared with a length of N), but if your code is doing nothing
to prevent that, it's poorly designed; it wouldn't pass code review in
my shop.
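
ELEMENTS isn't defined above; presumably it's the usual element-count macro,
something like this sketch (valid only for true arrays, not pointers):

/* Assumed definition: number of elements in an array object. */
#define ELEMENTS(array) (sizeof (array) / sizeof (array)[0])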
... However
we know that N must fit in memory, or else our computer isn't powerful
enough to handle the problem.
So really N needs to be a size_t, and i needs to be a size_t as well,
and vast swathes of C code are obsolete or subtly broken. Unless you
specify that int shall be able to address any array.
....
>If the language were re-defined from scratch, I'd endorse making the
size-named types which were added in C99 fundamental types, preferably
with a nicer naming convention.
Exactly. We need a size_t. But it needs to be signed - the advantages
outweigh that extra bit. And it needs a nicer name. Preferably a
three-letter one that suggests an arbitrary integer.
And we don't even need to change a word of the standard to achieve this.
How could you add a requirement not currently in the standard without
changing a word of it?

I presume the three letter word you're suggesting is 'int'. The
standard's current specification for 'int' is that it "... has the
natural size suggested by the architecture of the execution environment
....". Historically, and probably also in the future, the natural size on
some machines has not been one which could meet your requirement, and
there's a lot of code out there which assumes that 'int' is indeed the
"natural size". I, for instance, have been writing such code for about
30 years now. Therefore, for the sake of backwards compatibility I would
oppose any requirement that would prohibit it from being the natural
size on those machines.

It's not as if having to use a typedef for a type which meets your
requirement would be a major problem for those programmers who wish to
use it.
Oct 21 '07 #182

"James Kuyper Jr." <ja*********@verizon.netwrote in message
I presume the three letter word you're suggesting is 'int'. The standard's
current specification for 'int' is that it "... has the natural size
suggested by the architecture of the execution environment ...".
Historically, and probably also in the future, the natural size on some
machines has not been one which could meet your requirement, and there's a
lot of code out there which assumes that 'int' is indeed the "natural
size". I, for instance, have been writing such code for about 30 years
now. Therefore, for the sake of backwards compatibility I would oppose any
requirement that would prohibit it from being the natural size on those
machines.

It's not as if having to use a typedef for a type which meets your
requirement would be a major problem for those programmers who wish to use
it.
The "natural size" is either the size of a register or the size of the
address bus. Usually the two are identical, but not always.
Where registers are 8 bits the standard has spoken - int is to be the size
of the address bus. I suppose there is some processor out there with 8 bit
registers and 32 bits of memory space to falsify me here, but basically 256
bytes of memory isn't enough for most programs, whilst 65,536 is generally
adequate for a small problem that doesn't demand a fast processor.

The problem comes with 64 bit machines. Is the natural size 32 bits, which
will be enough for most purposes, and might be a bit faster, or 64 bits?

I'd say that if you allow objects of over 4GB, then the natural integer size
is 64 bits. Integers are usually used either to index arrays or to count
them. Not every integer, obviously, but the great majority. So there's a
need for a type that is able to index any array the caller might throw at
it.
However there is also a need for a 32 bit type to help out the
micro-optimiser. Integers count things, and most things come in blocks of
substantially less than 2 billion.
So we do actually need a change to the standard. A type for arbitrary array
index operations, and a type that is guaranteed to be fast. But the majority
of use will be the first type. Normally it is more important that software
fits together than that routines squeeze the last drop of efficiency out of
the CPU. Again not always, but normally. Projects fail because code becomes
unmanageable, not usually because the processor isn't fast enough.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 21 '07 #183
Malcolm McLean wrote:
>
"James Kuyper Jr." <ja*********@verizon.netwrote in message
>I presume the three letter word you're suggesting is 'int'. The
standard's current specification for 'int' is that it "... has the
natural size suggested by the architecture of the execution
environment ...". Historically, and probably also in the future, the
natural size on some machines has not been one which could meet your
requirement, and there's a lot of code out there which assumes that
'int' is indeed the "natural size". I, for instance, have been
writing such code for about 30 years now. Therefore, for the sake of
backwards compatibility I would oppose any requirement that would
prohibit it from being the natural size on those machines.

It's not as if having to use a typedef for a type which meets your
requirement would be a major problem for those programmers who wish
to use it.
I suppose there is some processor out there
with 8 bit registers and 32 bits of memory space to falsify me here,
Well then, use size_t or long. What's the problem?
The problem comes with 64 bit machines. Is the natural size 32 bits,
which will be enough for most purposes, and might be a bit faster, or
64 bits?
Whatever the processor manufacturer decides to implement. Neither is
intrinsically "natural." It is what you define it to be.
I'd say that if you allow objects of over 4GB, then the natural
integer size is 64 bits.
Which it is on most 64 bit systems.
So there's a need for a type that is able to index any
array the caller might throw at it.
That type in C is size_t, is it not?
However there is also a need for a 32 bit type to help out the
micro-optimiser. Integers count things, and most things come in blocks
of substantially less than 2 billion.
Well nearly all modern platforms have a native 32 bit type.
So we do actually need a change to the standard.
No need, IMO.
A type for arbitrary array index operations,
size_t or intmax_t.
and a type that is guaranteed to be fast.
int
But the majority of use will be the first type.
No it really would depend on the code and the actual array being
indexed. If you know that your array will be less than INT_MAX
elements, you don't need anything more than int. To be sure to address
4Gb use unsigned long. Anything more, you can use size_t or long long
etc.
Projects fail because code becomes unmanageable, not usually
because the processor isn't fast enough.
Yes, but somehow I doubt that the chief culprit for that unmanageability
is the use, or misuse, of types.

Oct 21 '07 #184
Malcolm McLean wrote:
....
So we do actually need a change to the standard. A type for arbitrary
array index operations, and a type that is guaranteed to be fast. But
the majority of use will be the first type.
That's not been my experience.

In my experience, arbitrary array indexing is an extremely rare need. In
most contexts I've ever had to worry about, I knew very precisely a
maximum size for every dimension of the array I was indexing. In most of
those cases that maximum was substantially less than 32767, so 'int' was
more than sufficient. When that wasn't the case, in C90 I would use
'long'; in C99, I'd use int_fast32_t. I've never had reason to index
anything where the index could be larger than 2GB.

On the other hand, almost every time I need an integer type that wasn't
fixed externally by a file or interface specification, I wanted the
fastest type available of sufficient size.
YMMV.
Oct 22 '07 #185
santosh <sa*********@gmail.com> writes:
Malcolm McLean wrote:
[...]
>I'd say that if you allow objects of over 4GB, then the natural
integer size is 64 bits.

Which it is on most 64 bit systems.
Assuming that Malcolm meant "int" rather than "integer", most 64-bit
systems I've seen have 32-bit int and 64-bit long (x86_64, ia-64,
64-bit SPARC).

If int is 64 bits and char is 8 bits, then either there's no 16-bit
integer type or there's no 32-bit integer type (unless the
implementation has C99-style extended integer types but I've never
seen one that does). That's usually considered too high a price to
pay for the benefit of making int the "natural" size. In any case, I
think 32-bit operations are reasonably fast on such systems anyway.
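A quick way to see which model a given implementation uses is just to
print the widths (a throwaway sketch; long long needs C99):

#include <limits.h>
#include <stdio.h>

/* Prints 32/64/64 on a typical LP64 system, 32/32/64 on ILP32. */
int main(void)
{
    printf("int       : %d bits\n", (int)(sizeof(int) * CHAR_BIT));
    printf("long      : %d bits\n", (int)(sizeof(long) * CHAR_BIT));
    printf("long long : %d bits\n", (int)(sizeof(long long) * CHAR_BIT));
    return 0;
}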

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Oct 22 '07 #186

"James Kuyper Jr." <ja*********@verizon.netwrote in message
news:L2VSi.3575$c_4.575@trnddc05...
Malcolm McLean wrote:
...
>So we do actually need a change to the standard. A type for arbitrary
array index operations, and a type that is guaranteed to be fast. But the
majority of use will be the first type.

That's not been my experience.

In my experience, arbitrary array indexing is an extremely rare need. In
most contexts I've ever had to worry about, I knew very precisely a
maximum size for every dimension of the array I was indexing. In most of
those cases that maximum was substantially less than 32767, so 'int' was
more than sufficient. When that wasn't the case, in C90 I would use
'long'; in C99, I'd use int_fast32_t. I've never had reason to index
anything where the index could be larger than 2GB.

On the other hand, almost every time I need an integer type that wasn't
fixed externally by a file or interface specification, I wanted the
fastest type available of sufficient size.
YMMV.
I don't know what sort of programs you write.
Most of mine use arrays as by far the most common data structure. Typically
arrays represent lists of objects that are either entered by the user, or
decided at a high level.
So for instance if I need to perform an operation - say calculating the
standard deviation of a set of numbers, the stddev() function won't be privy
to the size of data. There will be a legitimate expectation that the function
can handle any list of double that will fit into the computer's memory. The
actual list might be entered by the user or be hardcoded by the calling
programmer, and there might be some natural limit - such as the number of
people in the world, or the number of letters in the alphabet, or the number
of atom types in a protein - but the function knows nothing of that.

so the prototype needs to be

double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in this
way. The question is what "type" should be called.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Oct 22 '07 #187
Malcolm McLean wrote:

<snip>
So for instance if I need to perform an operation - say calculating
the standard deviation of a set of numbers, the stddev() function
won't be privy to the size of data. There will be a legitimate
expectation that the function can handle any list of double that will
fit into the computer's memory. The actual list might be entered by
the user or be hardcoded by the calling programmer, and there might be
some natural limit - such as the number of people in the world, or the
number of letters in the alphabet, or the number of atom types in a
protein - but the function knows nothing of that.

so the prototype needs to be

double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.
Surely size_t is tailormade for this purpose?
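Something along these lines, say (only a sketch: two passes, no error
handling, and it assumes N >= 2):

#include <math.h>
#include <stddef.h>

/* Sample standard deviation of N doubles; the count is a size_t so
   any array that fits in memory can be described. */
double stddev(const double *x, size_t N)
{
    double mean = 0.0, sum2 = 0.0;
    size_t i;

    for (i = 0; i < N; i++)
        mean += x[i];
    mean /= (double)N;

    for (i = 0; i < N; i++)
        sum2 += (x[i] - mean) * (x[i] - mean);

    return sqrt(sum2 / (double)(N - 1));
}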

Oct 22 '07 #188
santosh wrote:
>
Malcolm McLean wrote:

<snip>
So for instance if I need to perform an operation - say calculating
the standard deviation of a set of numbers, the stddev() function
won't be privy to the size of data. There will be a legitimate
expectation that the function can handle
any list of double that will
fit into the computer's memory. The actual list might be entered by
the user or be hardcoded by the calling programmer,
and there might be
some natural limit - such as the number of people in the world,
or the
number of letters in the alphabet, or the number of atom types in a
protein - but the function knows nothing of that.

so the prototype needs to be

double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.

Surely size_t is tailormade for this purpose?
Since he already explained that by "list", he meant "array",
size_t is definitely the right choice.

--
pete
Oct 22 '07 #189

"santosh" <sa*********@gmail.comwrote in message
Malcolm McLean wrote:
>double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.

Surely size_t is tailormade for this purpose?
Yes. There are two main snags.
Firstly it is unsigned. Although array indices are naturally positive,
intermediate calculations can produce negative values.
Secondly it is called size_t. If I was supreme ruler of the universe I could
force everyone to use it, but I'm not, and there's just no way you are going
to get consistent usage of a type called "size_t" for an index variable.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 23 '07 #190
"Malcolm McLean" <re*******@btinternet.comwrites:
Secondly it is called size_t. If I was supreme ruler of the universe I
could force everyone to use it, but I'm not, and there's just no way
you are going to get consistent usage of a type called "size_t" for an
index variable.
What's wrong with the name size_t?
--
"Your correction is 100% correct and 0% helpful. Well done!"
--Richard Heathfield
Oct 23 '07 #191
"Malcolm McLean" <re*******@btinternet.comwrites:
so the prototype needs to be

double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.
In my experience this is often still not abstract enough, and
will eventually get replaced by:

void stddev_start(struct stddev_state *);
void stddev_put(struct stddev_state *, double input);
double stddev_finish(struct stddev_state *);

or something even more abstract.
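One possible shape for it, using the usual running-mean update (a sketch
only; the struct layout and field names here are made up):

#include <math.h>

struct stddev_state {
    unsigned long n;  /* samples seen so far */
    double mean;      /* running mean */
    double m2;        /* sum of squared deviations from the mean */
};

void stddev_start(struct stddev_state *st)
{
    st->n = 0;
    st->mean = 0.0;
    st->m2 = 0.0;
}

void stddev_put(struct stddev_state *st, double input)
{
    double delta = input - st->mean;
    st->n++;
    st->mean += delta / st->n;
    st->m2 += delta * (input - st->mean);
}

double stddev_finish(struct stddev_state *st)
{
    /* sample standard deviation; 0.0 if fewer than two samples */
    return st->n > 1 ? sqrt(st->m2 / (st->n - 1)) : 0.0;
}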
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
Oct 23 '07 #192
On Oct 23, 2:44 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
"Malcolm McLean" <regniz...@btinternet.comwrites:
so the prototype needs to be
double stddev(double *x, type N)
and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.

In my experience this is often still not abstract enough, and
will eventually get replaced by:

void stddev_start(struct stddev_state *);
void stddev_put(struct stddev_state *, double input);
I would recommend:
void stddev_put(struct stddev_state *, Number input);

where Number is a typedef somewhere.

It's not as useful in C as in C++, where 'Number' can be an extended
precision class, but at least it will work transparently on float,
double, and long double.
double stddev_finish(struct stddev_state *);

or something even more abstract.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Oct 23 '07 #193
[snips]

On Tue, 23 Oct 2007 22:21:50 +0100, Malcolm McLean wrote:
>Surely size_t is tailormade for this purpose?
Yes. There are two main snags.
Firstly it is unsigned.
Which makes sense, given that it is intended to be used to hold sizes,
which can never be negative, and indexes which, likewise, can never be
negative.
Although array indices are naturally positive,
intermediate calculations can produce negative values.
Damned rarely, IME.
Secondly it is called size_t.
Yes, as differentiated from int or long, etc. It could have been called
"u_index_t", but it wasn't.
Oct 24 '07 #194
Malcolm McLean wrote:
>
"santosh" <sa*********@gmail.comwrote in message
>Malcolm McLean wrote:
>>double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in
this way. The question is what "type" should be called.

Surely size_t is tailormade for this purpose?
Yes. There are two main snags.
Firstly it is unsigned. Although array indices are naturally
positive, intermediate calculations can produce negative values.
This only happens occasionally. Maybe in those few cases you can use
something like intmax_t or int_fast64_t or long long?
Secondly it is called size_t. If I was supreme ruler of the universe I
could force everyone to use it, but I'm not, and there's just no way
you are going to get consistent usage of a type called "size_t" for an
index variable.
Do this then:

typedef size_t YOUR_CHOSEN_NAME;

Oct 24 '07 #195

"Kelsey Bjarnason" <kb********@gmail.comwrote in message
Your notion is, in essence, to use skids everywhere - the largest possible
unit of management. This may lead to _fast_ loading and unloading, but it
is hellishly wasteful of space, and space costs money - whether in a
container or in silicon.

What you're asking, in essence, is that the consumer eat the cost of the
$7200 per container due to inefficient loading, simply to let you load and
unload only with skids. I'm sure this would make _you_ happy, as you
could load and unload more efficiently, but why should someone else pay
the costs of your increased efficiency?
Malcom McLean's containers were a hit. Every commentator acknowledges that
they have massively reduced the cost of shipping. However they are not used
absolutely everywhere. Cars, for instance, are not typically packed into
containers. Neither is oil.

The sums do have to add up. However, big software projects do fail, and often
expensively, and the reason is almost always the complexity of the programs.
The processors are typically physically capable of executing the
calculations fast enough. Hardware costs seldom break projects.

One reason software is too complex is that there are too many standards for
representing data. Functions that do essentially the same thing are written
in different ways, pull in huge lists of dependencies, and need to be
rewritten before being included in projects.
So by reducing the number of types we are going in the right direction, and
attacking the bottleneck. Whether the benefits will outweigh the costs is
difficult to prove rigorously, of course, but history suggests that they
will. "You pay the costs personally" is an infantile argument.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 24 '07 #196
Kelsey Bjarnason wrote:
>
[snips]

On Tue, 23 Oct 2007 22:21:50 +0100, Malcolm McLean wrote:
Surely size_t is tailormade for this purpose?
Yes. There are two main snags.
Firstly it is unsigned.

Which makes sense, given that it is intended to be used to hold sizes,
which can never be negative, and indexes which, likewise, can never be
negative.
Although array indices are naturally positive,
intermediate calculations can produce negative values.

Damned rarely, IME.
I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.
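A short illustration (sketch only):

#include <stdio.h>

int main(void)
{
    unsigned int u = 0u - 10 + 15;  /* wraps down and back: u == 5 */
    printf("%u\n", u);              /* prints 5 */

    /* The real trap is elsewhere: with a size_t i, a loop such as
       for (i = n - 1; i >= 0; i--) never terminates, because
       i >= 0 is always true for an unsigned type. */
    return 0;
}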

--
pete
Oct 24 '07 #197
"Malcolm McLean" <re*******@btinternet.comwrote:
"Kelsey Bjarnason" <kb********@gmail.comwrote in message
Your notion is, in essence, to use skids everywhere - the largest possible
unit of management. This may lead to _fast_ loading and unloading, but it
is hellishly wasteful of space, and space costs money - whether in a
container or in silicon.

What you're asking, in essence, is that the consumer eat the cost of the
$7200 per container due to inefficient loading, simply to let you load and
unload only with skids. I'm sure this would make _you_ happy, as you
could load and unload more efficiently, but why should someone else pay
the costs of your increased efficiency?
Malcom McLean's containers were a hit. Every commentator acknowledges that
they have massively reduced the cost of shipping.
Keep on dreaming, Malcolm; but please do so in your sleep, not in
comp.lang.c.

Richard
Oct 25 '07 #198

"pete" <pf*****@mindspring.comwrote in message
>
I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.
If only C guaranteed you an overflow error in such a situation.
Yes it will work. And that's the problem. Eventually if you play that sort
of game the language will bite you.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 25 '07 #199
"Flash Gordon" <sp**@flash-gordon.me.uka écrit dans le message de news:
64***********@news.flash-gordon.me.uk...
Charlie Gordon wrote, On 20/10/07 13:54:
<snip>
>>
strncpy can be banned with no downside.

Except where it is the right tool for the job. If an application developer
had made a slightly different choice, namely 0 padding instead of space
padding, on the application I spend a lot of time working on then it would
make *far* more calls to strncpy than calls to strcpy. If I had the time I
would actually make that change to the SW.
I assume you wrote your own utility functions for space padded fixed width
non zero terminated char arrays. That was quite easy, was it not?
If the application developer had made the slightly different choice of 0
padding, you would just as easily have written the appropriate utility
functions. IMHO, it would be far less confusing to newbie readers of your
code than to have used strncpy.
Instead, my response in terms of strncpy and its safety aspects would have
been to ask the question, "Do you have fixed width 0 padded and
potentially not 0 terminated fields in your SW?" If the answer was
"no" *then* I would say that almost any use of it in that SW would be
wrong and a security risk because it does not null terminate the result in
all conditions and in other conditions can take an unexpectedly long time.
Definitely so!
If the answer was "yes" then I would say the risk was in erroneously using
it when the intent is to produce a C string rather than populate such a
fields and that naming conventions should be used to assist in spotting
correct vs incorrect usage. Further mitigation would include ensuring that
such limits are tested and the usual stuff about reviews and use of grep.
Using a specific function with a more appropriate name is a good start to
avoid the confusion strncpy is likely to bring along.
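For instance, something like this (a sketch; the name fixed_field_copy is
made up here) keeps the intent visible at the call site:

#include <string.h>

/* Fill a fixed-width, 0-padded, possibly unterminated field.  These are
   exactly strncpy's semantics, but the name states the intent, so a
   reader does not mistake it for C-string copying. */
void fixed_field_copy(char *field, size_t width, const char *src)
{
    strncpy(field, src, width);  /* pads with '\0' up to width;
                                    does NOT guarantee 0 termination */
}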
> Pointers are a fundamental feature of the language. The class of
problems that can be solved without them has a topological measure of
zero.

Not quite, there are *some* programs that can be written without pointers,
just not many.
The only standard way for a C program to do I/O is to rely on streams
provided by stdio.h, which make use of pointers. Same for command line
arguments: argv is an array of pointers. Actually all arrays convert to
pointers as soon as they are used in expressions (except as the operand to
sizeof).
A program that does not use pointers at all cannot take any form of variable
input, and can only produce a return or exit code, namely EXIT_SUCCESS or
EXIT_FAILURE (or 0, but that may be the value of either of these).
There is still a wide variety of programs that can be written this way to
just produce a yes/no answer... but an infinitely small fraction of them
all.
>>FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired
by
witless buffoons because they have good hair and a good CV.

90% of the IT work force. More than 90% of all code produced.

I've been fortunate and only worked with a few clueless programmers.
Ambiguous: only a few clueless and many smart ones, or did you restrict your
working environment to just a few programmers, all of them clueless, and
consider that fortunate ;-?

--
Chqrlie.
Oct 26 '07 #200
