Safe version of gets

Anthony Irwin

Hi,

I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

Below is an example program I have written. (GCC is my compiler on
GNU/Linux system.)

Kind Regards,
Anthony Irwin

#include <stdio.h>

char histogram[50];
int i, space, begin, length;

int main() {
printf("\nEnter some words for the histogram: ");
sgets(histogram);
printf("\n\n");
begin = 1;
length = strlen(histogram);

for (i = 0; i <= length; i++) {
if (histogram[i] != '\0') {
if (space == 1 || begin == 1) {
histogram[i] = toupper(histogram[i]);
begin = 0;
}
printf("\n\t%c", histogram[i]);
printf("\t\t%d", i);

if (histogram[i] == ' ') {
space = 1;
printf("\t\tSpace");
}
else {
space = 0;
}
}
}

return 0;
}

Nov 15 '05 #1

Subscribe Post Reply

3382

Peter Pichler

Anthony Irwin wrote:

I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

fgets()

Nov 15 '05 #2

Anthony Irwin

On Fri, 12 Aug 2005 15:32:04 +1000, Anthony Irwin wrote:

Hi,

I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

Below is an example program I have written. (GCC is my compiler on
GNU/Linux system.)

Kind Regards,
Anthony Irwin

#include <stdio.h>

char histogram[50];
int i, space, begin, length;

int main() {
printf("\nEnter some words for the histogram: ");
sgets(histogram);
Above is actually gets(histogram); I was making changes before I posted
and missed it sgets doesn't exist with gcc or ansi c.
printf("\n\n");
begin = 1;
length = strlen(histogram);

for (i = 0; i <= length; i++) {
if (histogram[i] != '\0') {
if (space == 1 || begin == 1) {
histogram[i] = toupper(histogram[i]);
begin = 0;
}
printf("\n\t%c", histogram[i]);
printf("\t\t%d", i);

if (histogram[i] == ' ') {
space = 1;
printf("\t\tSpace");
}
else {
space = 0;
}
}
}

return 0;
}

Nov 15 '05 #3

Suman

Peter Pichler wrote:

Anthony Irwin wrote:
I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

fgets()

gets() works (or not) on stdin. So, to be accurate:
fgets( histogram, sizeof histogram, stdin );

[1] variable histogram from code posted by OP

Read up on this function, it's something like --
"The fgets() function reads at most n-1 characters from stream into the
buffer pointed to by s. No additional characters are read after fgets()
has read and transferred a newline character to the buffer. A null
character is written immediately after the last character that fgets()
reads into the buffer."

Nov 15 '05 #4

Anthony Irwin

On Fri, 12 Aug 2005 06:33:23 +0100, Peter Pichler wrote:

Anthony Irwin wrote:
I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

fgets()

Hey thanks I can do the following:

fgets(histogram,sizeof(histogram), stdin);

Kind Regards,
Anthony Irwin

Nov 15 '05 #5

Villy Kruse

On Fri, 12 Aug 2005 15:50:06 +1000,
Anthony Irwin <no****@example.com> wrote:

On Fri, 12 Aug 2005 06:33:23 +0100, Peter Pichler wrote:
Anthony Irwin wrote:
I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

fgets()

Hey thanks I can do the following:

fgets(histogram,sizeof(histogram), stdin);

Don't forget to handle the trailing newline character, and do realise
that if there is no trailing new line, your record is truncated, in which
case the following fgets will get you the rest of the line. Of cours,
if the last line doesn't have a new line your last line read from the
file may not have a new line, which you also need to deal with.

Villy

Nov 15 '05 #6

Martin Ambuhl

Anthony Irwin wrote:

Hi,

I am fairly new to C and all the C books I got talk about gets()
Burn them. Where did you get "all the C books" that use gets()?
but when
I compile it says I should not use gets() because it is dangerous.
That's right.
I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

fgets() is much safer. And you can always roll your own.

Nov 15 '05 #7

CBFalconer

Anthony Irwin wrote:

I am fairly new to C and all the C books I got talk about gets()
but when I compile it says I should not use gets() because it is
dangerous. I understand that it is dangerous because it doesn't
check whether there is more characters entered by the user the
what can be stored but I don't know what the safe equivalent of
gets() is.

Download and use the portable public domain ggets module at:

<http://cbfalconer.home.att.net/download/ggets.zip>

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 15 '05 #8

ahmed.mubin

use fgets using stdout as input

Nov 15 '05 #9

akarl

Anthony Irwin wrote:

Hi,

I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

Below is an example program I have written. (GCC is my compiler on
GNU/Linux system.)

Why don't you tell us (exactly) what the program does/is supposed to do?

August

Nov 15 '05 #10

wolfman

I wrote a function, that I always use: (shure you have to include stdio.h)

int mygets(FILE *stream, char *buffer, int lim)
{ int i = 0, c;

while (i < lim - 1 && (c = getc(stream)) != EOF && c != '\n')
{ *buffer++ = (char) c;
i++;
}
*buffer = 0; /* or better *buffer = '\0' in non ASCII / ANSI Systems */
return i;
}

Parameters:
stream: I/O-Stream or FILE-Pointer
buffer: pointer to text buffer in memory
lim: buffer size

Return value:
Function returns the number of read characters

Remarks:
- If you use 16-bit characters you have to change the datatype char to the
appropriate datatype of your system,
for example wchar.
- You don't have to care about terminating zeros. The function always adds
one.

Best regards,
wolfman
"Anthony Irwin" <no****@example.com> schrieb im Newsbeitrag
news:pa****************************@example.com...

Hi,

I am fairly new to C and all the C books I got talk about gets() but when
I compile it says I should not use gets() because it is dangerous. I
understand that it is dangerous because it doesn't check whether there is
more characters entered by the user the what can be stored but I don't
know what the safe equivalent of gets() is.

Below is an example program I have written. (GCC is my compiler on
GNU/Linux system.)

Kind Regards,
Anthony Irwin

#include <stdio.h>

char histogram[50];
int i, space, begin, length;

int main() {
printf("\nEnter some words for the histogram: ");
sgets(histogram);
printf("\n\n");
begin = 1;
length = strlen(histogram);

for (i = 0; i <= length; i++) {
if (histogram[i] != '\0') {
if (space == 1 || begin == 1) {
histogram[i] = toupper(histogram[i]);
begin = 0;
}
printf("\n\t%c", histogram[i]);
printf("\t\t%d", i);

if (histogram[i] == ' ') {
space = 1;
printf("\t\tSpace");
}
else {
space = 0;
}
}
}

return 0;
}

Nov 15 '05 #11

Malcolm

"Martin Ambuhl" <ma*****@earthlink.net> wrote

fgets() is much safer. And you can always roll your own.

Actually it's more dangerous, unless the programmer really knows what he is
doing.
Undefined behaviour is usually correct behaviour, terminating the program
with an appropriate message. Silently truncating input will usually produce
incorrect behaviour.

Nov 15 '05 #12

Peter Pichler

ah*********@gmail.com wrote:

use fgets using stdout as input

^^^^^^
ITYM stdin.

Nov 15 '05 #13

CBFalconer

wolfman wrote:

I wrote a function, that I always use: (shure you have to include
stdio.h)

int mygets(FILE *stream, char *buffer, int lim)
{ int i = 0, c;

while (i < lim - 1 && (c = getc(stream)) != EOF && c != '\n')
{ *buffer++ = (char) c;
i++;
}
*buffer = '\0';
return i;
}

Please don't toppost. Your answer belongs after (or intermixed
with) the material to which you reply, after snipping anything not
germane to your answer.

Problem, because the caller has no idea whether a '\n' was received
or not. The insidious advantage of gets is that it ALWAYS returns
a complete line, regardless of what that tramples over. My ggets
has the same advantage, without the trampling. Thus it is safe to
remove the terminal \n in the function.

Apart from the \n being returned, I see no essential difference
between your function and fgets. In interactive work it is
essential to know a line has been consumed.

My ggets has the prototype:

int ggets(char **ln);

returning non-zero for error/EOF, and is called with:

char *ln;

result = ggets(&ln);
or
result = fggets(&ln, file);

see <http://cbfalconer.home.att.net/download/ggets.zip>

(The cost is that you have to free(ln) when done with it)

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 15 '05 #14

Chris Torek

>"Martin Ambuhl" <ma*****@earthlink.net> wrote

fgets() is much safer. And you can always roll your own.

In article <dd**********@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com>
Malcolm <re*******@btinternet.com> wrote:Actually it's more dangerous, unless the programmer really knows
what he is doing. Undefined behaviour is usually correct behaviour ...
Except for the fact that it is not.
terminating the program with an appropriate message.
Most often, undefined behavior leads to bugs in Microsoft products
that result in viruses, worms, "zombie" PCs, and spam.

Much of your spam is due to gets() calls (or equivalent). (Most
of the rest is due to other bugs in IIS.)
Silently truncating input will usually produce
incorrect behaviour.

But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 15 '05 #15

Anton Petrusevich

CBFalconer wrote:

My ggets has the prototype:
int ggets(char **ln);

Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero? Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?
--
Anton Petrusevich

Nov 15 '05 #16

Giannis Papadopoulos

Anton Petrusevich wrote:

CBFalconer wrote:

My ggets has the prototype:
int ggets(char **ln);

Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero? Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

But it won't crash... If you do 'rm -rf /' it is not the fault of the
creator of rm...

--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.

Nov 15 '05 #17

Anton Petrusevich

Giannis Papadopoulos wrote:

My ggets has the prototype:
int ggets(char **ln);

Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero? Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

But it won't crash... If you do 'rm -rf /' it is not the fault of the
creator of rm...

rm -rf / must be executed with root privileges, but ./myprogram does not
require that. "Analogies suck" (c). I just wanted to show that such a
simple API have some drawbacks.
--
Anton Petrusevich

Nov 15 '05 #18

Giannis Papadopoulos

Anton Petrusevich wrote:

Giannis Papadopoulos wrote:

My ggets has the prototype:
int ggets(char **ln);

Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero? Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

But it won't crash... If you do 'rm -rf /' it is not the fault of the
creator of rm...

rm -rf / must be executed with root privileges, but ./myprogram does not
require that. "Analogies suck" (c). I just wanted to show that such a
simple API have some drawbacks.

Ok, then 'rm -rf ~'..
You can't (and shouldn't always) protect a system from a complete idiot.

There are numerous programs out there that if they had available memory,
they would work with infinite input.

And ggets() doesn't crash.. A well-written program in a well-written
system would notify that memory allocation failed, return EXIT_FAILURE
and the os would deallocate the memory. If you choose win98, then BSOD
is the solution..

Have you ever tried while(1) {fork();}? Should fork check how many times
it is called? If not, why should CBFalconer bother to check how many
chars are read, seriously hindering perfomance?

--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.

Nov 15 '05 #19

CBFalconer

Anton Petrusevich wrote:

CBFalconer wrote:
My ggets has the prototype:
int ggets(char **ln);

Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero? Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

Yes, it will eat all memory. No, it won't crash. It will return
an error indicator. If the caller then executes free(ln) the
memory will be available again. This assumes /dev/zero returns an
unending stream of zero bytes.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 15 '05 #20

Kenny McCormack

In article <dG***************@newsread1.news.atl.earthlink.ne t>,
Martin Ambuhl <ma*****@earthlink.net> wrote:

Anthony Irwin wrote:
Hi,

I am fairly new to C and all the C books I got talk about gets()

Burn them. Where did you get "all the C books" that use gets()?

He said they "talked" about gets(), not that they "used" it.

Sounds like you made a leap of logic here.

(No apologies necessary...)

Nov 15 '05 #21

Martin Ambuhl

Kenny McCormack wrote:

In article <dG***************@newsread1.news.atl.earthlink.ne t>,
Martin Ambuhl <ma*****@earthlink.net> wrote:
Anthony Irwin wrote:
Hi,

I am fairly new to C and all the C books I got talk about gets()

Burn them. Where did you get "all the C books" that use gets()?

He said they "talked" about gets(), not that they "used" it.

Sounds like you made a leap of logic here.

(No apologies necessary...)

And none warranted.

Nov 15 '05 #22

Keith Thompson

"Malcolm" <re*******@btinternet.com> writes:

"Martin Ambuhl" <ma*****@earthlink.net> wrote

fgets() is much safer. And you can always roll your own.

Actually it's more dangerous, unless the programmer really knows what he is
doing.
Undefined behaviour is usually correct behaviour, terminating the program
with an appropriate message. Silently truncating input will usually produce
incorrect behaviour.

I'll just repeat what I wrote a couple of weeks ago in another thread
(in response to a similar statement from you).

fgets() is worse than gets()?

Nonsense.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #23

Martin Ambuhl

Malcolm wrote:

"Martin Ambuhl" <ma*****@earthlink.net> wrote
fgets() is much safer. And you can always roll your own.

Actually it's more dangerous, unless the programmer really knows what he is
doing.

Life is too short to bother with idiots that claim that fgets() is more
dangerous than gets().

*PLONK*

Nov 15 '05 #24

Randy Howard

Anton Petrusevich wrote
(in article <dd*************@news.t-online.com>):

CBFalconer wrote:
My ggets has the prototype:
int ggets(char **ln);
Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero?

OT: Probably depends upon the platform and resource limits (if
any). If you run it as a 32-bit app on a 64-bit platform, it
will probably bail at 2GB or slightly less, leaving the rest of
the OS with anything above that, say on a 32GB big dog Opteron
server.

Now back to your regularly schedule c.l.c. program...
Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

Or, you could simply modify it to take a size_t parameter, say
max_eat_ram, or hard code a limit that you feel is reasonable
for the specific application, set it as a tunable parameter in a
config file, etc., etc.

--
Randy Howard (2reply remove FOOBAR)

Nov 15 '05 #25

Anton Petrusevich

CBFalconer wrote:

Yes, it will eat all memory. No, it won't crash. It will return
an error indicator. If the caller then executes free(ln) the
memory will be available again.

It can crash if the system uses optimistic allocator (overcommits memory).
--
Anton Petrusevich

Nov 15 '05 #26

websnarf

Anton Petrusevich wrote:

CBFalconer wrote:
Yes, it will eat all memory. No, it won't crash. It will return
an error indicator. If the caller then executes free(ln) the
memory will be available again.

It can crash if the system uses optimistic allocator (overcommits memory).

Well, more specifically, it can crash if you are in a multitasking
environment, and some *other* task is intolerant to memory allocation
failures because you are hogging it all with your neer ending gets
substitute. Thus, while it is possible to make such a solution sound
for the current task, its not appropriate for a system which is acting
as either a server, or just a system with lots of 3rd party tasks
running at the same time.

It also has another problem. What if you are trying to read a
password? If you use realloc() on the buffer, then its possible that
stale password prefix may be released back to heap during the process
of reading the password itself. This can be a security problem if you
have another user or process logged into the system that is able to
perform an "elevation attack" and gain access to the processes memory
outside the lifetime of the password request.

The most general answer is to read the user input into multiple
segmented buffers that are then provided back to the caller in an
incremental manner. This gives the caller the ability to performing a
length truncating style input without overallocation memory for the
most common case of small inputs. You can see an example of this here:

http://www.azillionmonkeys.com/qed/userInput.html

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 15 '05 #27

Villy Kruse

On Fri, 12 Aug 2005 19:14:33 +0000 (UTC),
Malcolm <re*******@btinternet.com> wrote:

"Martin Ambuhl" <ma*****@earthlink.net> wrote

fgets() is much safer. And you can always roll your own.

Actually it's more dangerous, unless the programmer really knows what he is
doing.
Undefined behaviour is usually correct behaviour, terminating the program
with an appropriate message. Silently truncating input will usually produce
incorrect behaviour.

It doesn't *silently* trunctate lines. The missing newline is an
indication that your input line were longer than the size of your buffer,
and then you can deside what the appropriate action in that case is.
You could chose to delcare this a fatal error, or you could chose that
this is no problem in which case you would read and ignore the rest of
the line, for example by using a getc loop. Or you could realloc() the
buffer and append the rest of the line.
Villy

Nov 15 '05 #28

Malcolm

"Chris Torek" <no****@torek.net> wrote

But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.

If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated. However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.

Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor, taking
as input the patient's weight, severity of symptoms, and so on. The corect
does is 800mg
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong

Nov 15 '05 #29

Keith Thompson

"Malcolm" <re*******@btinternet.com> writes:

"Chris Torek" <no****@torek.net> wrote

But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

Depends on what you mean by "correct", I suppose.

fgets() can be used safely if you know what you're doing. With proper
use, it cannot cause a buffer overflow, and you can always tell
whether the input line was truncated by checking the last character of
the input string. It's not ideal; the non-standard but portably
implemented ggets()is arguably better, at least in some ways. If you
aren't able to use fgets() safely, you probably shouldn't be
programming in C (or in any other language).

(That last isn't directed at you personally. I presume you are
competent enough to avoid the pitfalls of fgets(). I'm atacking your
arguments, not your competence.)

getc() can be used safely if you know what you're doing. Integer
addition can be used safely if you know what you're doing. And so on,
for nearly every feature of the language.

gets() cannot be used safely. Undefined behavior is not safe. A
program that potentially trampling on other variables is not safe.
Living with possible crashes rather than learning how to use the
language properly is, frankly, not the mark of a competent programmer.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated. However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.
Just write correct code in the first place. Code that uses gets() is
not, and cannot be, correct code. Yes, it's possible to write
incorrect code using fgets(). It's also possible (though perhaps
slightly inconvenient) to write correct code using fgets(). It's also
possible to write correct or incorrect code using ggets(), or getc(),
or nearly anything else.

[snip]
By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong

Nobody has advocated replacing undefined behavior with wrong but
defined behavior. The trick is to replace undefined behavior with
correct and defined behavior. Nothing else is acceptable. Suggesting
otherwise makes it difficult to take you seriously.

Once again, yes, fgets() has problems. It's simply foolish to think
that gets() could be the solution.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #30

Flash Gordon

Malcolm wrote:

"Chris Torek" <no****@torek.net> wrote
But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

Except when it corrupts 10000 records and means the company can't get
its statutery accounts out on time.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated.
Or corrupt something else causing other problems.
However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.
So write your own wrapper that takes the buffer size and does an abort
if the line is too long.
Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor, taking
as input the patient's weight, severity of symptoms, and so on. The corect
does is 800mg
If I say gets being used on a safety critical project I would do my
damdest to get the developer off the project, including writing to the
QA director (if the company had one).
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong

My experience of buffer overruns is that they generally have not cause
the SW to crash until some (sometimes considerable) time later, by which
time the corrupted data can have caused all sorts of things to go wrong.
At least with defined but wrong behaviour you can then step through it
with a debugger, see what is wrong, and analyse what effects it will
have had.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 15 '05 #31

Richard Heathfield

Malcolm wrote:

"Chris Torek" <no****@torek.net> wrote

But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

You must live in a very different universe to me, then. I've seen computers
do some weird stuff for UB.

We had this argument a dozen times already, and you always lose it, and you
always forget you lost it.
Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor,
taking as input the patient's weight, severity of symptoms, and so on. The
corect does is 800mg
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type
3 and type 4 errors. That's why Martin's advice was so badly wrong

Martin's advice was fine. Your example is flawed. Firstly, it is the doctor,
not the computer program, that is responsible for the patient's safety.
Secondly, NONE of the behaviours you demonstrate is preferable. The program
should accept the input completely, and process it completely, and produce
the completely correct output. This is hardly difficult.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #32

Malcolm

"Keith Thompson" <ks***@mib.org> wrote

Nobody has advocated replacing undefined behavior with wrong but
defined behavior.
You may not have done so. People frequently advocate using fgets() in an
unsafe manner. If you use the function, you must always check the newline
and consider what you need to do on a partial read (except in the unusual
case of needing to strip all newlines from input).
If you don't do this, fgets() is a highly dangerous function.
The trick is to replace undefined behavior with correct and defined
behavior. Nothing else
is acceptable. Suggesting otherwise makes it difficult to take you
seriously.

I'm not suggesting using gets(). I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse. Thus
advice to replace gets() with fgets() must always be accompanied by a
warning to check the newline, because this is so likely to be overlooked.
That's what I am complaining of.

Nov 15 '05 #33

Richard Heathfield

Malcolm wrote:

I'm not suggesting using gets(). I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse.
No, it doesn't. Reproducible bugs are always easier to find and fix than
bugs which may or may not be reproducible and which may or may not damage
the hardware.
Thus
advice to replace gets() with fgets() must always be accompanied by a
warning to check the newline, because this is so likely to be overlooked.
s/must/could/

The advice to replace gets() with fgets() is always sound. It is even sound
to advise replacing gets() with isupper()! This is because gets() cannot be
used safely, so any replacement is an improvement to the code.

Evolution of advice not to use gets():

1. Don't use gets().
2. Use fgets() instead.
3. stdin.
4. No, don't put 100 in there. Put sizeof array.
5. Only if it's an array. Duh.
6. Well, you knew how big it was when you malloc'd it. Don't forget!
7. Check the return value - it returns NULL if an error occurs or EOF is
encountered.
8. Check for a '\n' character in the string given you by fgets(). If you get
one, it means you read (the last part of) a complete line. Otherwise, it
means you read only part of a line.
9. How big should the array be? You can't know in advance. If you absolutely
must read an entire line all in one go, write a wrapper around fgets() and
do some realloc calls - or find such a wrapper.
10. Yes, some of us have already written such wrappers, and yes, they are
publicly available.

That's what I am complaining of.

Then say so, instead of wittering on about how nice it is to have a broken
program.
--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #34

Keith Thompson

"Malcolm" <re*******@btinternet.com> writes:
[...]

I'm not suggesting using gets().
Good. You've given that impression; I'm glad to see that it was
untinentional.
I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse.

Wrong.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #35

Malcolm

"Keith Thompson" <ks***@mib.org> wrote

I'm suggesting that replacing undefined behaviour with defined but
incorrect behaviour
makes things worse.
Wrong.

You assert that. But I've given a justification (see the doctor example). By
getting rid of the undefined behaviour your are suppressing most of the type
1 scenarios, the program can never produce the output "segmentation fault",
for example. But it is still incorrect, so you must be increasing the type
2, type 3 and type 4 scenarios.

Doctor example [ Malcolm, recopied ] There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

Nov 15 '05 #36

Richard Heathfield

Malcolm wrote:

"Keith Thompson" <ks***@mib.org> wrote
I'm suggesting that replacing undefined behaviour with defined but
incorrect behaviour
makes things worse.
Wrong.

You assert that. But I've given a justification (see the doctor example).

The example is broken, as has already been explained to you.
By getting rid of the undefined behaviour your are suppressing most of the
type 1 scenarios, the program can never produce the output "segmentation
fault", for example. But it is still incorrect, so you must be increasing
the type 2, type 3 and type 4 scenarios.

So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #37

Richard Bos

"Malcolm" <re*******@btinternet.com> wrote:

"Keith Thompson" <ks***@mib.org> wrote
I'm suggesting that replacing undefined behaviour with defined but
incorrect behaviour
makes things worse.

Wrong.

You assert that. But I've given a justification (see the doctor example). By
getting rid of the undefined behaviour your are suppressing most of the type
1 scenarios, the program can never produce the output "segmentation fault",
for example. But it is still incorrect, so you must be increasing the type
2, type 3 and type 4 scenarios.

Thing is, hypothetical scenarios are easy to manufacture. I could just
as simply, and probably more plausibly, devise one in which the area for
the number can hold 4 digits (plus null); the result from fgets()
without checking for '\n' would be 1000 mg of some medicine instead of
800 mg, which is not correct but, with proper monitoring of the
patient's situation not dramatic; but using your preferred gets() would
clobber over the pointer that just happened to be situated after the
array, which points to the description of the medicine, which means that
instead of 1000 mgs of medicine X, he now gets 10000 mgs of medicine Y.
Problem is, for medicine Y, 10000 mgs is a perfectly normal amount,
unlike for X... but medicine Y will kill the patient.

Richard

Nov 15 '05 #38

Malcolm

"Richard Heathfield" <in*****@address.co.uk.invalid> wrote

So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?

An awful lot of errors are introduced into programs in just that way. A bug
is discovered, and someone puts in a quick fix to suppress the symptoms,
without really getting the root of the problem.

Nov 15 '05 #39

Flash Gordon

Malcolm wrote:

"Richard Heathfield" <in*****@address.co.uk.invalid> wrote
So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?

An awful lot of errors are introduced into programs in just that way. A bug
is discovered, and someone puts in a quick fix to suppress the symptoms,
without really getting the root of the problem.

That's swapping bugs, and if the new bug is more likely to produce
reproducible symptoms, then you have at least reduced the average time
to debug the program.

However, if even *once* someone replaces a call to gets with correct
usage of fgets (incorrect usage does not increase the bug count since if
gets was used that is a bug by definition) then, on average, replacing
gets with fgets reduces the bug count.

So blanket advice to replace gets with fgets is still correct, although
it is advisable to at least point out there are pitfalls.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 15 '05 #40

Malcolm

"Richard Heathfield" <in*****@address.co.uk.invalid> wrote

Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor,
taking as input the patient's weight, severity of symptoms, and so on.
The
corect does is 800mg
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you
are
reducing the number of type 1 and type 2 errors, at the cost of more type
3 and type 4 errors. That's why Martin's advice was so badly wrong

Martin's advice was fine. Your example is flawed. Firstly, it is the
doctor,
not the computer program, that is responsible for the patient's safety.
Secondly, NONE of the behaviours you demonstrate is preferable. The
program
should accept the input completely, and process it completely, and produce
the completely correct output. This is hardly difficult.

A correct program is preferable to an incorrect one. That's the first
principle of software engineering.
fgets() can be used correctly, unfortunately it is too difficult for the
average programmer to do so. The human factor is something that people often
overlook. We have Martin Ambuls as well as Richard Heathfields as
professional C programmers. You personally will never forget to check for
that trailing newline that indicates an entire line has been read. Lesser
programmers are not always as diligent.

As for your first point, of course I chose an example where the consequence
of an error was death, in order to make the point. In reality Blogg's
restaurant supplies would get 1000 pizza bases instead of 800, with the
consequence that someone has got to get into a van and collect them. 50
pounds cost to the company. Not the end of the world, but not something you
want your software to be responsible for either.

Nov 15 '05 #41

Malcolm

"Flash Gordon" <sp**@flash-gordon.me.uk> wrote

An awful lot of errors are introduced into programs in just that way. A
bug is discovered, and someone puts in a quick fix to suppress the
symptoms, without really getting the root of the problem.

That's swapping bugs, and if the new bug is more likely to produce
reproducible symptoms, then you have at least reduced the average time to
debug the program.

That's the argument. It does have a certain force. However my experience is
that, in practise, it is erroneous.

let's say we ahve the following

void setname(EMPLOYEE *emp, char *name)
{
strcpy(emp->name, name);
}
Now if fact the name member isn't long enough to hold the string passed.
Muggins does the following

void setname(EMPLOYEE *emp, char *name)
{
if(strlen(name) < (sizeof emp->name) - 1)
strcpy(emp->name, name);
}

That's suppressing the error. He gets rid of the UB, but the poor caller has
now got to work out why his call to setname is having no effect.

What Muggins should do is of course work out why setname is being passed an
over-long name, and root out the error. He might also consider whether
setname() should assert fail on being passed an over-long string, or even
exit.

What Muggins has done will almost certainly, in practise, make the program
much more difficult to debug.

Nov 15 '05 #42

Richard Heathfield

Malcolm wrote:

You personally will never forget to check for
that trailing newline that indicates an entire line has been read.
Nonsense. Of course I will forget, sometimes. That's why program validation
methods - code reviews, unit tests, etc - are so important.
In reality
Blogg's restaurant supplies would get 1000 pizza bases instead of 800,

Only if your idea of testing is "look! it compiles!"

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #43

Richard Heathfield

Malcolm wrote:

"Flash Gordon" <sp**@flash-gordon.me.uk> wrote
An awful lot of errors are introduced into programs in just that way. A
bug is discovered, and someone puts in a quick fix to suppress the
symptoms, without really getting the root of the problem.
That's swapping bugs, and if the new bug is more likely to produce
reproducible symptoms, then you have at least reduced the average time to
debug the program.

That's the argument. It does have a certain force. However my experience
is that, in practise, it is erroneous.

let's say we ahve the following

void setname(EMPLOYEE *emp, char *name)
{
strcpy(emp->name, name);
}
Now if fact the name member isn't long enough to hold the string passed.
Muggins does the following

void setname(EMPLOYEE *emp, char *name)
{
if(strlen(name) < (sizeof emp->name) - 1)
strcpy(emp->name, name);
}

That's suppressing the error. He gets rid of the UB, but the poor caller
has now got to work out why his call to setname is having no effect.

That's because he used data validation techniques to validate the program.
Assuming emp->name is an array (because, if it isn't, the technique needs
to be modified very slightly), what he should have done is something like:

assert(strlen(name) < sizeof emp->name);
strcpy(emp->name, name);
What Muggins has done will almost certainly, in practise, make the program
much more difficult to debug.

Moral: don't employ mugginses to cut production code.
--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #44

Richard Heathfield

Malcolm wrote:

"Richard Heathfield" <in*****@address.co.uk.invalid> wrote

So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?

An awful lot of errors are introduced into programs in just that way. A
bug is discovered, and someone puts in a quick fix to suppress the
symptoms, without really getting the root of the problem.

Ah, I see. Yes. That's a very good point. Hmmm. Introducing bugs as a
problem-solving technique. I hadn't thought of that. I must try it.

That might explain why there are so many bugs in today's software! They are
not problems, but solutions! We should embrace and welcome them!

(FCOL)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
mail: rjh at above domain

Nov 15 '05 #45

Flash Gordon

Malcolm wrote:

"Flash Gordon" <sp**@flash-gordon.me.uk> wrote
An awful lot of errors are introduced into programs in just that way. A
bug is discovered, and someone puts in a quick fix to suppress the
symptoms, without really getting the root of the problem.
That's swapping bugs, and if the new bug is more likely to produce
reproducible symptoms, then you have at least reduced the average time to
debug the program.

That's the argument. It does have a certain force. However my experience is
that, in practise, it is erroneous.

My *personal* experience is that it is true. I have had to debug many
instances of UB introduced by others, and with the exception of
dereferencing NULL pointers (which happens to cause an immediate crash)
UB is harder to debug than incorrect but defined behaviour because it is
predictable and because corrupting memory often leads to a crash later
in the code rather than visibly incorrect behaviour at the place where
the bug is.
let's say we ahve the following

void setname(EMPLOYEE *emp, char *name)
{
strcpy(emp->name, name);
}
Now if fact the name member isn't long enough to hold the string passed.
Muggins does the following

void setname(EMPLOYEE *emp, char *name)
{
if(strlen(name) < (sizeof emp->name) - 1)
strcpy(emp->name, name);
}

That's suppressing the error. He gets rid of the UB, but the poor caller has
now got to work out why his call to setname is having no effect.

What Muggins should do is of course work out why setname is being passed an
over-long name, and root out the error. He might also consider whether
setname() should assert fail on being passed an over-long string, or even
exit.

What Muggins has done will almost certainly, in practise, make the program
much more difficult to debug.

No, the second example with defined behaviour is vastly easier to debug.
With the first example the software may well visibly crashes millions of
instructions later in a completely different part of the code. With the
second example you know that the name is not being set so you step
through (or read) the code setting it and immediately find the problem.
So an excellent eexample of code where incorrect but defined behaviour
is far EASIER to debug than undefined behaviour.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 15 '05 #46

Safe version of gets

Similar topics