Why does GCC warn me when I use the gets() function for accessing a file?

After compiling the source code with gcc v.4.1.1, I got a warning
message:
"/tmp/ccixzSIL.o: In function 'main';ex.c: (.text+0x9a): warning: the
'gets' function is dangerous and should not be used."

Could anybody tell me why the gets() function is dangerous?
Thank you very much.

Cuthbert

Here is the source code I was testing:
---------------------------------------------------
/* count.c -- using standard I/O */

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h> // ANSI C exit() prototype

int main(int argc, char *argv[])
{
    int ch;         // place to store each character as read
    FILE *fp;       // "file pointer"
    long count = 0;

    if (argc != 2)
    {
        printf("Usage: %s filename\n", argv[0]);
        exit(1);
    }
    if ((fp = fopen(argv[1], "r")) == NULL)
    {
        printf("Can't open %s\n", argv[1]);
        exit(1);
    }
    while ((ch = getc(fp)) != EOF)
    {
        putc(ch, stdout); // same as putchar(ch);
        count++;
    }
    fclose(fp);
    printf("File %s has %ld characters\n", argv[1], count);

    return 0;
}
------------------------------------------------------------------

Sep 3 '06
Mark McIntyre <ma**********@spamcop.net> writes:
[...]
I'd be fascinated to see an actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely aid to understand how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.
Unfortunately, I doubt that.

The standard (C99 7.23.3.1) says:

The asctime function converts the broken-down time in the
structure pointed to by timeptr into a string in the form

Sun Sep 16 01:03:52 1973\n\0

using the equivalent of the following algorithm.

char *asctime(const struct tm *timeptr)
{
[snip]
}

I believe an implementation could add error checking for cases where
the version provided in the standard invokes undefined behavior, but
any suggestion that this is encouraged is not supported by the
standard.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 4 '06 #51
goose wrote:
we******@gmail.com wrote:
<snipped>
the standard does not require this behaviour. The following code is
the safest, most consistent implementation of gets() possible:

#include <stdio.h>
char * gets_fixed (char * buf, const char * sourcefile) {
remove (sourcefile);

Instead of predictable but malicious UB, why not a
predictable (but non-malicious) side-effect?

fprintf (stderr, "WARNING: bug detected, please contact vendor\n");
UB is UB, technically you can do what you want. But in standard
negotiations you always present your most optimistic demands first.
You find common ground when *both* sides start making concessions.
AFAICT, I don't think the ANSI C committee is interested in discussion
on this point, let alone concessions.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 4 '06 #52
we******@gmail.com wrote:
goose wrote:
>>we******@gmail.com wrote:
<snipped>
>>>the standard does not require this behaviour. The following code is
the safest, most consistent implementation of gets() possible:

#include <stdio.h>
char * gets_fixed (char * buf, const char * sourcefile) {
remove (sourcefile);

Instead of predictable but malicious UB, why not a
predictable (but non-malicious) side-effect?

fprintf (stderr, "WARNING: bug detected, please contact vendor\n");


UB is UB, technically you can do what you want. But in standard
negotiations you always present your most optimistic demands first.
You find common ground when *both* sides start making concessions.
AFAICT, I don't think the ANSI C committee is interested in discussion
on this point, let alone concessions.
Seeing as how I neither frequent comp.std.c nor
maintain a compiler, what exactly were the
objects the standards people put forward to
a removal of gets? Would they be prepared to
massage the wording (wrt gets) so that the
"fprintf (stderr..." above in gets won't be
non-conformant?

--
goose
Have I offended you? Send flames to root@localhost
real email: lelanthran at gmail dot com
website : www.lelanthran.com
Sep 4 '06 #53
On Sun, 03 Sep 2006 21:02:54 +0200, jacob navia
<ja***@jacob.remcomp.fr> wrote:
>This function is dangerous because there is no way you can pass
it the size of the given buffer.

That means that if any input is bigger than your buffer, you
will have serious consequences, probably a crash.
Only if you are lucky. The bad thing about undefined behavior is that
it can lead to other problems with more serious consequences than
crashing your program.
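
To make the point above concrete: gets() receives only a pointer, so it has no way of knowing where the buffer ends, while fgets() is told the size. What follows is only a minimal sketch of a bounded line reader; the helper name read_line() and its return codes are assumptions made for illustration, not something from this thread or from the standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): reads one line from `in`
 * into buf, which has room for `size` bytes, and never writes past it.
 * Returns  1 if the whole line was stored (newline stripped),
 *          0 if the line was too long and its tail was discarded,
 *         -1 on end-of-file or read error with nothing read. */
int read_line(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 1 && in != NULL);

    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* strip the newline, as gets() would */
        return 1;
    }

    /* No newline was stored: consume the rest of the line, if any,
     * so the next call starts on a fresh line. */
    int ch;
    int discarded = 0;
    while ((ch = getc(in)) != EOF && ch != '\n')
        discarded = 1;

    return discarded ? 0 : 1;
}
```

A caller would typically write something like `char line[128]; if (read_line(line, sizeof line, stdin) == 0) { /* input was too long */ }`. Over-long input becomes a condition the program can test for rather than undefined behavior.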
Remove del for email
Sep 4 '06 #54
we******@gmail.com writes:
Who is proposing to make it more dangerous? The source I gave should
be fairly safe.
While
remove (__FILE__)
is perfectly safe, chances are that it'll give an error message or
something that you don't want, because __FILE__ might not expand
to a real filename when the program is run.

I for one, often start writing programs in my code/scratch/
hierarchy, and move it to either code/tools/ or code/libs/,
depending on how general my solution is.

If I ran a program with gets() in it, I might get a "file not found"
error message or something, which would distract from the real "Don't
use gets()!" message.

In the version of your code that I implemented:
o I changed gets_fixed() to strgetsgets() to avoid user namespace.
o I had it print out a mildly insulting warning message.
o Instead of deleting the file, I simply killed the program.

--
Andrew Poelstra <http://www.wpsoftware.net/projects>
To reach me by email, use `apoelstra' at the above domain.
"Do BOTH ends of the cable need to be plugged in?" -Anon.
Sep 5 '06 #55
Mark McIntyre a écrit :
>
I'd be fascinated to see an actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely aid to understand how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.

I am not the first one to point out this problem. In a “Defect Report”
filed in 2001, Clive Feather proposed to fix it. The answer of the
committee was that if any of the members of the input argument was out
of range this was “undefined behavior”, and anything was permitted,
including corrupting memory.

The answer in full (quoted from the committee reports) is:

Defect Report #217
Submitter: Clive Feather (UK)
Submission Date: 2000-04-04
Reference Document: N/A
Version: 1.3
Date: 2001-09-18 15:51:36
Subject: asctime limits

Summary
The definition of the asctime function involves a sprintf call writing
into a buffer of size 26. This call will have undefined behavior if the
year being represented falls outside the range [-999, 9999]. Since
applications may have relied on the size of 26, this should not be
corrected by allowing the implementation to generate a longer string.
This is a defect because the specification is not self-consistent and
does not restrict the domain of the argument.
[snip]

Committee Response
From 7.1.4 paragraph 1:
If an argument to a function has an invalid value (such as a value
outside the domain of the function, or a pointer outside the address
space of the program, or a null pointer, or a pointer to non-modifiable
storage when the corresponding parameter is not const-qualified) or a
type (after promotion) not expected by a function with variable number
of arguments, the behavior is undefined.
Thus, asctime() may exhibit undefined behavior if any of the members of
timeptr produce undefined behavior in the sample algorithm (for example,
if the timeptr->tm_wday is outside the range 0 to 6 the function may
index beyond the end of an array).
As always, the range of undefined behavior permitted includes:
Corrupting memory
Aborting the program
Range checking the argument and returning a failure indicator (e.g., a
null pointer)
Returning truncated results within the traditional 26 byte buffer.
There is no consensus to make the suggested change or any change along
this line.

-----------------------------------------------------
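
One of the options the committee response lists is "range checking the argument and returning a failure indicator". As a rough illustration of what that could look like, here is a sketch of a wrapper that validates the members before formatting into the traditional 26-byte buffer. The name checked_asctime() and the caller-supplied buffer are assumptions made for this example; the format string is the one from the standard's sample algorithm, and the year limits are the [-999, 9999] range given in the defect report.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper (not from the standard or the thread): writes
 * the asctime()-style text for *t into out, or returns NULL when a
 * member is outside the range the 26-byte result can represent. */
char *checked_asctime(const struct tm *t, char out[26])
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    assert(t != NULL && out != NULL);

    /* tm_year counts from 1900, so the printable years -999..9999
     * correspond to tm_year values -2899..8099. */
    if (t->tm_wday < 0 || t->tm_wday > 6  ||
        t->tm_mon  < 0 || t->tm_mon  > 11 ||
        t->tm_mday < 1 || t->tm_mday > 31 ||
        t->tm_hour < 0 || t->tm_hour > 23 ||
        t->tm_min  < 0 || t->tm_min  > 59 ||
        t->tm_sec  < 0 || t->tm_sec  > 60 ||
        t->tm_year < -999 - 1900 || t->tm_year > 9999 - 1900)
        return NULL;

    /* snprintf() is bounded, so even a logic error here could not
     * write past the 26 bytes. */
    int n = snprintf(out, 26, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                     wday_name[t->tm_wday], mon_name[t->tm_mon],
                     t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec,
                     1900 + t->tm_year);
    return (n > 0 && n < 26) ? out : NULL;
}
```

Unlike asctime(), this sketch uses a caller-supplied buffer rather than a static one, which is a design choice of the example and not something the defect report asks for.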

AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES

CORRUPTING MEMORY.
Isn't this very clear?

>>that treated us as "anti-gets fanatics" and stubbornly defended
gets(), as he defended trigraphs, and all other obsolete
stuff that pollutes the language.


Again you ruin your argument by bringing in something irrelevant. And
it seems you think "All the Worlds a 386 with a US keyboard". Which
is quite astounding from a francophone. I've used many keyboards which
had no keystroke for common C symbols. And even if not, so what?
Nobody's forcing you to use trigraphs.
Who cares about keyboards?

Why such hardware details must be in the standard?

And if the screen does not exist and the user is blind using
some Braille (blind people's alphabet)
output device? THAT would be of course more important!!!

But it is not in the standard.

Why must the standard talk about such problems?
>
>>To put up a proposal I was told by the French standardization committee
that I would need to pay at least 10 000 euros. Just to put the
proposal.


That's how it works on many national Standards committees (and other
bodies for that matter, never joined a trades union or debating club?)
Interested parties have to pay their subs.

>>Then I would have to go to their meetings paying all travel expenses.


What, you expected them to pay you ? ROFL. This isn't the EU you
know....
I expected that I could submit a proposition and discuss it by email.
Yes, (maybe you know this) modern communications and this "internet" fad
make sometimes traveling obsolete.
Sep 5 '06 #56
jacob navia <ja***@jacob.remcomp.fr> wrote:
AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES
Don't shout. It makes you look like a twit, even if you have a (minor)
point in the case of asctime().
Again you ruin your argument by bringing in something irrelevant. And
it seems you think "All the Worlds a 386 with a US keyboard". Which
is quite astounding from a francophone. I've used many keyboards which
had no keystroke for common C symbols. And even if not, so what?
Nobody's forcing you to use trigraphs.

Who cares about keyboards?

Why such hardware details must be in the standard?

And if the screen does not exist and the user is blind using some Braille (blind
people's alphabet) output device? THAT would be of course more important!!!
If a blind person uses a Braille reader he might well be grateful that
trigraphs exist, since standard Braille does not include the more exotic
codes such as { and #, and although there are extended versions which
do, these are neither as wide-spread as the standard version nor
standardised across the Latin alphabet using world.

Richard
Sep 5 '06 #57
<we******@gmail.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
Keith Thompson wrote:
No, gets() should not be assumed to always invoke undefined behavior,
because it *doesn't* always invoke undefined behavior.

How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.
The standard says that integer overflow is UB; therefore, if I add 1 to an
int containing 32767, I may have invoked UB. However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.

Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.
This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().
Is this true? Please tell me more - I'd be interested to hear.

Philip

Sep 5 '06 #58
<we******@gmail.com> wrote in message
news:11**********************@i42g2000cwa.googlegroups.com...
Here, go learn something:
>
http://www.pobox.com/~qed/random.html
One other quibble: each occurrence of [0,RAND_MAX) should be [0, RAND_MAX].
RAND_MAX is the maximum possible output, not one-past; similarly, RANGE
should divide RAND_MAX+1 for uniformity.
Quoting the above page:
"Specifically the probability of choosing x in [(RAND_MAX % RANGE),
RANGE)
is less than choosing x in [0, (RAND_MAX % RANGE))."

This seems to be your main problem with the solution:
int x = rand() % RANGE;
after you explicitly state that you're looking for a "good enough" RNG.
For
RANGE much smaller than RAND_MAX, the difference in probability exists
but
is negligible - something you completely fail to mention.

I specifically state that you require 1000 * (RAND_MAX / RANGE) samples
to be able to definitively detect the anomaly in the distribution.
Obviously if RANGE is small, that number may be high enough for it not
to be a problem.
So you do. My apologies.
This is even mentioned in the comp.lang.c FAQ:
"When N is close to RAND_MAX, and if the range of the random number
generator is not a multiple of N (i.e. if (RAND_MAX+1) % N != 0), all of
these methods break down: some outputs occur more often than others."
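
The bias discussed here comes from (RAND_MAX+1) not being a multiple of the range, so some residues of rand() % RANGE get one more source value than others. A common way around it is rejection sampling: throw away the draws that fall in the incomplete final block. The sketch below is one possible illustration, not the code from the linked page or from the FAQ; rand_below() is a made-up name, and it assumes unsigned int can hold RAND_MAX + 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns an integer in [0, range), with every
 * value backed by the same number of rand() outputs.
 * Requires 0 < range <= RAND_MAX. */
int rand_below(int range)
{
    assert(range > 0 && range <= RAND_MAX);

    /* rand() has RAND_MAX + 1 possible outputs; use unsigned
     * arithmetic so RAND_MAX == INT_MAX does not overflow. */
    unsigned int span  = (unsigned int)RAND_MAX + 1u;
    /* Largest multiple of range that is <= span. */
    unsigned int limit = span - span % (unsigned int)range;
    unsigned int r;

    do {
        r = (unsigned int)rand();
    } while (r >= limit);       /* reject the incomplete last block */

    return (int)(r % (unsigned int)range);
}
```

This only removes the modulo bias; it does nothing about the quality of the underlying rand(), which is a separate concern raised later in the thread.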

This was added to the FAQ after I made mention of this on my website.
How is it relevant how something ended up in the FAQ?
The FAQ is not meant to be a complete document. It is only meant to be
accurate. Most people asking for a random number in a given range (the
frequent askers of questions on clc) really do want RANGE much smaller
than
RAND_MAX. Therefore I think your description of this accurate but
incomplete
answer as "a very sad state of affairs" is melodramatic at best,
misleading
at worst.

First of all, the FAQ used to be much worse. Second of all, its hard
to be accurate when you are incomplete.
"Two plus two is four. Two plus other numbers is outside the scope of this
sentence."
The FAQ should at least say
something like "accurate generation of finite uniform distributions is
beyond the scope of this FAQ".
You're probably right here. Question 13.15 gives references and directs
people to the sci.math.numerical-analysis list, but 13.16 doesn't.
Instead the FAQ just gives solutions
and ignores the analysis of those solutions.
Further down, you generate a random number from 3 calls to rand():

The versions where I use a finite number of rand() calls to virtually
increase the range of rand() have the effect of changing RAND_MAX to
RAND_MAX**2 or RAND_MAX**3. Going back to the sample expression I
gave, we see that we are talking 300 billion and 1x10**16 number of
samples are required to detect the anomaly in the most extreme case.
So these are "good enough" on practical systems.
We have differing views of practical systems. Not many people will take the
300k samples you described earlier on, and those who do shouldn't be using
rand() anyway; they should be using a RNG for which they have specific
guarantees of randomness which suit their application.
"2) The conversion back to integer can introduce a bias of about 1 ULP.
A
bias of 1 ULP is typically so small that it is not even realistically
feasible to test for its existence from a statistical point of view."

That depends on the number of unique "bins" of your range. If you
require a
random number with RAND_MAX ** 2.5 unique possible outcomes, then your
floating-point generator suffers exactly the same problem as you condemn
the
comp.lang.c FAQ for propagating, just with a bigger RANGE for which the
problem manifests.

Yeah but at this point we are talking about numbers where even large
super-computer problems cannot generate enough samples in reasonable
time.
Only for now. Code lasts longer than computers.
Besides, trying to operate with accuracies of better than 1ULP
in the C language, or using your computer's floating point support is
not something easily accomplished. I am just pointing out that my
solutions are running up against what your practical hard calculation
limits are anyways.
But it isn't. You can always simulate greater accuracy.

Philip

Sep 5 '06 #59
"Philip Potter" <ph***********@xilinx.com> wrote:
<we******@gmail.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
Keith Thompson wrote:
No, gets() should not be assumed to always invoke undefined behavior,
because it *doesn't* always invoke undefined behavior.
How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.
Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.
Yes, but there are no realistic situations where you _can_ know that.
You can't even be certain if you enter the text yourself; typos are easy
to make.
This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().

Is this true? Please tell me more - I'd be interested to hear.
"The ANSI C committee has openly declared hostile intent towards the
software industry"? No, of course it isn't true. It's a predictable and
rather idiotic rant with as little bearing on reality as a Harry Potter
book. The Standard committee have better things to do with their time
than trying to piss off the likes of Paul Hsieh. Nevertheless, gets()
should go, and backwards compatibility be hanged for this one.

Richard
Sep 5 '06 #60
"Richard Bos" <rl*@hoekstra-uitgeverij.nl> wrote in message
news:44*****************@news.xs4all.nl...
"Philip Potter" <ph***********@xilinx.com> wrote:
<we******@gmail.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
Keith Thompson wrote:
No, gets() should not be assumed to always invoke undefined
behavior,
because it *doesn't* always invoke undefined behavior.
>
How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.
Similarly, if I know that on my system stdin gets a \n character at
least
every 20 characters, I can use gets() and guarantee no UB.

Yes, but there are no realistic situations where you _can_ know that.
You can't even be certain if you enter the text yourself; typos are easy
to make.
It doesn't matter if it's realistic or not. If gets() is in the standard
then a conforming implementation must implement it properly when stdin gives
it "friendly" input. The example shows that gets() can have perfectly
well-defined behaviour, even if only in unrealistic situations. If gets()
did have completely undefined behaviour, it would be trivial to remove it
from the standard, since none of the programs which use it had defined
behaviour anyway.

Philip

Sep 5 '06 #61
On 4 Sep 2006 15:11:36 -0700, in comp.lang.c , "Spiros Bousbouras"
<sp****@gmail.com> wrote:
>Mark McIntyre wrote:
>On Mon, 04 Sep 2006 18:51:13 +0200, in comp.lang.c , jacob navia
<ja***@jacob.remcomp.fr> wrote:
>To put up a proposal I was told by the French standardization committee
that I would need to pay at least 10 000 euros. Just to put the
proposal.

That's how it works on many national Standards committees (and other
bodies for that matter, never joined a trades union or debating club?)
Interested parties have to pay their subs.
>Then I would have to go to their meetings paying all travel expenses.

What, you expected them to pay you ? ROFL. This isn't the EU you
know....

France is a member of EU.
Do tell.

*sigh* It's a reference to the "enthusiastic" expenses policy that EU
commissioners and many other staff have been reputed to enjoy.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Sep 5 '06 #62
On Mon, 04 Sep 2006 22:31:49 GMT, in comp.lang.c , Keith Thompson
<ks***@mib.org> wrote:
>Mark McIntyre <ma**********@spamcop.net> writes:
[...]
>I'd be fascinated to see an actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely aid to understand how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.

Unfortunately, I doubt that.
I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.
>I believe an implementation could add error checking for cases where
the version provided in the standard invokes undefined behavior, but
any suggestion that this is encouraged is not supported by the
standard.
The standard doesn't require or indeed encourage many things,
including the safe use of gets() but that doesn't mean the Standard
committee are recklessly or wilfully negligent, as JN imputed.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Sep 5 '06 #63
On Tue, 05 Sep 2006 08:41:15 +0200, in comp.lang.c , jacob navia
<ja***@jacob.remcomp.fr> wrote:
>The answer in full (quoted from the committee reports) is:
....
>There is no consensus to make the suggested change or any change along
this line.
....
Thank you for posting this quote, which /entirely/ makes my point.
>-----------------------------------------------------

AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES

CORRUPTING MEMORY.
Indeed. So what? And don't shout.
>Isn't this very clear?
Absolutely. So what?
>Nobody's forcing you to use trigraphs.

Who cares about keyboards?
The standard requires the use of a basic character set. Since not all
keyboard layouts contain all those characters a means was provided to
resolve this issue.
>I expected that I could submit a proposition and discuss it by email.
What, you expected the committee members to give up yet more of their
personal time (they have day jobs too), to enter into personal
correspondence with you, who couldn't even be bothered to attend
meetings?
>Yes, (maybe you know this) modern communications and this "internet" fad
make sometimes traveling obsolete.
Ever heard of comp.std.c?
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Sep 5 '06 #64
On Tue, 05 Sep 2006 08:41:15 +0200, in comp.lang.c , jacob navia
<ja***@jacob.remcomp.fr> wrote:

<stuff>

Oh, and for reference, I'm threadplonking this thread, as I have no
further desire to feed Jacob's paranoia, or to read his pomposity. I
get plenty of opportunities in other threads, sadly.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Sep 5 '06 #65
Mark McIntyre <ma**********@spamcop.net> writes:
On Mon, 04 Sep 2006 22:31:49 GMT, in comp.lang.c , Keith Thompson
<ks***@mib.org> wrote:
>>Mark McIntyre <ma**********@spamcop.net> writes:
[...]
>>I'd be fascinated to see an actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely aid to understand how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.

Unfortunately, I doubt that.

I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.
The key phrase from the Standard, which you snipped, is:

"using the equivalent of the following algorithm"

It seems to me that the wording encourages the use of the actual code
from the standard to implement asctime().

By contrast, the standard provides a sample implementation of srand()
and rand(), but doesn't require the actual implementation to use an
equivalent algorithm. (Too many implementations do so anyway.)
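
For reference, the sample generator in the standard (the example attached to the description of srand in C99 7.20.2.2) is a small linear congruential generator. The sketch below reproduces its arithmetic under different names, sample_rand() and sample_srand(), chosen here so they do not clash with the library functions; the renaming, the SAMPLE_RAND_MAX macro and the added assertion are this example's own and are not in the standard's text.

```c
#include <assert.h>

#define SAMPLE_RAND_MAX 32767   /* the example assumes RAND_MAX is 32767 */

static unsigned long int next_state = 1;

/* Same arithmetic as the standard's example rand(): advance the state,
 * then return bits 16..30 of it. */
int sample_rand(void)
{
    next_state = next_state * 1103515245UL + 12345UL;
    int r = (int)((unsigned int)(next_state / 65536UL) % 32768U);
    assert(r >= 0 && r <= SAMPLE_RAND_MAX);
    return r;
}

/* Same as the standard's example srand(): the seed becomes the state. */
void sample_srand(unsigned int seed)
{
    next_state = seed;
}
```

The contrast with asctime() is in the wording around the code: for rand() it is offered as an example of a portable implementation, whereas asctime() is specified as "using the equivalent of" the listed algorithm.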

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 5 '06 #66
Mark McIntyre wrote:
>
I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.
This is the main reason why C has lost most of its
supporters.

This "macho" attitude, this disdain for careful programming,
this "anything goes" attitude.

Careful error checking is "anal retentive" for people
that go around leaving their shit in each and every place!

Careful error specification is the most important
thing to do in the C standard now.

jacob
Sep 5 '06 #67
jacob navia said:
Mark McIntyre wrote:
>>
I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.

This is the main reason why C has lost most of its
supporters.
What makes you think C has lost most of its supporters?
This "macho" attitude, this disdain for careful programming,
this "anything goes" attitude.
That isn't what Mark said. He only said that he didn't want the Standard
littered with error-checking code. He didn't say he didn't want his own
programs to do error-checking.

It really is time you learned to read. I run a little course...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Sep 5 '06 #68
Philip Potter wrote:
<we******@gmail.com> wrote in message
Keith Thompson wrote:
No, gets() should not be assumed to always invoke undefined behavior,
because it *doesn't* always invoke undefined behavior.
How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.

The standard says that integer overflow is UB; therefore, if I add 1 to an
int containing 32767, I may have invoked UB.
Correct, that's why Robert Seacord recently created a secure integer
library. (Personally, I just make sure my ranges make sense.)
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.
Uhhh ... that ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.
Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.
All you're doing is going ahead and translating the universe of UB that
comes with gets() into one narrow manifestation or predictable
behavior. That's exactly what I did in my sample implementation, BTW.
We can both do this, of course, because we are covered by the UB.
Either way neither is behaving as the optimistic description in the
standard suggests.
This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().

Is this true? Please tell me more - I'd be interested to hear.
I found this in the C9X Rationale (sorry, got this mixed up with the
standard itself):

"Because gets does not check for buffer overrun,
it is generally unsafe to use when its input is not
under the programmer's control. [...]"

Ok, so they have a rudimentary understanding of the problem.

"[...] This has caused some to question whether it
should appear in the Standard at all. [...]"

Classic PR -- "some to question ...". A *LOT* of people question this.
Compare to how Fox news in the US reports on things that disagree with
their bias. Anyhow, so they recognize the people who understand what's
wrong with gets() do exist. So it all looks pretty reasonable right?

"The Committee decided that gets was useful and
convenient in those special circumstances when
the programmer does have adequate control over
the input, and as longstanding existing practice,
it needed a standard specification."

Two distortions in a single sentence:

1) Any place that gets could in theory be safely used, fgets can be
safely used just as easily (if you want to be an idiot, you can even
pass in INT_MAX as the buffer length parameter). Therefore gets is not
needed (and thus neither does a specification for it) for this reason.

2) Longtime existing practice is not a justification but rather an
indictment, because it is erroneous practice.

I.e., removing gets actually *improves* the situation for *both*
reasons. The longtime existing practice needs to stop, and even under
programmer control fgets works at least as well. Black is white, up is
down, war is peace, etc. Of course they go ahead and contradict
themselves in the very next sentence:

"In general, however, the preferred function is fgets."

Ya think? In fact do you think maybe it's so preferred that you should
always use it (or something even better) instead?

Do you see what they've done? They've gone ahead and presented the
main and sufficient reason for taking gets *out* of the standard, and
just pretended the logic was inverted and claimed that's why they are
leaving it *in* the standard. So if you try to bring up the issue to a
member of the standards committee, they can point to the rationale
and claim "that issue has been addressed". As such they don't have to
explain themselves. They don't have to justify themselves, they just
have to claim the logic implies the opposite of what it really does.
If you don't get this, go read 1984.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 6 '06 #69
<we******@gmail.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
Philip Potter wrote:
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.
Uhhh ... that ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.
You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".
Similarly, if I know that on my system stdin gets a \n character at
least every 20 characters, I can use gets() and guarantee no UB.

All you're doing is going ahead and translating the universe of UB that
comes with gets()
http://en.wikipedia.org/wiki/Begging_the_question
into one narrow manifestation or predictable
behavior. That's exactly what I did in my sample implementation, BTW.
Except that yours isn't standard-compliant. gets() does not invoke UB unless
it actually overruns the buffer. If you believe otherwise, quote C&V, rather
than just asserting it.

Philip

Sep 6 '06 #70
we******@gmail.com wrote:
Philip Potter wrote:
<we******@gmail.com> wrote in message
This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().
Is this true? Please tell me more - I'd be interested to hear.

I found this in the C9X Rationale (sorry, got this mixed up with the
standard itself):

"Because gets does not check for buffer overrun,
it is generally unsafe to use when its input is not
under the programmer's control. [...]"

Ok, so they have a rudimentary understanding of the problem.

"[...] This has cause some to question whether it
should appear in the Standard at all. [...]"
My dear fellow, if you can't even be bothered to quote them correctly,
you shouldn't be the one to whine.

Richard
Sep 6 '06 #71
Philip Potter wrote:
<we******@gmail.com> wrote in message
Philip Potter wrote:
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.
Uhhh ... that ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.

You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".
What relevance is that? The C standard says nothing about such
guarantees. You are conflating a particular implementation with the
standard. Of course an implementation can do whatever it wants for
platform-specific and undefined behavior. So it does -- this does not
prove any point.
Similarly, if I know that on my system stdin gets a \n character at
least every 20 characters, I can use gets() and guarantee no UB.
All you're doing is going ahead and translating the universe of UB that
comes with gets()

http://en.wikipedia.org/wiki/Begging_the_question
That does not apply here. That gets() comes with UB is not in dispute.
That you can remove its undefinedness on a particular platform, is
nothing more than an abuse of the meaning of the word UB.
into one narrow manifestation or predictable
behavior. That's exactly what I did in my sample implementation, BTW.

Except that yours isn't standard-compliant.
Of course it is -- under conditions of UB any behavior is
standard-compliant.
[...] gets() does not invoke UB unless
it actually overruns the buffer.
But nothing in the program can make this condition either happen or not
happen. I.e., to be well defined the spec has to specify something
outside of the C language. Besides that not being its mandate, it
actually does not do that. So the specification does not describe
conditions within the confines of what it's describing (the C language,
not what the user should do) under which the call can be made to be well
defined. The "unless it actually overruns the buffer" is nothing that
the C standard can explain with any specificity -- and it doesn't try.
[...] If you believe otherwise, quote C&V, rather
than just asserting it.
Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion. Taking a
classic cue from Keith Thompson, if you narrow your view of C
programming to just the spec you would conclude that the C language
does not contain variables (I kid you not, he posted this). Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

But by telling me to justify my "beliefs" by citing chapter and verse, you
are behaving just like the standards committee did with the gets()
rationale: attempting to subvert the rules of the argument.

Buffer overruns invoke UB. gets() may invoke buffer overruns
independent of how the programmer uses it. In some implementations it's
possible to go outside the ANSI specification to force gets() to not
buffer overflow and have the behavior that's described in the spec,
however clearly the spec does not delineate these conditions.

So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 6 '06 #72
<we******@gmail.com> wrote in message
news:11**********************@m79g2000cwm.googlegroups.com...
Philip Potter wrote:
You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".

What relevance is that? The C standard says nothing about such
guarantees.
It says that the size of an int is implementation-defined. It describes the
meaning of "implementation-defined" carefully. It talks about minimum values
for INT_MAX, and says that integer overflow is UB.

If integer overflow happens, the behaviour is undefined. If an addition does
not overflow, the behaviour is well-defined, and must conform to the
standard's definition of addition.

On an implementation with INT_MAX>32767, 32767+1 is not an overflow, and
therefore not UB. It must result in 32768 - no other behaviour is
conforming.

Please tell me which step in this argument you disagree with.
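
The steps of this argument can be sketched in C. This is only a minimal illustration and not code posted in the thread: checked_add is a hypothetical helper name. It assumes nothing beyond INT_MAX and INT_MIN from <limits.h>, and shows that whether a + b overflows can be decided before the addition is ever evaluated.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a standard library function): stores a + b in
 * *sum and returns true only when the addition cannot overflow on this
 * implementation. The range test uses only INT_MAX and INT_MIN, so an
 * overflowing addition is never actually evaluated. */
static bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* a + b would overflow: undefined behaviour */
    *sum = a + b;       /* result is representable: well defined */
    return true;
}
```

On an implementation where INT_MAX is greater than 32767, checked_add(32767, 1, &r) returns true and stores 32768; where INT_MAX is exactly 32767 it returns false instead of performing the addition.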
You are conflating a particular implementation with the
standard. Of course an implementation can do whatever it wants for
platform-specific and undefined behavior. So it does -- this does not
prove any point.
No it can't. Please see FAQ 11.33. Implementation-defined behaviour must be
consistent, and must fit within the restrictions imposed by the standard.

If you are going to continue to place your hands over your ears, singing to
yourself, you are welcome to. I am tired of trying to talk sense over the
endless noise you put out.
[...] If you believe otherwise, quote C&V, rather
than just asserting it.

Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion.
It sure beats your preferred method of "Yes it is!" "No it isn't!"

Philip

Sep 6 '06 #73
we******@gmail.com writes:
Philip Potter wrote:
[...]
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.
No. I'll expand on that below.

[...]
Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion. Taking a
classic cue from Keith Thompson, if you narrow your view of C
programming to just the spec you would conclude that the C language
does not contain variables (I kid you not, he posted this).
I think you're referring to the "curious about array initialization."
thread from last April and May.

I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable". The discussion is
archived on groups.google.com; anyone who's interested can read it
there.
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).
The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.
Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().

Suppose I've written a function that takes a char* argument (that
points to a string), and I want to write a quick and dirty test
program for it. (I might write a more rigorous test framework later
on; for now, I just want to try it with a few arguments to see if the
results seem plausible.) So, I write a small program like this:

#include <stdio.h>
#include <string.h>
void show(char *s)
{
printf("%d: \"%s\"\n", (int)strlen(s), s);
}

int main(void)
{
char buf[256];
while (gets(buf) != NULL) {
show(buf);
}
return 0;
}

This lets me manually test my function with a few values. Since I
wrote the program in the last 5 minutes, I *know* that it could fail
if I enter too long a line. Once I've satisfied myself that the
function works more or less as I want it to, I delete the program.
I've never made it available to anyone else. I have exactly as much
control over the program's input as I do over the program itself.

If I enter a 300-character line while running this program, I get
undefined behavior. The consequences would be entirely my own fault.

But suppose I enter a 10-character line. The C standard guarantees
that it will work properly. If I'm using your proposed
implementation, on which any call to gets() attempts to reformat my
hard drive, then the damage to my system is entirely *your* fault. I
used a standard function in a safe manner, in a way that *cannot*
invoke undefined behavior (because I control the input, and I will not
enter an overly long input line).

Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets(). I'd use fgets()
and remove the trailing '\n'. (There would always be one because,
again, I wouldn't feed very long lines to the program; if I
accidentally did so, the program would misbehave, but in a benign and
predictable manner.) gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().
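
A minimal sketch of the fgets() approach described above, assuming a hypothetical helper named read_line (the name and signature are not from the thread). It reads at most size - 1 characters and strips the trailing '\n' only if fgets() actually stored one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for the gets() call: reads one line from `in`
 * into buf (at most size - 1 characters) and removes the trailing '\n'
 * if one was stored. Returns buf, or NULL on end-of-file or read error. */
static char *read_line(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* no effect when no '\n' was stored */
    return buf;
}
```

In the test program above, the loop would then read: while (read_line(stdin, buf, (int)sizeof buf) != NULL) show(buf); An over-long line is split across several calls rather than overrunning buf.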

But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.

I don't expect you to be willing to understand this, but I'm prepared
to be pleasantly surprised.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 6 '06 #74
Keith Thompson wrote:
we******@gmail.com writes:
Philip Potter wrote:
[...]
Except that yours isn't standard-compliant.
Of course it is -- under conditions of UB any behavior is
standard-compliant.

No. I'll expand on that below.
Actually you didn't. You simply tried to defend gets() by describing a
scenario outside the specification (hence under UB) that was
predictable in a way you've constructed, which happens to coincide with
the optimistic things that the specification tries to describe in its
explanation of gets(). But you never removed the "UB cloud" which
covers the whole thing.
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".
The relevant quote:

" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"

And if you think that quote is out of context, you can look it up
for yourself and see the follow up with a half dozen examples of things
in C where it supposedly can't be decided whether or not something is a
variable.
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.
What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is. If someone is conflating
this stack with a particular stack implementation (such as Sparc's
register window mechanism, or Itanium's register block stack thingy),
it's no different from the people who post with gcc-specific extensions
(like an extra envp parameter in main) which happens here all the time.

And of course in this case the conflation is usually harmless since it's
a very rare thing for someone to use an *extension* or
platform-specific feature of a hardware stack in real world code. You
usually use it exactly in the same way you use it in its abstract form
-- you push and pop to it. Compilers may play games with hardware
stacks, general programmers (even hard-core low level programmers like
myself) usually do not.

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().
I know your mind doesn't work very "flexibly" at all but I'll give
it a shot -- replace your bad gets() program with another program,
which say, performs a simple buffer overflow:

char digs[5];
sprintf (digs, "%d", (int) val);

Ok, then continue to apply the reasoning and statements you just made
with your gets() program, but in the obvious analogous way. Ok, so
here are the statements you made which apply equally to a program
which contains the above:

1) " ... But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."

-- Similarly, if we make val small enough here, it will work
properly.

2) "If I enter a 300-character line while running this program, I get
undefined behavior. The consequences would be entirely my own fault.
[...] But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."

-- Similarly if I make val a 5+ digit integer, the program that
includes the above will have UB. But if I make val a 4 digit
positive number, or 3 digit negative number, it will work
just fine.

The UB we get from overrunning digs[] here obviously can lead to
arbitrary action since it will smash any adjacent declarations
including possibly volatiles, sig_atomic_t or whatever. Same with your
gets() program. So both programs occupy the same space of what's the
worst that can go wrong. Either program could easily format your hard
drive with the right set of circumstances.

So we see the analogy is a pretty close fit, and because of that we
usually look at code such as the above very skeptically. In other
words your argument about gets() hasn't specifically bolstered gets() in
any way that doesn't also bolster the code above. Let me repeat --
your *argument* doesn't significantly distinguish gets() from the code
snippet above in the context we are in.

Where the analogy falls down, however, is that the above code can be
made to work solely through mechanisms inside the program itself. If I
have some way of guaranteeing that val is between -999 and 9999 solely
through mechanisms inside the program itself, then everything is fine.
I would be using things *IN THE C STANDARD* to make sure that the
semantics of that code remained compliant. The key point is that I do
not need to venture outside the system/program or invoke platform
specific behavior to guarantee that code and bring it within spec.
I.e., the semantic correctness is guaranteed, essentially by other
contents from the standard itself. I.e., the code above is actually
correct within certain assumptions, and those assumptions can be
enforced by nothing more than the standard itself. The potential for
UB is *eliminated* from within usage of the specification itself.
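
As a rough sketch of the in-program guarantee described above: format_small is a hypothetical name, and the range check is only one way the bound on val might be enforced. The check uses nothing outside standard C, so the sprintf() call is reached only when the result fits in digs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper around the two-line snippet above: digs has room
 * for four characters plus the terminating '\0', so only values from
 * -999 to 9999 are formatted; anything else is rejected up front. */
static bool format_small(char digs[5], long val)
{
    assert(digs != NULL);
    if (val < -999 || val > 9999)
        return false;               /* would need more than 4 characters */
    sprintf(digs, "%d", (int)val);  /* at most 4 characters plus '\0' */
    return true;
}
```

No comparable check can be written around gets(), because the function is never told how large its destination array is.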

Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.
Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets().
And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.
[...] I'd use fgets()
and remove the trailing '\n'. (There would always be one because,
again, I wouldn't feed very long lines to the program; if I
accidentally did so, the program would misbehave, but in a benign and
predictable manner.)
So you've traded one bad behavior for another? ... Whatever, that's
another discussion entirely. You won't invoke UB with this strategy (you just
get wrong results, but predictably so). The \n can also be omitted if EOF
is encountered without a \n just before it, btw. A \n can also
*appear* to be omitted if a \0 is consumed before a \n is, and you are
just using C's char * string semantics on the results.
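
The cases listed here can be told apart after the fact. The following is only a sketch under stated assumptions: read_line_status and the enum names are hypothetical, and an embedded \0 is deliberately not handled, since strlen() stops at it exactly as described.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_EOF, LINE_COMPLETE, LINE_UNTERMINATED, LINE_TRUNCATED };

/* Hypothetical helper: reads with fgets() and reports why reading stopped.
 * A '\0' embedded in the input is not detected; strlen() stops at it, so
 * such a line can be misreported as truncated. */
static enum line_status read_line_status(FILE *in, char *buf, int size)
{
    size_t len;

    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, size, in) == NULL)
        return LINE_EOF;              /* nothing read: end-of-file or error */
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_COMPLETE;         /* whole line fit; newline stripped */
    }
    if (feof(in))
        return LINE_UNTERMINATED;     /* final line had no '\n' before EOF */
    return LINE_TRUNCATED;            /* buffer filled before a '\n' was seen */
}
```

A caller that sees LINE_TRUNCATED can keep calling until LINE_COMPLETE to consume or discard the rest of the line. A final line of exactly size - 1 characters with no newline is also reported as LINE_TRUNCATED, with LINE_EOF on the next call.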
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().
So what are you defending?
But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.
Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input. But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 6 '06 #75
we******@gmail.com writes:
Keith Thompson wrote:
we******@gmail.com writes:
Philip Potter wrote:
[...]
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.

No. I'll expand on that below.

Actually you didn't. You simply tried to defend gets() by describing a
scenario outside the specification (hence under UB) that was
predictable in a way you've constructed, which happens to coincide with
the optimistic things that the specification tries to describe in its
explanation of gets(). But you never removed the "UB cloud" which
covers the whole thing.
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".

The relevant quote:

" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"

And if you think that quote is out of context, you can look it up
for yourself and see the follow up with a half dozen examples of things
in C where it supposedly can't be decided whether or not something is a
variable.
Thank you for confirming that I did *not* say that "the C language
does not contain variables".
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is.
[snip]

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.
In most implementations, local variables and other storage associated
with a called function are allocated on "the stack". My understanding
of the phrase "the stack" in this context is exactly the kind of
hardware-based stack I discussed above, something that is not
guaranteed by the standard. The word "the" implies something
specific.

If the phrase "the stack" doesn't carry that implication for you,
that's terrific, but I strongly suspect that it does for most people.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().
[snip]
Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec
Ok, there's no way within the standard to use gets() safely. Beyond
the question of whether it should be used in any circumstances, it
certainly shouldn't be used in code that's intended to be portable.
(The sample program I posted was not intended to be portable; it was
specifically designed to be used in tightly controlled conditions and
then discarded.)

Not all C code has to be portable. Most C code should be portable,
but most C *programs* are not; they depend on system-specific
features. fopen() can't be successfully called without a valid file
name, and there's no portable way (other than tmpnam()) to generate a
valid file name. (And yes, fopen() behaves in a well-defined manner
if you give it an invalid file name, which makes it more robust than
gets().)
Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.
I didn't miss the point. I made a different point.
Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets().

And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.
A subterfuge? Do you think that the ISO C committee keeps gets() in
the standard for malicious purposes? What is their motivation?

[...]
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().

So what are you defending?
Just this: Given that gets() is defined by the standard, a conforming
implementation must implement it properly. gets() does not always
invoke undefined behavior. In those cases where it doesn't, it must
behave as specified.
But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.

Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input.
Correct.
But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.
The standard provides no portable way to use gets() safely.

There are *non-portable* ways to use gets() safely.

C is specifically designed to support both portable and non-portable
programming.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 7 '06 #76
On Thu, 6 Sep 2006 we******@gmail.com wrote:
In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec
I believe that gets() works as follows. I simply cannot see
how you could apply the as-if rule to ``optimize'' this code into
unconditional arbitrary behavior.

/*
* 7.19.7.7 The gets function
*
* Implemented by Tak-Shing Chan
*/

#include <stdio.h>

char *
gets(char *s)
{
int c;
char *itaptbs = s;

/*
* 7.19.7.7 paragraph 2
*
* The gets function reads characters from the input
* stream pointed to by stdin, into the array pointed to
* by s, until end-of-file is encountered or a new-line
* character is read.
*/
while (!((c = getchar()) == EOF || c == '\n'))
*itaptbs++ = c;

/*
* 7.19.7.7 paragraph 3
*
* If end-of-file is encountered and no characters have
* been read into the array, the contents of the array
* remain unchanged and a null pointer is returned. If
* a read error occurs during the operation, the array
* contents are indeterminate and a null pointer is
* returned.
*/
if (c == EOF && (itaptbs == s || ferror(stdin)))
return NULL;

/*
* 7.19.7.7 paragraph 2
*
* Any new-line character is discarded, and a null
* character is written immediately after the last
* character read into the array.
*/
*itaptbs = 0;
/*
* 7.19.7.7 paragraph 3
*
* The gets function returns s if successful.
*/
return s;
}
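
For comparison, here is a sketch of the same loop with the one piece of information gets() never receives, the size of the destination array. gets_bounded is a hypothetical name, not a standard function, and the choice to discard excess input up to the end of the line is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bounded variant (not a standard function): same loop as
 * above, but the caller passes the array size n, so each store can be
 * checked. Input beyond n - 1 characters is read and discarded up to
 * the end of the line. */
static char *gets_bounded(char *s, size_t n, FILE *in)
{
    int c;
    size_t used = 0;

    assert(s != NULL && n > 0 && in != NULL);
    while ((c = getc(in)) != EOF && c != '\n') {
        if (used + 1 < n)
            s[used++] = (char)c;   /* store only while room remains */
    }
    if (c == EOF && (used == 0 || ferror(in)))
        return NULL;               /* nothing stored, or a read error */
    s[used] = '\0';
    return s;
}
```

With the size available, the store inside the loop can be guarded from within the program itself, which is exactly what the unbounded version cannot do.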

Tak-Shing
Sep 7 '06 #77
Tak-Shing Chan wrote:
On Thu, 6 Sep 2006 we******@gmail.com wrote:
In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

I believe that gets() works as follows. I simply cannot see
how you could apply the as-if rule to ``optimize'' this code into
unconditional arbitrary behavior.
This is because you are doing a literal translation of what they are
saying without taking anything to a logical conclusion. This has
nothing to do with optimization. "As-if" also has little meaning once
UB is encountered -- every behaviour is "as-if" once you enact UB. All
that needs to be established is that there is a UB here.
/*
* 7.19.7.7 The gets function
*
* Implemented by Tak-Shing Chan
*/

#include <stdio.h>

char *
gets(char *s)
{
int c;
char *itaptbs = s;

/*
* 7.19.7.7 paragraph 2
*
* The gets function reads characters from the input
* stream pointed to by stdin, into the array pointed to
* by s, until end-of-file is encountered or a new-line
* character is read.
*/
while (!((c = getchar()) == EOF || c == '\n'))
*itaptbs++ = c;
This last line causes an unfixable and unaddressable UB. The fact that
this is not stated in the specification does not change it from being
so. Because of that, the code can in fact, undo the stream state, send
the characters back, send the state of s into anything it likes, then
proceed to format your hard drive. In fact it can do anything, and a
programmer cannot have any expectation that anything less happens,
except in platform-specific scenarios that are not covered by the
specification.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 7 '06 #78
On Thu, 6 Sep 2006 we******@gmail.com wrote:
Tak-Shing Chan wrote:
On Thu, 6 Sep 2006 we******@gmail.com wrote:
In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

I believe that gets() works as follows. I simply cannot see
how you could apply the as-if rule to ``optimize'' this code into
unconditional arbitrary behavior.

This is because you are doing a literal translation of what they are
saying without taking anything to a logical conclusion. This has
nothing to do with optimization. "As-if" also has little meaning once
UB is encountered -- every behaviour is "as-if" once you enact UB. All
that needs to be established is that there is a UB here.
UB would only occur if the input really exceeds the size of
the array pointed to by s.
/*
* 7.19.7.7 The gets function
*
* Implemented by Tak-Shing Chan
*/

#include <stdio.h>

char *
gets(char *s)
{
int c;
char *itaptbs = s;

/*
* 7.19.7.7 paragraph 2
*
* The gets function reads characters from the input
* stream pointed to by stdin, into the array pointed to
* by s, until end-of-file is encountered or a new-line
* character is read.
*/
while (!((c = getchar()) == EOF || c == '\n'))
*itaptbs++ = c;

This last line causes an unfixable and unaddressable UB. The fact that
this is not stated in the specification does not change it from being
so. Because of that, the code can in fact, undo the stream state, send
the characters back, send the state of s into anything it likes, then
proceed to format your hard drive. In fact it can do anything, and a
programmer cannot have any expectation that anything less happens,
except in platform-specific scenarios that are not covered by the
specification.
It is not UB if the array pointed to by s is large enough for
the input.
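
To make that condition concrete, here is a minimal sketch (not from the thread) of the same loop with the array size passed in. The name read_line_bounded and its parameters are hypothetical, and it takes a FILE * instead of reading stdin only so that it can be tried without a terminal. Once the size is available, the store that gets() performs blindly can be checked before it happens:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the standard library.
 * Reads one line from `in` into buf (capacity `size` bytes),
 * discarding the new-line as gets() does. Characters that do not
 * fit are read and dropped, so nothing is stored past buf[size - 1].
 * Returns buf on success, NULL on end-of-file with nothing stored
 * or on a read error.
 */
char *read_line_bounded(char *buf, size_t size, FILE *in)
{
    size_t used = 0;
    int c;

    if (buf == NULL || size == 0 || in == NULL)
        return NULL;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (used + 1 < size)        /* keep one byte for the terminator */
            buf[used++] = (char)c;
        /* else: the line is too long; drop the excess character */
    }

    if (c == EOF && (used == 0 || ferror(in)))
        return NULL;

    buf[used] = '\0';
    return buf;
}
```

With gets() the caller cannot supply that size, so whether the unchecked store stays inside the array depends entirely on the input.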

Tak-Shing
Sep 7 '06 #79
Keith Thompson wrote:
we******@gmail.com writes:
Keith Thompson wrote:
we******@gmail.com writes:
Philip Potter wrote:
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.
What the hell are you talking about? If you think "the stack" means a
hardware stack, its because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is.
[snip]

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.

In most implementations, local variables and other storage associated
with a called function are allocated on "the stack".
Really? Most implementations I know of actually throw these things
into registers first. Many even throw return addresses into "link
registers". Even on the x86 (a very popular platform), there are at
least *two* stacks (one for floating point, and one for the rest). We
must inhabit different planes of existence.
[...] My understanding
of the phrase "the stack" in this context is exactly the kind of
hardware-based stack I discussed above, something that is not
guaranteed by the standard. The word "the" implies something
specific.

If the phrase "the stack" doesn't carry that implication for you,
that's terrific, but I strongly suspect that it does for most people.
You think most people know assembly language? You really do live in a
bizarre fantasy world.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().

[snip]
Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

Ok, there's no way within the standard to use gets() safely. Beyond
the question of whether it should be used in any circumstances, it
certainly shouldn't be used in code that's intended to be portable.
Portable?!?! What? What has that got to do with anything? It fails
in *every* system. *Every* time it's put into a program it's wrong.
Only in systems where the input is redirected *and* the system does not
support multitasking can you even build a credible case for a well
defined scenario where it can satisfy the committee's fantasies about
how gets() is supposed to behave. Even there, you are relying on
specific platform behavior.
(The sample program I posted was not intended to be portable;
It is if you ignore the UB -- which of course you are. It's only not
portable because UB is not portable. Portability just isn't the issue.
Every platform must fail except by extraordinary intervention (that
can't realistically be called programming).
[...] it was
specifically designed to be used in tightly controlled conditions and
then discarded.)
I thought it was designed for you to post and make a point. If you
actually used it for any reason, besides contradicting earlier
statements you made, it would just be irresponsible.
Not all C code has to be portable. Most C code should be portable,
but most C *programs* are not; they depend on system-specific
features. fopen() can't be successfully called without a valid file
name, and there's no portable way (other than tmpnam()) to generate a
valid file name.
You are confusing platform specific with undefined behavior. Calls to
fopen(), and system() can't be made portable. This is well understood.
This has nothing to do with the situation with gets().
[...] (And yes, fopen() behaves in a well-defined manner
if you give it an invalid file name, which makes it more robust than
gets().)
It makes it well defined. As opposed to gets().
Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.

I didn't miss the point. I made a different point.
There's no point in there. You can't use non-portability as a
protection for gets(), and that clearly was not the point you were
making.
Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets().
And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.

A subterfuge? Do you think that the ISO C committee keeps gets() in
the standard for malicious purposes? What is their motivation?
I have no idea *WHY* they do things like that. I just know that they
did it. I mean we *KNOW* that the committee is aware of what the issue
is. But they have gone on record to say that that doesn't matter to them
and they are leaving it in, and they've created a "doublespeak" kind of
rationale for their behavior.
[...]
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().
So what are you defending?

Just this: Given that gets() is defined by the standard, a conforming
implementation must implement it properly. gets() does not always
invoke undefined behavior. In those cases where it doesn't, it must
behave as specified.
You're a broken record. I have asked and you have not explained the
difference between undefined behavior and sometimes undefined behavior.
Literally you gave an example of a platform and environment specific
way of making the undefined behavior emit some sort of predictable
results. But that's generally exactly the case for every other kind of
UB that you can create as well. So you have not made a distinction,
and thus have not made the case. There is a built-in contradiction of
language in the specification -- they just omit the blatant expression
of that contradiction, even though they cannot excise it from real
manifestations.
But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.
Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input.

Correct.
But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.

The standard provides no portable way to use gets() safely.
It provides *NO* way to use gets() safely. Portable or not.
There are *non-portable* ways to use gets() safely.
There are non-portable ways of making every UB safe. *EVERY*. That's
an irrelevant tautology.
C is specifically designed to support both portable and non-portable
programming.
It was *supposed* to be designed to be well defined, regardless of
portability. They specified gets() obviously -- so you have to
reinterpret the spec to realize that gets() always invokes UB, to retain
this well definedness property. Your portability argument is just a
red herring.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 7 '06 #80
we******@gmail.com writes:
Keith Thompson wrote:
we******@gmail.com writes:
[...]
So what are you defending?

Just this: Given that gets() is defined by the standard, a conforming
implementation must implement it properly. gets() does not always
invoke undefined behavior. In those cases where it doesn't, it must
behave as specified.

You're a broken record. I have asked and you have not explained the
difference between undefined behavior and sometimes undefined behavior.
[...]

The difference is the word "sometimes".

I'm done here.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 7 '06 #81
<we******@gmail.com> wrote in message
news:11**********************@m73g2000cwd.googlegroups.com...
You're a broken record. I have asked and you have not explained the
difference between undefined behavior and sometimes undefined behavior.
UB is undefined behaviour, and the implementation can do as it likes.

Sometimes UB is sometimes undefined behaviour, depending on a certain
condition, and the implementation can sometimes do as it likes. When the
condition does not hold, it has behaviour which is specifically defined by
the standard. If the implementation does not implement this behaviour, it is
nonconforming.
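
The same distinction shows up with signed integer addition, which is undefined only when the result overflows. A minimal sketch (the function name checked_add is hypothetical, not from the thread): when the condition does not hold, the addition has fully defined behaviour, and the caller can test the condition first.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: stores a + b in *sum and returns true when the
 * addition cannot overflow; returns false (and leaves *sum alone)
 * when evaluating a + b would be undefined behaviour.
 */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;       /* a + b would overflow: do not evaluate it */

    *sum = a + b;           /* condition does not hold: defined result */
    return true;
}
```

The difference with gets() is that the caller has no such test available, because the length of the input is not known before the call.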

Philip

Sep 7 '06 #82

In article <11**********************@p79g2000cwp.googlegroups.com>, we******@gmail.com writes:
Keith Thompson wrote:
we******@gmail.com writes:
Philip Potter wrote:
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".

The relevant quote:

" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"
Which is entirely as Keith characterized it, and not as you did.
Either you failed (and continue to fail) to comprehend what he
wrote in that post, or you're being deliberately dense.

It *isn't* obvious what "variable" means in C - at least not to
anyone who understands the language. It's not obvious whether the
term should apply to const-qualified objects, for example, because
practitioners use "variable" to mean a number of things, and only
some of them apply to const-qualified objects.
The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses.

What the hell are you talking about? If you think "the stack" means a
hardware stack, its because of something in your mind.
Keith didn't say *he* thought the phrase "the stack" necessarily
applied to a hardware stack. He said that phrase is sometimes used
with that meaning. I'll go further and note that often when people
post here asking about "the stack", that's what they have in mind
(for questions like "how can I tell if a variable is on the stack?").

(And "its" is a possessive pronoun; the contraction for "it is" is
"it's". And obviously if Keith thinks something means something
else, it's because of something in his mind, by definition. That's
the faculty we employ when we think and produce meaning.)

Really, Paul, your ability to misconstrue what you read is
remarkable. If you must continue these rants, do try to interpret
something correctly and come up with an actual meaningful argument
- it'd be a refreshing change.
--
Michael Wojcik mi************@microfocus.com

It does basically make you look fat and naked - but you see all this stuff.
-- Susan Hallowell, TSA Security Lab Director, on "backscatter" scanners
Sep 7 '06 #83
Philip Potter wrote:

<snipped>
well-defined behaviour, even if only in unrealistic situations. If gets()
did have completely undefined behaviour, it would be trivial to remove it
from the standard, since none of the programs which use it had defined
behaviour anyway.
AFAIK, UB doesn't include breaking on compilation.

--
goose
Have I offended you? Send flames to root@localhost
real email: lelanthran at gmail dot com
website : www.lelanthran.com
Sep 7 '06 #84
goose <lk************@webmail.co.za> writes:
Philip Potter wrote:
<snipped>
well-defined behaviour, even if only in unrealistic situations. If gets()
did have completely undefined behaviour, it would be trivial to remove it
from the standard, since none of the programs which use it had defined
behaviour anyway.

AFAIK, UB doesn't include breaking on compilation.
Yes, it does.

C99 3.4.3:

undefined behavior

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard
imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving
during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of
a diagnostic message), to terminating a translation or execution
(with the issuance of a diagnostic message).

EXAMPLE An example of undefined behavior is the behavior on
integer overflow.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 7 '06 #85

In article <ed**********@ctb-nnrp2.saix.net>, goose <lk************@webmail.co.za> writes:
Philip Potter wrote:
well-defined behaviour, even if only in unrealistic situations. If gets()
did have completely undefined behaviour, it would be trivial to remove it
from the standard, since none of the programs which use it had defined
behaviour anyway.

AFAIK, UB doesn't include breaking on compilation.
It could, in some cases. If the program's output depends on UB -
which it would, if the program actually invokes UB during execution -
then the program is not strictly conforming and no conforming
implementation has to accept it. (9899-1999 4 #5,#6)

However, I think the following is a strictly-conforming program:

#include <stdio.h>
int main(void)
{
    char x;
    if (0) gets(x);
    return 0;
}

Its output never depends on UB, because the gets() is never
executed. Thus conforming implementations would have to accept
it.

So I think you're correct in the end: even if gets() always
caused UB if it was called, strictly-conforming programs could
include calls to it provided they were never executed, so it
would have to be present at least as an identifier for a
function of the appropriate type.

--
Michael Wojcik mi************@microfocus.com

Global warming is just a theory. This is Intelligent Defrosting. -- "Gregg"
Sep 8 '06 #86
mw*****@newsguy.com (Michael Wojcik) writes:
[...]
However, I think the following is a strictly-conforming program:

#include <stdio.h>
int main(void)
{
    char x;
    if (0) gets(x);
    return 0;
}
Not quite, but I think it is if you change the declaration of x to:

char *x;

*or* change the if statement to:

if (0) gets(&x);

(but not both, obviously).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 8 '06 #87
Keith Thompson wrote:
If you want to have a non-conforming C implementation that discourages
gets(), just reject any program that calls it. If you want to have a
*conforming* C implementation that discourages gets(), issue a warning
(as gcc does).
In all seriousness, what if someone chose to implement gets along
the (mildly non-conforming) lines of:

char *gets(char *retbuf)
{
    return fgets(retbuf, 100, stdin);
}

There would also need to be a bit of code to remove the newline,
of course.

This could be combined with gcc's existing compile-time warning,
so that users whose input was always small would also be alerted
to their problem.
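
A minimal sketch of the idea above, including the newline removal. The name limited_gets and the GETS_LIMIT constant are hypothetical (the limit of 100 comes from the fragment above), and the function takes a FILE * only so that it can be tried without a terminal; a real replacement would read stdin. Like the fragment, it is mildly non-conforming as a gets(): a longer line is split across calls instead of being stored whole.

```c
#include <stdio.h>
#include <string.h>

#define GETS_LIMIT 100   /* assumed cap, taken from the fragment above */

/*
 * Hypothetical bounded stand-in for gets(). The caller must supply a
 * buffer of at least GETS_LIMIT bytes. At most GETS_LIMIT - 1
 * characters are stored per call; the rest of an overlong line is
 * left in the stream for the next call.
 */
char *limited_gets(char *retbuf, FILE *in)
{
    if (fgets(retbuf, GETS_LIMIT, in) == NULL)
        return NULL;

    /* fgets() keeps the new-line; gets() discards it. */
    retbuf[strcspn(retbuf, "\n")] = '\0';
    return retbuf;
}
```

Using strcspn() here means the terminator is written correctly whether or not a new-line was actually stored, which covers the truncated-line case as well.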
--
Steve Summit
sc*@eskimo.com
Sep 10 '06 #88
sc*@eskimo.com (Steve Summit) writes:
Keith Thompson wrote:
If you want to have a non-conforming C implementation that discourages
gets(), just reject any program that calls it. If you want to have a
*conforming* C implementation that discourages gets(), issue a warning
(as gcc does).

In all seriousness, what if someone chose to implement gets along
the (mildly non-conforming) lines of:

char *gets(char *retbuf)
{
    return fgets(retbuf, 100, stdin);
}

There would also need to be a bit of code to remove the newline,
of course.

This could be combined with gcc's existing compile-time warning,
so that users whose input was always small would also be alerted
to their problem.
Doug Gwyn suggested something similar over in comp.std.c, using BUFSIZ
rather than 100.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 10 '06 #89

In article <ln************@nuthaus.mib.org>, Keith Thompson <ks***@mib.org> writes:
mw*****@newsguy.com (Michael Wojcik) writes:
[...]
However, I think the following is a strictly-conforming program:

#include <stdio.h>
int main(void)
{
    char x;
    if (0) gets(x);
    return 0;
}

Not quite, but I think it is if you change the declaration of x to:

char *x;

*or* change the if statement to:

if (0) gets(&x);
Posting-without-testing strikes again.

Thanks, Keith.

--
Michael Wojcik mi************@microfocus.com
Sep 11 '06 #90
