getc() vs. fgetc()

William L. Bahn

I'm sure this has been asked before, and I have looked in the FAQ, but I'm
looking for an explanation for the following:

The function pairs:

gets()/fgets()
puts()/fputs()
printf()/fprintf()
scanf()/fscanf()

differ primarily in that the first one assumes stdin/stdout while the second
one works with a stream passed by the programmer. This makes sense and makes
the functions easy to remember.

But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()

In each case the pairs of functions perform the same task. This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a
consistent (and quite useful) naming convention for these functions?

Nov 14 '05 #1

Andrew Palmer

"William L. Bahn" <wi*****@toomuchspam.net> wrote in message
news:10*************@corp.supernews.com...

I'm sure this has been asked before, and I have looked in the FAQ, but I'm
looking for an explanation for the following:

The function pairs:

gets()/fgets()
puts()/fputs()
printf()/fprintf()
scanf()/fscanf()

differ primarily in that the first one assumes stdin/stdout while the second one works with a stream passed by the programmer. This makes sense and makes the functions easy to remember.

But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()

In each case the pairs of functions perform the same task. This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a consistent (and quite useful) naming convention for these functions?

Yeah, fgetc() and fputc() make sense, but getc()/putc() should do what
getchar()/putchar() do. It may actually be even less consistent than you're
saying, though. The arguments for fputs() are backwards from fprintf()
(fprintf() must be that way), the arguments for fputs() and fgets() don't
match, and the "f" in fgetchar() and fputchar() doesn't seem to refer to
anything. To (partly) answer your question, the latter two functions are
actually not part of "the standard."

Nov 14 '05 #2

Richard Bos

"William L. Bahn" <wi*****@toomuchspam.net> wrote:

The function pairs:

gets()/fgets()

Don't use gets(). Ever. It is an irreparable security hole, because
there is no way to tell it where its buffer stops, and it will start to
munge other variables, or worse.

puts()/fputs()
printf()/fprintf()
scanf()/fscanf()

differ primarily in that the first one assumes stdin/stdout while the second
one works with a stream passed by the programmer. This makes sense and makes
the functions easy to remember.

That's deceptive, though. It's true in the case of (f)printf() and
(f)scanf(), but puts(s) actually does something subtly different from
fputs(s, stdin);

But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()

No, we don't. There is no such thing as fgetchar() and fputchar() in C.

In each case the pairs of functions perform the same task.

No, they don't. fgetc(instream) gets a character from instream.
getc(instream) does the same thing superficially, but it is allowed to
evaluate its parameter more than once. This means that
fgetc(instream[i++]) is safe, but getc(instream[i++]) is not safe; it
might evaluate i++ more than once, even more than once between two
sequence points, and thus cause undefined behaviour. The other side of
the coin is that getc() could be slightly faster than fgetc().

getchar() is equivalent to getc(stdin). Since stdin does not contain
side effects, this is both safe and efficient.

The same thing is true for fputc()/putc()/putchar()/stdout, with the
proviso that putc() is only allowed to evaluate its second argument more
than once; putc(line[i++], outstream) is safe, but putc(i,
outstream[j++]) is not.

This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a
consistent (and quite useful) naming convention for these functions?

I presume it was for historical reasons; that is, because it was the way
pre-Standard C implementations usually did it, and changing it would
have broken too much existing code.

Richard

Nov 14 '05 #3

Dan Pop

In <10*************@corp.supernews.com> "William L. Bahn" <wi*****@toomuchspam.net> writes:

But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()

In each case the pairs of functions perform the same task. This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a
consistent (and quite useful) naming convention for these functions?

The naming convention predates the standard, and it is consistent, even if
it is not obvious to those unfamiliar with the language history.

First, there is no such thing as fgetchar and fputchar, so we're
left only with the getc/fgetc and putc/fputc pairs. The 'f' stands, in
both cases, for "function", which makes perfect sense once you understand
the history of <stdio.h>.

In the pre-ANSI days, getc and putc were typically implemented as macros,
only. This was good enough for most purposes, unless you needed to pass
their address to another function or, for some other reason, needed a
function with the same semantics as these macros. So, fgetc and fputc were
introduced as the function versions of getc and putc.

Things are different in standard C, because each function in the standard
C library must be implemented as a function, even if it is also provided
as a macro. So, you can take the address of getc, or even call the
function version of getc, if you're careful enough to bypass the macro.
Likewise, fgetc and fputc can be provided as macros, too, although I can't
imagine why any implementor might want to do so.
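
A minimal sketch of the two techniques mentioned above: taking the address of getc, and calling the function version by parenthesising the name so that a function-like macro cannot match. The helper read_with() and the function demo_function_versions() are hypothetical names invented for this illustration, and error handling is reduced to an assert.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one character through whichever
   reader function the caller passes in. */
static int read_with(int (*reader)(FILE *), FILE *fp)
{
    return reader(fp);
}

/* Writes "abc" to a temporary file, then reads it back three ways.
   Returns 0 if each read produced the expected character. */
int demo_function_versions(void)
{
    FILE *fp = tmpfile();
    assert(fp != NULL);                 /* sketch only: no real error handling */
    fputs("abc", fp);
    rewind(fp);

    int first  = read_with(getc, fp);   /* name not followed by '(' : no macro expansion */
    int second = (getc)(fp);            /* parentheses around the name bypass any macro */
    int third  = read_with(fgetc, fp);  /* fgetc can be passed the same way */

    fclose(fp);
    return (first == 'a' && second == 'b' && third == 'c') ? 0 : 1;
}
```

Calling demo_function_versions() from main() should return 0 on a hosted implementation where tmpfile() succeeds.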

But even today there is a subtle difference between the plain versions and
the f-versions: if implemented as macros, all the functions from the
standard C library are restricted to single evaluation of each of their
parameters. This makes something like putchar(i++) safe: i is guaranteed
to be incremented once, even if putchar is implemented as a macro.
However, there are two exceptions from this rule: getc and putc. If
implemented as macros, they are allowed to evaluate their FILE pointer
parameter (and *only* this parameter) more than once. So, in the
unlikely event that you ever need to call getc/putc with an expression
containing side effects as the FILE pointer argument, use the f-version
instead.
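
To make the last point concrete, here is a small sketch in which the FILE pointer argument has a side effect. The array name streams and the function demo_stream_side_effect() are assumptions made up for the example; the temporary files stand in for whatever streams a real program would have.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one character from each of two streams, advancing the index
   inside the stream argument. Returns 0 if everything matched. */
int demo_stream_side_effect(void)
{
    FILE *streams[2];
    streams[0] = tmpfile();
    streams[1] = tmpfile();
    assert(streams[0] != NULL && streams[1] != NULL);  /* sketch only */

    fputc('x', streams[0]);
    fputc('y', streams[1]);
    rewind(streams[0]);
    rewind(streams[1]);

    int i = 0;
    int a = fgetc(streams[i++]);   /* i++ is evaluated exactly once */
    int b = fgetc(streams[i++]);   /* likewise; i is now 2 */
    /* getc(streams[i++]) is what to avoid: a macro version of getc
       is permitted to evaluate the stream argument more than once. */

    fclose(streams[0]);
    fclose(streams[1]);
    return (a == 'x' && b == 'y' && i == 2) ? 0 : 1;
}
```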

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #4

William L. Bahn

"Richard Bos" <rl*@hoekstra-uitgeverij.nl> wrote in message
news:40****************@news.individual.net...

"William L. Bahn" <wi*****@toomuchspam.net> wrote:
The function pairs:

gets()/fgets()
Don't use gets(). Ever. It is an irreparable security hole, because
there is no way to tell it where its buffer stops, and it will start to
munge other variables, or worse.

I understand that. That wasn't the question. I also understand that they
handle the newline character differently and that when you use fgets() to
bring in a string you need to also check whether the newline character is at
the end of the string in order to determine if the entire string was read in
versus hitting the count limit. My question wasn't about any of that. All of
those are answered in the FAQ quite extensively. None of this would be any
different if the functions were named Bob and Sue. My question was limited
to a query about the naming convention.

puts()/fputs()
printf()/fprintf()
scanf()/fscanf()

differ primarily in that the first one assumes stdin/stdout while the second one works with a stream passed by the programmer. This makes sense and makes the functions easy to remember.
That's deceptive, though. It's true in the case of (f)printf() and
(f)scanf(), but puts(s) actually does something subtly different from
fputs(s, stdin);

I understand that. Again, that is addressed more than adequately by the FAQ.
That wasn't the question. The question only has to do with the naming
convention chosen.

But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()
No, we don't. There is no such thing as fgetchar() and fputchar() in C.

I'll take your word for that. They are in every stdio.h I have ever worked
with, but that's not a very long list. In fact I just looked at the
portability list for those arguments and see that they are not listed as
being ANSI compliant. So thanks, even though it still doesn't answer the
question that was asked.

In each case the pairs of functions perform the same task.
No, they don't. fgetc(instream) gets a character from instream.
getc(instream) does the same thing superficially, but it is allowed to
evaluate its parameter more than once. This means that
fgetc(instream[i++]) is safe, but getc(instream[i++]) is not safe; it
might evaluate i++ more than once, even more than once between two
sequence points, and thus cause undefined behaviour. The other side of
the coin is that getc() could be slightly faster than fgetc().
getchar() is equivalent to getc(stdin). Since stdin does not contain
side effects, this is both safe and efficient.
The same thing is true for fputc()/putc()/putchar()/stdout, with the
proviso that putc() is only allowed to evaluate its second argument more
than once; putc(line[i++], outstream) is safe, but putc(i,
outstream[j++]) is not.

Although it still doesn't answer the question that was asked, this is
definitely completely new information to me and so I appreciate it. I'm
trying to picture the process flow that allows a function to evaluate its
parameters more than once and can't. While I know that this is
implementation specific, my mental picture of the process is that the
parameters are evaluated and the resulting values are placed on a stack.
Necessary context is then saved and program control is turned over to the
function's code that then accesses the evaluated values of the parameters
from the stack based on the value of the stack pointer. After doing whatever
it wants to with those values, it calculates a return value (if any), pops
all of the arguments off the stack, and places that value on the stack and
returns control to the calling function that then pops the return value from
the stack placing the stack back to its original condition prior to the
function call.

I would be grateful for an alternate picture that allows multiple
evaluations of a function's arguments for a single call to the function.

This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a
consistent (and quite useful) naming convention for these functions?

I presume it was for historical reasons; that is, because it was the way
pre-Standard C implementations usually did it, and changing it would
have broken too much existing code.

This is my general assumption, but I'm hoping that, like the implied type
casting of chars and shorts to ints with certain functions for compatibility
with legacy code, that I can get a more definite answer. In particular, if
there was a reason that 'f' was used to distinguish getc() from fgetc(). If
the performance you mentioned was the other way around I could almost see
the two being getc() and fast_gets(), but of course that would only be a
guess on my part and not what I am looking for.

Thanks.

Richard

Nov 14 '05 #5

William L. Bahn

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cd**********@sunnews.cern.ch...

In <10*************@corp.supernews.com> "William L. Bahn" <wi*****@toomuchspam.net> writes:
But then we have:

getc()/fgetc()
putc()/fputc()
getchar()/fgetchar()
putchar()/fputchar()

In each case the pairs of functions perform the same task. This makes it
hard for people that don't use these functions all the time because
every time they use one they have to look up whether it assumes one of the
standard streams or not. Is there a reason that the standard did not adopt a
consistent (and quite useful) naming convention for these functions?

The naming convention predates the standard, and it is consistent, even if
it is not obvious to those unfamiliar with the language history.

First, there is no such thing as fgetchar and fputchar, so we're
left only with the getc/fgetc and putc/fputc pairs. The 'f' stands, in
both cases, for "function", which makes perfect sense once you understand
the history of <stdio.h>.

THANK YOU!!!

It still seems sloppy to use the 'f' prefix for more than one thing in
functions that are in the same library and have so much surface similarity
to each other. But I can see that the original developers could easily have
been so close to the material that they didn't see it that way - at least
not when it mattered. The same with the location of the FILE * in several of
the functions. It would have been nice had they seen, in time, the utility
of adopting a convention that said that the FILE * will always go first in
those functions that use it (since going last would create unnecessary
overhead in variable length functions such as fprintf()).

In the pre-ANSI days, getc and putc were typically implemented as macros,
only. This was good enough for most purposes, unless you needed to pass
their address to another function or, for some other reason, needed a
function with the same semantics as these macros. So, fgetc and fputc were
introduced as the function versions of getc and putc.

Things are different in standard C, because each function in the standard
C library must be implemented as a function, even if it is also provided
as a macro. So, you can take the address of getc, or even call the
function version of getc, if you're careful enough to bypass the macro.
Likewise, fgetc and fputc can be provided as macros, too, although I can't
imagine why any implementor might want to do so.

But even today there is a subtle difference between the plain versions and
the f-versions: if implemented as macros, all the functions from the
standard C library are restricted to single evaluation of each of their
parameters. This makes something like putchar(i++) safe: i is guaranteed
to be incremented once, even if putchar is implemented as a macro.
However, there are two exceptions from this rule: getc and putc. If
implemented as macros, they are allowed to evaluate their FILE pointer
parameter (and *only* this parameter) more than once. So, in the
unlikely event that you ever need to call getc/putc with an expression
containing side effects as the FILE pointer argument, use the f-version
instead.

Thank you very much. This makes sense. As I said in another post, I'm having
a hard time picturing the process flow that makes multiple evaluation of
a function's parameter possible. Could you describe that in more detail?

Thanks.
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #6

Keith Thompson

rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:

"William L. Bahn" <wi*****@toomuchspam.net> wrote:

[...]

puts()/fputs()
printf()/fprintf()
scanf()/fscanf()

differ primarily in that the first one assumes stdin/stdout while
the second one works with a stream passed by the programmer. This
makes sense and makes the functions easy to remember.

That's deceptive, though. It's true in the case of (f)printf() and
(f)scanf(), but puts(s) actually does something subtly different from
fputs(s, stdin);

I think you mean fputs(s, stdout).

The difference, of course, is that puts(s) appends a newline; I
wouldn't call that a subtle difference.
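
A rough sketch of that difference, measured in bytes written. Since puts() only writes to stdout, the example uses a hypothetical helper, puts_to(), that mimics puts() on an arbitrary stream (string, then newline); bytes_written() is likewise a name invented here, and the temporary file is just a convenient place to count output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: does on an arbitrary stream what puts(s) does
   on stdout, i.e. writes s and then a newline. */
int puts_to(const char *s, FILE *out)
{
    assert(s != NULL && out != NULL);
    if (fputs(s, out) == EOF)
        return EOF;
    return fputc('\n', out) == EOF ? EOF : 0;
}

/* Writes s to a fresh temporary file, with or without the trailing
   newline, and returns the number of bytes written (-1 on failure). */
long bytes_written(const char *s, int append_newline)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int rc = append_newline ? puts_to(s, fp) : fputs(s, fp);
    long n = (rc == EOF) ? -1 : ftell(fp);
    fclose(fp);
    return n;
}
```

With the string "hi", the fputs() path writes two bytes and the puts()-style path writes three.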

[...]

In each case the pairs of functions perform the same task.

No, they don't. fgetc(instream) gets a character from instream.
getc(instream) does the same thing superficially, but it is allowed to
evaluate its parameter more than once. This means that
fgetc(instream[i++]) is safe, but getc(instream[i++]) is not safe; it
might evaluate i++ more than once, even more than once between two
sequence points, and thus cause undefined behaviour. The other side of
the coin is that getc() could be slightly faster than fgetc().
getchar() is equivalent to getc(stdin). Since stdin does not contain
side effects, this is both safe and efficient.
The same thing is true for fputc()/putc()/putchar()/stdout, with the
proviso that putc() is only allowed to evaluate its second argument more
than once; putc(line[i++], outstream) is safe, but putc(i,
outstream[j++]) is not.

I'd say that they do perform the same task, but with slightly
different semantics. It depends, I suppose, on how loosely you want
to define the phrase "perform the same task", but William's statement
seems perfectly reasonable to me.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #7

Keith Thompson

"William L. Bahn" <wi*****@toomuchspam.net> writes:
[...]

Although it still doesn't answer the question that was asked, this is
definitely completely new information to me and so I appreciate it. I'm
trying to picture the process flow that allows a function to evaluate its
parameters more than once and can't. While I know that this is
implementation specific, my mental picture of the process is that the
parameters are evaluated and the resulting values are placed on a stack.
Necessary context is then saved and program control is turned over to the
function's code that then accesses the evaluated values of the parameters
from the stack based on the value of the stack pointer. After doing whatever
it wants to with those values, it calculates a return value (if any), pops
all of the arguments off the stack, and places that value on the stack and
returns control to the calling function that then pops the return value from
the stack placing the stack back to its original condition prior to the
function call.

I would be grateful for an alternate picture that allows multiple
evaluations of a function's arguments for a single call to the function.

If they evaluate their arguments more than once, it's because they're
implemented as macros.

Any library function can be implemented as a macro in addition to its
declaration as a function, but with the restriction that the macro
cannot evaluate any of its arguments more than once. For putc() and a
few other functions, the implementation is given special permission to
use a macro that does evaluate its stream argument more than once;
this allows for a more efficient implementation.

Here's a definition of the putc() macro on one system (obviously
this is non-portable):

#define putc(x, p) (--(p)->_cnt < 0 ? __flsbuf((x), (p)) \
: (int)(*(p)->_ptr++ = (unsigned char) (x)))

As long as the output buffer isn't full, an invocation of putc() can
store a character directly in the output buffer without the overhead
of a function call. The stream argument is rarely going to be an
expression with side effects anyway, so evaluating it more than once
will rarely matter, but if the standard didn't explicitly permit it an
implementation would have to correctly support calls where the second
argument does have side effects.
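
The double evaluation can be observed with a toy version of that macro. Everything here is hypothetical and only loosely modelled on the definition quoted above: struct toy_stream is not a real FILE, toy_full() does not flush anything, and lookup_stream() exists purely to count how often the stream argument is evaluated.

```c
#include <assert.h>

/* Hypothetical toy stream: a fixed buffer with no flushing. */
struct toy_stream {
    int cnt;              /* free slots remaining */
    unsigned char *ptr;   /* next free slot */
    unsigned char buf[8];
};

/* Called when the buffer is full; the toy version just reports failure. */
static int toy_full(int c, struct toy_stream *p)
{
    (void)c;
    p->cnt = 0;
    return -1;
}

/* Same shape as the quoted putc macro: p appears more than once. */
#define TOY_PUTC(x, p) (--(p)->cnt < 0 ? toy_full((x), (p)) \
                        : (int)(*(p)->ptr++ = (unsigned char)(x)))

static struct toy_stream the_stream;
static int lookups;

/* Stream expression with a side effect: counts its own evaluations. */
static struct toy_stream *lookup_stream(void)
{
    lookups++;
    return &the_stream;
}

/* Returns how many times one TOY_PUTC invocation evaluated its
   stream argument while the buffer still had room. */
int count_stream_evaluations(void)
{
    the_stream.cnt = (int)sizeof the_stream.buf;
    the_stream.ptr = the_stream.buf;
    lookups = 0;
    int written = TOY_PUTC('A', lookup_stream());
    assert(written == 'A');
    return lookups;
}
```

With room in the buffer, the stream expression is evaluated once in the test and once in the store, so the count is 2 for a single invocation.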

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #8

lawrence.jones

William L. Bahn <wi*****@toomuchspam.net> wrote:

The same with the location of the FILE * in several of
the functions. It would have been nice had they seen, in time, the utility
of adopting a convention that said that the FILE * will always go first in
those functions that use it (since going last would create unnecessary
overhead in variable length functions such as fprintf()).

Ah, but there was method in that madness, too. By putting the stream
argument at the end, it could be made optional with the default being
the appropriate standard stream (stdin or stdout). Fortunately, that
idea didn't hang around very long, but the order of the arguments did.

-Larry Jones

They say winning isn't everything, and I've decided
to take their word for it. -- Calvin

Nov 14 '05 #9

William L. Bahn

THANKS!

I was trying to think of a way a macro could evaluate an argument more than
once and the obvious answer just didn't make itself obvious to me:

#define sq(x) ( (x)*(x) )

evaluates it more than once and hence would have problems with:

d = sq( x*=2 );

What's the general way of handling something like this where you need to use
the value more than once? Using pow() is not the answer because that would
only work in a case similar to this one and I'm looking for a general way.

Can you do something like:

#define sq(x) {double u; u=(x); u*u}

That won't work because you can't do:

y = {3}; // Curly braces, not parens

But is there some trick that would let you do the same idea?
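
For reference, a sketch that makes the double evaluation visible without relying on undefined behaviour. An argument such as x*=2 or x++ modifies x twice with no intervening sequence point, so this example uses a counting function instead; SQ, next_value() and sq_call_count() are names made up for the illustration.

```c
#include <assert.h>

/* The naive macro: its argument appears twice in the expansion. */
#define SQ(x) ((x) * (x))

static int calls;

/* Returns 1, 2, 3, ... and records how many times it was called. */
static int next_value(void)
{
    return ++calls;
}

/* Returns the number of times the macro argument was evaluated. */
int sq_call_count(void)
{
    calls = 0;
    int result = SQ(next_value());  /* expands to ((next_value()) * (next_value())) */
    assert(result == 2);            /* 1 * 2 in either evaluation order, not a square */
    return calls;
}
```

A single use of SQ() calls next_value() twice, and the result is the product of two different values rather than the square of one.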

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"William L. Bahn" <wi*****@toomuchspam.net> writes:
[...]
Although it still doesn't answer the question that was asked, this is
definitely completely new information to me and so I appreciate it. I'm
trying to picture the process flow that allows a function to evaluate its
parameters more than once and can't. While I know that this is
implementation specific, my mental picture of the process is that the
parameters are evaluated and the resulting values are placed on a stack.
Necessary context is then saved and program control is turned over to the
function's code that then accesses the evaluated values of the parameters
from the stack based on the value of the stack pointer. After doing whatever
it wants to with those values, it calculates a return value (if any), pops
all of the arguments off the stack, and places that value on the stack and
returns control to the calling function that then pops the return value from
the stack placing the stack back to its original condition prior to the
function call.

I would be grateful for an alternate picture that allows multiple
evaluations of a function's arguments for a single call to the function.

If they evaluate their arguments more than once, it's because they're
implemented as macros.

Any library function can be implemented as a macro in addition to its
declaration as a function, but with the restriction that the macro
cannot evaluate any of its arguments more than once. For putc() and a
few other functions, the implementation is given special permission to
use a macro that does evaluate its stream argument more than once;
this allows for a more efficient implementation.

Here's a definition of the putc() macro on one system (obviously
this is non-portable):

#define putc(x, p) (--(p)->_cnt < 0 ? __flsbuf((x), (p)) \
: (int)(*(p)->_ptr++ = (unsigned char) (x)))

As long as the output buffer isn't full, an invocation of putc() can
store a character directly in the output buffer without the overhead
of a function call. The stream argument is rarely going to be an
expression with side effects anyway, so evaluating it more than once
will rarely matter, but if the standard didn't explicitly permit it an
implementation would have to correctly support calls where the second
argument does have side effects.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #10

Dan Pop

In <10*************@corp.supernews.com> "William L. Bahn" <wi*****@toomuchspam.net> writes:

It still seems sloppy to use the 'f' prefix for more than one thing in
functions that are in the same library and have so much surface similarity
to each other. But I can see that the original developers could easily have
been so close to the material that they didn't see it that way - at least
not when it mattered.

C's "standard" library was not the work of a restricted set of
developers; it grew more or less chaotically, which explains many of its
inconsistencies. There is no mention of fgetc/fputc functions in K&R1,
so they're probably a later addition to the library.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #11

William L. Bahn

Is there anyone that wants to take a crack at this?

I guess what I'm looking for is some way to have temporary, local storage
in a macro. A type-independent way would be nice, but even if it has to
be type-specific that would be fine.

"William L. Bahn" <wi*****@toomuchspam.net> wrote in message
news:10*************@corp.supernews.com...

THANKS!

I was trying to think of a way a macro could evaluate an argument more than once and the obvious answer just didn't make itself obvious to me:

#define sq(x) ( (x)*(x) )

evaluates it more than once and hence would have problems with:

d = sq( x*=2 );

What's the general way of handling something like this where you need to use the value more than once? Using pow() is not the answer because that would
only work in a case similar to this one and I'm looking for a general way.

Can you do something like:

#define sq(x) {double u; u=(x); u*u}

That won't work because you can't do:

y = {3}; // Curly braces, not parens

But is there some trick that would let you do the same idea ?

Nov 14 '05 #12

Dan Pop

In <10*************@corp.supernews.com> "William L. Bahn" <wi*****@toomuchspam.net> writes:

Is there anyone that wants to take a crack at this?

I guess what I'm looking for is some way to have temporary, local storage
in a macro. A type-independent way would be nice, but even if it has to
be type-specific that would be fine.

The answer is no, if the macro has to return a value. GNU C provides the
required extensions for both type-generic local declarations and for
returning a value from a block.
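
A sketch of what those GNU C extensions look like in use, assuming a compiler that defines __GNUC__ (gcc, and clang in its default mode). The macro name SQ_ONCE, the temporary sq_tmp_, and the double-only fallback for other compilers are all choices made for this example, not anything prescribed above.

```c
#include <assert.h>

#if defined(__GNUC__)
/* GNU C only: a statement expression yields the value of its last
   expression, and __typeof__ gives the temporary the argument's type
   without evaluating the argument. */
#define SQ_ONCE(x) __extension__ ({ __typeof__(x) sq_tmp_ = (x); sq_tmp_ * sq_tmp_; })
#else
/* Hypothetical fallback for other compilers: a plain function,
   which is type-specific (double) rather than type-generic. */
static double sq_once_fallback(double v) { return v * v; }
#define SQ_ONCE(x) sq_once_fallback(x)
#endif

/* Returns 0 if the side effect in the argument happened exactly once. */
int sq_once_demo(void)
{
    int x = 3;
    int d = SQ_ONCE(x *= 2);   /* x becomes 6 once; d is 6 * 6 */
    assert(x == 6);
    return (d == 36) ? 0 : 1;
}
```

Code written this way is not standard C, so it only suits projects that already commit to GNU-compatible compilers.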

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #13

Dave Thompson

On Wed, 14 Jul 2004 22:26:52 -0600, "William L. Bahn"
<wi*****@toomuchspam.net> wrote:

#define sq(x) ( (x)*(x) )

evaluates [its arg] more than once and hence would have problems with:

d = sq( x*=2 );

Or x++, which is usually preferred for demonstrating this problem.

Can you do something like:

#define sq(x) {double u; u=(x); u*u}

Not in standard C; you can do almost this as an extension in GNU C,
aka the language implemented by gcc; see "statement expressions".

The closest you can come in standard/portable C is
#define foo(x) (temp = x, temp*temp /* or whatever */)
/* or temp = (x) if you like to be consistent with other macros */
where a variable temp (or other fixed name) of correct or at least
suitable type must be in scope everyplace you want to use the macro.
It's simple to make the variable "global" or at least file scope,
using a more involved name to avoid any conflict; but not threadsafe,
though standard C doesn't have threads anyway, and not reentrant, if
you combine macros with functions to produce the recursion a macro
can't do directly. Alternatively require that the caller provide it,
preferably locally, which is a nuisance and clutter.

Or, not really a trick, just make it a function defined (by #include
if convenient) before any use in the/each source file, probably
static=internal and inline if C99 or gcc or some others; it is very
likely though not guaranteed you will get the same actual code as you
would have for the macro if you could write it.
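
Both portable approaches can be sketched as follows, assuming double is a suitable type and a C99 compiler for inline. The names sq_temp_, SQ_TEMP, sq_inline() and portable_sq_demo() are hypothetical; the file-scope temporary carries the reentrancy and thread-safety caveats described above.

```c
#include <assert.h>

/* Approach 1: a file-scope temporary plus the comma operator.
   The comma is a sequence point, so x is evaluated once and the
   stored value is then used twice. */
static double sq_temp_;
#define SQ_TEMP(x) (sq_temp_ = (x), sq_temp_ * sq_temp_)

/* Approach 2: a static inline function; the argument is evaluated
   once by the ordinary rules for function calls. */
static inline double sq_inline(double v)
{
    return v * v;
}

/* Returns 0 if both approaches evaluate their argument exactly once. */
int portable_sq_demo(void)
{
    double x = 1.5;
    double a = SQ_TEMP(x *= 2);     /* x becomes 3.0, a is 9.0 */

    double y = 1.5;
    double b = sq_inline(y *= 2);   /* y becomes 3.0, b is 9.0 */

    assert(x == 3.0 && y == 3.0);
    return (a == 9.0 && b == 9.0) ? 0 : 1;
}
```

The inline function is usually the simpler choice unless the macro genuinely has to work across several argument types.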

- David.Thompson1 at worldnet.att.net

Nov 14 '05 #14
