fgets() replacement

Paul D. Boyle

Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length. I know that the better programmers in this group could
write a more robust function, but here is my shot at it anyway.
I would appreciate people's comments on my fget_line() code below
(usage example included). Any constructive criticism welcome regarding
logic, design, style, etc. Thanks.

Paul

/* fget_line(): a function to read a line of input of arbitrary length.
*
* Arguments:
* 'in' -- the input stream from which data is wanted.
* 'buf' -- the address of a pointer to char. The read in results
* will be contained in this buffer after the fget_line returns.
* *** THE CALLER MUST FREE THIS POINTER ***
* 'sz' -- the caller can supply an estimate of the length of line to be
* read in. If this argument is 0, then fget_line() uses a
* default.
* 'validate' -- a user supplied callback function which is used to validate
* each input character. This argument may be NULL in which
* case no input validation is done.
*
* RETURN values:
* fget_line() on success: returns the number of bytes read
* realloc() related failure: returns -1 (#define'd below as ERROR_MEMORY)
* illegal input: returns -2 (#define'd below as ERROR_ILLEGAL_CHAR)
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>

#ifndef USER_DIALOG_H /* if I am not using this as part of my
* 'user_dialog' utilities.
*/
#define LINE_LEN 80
#define DELIMITER '\n'
#define ERROR_MEMORY (-1)
#define ERROR_ILLEGAL_CHAR (-2)
#else
#include "user_dialog.h"
#endif

int fget_line( FILE *in, char **buf, size_t sz, int (*validate)(int) )
{
    int n_bytes = 0; /* total number of bytes read or error flag */
    size_t n_alloc = 0; /* number of bytes allocated */

    unsigned int mult = 1;

    char *tmp, *local_buffer = NULL;
    int read_in;
    *buf = NULL;

    if( 0 == sz ) sz = LINE_LEN;

    while( (read_in = fgetc( in )) != DELIMITER && read_in != EOF ) {
        if( 0 == n_alloc ) {
            n_alloc = sz * mult + n_bytes + 1;
            tmp = realloc( local_buffer, n_alloc );
            if ( NULL != tmp ) {
                local_buffer = tmp;
                mult++;
            }
            else {
                local_buffer[n_bytes] = '\0';
                *buf = local_buffer;
                return ERROR_MEMORY;
            }

        }
        if( NULL != validate ) {
            if( 0 != validate( read_in ) ) {
                local_buffer[n_bytes++] = read_in;
                n_alloc--;
            }
            else {
                local_buffer[n_bytes] = '\0';
                *buf = local_buffer;
                return ERROR_ILLEGAL_CHAR;
            }
        }

    }

    local_buffer[n_bytes] = '\0';

    /* trim excess memory if any */
    if( n_alloc > (size_t)n_bytes ) {
        tmp = realloc( local_buffer, n_bytes );
        if( NULL != tmp ) {
            local_buffer = tmp;
        }
    }

    local_buffer[n_bytes] = '\0';
    *buf = local_buffer;
    return n_bytes;
}

/* usage example */
int main( void )
{
    char *line = NULL;
    int ret_value;
    size_t len;

    fputs( "Enter a string: ", stdout );
    fflush( stdout );

    ret_value = fget_line( stdin, &line, 0, isalnum );
    len = strlen( line );

    /* len is a size_t, so cast it and print with %lu rather than %d */
    fprintf( stdout, "fget_line() returned %d\nstrlen() returns %lu bytes\n",
             ret_value, (unsigned long)len );
    fprintf( stdout, "String is: \"%s\"\n", line );
    free( line );
    exit( EXIT_SUCCESS );
}

--
Paul D. Boyle
bo***@laue.chem.ncsu.edu
North Carolina State University
http://www.xray.ncsu.edu

Nov 14 '05 #1


Malcolm

"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote

Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length.

Why not look at CB Falconer's "ggets()"?

<http://cbfalconer.home.att.net>

or email him on

Chuck F (cb********@yahoo.com)

Do you think your way of doing things is better?

Nov 14 '05 #2

Paul D. Boyle

Malcolm <ma*****@55bank.freeserve.co.uk> wrote:

: "Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote
:> Hi all,
:>
:> There was a recent thread in this group which talked about the
:> shortcomings of fgets(). I decided to try my hand at writing a
:> replacement for fgets() using fgetc() and realloc() to read a line of
:> arbitrary length.
:>
: Why not look at CB Falconer's "ggets()"?

I did, but I wanted to try doing it (mostly for the heck of it) using
fgetc() and realloc().

: Do you think your way of doing things is better?

I don't see it as a matter of better. I wrote a function which did the
things I thought would make a safe and useful function, and I wanted
other people's opinion of what I had done. In particular, in fget_line(),
I provided a way to do some input validation. Was that a good and useful
design decision?

Paul

--
Paul D. Boyle
bo***@laue.chem.ncsu.edu
North Carolina State University
http://www.xray.ncsu.edu

Nov 14 '05 #3

Paul D. Boyle

Paul D. Boyle <bo***@laue.chem.ncsu.edu> wrote:
: if( NULL != validate ) {
: if( 0 != validate( read_in ) ) {
: local_buffer[n_bytes++] = read_in;
: n_alloc--;
: }
: else {
: local_buffer[n_bytes] = '\0';
: *buf = local_buffer;
: return ERROR_ILLEGAL_CHAR;
: }
: }
:
: }

: local_buffer[n_bytes] = '\0';

Naturally, I have to discover a little(?) bug *after* I post to
comp.lang.c. (grrr). I am missing an 'else' block to cover the case
where the 'validate' function pointer is NULL. The above code should be:

    if( NULL != validate ) {
        if( 0 != validate( read_in ) ) {
            local_buffer[n_bytes++] = read_in;
            n_alloc--;
        }
        else {
            local_buffer[n_bytes] = '\0';
            *buf = local_buffer;
            return ERROR_ILLEGAL_CHAR;
        }
    }
    else {
        local_buffer[n_bytes++] = read_in;
        n_alloc--;
    }

}

local_buffer[n_bytes] = '\0';

/* and so on ... */

Paul
--
Paul D. Boyle
bo***@laue.chem.ncsu.edu
North Carolina State University
http://www.xray.ncsu.edu

Nov 14 '05 #4

Eric Sosman

Paul D. Boyle wrote:

Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length. I know that the better programmers in this group could
write a more robust function, but here is my shot at it anyway.
I would appreciate people's comments on my fget_line() code below
(usage example included). Any constructive criticism welcome regarding
logic, design, style, etc. Thanks.

Paul

/* fget_line(): a function to read a line of input of arbitrary length.
*
* Arguments:
* 'in' -- the input stream from which data is wanted.
* 'buf' -- the address of a pointer to char. The read in results
* will be contained in this buffer after the fget_line returns.
* *** THE CALLER MUST FREE THIS POINTER ***
* 'sz' -- the caller can supply an estimate of the length of line to be
* read in. If this argument is 0, then fget_line() uses a
* default.
* 'validate' -- a user supplied callback function which is used to validate
* each input character. This argument may be NULL in which
* case no input validation is done.
*
* RETURN values:
* fget_line() on success: returns the number of bytes read
* realloc() related failure: returns -1 (#define'd below as ERROR_MEMORY)
* illegal input: returns -2 (#define'd below as ERROR_ILLEGAL_CHAR)
*/
First criticism: The function does too much. This is, of
course, a matter of taste, but if the goal is "a replacement
for fgets()" I think the validate() business is extraneous.
(Even the `sz' parameter raises my eyebrows a little, albeit
not a lot.)

IMHO, a low-level library function should do one thing,
do it well, and do it in a manner that facilitates combining
it with other functions to create grander structures. Or as
my old fencing coach used to admonish me when I got overenthused
with tricky multiple-feint combinations: "Keep It Simple, Stupid."

By the way, you've described the purpose of validate() but
not how it is supposed to operate. What value(s) should it
return to cause fget_line() to take this or that action? To
find this out one must read the code of fget_line() -- and that,
I think, is a poor form for documentation.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>

#ifndef USER_DIALOG_H /* if I am not using this as part of my
* 'user_dialog' utilities.
*/
Here, I think, is another clue that the design leans too
far towards the baroque. In effect, USER_DIALOG_H and the
associated macros become part of the function's specification.
That specification now encompasses one return value encoding
three distinguishable states, four function arguments (two
with usage rules not expressible to the compiler), and five
macros. That strikes me as *way* too much for "a replacement
for fgets()."

("All right, Smarty Pants, how would *you* do it?")

Fair enough. Everybody, it seems, writes an fgets()
replacement eventually, and here's the .h text for mine:

char *getline(FILE *stream);
/*
* Reads a complete line from an input stream, stores
* it and a NUL terminator in an internal buffer, and
* returns a pointer to the start of the buffer. Returns
* NULL if end-of-file occurs before any characters are
* read, or if an I/O error occurs at any time, or if
* unable to allocate buffer memory (in the latter cases,
* any characters read before the I/O error or allocation
* failure are lost). If the argument is NULL, frees the
* internal buffer and returns NULL.
*
* A "complete line" consists of zero or more non-newline
* characters followed by a newline, or one or more non-
* newline characters followed by EOF.
*/

Now, I'm not saying that this is the only way to replace fgets().
I'm not even claiming it's the "best" way; some choices have been
made that could just as well have been made differently. The
point is to show that the specification can be a whole lot sparer
than for fget_line() and still be useful. (In fact, my original
getline() was sparer still: that "discard the buffer on a NULL
argument" gadget was warted on afterwards. It's ugly and very
little used; I may decide to go back to the simpler form.)
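
A minimal sketch of how the specification above might be implemented. This is not Eric's actual code: the name read_line() is a stand-in chosen to avoid clashing with POSIX getline(), the initial size of 128 bytes and the doubling growth are assumptions, and the sketch keeps the trailing newline in the buffer, which is one reading of "stores it".

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the getline() described above. */
char *read_line(FILE *stream)
{
    static char *buf = NULL;   /* internal buffer, reused across calls */
    static size_t cap = 0;     /* bytes currently allocated for buf */
    size_t len = 0;            /* characters stored so far */
    int ch;

    if (stream == NULL) {      /* NULL argument: discard the buffer */
        free(buf);
        buf = NULL;
        cap = 0;
        return NULL;
    }

    while ((ch = getc(stream)) != EOF) {
        if (len + 2 > cap) {   /* need room for this char plus the NUL */
            size_t new_cap = cap ? cap * 2 : 128;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL)
                return NULL;   /* partial line is lost, as documented */
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)ch;
        if (ch == '\n')
            break;             /* complete line read */
    }

    if (ch == EOF && (len == 0 || ferror(stream)))
        return NULL;           /* EOF before any data, or I/O error */

    buf[len] = '\0';
    return buf;
}
```

Because the buffer is internal, each call overwrites the previous line; a caller that needs to keep a line has to copy it first.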

Now, on to the implementation itself.
#define LINE_LEN 80
#define DELIMITER '\n'
#define ERROR_MEMORY (-1)
#define ERROR_ILLEGAL_CHAR (-2)
Pointless parentheses.
#else
#include "user_dialog.h"
#endif

int fget_line( FILE *in, char **buf, size_t sz, int (*validate)(int) )
{
int n_bytes = 0; /* total number of bytes read or error flag */
size_t n_alloc = 0; /* number of bytes allocated */
This stands out as an Odd Thing: You're using a `size_t' to
keep track of the allocated buffer's size, but a mere `int' to
count the characters therein. Ah, yes: You also want to return
negative values to indicate errors! But that doesn't excuse the
type of `n_bytes', because the error codes are never stored in
it; they're always transmitted as part of a `return' statement.

... in connection with which, I wonder about the wisdom of
using an `int' as the function's value. Maybe a `long' would
be better? At any rate, if you feel you must use `int' you
should at least guard against lines longer than INT_MAX.
unsigned int mult = 1;

char *tmp, *local_buffer = NULL;
int read_in;
*buf = NULL;

if( 0 == sz ) sz = LINE_LEN;

while( (read_in = fgetc( in )) != DELIMITER && read_in != EOF ) {
Why fgetc() instead of getc()? In this instance they're
functionally equivalent, but getc() is likely to have less
overhead.
if( 0 == n_alloc ) {
n_alloc = sz * mult + n_bytes + 1;
tmp = realloc( local_buffer, n_alloc );
if ( NULL != tmp ) {
local_buffer = tmp;
mult++;
}
else {
local_buffer[n_bytes] = '\0';
Undefined behavior if the very first realloc() fails,
because `local_buffer' will still be NULL.
*buf = local_buffer;
return ERROR_MEMORY;
}

}
if( NULL != validate ) {
if( 0 != validate( read_in ) ) {
local_buffer[n_bytes++] = read_in;
n_alloc--;
}
else {
local_buffer[n_bytes] = '\0';
*buf = local_buffer;
return ERROR_ILLEGAL_CHAR;
}
}
You mentioned that `validate' could be given as NULL,
but somehow you didn't mention that doing so would suppress
*all* the input ...
}

local_buffer[n_bytes] = '\0';
Undefined behavior if you get EOF or DELIMITER on the
very first fgetc(), because `local_buffer' will still be
NULL.
/* trim excess memory if any */
if( n_alloc > (size_t)n_bytes ) {
I think this test is wrong: You've been decrementing
`n_alloc' with each character stored, so it is no longer
the size of the allocated area. (The record-keeping of
sizes in this function seems to involve a lot more work
than is really necessary. Two variables should suffice
for the job; fget_line() uses four.)
tmp = realloc( local_buffer, n_bytes );
if( NULL != tmp ) {
local_buffer = tmp;
}
}

local_buffer[n_bytes] = '\0';
What, again? Didn't we already do this, just a few
lines ago? Oh, wait, it's different this time:

Undefined behavior if the "trim excess" realloc()
*succeeds*, because it writes beyond the end of the memory
pointed to by `local_buffer'.
*buf = local_buffer;
return n_bytes;
}

General impressions: The design is overcomplicated, the
implementation is more intricate than even the complex
design requires, insufficient attention has been paid to
boundary conditions, and insufficient testing has been done.

"Simplify! Simplify!" -- H.D. Thoreau

--
Er*********@sun.com

Nov 14 '05 #5

CBFalconer

"Paul D. Boyle" wrote:

Malcolm <ma*****@55bank.freeserve.co.uk> wrote:
: "Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote
:>
:> There was a recent thread in this group which talked about the
:> shortcomings of fgets(). I decided to try my hand at writing a
:> replacement for fgets() using fgetc() and realloc() to read a
:> line of arbitrary length.
:>
: Why not look at CB Falconer's "ggets()"?

I did, but I wanted to try doing it (mostly for the heck of it)
using fgetc() and realloc().

: Do you think your way of doing things is better?

I don't see it as a matter of better. I wrote a function which
did the things I thought would make a safe and useful function,
and I wanted other people's opinion of what I had done. In
particular, in fget_line(), I provided a way to do some input
validation. Was that a good and useful design decision?

Thanks, Malcolm, for the kind words. The mail address you gave
will reach my spam trap. The URL is good.

My objective was to simplify the calling sequence as far as
possible. So I didn't look at yours in detail. Your validation
idea may well be useful in some areas. I don't think the
preliminary size estimate is worthwhile, but that is just an
opinion.

If I were creating a routine with such input validation, I would
probably simply pass it a routine to input a char, say "rdchar()",
returning EOF on error or invalid. I have grave doubts that such
will be useful in string input. For stream conversion to integer,
real, etc. the tests belong in the conversion function. Again,
IMO.
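
The "rdchar()" idea could look roughly like the sketch below. This is an illustration under assumptions, not code from ggets(): rd_alnum() and read_line_via() are hypothetical names, and a fixed caller-supplied buffer is used to keep the example short.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical character reader: passes alphanumerics and '\n' through,
 * and reports EOF for end of input, read errors, and invalid input. */
int rd_alnum(FILE *in)
{
    int ch = getc(in);
    return (ch == '\n' || (ch != EOF && isalnum(ch))) ? ch : EOF;
}

/* Copies one line into dst (capacity cap, cap >= 1) without the newline.
 * Stops at a newline, when rdchar reports EOF, or when dst is full.
 * Returns the number of characters stored. */
size_t read_line_via(int (*rdchar)(FILE *), FILE *in, char *dst, size_t cap)
{
    size_t len = 0;
    int ch;

    while (len + 1 < cap && (ch = rdchar(in)) != EOF && ch != '\n')
        dst[len++] = (char)ch;
    dst[len] = '\0';
    return len;
}
```

One consequence of this design is that the line reader cannot tell an invalid character from a real end of file, since both arrive as EOF; the caller would have to check feof() or ferror() afterwards.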

--
fix (vb.): 1. to paper over, obscure, hide from public view; 2.
to work around, in a way that produces unintended consequences
that are worse than the original problem. Usage: "Windows ME
fixes many of the shortcomings of Windows 98 SE". - Hutchison

Nov 14 '05 #6

Dan Pop

In <c9**********@news8.svr.pol.co.uk> "Malcolm" <ma*****@55bank.freeserve.co.uk> writes:

"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote
Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length.

Why not look at CB Falconer's "ggets()"?

Because this is the kind of wheel most programmers prefer to reinvent
on their own. I'm still using scanf and friends, since I have yet to
write an application where getting arbitrarily long input lines makes
sense.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #7

Eric Sosman

Dan Pop wrote:

In <c9**********@news8.svr.pol.co.uk> "Malcolm" <ma*****@55bank.freeserve.co.uk> writes:

"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote
Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length.

Why not look at CB Falconer's "ggets()"?

Because this is the kind of wheel most programmers prefer to reinvent
on their own. I'm still using scanf and friends, since I have yet to
write an application where getting arbitrarily long input lines makes
sense.

Arbitrarily long input lines are quite likely senseless.
But the problem's really the other side of the issue: Arbitrarily
*short* input lines -- meaning, "Input lines artificially truncated
to a length J. Random Programmer chose at compile time" -- are
not very sensible, either. (Off-topic aside: look up "curtation"
in "The Computer Contradictionary," or Google for "MOZDONG." These
are limitations on output rather than input, but the idea is similar.)

The utility of an fgets() substitute/wrapper/whatever isn't
that one is now free to read "lines" of umpty-skillion gigabytes,
but that one can stop worrying about the line length altogether.

--
Er*********@sun.com

Nov 14 '05 #8

Dan Pop

In <40**************@sun.com> Eric Sosman <Er*********@sun.com> writes:

Dan Pop wrote:
In <c9**********@news8.svr.pol.co.uk> "Malcolm" <ma*****@55bank.freeserve.co.uk> writes:
"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote

Hi all,

There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length.

Why not look at CB Falconer's "ggets()"?
Because this is the kind of wheel most programmers prefer to reinvent
on their own. I'm still using scanf and friends, since I have yet to
write an application where getting arbitrarily long input lines makes
sense.

Arbitrarily long input lines are quite likely senseless.
But the problem's really the other side of the issue: Arbitrarily
*short* input lines -- meaning, "Input lines artificially truncated
to a length J. Random Programmer chose at compile time" -- are
not very sensible, either.

Most of the time, there are perfectly sensible limits that can be imposed
on the user input. As a matter of fact, I have yet to see a
counterexample. And when the user input is obtained interactively, the
user can be warned of those limits, in the text prompting him to
provide the input.

Of course, there is always the option of treating a line longer than the
limit as erroneous input and completely rejecting it, rather than
truncating it. It is up to the programmer to decide what makes more sense
in the presence of nonsensical input...
The utility of an fgets() substitute/wrapper/whatever isn't
that one is now free to read "lines" of umpty-skillion gigabytes,
but that one can stop worrying about the line length altogether.

But it can be trivially abused into reading umpty-skillion gigabytes,
unless it imposes a limit ;-)
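
The "reject rather than truncate" option can be sketched with plain fgets() and a fixed buffer. The function name read_bounded_line() and its return codes are assumptions made for this illustration; lines containing embedded NUL characters are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (size bytes, size >= 2 and <= INT_MAX).
 * Returns 1 on success with the newline stripped, 0 if the line was too
 * long (the remainder is consumed and discarded), and -1 on end of file
 * or read error with no data. */
int read_bounded_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int ch;

    if (size < 2 || fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* whole line fit, newline included */
        return 1;
    }
    if (len + 1 < size)
        return 1;               /* last line of input, no newline */

    /* Buffer is full: the line either fit exactly or is too long. */
    ch = getc(in);
    if (ch == EOF || ch == '\n')
        return 1;
    while ((ch = getc(in)) != EOF && ch != '\n')
        ;                       /* discard the rest of the long line */
    buf[0] = '\0';
    return 0;
}
```

Memory use stays bounded no matter how long the input line is, at the cost of choosing the limit up front.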

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #9

Malcolm

"Dan Pop" <Da*****@cern.ch> wrote in message

> Most of the time, there are perfectly sensible limits that can be
> imposed on the user input. As a matter of fact, I have yet to see a
> counterexample. And when the user input is obtained interactively,
> the user can be warned of those limits, in the text prompting him to
> provide the input.

What limit should be imposed on a line of BASIC? Most human-readable code is
under 100 characters, but some code might be machine-generated, and someone
might add a long string as a single line.

> But it can be trivially abused into reading umpty-skillion gigabytes,
> unless it imposes a limit ;-)

My BASIC interpreter uses a recursive-descent parser, so very long
expressions could overflow the stack. There is actually a case for imposing
a line limit, though of course stack size / usage is hard to determine.

Nov 14 '05 #10

James Kanze

"Malcolm" <ma*****@55bank.freeserve.co.uk> writes:

|> "Dan Pop" <Da*****@cern.ch> wrote in message

|> > Most of the time, there are perfectly sensible limits that can be
|> > imposed on the user input. As a matter of fact, I have yet to see
|> > a counterexample. And when the user input is obtained
|> > interactively, the user can be warned of those limits, in the text
|> > prompting him to provide the input.

|> What limit should be imposed on a line of BASIC? Most human-readable
|> code is under 100 characters, but some code might be
|> machine-generated, and someone might add a long string as a single
|> line.

Take a look at the sources of any web page sometime. Many of the web
page editors put an entire paragraph on a single line.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Nov 14 '05 #11

red floyd

Eric Sosman <Er*********@sun.com> wrote in message news:<40**************@sun.com>...

[extremely long and well reasoned post redacted]
#define ERROR_MEMORY (-1)
#define ERROR_ILLEGAL_CHAR (-2)

Pointless parentheses.

Actually, no. It avoids errors in the case of (admittedly bad)

ch=ERROR_MEMORY;

Which would expand to

ch=-1;

I don't know if the standard formally made "=-" illegal, or kept the
K&R deprecation, but with older compilers, this might be an issue.

In general, I tend to make sure that any macros that include a - or a
~ are parenthesized, to avoid issues with macro expansion and operator
precedence.

Nov 14 '05 #12

Eric Sosman

red floyd wrote:

Eric Sosman <Er*********@sun.com> wrote in message news:<40**************@sun.com>...
[extremely long and well reasoned post redacted]
#define ERROR_MEMORY (-1)
#define ERROR_ILLEGAL_CHAR (-2)
Pointless parentheses.

Actually, no. It avoids errors in the case of (admittedly bad)

ch=ERROR_MEMORY;

Which would expand to

ch=-1;

Not in a Standard-conforming compiler. The preprocessor
sees the original line as containing four preprocessing tokens

ch
=
ERROR_MEMORY
;

After macro substitution there are five (if the parentheses
are removed from the definition):

ch
=
-
1
;

The two preprocessing tokens `=' and `-' are not magically
joined together into a `=-' preprocessing token. It is often
said that the preprocessor performs textual substitution, but
this is just loose talk: The preprocessor operates on text
that has already been tokenized. You can test this by
trying to construct a "modern" two-character operator:

#define PLUS +
int i = 0;
i PLUS= 42;

... and scrutinizing the error messages you get.

Of course, your concern is with pre-Standard compilers,
some of whose preprocessors did in fact work with text even
though (IIRC) K&R said they shouldn't. The resulting bugs
were occasionally useful, as in the much-abused

#define PASTE(x,y) x/**/y

... for which the C89 Standard had to invent a new notation.
I don't know if the standard formally made "=-" illegal, or kept the
K&R deprecation, but with older compilers, this might be an issue.
When I first encountered C in the late 1970's, these
operators had already been respelled to their current forms.
The K&R of that vintage mentioned that some "older" compilers
might still be found in the wild somewhere. Now, thirty-plus
years further along, such compilers deserve a stronger word
than just "older."

Let's put it this way: What other accommodations do you
make on behalf of these "older than older" compilers? Do you
avoid using prototypes? Do you avoid `unsigned char'? Do you
avoid `long double', or `long'? Do you cast the result of
malloc()? Do you refrain from using `size_t' and `time_t'?
Do you steer clear of <stdarg.h>? Do you ... well, never mind:
The list is already long, and the point is already made.
In general, I tend to make sure that any macros that include a - or a
~ are parenthesized, to avoid issues with macro expansion and operator
precedence.

Well, it's your choice. It's certainly harmless, and it
may help keep strong the habit of parenthesizing value-producing
macros in general. But in this case it's not necessary and hasn't
been necessary for going on three decades.
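
The token-level behavior described above can be checked with a small sketch. The macro and function names are hypothetical, and the comments assume a Standard-conforming compiler.

```c
#define BARE_ERROR -1        /* hypothetical macro without parentheses */
#define PAREN_ERROR (-1)     /* same value, parenthesized */

int assign_bare(void)
{
    int ch;
    ch=BARE_ERROR;           /* tokens: ch  =  -  1  ;  (never "=-") */
    return ch;
}

int negate_bare(void)
{
    return -BARE_ERROR;      /* tokens: -  -  1  (never "--") */
}
```

Both functions behave the same with either macro, which is the sense in which the parentheses are unnecessary for a plain negative constant.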

--
Er*********@sun.com

Nov 14 '05 #13

Dan Pop

In <c9**********@news5.svr.pol.co.uk> "Malcolm" <ma*****@55bank.freeserve.co.uk> writes:

"Dan Pop" <Da*****@cern.ch> wrote in message
Most of the time, there are perfectly sensible limits that can be
imposed on the user input. As a matter of fact, I have yet to see a
counterexample. And when the user input is obtained interactively, the
user can be warned of those limits, in the text prompting him to
provide the input.

What limit should be imposed on a line of BASIC? Most human-readable code is
under 100 characters, but some code might be machine-generated, and someone
might add a long string as a single line.

The only reason for reading a whole line of BASIC code in a buffer I can
imagine is for implementing the interactive line editing capability of a
simple minded BASIC interpreter.

If this is a small system, like the typical ones using such BASIC
interpreters, I'd use the whole memory available in the system and
give an error if it still doesn't fit (a common situation when the
system is running out of memory: my Spectrum used to beep when I wanted
to edit a line too large to be copied in the remaining free memory).

On a larger system, I'd use something like a 10..100k buffer and give an
error if the line doesn't fit. This is not a context where silent
truncation makes any sense.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #14

Dan Pop

In <40**************@sun.com> Eric Sosman <Er*********@sun.com> writes:

When I first encountered C in the late 1970's, these
operators had already been respelled to their current forms.
The K&R of that vintage mentioned that some "older" compilers
might still be found in the wild somewhere.
OTOH, when I first encountered VAX C, in the late 1980's, it was still
supporting the anachronic operators mentioned in K&R1 as things of the
past... It was after a long debugging session, trying to understand why
the compiler generates the "wrong" code for "i=-1;" (I was fluent in VAX
assembly and the assembly output of the compiler simply didn't make any
sense) that I realised that white space is really my friend when not
coerced to the rigours of fixed form FORTRAN.
Now, thirty-plus
years further along, such compilers deserve a stronger word
than just "older."

Yet, the only qualifier we can apply to them is pre-ANSI...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #15

Paul D. Boyle

Eric Sosman <Er*********@sun.com> wrote:
: Paul D. Boyle wrote:
:> There was a recent thread in this group which talked about the
:> shortcomings of fgets(). I decided to try my hand at writing a

: First criticism: The function does too much. This is, of
: course, a matter of taste, but if the goal is "a replacement
: for fgets()" I think the validate() business is extraneous.
: (Even the `sz' parameter raises my eyebrows a little, albeit
: not a lot.)

: IMHO, a low-level library function should do one thing,
: do it well, and do it in a manner that facilitates combining
: it with other functions to create grander structures. Or as
: my old fencing coach used to admonish me when I got overenthused
: with tricky multiple-feint combinations: "Keep It Simple, Stupid."

I appreciate your criticisms. I especially agree that the 'int
(*validate)(int)' validation is, at best, misplaced to be used by this
low level function. I do like, however, the idea of having some
way of validating the input which can be simply plugged in without
messing with the main flow of the program.

: far towards the baroque. In effect, USER_DIALOG_H and the
: associated macros become part of the function's specification.
: That specification now encompasses one return value encoding
: three distinguishable states, four function arguments (two
: with usage rules not expressible to the compiler), and five
: macros. That strikes me as *way* too much for "a replacement
: for fgets()."

When I was writing the function my idea changed from returning
a size_t of the number of bytes read to an error-code-based
return value. I agree that this was muddled by "changing horses
in midstream."
: ("All right, Smarty Pants, how would *you* do it?")

: Fair enough. Everybody, it seems, writes an fgets()
: replacement eventually, and here's the .h text for mine:

: char *getline(FILE *stream);

I did write another function with this prototype. Based on
how I will combine this with other input taking and parsing
functions, I am thinking of some thing like:

char *get_line( FILE *in, size_t *sz );

/* where 'sz' will "return" the number of bytes written */
or
size_t get_line( FILE *in, char **line );

I haven't decided yet which one to go with though.
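
A rough sketch of the first prototype, to make the trade-off concrete. The body is an assumption, not the poster's code: it returns a malloc'd line without the newline, and it uses NULL for both end of input and allocation failure. The second prototype would need some other way to report those conditions, because a size_t return cannot hold a negative error code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd line without its newline (caller frees), or NULL at
 * end of input or on allocation failure. *sz receives the line length. */
char *get_line(FILE *in, size_t *sz)
{
    size_t len = 0, cap = 64;   /* 64 is an arbitrary starting size */
    char *line = malloc(cap);
    int ch;

    *sz = 0;
    if (line == NULL)
        return NULL;

    while ((ch = getc(in)) != EOF && ch != '\n') {
        if (len + 1 == cap) {   /* only the NUL slot is left: grow */
            char *tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
        line[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {    /* nothing left to read */
        free(line);
        return NULL;
    }
    line[len] = '\0';
    *sz = len;
    return line;
}
```

An empty line comes back as a non-NULL pointer to "" with *sz set to 0, so it stays distinguishable from end of input.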
:> #define ERROR_MEMORY (-1)
:> #define ERROR_ILLEGAL_CHAR (-2)

: Pointless parentheses.

I feel embarrassed. Shame is a good teacher. :-)

Again, thank-you for pointing out all the other shortcomings of
my function. I'll try again taking into account your feedback.

Regards,
Paul

--
Paul D. Boyle
bo***@laue.chem.ncsu.edu
North Carolina State University
http://www.xray.ncsu.edu

Nov 14 '05 #16

Dan Pop

In <c9**********@uni00nw.unity.ncsu.edu> "Paul D. Boyle" <bo***@laue.chem.ncsu.edu> writes:

Again, thank-you for pointing out all the other shortcomings of
my function. I'll try again taking into account your feedback.

OTOH, don't let other people tell you how to invent *your* wheel.
After all, it's you who's going to ride on it, not them...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #17

Malcolm

"Dan Pop" <Da*****@cern.ch> wrote in message

Again, thank-you for pointing out all the other shortcomings of
my function. I'll try again taking into account your feedback.

OTOH, don't let other people tell you how to invent *your*
wheel.
After all, it's you who's going to ride on it, not them...

But discussing design decisions is a sensible thing to do. The problem is
that the software engineering side of programming is so immature that you
will get several opinions on the best way to read a line of text.

Nov 14 '05 #18

kal

Da*****@cern.ch (Dan Pop) wrote in message news:<c9**********@sunnews.cern.ch>...

In <c9**********@uni00nw.unity.ncsu.edu> "Paul D. Boyle" <bo***@laue.chem.ncsu.edu> writes:
Again, thank-you for pointing out all the other shortcomings of
my function. I'll try again taking into account your feedback.

OTOH, don't let other people tell you how to invent *your* wheel.
After all, it's you who's going to ride on it, not them...

Dan

1. If one or more functions already available will do the job then
use those functions. As Hippocrates said: "Art is long, life short,
experience treacherous, judgement difficult... ." Sorry, I have
forgotten the full quotation.

2. When writing general purpose functions, design it so that it
does the minimum possible extent of work. The simpler the function
the wider will be its use. The principle to remember is: "if it
is not necessary for a function to do something then it is necessary
that the function does not do that thing."

Nov 14 '05 #19

Paul Hsieh

"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote:

Eric Sosman <Er*********@sun.com> wrote:
: IMHO, a low-level library function should do one thing,
: do it well, and do it in a manner that facilitates combining
: it with other functions to create grander structures. Or as
: my old fencing coach used to admonish me when I got overenthused
: with tricky multiple-feint combinations: "Keep It Simple, Stupid."

I appreciate your criticisms. I especially agree that the 'int
(*validate)(int)' validation is, at best, misplaced to be used by this
low level function.
Uh ... no. This was probably the best motivated part of your entire
routine. *fgetc()* is low level. *fread()* is low level. A function
which inputs a line and doesn't buffer overflow is *NOT* low level.
See my other comments here:

http://groups.google.com/groups?selm...&output=gplain

The reason why fgets is flawed in the first place was because the
original designers of the C language made the mistake of thinking
input was low level.
[...] I do like, however, the idea of having some way of validating the input
which can be simply plugged in without messing with the main flow of the
program.
Yes. But you need more context to give it the possibility of being
able to parse. Text input almost always leads to a requirement for
parsing. Your intuition is correct.
: far towards the baroque. In effect, USER_DIALOG_H and the
: associated macros become part of the function's specification.
: That specification now encompasses one return value encoding
: three distinguishable states, four function arguments (two
: with usage rules not expressible to the compiler), and five
: macros. That strikes me as *way* too much for "a replacement
: for fgets()."

When I was writing the function my idea changed from returning
a size_t of the number of bytes read to an error-code-based
return value. I agree that this was muddled by "changing horses
in midstream."
So change it to a long. Best of both worlds.
[...] Based on
how I will combine this with other input taking and parsing
functions, I am thinking of some thing like:

char *get_line( FILE *in, size_t *sz );

/* where 'sz' will "return" the number of bytes written */
And what if the input size is infinite? (Imagine stdin, and pipe in
something that simply never issues an EOF?)
:> #define ERROR_MEMORY (-1)
:> #define ERROR_ILLEGAL_CHAR (-2)

: Pointless parentheses.

I feel embarrassed. Shame is a good teacher. :-)

Actually, the "-" sign can be incorrectly interpreted as a subtraction
as a silent error. I'm sure that in your case that they are
unnecessary but go look at the include files for any C compiler --
negative numbers are usually always enclosed in parentheses.
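
One contrived case where the parentheses do change the outcome is a dropped operator. This sketch is an assumption about the kind of silent error meant here; the names are hypothetical.

```c
#define BARE_ERROR -1   /* hypothetical macro without parentheses */

/* Typo: an operator such as == or + is missing between n and the macro.
 * With the bare macro this still compiles, as the subtraction n - 1. */
int typo_bare(int n)
{
    return n BARE_ERROR;
}

/* With "#define PAREN_ERROR (-1)", the same typo "n PAREN_ERROR" expands
 * to "n (-1)", which is rejected at compile time because n is not a
 * function. */
```

So the parenthesized form turns this particular mistake into a compile-time error, although it makes no difference to correctly written expressions.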

--
Paul Hsieh
http://www.pobox.com/~qed/userInput.html
http://bstring.sf.net/

Nov 14 '05 #20

Dan Pop

In <a5**************************@posting.google.com> k_*****@yahoo.com (kal) writes:

Da*****@cern.ch (Dan Pop) wrote in message news:<c9**********@sunnews.cern.ch>...
In <c9**********@uni00nw.unity.ncsu.edu> "Paul D. Boyle" <bo***@laue.chem.ncsu.edu> writes:
>Again, thank-you for pointing out all the other shortcomings of
>my function. I'll try again taking into account your feedback.

OTOH, don't let other people tell you how to invent *your* wheel.
After all, it's you who's going to ride on it, not them...

Dan

1. If one or more functions already available will do the job then
use those functions. As Hippocrates said: "Art is long, life short,
experience treacherous, judgement difficult... ." Sorry, I have
forgotten the full quotation.

2. When writing general purpose functions, design it so that it
does the minimum possible extent of work. The simpler the function
the wider will be its use. The principle to remember is: "if it
is not necessary for a function to do something then it is necessary
that the function does not do that thing."

None of these applies here.

1. There is no standard function (or set of functions that can simply be
combined) that does the job.

2. We're talking about a function with a very specific purpose. The hard
part is not the implementation, but the design. That's why most
programmers prefer to design it themselves, rather than use someone else's
design. I don't like any of the designs I've seen until now, so if I
needed such a function, I'd certainly implement my own design...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #21
