471,620 Members | 1,383 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,620 software developers and data experts.

Regarding: fgets() replacement

"Paul D. Boyle" <bo***@laue.chem.ncsu.edu> wrote:
There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length. I know that the better programmers in this group could
write a more robust function, but here is my shot at it anyway.
I would appreciate people's comments on my fget_line() code below
(usage example included). Any constructive criticism welcome regarding
logic, design, style, etc. Thanks.

Your algorithm can end up calling realloc O(lengthof(input)) times.
This will end up shredding some of the existing heaps for some widely
deployed C compilers in existence today. I.e., just input enough
stuff and malloc/realloc will start grinding to a halt. Its not hard
to write up test scenarios to see this happen for yourself with 2 of
the most popular Windows C compilers. Using double the amount of
memory at each step is a simple way to avoid this problem (it will
also improve performance by quite a bit.)

Your function also only supports the one semantic of reallocating the
memory required for whatever input is supplied. This isnt going to
work very well in systems like Linux/UNIX which support "yes".
(Consider "yes | ./a.out") The time required to fill in a whole
swap-file's worth of data is pretty long; so this is not a marginal
situation if a hacker is trying to slow down your system, for example.

Your function doesn't make any distinction between binary or text
files. The thing is -- if you read a text file in a binary mode, you
can receive extraneous characters (like '\0's that would not be read
from a text file). That's all fine since you return the length so the
set of characters read (with the exception of the last one that might
be rejected by "validate") but, just like fgets(), that puts your
function at odds with every other function in the standard C library
which always disallows internal characters of char * strings to be
'\0' (since that's the end-of-string marker).

The validate() function you specify is a lot weaker than it could and
should be. First of all, if the input is fairly large, why not build
a table indexed by all characters and just check the table instead?
It would be a lot faster. Ok, the reason is because a callback
function can process a lot more context -- for example it might be
possible to parse simple grammars via state machine. The way you do
this is by passing an opaque context parameter to it (passed in to
fget_line()) as well as the character index. So it should be:

int (*validate)(int character, int idx, void * context)

And add in the additional parameter void * context to fget_line().

There also appears to be a flaw in your program that allows
local_buffer to be dereferenced even if its NULL; i.e., if the first
realloc() fails.

If you would like to see how I solved this same problem in a more
general way see:


Paul Hsieh
Nov 14 '05 #1
0 1490

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

5 posts views Thread by Rob Somers | last post: by
7 posts views Thread by William L. Bahn | last post: by
20 posts views Thread by Paul D. Boyle | last post: by
35 posts views Thread by David Mathog | last post: by
32 posts views Thread by FireHead | last post: by
42 posts views Thread by mellyshum123 | last post: by
285 posts views Thread by Sheth Raxit | last post: by
1 post views Thread by XIAOLAOHU | last post: by
reply views Thread by leo001 | last post: by
1 post views Thread by ZEDKYRIE | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.