Question regarding fgets and new lines

I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Nov 24 '06 #1
me**********@yahoo.ca wrote:
> I need to read in a comma separated file, and for this I was going to
> use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
> I noticed that the document said:
>
> "Reads characters from stream and stores them in string until (num -1)
> characters have been read or a newline or EOF character is reached,
> whichever comes first."
>
> My question is that if it stops at a new line character (LF?) then how
> does one read a file with multiple new line characters?
You call it multiple times, until you've read the entire file.
> Another question. The syntax is:
>
> char * fgets (char * string , int num , FILE * stream);
>
> but you have to allot a size for the string before this. Would you just
> use the same num as used in the fgets? So char stringexample[num] ?
Yes. You're essentially telling fgets: "OK, I've set this much space
aside for you to read into, give me that many characters (minus 1 for
the NUL terminator) or the first line, whichever comes first."
--
Clark S. Cox III
cl*******@gmail.com
Nov 24 '06 #2
me**********@yahoo.ca wrote:
> I need to read in a comma separated file, and for this I was going to
> use fgets.
You may be better off parsing such files one character at a time.
> I was reading about it at http://www.cplusplus.com/ref/ and
> I noticed that the document said:
>
> "Reads characters from stream and stores them in string until (num -1)
> characters have been read or a newline or EOF character is reached,
> whichever comes first."
>
> My question is that if it stops at a new line character (LF?) then how
> does one read a file with multiple new line characters?
By making multiple calls to fgets().

The problem though is cases like Excel which allow newlines in individual
field records. Such fields are 'quoted' with a leading double quote ("),
and an embedded double quote is escaped as two double quotes. Hence my
comment that you may be better off with a simple state machine parsing
one character at a time.
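For illustration, the state machine described above might be sketched like this. It is a minimal example and not code from anyone in this thread: the name csv_read_field, the three states, and the silent truncation of over-long fields are all assumptions made for the sketch, and it treats a bare '\n' as the record separator (no CR LF handling).

```c
#include <stdio.h>
#include <stddef.h>

/* Parser states for one CSV field (names are hypothetical). */
enum csv_state { CSV_PLAIN, CSV_QUOTED, CSV_QUOTE_SEEN };

/*
 * Read one field from fp into buf (capacity cap, always NUL-terminated;
 * characters that do not fit are dropped in this sketch).
 * Returns ',' if more fields follow in the record, '\n' at the end of a
 * record, or EOF when the input ran out.
 */
int csv_read_field(FILE *fp, char *buf, size_t cap)
{
    enum csv_state state = CSV_PLAIN;
    size_t len = 0;
    int c;

    if (cap == 0)
        return EOF;

    while ((c = getc(fp)) != EOF) {
        if (state == CSV_PLAIN) {
            if (c == ',' || c == '\n')
                break;                  /* field (or record) ends here */
            if (c == '"' && len == 0) {
                state = CSV_QUOTED;     /* leading quote opens a quoted field */
                continue;
            }
        } else if (state == CSV_QUOTED) {
            if (c == '"') {
                state = CSV_QUOTE_SEEN; /* closing quote, or first half of "" */
                continue;
            }
            /* anything else, including '\n', is part of the field */
        } else {                        /* CSV_QUOTE_SEEN */
            if (c == ',' || c == '\n')
                break;                  /* the quote really closed the field */
            /* "" means one literal quote; other text is kept as plain text */
            state = (c == '"') ? CSV_QUOTED : CSV_PLAIN;
        }
        if (len + 1 < cap)
            buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return c;
}
```

A caller would invoke it repeatedly, starting a new record each time it returns '\n' and stopping when it returns EOF. Note that a newline inside a quoted field stays in the field, which is exactly the case a line-at-a-time fgets() loop gets wrong.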
> Another question. The syntax is:
>
> char * fgets (char * string , int num , FILE * stream);
>
> but you have to allot a size for the string before this. Would you just
> use the same num as used in the fgets? So char stringexample[num] ?
Yes. Sample use is...

char line[256];
while (fgets(line, sizeof line, stdin))
{
/* ... */
}

Though more serious programs will roll their own fgets() that dynamically
allocates storage for a line, rather than fixing the size of the buffer.
[Such programs still need to be mindful of the idiots that will pump a
large \n-free binary file through stdin.]

--
Peter

Nov 24 '06 #3

Peter Nilsson wrote:
> You may be better off parsing such files one character at a time.

I guess maybe using fgetc?

Nov 24 '06 #4
me**********@yahoo.ca wrote:
> I need to read in a comma separated file, and for this I was going to
> use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
> I noticed that the document said:
>
> "Reads characters from stream and stores them in string until (num -1)
> characters have been read or a newline or EOF character is reached,
> whichever comes first."
>
> My question is that if it stops at a new line character (LF?) then how
> does one read a file with multiple new line characters?
One line at a time. Read a line, process it as you see fit,
and then proceed to the next line. Lather, rinse, repeat.
> Another question. The syntax is:
>
> char * fgets (char * string , int num , FILE * stream);
>
> but you have to allot a size for the string before this. Would you just
> use the same num as used in the fgets? So char stringexample[num] ?
Yes. The problem of how big to make `num' can be a
vexing one: If you make it 80 you can handle lines of up
to 78 "payload" characters plus a newline and a '\0', but
if the input stream supplies a longer line you've got a
bit of a problem. You could make `num' 1000000, but do you
really want to spend a megabyte as insurance against long
lines? (And there's still the nagging possibility that the
input might hold a 1000001-character line ...)

One plausible way to proceed is to make `num' moderately
larger than the longest line you expect to encounter, call
fgets(), and then check whether the buffer contains a '\n'.
If it does not (and if neither end-of-input nor an I/O error
occurred, which you can test with feof() and ferror()), then
the file contains a longer-than-anticipated line. The first
part of that line has been stored in the buffer, and the tail
end is still "pending," available to be read.
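The check described above could look roughly like this. It is only a sketch: the function name read_line_checked and the three status values are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum line_status { LINE_COMPLETE, LINE_TOO_LONG, LINE_NONE };

/*
 * Read one line into buf with fgets() and classify the result.
 * LINE_COMPLETE: a whole line (or the final, unterminated line) was read.
 * LINE_TOO_LONG: buf is full and the tail of the line is still pending.
 * LINE_NONE:     nothing was read (end of input or I/O error).
 */
enum line_status read_line_checked(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return LINE_NONE;
    if (strchr(buf, '\n') != NULL)
        return LINE_COMPLETE;
    if (feof(fp) || ferror(fp))
        return LINE_COMPLETE;   /* input ended before a newline was seen */
    return LINE_TOO_LONG;
}
```

One edge case to be aware of: if the last line of the file has no newline and exactly fills the buffer, this reports LINE_TOO_LONG and the following call reports LINE_NONE, because fgets() stops before it has a chance to notice end-of-file.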

What to do next? If you were expecting lines of up to
around 100 characters and you used a 1000-character buffer
just to be on the safe side and you ran into a line longer
than 1000 characters -- more than ten times what you thought
the maximum length would be -- you might well conclude that
there's something wrong with the input: Maybe the file you've
been handed really isn't a CSV file at all. It would be
perfectly plausible to blurt out an error message and stop
processing, or to blurt an error and throw the offending line
away (remember to "drain" the unread tail by reading until
you get '\n' or EOF).
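"Draining" the unread tail takes only a few lines. A minimal sketch, with drain_line being a name invented for the example:

```c
#include <stdio.h>

/*
 * Discard the rest of the current line.
 * Returns the last value read: '\n' if a newline was found,
 * EOF if the input ended (or an error occurred) first.
 */
int drain_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;   /* throw the character away */
    return c;
}
```

After it returns '\n', the next fgets() call starts at the beginning of the following line.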

If you've used malloc() to obtain memory for the buffer,
another possibility is to use realloc() to make the buffer
larger (preserving the already-read portion) and call fgets()
again to read the tail of the line into the tail of the expanded
buffer. If necessary, you can expand again and again until you
finally get a big enough buffer (or run out of memory). In my
opinion it's a little easier to implement this scheme by using
getc() to read a character at a time instead of using fgets()
to read a batch of characters, but either way it's fairly
straightforward.
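The realloc() scheme with fgets() might be sketched as below. This is an illustration under stated assumptions, not a polished library routine: read_whole_line is a hypothetical name, the starting size of 64 and the doubling policy are arbitrary, and the input is assumed not to contain embedded '\0' characters (strlen() is used to find how much fgets() stored).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read a whole line of any length. Returns a malloc()ed string without the
 * trailing newline, or NULL at end of input, on I/O error, or when memory
 * runs out. The caller owns the result and must free() it.
 */
char *read_whole_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* Each pass reads into the unused tail of the buffer. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: strip the newline */
            return buf;
        }
        if (len + 1 == cap) {           /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (len > 0 && !ferror(fp))
        return buf;                     /* final line with no newline */
    free(buf);
    return NULL;
}
```

A real program would probably also cap the maximum size, so that a huge newline-free file cannot exhaust memory.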

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 24 '06 #5
me**********@yahoo.ca wrote:
> I need to read in a comma separated file, and for this I was going to
> use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
> I noticed that the document said:
>
> "Reads characters from stream and stores them in string until (num -1)
> characters have been read or a newline or EOF character is reached,
> whichever comes first."
>
> My question is that if it stops at a new line character (LF?) then how
> does one read a file with multiple new line characters?
Well presumably you would just read line after line. (fgets() can be
called iteratively.)
> Another question. The syntax is:
>
> char * fgets (char * string , int num , FILE * stream);
>
> but you have to allot a size for the string before this. Would you just
> use the same num as used in the fgets? So char stringexample[num] ?
Somehow you are just supposed to know the length. You have to guess --
usually you just overestimate or something like that. If it's too small
then you get truncated results. Yeah, it doesn't make much more sense
to me either. This is just a design stupidity of the C language.

You can save yourself a lot of grief and just download The Better
String Library and its examples. It's open source and includes an Excel
compatible CSV reader. You can get it from here:

http://bstring.sf.net/

It also includes more logical line reading functions like bgets which
you use via something like:

bstring b = bgets ((bNgetc) fgetc, stdin, '\n');

Which will read a line of text from the standard input into the bstring
b which will be sized as required. Or if you just want to deal with
the whole thing at once:

struct bstrList * sl=bsplit(b=bread ((bNread)fread,stdin),'\n');

Which will read the whole file into the bstring b, and split it into
individual sub-strings separated by '\n's stored in sl.

Of course, as I said, neither of these things are quite correct for
parsing CSV that can include quotation, however the examples give a
mechanism for this:

struct bStream * s = bsopen ((bNread) fread, stdin);
struct CSVStream * csv = parseCSVOpen (s);
struct CSVEntry entry; /* contents, mode */
/*...*/
parseCSVNextEntry (&entry, csv); /* Grab an entry */
/*...*/
parseCSVClose (csv);

It's fast and correct.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 24 '06 #6
Eric Sosman wrote:
me**********@yahoo.ca wrote:
>I need to read in a comma separated file, and for this I was going
to use fgets. I was reading about it at http://www.cplusplus.com/ref/
and I noticed that the document said:

"Reads characters from stream and stores them in string until
(num -1) characters have been read or a newline or EOF character
is reached, whichever comes first."
.... snip ...
>
If you've used malloc() to obtain memory for the buffer,
another possibility is to use realloc() to make the buffer
larger (preserving the already-read portion) and call fgets()
again to read the tail of the line into the tail of the expanded
buffer. If necessary, you can expand again and again until you
finally get a big enough buffer (or run out of memory). In my
opinion it's a little easier to implement this scheme by using
getc() to read a character at a time instead of using fgets()
to read a batch of characters, but either way it's fairly
straightforward.
Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 24 '06 #7
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
>Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>
"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Best regards,
Roland Pibinger
Nov 24 '06 #8
Roland Pibinger wrote:
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries).
For two simple examples, the style's used by POSIX's strdup, and GNU's
asprintf. I'd say both are rather well-known.
I'd be reluctant to use it in my
programs.
That, of course, is your right.

Nov 24 '06 #9
On Fri, 24 Nov 2006 08:50:13 GMT, rp*****@yahoo.com (Roland Pibinger)
wrote:
>On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
>>Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.
Isn't strdup posix and isn't that well known?
Remove del for email
Nov 24 '06 #10
On 24 Nov 2006 05:00:37 -0800, <truedfx@...com> wrote:
>Roland Pibinger wrote:
>This programming style is not used by the Standard C library (and
other well-known libraries).

For two simple examples, the style's used by POSIX's strdup, and GNU's
asprintf. I'd say both are rather well-known.
Guess why there is no strdup (and no asprintf) in the ISO C Standard?

Best regards,
Roland Pibinger
Nov 24 '06 #11
rp*****@yahoo.com (Roland Pibinger) wrote:
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries).
Isn't it? I can't say that I'm unfamiliar with it.
I'd be reluctant to use it in my programs.
Then you're going to have a right hassle implementing con- and
destructors for, e.g., linked lists.

Richard
Nov 24 '06 #12
On 23 Nov 2006 23:17:06 -0800, websnarf@...com wrote:
>mellyshum123@...ca wrote:
>but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

Somehow you are just supposed to know the length. You have to guess --
usually you just overestimate or something like that. If its too small
then you get truncated results.
Not necessarily. You only need to know if you are done (if the line is
entirely read) or not. If not, read again until the rest of the line
is read. Your code basically becomes a loop. Just assume that the
buffer is always too small to read the line in one pass.
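As an illustration of the "assume the buffer is always too small" loop, here is a sketch that measures the longest line in a file through a deliberately tiny buffer. The function name longest_line and the 16-byte chunk size are choices made for the example only; it works here because the task does not need the whole line in memory at once.

```c
#include <stdio.h>
#include <string.h>

/* Length of the longest line in fp (newline excluded), read in small chunks. */
size_t longest_line(FILE *fp)
{
    char chunk[16];
    size_t current = 0, longest = 0;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        int done = (n > 0 && chunk[n - 1] == '\n');  /* line fully read? */

        current += done ? n - 1 : n;
        if (current > longest)
            longest = current;
        if (done)
            current = 0;        /* the next chunk starts a new line */
    }
    return longest;
}
```

When the whole line does have to be kept, the chunks need to be accumulated somewhere, which brings back the question of buffer management.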
>Yeah, it doesn't make much more sense
to me either. This is just a design stupidity of the C language.
Live with, not against, your limits.
>You can save yourself a lot of grief and just download The Better
String Library and its examples.
Best regards,
Roland Pibinger
Nov 24 '06 #13
Roland Pibinger wrote:
On 24 Nov 2006 05:00:37 -0800, <truedfx@...com> wrote:
Roland Pibinger wrote:
This programming style is not used by the Standard C library (and
other well-known libraries).
For two simple examples, the style's used by POSIX's strdup, and GNU's
asprintf. I'd say both are rather well-known.

Guess why there is no strdup (and no asprintf) in the ISO C Standard?
See the C99 rationale, section 0.

Nov 24 '06 #14
Roland Pibinger wrote:
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
>Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of
assigned storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.
Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 24 '06 #15

<me**********@yahoo.ca> wrote in message
news:11**********************@45g2000cws.googlegroups.com...
I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?
OK, I've read the other responses to this and they were...shall we
say, regrettable? Except for "pathological" cases, here's all you
need to do:

#define LINEMAX 512

char csv_line[LINEMAX];
FILE *csv_fptr;

/* get or create a string here that is the path to the CSV file */

if((csv_fptr=fopen(csv_filepath,"r"))!=NULL) {

    while((fgets(csv_line,LINEMAX,csv_fptr))!=NULL) {

        /* you can parse out the data from each csv_line right here */

    }

    fclose(csv_fptr);
}

else printf("\nCouldn't open %s",csv_filepath);

And you're done! Something basically exactly like this is done
like a trillion times a day without incident or regret...

Yes, you do have to declare a character array that is bigger than the
longest line you expect to encounter (I generally use "512" as my "magic
number" for that), and fgets() is one of those file-reading functions that
keeps track of a "pointer" to a position in the file, so every time you use
it, it starts reading at the position where it left off the last time it was
called...this is why it is easy to use it in a loop like above. (If needed,
you also can use fseek() and rewind() to move the "pointer" around the file
to positions you want to read, and ftell() to find out where it currently is.)
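To illustrate how the position functions combine with fgets(), here is a small sketch that looks at the next line without consuming it. The name peek_line is invented for the example, and it only works on streams that can seek (regular files, not pipes or terminals).

```c
#include <stdio.h>

/*
 * Read the next line into buf without consuming it: remember the position
 * with ftell(), read with fgets(), then go back with fseek().
 * Returns buf, or NULL if nothing could be read or the stream cannot seek.
 */
char *peek_line(char *buf, int size, FILE *fp)
{
    long pos = ftell(fp);
    char *line;

    if (pos == -1L)
        return NULL;                    /* stream is not seekable */
    line = fgets(buf, size, fp);
    if (fseek(fp, pos, SEEK_SET) != 0)  /* restore the saved position */
        return NULL;
    return line;
}
```

A following fgets() on the same stream returns the same line again, because the file position was put back where it started.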

---
William Ernest Reid

Nov 24 '06 #16
CBFalconer <cb********@yahoo.com> writes:
Roland Pibinger wrote:
>On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
>>Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of
assigned storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.
Exactly. For any resource, there needs to be a way to allocate it and
a way to release it. For raw chunks of memory, the allocation and
deallocation routines are "malloc" and "free". For stdio streams,
they're called "fopen" and "fclose". For the ggets interface (if I
understand it correctly), they're called "ggets" and "free".

It might not have been a bad idea to have a special purpose
deallocation, say "ggets_release"; it would be a simple wrapper around
"free", but it would leave room for more complex actions in a future
version. But I don't think it's really necessary.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 24 '06 #17
On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
>Roland Pibinger wrote:
>This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Why not?
Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.
>If you malloc something, you know you need to free it
when no longer needed.
Ok, that's symmetric.
>If you use ggets, you know you need to free
the line when no longer needed.
That's unsymmetric. The user can easily forget the 'free'.
It's all about style. Maybe someone can tell the story why strdup was
excluded from the C Standard (I'm not a C historian and don't want to
become one).

Best regards,
Roland Pibinger
Nov 24 '06 #18
rp*****@yahoo.com (Roland Pibinger) writes:
On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
>>Roland Pibinger wrote:
>>This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Why not?

Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.
>>If you malloc something, you know you need to free it
when no longer needed.

Ok, that's symmetric.
>>If you use ggets, you know you need to free
the line when no longer needed.

That's unsymmetric. The user can easily forget the 'free'.
malloc() allocates; free() frees.

ggets() allocates; free() frees.

It all seems sufficiently symmetric to me. The user has to remember
the free() in either case.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 24 '06 #19
Roland Pibinger wrote:
On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
>Roland Pibinger wrote:
>>This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.

Why not?

Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.
They don't work anyhow for anything other than the simplest code.
>
>If you malloc something, you know you need to free it
when no longer needed.

Ok, that's symmetric.
Oh? I would think you would be #defining unmalloc free. How are
you handling freeing after realloc, or calloc?
>
>If you use ggets, you know you need to free
the line when no longer needed.

That's unsymmetric. The user can easily forget the 'free'. It's
all about style. Maybe someone can tell the story why strdup was
excluded from the C Standard (I'm not a C historian and don't
want to become one).
Well, implementing strdup is much simpler than implementing ggets.
There also isn't a dangerous version (e.g. gets) to be replaced.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 24 '06 #20
CBFalconer wrote:
Roland Pibinger wrote:
>>On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:

>>>Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of
assigned storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.


Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.
FWIW, I took a somewhat different tack in my own gets()
replacement (I guess everybody writes one, sooner or later).
Mine follows the precedent of things like getenv(): the returned
pointer is only valid until the next call, when the buffer it
points to may be overwritten and/or moved or freed.

This approach has some disadvantages: for example, it would
be a pain to make it thread-friendly. On the other hand, it
localizes all the memory management inside the function, and the
signature `char *getline(FILE*)' is simple enough that even I can
remember it. (The older and feebler my gray cells get, the more
I value simplicity ...)
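For readers who want to see the shape of such a function, here is a rough sketch of the "valid until the next call" approach. It is not the poster's code, which was not published in the thread. The name next_line is used instead of getline to avoid clashing with the POSIX function of that name, and the initial size and doubling policy are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Return the next line from fp without its newline, or NULL at end of input,
 * on I/O error, or when memory runs out. The result points into a buffer
 * owned by this function and is valid only until the next call.
 * Not thread-safe: the buffer is shared static state.
 */
char *next_line(FILE *fp)
{
    static char *buf = NULL;
    static size_t cap = 0;
    size_t len = 0;
    int c;

    if (buf == NULL) {                      /* first call: create the buffer */
        buf = malloc(64);
        if (buf == NULL)
            return NULL;
        cap = 64;
    }
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep room for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL)
                return NULL;                /* old buffer stays usable later */
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(fp)))
        return NULL;                        /* nothing left, or read failed */
    buf[len] = '\0';
    return buf;
}
```

The caller never frees anything, but must copy the line if it needs to survive the next call.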

I don't have a convenient place to post the code, but I'll be
happy to mail it to anyone who's interested.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 24 '06 #21
Eric Sosman said:

<snip>
>
I don't have a convenient place to post the code, but I'll be
happy to mail it to anyone who's interested.
I'll be glad to host it for you if you wish. Email works, I think. (If not,
please let me know!!)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 24 '06 #22
In article <45***************@news.utanet.at>,
Roland Pibinger <rp*****@yahoo.com> wrote:
>>This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.
>>Why not?
>Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.
That's not the simple rule. The simple rule is "whoever allocates
something must deallocate it, oh and by the way we've got to have some
way for the user to allocate this object whose size he doesn't know,
maybe he should pass in a size and we'll return null if it's not big
enough, or maybe we'll have another function telling him how big it's
going to be...".

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Nov 24 '06 #23
Eric Sosman wrote:
>
.... snip ...
>
FWIW, I took a somewhat different tack in my own gets()
replacement (I guess everybody writes one, sooner or later).
Mine follows the precedent of things like getenv(): the returned
pointer is only valid until the next call, when the buffer it
points to may be overwritten and/or moved or freed.

This approach has some disadvantages: for example, it would
be a pain to make it thread-friendly. On the other hand, it
localizes all the memory management inside the function, and the
signature `char *getline(FILE*)' is simple enough that even I can
remember it. (The older and feebler my gray cells get, the more
I value simplicity ...)
When I designed ggets I considered that signature, but I could see
no way of returning appropriate errors for both FILE problems, EOF,
and memory allocation problems.

I second the motion about easily remembered signatures.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 25 '06 #24
"Bill Reid" <ho********@happyhealthy.netwrites:
<me**********@yahoo.ca> wrote in message
news:11**********************@45g2000cws.googlegroups.com...
>I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?
OK, I've read the other responses to this and they were...shall we
say, regrettable? Except for "pathological" cases, here's all you
need to do:

#define LINEMAX 512

char csv_line[LINEMAX];
FILE *csv_fptr;

/* get or create a string here that is the path to the CSV file */

if((csv_fptr=fopen(csv_filepath,"r"))!=NULL) {

    while((fgets(csv_line,LINEMAX,csv_fptr))!=NULL) {

        /* you can parse out the data from each csv_line right here */

    }

    fclose(csv_fptr);
}

else printf("\nCouldn't open %s",csv_filepath);

And you're done!
I may be missing your point, but CSV files are not "line oriented" so
this does not seem to be the obvious solution. Do you consider CSV
records with embedded newlines to be "pathological" and can thus be
ignored (like you do for long lines) or are you saying that the loop
pattern above can be extended to deal with these easily?

When I last had to do this (not in C so I won't post the code) it
seemed easier in the long run to use a small state machine as
another poster has suggested.

--
Ben.
Nov 25 '06 #25
Roland Pibinger wrote:
On 23 Nov 2006 23:17:06 -0800, websnarf@...com wrote:
mellyshum123@...ca wrote:
but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?
Somehow you are just supposed to know the length. You have to guess --
usually you just overestimate or something like that. If its too small
then you get truncated results.

Not necessarily. You only need to know if you are done (if the line is
entirely read) or not.
Right, but if you read a '\0' from your input then knowing this is not
as easy as it sounds.
[...] If not, read again until the rest of the line
is read. Your code basically becomes a loop. Just assume that the
buffer is always too small to read the line in one pass.
If your code is a loop, first of all how/where are you storing each
iteration and second of all, why not write a raw loop around fgetc() in
the first place?
Yeah, it doesn't make much more sense
to me either. This is just a design stupidity of the C language.

Live with, not against, your limits.
WTF? First of all, the perception that you should use fgets() because
you are limited to using just that is completely bogus. The C language
is general enough, that suffering through the weaknesses of fgets() is
completely unnecessary. Second of all, there is a name for people who
simply accept limitations without question; they are called sheep.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 25 '06 #26
Roland Pibinger wrote:
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of assigned
storage is the callers responsibility".

This programming style is not used by the Standard C library
True, if you ignore the existence of calloc, realloc and malloc
themselves (or even fopen).
[...] (and other well-known libraries).
With the exception of scientific/numeric (and some crypto) libraries
this is almost certainly false. I would claim that any ADT library for
C which declares the containers uses this paradigm.
[...] I'd be reluctant to use it in my programs.
There are other grounds for being unsatisfied with that code, but
merely its use of implicit allocation is not one of them. If you are
looking for a completely clean, maximally flexible and portable line
input library you can get one here:

http://www.pobox.com/~qed/userInput.html

If you have some clever way of encoding the input incrementally as you
go without storing to memory (such as crypto-hashing passwords) then
you can even avoid the use of malloc if you want.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 25 '06 #27
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
Roland Pibinger wrote:
On Fri, 24 Nov 2006 02:22:25 -0500, CBFalconer wrote:
Or simply download and use the public domain ggets, at:

<http://cbfalconer.home.att.net/download/>

"The storage has been allocated within fggets ... Freeing of
assigned storage is the callers responsibility".

This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.
Why not? If you malloc something, you know you need to free it
when no longer needed. If you use ggets, you know you need to free
the line when no longer needed. This is not a massive memory
leap. Meanwhile you don't have to worry about buffer sizes, etc.

Exactly. For any resource, there needs to be a way to allocate it and
a way to release it. For raw chunks of memory, the allocation and
deallocation routines are "malloc" and "free". For stdio streams,
they're called "fopen" and "fclose". For the ggets interface (if I
understand it correctly), they're called "ggets" and "free".

It might not have been a bad idea to have a special purpose
deallocation, say "ggets_release"; it would be a simple wrapper around
"free", but it would leave room for more complex actions in a future
version. But I don't think it's really necessary.
Actually renaming "ggets()" to getsalloc() would be the most
consistent. After all you feed results of (m/c/re)alloc to free to
reclaim the memory, you can just classify getsalloc() into that same
category and just say that anything that you obtained that ends in
"-alloc" you should send to free, without increase in mental impact at
all.

Use of an auxiliary free-function, like fclose() is usually supplied
when the contents of the data you are handling are (for practical
purposes) opaque.

If he wanted a custom free function, he might as well do tricky things
like hide the length (which would cost 0 to obtain) just before the
actual char * pointer, and provide other clever manipulation functions
that could leverage this information. Of course, you can see where
this naturally would end up going.
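
The length-hiding trick mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is not code from ggets or from bstring, and the names lp_header, lp_dup, lp_len and lp_free are invented for the example. It also assumes a C99 compiler (flexible array member).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the characters. */
struct lp_header {
    size_t len;
    char data[];              /* C99 flexible array member */
};

/* Copy s into a new block and return a pointer to the characters. */
char *lp_dup(const char *s)
{
    size_t len = strlen(s);
    struct lp_header *h = malloc(sizeof *h + len + 1);

    if (h == NULL)
        return NULL;
    h->len = len;
    memcpy(h->data, s, len + 1);
    return h->data;
}

/* Recover the stored length without scanning the string. */
size_t lp_len(const char *p)
{
    const struct lp_header *h =
        (const struct lp_header *)(p - offsetof(struct lp_header, data));
    return h->len;
}

/* The pointer handed out is not the one malloc() returned,
   so a matching release function is required. */
void lp_free(char *p)
{
    if (p != NULL)
        free(p - offsetof(struct lp_header, data));
}
```

Note that once the pointer is offset like this, passing it to plain free() is no longer valid, which is exactly the situation where a custom release function stops being optional.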

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 25 '06 #28
CBFalconer wrote:
Eric Sosman wrote:

... snip ...
> FWIW, I took a somewhat different tack in my own gets()
replacement (I guess everybody writes one, sooner or later).
Mine follows the precedent of things like getenv(): the returned
pointer is only valid until the next call, when the buffer it
points to may be overwritten and/or moved or freed.

This approach has some disadvantages: for example, it would
be a pain to make it thread-friendly. On the other hand, it
localizes all the memory management inside the function, and the
signature `char *getline(FILE*)' is simple enough that even I can
remember it. (The older and feebler my gray cells get, the more
I value simplicity ...)


When I designed ggets I considered that signature, but I could see
no way of returning appropriate errors for both FILE problems, EOF,
and memory allocation problems.
Not sure what you mean by "FILE problems," but yes: the
ultra-simple signature loses the ability to distinguish between
EOF and realloc() failure. A caller who cares can write

while ((buff = getline(stream)) != NULL) {
/* process the line */
}
if (feof(stream)) {
/* drained the entire file */
} else if (ferror(stream)) {
/* I/O error */
} else {
/* ran out of memory */
}

But even that's not a panacea: A further shortcoming of my
method is that when it returns NULL to indicate I/O error or
realloc() failure, it thereby "loses" (but doesn't "leak")
any characters that may already have been read before the
event. That hasn't turned out to be a problem, because it's
never seemed worth while to try to parse the partial line in
the face of the failure; that is, instead of the above I
usually find myself writing

while ((buff = getline(stream)) != NULL) {
/* process the line */
}
if (! feof(stream)) {
/* some kind of failure: terminate with regrets */
}
I second the motion about easily remembered signatures.
It can, of course, be overdone. Sometimes I wonder whether
I'm running afoul of Einstein's famous remark that things should
be made as simple as possible, but no simpler.
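
For readers who have not seen this style of interface, a minimal sketch of a reader with that shape might look like the following. It is not Eric's actual code: the name read_line is made up here (to stay clear of the POSIX getline()), and stripping the newline is an assumption. The buffer belongs to the function and the result is only valid until the next call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader: returns a pointer to an internal buffer holding the
   next line (newline removed), or NULL on end of file, I/O error, or
   allocation failure. Not thread-friendly, as noted in the discussion. */
char *read_line(FILE *stream)
{
    static char *buf = NULL;
    static size_t cap = 0;
    size_t len = 0;
    int ch;

    while ((ch = getc(stream)) != EOF) {
        if (len + 1 >= cap) {                 /* keep room for the '\0' */
            size_t ncap = cap ? cap * 2 : 64;
            char *nbuf = realloc(buf, ncap);
            if (nbuf == NULL)
                return NULL;                  /* partial line is lost */
            buf = nbuf;
            cap = ncap;
        }
        if (ch == '\n')
            break;
        buf[len++] = (char)ch;
    }
    if (ch == EOF && (len == 0 || ferror(stream)))
        return NULL;                          /* end of file or I/O error */
    buf[len] = '\0';
    return buf;
}
```

After a NULL return the caller can apply the feof()/ferror() tests shown above; if neither is set, the only remaining cause in this sketch is a failed realloc().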

--
Eric Sosman
es*****@acm-dot-org.invalid

Nov 25 '06 #29

Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
"Bill Reid" <ho********@happyhealthy.net> writes:
<me**********@yahoo.ca> wrote in message
news:11**********************@45g2000cws.googlegroups.com...
I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?
OK, I've read the other responses to this and they were...shall we
say, regrettable? Except for "pathological" cases, here's all you
need to do:

#define LINEMAX 512

char csv_line[LINEMAX];
FILE *csv_fptr;

<get or create a string here that is the path to the CSV file>

if((csv_fptr=fopen(csv_filepath,"r"))!=NULL) {

while((fgets(csv_line,LINEMAX,csv_fptr))!=NULL) {

<you can parse out the data from each csv_line right here>

}

fclose(csv_fptr);
}

else printf("\nCouldn't open %s",csv_filepath);

And you're done!

I may be missing your point,
There's no "maybe" about it...
but CSV files are not "line oriented"
Except the millions that are, and was the whole idea originally behind
a "CSV" file in the first place...you may be thinking of a "CSV" file
format that co-opted the name but not the spirit or the actual format
(I'm aware of these files), but the OP gave no indication that that
was the type of file he was dealing with...
so
this does not seem to be the obvious solution.
Of course not, because it works, is fast and simple, and is
blindingly obvious...therefore, it must be "wrong"...
Do you consider CSV
records with embedded newlines to be "pathological"
Sure, whatever, they aren't what I typically process and most
likely not what the OP wants to process, so if there are no "embedded
newlines" in the particular CSV files that I or anybody else wants to
process, who cares what name I call those that do? How about
"irrelevant", is that better? "Misnamed"? "Deceptive"?
and can thus be
ignored (like you do for long lines)
Oh, I'm ignoring even more than that, that's me all over, if it doesn't
apply, I just "ignore" it, I'm "funny" that way...
or are you saying that the loop
pattern above can be extended to deal with these easily?
Since fgets() terminates on a newline, you might be better off using
fgetc() instead, and then you have to chew up extra cycles counting
the commas rather than assuming you've got a full record for each
line...I do use other processing loops that employ fgetc() for truly
"pathological" (known malformatted, foreign, suspect) CSV files and
employ "sanity checks" prior to actually reading them, usually in
careful interactive context...

But since I usually know the maximum size and data types for the
files I am dealing with (many times because I wrote them myself!),
why would I slow down and complicate my program for something
I know for a fact I will not encounter?
When I last had to do this (not in C so I won't post the code) it
seemed easier in the long run to use a small state machine as
another poster has suggested.
I don't know exactly what you "had to do" so I can't comment. Since
you seem to be hung up on "embedded newlines" and "long lines" you
were clearly dealing with different data types than I am, and probably
the OP. Like most posters in this thread, you chose to answer a
question that wasn't asked, and then STILL failed to provide an answer!
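
If the fixed-size fgets() loop is kept, one way to at least notice an over-long line (rather than silently treating its tail as a new record) is to check whether a newline was stored. The sketch below is an illustration only; the helper name read_record and its return convention are assumptions, not something posted in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 for a complete line (newline removed),
   0 for a line that did not fit (the remainder is discarded),
   and -1 at end of file or on a read error. */
int read_record(char *buf, int size, FILE *fp)
{
    size_t len;
    int ch;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No newline stored: the line may still be complete if the very next
       character is the newline, or if the file ends here. */
    ch = getc(fp);
    if (ch == EOF || ch == '\n')
        return 1;
    while (ch != '\n' && ch != EOF)   /* skip the rest of the long line */
        ch = getc(fp);
    return 0;
}
```

In the loop above, the call would replace the bare fgets(), with LINEMAX passed as size and a return of 0 reported however the program sees fit.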

---
William Ernest Reid

Nov 25 '06 #30
[much snippage]
>Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
>>... but CSV files are not "line oriented"
In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
>...you may be thinking of a "CSV" file format that co-opted the
name ...
So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?

"Not Claw, Claw!"
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 25 '06 #31

Chris Torek <no****@torek.net> wrote in message
news:ek*********@news4.newsguy.com...
[much snippage]
Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
>... but CSV files are not "line oriented"

In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
...you may be thinking of a "CSV" file format that co-opted the
name ...

So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?

"Not Claw, Claw!"
Exactly!

He's most certainly talking about just randomly exporting nonsense
from his weird Uncle Billy's "Excel" program as a *.csv file, such as:

123.45,678.9,135.45,"Hi there
this is a new line","Hi again sucker",680.24
791.23,579.35,,,,

Note carefully, however, that like all "CSV" files since the dawn
of man trying to portably export data by studiously avoiding binary
formats, it IS actually "line-oriented"; there were two rows of a
maximum of six columns each in the Excel file. The only reason it
"exported" as three lines was because of the "embedded newline" in
the cell at the fourth column of the first row; other than that, it follows
the grand tradition of a "field" being delimited by a comma (or
whatever) and the "record" being delimited by a newline.

If I wanted to, I guess I could have inserted the complete text for
"War and Peace" into that column, and thus over-flowed my
512-character buffer by a factor of about a million. For that
matter, I could have inserted the complete text for "Paradise
Lost" in the cell at the first column of the second row, even though
I put a float number in that column in the first row, and what the hell
would I do to "process" THAT data? I don't even think a
"state machine" would save my bacon at that point...

I will leave those problems to people oh so much smarter than I.
I will just continue to humbly bumble through "CSV" files that list
crap like the average home price for the last 30 years for the
top 40 metropolitan areas in the USA, and pray to God some
joker didn't insert the text for the "Kama Sutra" in the column for "1999"
at the row for "Miami/Dade County, FLA"...

---
William Ernest Reid

Nov 26 '06 #32
Eric Sosman said:

[regarding his getline function]
I don't have a convenient place to post the code, but I'll be
happy to mail it to anyone who's interested.
In fact, it is now available on that thar new-fangled Woild-Woid Web, thanks
to a miruckle of modernology, at the following address:

http://www.cpax.org.uk/prg/portable/...sman/index.php

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 26 '06 #33
"Bill Reid" <ho********@happyhealthy.net> writes:
Chris Torek <no****@torek.net> wrote in message
news:ek*********@news4.newsguy.com...
>[much snippage]
>Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
... but CSV files are not "line oriented"

In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
>...you may be thinking of a "CSV" file format that co-opted the
name ...

So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?

"Not Claw, Claw!"

Exactly!

He's most certainly talking about just randomly exporting nonsense
from his weird Uncle Billy's "Excel" program as a *.csv file, such as:

123.45,678.9,135.45,"Hi there
this is a new line","Hi again sucker",680.24
791.23,579.35,,,,
I was not aware that there was any other kind of CSV file but I will
accept you know more about them than I do. However it was not clear from
the OP's post whether he meant "your" kind of CSV files or what I
will call, for want of a more formal definition, RFC 4180 CSV files.

I have obviously upset you, and for that I am sorry, but please don't
try to suggest that I am a fan of this dreadful format. (Or that I
have a weird uncle Billy in Redmond.)

--
Ben.
Nov 26 '06 #34
"Bill Reid" <ho********@happyhealthy.net> writes:
Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
>"Bill Reid" <ho********@happyhealthy.net> writes:
<me**********@yahoo.ca> wrote in message
news:11**********************@45g2000cws.googlegroups.com...
>I need to read in a comma separated file, and for this I was going to
use fgets. I was reading about it at http://www.cplusplus.com/ref/ and
I noticed that the document said:

"Reads characters from stream and stores them in string until (num -1)
characters have been read or a newline or EOF character is reached,
whichever comes first."

My question is that if it stops at a new line character (LF?) then how
does one read a file with multiple new line characters?

Another question. The syntax is:

char * fgets (char * string , int num , FILE * stream);

but you have to allot a size for the string before this. Would you just
use the same num as used in the fgets? So char stringexample[num] ?

OK, I've read the other responses to this and they were...shall we
say, regrettable? Except for "pathological" cases, here's all you
need to do:

#define LINEMAX 512

char csv_line[LINEMAX];
FILE *csv_fptr;

<get or create a string here that is the path to the CSV file>

if((csv_fptr=fopen(csv_filepath,"r"))!=NULL) {

while((fgets(csv_line,LINEMAX,csv_fptr))!=NULL) {

<you can parse out the data from each csv_line right here>

}

fclose(csv_fptr);
}

else printf("\nCouldn't open %s",csv_filepath);

And you're done!

I may be missing your point,

There's no "maybe" about it...
>but CSV files are not "line oriented"

Except the millions that are, and was the whole idea originally behind
a "CSV" file in the first place...you may be thinking of a "CSV" file
format that co-opted the name but not the spirit or the actual format
(I'm aware of these files), but the OP gave no indication that that
was the type of file he was dealing with...
Nor any that it was not. Your CSV files are "clean" and the ones I've
had to parse all had embedded newlines in them so we came to the question
from different angles.

<snip>
>When I last had to do this (not in C so I won't post the code) it
seemed easier in the long run to use a small state machine as
another poster has suggested.
I don't know exactly what you "had to do" so I can't comment. Since
you seem to be hung up on "embedded newlines" and "long lines" you
were clearly dealing with different data types than I am, and probably
the OP. Like most posters in this thread, you chose to answer a
question that wasn't asked, and then STILL failed to provide an
answer!
Actually, no, I did not even answer a question that was not asked -- I
just commented and, you are quite right, I added little value. If the
OP gets back to say that his/her CSV files do have embedded newlines I'll
translate my old code and post it.

--
Ben.
Nov 26 '06 #35
Bill Reid wrote:
Chris Torek <no****@torek.net> wrote in message
news:ek*********@news4.newsguy.com...
[much snippage]
>Ben Bacarisse <be********@bsb.me.uk> wrote in message
>news:87************@bsb.me.uk...
>>... but CSV files are not "line oriented"
In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
>...you may be thinking of a "CSV" file format that co-opted the
>name ...
So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?

"Not Claw, Claw!"

Exactly!

He's most certainly talking about just randomly exporting nonsense
from his weird Uncle Billy's "Excel" program as a *.csv file, such as:

123.45,678.9,135.45,"Hi there
this is a new line","Hi again sucker",680.24
791.23,579.35,,,,

Note carefully, however, that like all "CSV" files since the dawn
of man trying to portably export data by studiously avoiding binary
formats, it IS actually "line-oriented";
Nonsense, CSV is a binary format and always has been. You cannot
properly parse a CSV file by assuming that each line is its own record.
there were two rows of a
maximum of six columns each in the Excel file. The only reason it
"exported" as three lines was because of the "embedded newline" in
the cell at the fourth column of the first row; other than that, it follows
the grand tradition of a "field" being delimited by a comma (or
whatever) and the "record" being delimited by a newline.
I am not aware of this grand tradition you speak of and I have
processed numerous CSV files from multiple sources and applications
over the last few years, all of which handled embedded commas and
newlines. You may be confusing CSV with comma-delimited files, a text
format in which all commas are row separators, records are delimited
with newlines, and commas and newlines cannot exist in the actual data.
Comma-delimited files are significantly easier to process than real
CSV files but suffer from obvious disadvantages that are shared with
any character-delimited format.

Check out http://en.wikipedia.org/wiki/Comma-separated_values and
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm for the details of
the CSV format and how it is used by real-world applications.
If I wanted to, I guess I could have inserted the complete text for
"War and Peace" into that column, and thus over-flowed my
512-character buffer by a factor of about a million. For that
matter, I could have inserted the complete text for "Paradise
Lost" in the cell at the first column of the second row, even though
I put a float number in that column in the first row, and what the hell
would I do to "process" THAT data? I don't even think a
"state machine" would save my bacon at that point...
It's not an overly simple task but it can be done without extraordinary
effort. Excel does it, and I have written a real CSV parser in pure
ANSI C which I recently made available at
http://sourceforge.net/projects/libcsv.
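
To make the point about embedded newlines concrete, here is a very small sketch of the record-splitting part of such a parser. It is not the libcsv API and not anyone's posted code; the function name read_csv_record and its return convention are invented for the example. It only tracks whether the reader is inside double quotes, so it finds record boundaries but does not split or unquote fields, and it makes no attempt to handle CR LF line endings.

```c
#include <stdio.h>

/* Hypothetical helper: copy one logical CSV record (raw text, quotes kept)
   into buf. A newline ends the record only outside double quotes.
   Returns 1 on success, 0 at end of input, -1 if the record does not fit. */
int read_csv_record(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int in_quotes = 0;
    int ch;

    if (size == 0)
        return -1;
    while ((ch = getc(fp)) != EOF) {
        if (ch == '"')
            in_quotes = !in_quotes;   /* a doubled "" toggles twice: no net change */
        else if (ch == '\n' && !in_quotes)
            break;
        if (len + 1 >= size)
            return -1;
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0)
        return 0;
    buf[len] = '\0';
    return 1;
}
```

Fed the three-line Excel export quoted above, this returns two records, the first of which contains the embedded newline.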

Robert Gamble

Nov 26 '06 #36
CBFalconer <cb********@yahoo.com> writes:
Eric Sosman wrote:
>>
... snip ...
>>
FWIW, I took a somewhat different tack in my own gets()
replacement (I guess everybody writes one, sooner or later).
Mine follows the precedent of things like getenv(): the returned
pointer is only valid until the next call, when the buffer it
points to may be overwritten and/or moved or freed.

This approach has some disadvantages: for example, it would
be a pain to make it thread-friendly. On the other hand, it
localizes all the memory management inside the function, and the
signature `char *getline(FILE*)' is simple enough that even I can
remember it. (The older and feebler my gray cells get, the more
I value simplicity ...)

When I designed ggets I considered that signature, but I could see
no way of returning appropriate errors for both FILE problems, EOF,
and memory allocation problems.

I second the motion about easily remembered signatures.
Well, you *could* return as many unique error conditions as you like
with a simple char* return value:

const char *const gl_EOF = "getline: EOF";
const char *const gl_io_error = "getline: I/O error";
const char *const gl_malloc_failed = "getline: malloc failed";
...

Each error code is a unique pointer value. One problem is that
there's no good way to detect success other than by comparing the
result to each possible error code (a set that can easily change in
later revisions of the function); this can be alleviated somewhat by
putting all the error codes into an array, but it's still inconvenient
for the user. You could provide an auxiliary function that tells you
whether the char* value returned by getline() points to an error
message or not.

Another drawback is that you can't return both error information and
actual data.

But if the user forgets to check the result, the value returned looks
like an error message which could make it easier to track down bugs.

Another approach is to implement something resembling errno, but that
can make it difficult to tie an error to a specific call.
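
The auxiliary function mentioned above could be sketched like this. The three gl_ constants are the ones from this post; the checker name gl_is_error is an assumption made for the example, and the local array initializer assumes C99.

```c
#include <stddef.h>

const char *const gl_EOF           = "getline: EOF";
const char *const gl_io_error      = "getline: I/O error";
const char *const gl_malloc_failed = "getline: malloc failed";

/* Hypothetical auxiliary function: nonzero if p is one of the error codes.
   The comparison is by pointer identity, not by string contents. */
int gl_is_error(const char *p)
{
    const char *const codes[] = { gl_EOF, gl_io_error, gl_malloc_failed };
    size_t i;

    for (i = 0; i < sizeof codes / sizeof codes[0]; i++)
        if (p == codes[i])
            return 1;
    return 0;
}
```

Because only addresses are compared, an input line that happens to read "getline: EOF" is still treated as data.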

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 27 '06 #37
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
>>Eric Sosman wrote:

... snip ...
>> FWIW, I took a somewhat different tack in my own gets()
replacement (I guess everybody writes one, sooner or later).
Mine follows the precedent of things like getenv(): the returned
pointer is only valid until the next call, when the buffer it
points to may be overwritten and/or moved or freed.

This approach has some disadvantages: for example, it would
be a pain to make it thread-friendly. On the other hand, it
localizes all the memory management inside the function, and the
signature `char *getline(FILE*)' is simple enough that even I can
remember it. (The older and feebler my gray cells get, the more
I value simplicity ...)

When I designed ggets I considered that signature, but I could see
no way of returning appropriate errors for both FILE problems, EOF,
and memory allocation problems.

I second the motion about easily remembered signatures.


Well, you *could* return as many unique error conditions as you like
with a simple char* return value:

const char *const gl_EOF = "getline: EOF";
const char *const gl_io_error = "getline: I/O error";
const char *const gl_malloc_failed = "getline: malloc failed";
...
Given the availability of feof() and ferror(), and the
sub-rosa knowledge that the only other possible failure mode
is NULL from realloc(), such a dodge didn't seem necessary.
Purists might argue (with some justification) that the sub-rosa
knowledge is a Bad Thing; my feeling was that enumerating all
three "exceptional" possibilities in the documentation (thus
making them part of the "interface contract") was good enough.

If I'd felt it desirable to return more information about
the failure modes, I'd probably have abandoned the approach of
overloading both success and all those failure modes onto a
single returned value. Instead, I'd have returned the "payload"
in one place and a "status" in another -- almost all (Open)VMS
facilities worked this way, and reasonably well. This approach
would also have allowed me to return the partial line payload
that preceded an error, instead of just discarding what had
been read, and that would have been a Good Thing. (Principle:
low-level routines shouldn't make higher-level decisions.) But
it didn't "feel" worth while to clutter the interface to preserve
information I really couldn't see myself making use of.

Different people design different interfaces for the same
task! Different people decompose the same task in different ways!
Ultimately, it comes down to what might be called "taste" (there's
just no point in arguing with Gus), or to put it on a more respectable
footing it comes down to a guess about the likely usage scenarios for
the new facility. There might (or might not) be the germ of a thesis
topic for someone who wants to make a study of how different programmers
approach similar problems: were you corrupted by an early exposure to
Forth, was your mother frightened by a COBOL compiler?
[...]
Another approach is to implement something resembling errno, but that
can make it difficult to tie an error to a specific call.
Usually, when I want to preserve more information than I "tasted"
was appropriate for getline(), I'll have the function return a status
code and pass the payload through an additional argument:

#define GETLINE_OK 0
#define GETLINE_EOF (-1)
#define GETLINE_ERR (-2)
#define GETLINE_NOMEM (-3)
int getline(FILE *stream, char **bufptr);

.... and the caller would write

char *line;
int status;
while ((status = getline(stream, &line)) == GETLINE_OK) {
/* process the line */
}

Sometimes it's convenient to make the "status" value be NULL
for success or else a `const char*' pointing to an error message:

const char *status;
if ((status = somefunc(args)) != NULL) {
fprintf (stderr, "somefunc: %s\n", status);
exit (EXIT_FAILURE);
}
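
One possible implementation of the status-code interface declared above is sketched below. It is a guess at the details, not the author's code: the function is named get_line here because POSIX already declares a getline() with a different signature, and the choices to strip the newline and to hand the caller a malloc'd buffer are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

#define GETLINE_OK      0
#define GETLINE_EOF   (-1)
#define GETLINE_ERR   (-2)
#define GETLINE_NOMEM (-3)

/* Hypothetical implementation. On GETLINE_OK, *bufptr is a malloc'd line
   (newline removed) that the caller frees; otherwise *bufptr is NULL. */
int get_line(FILE *stream, char **bufptr)
{
    char *buf = NULL;
    size_t cap = 0, len = 0;
    int ch;

    *bufptr = NULL;
    while ((ch = getc(stream)) != EOF) {
        if (len + 1 >= cap) {                 /* keep room for the '\0' */
            size_t ncap = cap ? cap * 2 : 64;
            char *nbuf = realloc(buf, ncap);
            if (nbuf == NULL) {
                free(buf);
                return GETLINE_NOMEM;
            }
            buf = nbuf;
            cap = ncap;
        }
        if (ch == '\n')
            break;
        buf[len++] = (char)ch;
    }
    if (ch == EOF && (len == 0 || ferror(stream))) {
        free(buf);
        return ferror(stream) ? GETLINE_ERR : GETLINE_EOF;
    }
    buf[len] = '\0';
    *bufptr = buf;
    return GETLINE_OK;
}
```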

"There are nine and sixty ways of constructing tribal lays,
And every single one of them is right!" -- Rudyard Kipling

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 28 '06 #38
Eric Sosman wrote:
>
.... snip ...
>
If I'd felt it desirable to return more information about
the failure modes, I'd probably have abandoned the approach of
overloading both success and all those failure modes onto a
single returned value. Instead, I'd have returned the "payload"
in one place and a "status" in another -- almost all (Open)VMS
facilities worked this way, and reasonably well. This approach
would also have allowed me to return the partial line payload
that preceded an error, instead of just discarding what had
been read, and that would have been a Good Thing. (Principle:
low-level routines shouldn't make higher-level decisions.) But
it didn't "feel" worth while to clutter the interface to preserve
information I really couldn't see myself making use of.
Precisely what ggets does.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 28 '06 #39
rp*****@yahoo.com (Roland Pibinger) wrote:
On Fri, 24 Nov 2006 08:42:32 -0500, CBFalconer wrote:
Roland Pibinger wrote:
This programming style is not used by the Standard C library (and
other well-known libraries). I'd be reluctant to use it in my
programs.
Why not?

Because responsibilities become unclear. Simple rules like 'whoever
allocates something must deallocate it' don't work any more.
Sure they do. The rule "*alloc() are the only three functions which
allocate something" doesn't work any more; that's all.
If you malloc something, you know you need to free it
when no longer needed.

Ok, that's symmetric.
If you use ggets, you know you need to free
the line when no longer needed.

That's unsymmetric. The user can easily forget the 'free'.
Then the user had better do his homework.
It's all about style. Maybe someone can tell the story why strdup was
excluded from the C Standard (I'm not a C historian and don't want to
become one).
Then _you_ had better do your homework. Start with the Rationale, as
indicated upthread.

Richard
Nov 28 '06 #40
Eric Sosman <es*****@acm-dot-org.invalid> writes:
Keith Thompson wrote:
[...]
>Well, you *could* return as many unique error conditions as you like
with a simple char* return value:
const char *const gl_EOF = "getline: EOF";
const char *const gl_io_error = "getline: I/O error";
const char *const gl_malloc_failed = "getline: malloc failed";
...
[...]
Different people design different interfaces for the same
task! Different people decompose the same task in different ways!
Ultimately, it comes down to what might be called "taste" (there's
just no point in arguing with Gus), or to put it on a more respectable
footing it comes down to a guess about the likely usage scenarios for
the new facility. There might (or might not) be the germ of a thesis
topic for someone who wants to make a study of how different
programmers approach similar problems: were you corrupted by an early
exposure to Forth, was your mother frightened by a COBOL compiler?
[...]

Thinking about it a bit more, I don't necessarily *like* the interface
I came up with. If you really really want to have your function
return just a single char* value and still be able to represent
multiple error conditions, what I suggested is probably close to the
best way to do it. The language provides one distinguished pointer
value that doesn't point to actual data; gl_EOF, gl_io_error, et al
can be thought of as multiple null pointers. But returning the actual
data and a status code separately is, IMHO, better style.

It would be nice if a function could return multiple values, so you
could do something like:

char *line;
int status;
(line, status) = getline(stdin);

But that's not C. Yes, getline() could return a struct (and using a
struct might not be a horrible idea), but using a struct probably
isn't as convenient as multiple return values would be.
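
For comparison, the struct-returning variant might look something like this. The type and function names (line_result, get_line_result) and the status values are made up for the sketch, and it reads into a caller-supplied buffer with fgets() purely to keep the example short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type carrying both the data and the status. */
struct line_result {
    char *line;    /* points into the caller's buffer, or NULL */
    int   status;  /* 0 = ok, -1 = end of file, -2 = I/O error */
};

struct line_result get_line_result(char *buf, int size, FILE *stream)
{
    struct line_result r = { NULL, 0 };

    if (fgets(buf, size, stream) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if present */
        r.line = buf;
    } else {
        r.status = ferror(stream) ? -2 : -1;
    }
    return r;
}
```

The caller then writes r = get_line_result(buf, sizeof buf, stdin); and inspects r.line and r.status, which is about as close as C gets to the two-value assignment above.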

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 28 '06 #41

"Bill Reid" <ho********@happyhealthy.net> wrote in message
news:do********************@bgtnsc05-news.ops.worldnet.att.net...
>
Chris Torek <no****@torek.net> wrote in message
news:ek*********@news4.newsguy.com...
>[much snippage]
>Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
... but CSV files are not "line oriented"

In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
>...you may be thinking of a "CSV" file format that co-opted the
name ...

So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?

"Not Claw, Claw!"

Exactly!

He's most certainly talking about just randomly exporting nonsense
from his weird Uncle Billy's "Excel" program as a *.csv file, such as:

123.45,678.9,135.45,"Hi there
this is a new line","Hi again sucker",680.24
791.23,579.35,,,,

Note carefully, however, that like all "CSV" files since the dawn
of man trying to portably export data by studiously avoiding binary
formats, it IS actually "line-oriented"; there were two rows of a
maximum of six columns each in the Excel file. The only reason it
"exported" as three lines was because of the "embedded newline" in
the cell at the fourth column of the first row; other than that, it follows
the grand tradition of a "field" being delimited by a comma (or
whatever) and the "record" being delimited by a newline.

If I wanted to, I guess I could have inserted the complete text for
"War and Peace" into that column, and thus over-flowed my
512-character buffer by a factor of about a million. For that
matter, I could have inserted the complete text for "Paradise
Lost" in the cell at the first column of the second row, even though
I put a float number in that column in the first row, and what the hell
would I do to "process" THAT data? I don't even think a
"state machine" would save my bacon at that point...

I will leave those problems to people oh so much smarter than I.
I will just continue to humbly bumble through "CSV" files that list
crap like the average home price for the last 30 years for the
top 40 metropolitan areas in the USA, and pray to God some
joker didn't insert the text for the "Kama Sutra" in the column for "1999"
at the row for "Miami/Dade County, FLA"...
These days, we have huge amounts of memory available and so we can afford to
throw it away in making our functions easy to use.
If you go onto my website and look under the "Fuzzy Logic Trees" section you
will find a file called csv.c which is a general-purpose csv file loader.
You only need to have one such function, and then you can quite happily
handle "War and Peace" in a field.

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.
Nov 28 '06 #42

Robert Gamble <rg*******@gmail.com> wrote in message
news:11**********************@j44g2000cwa.googlegroups.com...
Bill Reid wrote:
Chris Torek <no****@torek.net> wrote in message
news:ek*********@news4.newsguy.com...
[much snippage]
>
Ben Bacarisse <be********@bsb.me.uk> wrote in message
news:87************@bsb.me.uk...
>... but CSV files are not "line oriented"
>
In article <Gd*******************@bgtnsc05-news.ops.worldnet.att.net>,
Bill Reid <ho********@happyhealthy.net> wrote:
...you may be thinking of a "CSV" file format that co-opted the
name ...
>
So, you are talking about CSV files, and he is talking about CSV
files, which are completely different things?
>
"Not Claw, Claw!"
Exactly!

He's most certainly talking about just randomly exporting nonsense
from his weird Uncle Billy's "Excel" program as a *.csv file, such as:

123.45,678.9,135.45,"Hi there
this is a new line","Hi again sucker",680.24
791.23,579.35,,,,

Note carefully, however, that like all "CSV" files since the dawn
of man trying to portably export data by studiously avoiding binary
formats, it IS actually "line-oriented";

Nonsense, CSV is a binary format and always has been.
!!!!

Whaaaa...?!?!!

A "binary" format!???!! You call my example that I "exported"
from Excel a "binary" format? My world is askew! Also, you've
probably just given our weird Uncle Billy a heart attack...thanks
for that...
You cannot
properly parse a CSV file by assuming that each line is its own record.
<big sigh>Unless of course, each line in the "CSV" file IS its own
record...
there were two rows of a
maximum of six columns each in the Excel file. The only reason it
"exported" as three lines was because of the "embedded newline" in
the cell at the fourth column of the first row; other than that, it follows
the grand tradition of a "field" being delimited by a comma (or
whatever) and the "record" being delimited by a newline.

I am not aware of this grand tradition
Well, I'm nothing if not a traditionalist...
you speak of and I have
processed numerous CSV files from multiple sources and applications
over the last few years, all of which handled embedded commas and
newlines. You may be confusing CSV with comma-delimited files,
"CDV" files? I must again say, "Whaaaa?!??!!"
a text
format in which all commas are field separators, records are delimited
with newlines, and commas and newlines cannot exist in the actual data.
Yeah, just like the "*.csv" file I "exported" from Excel: a text format,
the end of every "record" is delimited by a newline, and surprise,
commas COULD always exist in a "CSV" file, just within a text
"field" protected by enclosing double quotes, which was always
part of the "classic" CSV format; after you've "protected" your
text fields, an "embedded newline" ain't hardly nuttin'...
Comma-delimited files are significantly easier to process than real
CSV files but suffer from obvious disadvantages that are shared with
any character-delimited format.
Yeah, being able to share data simply across dozens of systems is
quite the disadvantage...
Check out http://en.wikipedia.org/wiki/Comma-separated_values and
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm for the details of
the CSV format and how it is used by real-world applications.
Yeah, probably not going there, don't care...I did admit that I was aware
of a format called "CSV" (is this the "RFC 4180" referenced elsethread?)
that bore little resemblance to the jillions of "CSV" files I've worked
with, so what?
If I wanted to, I guess I could have inserted the complete text for
"War and Peace" into that column, and thus over-flowed my
512-character buffer by a factor of about a million. For that
matter, I could have inserted the complete text for "Paradise
Lost" in the cell at the first column of the second row, even though
I put a float number in that column in the first row, and what the hell
would I do to "process" THAT data? I don't even think a
"state machine" would save my bacon at that point...

It's not an overly simple task but it can be done without extraordinary
effort.
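
As a rough illustration of that claim, and not the libcsv code mentioned elsewhere in this post, here is a minimal sketch of a reader that does not depend on a fixed 512-character buffer. It assumes the same double-quote convention as above. The name csv_read_record is hypothetical. It reads with fgetc, grows its buffer with realloc as needed, and treats a newline as the end of the record only when it is outside quotes. Carriage returns are not handled specially, so a '\r' before the newline would stay in the data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one CSV record from fp into a malloc'd, NUL-terminated string.
 * Newlines inside double quotes are kept as data; the newline that ends
 * the record is dropped. Returns NULL at end of file when nothing was
 * read, or on allocation failure. The caller must free the result. */
char *csv_read_record(FILE *fp)
{
    size_t cap = 512, len = 0;      /* starting size only; the buffer grows */
    int in_quotes = 0;
    int c;
    char *buf;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '"')
            in_quotes = !in_quotes;         /* "" toggles twice: no net change */
        else if (c == '\n' && !in_quotes)
            break;                          /* end of record */

        if (len + 1 >= cap) {               /* keep room for the final '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {             /* nothing left in the stream */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling csv_read_record in a loop until it returns NULL walks the file one record at a time, however long a quoted field turns out to be. Splitting each record into fields is a separate step that this sketch leaves out.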
Of course it can be done, but if you have no interest in data of that
format, what's the point? Of course, you don't really get the point,
even though I've made it about a billion times: I DON'T CARE ABOUT
PERFORMING SOME TYPE OF RANDOM READING OF
ANY POSSIBLE "CSV" FILE THAT MAY EXIST, I ACTUALLY
WRITE THESE THINGS CALLED "APPLICATIONS" THAT
DO THIS FUNNY THING CALLED "PROCESS DATA".

But everybody has their own hobby...mine is NOT storing the text
of all the great classics of literature as "cells" of an Excel file,
"exporting" them as a "CSV" file, then "reading" them out of the
"CSV" file...
Excel does it,
If you're talking about what I posted as an example of what
Excel "exports" as a "*.csv" file, I have no idea why you would
call that a "binary" file. Apparently, in addition to nobody knowing
exactly what a "CSV" file is, nobody knows what a "binary" file
is either...it just gets worse and worse...
and I have written a real CSV parser in pure
ANSI C which I recently made available at
http://sourceforge.net/projects/libcsv.
Well, good for you, another link that I'll never visit...unless I want
to "import" data that was previously "exported" from Excel (begging
the question why I would want to do that to do something that
isn't covered by Excel functionality in the first place).

---
William Ernest Reid

Nov 30 '06 #43

This thread has been closed and replies have been disabled. Please start a new discussion.
