Is this fully portable and/or smart?


This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"? My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...

Also, would this be the fastest way possible to make this check,
as opposed to a perhaps more "conformant" scheme of reading
every character in the file to find the end of the file?

---
William Ernest Reid

Jun 27 '08 #1
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"? My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...
The only non-conforming things that I can see here are TRUE and FALSE.
stdbool.h defines the macros 'true' to 1 and 'false' to 0.
>
Also, would this be the fastest way possible to make this check,
as opposed to a perhaps more "conformant" scheme of reading
every character in the file to find the end of the file?
I would imagine so, since fseek can work directly on I/O structures at
the OS level. In the worst case, fseek will not perform worse than
reading the file one character at a time, so I would recommend using it.
>
---
William Ernest Reid


--
Pietro Cerutti
Jun 27 '08 #2
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:
Why bother?
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);
"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."

And you should at least check whether the call succeeds. If it
doesn't, what will you return? I suggest you change the return
type to int and return EOF, 0 or some positive number.
end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?
No. But even if you read the whole file, rewind can fail.
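
A minimal sketch of the suggestion above (check the calls, return EOF, 0 or a positive number). The function name is made up for illustration, and the negative offset from SEEK_END is kept from the original code, so this is still not guaranteed to work on a text stream; it only stops ignoring failures.

```c
#include <stdio.h>

/*
 * Returns 1 if the last character of the stream is '\n', 0 if it is
 * something else, and EOF if the answer could not be determined
 * (the seek failed, the file is empty, or the read failed).
 *
 * Caveat: fseek(fp, -1L, SEEK_END) is not guaranteed to be meaningful
 * on a text stream; this only adds error checks to the original idea.
 */
int ends_with_newline(FILE *fp)
{
    int end_char;

    if (fseek(fp, -1L, SEEK_END) != 0)
        return EOF;              /* not seekable, or nothing to seek back over */

    end_char = getc(fp);
    if (end_char == EOF)
        return EOF;              /* read error or unexpected end of file */

    return end_char == '\n';
}
```

A caller can then tell "no trailing newline" (0) apart from "could not find out" (EOF) instead of getting FALSE for both.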

--
Peter
Jun 27 '08 #3
"Bill Reid" <ho********@happyhealthy.net> writes:
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}
The usual type for boolean values (unless you have C99's _Bool) is
int, not unsigned. It doesn't matter, but anyone reading your code is
going to waste a little time wondering why you used unsigned rather than
int.

Presumably TRUE and FALSE are defined somewhere -- but you really
don't need them. I'd replace the if/else with:

return end_char == '\n';

You don't check whether fseek() succeeded. Not all files are
seekable. (Try seeking to the end of stdin, when it's reading from
your keyboard.)

rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().
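
As a rough sketch of that last point, the position can be saved with ftell() and put back with fseek() instead of calling rewind(). The function name is hypothetical, and the seek relative to SEEK_END is still the non-portable part carried over from the original.

```c
#include <stdio.h>

/*
 * Same check as the original, but the stream is left where the caller
 * had it. Returns 1 or 0 for "ends with newline" / "does not", and EOF
 * if the answer is unknown or the position could not be restored.
 */
int ends_with_newline_keep_pos(FILE *fp)
{
    long saved;
    int end_char;
    int result;

    saved = ftell(fp);
    if (saved == -1L)
        return EOF;

    if (fseek(fp, -1L, SEEK_END) != 0) {
        result = EOF;
    } else {
        end_char = getc(fp);
        result = (end_char == EOF) ? EOF : (end_char == '\n');
    }

    /* Seeking to a value obtained from ftell() with SEEK_SET is the one
       kind of non-zero seek the standard allows on a text stream. */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return EOF;

    return result;
}
```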
The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"? My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...
You're right, it's not guaranteed by the standard. For a text file,
fseek() requires either an offset of zero, or an offset returned by an
earlier call to ftell() and a whence value of SEEK_SET. (As you've
seen, it happens to work on your system.)
Also, would this be the fastest way possible to make this check,
as opposed to a perhaps more "conformant" scheme of reading
every character in the file to find the end of the file?
Probably.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #4

Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:

Why bother?
Because sometimes I care...if only a little. And I've heard that
some systems don't "like" text files that aren't terminated by a
newline. So when I get a text file from an "unreliable" source,
I generally "fix" it, by appending the newline if needed...
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."
Yeah, that's some good copy'n'pasting there...the documentation
for my "implementation" even reads pretty much the same...and yet,
confoundedly enough, the code works fine, and on other systems
too...
And you should at least check whether the call succeeds. If it
doesn't, what will you return?
Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!
I suggest you change the return
type to int and return EOF, 0 or some positive number.
Suggestion "under advisement"...
end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No.
Great, now I've two contrary opinions...
But even if you read the whole file, rewind can fail.
ANOTHER problem!!! What WILL you do, WHAT WILL
YOU DO??!!?!!

In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?

---
William Ernest Reid

Jun 27 '08 #5
Bill Reid wrote:
Peter Nilsson <ai***@acay.com.au> wrote ...
>Bill Reid wrote:
>>[...]
fseek(test_file,-1L,SEEK_END);
"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."

Yeah, that's some good copy'n'pasting there...the documentation
for my "implementation" even reads pretty much the same...and yet,
confoundedly enough, the code works fine, and on other systems
too...
Let's take a quick poll: How many c.l.c. readers have
(1) driven an automobile while not wearing a seat belt,
and (2) been killed in an automobile accident while doing
so? Hands, anyone?

"I got away with it, once" is not the same as "It works,
always, or even often."
>And you should at least check whether the call succeeds. If it
doesn't, what will you return?

Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!
You will bail out. Instead of returning TRUE "This file
ends with a newline" or FALSE "This file does not end with a
newline," you report "I don't know about this file." The
existence of Boolean algebra does not imply that you can
answer TRUE or FALSE to every question. "TRUE or FALSE: The
human whose dung became the coprolite recently discovered in
Oregon was left-handed." "TRUE or FALSE: The answer to this
question is FALSE."
In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?
On some systems an invalid fseek() will not produce an
immediate failure, but the subsequent I/O operation to the
invalid location will.

On some systems that mark line endings with something
other than a one-byte sentinel, seeking to one byte before
the end of the file and reading what you find there will be
misleading at best.

How often is that? What's your statistical universe, and
is your seat belt fastened?

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #6
Bill Reid <ho********@happyhealthy.net> wrote:
Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:
Why bother?

Because sometimes I care...if only a little. And I've heard that
some systems don't "like" text files that aren't terminated by a
newline. So when I get a text file from an "unreliable" source,
I generally "fix" it, by appending the newline if needed...
I can confirm that this has bitten me before. I can't recall the exact
system, but there was some UNIX program that I was using which didn't
behave as expected without the newline at the end. Also, I have seen
some e-commerce software that uses text files for mass imports and
exports of data (the code for which I do not control) which fails without
the trailing newline.
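
For what it's worth, the "fix" Bill describes (append a newline when the file lacks one) could be sketched like this. It uses the slow read-everything approach rather than fseek(), and the function name and the path parameter are invented for the example.

```c
#include <stdio.h>

/*
 * Appends '\n' to the named file if its last character is not '\n'.
 * Returns 1 if a newline was appended, 0 if none was needed, and -1 on
 * any I/O error. An empty file is left untouched (returns 0).
 */
int append_newline_if_missing(const char *path)
{
    FILE *fp;
    int c;
    int last = '\n';         /* treat an empty file as already terminated */
    int failed;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        last = c;
    failed = ferror(fp);
    if (fclose(fp) != 0 || failed)
        return -1;

    if (last == '\n')
        return 0;

    fp = fopen(path, "a");   /* append mode: writes go to the end of the file */
    if (fp == NULL)
        return -1;
    failed = (putc('\n', fp) == EOF);
    if (fclose(fp) != 0 || failed)
        return -1;
    return 1;
}
```

Calling it twice on the same file should append at most once, since the second call finds the newline already there.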
--
Aaron Hsu <ar*****@sacrideo.us> | Jabber: ar*****@jabber.org
``Government is the great fiction through which everybody endeavors to
live at the expense of everybody else.'' - Frederic Bastiat
Jun 27 '08 #7
"Bill Reid" <ho********@happyhealthy.net> writes:
Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
>Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:

Why bother?

Because sometimes I care...if only a little. And I've heard that
some systems don't "like" text files that aren't terminated by a
newline. So when I get a text file from an "unreliable" source,
I generally "fix" it, by appending the newline if needed...
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."

Yeah, that's some good copy'n'pasting there...the documentation
for my "implementation" even reads pretty much the same...and yet,
confoundedly enough, the code works fine, and on other systems
too...
The quoted text is not in a constraint. In the C standard, the use of
the word "shall" outside a constraint means that, if the requirement
is violated, the behavior is undefined. "Works fine" is one possible
consequence of undefined behavior.
>And you should at least check whether the call succeeds. If it
doesn't, what will you return?

Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!
If your implementation's fseek() function can fail without properly
reporting the error, then that's a bug; you should report it to the
vendor.
>I suggest you change the return
type to int and return EOF, 0 or some positive number.

Suggestion "under advisement"...
end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No.

Great, now I've two contrary opinions...
What two contrary opinions? I don't recall anyone saying that it's
guaranteed to work. If anyone did, they were mistaken.
>But even if you read the whole file, rewind can fail.

ANOTHER problem!!! What WILL you do, WHAT WILL
YOU DO??!!?!!
I don't know. It's your program; what will you do? Error handling
isn't easy.
In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?
Elsethread, Eric Sosman wrote:
| On some systems that mark line endings with something
| other than a one-byte sentinel, seeking to one byte before
| the end of the file and reading what you find there will be
| misleading at best.

For example, Windows uses a CR LF pair to mark end-of-line. If you
seek to one byte before the end of a properly terminated text file,
and then read a single character, you'll probably get a single LF
character, which will probably be translated to '\n'. Normally the CR
LF pair is what gets translated to '\n'. I'm actually not 100%
certain what will happen; it might depend on the C implementation.
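
One way to take the text-mode translation out of the picture is to open the file in binary mode ("rb") and look at the raw bytes. This is only a sketch under assumptions: the enum and function names are invented, the character set is taken to be ASCII-compatible, and the implementation is assumed to support SEEK_END on binary streams, which the standard does not require.

```c
#include <stdio.h>

enum line_ending { LE_NONE, LE_LF, LE_CRLF, LE_UNKNOWN };

/*
 * Inspects the last one or two bytes of a stream opened in binary mode
 * and reports how the file ends: with LF, with CR LF, or with neither.
 * LE_UNKNOWN means a seek, tell or read failed.
 */
enum line_ending final_line_ending(FILE *binfp)
{
    long size;
    int prev = EOF;
    int last;

    if (fseek(binfp, 0L, SEEK_END) != 0)
        return LE_UNKNOWN;
    size = ftell(binfp);
    if (size == -1L)
        return LE_UNKNOWN;
    if (size == 0)
        return LE_NONE;

    /* Position on the second-to-last byte if there is one. */
    if (fseek(binfp, size >= 2 ? size - 2 : 0L, SEEK_SET) != 0)
        return LE_UNKNOWN;
    if (size >= 2)
        prev = getc(binfp);
    last = getc(binfp);
    if (last == EOF)
        return LE_UNKNOWN;

    if (last != '\n')
        return LE_NONE;
    return (prev == '\r') ? LE_CRLF : LE_LF;
}
```

This does nothing for record-oriented or fixed-width formats; it only covers the LF and CR LF conventions.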

Other systems use things other than character sequences to mark line
endings. VMS, for example, has a rather sophisticated
record-management system. Some mainframes use fixed-width lines
(inherited from punch card images), though I don't know whether this
is still in use. This kind of thing is exactly why the standard is so
vague about the behavior of fseek on text files.

If you don't mind the fact that your code is non-portable, that's
fine. The fseek trick is likely to be a whole heck of a lot faster
than the portable method of reading the entire file and seeing how it
ends.

For that matter, even that slow method isn't guaranteed to work. C99
7.19.2p2:

Whether the last line requires a terminating new-line character is
implementation-defined.

And the standard doesn't say what happens if the implementation
requires a terminating new-line and a particular file doesn't have
one. In my opinion, the behavior is undefined. It might report an
error, or it might quietly add a new-line on input.
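
The slow method mentioned above might look like the following sketch (the function name is made up). It does no seeking at all, so it also works on pipes, but per the 7.19.2p2 caveat it reports what the implementation hands back, which may already include a new-line the file itself lacks.

```c
#include <stdio.h>

/*
 * Reads the stream from its current position to the end and reports
 * whether the last character seen was '\n'.
 * Returns 1 (yes), 0 (no, or nothing was read), EOF on a read error.
 */
int last_char_is_newline(FILE *fp)
{
    int c;
    int last = EOF;

    while ((c = getc(fp)) != EOF)
        last = c;

    if (ferror(fp))
        return EOF;
    return last == '\n';
}
```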

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #8
Keith Thompson said:
"Bill Reid" <ho********@happyhealthy.net> writes:
<big ol' snip>
>Great, now I've two contrary opinions...

What two contrary opinions?
His, and everybody else's.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jun 27 '08 #9

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.net> writes:
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The usual type for boolean values (unless you have C99's _Bool) is
int, not unsigned. It doesn't matter, but anyone reading your code is
going to waste a little time wondering why you used unsigned rather than
int.
In general, I'm only interested in boolean values as a return from
a function; this is part of my "scheme" for gracefully exiting from
failed operations which I alluded to in another thread...
Presumably TRUE and FALSE are defined somewhere -- but you really
don't need them. I'd replace the if/else with:

return end_char == '\n';
Yeah, that is so "C", so obscure, so arcane...and a friggin' boolean
value to boot, so I like it...I'll take it under advisement...
You don't check whether fseek() succeeded. Not all files are
seekable. (Try seeking to the end of stdin, when it's reading from
your keyboard.)
Give me a break, I am clearly only interested in ACTUAL text files,
not all the crap that can be called a "file stream"...
rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().
That's EXACTLY what I want, except in most cases I'm not even
interested in that...as I said, I mostly use this to append a newline
to the end of the file if "needed"...
The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"? My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...

You're right, it's not guaranteed by the standard. For a text file,
fseek() requires either an offset of zero, or an offset returned by an
earlier call to ftell() and a whence value of SEEK_SET. (As you've
seen, it happens to work on your system.)
Actually, several systems...
Also, would this be the fastest way possible to make this check,
as opposed to a perhaps more "conformant" scheme of reading
every character in the file to find the end of the file?

Probably.
Great! Nothing like blinding speed at the sake of reliability and
portability for no apparent actual performance benefit...

---
William Ernest Reid

Jun 27 '08 #10

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.net> writes:
Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:

Why bother?
Because sometimes I care...if only a little. And I've heard that
some systems don't "like" text files that aren't terminated by a
newline. So when I get a text file from an "unreliable" source,
I generally "fix" it, by appending the newline if needed...
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."
Yeah, that's some good copy'n'pasting there...the documentation
for my "implementation" even reads pretty much the same...and yet,
confoundedly enough, the code works fine, and on other systems
too...

The quoted text is not in a constraint. In the C standard, the use of
the word "shall" outside a constraint means that, if the requirement
is violated, the behavior is undefined. "Works fine" is one possible
consequence of undefined behavior.
There's a lot of that kind of stuff in the "standard", ain't there?
And you should at least check whether the call succeeds. If it
doesn't, what will you return?
Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!

If your implementation's fseek() function can fail without properly
reporting the error, then that's a bug; you should report it to the
vendor.
It's a problem with the operating system and is documented by
the "vendor". It IS a bother, a real bother, but what are you going
to do? It's the way the stupid system works, or, at least, what the
"vendor" says is the way the stupid system works, maybe just as
an excuse...
I suggest you change the return
type to int and return EOF, 0 or some positive number.
Suggestion "under advisement"...
end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No.
Great, now I've two contrary opinions...

What two contrary opinions? I don't recall anyone saying that it's
guaranteed to work. If anyone did, they were mistaken.
I'm not going to narc anybody out, but somebody DID say it was
fine and completely conforming except for the "TRUE" and "FALSE"
defines...
But even if you read the whole file, rewind can fail.
ANOTHER problem!!! What WILL you do, WHAT WILL
YOU DO??!!?!!

I don't know. It's your program; what will you do? Error handling
isn't easy.
It gets easier if you just ignore it, hoping the error will "go away"...
In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?

Elsethread, Eric Sosman wrote:
| On some systems that mark line endings with something
| other than a one-byte sentinel, seeking to one byte before
| the end of the file and reading what you find there will be
| misleading at best.

For example, Windows uses a CR LF pair to mark end-of-line. If you
seek to one byte before the end of a properly terminated text file,
and then read a single character, you'll probably get a single LF
character, which will probably be translated to '\n'.
Unless the "implementation" goes to the "extra trouble" of translating
the CR LF to a single newline for "text files" even when "seeking" from
the end of the file, which apparently mine does!
Normally the CR
LF pair is what gets translated to '\n'. I'm actually not 100%
certain what will happen; it might depend on the C implementation.
Precisely! But AGAIN, it might be one of those things that "work"
just about all the time; "statistically", it "works" much more often
across the spectrum of possible implementations than it doesn't
"work"...
Other systems use things other than character sequences to mark line
endings. VMS, for example, has a rather sophisticated
record-management system.
You could still "implement" a version of fseek() for text files that
would "work"...I've actually worked with VMS systems, and getting
VMS file systems to "work" with "C" seemed to be "doable", if not
actually easy...
Some mainframes use fixed-width lines
(inherited from punch card images), though I don't know whether this
is still in use. This kind of thing is exactly why the standard is so
vague about the behavior of fseek on text files.
Exactly...the standard is kind of like "Java" in that regard...but
better, in the sense that it is just a bunch of empty prose, and not
an empty programming "language"...
If you don't mind the fact that your code is non-portable, that's
fine. The fseek trick is likely to be a whole heck of a lot faster
than the portable method of reading the entire file and seeing how it
ends.
This thrills me no end...
For that matter, even that slow method isn't guaranteed to work. C99
7.19.2p2:

Whether the last line requires a terminating new-line character is
implementation-defined.
Yes, although this really isn't a problem for me, I would prefer the
maximum "portable" solution to text files...
And the standard doesn't say what happens if the implementation
requires a terminating new-line and a particular file doesn't have
one. In my opinion, the behavior is undefined. It might report an
error, or it might quietly add a new-line on input.
That's what I've written! If the "implementation" adds the newline,
then I don't have to, even though I really didn't have to in the first
place...

---
William Ernest Reid

Jun 27 '08 #11
Ulrich Eckhardt <do******@knuut.de> writes:
[...]
A few comments:
* Both fseek() and rewind() could fail, but you don't check for errors.
[...]

If you want to check for errors, don't use rewind(); it specifically
has no way to report an error condition.

C99 7.19.9.5:

The rewind function sets the file position indicator for the
stream pointed to by stream to the beginning of the file. It is
equivalent to

(void)fseek(stream, 0L, SEEK_SET)

except that the error indicator for the stream is also cleared.
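
Going by that definition, a rewind that can report failure is easy enough to sketch. The name checked_rewind is invented here; it is not a standard function.

```c
#include <stdio.h>

/* Like rewind(), but reports failure: 0 on success, -1 if the seek failed. */
int checked_rewind(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    clearerr(fp);   /* rewind() also clears the error indicator */
    return 0;
}
```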
--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #12
Bill Reid wrote, On 20/05/08 02:00:
Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
>Bill Reid wrote:
<snip>
>And you should at least check whether the call succeeds. If it
doesn't, what will you return?

Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!
You asked if it would work on all possible conforming implementations,
so why are you shouting at someone for pointing out places where it
might fail without your code spotting it?
--
Flash Gordon
Jun 27 '08 #13
Keith Thompson wrote, On 20/05/08 01:54:

<snip>
rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().
Better to use fgetpos/fsetpos just in case the position does not fit in
a long (e.g. large files on a system with a 32 bit long, and yes they do
exist before Bill complains about yet another non-existent problem).
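
A sketch of the fgetpos/fsetpos variant, for illustration only. Both function names are hypothetical; the probe shown is just the read-to-the-end check, and any function taking a FILE * and returning int could be passed instead.

```c
#include <stdio.h>

/* Example probe: reads to end of stream; 1 if the last character is '\n'. */
int probe_trailing_newline(FILE *fp)
{
    int c;
    int last = EOF;

    while ((c = getc(fp)) != EOF)
        last = c;
    return ferror(fp) ? EOF : (last == '\n');
}

/*
 * Runs probe(fp) and then puts the stream back where it was.
 * fpos_t is not limited to what fits in a long, unlike ftell().
 * Returns the probe's result, or EOF if the position could not be
 * saved or restored.
 */
int with_saved_position(FILE *fp, int (*probe)(FILE *))
{
    fpos_t saved;
    int result;

    if (fgetpos(fp, &saved) != 0)
        return EOF;
    result = probe(fp);
    if (fsetpos(fp, &saved) != 0)
        return EOF;
    return result;
}
```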
--
Flash Gordon
Jun 27 '08 #14
"Bill Reid" <ho********@happyhealthy.net> writes:
Ulrich Eckhardt <do******@knuut.de> wrote in message
news:69*************@mid.uni-berlin.de...
>Bill Reid wrote:
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

A few comments:
* Both fseek() and rewind() could fail, but you don't check for errors.

Yeah, heard THAT before...
That sounds dismissive. Was it meant to be?

You'll hear it again, every time you post code here that doesn't check
for errors.

Here's some advice. Decide exactly what your function is supposed to
do, and *document* it. Think about all the corner cases. What if
test_file was opened in binary mode? What if it's not seekable? What
if it's not a well-formed text file for whatever OS you're using?
Think about what your function should do in each of those cases.

In some cases, the answer may well be "I don't care" (equivalent to
the way the standard leaves behavior undefined in many cases).
Perhaps you're 100% certain that some conditions will never occur,
because you'll be careful when you write calls to your function. For
example, if test_file has been passed to fclose(), I don't think
there's anything you can do to detect it; you just have to avoid
making such a call.

But for each possible error condition, *think* about how your function
can and should respond. If you decide not to handle some case, then
*decide* not to handle it; don't just ignore it.

[...]
>It will pretty much fail on any implementation when the file is some kind
>of pipe, i.e. a non-seekable stream like stdin.

But will it always work on ACTUAL text files? You know, on-a-disk
text files?
I don't know. I'm fairly sure that it *won't* work on text files on
some systems, but I'm not familiar with every system in existence.
What I am familiar with is the set of guarantees provided by the
standard. I think you've already been told just about everything
there is to say about that.

[snip]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #15
"Bill Reid" <ho********@happyhealthy.net> writes:
Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>"Bill Reid" <ho********@happyhealthy.net> writes:
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
[snip]
if(end_char=='\n') return TRUE;
else return FALSE;
}

The usual type for boolean values (unless you have C99's _Bool) is
int, not unsigned. It doesn't matter, but anyone reading your code is
going to waste a little time wondering why you used unsigned rather than
int.

In general, I'm only interested in boolean values as a return from
a function; this is part of my "scheme" for gracefully exiting from
failed operations which I alluded to in another thread...
Ok. What does that have to do with using unsigned for boolean values?
Why did you choose to use unsigned rather than int? (This is a fairly
minor style issue.)
>Presumably TRUE and FALSE are defined somewhere -- but you really
don't need them. I'd replace the if/else with:

return end_char == '\n';

Yeah, that is so "C", so obscure, so arcane...and a friggin' boolean
value to boot, so I like it...I'll take it under advisement...
>You don't check whether fseek() succeeded. Not all files are
seekable. (Try seeking to the end of stdin, when it's reading from
your keyboard.)

Give me a break, I am clearly only interested in ACTUAL text files,
not all the crap that can be called a "file stream"...
What a shame that your function takes a FILE* argument, and can
therefore potentially be called with any stream.

At least document your assumptions. And if you choose to ignore the
results of functions that are designed to give error indications, you
can do that -- but don't expect us not to point it out.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #16
Flash Gordon <sp**@flash-gordon.me.uk> writes:
Keith Thompson wrote, On 20/05/08 01:54:
<snip>
>rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().

Better to use fgetpos/fsetpos just in case the position does not fit
in a long (e.g. large files on a system with a 32 bit long, and yes
they do exist before Bill complains about yet another non-existent
problem).
Good point.

I suspect the function itself won't work on such a system (it depends
on fseek() to jump to a point just before the end of the file). But
yes, it could at least restore the file to its previous position and
report an error.

Or it can leave the file at some arbitrary position if that's part of
its defined behavior.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #17
Bill Reid wrote:
I'm not going to narc anybody out, but somebody DID say it was
fine and completely conforming except for the "TRUE" and "FALSE"
defines...
It /*is*/ conforming. And it /*is not*/ guaranteed to do what you expect.

The following is conforming:

#include <stdio.h>
int main(void) {
printf("hello\n");
return (0);
}

Still, printf could return something other than 6, in which case what
you would expect from the "conforming" program (printing hello followed
by a newline on the console) probably wouldn't have happened.
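
To make the point concrete, here is a small sketch that actually looks at the return value. The helper name is invented, and it uses fprintf on a caller-supplied stream so it is not tied to stdout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes msg to out and reports whether the whole message was accepted.
 * Returns 0 on success, -1 if fprintf reported an error or a short count.
 */
int print_checked(FILE *out, const char *msg)
{
    int written = fprintf(out, "%s", msg);

    if (written < 0 || (size_t)written != strlen(msg))
        return -1;
    return 0;
}
```

Even a successful return only means the characters were accepted by the stream; with buffered output, a later fflush() or fclose() can still fail.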
---
William Ernest Reid

--
Pietro Cerutti
Jun 27 '08 #18
Bill Reid <ho********@happyhealthy.net> wrote:
Normally the CR
LF pair is what gets translated to '\n'. I'm actually not 100%
certain what will happen; it might depend on the C implementation.

Precisely! But AGAIN, it might be one of those things that "work"
just about all the time; "statistically", it "works" much more often
across the spectrum of possible implementations than it doesn't
"work"...
One other solution that is often very plausible would be to test the
results on platforms that you know will be targets of your system. You
can document the non-conformance, and be ready for it when it breaks your
system down the road. This assumes that you are not running anything
mission critical and that you have access to the platforms, which isn't
always the case, I understand.
--
Aaron Hsu <ar*****@sacrideo.us> | Jabber: ar*****@jabber.org
``Government is the great fiction through which everybody endeavors to
live at the expense of everybody else.'' - Frederic Bastiat
Jun 27 '08 #19
Ulrich Eckhardt wrote:
[...]
It will pretty much fail on any implementation when the file is some kind of
pipe, i.e. a non-seekable stream like stdin. The stanza about textfiles
being not relatively seekable was already quoted in this thread, and since
that makes the implementation easier you can be sure that it will be done
on some systems that distinguish textfiles.
Even on systems that don't "distinguish" between text and
binary files there can be trouble. Consider text encoded in
a multi-byte or state-shifted scheme, and imagine dropping
into the middle of a sequence without knowing the encoding
state ...

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #20
Bill Reid wrote:
[...]
Strangely enough, I'm not THAT interested whether
this function actually works, [...]
Then why did you start this thread in the first place,
and why do you pursue it?

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #21

Flash Gordon <sp**@flash-gordon.me.uk> wrote in message
news:up************@news.flash-gordon.me.uk...
Keith Thompson wrote, On 20/05/08 01:54:

<snip>
rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().

Better to use fgetpos/fsetpos just in case the position does not fit in
a long (e.g. large files on a system with a 32 bit long, and yes they do
exist before Bill complains about yet another non-existent problem).
I'm well aware of systems with "short" longs, but I'm willing to run
the risk of not handling TEXT files over 2GB in exchange for 100%
POSIX-compliance, which fgetpos/fsetpos isn't...

---
William Ernest Reid

Jun 27 '08 #22
Bill Reid wrote:
>
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

end_char=getc(test_file);

rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"? My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...
What happens on systems where the last character of a text file
isn't necessarily '\n', even if there is a "newline" at the end
of the last line?

Consider MS-DOS, where the last character may be Ctrl-Z.

Now, you're probably saying "but I already tried it on Windows,
where the end-of-line isn't a single character, and it worked!"
Yes, it "worked", because the multi-character end-of-line
sequence happens to have '\n' as the last character. What will
happen if the system happens to use LF-CR rather than CR-LF?

Finally, consider VMS. It's been years, but as I recall, text
files under VMS cannot be randomly-seeked, as the files are
stored as variable-length records, and you can only seek to
a record boundary. (This is a good example of only being able
to pass certain values, such as those returned from ftell.)
Also, would this be the fastest way possible to make this check,
as opposed to a perhaps more "conformant" scheme of reading
every character in the file to find the end of the file?
On systems where your logic "works" (ie: probably most Unix
systems, though even there you have to take into account things
like pipes which cannot be rewound), your method is probably the
fastest. However, on those systems where your logic fails, it's
obviously not "fastest", as it won't work at all, regardless of
how "fast" it fails.
Finally, the definitive answer on portability would be from
7.19.9.2p4:

For a text stream, either offset shall be zero, or offset shall
be a value returned by an earlier successful call to the ftell
function on a stream associated with the same file and whence
shall be SEEK_SET.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>
Jun 27 '08 #23
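Given the restriction quoted from 7.19.9.2p4, the approach that stays within the standard is to read the text stream from the start and remember the last character seen. The sketch below assumes that; the function name and the -1 error value are illustrative choices, not something from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the last character read from the
   stream is '\n', 0 if it is not (or the stream is empty), and -1 if
   the stream could not be repositioned or a read error occurred. */
int ends_with_newline_by_reading(FILE *fp)
{
    int ch;
    int last = EOF;

    /* An offset of zero is one of the seeks a text stream permits. */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    while ((ch = getc(fp)) != EOF)
        last = ch;

    if (ferror(fp))
        return -1;

    return last == '\n';
}
```

This costs a full pass over the file, which is the speed trade-off the original question asks about.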
"Bill Reid" <ho********@happyhealthy.net> writes:
Flash Gordon <sp**@flash-gordon.me.uk> wrote in message
news:up************@news.flash-gordon.me.uk...
>Keith Thompson wrote, On 20/05/08 01:54:
<snip>
rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().

Better to use fgetpos/fsetpos just in case the position does not fit in
a long (e.g. large files on a system with a 32 bit long, and yes they do
exist before Bill complains about yet another non-existent problem).

I'm well aware of systems with "short" longs, but I'm willing to run
the risk of not handling TEXT files over 2GB in exchange for 100%
POSIX-compliance, which fgetpos/fsetpos isn't...
Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant? They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.

If you're willing to settle for POSIX compliance without necessarily
having code that's fully portable C, you should ask for advice in
comp.unix.programmer (this would let you use fseeko() and ftello(),
for example).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #24

Eric Sosman <es*****@ieee-dot-org.invalid> wrote in message
news:38******************************@comcast.com...
Bill Reid wrote:
Peter Nilsson <ai***@acay.com.au> wrote ...
Bill Reid wrote:
[...]
fseek(test_file,-1L,SEEK_END);
"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."
Yeah, that's some good copy'n'pasting there...the documentation
for my "implementation" even reads pretty much the same...and yet,
confoundedly enough, the code works fine, and on other systems
too...

Let's take a quick poll: How many c.l.c. readers have
(1) driven an automobile while not wearing a seat belt,
and (2) been killed in an automobile accident while doing
so? Hands, anyone?
Yeah, and while you're at it, will somebody answer my
question from a few months ago: "how many times have
you been struck by lightning 10 times in a row while
simultaneously being eaten by a shark...on land?"
"I got away with it, once" is not the same as "It works,
always, or even often."
Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working, and as a matter of fact, if it works ONCE
I can't see why it would fail to work forever...other than that, your
ability to distinguish between different probability domains is
improving, if only slightly...
And you should at least check whether the call succeeds. If it
doesn't, what will you return?
Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!

You will bail out. Instead of returning TRUE "This file
ends with a newline" or FALSE "This file does not end with a
newline," you report "I don't know about this file." The
existence of Boolean algebra does not imply that you can
answer TRUE or FALSE to every question. "TRUE or FALSE: The
human whose dung became the coprolite recently discovered in
Oregon was left-handed." "TRUE or FALSE: The answer to this
question is FALSE."
OK, as usual, I'll "take it under advisement"...
In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?

On some systems an invalid fseek() will not produce an
immediate failure, but the subsequent I/O operation to the
invalid location will.
Obviously not what I was doing, but actually something I
might want to think about for other code, since I do use it
for some forms of file parsing...again, though, seems to
work OK, for tens of thousands, even millions, of uses...
On some systems that mark line endings with something
other than a one-byte sentinel, seeking to one byte before
the end of the file and reading what you find there will be
misleading at best.

How often is that? What's your statistical universe, and
is your seat belt fastened?
How often do you port your applications (with a GUI interface)
to different systems anyway?

---
William Ernest Reid

Jun 27 '08 #25
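The "bail out" advice quoted above amounts to a three-way answer instead of TRUE/FALSE. A rough sketch follows; the enum and function names are made up for illustration, and the seek itself is still the non-portable -1L from SEEK_END that the thread is about.

```c
#include <stdio.h>

/* Hypothetical three-way result: the check may be unable to answer. */
enum nl_status { NL_UNKNOWN = -1, NL_ABSENT = 0, NL_PRESENT = 1 };

/* Reports whether the stream's last character is '\n', or NL_UNKNOWN
   if the seek or the read fails. The file position is left just past
   the character that was read. */
enum nl_status check_newline_termination(FILE *fp)
{
    int ch;

    if (fp == NULL)
        return NL_UNKNOWN;

    if (fseek(fp, -1L, SEEK_END) != 0)
        return NL_UNKNOWN;        /* seek rejected: do not guess */

    ch = getc(fp);
    if (ch == EOF)
        return NL_UNKNOWN;        /* read failed or nothing to read */

    return ch == '\n' ? NL_PRESENT : NL_ABSENT;
}
```

This also covers the case where an invalid fseek() appears to succeed and only the following read fails, since both results are checked.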

Eric Sosman <es*****@ieee-dot-org.invalid> wrote in message
news:3d******************************@comcast.com...
Bill Reid wrote:
[...]
Strangely enough, I'm not THAT interested whether
this function actually works, [...]

Then why did you start this thread in the first place,
and why do you pursue it?
Idle curiosity. Something came up recently about parsing
out text files, I remembered this code, was kind of wondering
why it worked...I wanted to confirm it was just a figment of
my imagination...

---
William Ernest Reid

Jun 27 '08 #26
In article <LF***************@bgtnsc05-news.ops.worldnet.att.net>
Bill Reid <ho********@happyhealthy.net> wrote:

[on fseek()ing to offset -1 from SEEK_END]
>Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working,
Obviously you have not used VMS. :-)
>and as a matter of fact, if it works ONCE I can't see why it
would fail to work forever...
VMS has dozens of file formats, and using the -1 trick works on
some of them, but not all of them. So it would depend on the
file format of the file you opened.

The answer (which by now should be obvious) to the first part of
the question (in the subject line) is "it is not fully portable".
As for whether it is "smart", that one is trickier.

In my ancient TeX-DVI-file-handling library, which had a rather
different but related problem to solve, I had a machine-dependent
function I called "make seekable", so that you could string DVI-file
commands together with pipes. Some readers here in comp.lang.c
may be aware that Unix-like systems (including Linux) cause seek
(including fseek()) operations on pipes to fail. Even if you
fopen() your file, it is possible that the name refers to a pipe
(e.g., a "named pipe", or perhaps simply /dev/stdin), so that the
seek will fail.

The fully-portable, but ugly, solution is simply to copy the entire
file, adding a newline at the end if and only if the original
version did not have one. This is obviously going to be slower
than a machine-specific function that can use the seek-to-end trick.
Whether it is "significantly" slower depends on many other things.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
Jun 27 '08 #27
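The "copy the entire file" approach described above might look like the following sketch. The function name is hypothetical, and treating an empty input as already terminated (so nothing is appended) is an assumption the post does not address.

```c
#include <stdio.h>

/* Hypothetical helper: copies every character from in to out, then
   appends '\n' if the input was non-empty and did not end with one.
   Returns 0 on success, -1 on a read or write error. */
int copy_with_final_newline(FILE *in, FILE *out)
{
    int ch;
    int last = '\n';   /* assumption: empty input needs no newline */

    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return -1;
        last = ch;
    }

    if (ferror(in))
        return -1;

    if (last != '\n' && putc('\n', out) == EOF)
        return -1;

    return 0;
}
```

Because it only reads forward, this works on pipes and other non-seekable inputs as well as on ordinary files.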
Bill Reid wrote:
Eric Sosman <es*****@ieee-dot-org.invalid> wrote
>>
"I got away with it, once" is not the same as "It works,
always, or even often."

Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working, and as a matter of fact, if it works ONCE
I can't see why it would fail to work forever...other than that, your
ability to distinguish between different probability domains is
improving, if only slightly...
... but you didn't ask about whether something would "quit
working," you asked whether it was "fully portable." The fact
that a non-guaranteed something happens to yield Result R on
one particular implementation does not mean that you can rely
on getting R on other systems.
>>In any event, maybe I shouldn't have used the word "guarantee",
although I actually was looking for a little "spec lawyering" (and
came to the right place!). Maybe as an alternate question, just
how often WILL all these things fail?
On some systems an invalid fseek() will not produce an
immediate failure, but the subsequent I/O operation to the
invalid location will.

Obviously not what I was doing, [...]
You did an fseek() immediately followed by a getc().
How often do you port your applications (with a GUI interface)
to different systems anyway?
What have gooeys got to do with whether a text file is
in canonical form or not? And if portability is not of interest
to you, why do you ask "Is this fully portable?"

Have you met three gruff billy-goats recently?

--
Er*********@sun.com
Jun 27 '08 #28
In article <48***************@spamcop.net>
Kenneth Brody <ke******@spamcop.net> wrote:
>Finally, consider VMS. It's been years, but as I recall, text
files under VMS cannot be randomly-seeked, as the files are
stored as variable-length records, and you can only seek to
a record boundary. (This is a good example of only being able
to pass certain values, such as those returned from ftell.)
Actually, fseek() on VMS is considerably more complicated than
that.

VMS record formats fall into a couple of different categories. A
text file with fixed-length records "acts like" a punched card (the
format more or less goes with old-style mainframe punched-card
formats). If the record format is "fixed" and the length is N,
then it is easy to map from an offset-in-bytes to a <line,
record-offset> pair: offset o is line (o/N)+1 (line numbers start
at 1), record-offset (o%N). Of course, mapping this to a C stdio
text stream is more work, because the text stream has to insert
"apparent newlines" between each record, and optionally remove
trailing blanks.

A text file with variable-length records is tougher. The
variable-length records may have prefix byte-count fields. (In
early versions of VMS, this was the only other kind of text file.)
The VMS C library, however, still allows seeking to any position
within such a file, *provided* you have been to that position
earlier and used ftell() to find out where you are. The value
returned by ftell() is encoded: the ftell() routine packages
up a fairly large set of information that allows the C library
to find that record and its offset again, puts the data into a
table entry, and returns a pointer into the table entry (either
as an offset or as an actual pointer -- I never learned which).
(The table is freed when the stdio "FILE *" stream is closed.)

Finally, in VMS version 5 and later, there is a new [%] text format
called "stream-LF". In this format, text has variable-length
records that are separated/terminated by a linefeed (or "newline")
character. In other words, these files look exactly like any
Unix-like system's files. Here seeking to any arbitrary offset is
trivial, and everything works nicely for C.

[% Well, it was new back then. :-) ]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
Jun 27 '08 #29
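The fixed-length-record mapping given above (line (o/N)+1, record-offset o%N) can be written out directly. The struct and function names below are invented for the example, and the sketch assumes o >= 0 and N > 0; it ignores the "apparent newlines" a C text stream would add.

```c
/* Hypothetical result type: a 1-based line number and a 0-based
   offset within that line's record. */
struct record_pos {
    long line;
    long offset;
};

/* Maps byte offset o in a file of fixed-length records of size n to
   a <line, record-offset> pair. Assumes o >= 0 and n > 0. */
struct record_pos locate_in_fixed_records(long o, long n)
{
    struct record_pos pos;

    pos.line = o / n + 1;   /* line numbers start at 1 */
    pos.offset = o % n;     /* position inside the record */
    return pos;
}
```

For example, with 80-byte records, byte offset 165 falls on line 3 at record-offset 5.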
Chris Torek wrote:
In article <48***************@spamcop.net>
Kenneth Brody <ke******@spamcop.net> wrote:
>Finally, consider VMS. It's been years, but as I recall, text
files under VMS cannot be randomly-seeked, as the files are
stored as variable-length records, and you can only seek to
a record boundary. (This is a good example of only being able
to pass certain values, such as those returned from ftell.)

Actually, fseek() on VMS is considerably more complicated than
that.

VMS record formats fall into a couple of different categories. A
text file with fixed-length records "acts like" a punched card (the
format more or less goes with old-style mainframe punched-card
formats). If the record format is "fixed" and the length is N,
then it is easy to map from an offset-in-bytes to a <line,
record-offset> pair: offset o is line (o/N)+1 (line numbers start
at 1), record-offset (o%N). Of course, mapping this to a C stdio
text stream is more work, because the text stream has to insert
"apparent newlines" between each record, and optionally remove
trailing blanks.

A text file with variable-length records is tougher. The
variable-length records may have prefix byte-count fields. (In
early versions of VMS, this was the only other kind of text file.)
The VMS C library, however, still allows seeking to any position
within such a file, *provided* you have been to that position
earlier and used ftell() to find out where you are. The value
returned by ftell() is encoded: the ftell() routine packages
up a fairly large set of information that allows the C library
to find that record and its offset again, puts the data into a
table entry, and returns a pointer into the table entry (either
as an offset or as an actual pointer -- I never learned which).
(The table is freed when the stdio "FILE *" stream is closed.)

Finally, in VMS version 5 and later, there is a new [%] text format
called "stream-LF". In this format, text has variable-length
records that are separated/terminated by a linefeed (or "newline")
character. In other words, these files look exactly like any
Unix-like system's files. Here seeking to any arbitrary offset is
trivial, and everything works nicely for C.

[% Well, it was new back then. :-) ]
There were/are also Stream-CR and Stream (using CR/LF),
and a format called Undef that was probably the best imitation
of Unix "binary" files. The real oddball was the Variable
with Fixed Control (VFC) format, where each record had

- a two-byte count word giving the record length
- a fixed-length "control" prefix
- the variable-length "payload"
- an optional padding byte to even out the total

In theory the control prefix could have been used for just
about anything, but a conventional use was to encode "carriage
control" information in the style of line printers: skip to
top-of page before printing this line and then double-space,
overprint this line on the preceding one, and so on. The C
library could read these files and present them as C-ish char
sequences, fabricating extra newlines, form feeds, carriage
returns and whatnot before and after each line's payload --
but since the amount of data a prefix generates varies with its
value, trying to fseek() to the umpty-umpth character of the
translated stream is just not in the cards ...

(VMS also supported/s "file organizations" that were/are
not sequential, without a fixed notion of "next record" and
not amenable to C's model of a file as an ordered sequence of
characters. C's support for these organizations was somewhat,
er, limited back in the days when I used VMS, and I'd be mildly
surprised to learn that it had improved substantially in the
last fifteen years or so, since the models seem irreconcilable.
In particular, the question of whether there's a newline at
the end of a file where no record is "last" seems unanswerable --
but since you wouldn't use a non-sequential file to store a
sequence of characters the issue can probably be ignored.)

--
Er*********@sun.com
Jun 27 '08 #30
Peter Nilsson wrote:
Bill Reid wrote:
>This is how I handle a check that the last character of a text
file is a newline:

Why bother?
>/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."

And you should at least check whether the call succeeds. If it
doesn't, what will you return? I suggest you change the return
type to int and return EOF, 0 or some positive number.
> end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Jun 27 '08 #31
Joe Wright wrote:
Peter Nilsson wrote:
>Bill Reid wrote:
>>This is how I handle a check that the last character of a text
file is a newline:

Why bother?
>>/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."

And you should at least check whether the call succeeds. If it
doesn't, what will you return? I suggest you change the return
type to int and return EOF, 0 or some positive number.
>> end_char=getc(test_file);
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?
Since rewind is likely to be implemented as fseek(f, 0, SEEK_SET), any
reason why fseek may fail would do.

E.g., my system's man page says that rewind could fail "for any of the
errors specified for the routines fflush(3), fstat(2), lseek(2), and
malloc(3)"
--
Pietro Cerutti
Jun 27 '08 #32
On May 20, 8:52 pm, Joe Wright <joewwri...@comcast.net> wrote:
Peter Nilsson wrote:
Bill Reid wrote:
This is how I handle a check that the last character of a text
file is a newline:
Why bother?
/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;
fseek(test_file,-1L,SEEK_END);
"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."
And you should at least check whether the call succeeds. If it
doesn't, what will you return? I suggest you change the return
type to int and return EOF, 0 or some positive number.
end_char=getc(test_file);
rewind(test_file);
if(end_char=='\n') return TRUE;
else return FALSE;
}
The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?
No. But even if you read the whole file, rewind can fail.

Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?
test_file could be a pipe.
Jun 27 '08 #33
Joe Wright <jo********@comcast.net> writes:
Peter Nilsson wrote:
[...]
>No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?
If "test_file" isn't seekable, then rewind(test_file) will fail to
rewind it -- but it has no way to tell you that it failed.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #34
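Since rewind() cannot report failure, one option is to call fseek() directly and check its result. The sketch below relies on the standard's description of rewind() as fseek(stream, 0L, SEEK_SET) with the error indicator also cleared; the name checked_rewind is hypothetical.

```c
#include <stdio.h>

/* Hypothetical replacement for rewind() that reports failure.
   Returns 0 if the stream was repositioned to its start, -1 if the
   seek failed (for example on a non-seekable stream). */
int checked_rewind(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    clearerr(fp);   /* rewind() clears the error indicator as well */
    return 0;
}
```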
Joe Wright wrote:
Peter Nilsson wrote:
>Bill Reid wrote:
.... snip ...
>>
>>The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

No. But even if you read the whole file, rewind can fail.

Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?
For example, what if 'test_file' is actually stdin, or a modem
returning text typed in remotely, etc.?

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #35
On May 19, 3:54 pm, "Bill Reid" <hormelf...@happyhealthy.net> wrote:
This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);
You have found one of the most worthless semantics in the entire C
standard library. According to the standard fseek() on text files
cannot function in a manner superior to fgetpos(), which you should
probably use instead.

In any event, the most obvious problem is for systems that use
multiple bytes to denote an end of line (like DOS/Windows and most
internet protocols.) Unless the system is willing to perform
immediate parsing and it maintains a strict isolation of the
termination characters (which is possible) it will not back you up to
a point where:
end_char=getc(test_file);
will give you the '\n' you were looking for. When one realizes this
nonsense one is inevitably led to the question: What is the use of
text files in the C language anyways? Personally, I prefer to always
open files as binary and use the following grammar to read them:

contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n

(Where \n = LF and \r = CR and DOSEOF = \032.) For ASCII and UTF-8,
this makes you compatible with Unix, Mac and DOS all at the same
time. You can even open a file which has mistakenly mixed the line
terminator formats without issue.

This also suggests a method for you to determine if the text file has
a line terminator -- open it as binary, do a fseek to the end with
offset -1L as you do above, and check the last character for either \r
or \n; if it's DOSEOF then back it up one more then check for \r or
\n. This *may* be wrong on UNIX systems that can allow \r and DOSEOF
to be legitimate content characters, but you can typically make
demands that text files not contain control characters other than \n
and \t.
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?
Did this actually work on a Windows system? I am too lazy to check.
If it did, I can only assume that the C compiler library is just
promoting a LF by itself to a '\n'. My recollection was that on some
DOS systems text files were terminated by a 26 (=EOF, Ctrl-Z) character as
well. Either way, I don't believe you can expect the C libraries for
all Windows/DOS compilers to support this (but I could be wrong.)
[...] My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...
Maybe some weird EBCDIC system would fail. Or maybe the standard
worshipers here might pull some random nonsense system like the
Epilepsy or Tandem pit stop where it fails. Personally, I prefer to
pick standards which have the most relevance. In this case, the three
main desktop OSes cover pretty much all the text file formats that
matter, and the grammar I gave above reads all of them
simultaneously. The C standard has less to offer me than that.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Jun 27 '08 #36
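The binary-mode check described above (seek to the last byte, skip one trailing DOS end-of-file marker, then look for CR or LF) could be sketched like this. The function name is made up, DOS_EOF is assumed here to be Ctrl-Z (0x1A), and the standard does not require binary streams to support SEEK_END meaningfully, so this is a sketch for the common desktop systems the post has in mind.

```c
#include <stdio.h>

#define DOS_EOF 0x1A   /* assumption: Ctrl-Z as the DOS end-of-file marker */

/* Hypothetical helper. fp must have been opened in binary mode.
   Returns 1 if the file ends with CR or LF (optionally followed by one
   DOS_EOF byte), 0 if it ends with something else, and -1 if a seek or
   read fails, including files too short to inspect. */
int binary_ends_with_terminator(FILE *fp)
{
    int ch;

    if (fseek(fp, -1L, SEEK_END) != 0)
        return -1;
    ch = getc(fp);

    if (ch == DOS_EOF) {
        /* Back up one more byte and look in front of the marker. */
        if (fseek(fp, -2L, SEEK_END) != 0)
            return -1;
        ch = getc(fp);
    }

    if (ch == EOF)
        return -1;

    return ch == '\n' || ch == '\r';
}
```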
On 20 May, 21:52, Paul Hsieh <websn...@gmail.com> wrote:
When one realizes this
nonsense one is inevitably lead to the question: What is the use of
text files in the C language anyways? Personally, I prefer to always
open files as binary and use the following grammar to read them:

contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n

(Where \n = LF and \r = CR and DOSEOF = \032.)
What if the file contains empty lines ? It seems to
me you should either do
linebody := [^\n\r]*
or
line := linebody? lineterminator

Jun 27 '08 #37
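With the suggested change (linebody := [^\n\r]*), empty lines are accepted. A small sketch of a line counter over an in-memory buffer that follows that corrected grammar is given below; the function name is hypothetical, and the DOSEOF marker is not handled, to keep the example short.

```c
#include <stddef.h>

/* Hypothetical helper: counts lines in buf[0..len), accepting "\n",
   "\r" and "\r\n" as line terminators. Empty lines are counted, and a
   trailing non-empty body without a terminator counts as one line. */
size_t count_lines(const char *buf, size_t len)
{
    size_t i = 0;
    size_t lines = 0;
    int in_body = 0;   /* nonzero while inside an unterminated body */

    while (i < len) {
        char c = buf[i];

        if (c == '\r' || c == '\n') {
            /* Treat CR immediately followed by LF as one terminator. */
            if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;
            lines++;
            in_body = 0;
        } else {
            in_body = 1;
        }
        i++;
    }

    return lines + (in_body ? 1 : 0);
}
```

For instance, the four bytes "a\n\nb" count as three lines: "a", an empty line, and an unterminated "b".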

Chris Torek <no****@torek.net> wrote in message
news:g0*********@news1.newsguy.com...
In article <LF***************@bgtnsc05-news.ops.worldnet.att.net>
Bill Reid <ho********@happyhealthy.net> wrote:

[on fseek()ing to offset -1 from SEEK_END]
Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working,

Obviously you have not used VMS. :-)
Au contraire, but it was so long ago I hardly remember...

What I do recall was that I worked with a LOT of text files created
using VMS "Edit" (or whatever their text editor was called), and these
files were actually parsed by UNIX scripts, commands, and utilities,
with only a small amount of conversion required for line endings (I think
that was all they needed)...and if I remember correctly (I may not), I
think the files were all cross-mounted on UNIX servers...
and as a matter of fact, if it works ONCE I can't see why it
would fail to work forever...

VMS has dozens of file formats, and using the -1 trick works on
some of them, but not all of them. So it would depend on the
file format of the file you opened.
This of course is always true of ALL systems, since SEEK_END
is not required to be meaningful for binary streams according to the
"standard"...and the standard specifically says you must use
"0" as an offset for SEEK_END...once again, somebody didn't
"get the memo"...
The answer (which by now should be obvious) to the first part of
the question (in the subject line) is "it is not fully portable".
Particularly for ports to the 20-year-old past!
As for whether it is "smart", that one is trickier.

In my ancient TeX-DVI-file-handling library, which had a rather
different but related problem to solve, I had a machine-dependent
function I called "make seekable", so that you could string DVI-file
commands together with pipes. Some readers here in comp.lang.c
may be aware that Unix-like systems (including Linux) cause seek
(including fseek()) operations on pipes to fail. Even if you
fopen() your file, it is possible that the name refers to a pipe
(e.g., a "named pipe", or perhaps simply /dev/stdin), so that the
seek will fail.
Yes, a "pipe" will cause a fseek() error, and set errno to
ESPIPE...and this type of error is unknown to the C standard,
but "implementation-defined", like in POSIX...
The fully-portable, but ugly, solution is simply to copy the entire
file, adding a newline at the end if and only if the original
version did not have one. This is obviously going to be slower
than a machine-specific function that can use the seek-to-end trick.
Whether it is "significantly" slower depends on many other things.
Yeah, I can imagine. In any event, the nature of the way I use
this is I don't use "pipes", stdin, CTRL-Z terminated files (I actually
filter this type of stuff out routinely before the file is saved in the
first place), non-disk files of any sort...so we're kind of down to
only an unopened file or illegal seek value error, and the file is
guaranteed to be opened by the calling function, and I would
think that if -1L from SEEK_END is a legal seek value (even
though the "standard" says it isn't) once on an "implementation",
it always will be...

---
William Ernest Reid

Jun 27 '08 #38
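
The "fully-portable, but ugly" approach quoted in the post above (copy the file, adding a newline only if the original lacks one) could be sketched roughly as follows. This is an illustration, not code from the thread: the name copy_ensuring_final_newline is hypothetical, and treating an empty input as already terminated is an assumption made here. It never seeks, so it should also work on pipes.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copies everything from 'in' to 'out', then appends '\n' if the last
 * character copied was not one. Returns 0 on success, EOF on any
 * read or write error. Uses no seeking at all.
 * (Hypothetical helper; an empty input is assumed to need no newline.)
 */
int copy_ensuring_final_newline(FILE *in, FILE *out)
{
    int c;
    int last = '\n';            /* empty input counts as terminated */

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return EOF;         /* write error */
        last = c;
    }
    if (ferror(in))
        return EOF;             /* read error, not end-of-file */
    if (last != '\n' && putc('\n', out) == EOF)
        return EOF;
    return 0;
}
```

Whether the extra pass over the data matters depends, as the quoted text says, on many other things, file size being the obvious one.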

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
Joe Wright <jo********@comcast.net> writes:
Peter Nilsson wrote:
[...]
No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?

If "test_file" isn't seekable, then rewind(test_file) will fail to
rewind it -- but it has no way to tell you that it failed.
At the risk of repeating my reply to "vippstar", any non-seekable
"file", like a "pipe", will of course fail to fseek(), which in this case
means the rewind() is irrelevant...

---
William Ernest Reid

Jun 27 '08 #39
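
On the point quoted above that rewind() has no way to report failure: the C standard describes rewind(stream) as equivalent to (void)fseek(stream, 0L, SEEK_SET), except that the error indicator is also cleared. A checked replacement might therefore look like the sketch below. The name rewind_checked is hypothetical and not from the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Like rewind(), but reports failure: returns 0 on success and -1 if
 * the stream could not be repositioned (for example, a pipe on a
 * Unix-like system). The error indicator is cleared only on success.
 */
int rewind_checked(FILE *fp)
{
    assert(fp != NULL);         /* caller must pass an open stream */

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    clearerr(fp);               /* rewind() also clears the error flag */
    return 0;
}
```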

<vi******@gmail.com> wrote in message
news:a6**********************************@z72g2000hsb.googlegroups.com...
On May 20, 8:52 pm, Joe Wright <joewwri...@comcast.net> wrote:
Peter Nilsson wrote:
Bill Reid wrote:
>This is how I handle a check that the last character of a text
>file is a newline:
Why bother?
>/* checks if newline is last character of text file */
>unsigned check_text_file_newline_termination(FILE *test_file) {
> int end_char;
> fseek(test_file,-1L,SEEK_END);
"For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function
on a stream associated with the same file and whence shall be
SEEK_SET."
And you should at least check whether the call succeeds. If it
doesn't, what will you return? I suggest you change the return
type to int and return EOF, 0 or some positive number.
> end_char=getc(test_file);
> rewind(test_file);
> if(end_char=='\n') return TRUE;
> else return FALSE;
> }
>The question is: is this actually guaranteed to work properly on
>all "conforming" C "implementations"?
No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?
test_file could be a pipe.
That would also cause the fseek() itself to fail...

---
William Ernest Reid

Jun 27 '08 #40
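
Peter Nilsson's suggestion quoted above (check whether the seek succeeds, and return an int that is EOF, 0, or positive) might be sketched like this. The function name is hypothetical. The sketch assumes the stream was opened in binary mode and that the implementation supports SEEK_END on it; the C standard guarantees neither, so this is the "works on my systems" variant with error checking added, not a fully portable one. It restores the caller's position with ftell()/fseek() rather than rewind().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the last byte of the stream is '\n', 0 if it is not
 * (or the file is empty), and EOF if any seek, tell, or read fails.
 * On failure the stream position is left unspecified.
 */
int last_byte_is_newline(FILE *fp)
{
    long saved, size;
    int c, result = 0;

    assert(fp != NULL);              /* caller must pass an open stream */

    saved = ftell(fp);               /* remember where the caller was */
    if (saved == -1L)
        return EOF;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return EOF;
    size = ftell(fp);
    if (size == -1L)
        return EOF;
    if (size > 0) {                  /* never seek before the start */
        if (fseek(fp, -1L, SEEK_END) != 0)
            return EOF;
        c = getc(fp);
        if (c == EOF)
            return EOF;
        result = (c == '\n');
    }
    if (fseek(fp, saved, SEEK_SET) != 0)
        return EOF;
    return result;
}
```

A non-seekable stream such as a pipe makes the first ftell() or fseek() fail, so the caller sees EOF instead of a wrong answer.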

Flash Gordon <sp**@flash-gordon.me.uk> wrote in message
news:51************@news.flash-gordon.me.uk...
Bill Reid wrote, On 20/05/08 02:00:
Peter Nilsson <ai***@acay.com.au> wrote in message
news:53**********************************@s33g2000pri.googlegroups.com...
Bill Reid wrote:

<snip>
And you should at least check whether the call succeeds. If it
doesn't, what will you return?
Hey, I'll go you one better, my "implementation" may "silently"
fail an fseek()...What WILL you do, WHAT WILL YOU DO?!??!!

You asked if it would work on all possible conforming implementations,
so why are you shouting at someone for pointing out places where it
might fail without your code spotting it?
Well, there are actually a limited number of reasons why it
would fail in the first place, and most if not all of those don't
apply to this particular usage. Now on my "silent but deadly"
system, I'm further limited to ONE possible failure, an unopened
file, and THAT error is handled in the calling function, riiiiiiiight?
(I know, some goofball could inherit my personal code after
my death and start calling it with unopened files...)

---
William Ernest Reid

Jun 27 '08 #41

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.net> writes:
Flash Gordon <sp**@flash-gordon.me.uk> wrote in message
news:up************@news.flash-gordon.me.uk...
Keith Thompson wrote, On 20/05/08 01:54:
<snip>

rewind(test_file) goes back to the beginning of the file -- which
isn't necessarily where it was before the function was called. If you
want to restore the file's position, use ftell() and fseek().

Better to use fgetpos/fsetpos just in case the position does not fit in
a long (e.g. large files on a system with a 32 bit long, and yes they do
exist before Bill complains about yet another non-existent problem).
I'm well aware of systems with "short" longs, but I'm willing to run
the risk of not handling TEXT files over 2GB in exchange for 100%
POSIX-compliance, which fgetpos/fsetpos isn't...

Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?
Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......
They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.
Wrong, so very wrong.

"Standard C" != POSIX

This is by the clear language of the POSIX standard. A "C
implementation" may only be called "POSIX-conformant" if
it includes certain extensions and changes to the "standard"
C libraries. To the extent those changes exist, the C
"implementation" can no longer be called "standard" C,
and certainly can't be considered "portable" (except of course,
to other POSIX systems).
If you're willing to settle for POSIX compliance without necessarily
having code that's fully portable C, you should ask for advice in
comp.unix.programmer (this would let you use fseeko() and ftello(),
for example).
Don't forget lseek(), fileno(), filedes, and on and on and on...

fseek() and ftell(), et al., are the clearly-described overlapping
requirements of both the "C" standard and POSIX, so that's what
I use...FOR MAXIMUM POSSIBLE PORTABILITY!!!

---
William Ernest Reid

Jun 27 '08 #42
"Bill Reid" <ho********@happyhealthy.net> writes:
Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
[...]
>Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?

Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......
And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.
>They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.

Wrong, so very wrong.
No. Take a look at any draft of the C standard, or any C textbook, or
your online documentation.
"Standard C" != POSIX
Do you seriously think I'm not perfectly well aware of that?
This is by the clear language of the POSIX standard. A "C
implementation" may only be called "POSIX-conformant" if
it includes certain extensions and changes to the "standard"
C libraries.
Right.
To the extent those changes exist, the C
"implementation" can no longer be called "standard" C,
and certainly can't be considered "portable" (except of course,
to other POSIX systems).
The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard. For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.

(Some POSIX extensions, as I recall, are in the form of additional
declarations in <stdio.h>, but I *think* those extensions are enabled
only if you define a certain preprocessor symbol. I don't remember
the details.)

But all of that is beside the point.
>If you're willing to settle for POSIX compliance without necessarily
having code that's fully portable C, you should ask for advice in
comp.unix.programmer (this would let you use fseeko() and ftello(),
for example).

Don't forget lseek(), fileno(), filedes, and on and on and on...
I didn't forget them; I didn't mention them because they weren't
relevant to my point.
fseek() and ftell(), et al., are the clearly-described overlapping
requirements of both the "C" standard and POSIX, so that's what
I use...FOR MAXIMUM POSSIBLE PORTABILITY!!!
Uh huh.

The point that you're persistently missing is that fgetpos() and
fsetpos() are *also* specified in *both* the C and POSIX standards.
fseek(), ftell(), fgetpos(), and fsetpos() all have exactly the same
status with respect to the C standard (C89, C90, C95, C99) and the
POSIX standard.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #43
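
Since fgetpos() and fsetpos() come up repeatedly above as the way to save and restore a position that may not fit in a long, here is a minimal sketch of that pattern. The helper count_remaining_newlines is hypothetical and exists only to give the save/restore something to wrap around; the calls themselves are standard C.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Counts the '\n' characters from the current position to end-of-file,
 * then puts the stream back where it was. Stores the count in *count
 * and returns 0 on success; returns -1 if the position could not be
 * saved or restored, or if a read error occurred.
 */
int count_remaining_newlines(FILE *fp, long *count)
{
    fpos_t saved;               /* opaque; may hold more than a long */
    long n = 0;
    int c;

    assert(fp != NULL && count != NULL);

    if (fgetpos(fp, &saved) != 0)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    if (ferror(fp))
        return -1;
    if (fsetpos(fp, &saved) != 0)   /* also clears the EOF indicator */
        return -1;
    *count = n;
    return 0;
}
```

Unlike rewind(), both calls report failure, and unlike ftell()/fseek() the saved position is not limited to the range of long.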

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
"Bill Reid" <ho********@happyhealthy.net> writes:
Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
[...]
Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?
Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......

And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.
What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.
Wrong, so very wrong.

No. Take a look at any draft of the C standard, or any C textbook, or
your online documentation.
I think you're having logic problems again. "POSIX" is only listed
in the bibliography of the C "standard". My online documentation
hardly mentions POSIX at all, except for some of the supported
extensions. It tends to use the term "UNIX" to designate what I
assume to be POSIX portability for most stuff, and fgetpos() and
fsetpos() are specifically listed as NOT being portable to "UNIX"...

In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
"Standard C" != POSIX

Do you seriously think I'm not perfectly well aware of that?
<ATOMIC_BOGGLE!!!>

B-b-b-but...well, I'm just too stunned to think of something to
say here...
This is by the clear language of the POSIX standard. A "C
implementation" may only be called "POSIX-conformant" if
it includes certain extensions and changes to the "standard"
C libraries.

Right.
You're losing your power to shock me...
To the extent those changes exist, the C
"implementation" can no longer be called "standard" C,
and certainly can't be considered "portable" (except of course,
to other POSIX systems).

The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard.
You mean "undefined behavior" is now "compatible" with the
C "standard"? Maybe you're regaining your ability to confound
me with contradictory nonsense...
For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.
I can't believe I'm actually reading this...even more so, actually
bothering to reply to it...
(Some POSIX extensions, as I recall, are in the form of additional
declarations in <stdio.h>, but I *think* those extensions are enabled
only if you define a certain preprocessor symbol. I don't remember
the details.)
Try REAL hard and you might remember what you're talking
about here...
But all of that is beside the point.
Yes, it is, it really is...
If you're willing to settle for POSIX compliance without necessarily
having code that's fully portable C, you should ask for advice in
comp.unix.programmer (this would let you use fseeko() and ftello(),
for example).
Don't forget lseek(), fileno(), filedes, and on and on and on...

I didn't forget them; I didn't mention them because they weren't
relevant to my point.
You had a point?
fseek() and ftell(), et al., are the clearly-described overlapping
requirements of both the "C" standard and POSIX, so that's what
I use...FOR MAXIMUM POSSIBLE PORTABILITY!!!

Uh huh.

The point that you're persistently missing is that fgetpos() and
fsetpos() are *also* specified in *both* the C and POSIX standards.
Where are fgetpos() and fsetpos() "specified" in the POSIX standard?
Chapter and verse, please...
fseek(), ftell(), fgetpos(), and fsetpos() all have exactly the same
status with respect to the C standard (C89, C90, C95, C99) and the
POSIX standard.
Not in MY copy of the POSIX standard, or my "implementation"
documentation...

---
William Ernest Reid

Jun 27 '08 #44
Bill Reid wrote:
Keith Thompson <ks***@mib.org> wrote:
>And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.

What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
http://www.opengroup.org/onlinepubs/...s/fgetpos.html
>
In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
It may reference the C standard, but it does not include the text.
>The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard.

You mean "undefined behavior" is now "compatible" with the
C "standard"? Maybe you're regaining your ability to confound
me with contradictory nonsense...
I can't see any reference to undefined behavior in Keith's postings. A
standard such as POSIX is free to define behavior that is implementation
defined in the C standard. It is also free to extend features (signals,
errno values and so on).
>For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.

I can't believe I'm actually reading this...even more so, actually
bothering to reply to it...
Well you just have.

--
Ian Collins.
Jun 27 '08 #45
Paul Hsieh wrote:
>
.... snip ...
>
Maybe some weird EBCDIC system would fail. Or maybe the standard
worshipers here might pull some random nonsense system like the
Epilepsy or Tandem pit stop where it fails. Personally, I prefer
to pick standards which have the most relevance. In this case,
the three main desktop OSes cover pretty much all the text file
formats that matter, and the grammar I gave above reads all of
them simultaneously. The C standard has less to offer me than
that.
Why horse about with all this? This only arose from finding out
whether a file had a terminal \n included. The C standard lets the
system choose whether or not to insist on such. It is obviously
easy to simply insist that all lines are \n terminated.

Having done so, there is no need to go through binary gyrations to
handle a possibly flawed strategy to detect actual \n codes. The
text file provisions of the system can do so quite accurately. Now
text files can include provisions to implement \a, \b, \t, etc.
KISS.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Jun 27 '08 #46
"Bill Reid" <ho********@happyhealthy.net> writes:
Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>"Bill Reid" <ho********@happyhealthy.net> writes:
Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
[...]
>Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?

Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......

And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.

What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
I don't know POSIX as well as I know C, but my understanding is that
POSIX requires a conforming C implementation. You say you have a copy
of the POSIX standard, so you can verify that yourself. Look up the
"c99" command. Or, if you have an older version, perhaps there's a
"c89" or "c95" command, or *some* command that's supposed to be a C
compiler.

The reference I've been using is the set of web pages at
<http://www.opengroup.org/onlinepubs/NNNNNNNNN/nframe.html>, where
"NNNNNNNNN" needs to be replaced with a decimal number that, if I
recall correctly, I obtained by registering at the site. The header
says:

The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright (c) 2001-2004 The IEEE and The Open Group

Quoting the page that describes fgetpos() (which you can find among
the first few hits of a Google search for "fgetpos"):

The functionality described on this reference page is aligned with
the ISO C standard. Any conflict between the requirements
described here and the ISO C standard is unintentional. This
volume of IEEE Std 1003.1-2001 defers to the ISO C standard.
>They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.

Wrong, so very wrong.

No. Take a look at any draft of the C standard, or any C textbook, or
your online documentation.

I think you're having logic problems again. "POSIX" is only listed
in the bibliography of the C "standard". My online documentation
hardly mentions POSIX at all, except for some of the supported
extensions. It tends to use the term "UNIX" to designate what I
assume to be POSIX portability for most stuff, and fgetpos() and
fsetpos() are specifically listed as NOT being portable to "UNIX"...
I suggest that your online documentation is wrong, or perhaps merely
very old.
In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
I suggest that quotation marks don't mean what you think they mean.

If POSIX is a superset of the C standard, then everything that's part
of C is part of POSIX. fgetpos() and fsetpos() are part of C.
Therefore, fgetpos() and fsetpos() are part of POSIX.

[snip]
Where are fgetpos() and fsetpos() "specified" in the POSIX standard?
Chapter and verse, please...
I don't have a copy of the POSIX standard. You claim that you do.
Try looking in the index or the table of contents.

[snip]

Strictly speaking, POSIX is off-topic here, but fgetpos and fsetpos
are topical. Both are standard C functions, and have been since the
first C standard was issued in 1989. I'd be surprised if you could
find a modern system with a C compiler on which they don't work as
specified.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #47
On May 20, 2:50 pm, Spiros Bousbouras <spi...@gmail.com> wrote:
On 20 May, 21:52, Paul Hsieh <websn...@gmail.com> wrote:
When one realizes this
nonsense one is inevitably led to the question: What is the use
of text files in the C language anyways? Personally, I prefer to
always open files as binary and use the following grammar to read
them:
contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n
(Where \n = LF and \r = CR and DOSEOF = \032.)

What if the file contains empty lines ? It seems to
me you should either do
linebody := [^\n\r]*
or
line := linebody? lineterminator
Good catch. I'd go with the first option.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Jun 27 '08 #48
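
The grammar discussed above, with the agreed fix linebody := [^\n\r]* so that empty lines are accepted, can be turned into a small scanner. The sketch below is one possible reading of it, not Paul Hsieh's actual code: scan_lines and struct line_stats are hypothetical names, the stream is assumed to be opened in binary mode on an ASCII-based system, and DOSEOF is taken to be Ctrl-Z (0x1A, octal \032).

```c
#include <assert.h>
#include <stdio.h>

#define DOS_EOF 0x1A            /* Ctrl-Z; assumed value for DOSEOF */

struct line_stats {
    long lines;                 /* number of lines, including a final
                                   unterminated one */
    int  last_terminated;       /* 1 if the input ended with a line
                                   terminator (or was empty), else 0 */
};

/*
 * Scans a binary-mode stream, accepting LF, CR, or CR LF as a line
 * terminator and stopping at DOS_EOF or end-of-file.
 * Returns 0 on success, EOF on a read error.
 */
int scan_lines(FILE *fp, struct line_stats *out)
{
    int c;
    int in_body = 0;            /* inside a not-yet-terminated line */
    long lines = 0;

    assert(fp != NULL && out != NULL);

    while ((c = getc(fp)) != EOF && c != DOS_EOF) {
        if (c == '\r') {
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);   /* lone CR: keep the next byte */
            lines++;
            in_body = 0;
        } else if (c == '\n') {
            lines++;
            in_body = 0;
        } else {
            in_body = 1;
        }
    }
    if (ferror(fp))
        return EOF;
    if (in_body)
        lines++;                /* trailing "linebody?" with no terminator */
    out->lines = lines;
    out->last_terminated = !in_body;
    return 0;
}
```

The last_terminated field answers the thread's original question as a side effect of a single forward read, with no seeking at all.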

Ian Collins <ia******@hotmail.com> wrote in message
news:69*************@mid.individual.net...
Bill Reid wrote:
Keith Thompson <ks***@mib.org> wrote:
And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.
What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
http://www.opengroup.org/onlinepubs/...s/fgetpos.html
OK, NOW it is in POSIX, as of "Issue 4" (look at the change history
section); it was NOT in "Issues 1, 2, and 3"...but fseek() was in there
from the beginning. I'm relying on documentation that is several years
old at least, and who knows, possibly POSIX-compliance of similar
vintage if I should try to port my application...

For some really relevant fun, read the description of fseek() from
the same source:

http://www.opengroup.org/onlinepubs/...ons/fseek.html

Note carefully that the restriction on only seeking a non-zero offset
from SEEK_SET applies only to wide-character I/O, conflicting with
more restrictive language of the C "standard"...
In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
It may reference the C standard, but it does not include the text.
Huh? You just gave me a link to what purports to be the "POSIX"
description of a C "standard" function. ALL POSIX versions I have
read either explicitly referenced the C "standard" section for functions
that were "identical" for both, or explicitly described the differences,
or provided a full description for functions/defines/etc. that were
unique to POSIX. In this latest version, they apparently have full
descriptions of all the C "standard" functions, with a little notation
indicating what is an "extension" to the C "standard".

So if I want POSIX-compliance, why the hell would I read the
C "standard" again?
The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard.
You mean "undefined behavior" is now "compatible" with the
C "standard"? Maybe you're regaining your ability to confound
me with contradictory nonsense...
I can't see any reference to undefined behavior in Keith's postings.
Which postings? He's never posted the words "undefined behavior"?

If you're talking about this post, I know that in the past at least some
of the POSIX extensions, specifically extra arguments to strftime(), were
listed as "undefined behavior" by the C "standard", and were documented
as "extensions" in POSIX (and in my own "implementation" documentation).
So if in this post he claimed that most or ALL POSIX extensions were
"compatible" with the C "standard", he must have been referring to
some "undefined behavior" rather than just "implementation-defined"
behavior...
A
standard such as POSIX is free to define behavior that is implementation
defined in the C standard. It is also free to extend features (signals,
errno values and so on).
Well, sure, I guess...still won't make an application "portable" that
relies on any of that stuff, at least from a C "portability" standpoint,
which up until today I thought was the monomaniacal focus of the
group...
For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.
I can't believe I'm actually reading this...even more so, actually
bothering to reply to it...
Well you just have.
Yup, did it again too...

Anyway, learned something today: if a system is POSIX "Issue 4"
compliant, I CAN use fgetpos() and fsetpos()...

---
William Ernest Reid

Jun 27 '08 #49
Bill Reid wrote:
Ian Collins <ia******@hotmail.com> wrote:
>It may reference the C standard, but it does not include the text.

Huh? You just gave me a link to what purports to be the "POSIX"
description of a C "standard" function.
There's more to the C standard than the standard library section.

--
Ian Collins.
Jun 27 '08 #50
