Bug with compiler or am I just doing something illegal

P: n/a
This program should copy one file onto the other. It works if I
compile it with gcc to a cygwin program. However, if I compile it
with the -mno-cygwin option, it doesn't work (this targets native
windows).

Anyway, I just want to check that the program is valid before I see if
I can find a way around a compiler bug.

It might be something simple that I am doing wrong.

-------------------------------------------------------------
#include <stdio.h>

main()
{
    FILE *fp;

    fp = fopen( "scan0001b.bmp" , "r" );
    if( fp == NULL )
    {
        printf("File open failed for read\n");
        exit(0);
    }

    FILE *fpo;

    fpo = fopen( "scanout.bmp" , "w");

    if( fpo == NULL )
    {
        printf("File open failed for write\n");
        exit(0);
    }
    int c=1;

    while(!feof(fp))
    {
        c = fgetc(fp);
        if( c>=0 )
            fputc( c , fpo );
    }

    fclose(fp);
    fclose(fpo);

}

---------------------------------------------

When I run it, it stops before it has read the entire file (it only
reads around 10%).
Jul 2 '08 #1
34 Replies


P: n/a
In comp.lang.c, raphfrk wrote:
> This program should copy one file onto the other. It works if I
> compile it with gcc to a cygwin program. However, if I compile it
> with the -mno-cygwin option, it doesn't work (this targets native
> windows).
>
> Anyway, I just want to check that the program is valid before I see if
> I can find a way around a compiler bug.
>
> It might be something simple that I am doing wrong.
>
> -------------------------------------------------------------
> #include <stdio.h>
>
> main()
> {
> FILE *fp;
>
> fp = fopen( "scan0001b.bmp" , "r" );

You've opened this file with the "read text" option. Presuming that the
filename represents a file in the Microsoft bitmap graphics format
(".BMP"), then this is the wrong mode to open the file in. You probably
want
fp = fopen( "scan0001b.bmp" , "rb" );
here.

> if( fp == NULL )
> {
> printf("File open failed for read\n");
> exit(0);
> }
>
> FILE *fpo;
>
> fpo = fopen( "scanout.bmp" , "w");

Similarly, this is the wrong mode to open a (presumably binary) output file.

> if( fpo == NULL )
> {
> printf("File open failed for write\n");
> exit(0);
> }
> int c=1;
>
> while(!feof(fp))

Remember, feof() does not read the file, and returns true /after/ the true
read (in your case, fgetc()) returns an end-of-file condition.

> {
> c = fgetc(fp);

In Microsoft Windows, text files are permitted to contain a binary octet
marker (0x1a or ^Z), which will indicate a logical end-of-file prior to the
physical end of the file. While this is a leftover from the MSDOS 1 days,
it is still honored by the text-mode streams of the Windows C runtime.

If you /did/ mean your input file to contain pure binary data (rather than
the text that you indicate by the file mode string), then there is a very
good chance that at least one character (octet) of this pure binary data
has the value of 0x1a. This would cause your fgetc() on the file to
prematurely return EOF, and subsequently cause feof() to return true. This
in turn causes you to abort the copy process prior to the actual physical
end-of-file of the (presumably binary) input file.

> if( c>=0 )
> fputc( c , fpo );
> }
>
> fclose(fp);
> fclose(fpo);
>
> }
>
> ---------------------------------------------
>
> When I run it, it stops before it has read the entire file (it only
> reads around 10%).
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
Jul 2 '08 #2
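
A minimal sketch of the fix described above, assuming nothing beyond standard C: both files are opened in binary mode ("rb" and "wb"), so a 0x1a byte is copied like any other. The helper name copy_file_binary is made up for illustration and does not appear in the thread.

```c
#include <stdio.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on failure.
   copy_file_binary is a hypothetical helper name, not from the thread. */
int copy_file_binary(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");      /* "b": no text-mode translation */
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int c;                            /* int, so EOF stays distinguishable */
    int status = 0;
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF) {   /* write error */
            status = -1;
            break;
        }
    }
    if (ferror(in))                   /* loop ended on a read error, not EOF */
        status = -1;

    fclose(in);
    if (fclose(out) == EOF)           /* buffered data may fail to flush here */
        status = -1;
    return status;
}
```

Called as copy_file_binary("scan0001b.bmp", "scanout.bmp"), this should behave the same whether the program is built for Cygwin or for native Windows, since binary streams are not subject to ^Z or line-ending handling.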

P: n/a
raphfrk wrote:
) This program should copy one file onto the other. It works if I
) compile it with gcc to a cygwin program. However, if I compile it
) with the -mno-cygwin option, it doesn't work (this targets native
) windows).
)
) Anyway, I just want to check that the program is valid before I see if
) I can find a way around a compiler bug.
)
) It might be something simple that I am doing wrong.

It might be.
There are certainly several simple things wrong with the code.

) FILE *fp;
)
) fp = fopen( "scan0001b.bmp" , "r" );

Not binary mode "rb" ? It is a binary file, right ?
Perhaps if you read in text mode, certain characters
can flag end of file on windows. Ctrl-Z perhaps.

) int c=1;

Why initialize it ? And why to 1 ??

) while(!feof(fp))

This mistake is so common that it has its own FAQ entry.
You see, the eof flag is set at the moment a read operation is done
when the file is at EOF. So after the last character is read, feof(fp)
will *not* return true. The next read will return an error code, and
*only then* will feof(fp) be true.

But theoretically, as is, it should work, because of the extra if (c>=0).

) {
) c = fgetc(fp);
) if( c>=0 )
) fputc( c , fpo );
) }

The correct idiom is this:

while ((c = fgetc(fp)) != EOF) {
fputc(c, fpo); /* Should check for errors here also, I think */
}
/* And here an if (ferror(fp)) would be nice */

) fclose(fp);
) fclose(fpo);
)
) }
)
) ---------------------------------------------
)
) When I run it, it stops before it has read the entire file (it only
) reads around 10%).

Offhand, I guess there is a ^Z character at 10% in the input file.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 2 '08 #3
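
The point about feof() only becoming true after a failed read can be seen by counting loop iterations. This is a small sketch with made-up helper names (count_feof_iterations, count_bytes); on a 3-byte stream the first loop runs four times, because the fourth fgetc() is the call that actually hits end-of-file.

```c
#include <stdio.h>

/* Count iterations of the while(!feof(fp)) pattern.
   Hypothetical helper, for illustration only. */
long count_feof_iterations(FILE *fp)
{
    long iterations = 0;
    while (!feof(fp)) {
        int c = fgetc(fp);     /* the final call returns EOF and sets the flag */
        iterations++;
        if (c == EOF && ferror(fp))
            break;             /* without this, a read error would loop forever */
    }
    return iterations;
}

/* Count bytes with the idiomatic loop: test the value fgetc() returned. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

The extra iteration is harmless in the original program only because of the if (c>=0) guard; without it, the loop would write one spurious byte.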

P: n/a
raphfrk wrote:
> This program should copy one file onto the other. It works if I
> compile it with gcc to a cygwin program. However, if I compile it
> with the -mno-cygwin option, it doesn't work (this targets native
> windows).
>
> Anyway, I just want to check that the program is valid before I see if
> I can find a way around a compiler bug.
>
> It might be something simple that I am doing wrong.
>
> -------------------------------------------------------------
> #include <stdio.h>
>
> main()

main() returns an int. Say so when you write the function:
int main(void)

> {
> FILE *fp;
>
> fp = fopen( "scan0001b.bmp" , "r" );

You're not opening the file in binary mode.
On some systems, that will cause some characters in the file to be
interpreted by the C library (or the Operating System??) and those
characters will not reach your program exactly as they are on disk. It
may even happen that some character tells the C library (or the
Operating System) to stop reading the file right there, even though
the number of characters read is only a small percentage of the file
length reported by the system through other means.

> if( fp == NULL )
> {
> printf("File open failed for read\n");
> exit(0);
> }
>
> FILE *fpo;
>
> fpo = fopen( "scanout.bmp" , "w");

You're not opening the file in binary mode.
Translation of characters can occur in a similar way to what happens
when you read a file in non-binary mode.

> if( fpo == NULL )
> {
> printf("File open failed for write\n");
> exit(0);
> }
> int c=1;

Why 1? What does it mean?
I see you're using C99 (you're mixing declarations and code). There's
nothing wrong with that as long as you understand that C99 compilers
aren't as readily available as C89 compilers, and you don't mind losing
some portability.

> while(!feof(fp))
> {
> c = fgetc(fp);
> if( c>=0 )
> fputc( c , fpo );
> }

This loop is wrong.
feof() doesn't do what you think it does.
Read the answer to question 12.2 on the c-faq ( http://c-faq.com/ ),
and while you're there, bookmark the site and return there every now
and again.

> fclose(fp);

Failed to test if the fclose() call succeeded.

> fclose(fpo);
>
> }
>
> ---------------------------------------------
>
> When I run it, it stops before it has read the entire file (it only
> reads around 10%).
Jul 2 '08 #4
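
Checking fclose() matters most for the output file, because buffered data may only be written out at that point. A minimal sketch, assuming a made-up helper name close_checked:

```c
#include <stdio.h>

/* Close fp and report a failure; returns 0 on success, -1 on error.
   close_checked is a hypothetical helper name. */
int close_checked(FILE *fp, const char *name)
{
    if (fclose(fp) == EOF) {
        perror(name);     /* fclose() reports failure by returning EOF */
        return -1;
    }
    return 0;
}
```

In a corrected int main(void), the result could decide between returning EXIT_SUCCESS and EXIT_FAILURE (both from <stdlib.h>, which the original program also needs for exit()).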

P: n/a
On Jul 2, 8:37 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
>> It might be something simple that I am doing wrong.
>> -------------------------------------------------------------
>> #include <stdio.h>
>> main()
>> {
>> FILE *fp;
>> fp = fopen( "scan0001b.bmp" , "r" );
>
> You've opened this file with the "read text" option. Presuming that the
> filename represents a file in the Microsoft bitmap graphics format
> (".BMP"), then this is the wrong mode to open the file in. You probably
> want
> fp = fopen( "scan0001b.bmp" , "rb" );
> here.

Ahh, I didn't realise that there was a rb option.

Does it work exactly the same other than stopping at logical end of
file?

Can I still use things like fscanf( fp , "%c" , &variable )?

> Remember, feof() does not read the file, and returns true /after/ the true
> read (in your case, fgetc()) returns an end-of-file condition.

Ahh, another piece of useful info.

Thanks to all that replied. This group is great for spotting non-
obvious coding errors.

A previous tip for ensuring malloc used the right size has really
reduced bugs when I use dynamic allocation.
Jul 2 '08 #5

P: n/a
raphfrk wrote:
> On Jul 2, 8:37 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
>>> It might be something simple that I am doing wrong.
>>> -------------------------------------------------------------
>>> #include <stdio.h>
>>> main()
>>> {
>>> FILE *fp;
>>> fp = fopen( "scan0001b.bmp" , "r" );
>> You've opened this file with the "read text" option. Presuming that the
>> filename represents a file in the Microsoft bitmap graphics format
>> (".BMP"), then this is the wrong mode to open the file in. You probably
>> want
>> fp = fopen( "scan0001b.bmp" , "rb" );
>> here.
>
> Ahh, I didn't realise that there was a rb option.
>
> Does it work exactly the same other than stopping at logical end of
> file?

A text stream will do whatever is necessary to translate
between C's notion of text ("lines of text-ish characters,
each terminated by a single '\n'") and the platform's own
conventions for text, whatever they might be. On Windows,
this means that a ^Z is treated as an end-of-data marker on
input (and may be generated automatically on output, for all
I know), and that line ends are marked by the pair '\r' '\n'
instead of by '\n'. Other conventions apply on other systems.
It's the stream's business to understand the conventions and
to translate them to and from C's view.

A binary stream, on the other hand, operates in a "raw
bytes" mode, without translation. What you read is what you
wrote (except that there may be extra '\0' bytes at the end).

> Can I still use things like fscanf( fp , "%c" , &variable )

Yes, but it's queasy-making. Many fscanf() directives
have the text-friendly but binary-fatal habit of skipping any
leading white space characters. "%c" is not one of those, but
if you try to use "%s" or "%d" or something of that sort you
may be unpleasantly surprised.

Using `variable = getc(fp)' is simpler, harder to get wrong
(but see Question 12.1 in the FAQ), and may even be faster.
There's no need to commit canaricide by cannon.

--
Er*********@sun.com
Jul 2 '08 #6
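
The difference between "%c" and the whitespace-skipping directives can be checked directly. In this sketch (count_scanned is a made-up helper name) the same five bytes, three of them white space, yield five items with "%c" but only two with "%1s":

```c
#include <stdio.h>

/* Count how many items fscanf() converts from fp using fmt, which must
   store at most one character plus a terminator ("%c" or "%1s").
   Hypothetical helper, for illustration only. */
long count_scanned(FILE *fp, const char *fmt)
{
    long n = 0;
    char buf[2];                       /* room for "%1s" and its '\0' */
    while (fscanf(fp, fmt, buf) == 1)
        n++;
    return n;
}
```

For binary data this means "%s", "%d" and friends silently discard any bytes that happen to be space, tab, or newline, which is why getc() is the safer choice.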

P: n/a

"raphfrk" <ra*****@netscape.net> wrote, among other things:

while(!feof(fp))
{
c = fgetc(fp);
if( c>=0 )
fputc( c , fpo );
}

In addition to the excellent advice others here gave you,
there's one *HUGE* error here which everyone seems to have
missed somehow: the 0 byte, 0x00, is a perfectly
valid byte both for text files (ASCII, iso-8859-1, etc)
and binary (non-text) files. Many of the bytes in a
bmp file will be 0. If you omit those and close up the
gaps, you will severely corrupt the copy of the original
file. It will NOT render correctly in a graphics viewer
(Paintshop Pro, Internet Explorer, or whatever). It
probably can't even be opened, because the headers will
be screwed up.

So your while loop is wrong from several standpoints.

Try:

// Loop while file pointer is valid, until break:
while (fp)
{
c = fgetc(fp); // ATTEMPT TO READ A BYTE.
if (feof(fp)) break; // BREAK IF READ ATTEMPT FAILED.
else fputc(c, fpo); // COPY EVEN THE "0" BYTES.
}

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant
Jul 2 '08 #7

P: n/a
"Robbie Hatley" <se**************@for.my.email.address> writes:
> "raphfrk" <ra*****@netscape.net> wrote, among other things:
>
>> while(!feof(fp))
>> {
>> c = fgetc(fp);
>> if( c>=0 )
>> fputc( c , fpo );
>> }
>
> In addition to the excellent advice others here gave you,
> there's one *HUGE* error here which everyone seems to have
> missed somehow: the 0 byte, 0x00, is a perfectly
> valid byte both for text files (ASCII, iso-8859-1, etc)
> and binary (non-text) files. Many of the bytes in a
> bmp file will be 0. If you omit those and close up the
> gaps, you will severely corrupt the copy of the original
> file.

I presume you missed the = part of the >=?

> It will NOT render correctly in a graphics viewer
> (Paintshop Pro, Internet Explorer, or whatever). It
> probably can't even be opened, because the headers will
> be screwed up.
>
> So your while loop is wrong from several standpoints.

Actually no (unless I've missed some subtlety). It is non-idiomatic
but looks to be entirely correct to me.

> Try:
>
> // Loop while file pointer is valid, until break:
> while (fp)
> {
> c = fgetc(fp); // ATTEMPT TO READ A BYTE.
> if (feof(fp)) break; // BREAK IF READ ATTEMPT FAILED.
> else fputc(c, fpo); // COPY EVEN THE "0" BYTES.
> }

The canonical version would be:

while ((c = fgetc(fp)) != EOF)
fputc(c, fpo);

--
Ben.
Jul 3 '08 #8

P: n/a
"Robbie Hatley" <se**************@for.my.email.address> writes:
> "raphfrk" <ra*****@netscape.net> wrote, among other things:
>> while(!feof(fp))
>> {
>> c = fgetc(fp);
>> if( c>=0 )
>> fputc( c , fpo );
>> }
>
> In addition to the excellent advice others here gave you,
> there's one *HUGE* error here which everyone seems to have
> missed somehow: the 0 byte, 0x00, is a perfectly
> valid byte both for text files (ASCII, iso-8859-1, etc)
> and binary (non-text) files. Many of the bytes in a
> bmp file will be 0. If you omit those and close up the
> gaps, you will severely corrupt the copy of the original
> file. It will NOT render correctly in a graphics viewer
> (Paintshop Pro, Internet Explorer, or whatever). It
> probably can't even be opened, because the headers will
> be screwed up.
[...]

Look again. The test is "c>=0", not "c>0". 0 bytes are treated the
same as any other valid bytes.

And I'd dispute that '\0' is a valid byte in a text file, at least for
most text formats. A text file *can* have '\0' characters, but
they'll cause problems for programs that use fgets() because they'll
be treated as string terminators.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 3 '08 #9
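
The fgets() problem with embedded '\0' bytes is easy to reproduce. In this sketch (visible_line_length is a made-up name) fgets() reads the whole line, but strlen() only sees the part before the first zero byte:

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() and return what strlen() reports for it,
   or -1 if nothing could be read. Hypothetical helper name. */
long visible_line_length(FILE *fp)
{
    char line[64];
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    return (long)strlen(line);   /* stops at the first '\0', wherever it is */
}
```

For the six bytes 'a' 'b' '\0' 'c' 'd' '\n', the function returns 2 even though fgets() consumed all six, so the rest of the line is effectively lost to any string-based processing.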

P: n/a
On Jul 3, 12:11 am, raphfrk <raph...@netscape.net> wrote:
> This program should copy one file onto the other. It works if I
> compile it with gcc to a cygwin program. However, if I compile it
> with the -mno-cygwin option, it doesn't work (this targets native
> windows).

Just curious, do you mean that this code worked correctly in the
former case? Since the errors seem to be in your code, I don't
understand how it worked well as a cygwin program. One possible
explanation is that as cygwin emulates a Unix environment, it does not
distinguish between binary and text streams.
Jul 3 '08 #10

P: n/a
On Jul 3, 5:14 am, rahul <rahulsin...@gmail.com> wrote:
> Just curious, do you mean that this code worked correctly in the
> former case? Since the errors seem to be in your code, I don't
> understand how it worked well as a cygwin program. One possible
> explanation is that as cygwin emulates a Unix environment, it does not
> distinguish between binary and text streams.

Yeah, I think that is probably it. My memory is that fprintf as
native cygwin will output unix type files.

In fact, it can cause issues when compiling source files. It executes
a dos2unix command before it passes them to the compiler. This can
mean that the line number in the compiler doesn't match the DOS line
number as sometimes notepad adds hidden characters that are
interpreted as newlines. (I think created by pressing shift-delete or
some other weird combination).

I end up running dos2unix and unix2dos on the files every so often.
This ensures the DOS version and the unix version are equivalent.
Jul 3 '08 #11
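
For reference, the core of a dos2unix-style conversion is small. This is only a sketch of the idea, not the actual dos2unix tool: crlf_to_lf is a made-up name, and both streams are assumed to be opened in binary mode so the C library does no translation of its own.

```c
#include <stdio.h>

/* Copy in to out, turning each "\r\n" pair into "\n" and leaving any
   lone '\r' untouched. Returns 0 on success, -1 on an I/O error.
   crlf_to_lf is a hypothetical helper name. */
int crlf_to_lf(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next == '\n')
                c = '\n';                     /* CR LF collapses to LF */
            else if (next != EOF && ungetc(next, in) == EOF)
                return -1;                    /* could not push the byte back */
        }
        if (fputc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```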

P: n/a
On 2008-07-02, Robbie Hatley <se**************@for.my.email.address> wrote:
> // Loop while file pointer is valid, until break:
> while (fp)
> {
> c = fgetc(fp); // ATTEMPT TO READ A BYTE.
> if (feof(fp)) break; // BREAK IF READ ATTEMPT FAILED.

Should be:

if (c == EOF) break;

or:

if (feof(fp) || ferror(fp)) break;
Jul 3 '08 #12

P: n/a
Robbie Hatley said:

<snip>
> So your while loop is wrong from several standpoints.

So is yours.

> Try:
>
> // Loop while file pointer is valid, until break:
> while (fp)

No, don't try that.

> {
> c = fgetc(fp); // ATTEMPT TO READ A BYTE.
> if (feof(fp)) break; // BREAK IF READ ATTEMPT FAILED.
> else fputc(c, fpo); // COPY EVEN THE "0" BYTES.
> }

while((c = fgetc(fp)) != EOF)
{
fputc(c, fpo);
}

Elegance is underrated.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jul 3 '08 #13

P: n/a
rahul <ra*********@gmail.com> writes:
> On Jul 3, 12:11 am, raphfrk <raph...@netscape.net> wrote:
>> This program should copy one file onto the other. It works if I
>> compile it with gcc to a cygwin program. However, if I compile it
>> with the -mno-cygwin option, it doesn't work (this targets native
>> windows).
>
> Just curious, do you mean that this code worked correctly in the
> former case? Since the errors seem to be in your code, I don't
> understand how it worked well as a cygwin program. One possible
> explanation is that as cygwin emulates a Unix environment, it does not
> distinguish between binary and text streams.

Yes, Cygwin, in its default configuration, uses the Unix format for
text files.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 3 '08 #14

P: n/a

"Ben Bacarisse" wrote:
> "Robbie Hatley" writes:
>> "raphfrk" wrote:
>>
>>> while(!feof(fp))
>>> {
>>> c = fgetc(fp);
>>> if( c>=0 )
>>> fputc( c , fpo );
>>> }
>>
>> In addition to the excellent advice others here gave you,
>> there's one *HUGE* error here which everyone seems to have
>> missed somehow: the 0 byte, 0x00, is a perfectly
>> valid byte both for text files (ASCII, iso-8859-1, etc)
>> and binary (non-text) files. Many of the bytes in a
>> bmp file will be 0. If you omit those and close up the
>> gaps, you will severely corrupt the copy of the original
>> file.
>
> I presume you missed the = part of the >=?

Oops. Yes, indeed. I must have been looking at it cross-eyed
or something.

> The canonical version would be:
>
> while ((c = fgetc(fp)) != EOF)
> fputc(c, fpo);

Yes, on looking at the instruction file for libc that comes with
my compiler (djgpp), I see it recommends similar:

int c;
while((c=fgetc(stdin)) != EOF)
fputc(c, stdout);

Substitute generic file pointers for stdin and stdout,
and it becomes identical to the version you gave.

The c being an int (not char or unsigned char) is important,
because fgetc returns an int containing the unsigned value
(0 to 255) of the character, or EOF if character couldn't
be read. (I'm assuming EOF would always be defined out of
the 0-255 range so it couldn't be confused with a character
value.)

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant
Jul 3 '08 #15
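
Why c must be an int can be shown by narrowing fgetc()'s result on purpose. This sketch uses a made-up helper, false_eof_offset; it assumes the common case where EOF is -1 and converting 255 to signed char gives -1 (that conversion is implementation-defined, so the result may differ elsewhere).

```c
#include <stdio.h>

/* Return the offset at which a loop that stores fgetc()'s result in a
   signed char would stop early, or -1 if no data byte is mistaken for
   EOF before the real end of input. Hypothetical helper name. */
long false_eof_offset(FILE *fp)
{
    long n = 0;
    for (;;) {
        int wide = fgetc(fp);
        if (wide == EOF)
            return -1;                     /* genuine end of input (or error) */
        /* Implementation-defined for values above SCHAR_MAX. */
        signed char narrow = (signed char)wide;
        if (narrow == EOF)
            return n;                      /* a data byte now looks like EOF */
        n++;
    }
}
```

On typical 8-bit-char systems, a file containing the byte 0xFF makes the narrowed loop stop at that byte, which is the same kind of truncated copy the original poster saw, only with a different cause.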

P: n/a
"Robbie Hatley" <se**************@for.my.email.address> writes:
[...]
> The c being an int (not char or unsigned char) is important,
> because fgetc returns an int containing the unsigned value
> (0 to 255) of the character, or EOF if character couldn't
> be read. (I'm assuming EOF would always be defined out of
> the 0-255 range so it couldn't be confused with a character
> value.)

Right.

EOF is of type int, and has a negative value, which must therefore be
outside the range of unsigned char. But note that the range of
unsigned char can be greater than 0..255 on systems where CHAR_BIT>8.
(Most modern systems other than DSPs have CHAR_BIT==8, but the
standard merely requires CHAR_BIT>=8.)

You can have problems if an input character, when converted from
unsigned char to int, yields a negative value that might happen to
match the value of EOF. This can happen only if sizeof(int)==1, which
can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
wide). In practice, this isn't an issue; I know of no current hosted
implementation with CHAR_BIT!=8, and freestanding implementations
aren't required to provide <stdio.h>. But in theory, the canonical
I/O loop:

int c;
...
while ((c = fgetc(fin)) != EOF) {
/* ... */
}

isn't absolutely 100% portable.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 3 '08 #16
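
The theoretical case described above can be ruled out at compile time. A minimal sketch using only <limits.h> and <stdio.h>; the function name eof_is_unambiguous is made up for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Refuse to compile where an unsigned char value might not fit in an int,
   i.e. where a data byte could collide with EOF. */
#if UCHAR_MAX > INT_MAX
#error "fgetc() results may be indistinguishable from EOF on this platform"
#endif

/* Runtime view of the same facts; returns 1 when the canonical
   while ((c = fgetc(f)) != EOF) loop cannot mistake data for EOF.
   Hypothetical helper name. */
int eof_is_unambiguous(void)
{
    return EOF < 0 && UCHAR_MAX <= INT_MAX;
}
```

On every mainstream hosted implementation this compiles cleanly and the function returns 1, which is why the canonical loop is safe in practice.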

P: n/a
On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
> You can have problems if an input character, when converted from
> unsigned char to int, yields a negative value that might happen to
> match the value of EOF. This can happen only if sizeof(int)==1, which
> can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
> wide). In practice, this isn't an issue; I know of no current hosted
> implementation with CHAR_BIT!=8, and freestanding implementations
> aren't required to provide <stdio.h>. But in theory, the canonical
> I/O loop:
>
> int c;
> ...
> while ((c = fgetc(fin)) != EOF) {
> /* ... */
> }
>
> isn't absolutely 100% portable.

What would EOF be equal to in that case and what would be the correct
code?

In fact, it doesn't look possible, it is not just a portability issue.
Jul 3 '08 #17

P: n/a
On Jul 4, 1:58 am, raphfrk <raph...@netscape.net> wrote:
> On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
>> You can have problems if an input character, when converted from
>> unsigned char to int, yields a negative value that might happen to
>> match the value of EOF. This can happen only if sizeof(int)==1, which
>> can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
>> wide). In practice, this isn't an issue; I know of no current hosted
>> implementation with CHAR_BIT!=8, and freestanding implementations
>> aren't required to provide <stdio.h>. But in theory, the canonical
>> I/O loop:
>> int c;
>> ...
>> while ((c = fgetc(fin)) != EOF) {
>> /* ... */
>> }
>> isn't absolutely 100% portable.
>
> What would EOF be equal to in that case and what would be the correct
> code?

The correct code would use feof() and ferror() and ignore what fgetc()
returns.
Example:

while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

However I also do not know of such an implementation. I'm sure future
implementors will not let this happen, as it would break a fair amount of
C code.

Jul 3 '08 #18

P: n/a
On Jul 4, 12:08 am, vipps...@gmail.com wrote:
> However I also do not know of such an implementation. I'm sure future
> implementors will not let this happen, as it would break a fair amount of
> C code.

Ahh, so the EOF return is in effect useless if you want 100% portable
code.
Jul 4 '08 #19

P: n/a
raphfrk <ra*****@netscape.net> writes:
> On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
>> You can have problems if an input character, when converted from
>> unsigned char to int, yields a negative value that might happen to
>> match the value of EOF. This can happen only if sizeof(int)==1, which
>> can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
>> wide). In practice, this isn't an issue; I know of no current hosted
>> implementation with CHAR_BIT!=8, and freestanding implementations
>> aren't required to provide <stdio.h>. But in theory, the canonical
>> I/O loop:
>>
>> int c;
>> ...
>> while ((c = fgetc(fin)) != EOF) {
>> /* ... */
>> }
>>
>> isn't absolutely 100% portable.
>
> What would EOF be equal to in that case and what would be the correct
> code?
>
> In fact, it doesn't look possible, it is not just a portability issue.

EOF is required to be a negative value of type int (in practice, it's
generally -1). That wouldn't change. It just would happen to match a
valid character value (represented as unsigned char and converted to
int).

In fact the conversion from unsigned char to signed int would yield an
implementation-defined result (or, in theory, raise an
implementation-defined signal).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 4 '08 #20

P: n/a
vi******@gmail.com writes:
> On Jul 4, 1:58 am, raphfrk <raph...@netscape.net> wrote:
>> On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
>>> You can have problems if an input character, when converted from
>>> unsigned char to int, yields a negative value that might happen to
>>> match the value of EOF. This can happen only if sizeof(int)==1, which
>>> can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
>>> wide). In practice, this isn't an issue; I know of no current hosted
>>> implementation with CHAR_BIT!=8, and freestanding implementations
>>> aren't required to provide <stdio.h>. But in theory, the canonical
>>> I/O loop:
>>> int c;
>>> ...
>>> while ((c = fgetc(fin)) != EOF) {
>>> /* ... */
>>> }
>>> isn't absolutely 100% portable.
>>
>> What would EOF be equal to in that case and what would be the correct
>> code?
>
> The correct code would use feof() and ferror() and ignore what fgetc()
> returns.
> Example:
>
> while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

You can still do the comparison to EOF; you just have to realize that
it doesn't necessarily mean that you've reached end-of-file.

If c is equal to EOF *and* either feof(fin) or ferror(fin) returns
true, then you've reached the end of your input (or encountered an
error). Checking for EOF first can avoid the (minor) cost of the
function calls in some cases.

I might write it something like this:

#define END_OF_FILE(c, f) ((c)==EOF && (feof(f) || ferror(f)))

while (c = fgetc(fin), ! END_OF_FILE(c, fin)) {
/* ... */
}

> However I also do not know of such an implementation. I'm sure future
> implementors will not let this happen, as it would break a fair amount of
> C code.

Agreed. In practice, it's just not worth worrying about.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 4 '08 #21

P: n/a
On Thu, 03 Jul 2008 17:27:23 -0700, Keith Thompson wrote:
> vi******@gmail.com writes:
>> On Jul 4, 1:58 am, raphfrk <raph...@netscape.net> wrote:
>>> On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
>>>> You can have problems if an input character, when converted from
>>>> unsigned char to int, yields a negative value that might happen to
>>>> match the value of EOF. [...]
>>>
>>> What would EOF be equal to in that case and what would be the correct
>>> code?
>>
>> The correct code would use feof() and ferror() and ignore what fgetc()
>> returns.
>> Example:
>>
>> while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }
>
> You can still do the comparison to EOF; you just have to realize that it
> doesn't necessarily mean that you've reached end-of-file.
>
> If c is equal to EOF *and* either feof(fin) or ferror(fin) returns true,
> then you've reached the end of your input (or encountered an error).
> Checking for EOF first can avoid the (minor) cost of the function calls
> in some cases.

Here's how I would do and have done it:

for (;;) {
    c = getc(in);
    if (c == EOF) {
        if (feof(in)) {
            /* cleanup omitted */
            return;
        }
        if (ferror(in)) {
            do_error("read error");
            return;
        }
    }

    /* process read character c, which may happen to compare equal to EOF */
}

Here, I don't want to treat read errors the same way as the end of the
file. If there is a read error, I want to print an error message, set a
flag to return EXIT_FAILURE at the end of the program, but return from the
read function and see if the input that has been read so far is useful. If
the end of the file is reached, I don't want to print anything, I don't
want to set any flag, and I want to return from the read function. This
already requires me to check feof or ferror, so handling a character that
happens to compare equal to EOF has practically no extra cost.
Jul 4 '08 #22
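
The pattern above can be wrapped into a self-contained function. This is a sketch under assumptions, not the poster's actual code: read_all, count_sink, and the read_status values are made-up names, and the caller is left to decide how to report READ_ERROR.

```c
#include <stdio.h>

enum read_status { READ_EOF = 0, READ_ERROR = 1 };

/* Example consumer: counts the bytes it is given (ctx points to a long). */
void count_sink(int c, void *ctx)
{
    (void)c;
    ++*(long *)ctx;
}

/* Pass every byte of in to sink(). End-of-file and read errors are told
   apart with feof()/ferror(); a data value that merely compares equal to
   EOF is still delivered. All names here are hypothetical. */
enum read_status read_all(FILE *in, void (*sink)(int c, void *ctx), void *ctx)
{
    for (;;) {
        int c = getc(in);
        if (c == EOF) {
            if (feof(in))
                return READ_EOF;
            if (ferror(in))
                return READ_ERROR;
            /* neither flag set: c is data that happens to equal EOF */
        }
        sink(c, ctx);
    }
}
```

A caller would typically print a message and arrange to exit with EXIT_FAILURE on READ_ERROR, while still using whatever input was delivered before the error.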

P: n/a
raphfrk wrote:
>
> .... snip ...
>
> What would EOF be equal to in that case and what would be the
> correct code?

EOF will be whatever it is defined as in stdio.h. You have no
problem generating correct code, as long as you include the header.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Jul 4 '08 #23

P: n/a
Harald van Dijk <tr*****@gmail.com> writes:
> On Thu, 03 Jul 2008 17:27:23 -0700, Keith Thompson wrote:
>> vi******@gmail.com writes:
>>> On Jul 4, 1:58 am, raphfrk <raph...@netscape.net> wrote:
>>>> On Jul 3, 8:18 pm, Keith Thompson <ks...@mib.org> wrote:
>>>>> You can have problems if an input character, when converted from
>>>>> unsigned char to int, yields a negative value that might happen to
>>>>> match the value of EOF. [...]
>>>>
>>>> What would EOF be equal to in that case and what would be the correct
>>>> code?
>>>
>>> The correct code would use feof() and ferror() and ignore what fgetc()
>>> returns.
>>> Example:
>>>
>>> while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }
>>
>> You can still do the comparison to EOF; you just have to realize that it
>> doesn't necessarily mean that you've reached end-of-file.
>>
>> If c is equal to EOF *and* either feof(fin) or ferror(fin) returns true,
>> then you've reached the end of your input (or encountered an error).
>> Checking for EOF first can avoid the (minor) cost of the function calls
>> in some cases.
>
> Here's how I would do and have done it:
>
> for (;;) {
> c = getc(in);
> if (c == EOF) {
> if (feof(in)) {
> /* cleanup omitted */
> return;
> }
> if (ferror(in)) {
> do_error("read error");
> return;
> }
> }
>
> /* process read character c, which may happen to compare equal to EOF */
> }
>
> Here, I don't want to treat read errors the same way as the end of the
> file. If there is a read error, I want to print an error message, set a
> flag to return EXIT_FAILURE at the end of the program, but return from the
> read function and see if the input that has been read so far is useful. If
> the end of the file is reached, I don't want to print anything, I don't
> want to set any flag, and I want to return from the read function. This
> already requires me to check feof or ferror, so handling a character that
> happens to compare equal to EOF has practically no extra cost.

Have you actually used a system where EOF may match a valid character?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 4 '08 #24

P: n/a
On Thu, 03 Jul 2008 18:24:44 -0700, Keith Thompson wrote:
Harald van Dijk <tr*****@gmail.comwrites:
>On Thu, 03 Jul 2008 17:27:23 -0700, Keith Thompson wrote:
>>You can still do the comparison to EOF; you just have to realize that
it doesn't necessarily mean that you've reached end-of-file.

If c is equal to EOF *and* either feof(fin) or ferror(fin) returns
true, then you've reached the end of your input (or encountered an
error). Checking for EOF first can avoid the (minor) cost of the
function calls in some cases.

Here's how I would do and have done it:

for (;;) {
c = getc(in);
if (c == EOF) {
if (feof(in)) {
/* cleanup omitted */
return;
}
if (ferror(in)) {
do_error("read error");
return;
}
}

/* process read character c, which may happen to compare equal to EOF
*/
}

Here, I don't want to treat read errors the same way as the end of the
file. If there is a read error, I want to print an error message, set a
flag to return EXIT_FAILURE at the end of the program, but return from
the read function and see if the input that has been read so far is
useful. If the end of the file is reached, I don't want to print
anything, I don't want to set any flag, and I want to return from the
read function. This already requires me to check feof or ferror, so
handling a character that happens to compare equal to EOF has
practically no extra cost.
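
For anyone who wants to try this, here is a minimal self-contained sketch of the same loop. The names read_all and read_failed are placeholders I made up for this post, and since do_error is not shown above I simply print to stderr; treat it as an illustration, not as my actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag: set on a read error so that main can later
   return EXIT_FAILURE instead of EXIT_SUCCESS. */
static int read_failed = 0;

/* Reads every character from in and returns how many were processed.
   End of file ends the loop silently; a read error is reported first. */
static long read_all(FILE *in)
{
    long count = 0;
    int c;

    assert(in != NULL);
    for (;;) {
        c = getc(in);
        if (c == EOF) {
            if (feof(in)) {
                return count;           /* normal end of input */
            }
            if (ferror(in)) {
                fprintf(stderr, "read error\n");
                read_failed = 1;
                return count;           /* keep what was read so far */
            }
        }
        /* c is a real character here, even if it compares equal to EOF */
        count++;
    }
}
```

A caller would run read_all on its input and, at the very end of the program, return EXIT_FAILURE if read_failed is set.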

Have you actually used a system where EOF may match a valid character?
No, I haven't. I originally wrote the code the way I did without paying
attention to that, and ended up with

for (;;) {
    c = getc(in);
    if (c == EOF) {
        if (ferror(in)) {
            do_error("read error");
        }
        /* cleanup omitted */
        return;
    }

    /* process read character c */
}

After realising the problem, which is only a theoretical one for me, I
noticed that the cost of avoiding the assumption was so low, and cluttered
the code so little, that I really had no excuse not to fix it anyway.

I already try to avoid using the result of an assignment, because I think
it makes the code harder to read. I would never write

while ((c = getc(in)) != EOF)

and don't care much for

while (c = getc(in), c != EOF)

either. I understand they're valid, and I don't consider them bad style,
but they're not how I prefer to write. Similarly, I didn't break out of
the loop, but placed the cleanup near the EOF check, because that made
more sense to me and made the code easier to follow.
Jul 4 '08 #25

P: n/a
vi******@gmail.com wrote:
The correct code would use feof() and ferror() and ignore what fgetc()
returns.
It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.

--
pete
Jul 4 '08 #26

P: n/a
raphfrk wrote:
On Jul 4, 12:08 am, vipps...@gmail.com wrote:
However, I also do not know of any such implementation. I'm sure future
implementors will not let this happen; it would break a fair amount of
C code.

Ahh, so the EOF return is in effect useless if you want 100% portable
code.
Yes, but no one wants to write 100% portable code. Most people are
happy with 99.9999%. Hell, Windoze programmers are happy with 5%. ;-)

AFAIK, no implementation writer has yet built a hosted implementation
where INT_MAX < UCHAR_MAX, intending it for practical use.

I know of one case where it was considered (with old Crays), but it was
rejected because the vast majority of code assumes that isn't the case.
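
If you're curious whether your own implementation is one of those, a check along these lines should tell you. It's only a sketch, and eof_can_collide is a name I made up for it.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if some unsigned char value cannot be
   represented as an int, in which case getc could return a value equal
   to EOF for a valid character. Returns 0 otherwise. */
static int eof_can_collide(void)
{
    assert(EOF < 0);    /* the standard requires EOF to be negative */
    return UCHAR_MAX > INT_MAX;
}
```

On every hosted implementation I know of this returns 0, which is why code that treats EOF as a reliable end marker works in practice.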

--
Peter
Jul 4 '08 #27

P: n/a
pete wrote:
vi******@gmail.com wrote:
>The correct code would use feof() and ferror() and ignore what
fgetc() returns.

It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.
Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Jul 4 '08 #28

P: n/a
CBFalconer <cb********@yahoo.com> writes:
pete wrote:
>vi******@gmail.com wrote:
>>The correct code would use feof() and ferror() and ignore what
fgetc() returns.

It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.

Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.
If you'll go back and read the context that you snipped, you'll find
that we were discussing a (likely nonexistent but theoretically
possible) system on which fgetc can *also* return EOF on reading a
valid character.

(For this to be possible, sizeof(int) must be 1, which implies
CHAR_BIT>=16.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 4 '08 #29

P: n/a
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
>pete wrote:
>>vi******@gmail.com wrote:

The correct code would use feof() and ferror() and ignore what
fgetc() returns.

It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.

Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.

If you'll go back and read the context that you snipped, you'll find
that we were discussing a (likely nonexistent but theoretically
possible) system on which fgetc can *also* return EOF on reading a
valid character.
Actually, I snipped nothing. The quote was (and is) the complete
message to which I replied, and it also contained no indication of
snipping. It's not worth fussing about, but as I see it fgetc can
always return EOF in place of a char. If sizeof(char) ==
sizeof(int) then some considerations are needed to restrict the
available chars, for example to those signified by positive ints.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Jul 4 '08 #30

P: n/a
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
>CBFalconer <cb********@yahoo.com> writes:
>pete wrote:
>>It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.
>Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.
>If you'll go back and read the context that you snipped, you'll find
that we were discussing a (likely nonexistent but theoretically
possible) system on which fgetc can *also* return EOF on reading a
valid character.
I think Chuck may have misunderstood "if fgetc returned EOF" to mean
"if the fgetc() function were defined to return EOF" rather than "if
this call to fgetc() happens to return EOF". But I could be wrong.

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Jul 4 '08 #31

P: n/a
Richard Tobin wrote:
) I think Chuck may have misunderstood "if fgetc returned EOF" to mean
) "if the fgetc() function were defined to return EOF" rather than "if
) this call to fgetc() happens to return EOF". But I could be wrong.

It would have been better worded as "when fgetc returns EOF", I think.
English has these subtle differences between 'if' and 'when', you see.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 4 '08 #32

P: n/a
CBFalconer wrote:
pete wrote:
>vi******@gmail.com wrote:
>>The correct code would use feof() and ferror() and ignore what
fgetc() returns.
I don't see any point in checking feof and ferror,
after fgetc returns something else besides EOF.
>It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.

Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.
That's why I would check what fgetc returns,
and then only check feof and ferror, if fgetc returned EOF.
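
Applied to the file copy that started this thread, that would look something like the sketch below. copy_stream is just a name I picked, and it assumes an ordinary implementation where EOF never matches a valid character. For a .bmp file, both streams should be opened in binary mode ("rb" and "wb").

```c
#include <assert.h>
#include <stdio.h>

/* Copies in to out one character at a time.
   Returns 0 on a clean end of file, -1 on a read or write error. */
static int copy_stream(FILE *in, FILE *out)
{
    int rc;

    assert(in != NULL && out != NULL);
    while ((rc = fgetc(in)) != EOF) {
        if (fputc(rc, out) == EOF) {
            return -1;              /* write error */
        }
    }
    /* fgetc returned EOF: only now ask whether it was an error */
    return ferror(in) ? -1 : 0;
}
```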

--
pete
Jul 4 '08 #33

P: n/a
Willem wrote:
Richard Tobin wrote:
) I think Chuck may have misunderstood "if fgetc returned EOF" to mean
) "if the fgetc() function were defined to return EOF" rather than "if
) this call to fgetc() happens to return EOF". But I could be wrong.

It would have been better worded as "when fgetc returns EOF", I think.
English has these subtle differences between 'if' and 'when', you see.
I see now.
I was probably thinking in C when I wrote what I wrote.

if (rc == EOF)
vs.
when (rc == EOF)

--
pete
Jul 4 '08 #34

P: n/a
CBFalconer <cb********@yahoo.com> writes:
Keith Thompson wrote:
>CBFalconer <cb********@yahoo.com> writes:
>>pete wrote:
vi******@gmail.com wrote:

The correct code would use feof() and ferror() and ignore what
fgetc() returns.

It makes more sense to me to check what fgetc returns,
and then to only check feof and ferror, if fgetc returned EOF.

Surprise - the fgetc function _does_ return EOF on any i/o error or
file EOF. If you care, you can distinguish between them with feof
and ferror.

If you'll go back and read the context that you snipped, you'll find
that we were discussing a (likely nonexistent but theoretically
possible) system on which fgetc can *also* return EOF on reading a
valid character.

Actually, I snipped nothing. The quote was (and is) the complete
message to which I replied, and it also contained no indication of
snipping.
You're right. The snipping that lost the context (that we were
talking about exotic systems) took place several messages upthread.
I should have checked that myself before complaining.
It's not worth fussing about, but as I see it fgetc can
always return EOF in place of a char. If sizeof(char) ==
sizeof(int) then some considerations are needed to restrict the
available chars, for example to those signified by positive ints.
Not necessarily. Suppose sizeof(char)==sizeof(int), and both types
are 16 bits, signed, and two's-complement. Then there are 65536
possible byte values that can be read from a file. You can still read
and distinguish all possible byte values, as long as you check feof()
and ferror() in addition to the usual comparison against EOF -- as
long as the conversion from unsigned char to int is sufficiently well
behaved.
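
Since I don't have such a system to test on, here's a small simulation of the idea on an ordinary one. Everything in it (sim_stream, sim_getc, SIM_EOF, the two counting functions) is invented for illustration; it models 16-bit bytes and a 16-bit two's-complement int using plain unsigned values, and assumes the conversion simply wraps.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_EOF (-1)

/* Invented stand-in for FILE on a system with 16-bit char and int. */
struct sim_stream {
    const unsigned *data;   /* byte values, each in 0..65535 */
    size_t len;
    size_t pos;
    int at_eof;             /* plays the role of feof() */
};

/* Models getc: returns the next byte converted to a 16-bit signed value
   (values above 32767 wrap to negative), or SIM_EOF at end of input. */
static int sim_getc(struct sim_stream *s)
{
    unsigned v;

    assert(s != NULL);
    if (s->pos >= s->len) {
        s->at_eof = 1;
        return SIM_EOF;
    }
    v = s->data[s->pos++] & 0xFFFFu;
    return v <= 0x7FFFu ? (int)v : (int)((long)v - 65536L);
}

/* Stops at the first SIM_EOF, whatever caused it. */
static size_t count_naive(struct sim_stream *s)
{
    size_t n = 0;

    while (sim_getc(s) != SIM_EOF) {
        n++;
    }
    return n;
}

/* Treats SIM_EOF as end of input only if the end-of-file flag is set. */
static size_t count_checked(struct sim_stream *s)
{
    size_t n = 0;

    for (;;) {
        int c = sim_getc(s);
        if (c == SIM_EOF && s->at_eof) {
            return n;
        }
        n++;
    }
}
```

Given the three byte values 0x0041, 0xFFFF, 0x0042, count_naive stops after the first one because 0xFFFF converts to -1, while count_checked sees all three.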

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 4 '08 #35
