getchar() and EOF confusion

arnuld

Mostly when I want to take input from stdin I use getchar(), but I get this
from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"

    while( EOF != (ch = getchar()) ) ....

I use it like that. Can I run into problems with that?

--
www.lispmachine.wordpress.com
my email is @ the above blog.

Oct 15 '08 #1


Richard Heathfield

arnuld said:

> Mostly when I want to take input from stdin I use getchar(), but I get
> this from the man page itself:
>
> "If the integer value returned by getchar() is stored into a variable
> of type char and then compared against the integer constant EOF, the
> comparison may never succeed, because sign-extension of a variable of
> type char on widening to integer is implementation-defined"
>
>     while( EOF != (ch = getchar()) ) ....
>
> I use it like that. Can I run into problems with that?

You can run into problems with anything - but the above is a good idiom.
Ensure that ch has type int.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Oct 15 '08 #2

arnuld

On Wed, 15 Oct 2008, Richard Heathfield wrote:

> You can run into problems with anything - but the above is a good idiom.
> Ensure that ch has type int.

Oh, that's why K&R2 uses "int ch". Thanks :)

Oct 15 '08 #3

Barry Schwarz

On Wed, 15 Oct 2008 09:47:14 +0500, arnuld <su*****@invalid.address>
wrote:

> Mostly when I want to take input from stdin I use getchar(), but I get this
> from the man page itself:
>
> "If the integer value returned by getchar() is stored into a variable of
> type char and then compared against the integer constant EOF, the
> comparison may never succeed, because sign-extension of a variable of type
> char on widening to integer is implementation-defined"
>
>     while( EOF != (ch = getchar()) ) ....
>
> I use it like that. Can I run into problems with that?

getchar treats the data it obtains from the stream as unsigned. EOF
is guaranteed to be negative. Can you see where this leads?

--
Remove del for email

Oct 15 '08 #4

danmath06

arnuld <sunr...@invalid.address> wrote:

> Mostly when I want to take input from stdin I use getchar(), but I get this
> from the man page itself:
>
> "If the integer value returned by getchar() is stored into a variable of
> type char and then compared against the integer constant EOF, the
> comparison may never succeed, because sign-extension of a variable of type
> char on widening to integer is implementation-defined"
>
>     while( EOF != (ch = getchar()) ) ....
>
> I use it like that. Can I run into problems with that?

Yes, if ch is not an int. The prototype for getchar() is "int
getchar(void);", so you should use an int to hold the value returned by
getchar().

Oct 15 '08 #5

Peter Nilsson

arnuld <sunr...@invalid.address> wrote:

> Mostly when I want to take input from stdin I use
> getchar(), but I get this from the man page itself:
>
> "If the integer value returned by getchar() is
> stored into a variable of type char and then
> compared against the integer constant EOF, the
> comparison may never succeed, because
> sign-extension of a variable of type char on
> widening to integer is implementation-defined"

The manual is poorly written. Integral promotion
is well defined and will always be value preserving
in the case of char values.

What is implementation defined is whether plain char
is signed or unsigned, but that too is mostly
incidental.

>     while( EOF != (ch = getchar()) ) ....
>
> I use it like that. Can I run into problems with that?

Did you read the FAQ?

http://c-faq.com/stdio/getcharc.html

--
Peter

Oct 15 '08 #6

Pranav

Then does reading character-sized data into a variable of an integer type
cause an issue when porting the code?

Oct 15 '08 #7

Nate Eldredge

Pranav <pr*******@gmail.com> writes:

> Then does reading character-sized data into a variable of an integer type
> cause an issue when porting the code?

I'm not sure I understand what you mean. Can you give an example of the
kind of code you have in mind?

Oct 15 '08 #8

arnuld

On Tue, 14 Oct 2008 23:00:57 -0700, Pranav wrote:

> Then does reading character-sized data into a variable of an integer type
> cause an issue when porting the code?

No, as every character is converted into an integer at compilation. Right,
clc folks? (Or do you think I am confusing the ASCII table with the compiler?)

--
www.lispmachine.wordpress.com
my email is @ the above blog.
Google Groups is UnBlocked now :)

Oct 15 '08 #9

Keith Thompson

arnuld <su*****@invalid.address> writes:

> On Tue, 14 Oct 2008 23:00:57 -0700, Pranav wrote:
>> Then does reading character-sized data into a variable of an integer type
>> cause an issue when porting the code?
>
> No, as every character is converted into an integer at compilation. Right,
> clc folks? (Or do you think I am confusing the ASCII table with the compiler?)

Pranav was talking about run-time input, not compilation.

Note that type char is an integer type. It's important to distinguish
between an integer type (of which there are several, including char,
int, unsigned long, etc.) and the specific integer type called "int".
The name "int" was obviously formed as an abbreviation of the word
"integer", but they mean different things.

getchar() attempts to read the next character from stdin. If it
succeeds, it treats the character as a value of type unsigned char,
and then converts the resulting unsigned char value to int. Since all
unsigned char values are non-negative, the result of the conversion is
non-negative. If it fails (either because there's no more input or
because of some error), it returns the int value EOF, which, since
it's negative, is distinct from any valid character value. (Plain
char may be either signed or unsigned -- but getchar() doesn't use
plain char.)

The answer to Pranav's question is no: this doesn't cause any
problems with porting code.

Well, mostly. Some exotic machines might have sizeof(int)==1 (which
can happen only if char is at least 16 bits). On such a system, it
can be difficult to distinguish between EOF (typically an int value of
-1) and a valid character with the unsigned char value 0xffff, which
when converted to int is likely to yield -1.

You're unlikely to run into this in practice. Machines with this
characteristic are typically DSPs (digital signal processors) which
typically have freestanding C implementations, so stdio.h might not
even be available. But if you want your code to be 100% portable, you
can first check whether the result returned by getchar() is equal to
EOF, and then check whether either feof() or ferror() returns a true
value. In practice, we don't generally bother.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 15 '08 #10

arnuld

On Tue, 14 Oct 2008 23:39:59 -0700, Keith Thompson wrote:

> Note that type char is an integer type. It's important to distinguish
> between an integer type (of which there are several, including char,
> int, unsigned long, etc.) and the specific integer type called "int".
> The name "int" was obviously formed as an abbreviation of the word
> "integer", but they mean different things.

Now I am very curious. What's the difference between an "integer" and a
variable of type "int"? Are "integer types" different from "int types"?

..SNIP..

> You're unlikely to run into this in practice. Machines with this
> characteristic are typically DSPs (digital signal processors) which
> typically have freestanding C implementations, so stdio.h might not
> even be available. But if you want your code to be 100% portable, you
> can first check whether the result returned by getchar() is equal to
> EOF, and then check whether either feof() or ferror() returns a true
> value. In practice, we don't generally bother.

Now I know why some clc lurker told me to distinguish between a real end
of file (no more input) and the not-so-real end of file (error in input),
and suggested that I use feof() and ferror() for that.

Oct 15 '08 #11

Richard Heathfield

arnuld said:

> On Tue, 14 Oct 2008 23:39:59 -0700, Keith Thompson wrote:
>
>> Note that type char is an integer type. It's important to distinguish
>> between an integer type (of which there are several, including char,
>> int, unsigned long, etc.) and the specific integer type called "int".
>> The name "int" was obviously formed as an abbreviation of the word
>> "integer", but they mean different things.
>
> Now I am very curious. What's the difference between an "integer" and a
> variable of type "int"? Are "integer types" different from "int types"?

char is an integer type, but it isn't an int. unsigned char and signed char
are other integer types that are not ints. Furthermore, we can qualify the
bare term "int" with, for example, "short" or "long", resulting in types
that are very similar in behaviour to bare-bones int, but with different
ranges: e.g. int's range is INT_MIN to INT_MAX, but short int's range is
SHRT_MIN to SHRT_MAX and long int's is LONG_MIN to LONG_MAX. And yet these
are integer types. Not ints, but integer types.

The difference isn't just one of ranges and types - it can also be one of
behaviour. The unsigned int type is an integer type that isn't an int. Its
range is 0 to UINT_MAX - but any results outside this range that you try
to store in it are reduced modulo (UINT_MAX + 1) so that they come within
range, and this behaviour is well-defined. This sure beats int's behaviour
when you try to store an out of range result.

Oct 15 '08 #12

James Kuyper

arnuld wrote:

> On Tue, 14 Oct 2008 23:39:59 -0700, Keith Thompson wrote:
>
>> Note that type char is an integer type. It's important to distinguish
>> between an integer type (of which there are several, including char,
>> int, unsigned long, etc.) and the specific integer type called "int".
>> The name "int" was obviously formed as an abbreviation of the word
>> "integer", but they mean different things.
>
> Now I am very curious. What's the difference between an "integer" and a
> variable of type "int"? Are "integer types" different from "int types"?

The standard doesn't define any meaning for the phrase "int types". It
does define "integer types". "int" is the name of one particular
integer type.

Integer types (6.2.5p17):
    char
    signed integer types (6.2.5p4):
        standard signed integer types:
            signed char, short int, int, long int, long long int
        extended signed integer types (implementation-defined)
    unsigned integer types (6.2.5p6):
        standard unsigned integer types:
            _Bool, and unsigned types corresponding to standard signed
            integer types
        extended unsigned integer types (implementation-defined)
    enumerated types

It's not possible to be specific about the extended integer types. They
are implementation-defined types, such as _int36 for a 36-bit integer
type. In C90, such types were allowed only as an extension to C. This
meant that, in particular, things like size_t that were required to be
integer types could only be typedefs for standard types. In C99, the
concept of "extended integer types" was defined, and size_t is allowed
to refer to any unsigned integer type, whether standard or extended.

....

> Now I know why some clc lurker told me to distinguish between a real end
> of file (no more input) and the not-so-real end of file (error in input),
> and suggested that I use feof() and ferror() for that.

EOF is just a macro name; it's clearly named in reference to "End Of
File", but it's also used by the character-oriented I/O functions as a
general-purpose error flag, not exclusively to refer to the end of the file.

Oct 15 '08 #13

Michael

arnuld wrote:

> Mostly when I want to take input from stdin I use getchar(), but I get this
> from the man page itself:
>
> "If the integer value returned by getchar() is stored into a variable of
> type char and then compared against the integer constant EOF, the
> comparison may never succeed, because sign-extension of a variable of type
> char on widening to integer is implementation-defined"
>
>     while( EOF != (ch = getchar()) ) ....
>
> I use it like that. Can I run into problems with that?

The function

int getchar();

reads a byte from the standard input and returns it.
If End-of-file is read, it returns EOF (on my machine, it is 0xffffffff).
If ch is an int, there is no problem at all.
A common mistake is assigning the result of getchar() to a char variable.
For example, if ch is a char:

EOF!=(ch=getchar())

When the byte 0xff is read:

getchar()=0x000000ff
ch=0xff

Because EOF is an int, the value of ch is automatically converted to int.

If ch is unsigned, the R.H.S. of != is 0x000000ff.
If ch is signed, the R.H.S. of != is 0xffffffff, which is equal to EOF, and
the while loop will exit.

Therefore, if ch is a char, there is a problem whenever the character read
widens to the same value as EOF. Whether that happens depends on the value
of EOF and on the signedness of char, both of which are
implementation-specific.

Oct 15 '08 #15

James Kuyper

Michael wrote:

> arnuld wrote:

....

>> while( EOF != (ch = getchar()) ) ....

....

> If ch is an int, there is no problem at all.

Unless INT_MAX<UCHAR_MAX, which is possible on systems where CHAR_BIT >=
16. On such systems, it's possible for a valid byte, when converted to
'int', to have the same value as EOF. The only work-around for that
possibility is to check feof() and ferror().

Oct 15 '08 #16

Keith Thompson

Michael <mi*****@michaeldadmum.no-ip.org> writes:
[...]

> the function
>
> int getchar();
>
> reads a byte from the standard input and returns it.
> If End-of-file is read, it returns EOF (on my machine, it is 0xffffffff)

[...]

No, EOF cannot be defined as 0xffffffff. It must expand to "an
integer constant expression, with type int and a negative value". A
typical definition is

#define EOF (-1)

If you convert the value of EOF to unsigned int on a 32-bit system,
the result is likely to be 0xffffffff; that's not the value of EOF,
it's the result of the conversion.

Oct 15 '08 #17

Michael

Keith Thompson wrote:

> Michael <mi*****@michaeldadmum.no-ip.org> writes:
> [...]
>> the function
>>
>> int getchar();
>>
>> reads a byte from the standard input and returns it.
>> If End-of-file is read, it returns EOF (on my machine, it is 0xffffffff)
> [...]
>
> No, EOF cannot be defined as 0xffffffff. It must expand to "an
> integer constant expression, with type int and a negative value". A
> typical definition is
>
> #define EOF (-1)
>
> If you convert the value of EOF to unsigned int on a 32-bit system,
> the result is likely to be 0xffffffff; that's not the value of EOF,
> it's the result of the conversion.

0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

Oct 16 '08 #18

Chris Dollin

Michael wrote:

> Keith Thompson wrote:
>
>> If you convert the value of EOF to unsigned int on a 32-bit system,
>> the result is likely to be 0xffffffff; that's not the value of EOF,
>> it's the result of the conversion.
>
> 0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

Not if it's an /unsigned/ int (see Keith's first sentence above).

--
'It changed the future .. and it changed us.' /Babylon 5/

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Oct 16 '08 #19

jameskuyper

Michael wrote:

> Keith Thompson wrote:

....

>> If you convert the value of EOF to unsigned int on a 32-bit system,
>> the result is likely to be 0xffffffff; that's not the value of EOF,
>> it's the result of the conversion.
>
> 0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

Not in C. In C, 0xFFFFFFFF is just a different way of writing the same
value as 2147483647 - the only difference is that 0xFFFFFFFF might
have an unsigned type, while 2147483647 must have a signed type.
0xFFFFFFFF never has the meaning "-1". It can be converted to an int,
and if 'int' is a 32-bit 2's complement type the result of that
conversion will probably be -1, but that doesn't mean that 0xFFFFFFFF
is -1.

Oct 16 '08 #20

Keith Thompson

Michael <mi*****@michaeldadmum.no-ip.org> writes:

> Keith Thompson wrote:
>> Michael <mi*****@michaeldadmum.no-ip.org> writes:
>> [...]
>>> the function
>>>
>>> int getchar();
>>>
>>> reads a byte from the standard input and returns it.
>>> If End-of-file is read, it returns EOF (on my machine, it is 0xffffffff)
>> [...]
>> No, EOF cannot be defined as 0xffffffff. It must expand to "an
>> integer constant expression, with type int and a negative value". A
>> typical definition is
>>
>> #define EOF (-1)
>>
>> If you convert the value of EOF to unsigned int on a 32-bit system,
>> the result is likely to be 0xffffffff; that's not the value of EOF,
>> it's the result of the conversion.
>
> 0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

No, 0xffffffff is an integer constant with the value 4294967295
(2**32-1, where "**" denotes exponentiation).

Assuming int is 32 bits, 2's-complement, no padding bits, no trap
representations, then that value cannot be represented by type int.
If you assign 0xffffffff to an int object, then, strictly speaking,
the result is an implementation-defined value (or, optionally and in
C99 only, an implementation-defined signal). In practice, it's very
likely that the value -1 will be assigned -- this is the
(implementation-defined but very common) result of the conversion.
Because of the conversion *the value changes*.

Assigning -1 to an object of type unsigned int will result in the
object having the value UINT_MAX, which, if unsigned int is 32 bits
with no padding bits, is 4294967295 or 0xffffffff. Again, the
implicit conversion from int (the type of the expression -1) to
unsigned int (the type of the object) changes the value. (Conversion
to unsigned types is defined differently by the standard than
conversion to signed types.)

I suspect that you're thinking of hexadecimal notation as a way of
specifying the representation of an object, as opposed to decimal
notation, which specifies a mathematical numeric value. If so, you
are mistaken. In C, decimal and hexadecimal are just two different
notations for representing integer values; there's nothing magical
about either one. 0xff, 0x00ff, and 255 mean *exactly* the same
thing.

On the other hand, in English text it's not unreasonable to use
hexadecimal notation to talk about object representations, so that
0xff refers to 8 bits all set to 1, and 0x00ff refers to 16 bits (and
thus is distinct from 0xff). But since C has a well-defined meaning
for hexadecimal notation, if you're going to use it that way you need
to say so explicitly.

For example, the representation of the 32-bit int value -1 is
0xffffffff.

(Octal is the third notation; it's probably not used as much these
days, though it was very useful on the PDP-11. Except that, strictly
speaking, 0 is an octal constant, so most C programmers use octal
every day without realizing it.)

Oct 16 '08 #21

jameskuyper

jameskuy...@verizon.net wrote:
....

> Not in C. In C, 0xFFFFFFFF is just a different way of writing the same
> value as 2147483647 - the only difference is that 0xFFFFFFFF might
> have an unsigned type, while 2147483647 must have a signed type.
> 0xFFFFFFFF never has the meaning "-1". It can be converted to an int,
> and if 'int' is a 32-bit 2's complement type the result of that
> conversion will probably be -1, but that doesn't mean that 0xFFFFFFFF
> is -1.

I used the wrong power of 2, of course. Replace 2147483647 with
4294967295 in all locations in that paragraph. Argh!

Oct 16 '08 #22

David Thompson

On Thu, 16 Oct 2008 08:53:18 -0700 (PDT), ja*********@verizon.net
wrote:

> jameskuy...@verizon.net wrote:
> ...
>> Not in C. In C, 0xFFFFFFFF is just a different way of writing the same
>> value as 2147483647 - the only difference is that 0xFFFFFFFF might
>> have an unsigned type, while 2147483647 must have a signed type.
>> 0xFFFFFFFF never has the meaning "-1". It can be converted to an int,
>> and if 'int' is a 32-bit 2's complement type the result of that
>> conversion will probably be -1, but that doesn't mean that 0xFFFFFFFF
>> is -1.
>
> I used the wrong power of 2, of course. Replace 2147483647 with
> 4294967295 in all locations in that paragraph. Argh!

In C89 unsuffixed decimal goes signed int, signed long, unsigned long.
For the common case of L32, that literal (with value 2 up 32 less 1)
is UL. That isn't relevant to your argument, though. However,
unsuffixed hex or octal CAN be signed; they go SI, UI, SL, UL.
For L32 0x(eight F) is UL, and as you say equal to (UL)-1,
but for L>32 it's SL (and large positive != -1) unless I=32 in which
case it's (representable in) UI (and again equal to converted -1).

In C99 unsuffixed decimal stays signed. Unsuffixed octal or hex still
goes through both signed and unsigned; for most machines with
power-of-2 sizes (S=U) the choice will land on unsigned, but other
architectures are permitted.

It is always true, as you say, that the C literal 0x(eight F) (as
opposed to the bit representation) is never actually (int)-1; it is
*sometimes* true that it converts to and/or from (int)-1. Even when it
works, it is less clear and obvious than writing -1 to mean -1.

- formerly david.thompson1 || achar(64) || worldnet.att.net

Oct 27 '08 #23
