
May fgetc() and friends return 163? Or UCHAR_MAX?

In a thread from substantially earlier this week,

Harald van Dijk <tr*****@gmail.com> wrote:
> getchar does not work with plain chars, it works with unsigned chars. 163
> fits just fine in an unsigned char, so getchar is allowed to return 163.
Being rather pedantic, I decided to try to verify whether this was
true. I would appreciate knowing whether my reading of the Standard
is correct.

7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
"obtains [a] character as an unsigned char converted to an int".
There is nothing in the Standard (that I was able to find) which
states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
163 always representable as a signed int if sizeof(int) is 1?"
5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
answer to that question appears to be "yes".

On the other hand, I do not see anything in 5.2.4.2.1 which requires
that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
if sizeof(int) == 1, correct? In such a case, fgetc() may return
UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
magic to return a signed integer representing UCHAR_MAX, or invoke UB
by overflowing the signed type int. Both of these alternatives seem
ridiculous to me, so what am I missing?
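
For concreteness, the limits in question can be probed with a few lines
(a minimal sketch, assuming only a hosted implementation):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("sizeof(int) = %lu\n", (unsigned long)sizeof(int));
    printf("UCHAR_MAX   = %lu\n", (unsigned long)UCHAR_MAX);
    printf("INT_MAX     = %d\n", INT_MAX);
#if UCHAR_MAX > INT_MAX
    puts("unsigned char does not fit in int: the case at issue here");
#else
    puts("every unsigned char value is representable as int");
#endif
    return 0;
}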

--
C. Benson Manica | I *should* know what I'm talking about - if I
cbmanica(at)gmail.com | don't, I need to know. Flames welcome.
Jun 7 '07 #1


Christopher Benson-Manica wrote:
> In a thread from substantially earlier this week,
>
> Harald van Dijk <tr*****@gmail.com> wrote:
>> getchar does not work with plain chars, it works with unsigned chars. 163
>> fits just fine in an unsigned char, so getchar is allowed to return 163.
>
> Being rather pedantic, I decided to try to verify whether this was
> true. I would appreciate knowing whether my reading of the Standard
> is correct.
>
> 7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
> "obtains [a] character as an unsigned char converted to an int".
> There is nothing in the Standard (that I was able to find) which
> states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
> 163 always representable as a signed int if sizeof(int) is 1?"
> 5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
> answer to that question appears to be "yes".
Right.
> On the other hand, I do not see anything in 5.2.4.2.1 which requires
> that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
> if sizeof(int) == 1, correct?
Correct. signed int has at least INT_MAX - INT_MIN + 1 distinct
representations, and if sizeof(int) == 1, that means unsigned char must be
capable of storing at least that many values. However, it is allowed to be
capable of storing even more.
> In such a case, fgetc() may return
> UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
> magic to return a signed integer representing UCHAR_MAX, or invoke UB
> by overflowing the signed type int. Both of these alternatives seem
> ridiculous to me, so what am I missing?
The behaviour is not undefined for integer conversions of out-of-range
values, not even for the signed types. Either the result is
implementation-defined, or an implementation-defined signal is raised, see
6.3.1.3p3. The upshot is the same: fgetc's result either need not be
meaningful, or cannot be.
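
In code, the conversion at issue is just this (a minimal sketch; the
out-of-range branch is only reachable where UCHAR_MAX > INT_MAX):

#include <limits.h>

void convert_demo(void)
{
    unsigned char uc = UCHAR_MAX;
    /* If UCHAR_MAX <= INT_MAX (the usual case), the value is preserved
       (6.3.1.3p1). Otherwise the result is implementation-defined, or
       an implementation-defined signal is raised (6.3.1.3p3); it is
       never undefined behaviour. */
    int i = uc;
    (void)i;
}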

However, 7.19.2p3 states that

"A binary stream is an ordered sequence of characters that can transparently
record internal data. Data read in from a binary stream shall compare equal
to the data that were earlier written out to that stream, under the same
implementation. Such a stream may, however, have an implementation-defined
number of null characters appended to the end of the stream."

This requirement cannot be met by an implementation where the conversion of
out-of-range values results in a signal, or where the conversion of
out-of-range values cannot be reverted. So by my reading, only freestanding
implementations that do not provide the standard I/O functions at all are
allowed to define unsigned char and int in such ways.
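
That transparency requirement is directly testable; a minimal sketch
(assuming UCHAR_MAX <= INT_MAX so the comparison below is meaningful,
with error handling kept to a minimum):

#include <limits.h>
#include <stdio.h>

/* Write every byte value to a binary stream and read it back;
   7.19.2p3 requires the data to compare equal. */
int roundtrip_ok(void)
{
    FILE *fp = tmpfile();   /* tmpfile() opens in binary update mode */
    int v;

    if (fp == NULL)
        return 0;
    for (v = 0; v <= UCHAR_MAX; v++)
        fputc(v, fp);
    rewind(fp);
    for (v = 0; v <= UCHAR_MAX; v++)
        if (fgetc(fp) != v)
            break;
    fclose(fp);
    return v > UCHAR_MAX;   /* nonzero iff every byte came back equal */
}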
Jun 7 '07 #2

Harald van Dijk <tr*****@gmail.com> wrote:
> The behaviour is not undefined for integer conversions of out-of-range
> values, not even for the signed types. Either the result is
> implementation-defined, or an implementation-defined signal is raised,
> see 6.3.1.3p3. The upshot is the same: fgetc's result either need not
> be meaningful, or cannot be.
The language in n869 does not mention signals, but I assume that is a
difference between the draft and the actual standard.
> However, 7.19.2p3 states that
>
> "A binary stream is an ordered sequence of characters that can transparently
> record internal data. Data read in from a binary stream shall compare equal
> to the data that were earlier written out to that stream, under the same
> implementation. Such a stream may, however, have an implementation-defined
> number of null characters appended to the end of the stream."
>
> This requirement cannot be met by an implementation where the conversion of
> out-of-range values results in a signal, or where the conversion of
> out-of-range values cannot be reverted.
Yes, that makes sense, although on further reading, it seems that an
implementation could work internal magic to establish a one-to-one
relationship between all unsigned char values from 0 to UCHAR_MAX and all
signed int values from INT_MIN to INT_MAX. That would mean that an
implementation would have to ensure that there were at least as many
valid signed int values as unsigned char values, with an extra signed
int value representing EOF. It does sound like a tall order for an
implementation where sizeof(int) == 1, but possible on a DS9K level.
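
Such internal magic might look something like this (purely a sketch for
a hypothetical implementation where UCHAR_MAX == 2*INT_MAX + 1, i.e.
unsigned char and int have exactly the same number of values; the names
uc_to_int and int_to_uc are made up for illustration):

#include <limits.h>

/* Map unsigned char values 0..UCHAR_MAX one-to-one onto int values
   INT_MIN..INT_MAX: low values stay non-negative, high values wrap
   into the negative range. */
static int uc_to_int(unsigned char u)
{
    if (u <= INT_MAX)
        return (int)u;
    /* Here u - INT_MAX - 1 is in [0, INT_MAX], so the sum stays
       within [INT_MIN, -1] and never overflows. */
    return INT_MIN + (int)(u - INT_MAX - 1);
}

static unsigned char int_to_uc(int i)
{
    if (i >= 0)
        return (unsigned char)i;
    /* For negative i, i - INT_MIN is in [0, INT_MAX]: no overflow. */
    return (unsigned char)((unsigned int)(i - INT_MIN) + INT_MAX + 1u);
}

Note, though, that such a mapping uses up every int value, leaving no
spare value for a distinct EOF.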

--
C. Benson Manica | I *should* know what I'm talking about - if I
cbmanica(at)gmail.com | don't, I need to know. Flames welcome.
Jun 7 '07 #3

Christopher Benson-Manica wrote:
> Harald van Dijk <tr*****@gmail.com> wrote:
>> The behaviour is not undefined for integer conversions of out-of-range
>> values, not even for the signed types. Either the result is
>> implementation-defined, or an implementation-defined signal is raised,
>> see 6.3.1.3p3. The upshot is the same: fgetc's result either need not
>> be meaningful, or cannot be.

> The language in n869 does not mention signals, but I assume that is a
> difference between the draft and the actual standard.
I don't remember when it was added. I believe it's part of C99, but I may be
misremembering. I'm reading from n1124.
>> However, 7.19.2p3 states that
>>
>> "A binary stream is an ordered sequence of characters that can
>> transparently record internal data. Data read in from a binary stream
>> shall compare equal to the data that were earlier written out to that
>> stream, under the same implementation. Such a stream may, however,
>> have an implementation-defined number of null characters appended to
>> the end of the stream."
>>
>> This requirement cannot be met by an implementation where the
>> conversion of out-of-range values results in a signal, or where the
>> conversion of out-of-range values cannot be reverted.

> Yes, that makes sense, although on further reading, it seems that an
> implementation could work internal magic to establish a one-to-one
> relationship between all unsigned char values from 0 to UCHAR_MAX and all
> signed int values from INT_MIN to INT_MAX. That would mean that an
> implementation would have to ensure that there were at least as many
> valid signed int values as unsigned char values, with an extra signed
> int value representing EOF. It does sound like a tall order for an
> implementation where sizeof(int) == 1, but possible on a DS9K level.
EOF need not be distinct from any valid character converted to int.
Strictly speaking, after reading EOF you should call feof() and ferror()
to check whether more characters can be read, though most code doesn't.
Note that this is necessary even on non-DS9K systems when using fgetwc()
and friends.
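
The strictly portable read loop therefore looks something like this (a
sketch):

#include <stdio.h>

/* Treat EOF from fgetc() as "maybe end of file, maybe error, maybe a
   real character" until feof()/ferror() say which. */
void copy_to_stdout(FILE *fp)
{
    for (;;) {
        int c = fgetc(fp);
        if (c == EOF) {
            if (feof(fp) || ferror(fp))
                break;  /* genuine end of input, or a read error */
            /* Otherwise c is a valid character that happens to
               convert to EOF; only possible if UCHAR_MAX > INT_MAX. */
        }
        putchar(c);
    }
}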
Jun 7 '07 #4


"Christopher Benson-Manica" <at***@faeroes.freeshell.orgwrote in message
news:f4**********@chessie.cirr.com...
> In a thread from substantially earlier this week,
>
> Harald van Dijk <tr*****@gmail.com> wrote:
>> getchar does not work with plain chars, it works with unsigned chars. 163
>> fits just fine in an unsigned char, so getchar is allowed to return 163.
>
> Being rather pedantic, I decided to try to verify whether this was
> true. I would appreciate knowing whether my reading of the Standard
> is correct.
>
> 7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
> "obtains [a] character as an unsigned char converted to an int".
> There is nothing in the Standard (that I was able to find) which
> states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
> 163 always representable as a signed int if sizeof(int) is 1?"
> 5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
> answer to that question appears to be "yes".
>
> On the other hand, I do not see anything in 5.2.4.2.1 which requires
> that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
> if sizeof(int) == 1, correct? In such a case, fgetc() may return
> UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
> magic to return a signed integer representing UCHAR_MAX, or invoke UB
> by overflowing the signed type int. Both of these alternatives seem
> ridiculous to me, so what am I missing?
Yes, that's a known glitch. A system where sizeof(int) == 1 has no way
of returning EOF that is distinguishable from a legal character value.
In practice, files are probably read as octets, which is OK but breaks
fputc(), though only in binary mode.

Jun 17 '07 #5
