James Daughtry wrote:
>>make plain char an unsigned type.
>
> Why do I get the feeling that this is the best solution for everyone to
> begin with? Muddy areas do nothing but get your shoes messy, so it's
> better to wander around them, no?
Remember what I said about hindsight being a luxury not
afforded to pioneers? We wouldn't have this mess if
- `char' had been unsigned from the git-go. However,
this would have penalized C implementations on some
machines (the historically important PDP-11 among
them), and the consequence might have been that we
wouldn't have this mess but also wouldn't have C.
... or ...
- getc() and the like didn't try to make the return
value carry both status (EOF) and data simultaneously.
It's this that forces them to return a perverted
sort of `char'-ish value instead of a value that's
actually expressible as a `char'.[*]
... or ...
- getc() and co. had not been designed to cater to the
convention that "negative return values are errors."
Actually, the Unix convention was that a -1 returned
by a system call indicated an error, but many coders
tested with `< 0' instead of with `== -1' -- it often
required less code, and memory used to be far more
precious than it is today. Defining EOF as CHAR_MIN-1
or CHAR_MAX+1 would have allowed getc() to return a
"natural" `char' value in a wider[*] `int' and avoided
all the present difficulties -- but all the `< 0' tests
would have broken.
... or ...
- The <ctype.h> functions had been defined to work on
`char' values instead of on the perverted values
returned by getc(). This would sometimes have meant
casting a getc()-returned `int' value to `char'
before handing it to toupper(), but that seems a lot
more natural (and easier to explain) than requiring
people to cast to `unsigned char' on the way from
`char' to `int'.
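To make the last point concrete, here is a small sketch of the rules as they actually stand. The function names str_upper() and stream_upper() are made up for illustration. A plain `char' taken from a string must be cast to `unsigned char' before it reaches toupper(), because the <ctype.h> functions accept only EOF or values representable as `unsigned char'. A value that came from getc() is already in that range and needs no cast.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: upper-cases a NUL-terminated string in place.
 * Each plain `char' is converted to `unsigned char' first; passing a
 * negative plain char (possible where char is signed) straight to
 * toupper() would be undefined behavior. */
void str_upper(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* Hypothetical helper: copies `in' to `out', upper-casing as it goes.
 * getc() already yields EOF or an unsigned char value converted to
 * int, so its result can go to toupper() with no cast at all. */
void stream_upper(FILE *in, FILE *out)
{
    int ch;  /* int, not char: it must be able to hold EOF too */
    while ((ch = getc(in)) != EOF)
        putc(toupper(ch), out);
}
```

Note the asymmetry. The cast appears on the path from `char' to `int', which is exactly the awkwardness described above.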
However, it didn't happen that way. And it's not going
to change -- some of these facilities may eventually be
supplanted by others better-suited to handling international
character sets, but nobody's going to respecify getc() or
toupper() at this late date. Hell, we can't even get up the
gumption to murder gets()! Plato regarded the world as an
imperfect expression of a perfect ideal; he'd have understood.
[*] The assumption that `int' is wider than `char' is no
longer universally true, which creates difficulties on some
systems: How can you choose an `int' value that's distinct
from all `char' values on a system where CHAR_MIN==INT_MIN
and CHAR_MAX==INT_MAX? I personally have not used such a
machine, but two possibilities occur to me: First, the I/O
facilities could continue to operate with characters of eight
(or so) bits, even though they occupy more memory. On input,
you'd get character values with a lot of high-order sign bits
(all zeroes or all ones), so EOF could be a `char' value that
corresponds to no actual I/O character. Output would probably
"chop" the high-order bits of a `char' that corresponded to
no character; the unpleasant consequence would be that there
would be `char' values that could not be written out and read
back in again.
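Whether a given implementation is one of these awkward ones can be decided at compile time. The sketch below relies only on what the Standard promises: getc() returns a character as an `unsigned char' converted to `int', and EOF is negative. The macro EOF_IS_DISTINCT and the function eof_is_distinct() are hypothetical names, not anything provided by <stdio.h>.

```c
#include <limits.h>
#include <stdio.h>

/* EOF is guaranteed distinct from every character getc() can return
 * only when every unsigned char value fits in a non-negative int.
 * That fails on machines where char is as wide as int. */
#if UCHAR_MAX <= INT_MAX
#define EOF_IS_DISTINCT 1   /* the usual case: int is wider than char */
#else
#define EOF_IS_DISTINCT 0   /* EOF may collide with a real character */
#endif

/* Hypothetical helper: nonzero if a return of EOF from getc() can be
 * trusted without also consulting feof() and ferror(). */
int eof_is_distinct(void)
{
    return EOF_IS_DISTINCT;
}
```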
The second approach would be to say that all `char' and
`int' values are "legitimate" as characters, so EOF would not
have a distinguished value. This would require coding along
the lines of
int ch;
if ((ch = getchar()) == EOF
    && (feof(stdin) || ferror(stdin))) ...
or even just
int ch = getchar();
if (feof(stdin) || ferror(stdin)) ...
... meaning that the use of in-band signalling for exceptional
conditions really hasn't helped; you wind up needing to test
the out-of-band channel anyhow.
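A complete loop in that second style might look like the following sketch, where count_chars() is a hypothetical name. It never takes EOF on faith. The loop ends only when feof() or ferror() confirms that the EOF value was a status report rather than data, so it would stay correct even where EOF is also a legitimate character. On ordinary machines the extra test runs only once, when EOF finally arrives.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters in a stream without
 * trusting the in-band EOF value alone.  Returns -1 if the stream
 * reported a read error, else the number of characters read. */
long count_chars(FILE *fp)
{
    long n = 0;
    int ch;
    for (;;) {
        ch = getc(fp);
        if (ch == EOF && (feof(fp) || ferror(fp)))
            break;      /* out-of-band indicator confirms the end */
        ++n;            /* anything else, even an EOF value, is data */
    }
    return ferror(fp) ? -1 : n;
}
```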
--
Eric.Sosman@sun.com