transformation wisdom

mdh
Could I get some help understanding a concept that is related to
exercise 4-9 in K&R II. The question relates to the properties of "EOF"
and the issue of conversion from char to int. On page 43 of K&R,
(last paragraph) it says "There is one subtle point about the
conversion of characters to integers. The language does not specify
whether variables of type char are signed or unsigned quantities......"
It then goes on to explain how different machines might convert a char
to a positive or negative integer. But then it says (p. 44, 1st paragraph): "The
definition of C guarantees that any character in the machine's standard
printing character set will never be negative, so these characters
will always be positive quantities in expressions. But arbitrary bit
patterns stored in character variables may appear to be negative on
some machines, yet positive on others"
I am clearly missing something. The answer to the exercise simply had
the "push-back" characters stored in an array of type "integer" as
opposed to type "character", but even though I see this, the above
explanation has left me more confused than enlightened!
thanks in advance.

Oct 1 '06 #1
4 Replies


It's even more confusing than that, because all input in C is done "as if"
by repeated calls to getc. Now, getc will only ever return a negative value
on end-of-file or a read error (when, of course, it returns EOF). The rest of
the time, it:

a) captures the character
b) represents it as unsigned char
c) converts the unsigned char representation to an int
d) returns the int

Because of step b), even if your implementation has signed chars by default
and even if your input has something bizarre in it (say, the UK currency
symbol) you will still get a positive value for it. It's only when you
shove it back into a char that it will, if appropriate, revert to being
negative.

Actually, it generally works out okay, just doing what comes naturally.

In a getc (or getchar) loop, just use int.

int ch;
while((ch = getchar()) != EOF)
{
  putchar(ch);
}

If you're filtering, just use ch:

int ch;
while((ch = getchar()) != EOF)
{
  if(isalpha(ch))
  {
    putchar(ch);
  }
}

If you're populating a string, use a char array:

int ch;
char buf[N] = {0}; /* zero-filled, so the string is always terminated */
int n = 0;
while(n < N - 1 && (ch = getchar()) != EOF)
{
  if(isalpha(ch))
  {
    buf[n++] = ch;
  }
}

In other words, just do normal stuff, and the chances are high that it will
all work exactly as you expect.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Oct 1 '06 #2

mdh

mdh said:
> Could I get some help understanding a concept that is related to
> exercise 4-9 in K&R II.

Richard Heathfield wrote:
> It's even more confusing than that.....

oh boy!!

Thanks for the explanation.

Oct 1 '06 #3

It's the magic word "standard" that may be the cause of your
confusion. The C language defines a set of characters that must
be available at run-time: the upper- and lower-case letters, the
decimal digits, various punctuation marks, some special things
like '\n' and '\0', and so on. These are the "standard" characters,
and all of them must have non-negative codes.

But each implementation may also support additional characters
over and above those required by the language definition. Accented
letters like Àéîôü, special symbols like ¶¥$©, letters outside the
English repertoire like ßΣƏ, and perhaps many others. These are
"extended" characters, and their codes may be positive or negative;
the language definition doesn't specify.

The upshot is that even though the standard characters will
always be non-negative, any arbitrary character code (that might
be either a standard character or an extended character) could be
of either sign.

--
Eric Sosman
es*****@acm-dot-org.invalid
Oct 1 '06 #4

mdh

Eric Sosman wrote:
> The upshot is that even though the standard characters will
> always be non-negative, any arbitrary character code (that might
> be either a standard character or an extended character) could be
> of either sign.

thanks...makes sense.

Oct 1 '06 #5
