A character with a negative value

Martin Wells

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

If there were a particular implementation where a valid character had
a negative value, wouldn't it make perfect sense that you can pass
this value to "to*"? I think it would be ridiculously stupid if you
couldn't.

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8. If we pass the former to "tolower", we
should get -8, and if we pass the latter to "toupper", we should get
17. Now of course, the Standard itself doesn't guarantee this... but
if the implementation has negative values for valid characters then it
would be quite stupid if you couldn't do normal operations on these
valid characters. How many people here use an "unsigned char" cast
when using the "to*" functions? Because I don't.

Martin

Nov 1 '07 #1

Chris Dollin

Martin Wells wrote:

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

How many people here use an "unsigned char" cast when using the "to*"

Me.

functions? Because I don't.

Oops.

--
Chris "unsinged and unsung" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Nov 1 '07 #2

vipvipvipvipvip.ru

On Nov 1, 3:18 pm, Martin Wells <war...@eircom.net> wrote:

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

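CHAR_MIN and CHAR_MAX are for plain char. For the explicitly signed and
unsigned types, use SCHAR_MIN, SCHAR_MAX and UCHAR_MAX:
SCHAR_MIN <= -127, SCHAR_MAX >= 127
unsigned char has a minimum of 0 (there is no UCHAR_MIN macro), UCHAR_MAX >= 255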

Cast the argument in all your to*() calls to (unsigned char).
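
A minimal sketch of that advice, assuming the default "C" locale. The helper names upper_char and count_lower are made up for illustration and are not from any library:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-case one plain char. The cast to unsigned char keeps the argument
   in 0..UCHAR_MAX whether plain char is signed or unsigned. */
static char upper_char(char c)
{
    return (char)toupper((unsigned char)c);
}

/* Count the lower-case letters in s, using the same cast for islower(). */
static size_t count_lower(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (islower((unsigned char)*s))
            ++n;
    return n;
}
```

The same cast applies to every is*() and to*() function in <ctype.h> whenever the argument comes from a plain char.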

Nov 1 '07 #3

Martin Wells

Chris:

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

Martin

Nov 1 '07 #4

Chris Dollin

Martin Wells wrote:

Chris:

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

Specifically:

that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Generally: that any letter (as defined by the C standard) have a negative
value when represented as a plain `char`.

--
Chris "imaginary values are also excluded" Dollin


Nov 1 '07 #5

Martin Wells

Chris:

Generally: that any letter (as defined by the C standard) have a negative
value when represented as a plain `char`.

Any _letter_ or any _character_? Can you point me to the page in the
Standard?

Martin

Nov 1 '07 #6

Ben Pfaff

Martin Wells <wa****@eircom.net> writes:

Chris:

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must
have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

If a member of the basic execution character set is
stored in a char object, its value is guaranteed to be
positive.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Nov 1 '07 #7

CBFalconer

Martin Wells wrote:

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

If there were a particular implementation where a valid character had
a negative value, wouldn't it make perfect sense that you can pass
this value to "to*"? I think it would be ridiculously stupid if you
couldn't.

However in most cases this cannot arise. If you closely examine
the specifications of various input functions, such as getc, fgetc,
getchar, etc. you will notice that they all return the input value
as an int formed from the _unsigned char_ value of the input. This
also means that checking for EOF is simple, check the sign of the
returned value, since EOF is the only negative value allowed.
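
A small sketch of that pattern, assuming the "C" locale. The function name copy_upper is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Copy in to out, upper-casing on the way, and return the byte count.
   getc() yields either EOF or an unsigned char value converted to int,
   so ch can be handed to toupper() without a cast. */
static long copy_upper(FILE *in, FILE *out)
{
    long n = 0;
    int ch;                     /* int, not char, so EOF stays distinct */

    assert(in != NULL && out != NULL);
    while ((ch = getc(in)) != EOF) {
        putc(toupper(ch), out);
        ++n;
    }
    return n;
}
```

The important detail is that ch is an int. Storing the result of getc() in a plain char first would bring the sign problem straight back.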

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com

Nov 1 '07 #8

Richard

Martin Wells <wa****@eircom.net> writes:

Chris:

>Generally: that any letter (as defined by the C standard) have a negative
value when represented as a plain `char`.

Any _letter_ or any _character_? Can you point me to the page in the
Standard?

Martin

Any English *letter* (upper/lowercase) as defined by the C standard I
would think.

Nov 1 '07 #9

Chris Dollin

Martin Wells wrote:

Chris:

>Generally: that any letter (as defined by the C standard) have a negative
value when represented as a plain `char`.

Any _letter_ or any _character_?

Any letter, and in fact any character in C's execution set; so eg ()[]*+!
must all be positive but @ need not be.

Can you point me to the page in the Standard?

No, I'll have to give you a reference to n1124, in which 6.2.5 para 3
has the required text.

--
Chris "pointing is visual" Dollin


Nov 1 '07 #10

Eric Sosman

Martin Wells wrote On 11/01/07 09:18,:

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

Taking "should have been" as a criticism of the
original design, I'd agree. Unfortunately, that horse
had left the barn long before the first Standard was
assembled. C89 codified existing practice to the extent
possible, rather than using hindsight to overturn it.
That's why we have gets(), for instance.

If there were a particular implementation where a valid character had
a negative value, wouldn't it make perfect sense that you can pass
this value to "to*"? I think it would be ridiculously stupid if you
couldn't.

It would make perfect sense, yes. Alas, we live and
code in an imperfect world.

[...] How many people here use an "unsigned char" cast
when using the "to*" functions?

I do.

Because I don't.

Sorry to hear that. Are the people who use your code
aware that some of your bugs are not accidental, but
deliberate?

--
Er*********@sun.com

Nov 1 '07 #11

Eric Sosman

Ben Pfaff wrote On 11/01/07 11:29,:

Martin Wells <wa****@eircom.net> writes:

>>Chris:

>>>>As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must
have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

If a member of the basic execution character set is
stored in a char object, its value is guaranteed to be
positive.

Right, but that's only for the *basic* set, the
characters that the Standard itself requires. The
implementation may define additional characters -- most
do, nowadays -- some or all of which may be negative.
(Anyone with a ¥ to argue this point and lacking the ¢
to keep quiet can go £ sand.)

In the "C" locale, only the fifty-two letters A-Z
and a-z are "alphabetic" as determined by isalpha().
But other locales can extend "alphabetic" to characters
outside the basic set, and some of these could well be
negative.

Martin's scenario seems a bit fanciful, but as far
as I can tell it is permitted by the Standard. I see
no requirement that toupper((unsigned char)ch) and
tolower((unsigned char)ch) must have the same sign.

--
Er*********@sun.com

Nov 1 '07 #12

santosh

Martin Wells wrote:

Chris:

>Generally: that any letter (as defined by the C standard) have a
negative value when represented as a plain `char`.

Any _letter_ or any _character_? Can you point me to the page in the
Standard?

As per the Standard all printable characters of the execution and source
character set must be positive values.

Nov 1 '07 #13

Ben Pfaff

Eric Sosman <Er*********@sun.com> writes:

Ben Pfaff wrote On 11/01/07 11:29,:
>All uppercase and lowercase letters in the English alphabet must
have positive values in the range of char in a C implementation.

Right, but that's only for the *basic* set, the
characters that the Standard itself requires. The
implementation may define additional characters -- most
do, nowadays -- some or all of which may be negative.

And that's why I said "in the English alphabet". (Additionally,
the Standard defines "letters" to be only English letters.)
--
"Programmers have the right to be ignorant of many details of your code
and still make reasonable changes."
--Kernighan and Plauger, _Software Tools_

Nov 1 '07 #14

Ben Pfaff

santosh <sa*********@gmail.com> writes:

As per the Standard all printable characters of the execution and source
character set must be positive values.

I believe that this guarantee is restricted to the basic
execution character set.
--
"Given that computing power increases exponentially with time,
algorithms with exponential or better O-notations
are actually linear with a large constant."
--Mike Lee

Nov 1 '07 #15

Keith Thompson

santosh <sa*********@gmail.com> writes:

Martin Wells wrote:
>Chris:
>>Generally: that any letter (as defined by the C standard) have a
negative value when represented as a plain `char`.

Any _letter_ or any _character_? Can you point me to the page in the
Standard?

As per the Standard all printable characters of the execution and source
character set must be positive values.

Yes, but in most cases it would be unwise to take advantage of that
fact. Code that deals only with the required character set today
might have to deal with arbitrary characters tomorrow.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Nov 1 '07 #16

jameskuyper

Ben Pfaff wrote:

Martin Wells <wa****@eircom.net> writes:

Chris:
As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.
Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must

He just said "alphabetical". He didn't say English, ...

have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

If a member of the basic execution character set is

and he didn't say "basic execution character set", or any variation
thereof.

Nov 1 '07 #17

Ben Pfaff

ja*********@verizon.net writes:

Ben Pfaff wrote:
>Martin Wells <wa****@eircom.net> writes:

Chris:
As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must

He just said "alphabetical". He didn't say English, ...

>have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

If a member of the basic execution character set is

and he didn't say "basic execution character set", or any variation
thereof.

Well, that's why my answer included those words, to make
everything perfectly clear.
--
Ben Pfaff
http://benpfaff.org

Nov 1 '07 #18

SM Ryan

Martin Wells <wa****@eircom.net> wrote:
#
# Plain char may be signed or unsigned. Typical ranges could be:
#
# CHAR_MIN == -128, CHAR_MAX == 127
#
# CHAR_MIN == 0, CHAR_MAX == 255
#
# The Standard says that the behaviour is undefined if we pass an
# argument to the "to*" functions whose value is outside the range of 0
# through UCHAR_MAX. This most certainly should have been CHAR_MIN
# through CHAR_MAX.

No, it's 0. Sucks. Other than isascii, you have to take care
for the 128-255 (or -128 to -1) range. It dates back to when everyone
knew we would never need more than seven bits so the eighth bit was
like for free and you could use it for other things.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
So basically, you just trace.

Nov 1 '07 #19

jameskuyper

Ben Pfaff wrote:

ja*********@verizon.net writes:

Ben Pfaff wrote:
Martin Wells <wa****@eircom.net> writes:

Chris:
As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must
He just said "alphabetical". He didn't say English, ...

have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

If a member of the basic execution character set is
and he didn't say "basic execution character set", or any variation
thereof.

Well, that's why my answer included those words, to make
everything perfectly clear.

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

Nov 1 '07 #20

Eric Sosman

ja*********@verizon.net wrote On 11/01/07 17:13,:

[...]

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

It was Chris Dollin's claim, in reference to
Martin Wells' example.

--
Er*********@sun.com

Nov 1 '07 #21

Old Wolf

On Nov 2, 6:15 am, Eric Sosman <Eric.Sos...@sun.com> wrote:

Martin's scenario seems a bit fanciful, but as far
as I can tell it is permitted by the Standard. I see
no requirement that toupper((unsigned char)ch) and
tolower((unsigned char)ch) must have the same sign.

toupper and tolower are defined to return non-negative
int values. Martin's original suggestion of tolower(17)
returning -8 is not possible.

Nov 1 '07 #22

Eric Sosman

Old Wolf wrote On 11/01/07 17:56,:

On Nov 2, 6:15 am, Eric Sosman <Eric.Sos...@sun.com> wrote:

> Martin's scenario seems a bit fanciful, but as far
as I can tell it is permitted by the Standard. I see
no requirement that toupper((unsigned char)ch) and
tolower((unsigned char)ch) must have the same sign.

toupper and tolower are defined to return non-negative
int values. Martin's original suggestion of tolower(17)
returning -8 is not possible.

Sorry; you're right. I should have said there's no
requirement that

(char)toupper((unsigned char)ch)
and
(char)tolower((unsigned char)ch)

have the same sign.
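
To make the distinction concrete, here is a sketch with invented helper names (upper_in_place, cast_round_trips). The call itself always returns a non-negative int; it is the conversion back to plain char that may yield a negative value, and that conversion is implementation-defined when the result does not fit. The second helper assumes nothing and simply reports whether char -> unsigned char -> char preserves every value on the implementation at hand, as it does on the common two's complement ones:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Apply toupper() to a plain char and store the result back in a char. */
static char upper_in_place(char ch)
{
    int r = toupper((unsigned char)ch);

    assert(r >= 0 && r <= UCHAR_MAX);  /* the call never returns a negative */
    return (char)r;                    /* may be negative if char is signed */
}

/* Returns 1 if every char value survives char -> unsigned char -> char. */
static int cast_round_trips(void)
{
    int v;

    for (v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        char c = (char)v;
        if ((char)(unsigned char)c != c)
            return 0;
    }
    return 1;
}
```

Where cast_round_trips() returns 1, the cast does not corrupt negative character values; it only changes how they look to the <ctype.h> function.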

--
Er*********@sun.com

Nov 1 '07 #23

Army1987

On Thu, 01 Nov 2007 06:18:07 -0700, Martin Wells wrote:

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

getc() & co. return an int. putc() & co. return an int. I hardly
ever use a char to store a single character. If you hold all
characters (except those in strings) as an unsigned character
stored in an int, you won't have that problem.
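
For characters that do live in strings, the same idea can be sketched by walking the string through a pointer to unsigned char. The function name count_alpha is made up for this example, and the "C" locale is assumed:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic characters in s. Each character is held as an
   unsigned char value in an int, so it is always a valid argument. */
static size_t count_alpha(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    int ch;

    assert(s != NULL);
    while ((ch = *p++) != 0) {
        if (isalpha(ch))
            ++n;
    }
    return n;
}
```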
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Nov 1 '07 #24

Ben Pfaff

Army1987 <ar******@NOSPAM.it> writes:

On Thu, 01 Nov 2007 06:18:07 -0700, Martin Wells wrote:
>The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.
getc() & co. return an int. putc() & co. return an int. I hardly
ever use a char to store a single character. If you hold all
characters (except those in strings) as an unsigned character
stored in an int, you won't have that problem.

Aren't most characters in fact stored in strings? That is the
case in my own programs.
--
Ben Pfaff
http://benpfaff.org

Nov 1 '07 #25

James Kuyper

Eric Sosman wrote:

ja*********@verizon.net wrote On 11/01/07 17:13,:
>[...]

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

It was Chris Dollin's claim, in reference to
Martin Wells' example.

I apologize for the misattribution! I must have gotten lost somewhere
while tracing things back.

Nov 2 '07 #26

Chris Dollin

ja*********@verizon.net wrote:

Ben Pfaff wrote:
>Martin Wells <wa****@eircom.net> writes:

Chris:
As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8.

Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must

He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

(One of my later messages made that restriction specific, just in case
that was the issue Martin was addressing.)


Nov 2 '07 #27

James Kuyper

Chris Dollin wrote:

ja*********@verizon.net wrote:

>Ben Pfaff wrote:
>>Martin Wells <wa****@eircom.net> writes:

Chris:
>As an example, let's say that there's an uppercase alphabetical
>character whose numeric value is 17, and that the lowercase form of
>this character's value is -8.
Can't happen in a conforming implementation.
Specifically what can't happen in a conforming implementation?
All uppercase and lowercase letters in the English alphabet must
He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

No, I can't say that I'd have expected him to say so explicitly. The
context was a question about the legal argument range of the to*()
functions, and that range is very definitely intended to be sufficient
to handle the extended execution character set, not just the basic
execution character set. The to*() functions have locale-dependent
behavior, and have to be usable in all supported locales, not just the
"C" locale.

Nov 2 '07 #28

Chris Dollin

James Kuyper wrote:

Chris Dollin wrote:
> ja*********@verizon.net wrote:

>>Ben Pfaff wrote:
Martin Wells <wa****@eircom.net> writes:

Chris:
>>As an example, let's say that there's an uppercase alphabetical
>>character whose numeric value is 17, and that the lowercase form of
>>this character's value is -8.
>Can't happen in a conforming implementation.
Specifically what can't happen in a conforming implementation?
All uppercase and lowercase letters in the English alphabet must
He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

No, I can't say that I'd have expected him to say so explicitly. The
context was a question about the legal argument range of the to*()
functions, and that range is very definitely intended to be sufficient
to handle the extended execution character set, not just the basic
execution character set. The to*() functions have locale-dependent
behavior, and have to be usable in all supported locales, not just the
"C" locale.

OK, I can accept that I may have been over-reading his "alphabetical".
(Thinks: how to avoid making that mistake in the future?)

--
Chris "hmm, may umlauted characters be alphabetical and negative?" Dollin


Nov 2 '07 #29

Eric Sosman

Chris Dollin wrote:

ja*********@verizon.net wrote:

>Ben Pfaff wrote:
>>Martin Wells <wa****@eircom.net> writes:

Chris:
>As an example, let's say that there's an uppercase alphabetical
>character whose numeric value is 17, and that the lowercase form of
>this character's value is -8.
Can't happen in a conforming implementation.
Specifically what can't happen in a conforming implementation?
All uppercase and lowercase letters in the English alphabet must
He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

The characters we /know/ a C program handles have codes
in the range CHAR_MIN through CHAR_MAX. And that's the problem:
Even if you're operating in the "C" locale where all alphabetic
characters are positive, you still may encounter negative values
in a string. Many letters and signs outside the ASCII set will
have negative codes on some machines, and your program may have
its own private Götterdämmerung if it hands those negative values
to <ctype.h> functions, even if operating in the "C" locale. The
undefinedness of the behavior is locale-independent.

It's onerous and awkward, but it's important to cast `char'
to `unsigned char' when calling a <ctype.h> function. (But note
that the cast is *incorrect* when the argument is an `int' as
returned by getc(), etc.)
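
A sketch of why the cast is wrong for an `int' from getc(). The three helper names are invented, and each one just returns the value a <ctype.h> function would end up receiving:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* An int from getc(), cast anyway: EOF is folded into 0..UCHAR_MAX and
   can no longer be told apart from a real character. */
static int arg_with_cast(int from_getc)
{
    return (unsigned char)from_getc;
}

/* An int from getc(), passed through untouched: already valid. */
static int arg_without_cast(int from_getc)
{
    assert(from_getc == EOF || (from_getc >= 0 && from_getc <= UCHAR_MAX));
    return from_getc;
}

/* A plain char taken from a string: here the cast is required. */
static int arg_from_char(char c)
{
    return (unsigned char)c;
}
```

So the rule of thumb is: cast when the value comes from a `char', leave it alone when it comes from getc(), fgetc() or getchar().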

--
Eric Sosman
es*****@ieee-dot-org.invalid

Nov 2 '07 #30

Chris Dollin

Eric Sosman wrote:

Chris Dollin wrote:
> ja*********@verizon.net wrote:

>>Ben Pfaff wrote:
Martin Wells <wa****@eircom.net> writes:

Chris:
>>As an example, let's say that there's an uppercase alphabetical
>>character whose numeric value is 17, and that the lowercase form of
>>this character's value is -8.
>Can't happen in a conforming implementation.
Specifically what can't happen in a conforming implementation?
All uppercase and lowercase letters in the English alphabet must
He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

The characters we /know/ a C program handles have codes
in the range CHAR_MIN through CHAR_MAX. And that's the problem:
Even if you're operating in the "C" locale where all alphabetic
characters are positive, you still may encounter negative values
in a string. Many letters and signs outside the ASCII set will
have negative codes on some machines, and your program may have
its own private Götterdämmerung if it hands those negative values
to <ctype.h> functions, even if operating in the "C" locale. The
undefinedness of the behavior is locale-independent.

It's onerous and awkward, but it's important to cast `char'
to `unsigned char' when calling a <ctype.h> function.

This I do not -- and did not -- deny.

--
Chris "must make caveats explicit early" Dollin


Nov 2 '07 #31

Martin Wells

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

Assuming that the cast to unsigned char is necessary, what will happen
to values which are negative? Will they become corrupt?

Martin

Nov 2 '07 #32

Richard Heathfield

Martin Wells said:

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Yes. The lack of a cast exposes the call to unnecessary risk, and I'm
fairly sure you didn't mean to use ++ in quite such a cavalier manner. I'd
be happier to see it written like this:

void AllUpper(char *p)
{
    while(*p)
    {
        *p = toupper((unsigned char)*p);
        ++p;
    }
}

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

Nope. The implementation's problem is to supply a toupper that works
according to spec on inputs that are either EOF or representable as an
unsigned char. If you hand it negative values, a crash is by no means
ruled out. For example, consider an implementation that defines EOF as -1,
and implements toupper as follows:

#define toupper(x) (__convtoupper[(x) + 1])

On such an implementation, toupper(-42), say, could easily crash the
program or even the machine.
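
To see where the out-of-bounds access comes from, here is a sketch of such a table with the bounds made explicit. Everything in it is hypothetical (conv_upper, init_table, index_in_bounds, my_toupper); it assumes EOF == -1 as in the macro above and is not how any particular library does it:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Slot 0 is for EOF, slots 1..UCHAR_MAX+1 are for the character values. */
static int conv_upper[UCHAR_MAX + 2];

static void init_table(void)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int i;

    assert(EOF == -1);              /* assumption carried over from the macro */
    conv_upper[0] = EOF;
    for (i = 0; i <= UCHAR_MAX; ++i)
        conv_upper[i + 1] = i;      /* identity unless overridden below */
    for (i = 0; lower[i] != '\0'; ++i)
        conv_upper[(unsigned char)lower[i] + 1] = (unsigned char)upper[i];
}

/* Returns 1 if x + 1 is a valid index into conv_upper. */
static int index_in_bounds(int x)
{
    return x + 1 >= 0 && x + 1 <= UCHAR_MAX + 1;
}

static int my_toupper(int x)
{
    assert(index_in_bounds(x));     /* the bare macro would not check this */
    return conv_upper[x + 1];
}
```

With an argument of -42 the index is -41, which is before the start of the array. The bare macro would read whatever happens to be there.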

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Nov 2 '07 #33

Eric Sosman

Martin Wells wrote On 11/02/07 12:32,:

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Yes, if `char' is signed.

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

The implementation must work properly when used properly.
When used improperly (as above), all bets are off.

Note that a `char' with the value -1 is likely to be
mistaken for EOF. (EOF is the only legal negative argument
to the <ctype.h> functions; it is usually -1 but can in
theory be any negative `int' value.)

Assuming that the cast to unsigned char is necessary, what will happen
to values which are negative? Will they become corrupt?

Demons will fly from their serifs. Undefined behavior
is "undefined."

In typical implementations where `char' is not too wide,
the <ctype.h> functions use their argument values to index
predefined arrays. Feed such an implementation an out-of-
range argument, and it will try to use that argument as an
array index, with outcomes similar to those you get when you
wander outside your own arrays. You might get a garbage
result like toupper('µ') == '7', or you might get something
like a SIGSEGV or GPF. Some implementations may try to be
helpful by making a spare copy of half the array, but they're
still likely to misbehave with 'ÿ' (Unicode U+00FF, easily
confused with EOF on systems with 8-bit signed characters).
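
The collision can be demonstrated directly. This sketch uses two invented helpers and relies on the implementation-defined conversion of 0xFF to plain char, which gives -1 on the usual 8-bit signed two's complement systems:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if byte 0xFF, stored in a plain char and passed uncast,
   has the same value as EOF on this implementation. */
static int ff_collides_with_eof(void)
{
    char c = (char)0xFF;    /* -1 where char is 8-bit and signed */
    int uncast = c;

    return uncast == EOF;
}

/* The same byte with the cast applied: a valid argument, distinct from EOF. */
static int ff_with_cast(void)
{
    char c = (char)0xFF;
    int r = (unsigned char)c;

    assert(r >= 0 && r <= UCHAR_MAX);
    return r;
}
```

Where plain char is unsigned, ff_collides_with_eof() returns 0 and the problem does not arise, which is part of why code without the cast can appear to work for years on one platform and fail on another.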

--
Er*********@sun.com

Nov 2 '07 #34

Chris Dollin

Martin Wells wrote:

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Doesn't that fall afoul of undefined behaviour? `p` is assigned in
the expression, /and/ its value is accessed for reasons other than
to determine its new value.

--
Chris "/which/ *p, exactly?" Dollin


Nov 5 '07 #35

pete

Chris Dollin wrote:

Martin Wells wrote:

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Doesn't that fall afoul of undefined behaviour?

Yes.

`p` is assigned in
the expression, /and/ its value is accessed for reasons other than
to determine its new value.

N869
6.5 Expressions
[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.

60) This paragraph renders undefined statement expressions
such as
    i = ++i + 1;
    a[i++] = i;
while allowing
    i = i + 1;
    a[i] = i;
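
Applied to the function under discussion, one way to stay within that rule is to give the read, the store and the increment a full expression each. This is only a sketch; the name all_upper and the returned count are additions for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-case s in place and return how many characters changed.
   p is modified only in the loop's increment expression, never in the
   same full expression that reads through it. */
static size_t all_upper(char *s)
{
    size_t changed = 0;
    char *p;

    assert(s != NULL);
    for (p = s; *p != '\0'; ++p) {
        char u = (char)toupper((unsigned char)*p);
        if (u != *p)
            ++changed;
        *p = u;
    }
    return changed;
}
```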

--
pete

Nov 5 '07 #36
