A character with a negative value


Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

If there were a particular implementation where a valid character had
a negative value, wouldn't it make perfect sense that you can pass
this value to "to*"? I think it would be ridiculously stupid if you
couldn't.

As an example, let's say that there's an uppercase alphabetical
character whose numeric value is 17, and that the lowercase form of
this character's value is -8. If we pass the former to "tolower", we
should get -8, and if we pass the latter to "toupper", we should get
17. Now of course, the Standard itself doesn't guarantee this... but
if the implementation has negative values for valid characters then it
would be quite stupid if you couldn't do normal operations on these
valid characters. How many people here use an "unsigned char" cast
when using the "to*" functions? Because I don't.

Martin

Nov 1 '07 #1
Martin Wells wrote:
> As an example, let's say that there's an uppercase alphabetical
> character whose numeric value is 17, and that the lowercase form of
> this character's value is -8.

Can't happen in a conforming implementation.

> How many people here use an "unsigned char" cast when using the "to*"
> functions?

Me.

> Because I don't.

Oops.

--
Chris "unsinged and unsung" Dollin

Hewlett-Packard Limited Cain Road, Bracknell, registered no:
registered office: Berks RG12 1HN 690597 England

Nov 1 '07 #2
On Nov 1, 3:18 pm, Martin Wells <war...@eircom.net> wrote:
> Plain char may be signed or unsigned. Typical ranges could be:
>
> CHAR_MIN == -128, CHAR_MAX == 127
>
> CHAR_MIN == 0, CHAR_MAX == 255

CHAR_MIN and CHAR_MAX are for plain char. For the explicitly signed and
unsigned types, use SCHAR_MIN, SCHAR_MAX and UCHAR_MAX:

SCHAR_MIN <= -127, SCHAR_MAX >= 127
UCHAR_MAX >= 255 (there is no UCHAR_MIN macro; the minimum is 0)

Cast the argument in all your to*() calls to (unsigned char).

Nov 1 '07 #3
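
To see which of the two ranges a particular implementation uses, the <limits.h> macros can simply be printed. This is only a minimal sketch; the helper names plain_char_is_signed and print_char_ranges are made up for illustration and are not part of any library.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: prints the ranges of the three character types. */
void print_char_ranges(void)
{
    printf("plain char:    %d .. %d (%s)\n", CHAR_MIN, CHAR_MAX,
           plain_char_is_signed() ? "signed" : "unsigned");
    printf("signed char:   %d .. %d\n", SCHAR_MIN, SCHAR_MAX);
    /* There is no UCHAR_MIN macro; the minimum is always 0. */
    printf("unsigned char: 0 .. %u\n", (unsigned)UCHAR_MAX);
}
```

Calling print_char_ranges() from main shows whether CHAR_MIN matches SCHAR_MIN or is 0, which are the only two possibilities the Standard allows.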
Chris:
>> As an example, let's say that there's an uppercase alphabetical
>> character whose numeric value is 17, and that the lowercase form of
>> this character's value is -8.
>
> Can't happen in a conforming implementation.

Specifically what can't happen in a conforming implementation?

Martin

Nov 1 '07 #4
Martin Wells wrote:
> Chris:
>>> As an example, let's say that there's an uppercase alphabetical
>>> character whose numeric value is 17, and that the lowercase form of
>>> this character's value is -8.
>>
>> Can't happen in a conforming implementation.
>
> Specifically what can't happen in a conforming implementation?

Specifically:

>>> that there's an uppercase alphabetical
>>> character whose numeric value is 17, and that the lowercase form of
>>> this character's value is -8.

Generally: that any letter (as defined by the C standard) have a negative
value when represented as a plain `char`.

--
Chris "imaginary values are also excluded" Dollin

Hewlett-Packard Limited Cain Road, Bracknell, registered no:
registered office: Berks RG12 1HN 690597 England

Nov 1 '07 #5
Chris:
> Generally: that any letter (as defined by the C standard) have a negative
> value when represented as a plain `char`.

Any _letter_ or any _character_? Can you point me to the page in the
Standard?

Martin

Nov 1 '07 #6
Martin Wells <wa****@eircom.net> writes:
> Chris:
>>> As an example, let's say that there's an uppercase alphabetical
>>> character whose numeric value is 17, and that the lowercase form of
>>> this character's value is -8.
>>
>> Can't happen in a conforming implementation.
>
> Specifically what can't happen in a conforming implementation?

All uppercase and lowercase letters in the English alphabet must
have positive values in the range of char in a C implementation.
See C99 6.2.5 "Types", paragraph 2:

    If a member of the basic execution character set is
    stored in a char object, its value is guaranteed to be
    positive.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 1 '07 #7
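
The guarantee quoted above can be checked mechanically for the graphic members of the basic execution character set. The following is a sketch only: the function name basic_set_is_nonnegative is hypothetical, and the list leaves out the control characters and the null character (which is also a member and has the value 0, so "non-negative" is the safer reading of the guarantee).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every listed member of the basic
   execution character set is non-negative when stored in a plain char,
   and 0 otherwise. Covers letters, digits, the 29 graphic characters
   and the space character. */
int basic_set_is_nonnegative(void)
{
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";
    size_t i;

    for (i = 0; basic[i] != '\0'; ++i) {
        if (basic[i] < 0)
            return 0;   /* would contradict the Standard's guarantee */
    }
    return 1;
}
```

On a conforming implementation this should return 1 whether plain char is signed or not. It says nothing about characters outside the basic set, such as '@' or accented letters.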
Martin Wells wrote:
>
> Plain char may be signed or unsigned. Typical ranges could be:
>
> CHAR_MIN == -128, CHAR_MAX == 127
>
> CHAR_MIN == 0, CHAR_MAX == 255
>
> The Standard says that the behaviour is undefined if we pass an
> argument to the "to*" functions whose value is outside the range of 0
> through UCHAR_MAX. This most certainly should have been CHAR_MIN
> through CHAR_MAX.
>
> If there were a particular implementation where a valid character had
> a negative value, wouldn't it make perfect sense that you can pass
> this value to "to*"? I think it would be ridiculously stupid if you
> couldn't.

However in most cases this cannot arise. If you closely examine
the specifications of various input functions, such as getc, fgetc,
getchar, etc., you will notice that they all return the input value
as an int formed from the _unsigned char_ value of the input. This
also means that checking for EOF is simple: check the sign of the
returned value, since EOF is the only negative value allowed.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com

Nov 1 '07 #8
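
The stream case described above can be sketched as follows. The value returned by getc() is kept in an int and handed to toupper() unchanged, because it is already either EOF or in the range 0 through UCHAR_MAX. The function name copy_upper is hypothetical, chosen only for this illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, uppercasing each character.
   Returns the number of characters copied. */
long copy_upper(FILE *in, FILE *out)
{
    int c;          /* int, not char: it must be able to hold EOF too */
    long n = 0;

    while ((c = getc(in)) != EOF) {
        /* c is in 0..UCHAR_MAX here, so no cast is needed */
        putc(toupper(c), out);
        ++n;
    }
    return n;
}
```

This only covers characters that arrive through getc() and friends. Characters taken from a char array are a different matter, since a plain char element may be negative.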
Martin Wells <wa****@eircom.net> writes:
> Chris:
>> Generally: that any letter (as defined by the C standard) have a negative
>> value when represented as a plain `char`.
>
> Any _letter_ or any _character_? Can you point me to the page in the
> Standard?
>
> Martin

Any English *letter* (upper/lowercase) as defined by the C standard I
would think.

Nov 1 '07 #9
Martin Wells wrote:
> Chris:
>> Generally: that any letter (as defined by the C standard) have a negative
>> value when represented as a plain `char`.
>
> Any _letter_ or any _character_?

Any letter, and in fact any character in C's basic execution character
set; so e.g. ()[]*+! must all be positive but @ need not be.

> Can you point me to the page in the Standard?

No, I'll have to give you a reference to n1124, in which 6.2.5 para 3
has the required text.

--
Chris "pointing is visual" Dollin

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Nov 1 '07 #10
Martin Wells wrote On 11/01/07 09:18,:
> Plain char may be signed or unsigned. Typical ranges could be:
>
> CHAR_MIN == -128, CHAR_MAX == 127
>
> CHAR_MIN == 0, CHAR_MAX == 255
>
> The Standard says that the behaviour is undefined if we pass an
> argument to the "to*" functions whose value is outside the range of 0
> through UCHAR_MAX. This most certainly should have been CHAR_MIN
> through CHAR_MAX.

Taking "should have been" as a criticism of the
original design, I'd agree. Unfortunately, that horse
had left the barn long before the first Standard was
assembled. C89 codified existing practice to the extent
possible, rather than using hindsight to overturn it.
That's why we have gets(), for instance.

> If there were a particular implementation where a valid character had
> a negative value, wouldn't it make perfect sense that you can pass
> this value to "to*"? I think it would be ridiculously stupid if you
> couldn't.

It would make perfect sense, yes. Alas, we live and
code in an imperfect world.

> [...] How many people here use an "unsigned char" cast
> when using the "to*" functions?

I do.

> Because I don't.

Sorry to hear that. Are the people who use your code
aware that some of your bugs are not accidental, but
deliberate?

--
Er*********@sun.com
Nov 1 '07 #11
Ben Pfaff wrote On 11/01/07 11:29,:
> Martin Wells <wa****@eircom.net> writes:
>> Chris:
>>>> As an example, let's say that there's an uppercase alphabetical
>>>> character whose numeric value is 17, and that the lowercase form of
>>>> this character's value is -8.
>>>
>>> Can't happen in a conforming implementation.
>>
>> Specifically what can't happen in a conforming implementation?
>
> All uppercase and lowercase letters in the English alphabet must
> have positive values in the range of char in a C implementation.
> See C99 6.2.5 "Types", paragraph 2:
>
>     If a member of the basic execution character set is
>     stored in a char object, its value is guaranteed to be
>     positive.

Right, but that's only for the *basic* set, the
characters that the Standard itself requires. The
implementation may define additional characters -- most
do, nowadays -- some or all of which may be negative.
(Anyone with a ¥ to argue this point and lacking the ¢
to keep quiet can go £ sand.)

In the "C" locale, only the fifty-two letters A-Z
and a-z are "alphabetic" as determined by isalpha().
But other locales can extend "alphabetic" to characters
outside the basic set, and some of these could well be
negative.

Martin's scenario seems a bit fanciful, but as far
as I can tell it is permitted by the Standard. I see
no requirement that toupper((unsigned char)ch) and
tolower((unsigned char)ch) must have the same sign.

--
Er*********@sun.com
Nov 1 '07 #12
Martin Wells wrote:
> Chris:
>> Generally: that any letter (as defined by the C standard) have a
>> negative value when represented as a plain `char`.
>
> Any _letter_ or any _character_? Can you point me to the page in the
> Standard?

As per the Standard all printable characters of the execution and source
character set must be positive values.

Nov 1 '07 #13
Eric Sosman <Er*********@sun.com> writes:
> Ben Pfaff wrote On 11/01/07 11:29,:
>> All uppercase and lowercase letters in the English alphabet must
>> have positive values in the range of char in a C implementation.
>
> Right, but that's only for the *basic* set, the
> characters that the Standard itself requires. The
> implementation may define additional characters -- most
> do, nowadays -- some or all of which may be negative.

And that's why I said "in the English alphabet". (Additionally,
the Standard defines "letters" to be only English letters.)
--
"Programmers have the right to be ignorant of many details of your code
and still make reasonable changes."
--Kernighan and Plauger, _Software Tools_
Nov 1 '07 #14
santosh <sa*********@gmail.com> writes:
> As per the Standard all printable characters of the execution and source
> character set must be positive values.

I believe that this guarantee is restricted to the basic
execution character set.
--
"Given that computing power increases exponentially with time,
algorithms with exponential or better O-notations
are actually linear with a large constant."
--Mike Lee
Nov 1 '07 #15
santosh <sa*********@gmail.com> writes:
> Martin Wells wrote:
>> Chris:
>>> Generally: that any letter (as defined by the C standard) have a
>>> negative value when represented as a plain `char`.
>>
>> Any _letter_ or any _character_? Can you point me to the page in the
>> Standard?
>
> As per the Standard all printable characters of the execution and source
> character set must be positive values.

Yes, but in most cases it would be unwise to take advantage of that
fact. Code that deals only with the required character set today
might have to deal with arbitrary characters tomorrow.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 1 '07 #16
Ben Pfaff wrote:
> Martin Wells <wa****@eircom.net> writes:
>> Chris:
>>>> As an example, let's say that there's an uppercase alphabetical
>>>> character whose numeric value is 17, and that the lowercase form of
>>>> this character's value is -8.
>>>
>>> Can't happen in a conforming implementation.
>>
>> Specifically what can't happen in a conforming implementation?
>
> All uppercase and lowercase letters in the English alphabet must

He just said "alphabetical". He didn't say English, ...

> have positive values in the range of char in a C implementation.
> See C99 6.2.5 "Types", paragraph 2:
>
>     If a member of the basic execution character set is

and he didn't say "basic execution character set", or any variation
thereof.

Nov 1 '07 #17
ja*********@verizon.net writes:
> Ben Pfaff wrote:
>> Martin Wells <wa****@eircom.net> writes:
>>> Chris:
>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>> this character's value is -8.
>>>>
>>>> Can't happen in a conforming implementation.
>>>
>>> Specifically what can't happen in a conforming implementation?
>>
>> All uppercase and lowercase letters in the English alphabet must
>
> He just said "alphabetical". He didn't say English, ...
>
>> have positive values in the range of char in a C implementation.
>> See C99 6.2.5 "Types", paragraph 2:
>>
>>     If a member of the basic execution character set is
>
> and he didn't say "basic execution character set", or any variation
> thereof.

Well, that's why my answer included those words, to make
everything perfectly clear.
--
Ben Pfaff
http://benpfaff.org
Nov 1 '07 #18
Martin Wells <wa****@eircom.net> wrote:
#
# Plain char may be signed or unsigned. Typical ranges could be:
#
# CHAR_MIN == -128, CHAR_MAX == 127
#
# CHAR_MIN == 0, CHAR_MAX == 255
#
# The Standard says that the behaviour is undefined if we pass an
# argument to the "to*" functions whose value is outside the range of 0
# through UCHAR_MAX. This most certainly should have been CHAR_MIN
# through CHAR_MAX.

No, it's 0. Sucks. Other than with isascii, you have to take care
with the 128 to 255 (or -128 to -1) range. It dates back to when everyone
knew we would never need more than seven bits, so the eighth bit was
more or less free and you could use it for other things.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
So basically, you just trace.
Nov 1 '07 #19
Ben Pfaff wrote:
> ja*********@verizon.net writes:
>> Ben Pfaff wrote:
>>> Martin Wells <wa****@eircom.net> writes:
>>>> Chris:
>>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>>> this character's value is -8.
>>>>>
>>>>> Can't happen in a conforming implementation.
>>>>
>>>> Specifically what can't happen in a conforming implementation?
>>>
>>> All uppercase and lowercase letters in the English alphabet must
>>
>> He just said "alphabetical". He didn't say English, ...
>>
>>> have positive values in the range of char in a C implementation.
>>> See C99 6.2.5 "Types", paragraph 2:
>>>
>>>     If a member of the basic execution character set is
>>
>> and he didn't say "basic execution character set", or any variation
>> thereof.
>
> Well, that's why my answer included those words, to make
> everything perfectly clear.

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

Nov 1 '07 #20
ja*********@verizon.net wrote On 11/01/07 17:13,:
> [...]
>
> Sorry, I thought you were explaining Martin Wells' claim "Can't happen
> in a conforming implementation." It wasn't clear from context that you
> were pointing out the limitations that made his claim inaccurate.

It was Chris Dollin's claim, in reference to
Martin Wells' example.

--
Er*********@sun.com
Nov 1 '07 #21
On Nov 2, 6:15 am, Eric Sosman <Eric.Sos...@sun.com> wrote:
> Martin's scenario seems a bit fanciful, but as far
> as I can tell it is permitted by the Standard. I see
> no requirement that toupper((unsigned char)ch) and
> tolower((unsigned char)ch) must have the same sign.

toupper and tolower are defined to return non-negative
int values. Martin's original suggestion of tolower(17)
returning -8 is not possible.

Nov 1 '07 #22
Old Wolf wrote On 11/01/07 17:56,:
> On Nov 2, 6:15 am, Eric Sosman <Eric.Sos...@sun.com> wrote:
>> Martin's scenario seems a bit fanciful, but as far
>> as I can tell it is permitted by the Standard. I see
>> no requirement that toupper((unsigned char)ch) and
>> tolower((unsigned char)ch) must have the same sign.
>
> toupper and tolower are defined to return non-negative
> int values. Martin's original suggestion of tolower(17)
> returning -8 is not possible.

Sorry; you're right. I should have said there's no
requirement that

(char)toupper((unsigned char)ch)
and
(char)tolower((unsigned char)ch)

have the same sign.

--
Er*********@sun.com
Nov 1 '07 #23
On Thu, 01 Nov 2007 06:18:07 -0700, Martin Wells wrote:
>
> Plain char may be signed or unsigned. Typical ranges could be:
>
> CHAR_MIN == -128, CHAR_MAX == 127
>
> CHAR_MIN == 0, CHAR_MAX == 255
>
> The Standard says that the behaviour is undefined if we pass an
> argument to the "to*" functions whose value is outside the range of 0
> through UCHAR_MAX. This most certainly should have been CHAR_MIN
> through CHAR_MAX.

getc() & co. return an int. putc() & co. return an int. I hardly
ever use a char to store a single character. If you hold all
characters (except those in strings) as an unsigned character
stored in an int, you won't have that problem.
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Nov 1 '07 #24
Army1987 <ar******@NOSPAM.it> writes:
> On Thu, 01 Nov 2007 06:18:07 -0700, Martin Wells wrote:
>> The Standard says that the behaviour is undefined if we pass an
>> argument to the "to*" functions whose value is outside the range of 0
>> through UCHAR_MAX. This most certainly should have been CHAR_MIN
>> through CHAR_MAX.
>
> getc() & co. return an int. putc() & co. return an int. I hardly
> ever use a char to store a single character. If you hold all
> characters (except those in strings) as an unsigned character
> stored in an int, you won't have that problem.

Aren't most characters in fact stored in strings? That is the
case in my own programs.
--
Ben Pfaff
http://benpfaff.org
Nov 1 '07 #25
Eric Sosman wrote:
> ja*********@verizon.net wrote On 11/01/07 17:13,:
>> [...]
>>
>> Sorry, I thought you were explaining Martin Wells' claim "Can't happen
>> in a conforming implementation." It wasn't clear from context that you
>> were pointing out the limitations that made his claim inaccurate.
>
> It was Chris Dollin's claim, in reference to
> Martin Wells' example.

I apologize for the misattribution! I must have gotten lost somewhere
while tracing things back.
Nov 2 '07 #26
ja*********@verizon.net wrote:
> Ben Pfaff wrote:
>> Martin Wells <wa****@eircom.net> writes:
>>> Chris:
>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>> this character's value is -8.
>>>>
>>>> Can't happen in a conforming implementation.
>>>
>>> Specifically what can't happen in a conforming implementation?
>>
>> All uppercase and lowercase letters in the English alphabet must
>
> He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

(One of my later messages made that restriction specific, just in case
that was the issue Martin was addressing.)

--
Hewlett-Packard Limited Cain Road, Bracknell, registered no:
registered office: Berks RG12 1HN 690597 England

Nov 2 '07 #27
Chris Dollin wrote:
> ja*********@verizon.net wrote:
>> Ben Pfaff wrote:
>>> Martin Wells <wa****@eircom.net> writes:
>>>> Chris:
>>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>>> this character's value is -8.
>>>>>
>>>>> Can't happen in a conforming implementation.
>>>>
>>>> Specifically what can't happen in a conforming implementation?
>>>
>>> All uppercase and lowercase letters in the English alphabet must
>>
>> He just said "alphabetical". He didn't say English, ...
>
> That's true, but if he'd meant characters other than the ones that we
> /know/ a C program handles he would have said so, yes?

No, I can't say that I'd have expected him to say so explicitly. The
context was a question about the legal argument range of the to*()
functions, and that range is very definitely intended to be sufficient
to handle the extended execution character set, not just the basic
execution character set. The to*() functions have locale-dependent
behavior, and have to be usable in all supported locales, not just the
"C" locale.

Nov 2 '07 #28
James Kuyper wrote:
> Chris Dollin wrote:
>> ja*********@verizon.net wrote:
>>> Ben Pfaff wrote:
>>>> Martin Wells <wa****@eircom.net> writes:
>>>>> Chris:
>>>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>>>> this character's value is -8.
>>>>>>
>>>>>> Can't happen in a conforming implementation.
>>>>>
>>>>> Specifically what can't happen in a conforming implementation?
>>>>
>>>> All uppercase and lowercase letters in the English alphabet must
>>>
>>> He just said "alphabetical". He didn't say English, ...
>>
>> That's true, but if he'd meant characters other than the ones that we
>> /know/ a C program handles he would have said so, yes?
>
> No, I can't say that I'd have expected him to say so explicitly. The
> context was a question about the legal argument range of the to*()
> functions, and that range is very definitely intended to be sufficient
> to handle the extended execution character set, not just the basic
> execution character set. The to*() functions have locale-dependent
> behavior, and have to be usable in all supported locales, not just the
> "C" locale.

OK, I can accept that I may have been over-reading his "alphabetical".
(Thinks: how to avoid making that mistake in the future?)

--
Chris "hmm, may umlauted characters be alphabetical and negative?" Dollin

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Nov 2 '07 #29
Chris Dollin wrote:
> ja*********@verizon.net wrote:
>> Ben Pfaff wrote:
>>> Martin Wells <wa****@eircom.net> writes:
>>>> Chris:
>>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>>> this character's value is -8.
>>>>>
>>>>> Can't happen in a conforming implementation.
>>>>
>>>> Specifically what can't happen in a conforming implementation?
>>>
>>> All uppercase and lowercase letters in the English alphabet must
>>
>> He just said "alphabetical". He didn't say English, ...
>
> That's true, but if he'd meant characters other than the ones that we
> /know/ a C program handles he would have said so, yes?

The characters we /know/ a C program handles have codes
in the range CHAR_MIN through CHAR_MAX. And that's the problem:
Even if you're operating in the "C" locale where all alphabetic
characters are positive, you still may encounter negative values
in a string. Many letters and signs outside the ASCII set will
have negative codes on some machines, and your program may have
its own private Götterdämmerung if it hands those negative values
to <ctype.h> functions, even if operating in the "C" locale. The
undefinedness of the behavior is locale-independent.

It's onerous and awkward, but it's important to cast `char'
to `unsigned char' when calling a <ctype.h> function. (But note
that the cast is *incorrect* when the argument is an `int' as
returned by getc(), etc.)

--
Eric Sosman
es*****@ieee-dot-org.invalid
Nov 2 '07 #30
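
The rule stated in #30 can be sketched for the string case like this. The helper name count_alpha is hypothetical. The cast is applied because the argument is a plain char taken from a string; a value that came from getc() would be passed without it, since casting would turn EOF into an ordinary character code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphabetic characters in a string. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* *s is a plain char and may be negative: convert it to
           unsigned char before handing it to a <ctype.h> function */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

In the "C" locale a byte such as 0xE9 is not counted as alphabetic, but with the cast the call is at least well defined whatever the byte is.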
Eric Sosman wrote:
> Chris Dollin wrote:
>> ja*********@verizon.net wrote:
>>> Ben Pfaff wrote:
>>>> Martin Wells <wa****@eircom.net> writes:
>>>>> Chris:
>>>>>>> As an example, let's say that there's an uppercase alphabetical
>>>>>>> character whose numeric value is 17, and that the lowercase form of
>>>>>>> this character's value is -8.
>>>>>>
>>>>>> Can't happen in a conforming implementation.
>>>>>
>>>>> Specifically what can't happen in a conforming implementation?
>>>>
>>>> All uppercase and lowercase letters in the English alphabet must
>>>
>>> He just said "alphabetical". He didn't say English, ...
>>
>> That's true, but if he'd meant characters other than the ones that we
>> /know/ a C program handles he would have said so, yes?
>
> The characters we /know/ a C program handles have codes
> in the range CHAR_MIN through CHAR_MAX. And that's the problem:
> Even if you're operating in the "C" locale where all alphabetic
> characters are positive, you still may encounter negative values
> in a string. Many letters and signs outside the ASCII set will
> have negative codes on some machines, and your program may have
> its own private Götterdämmerung if it hands those negative values
> to <ctype.h> functions, even if operating in the "C" locale. The
> undefinedness of the behavior is locale-independent.
>
> It's onerous and awkward, but it's important to cast `char'
> to `unsigned char' when calling a <ctype.h> function.

This I do not -- and did not -- deny.

--
Chris "must make caveats explicit early" Dollin

Hewlett-Packard Limited registered no:
registered office: Cain Road, Bracknell, Berks RG12 1HN 690597 England

Nov 2 '07 #31

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
    while (*p++ = toupper(*p));
}

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

Assuming that the cast to unsigned char is necessary, what will happen
to values which are negative? Will they become corrupt?

Martin

Nov 2 '07 #32
Martin Wells said:
> Let's say we want to write a C89 fully-portable algorithm for making
> every character in a string uppercase (where applicable). Is there
> anything wrong with the following?
>
> void AllUpper(char *p)
> {
>     while (*p++ = toupper(*p));
> }

Yes. The lack of a cast exposes the call to unnecessary risk, and I'm
fairly sure you didn't mean to use ++ in quite such a cavalier manner. I'd
be happier to see it written like this:

#include <ctype.h>

void AllUpper(char *p)
{
    while (*p)
    {
        /* the cast ensures toupper never sees a negative value */
        *p = toupper((unsigned char)*p);
        ++p;
    }
}

> Sure, toupper will have unexpected results if the character value is
> negative; but, if the implementation has signed char for plain char,
> and has negative numbers for some of its characters, isn't it the
> implementation's problem to make sure it works properly?

Nope. The implementation's problem is to supply a toupper that works
according to spec on inputs that are either EOF or representable as an
unsigned char. If you hand it negative values, a crash is by no means
ruled out. For example, consider an implementation that defines EOF as -1,
and implements toupper as follows:

#define toupper(x) (__convtoupper[(x) + 1])

On such an implementation, toupper(-42), say, could easily crash the
program or even the machine.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 2 '07 #33
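
The table-lookup scheme described in #33 can be fleshed out as a sketch. Everything here is hypothetical: the names upper_table, init_upper_table and MY_TOUPPER are invented, EOF is assumed to be -1 as in that post, and the letters 'a' to 'z' are assumed to be contiguous (true for ASCII, not required by the Standard). No real library is claimed to work exactly this way.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical lookup table: slot 0 is for EOF (assumed to be -1),
   slots 1..UCHAR_MAX+1 are for the character codes 0..UCHAR_MAX. */
static int upper_table[UCHAR_MAX + 2];

/* Fills the table for an ASCII-like "C" locale. */
void init_upper_table(void)
{
    int c;

    upper_table[0] = EOF;
    for (c = 0; c <= UCHAR_MAX; ++c)
        upper_table[c + 1] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* Valid only for x == EOF or 0 <= x <= UCHAR_MAX. Any other value,
   such as -42, indexes outside the array: undefined behaviour. */
#define MY_TOUPPER(x) (upper_table[(x) + 1])
```

With this layout MY_TOUPPER(-42) would read upper_table[-41], which is why an uncast negative char can produce garbage or a crash rather than a sensible result.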
Martin Wells wrote On 11/02/07 12:32,:
> Let's say we want to write a C89 fully-portable algorithm for making
> every character in a string uppercase (where applicable). Is there
> anything wrong with the following?
>
> void AllUpper(char *p)
> {
>     while (*p++ = toupper(*p));
> }

Yes, if `char' is signed.

> Sure, toupper will have unexpected results if the character value is
> negative; but, if the implementation has signed char for plain char,
> and has negative numbers for some of its characters, isn't it the
> implementation's problem to make sure it works properly?

The implementation must work properly when used properly.
When used improperly (as above), all bets are off.

Note that a `char' with the value -1 is likely to be
mistaken for EOF. (EOF is the only legal negative argument
to the <ctype.h> functions; it is usually -1 but can in
theory be any negative `int' value.)

> Assuming that the cast to unsigned char is necessary, what will happen
> to values which are negative? Will they become corrupt?

Demons will fly from their serifs. Undefined behavior
is "undefined."

In typical implementations where `char' is not too wide,
the <ctype.h> functions use their argument values to index
predefined arrays. Feed such an implementation an out-of-
range argument, and it will try to use that argument as an
array index, with outcomes similar to those you get when you
wander outside your own arrays. You might get a garbage
result like toupper('µ') == '7', or you might get something
like a SIGSEGV or GPF. Some implementations may try to be
helpful by making a spare copy of half the array, but they're
still likely to misbehave with 'ÿ' (Unicode U+00FF, easily
confused with EOF on systems with 8-bit signed characters).

--
Er*********@sun.com
Nov 2 '07 #34
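
On the question of what the cast does to negative values: conversion to unsigned char is fully defined and maps a negative value v to v + UCHAR_MAX + 1, so nothing is lost. A small sketch, with the hypothetical helper names ctype_arg and uncast_char_collides_with_eof:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the value a <ctype.h> function should be given
   for the plain char c. With UCHAR_MAX == 255, -8 becomes 248 and
   -1 becomes 255. */
int ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: nonzero if c, passed without the cast, would be
   indistinguishable from EOF. */
int uncast_char_collides_with_eof(char c)
{
    return c == EOF;
}
```

Storing the result of toupper() back into a plain char converts it back again. When char is signed that conversion is implementation-defined for values above CHAR_MAX, but on the usual two's complement implementations it restores the original bit pattern, so the character is not corrupted.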
Martin Wells wrote:
> Let's say we want to write a C89 fully-portable algorithm for making
> every character in a string uppercase (where applicable). Is there
> anything wrong with the following?
>
> void AllUpper(char *p)
> {
>     while (*p++ = toupper(*p));
> }

Doesn't that fall afoul of undefined behaviour? `p` is assigned in
the expression, /and/ its value is accessed for reasons other than
to determine its new value.

--
Chris "/which/ *p, exactly?" Dollin

Hewlett-Packard Limited Cain Road, Bracknell, registered no:
registered office: Berks RG12 1HN 690597 England

Nov 5 '07 #35
Chris Dollin wrote:
> Martin Wells wrote:
>> Let's say we want to write a C89 fully-portable algorithm for making
>> every character in a string uppercase (where applicable). Is there
>> anything wrong with the following?
>>
>> void AllUpper(char *p)
>> {
>>     while (*p++ = toupper(*p));
>> }
>
> Doesn't that fall afoul of undefined behaviour?

Yes.

> `p` is assigned in
> the expression, /and/ its value is accessed for reasons other than
> to determine its new value.

N869
6.5 Expressions
[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.

60) This paragraph renders undefined statement expressions
    such as
        i = ++i + 1;
        a[i++] = i;
    while allowing
        i = i + 1;
        a[i] = i;

--
pete
Nov 5 '07 #36
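
The usual way out of the a[i++] = i problem quoted from the footnote is to split the read and the modification into separate statements, so that a sequence point lies between them. A minimal sketch; the function name fill_with_index is made up for illustration.

```c
/* Undefined, as in the footnote: i is modified and also read for
   another purpose with no sequence point in between.
       a[i++] = i;
   Well-defined alternative: keep the read and the increment apart. */
void fill_with_index(int *a, int n)
{
    int i = 0;

    while (i < n) {
        a[i] = i;   /* i is only read in this statement */
        ++i;        /* and only modified in this one */
    }
}
```

The same separation is what makes the two-statement loop body in #33 safe where the original one-line AllUpper was not.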
