Casting to unsigned char for isupper() and friends

I've read that you should always cast the argument you pass to
isupper(), isalnum(), etc. to unsigned char, even though their
signature is int is...(int).

This confuses me, for the following reason. The is...() functions can
either accept a character, or EOF. But now suppose (as is common) that
EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
character value! So this casting destroys the possibility to pass EOF
to is...(), and in fact gives misleading results in this case.

Mar 23 '07 #1
Fr************@googlemail.com writes:
> I've read that you should always cast the argument you pass to
> isupper(), isalnum(), etc. to unsigned char, even though their
> signature is int is...(int).
>
> This confuses me, for the following reason. The is...() functions can
> either accept a character, or EOF. But now suppose (as is common) that
> EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
> character value! So this casting destroys the possibility to pass EOF
> to is...(), and in fact gives misleading results in this case.

If you have a value of type (plain) char, you should cast it to
unsigned char before passing it to isupper() (or any of the is*()
functions). For example, if plain char is signed, then -42
might be a valid character; you need to convert it to unsigned char,
yielding (assuming 8-bit characters) the value 214, which isupper()
can understand.
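
A minimal sketch of that advice, assuming a hypothetical helper named count_upper that walks an ordinary char string. Each element has type plain char, so it is converted to unsigned char before it reaches isupper():

```c
#include <ctype.h>
#include <stddef.h>

/* Count the uppercase letters in a NUL-terminated string.
   count_upper is a hypothetical name used only for illustration. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* *s is a plain char and may be negative where char is signed.
           The cast maps it into 0..UCHAR_MAX, the range for which
           isupper() is defined. */
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

Called as count_upper("Hello World"), this should return 2 in the default "C" locale. Without the cast, a byte such as 0xD6 held in a signed char would reach isupper() as -42, a negative value other than EOF, and the behaviour would be undefined.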

If you have the value EOF, then presumably you haven't tried to store
it in a variable of type char. For example, if it's the result of the
getchar() function, then it's already of type int (and any characters
that have negative values as signed char are already converted to
unsigned char), so no cast is necessary. Casting it to unsigned char
would, as you say, lose information.
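
For contrast, a sketch of the case where no cast is wanted, assuming a hypothetical helper count_upper_in_stream that reads with getc(). The value is kept in an int, so it is either a character already converted from unsigned char or EOF, and it can be handed to isupper() directly:

```c
#include <ctype.h>
#include <stdio.h>

/* Count the uppercase letters read from a stream until end of file.
   count_upper_in_stream is a hypothetical name used for illustration. */
long count_upper_in_stream(FILE *in)
{
    long n = 0;
    int c;  /* int, not char, so EOF stays distinct from every character */

    while ((c = getc(in)) != EOF) {
        /* No cast: c is already in the range 0..UCHAR_MAX here. */
        if (isupper(c))
            n++;
    }
    return n;
}
```

Calling count_upper_in_stream(stdin) would process standard input in the same way a getchar() loop does.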

So saying that you should *always* cast the argument to unsigned char
isn't quite correct. But the ability to pass the value EOF to the
is*() functions is fairly obscure, and it's not something I've ever
seen a use for. You're correct that EOF is an exception to the rule,
but I'd recommend just avoiding EOF in this context in the first
place.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 23 '07 #2
On 23 Mar 2007 16:30:13 -0700, in comp.lang.c,
Fr************@googlemail.com wrote:
> I've read that you should always cast the argument you pass to
> isupper(), isalnum(), etc. to unsigned char, even though their
> signature is int is...(int).
>
> This confuses me, for the following reason. The is...() functions can
> either accept a character, or EOF. But now suppose (as is common) that
> EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
> character value! So this casting destroys the possibility to pass EOF
> to is...(), and in fact gives misleading results in this case.

While you can pass EOF to these functions, it serves no useful purpose
to do so that I can think of. I suspect it's there because getchar()
and its ilk can return it.

On the other hand, any other value outside the range of unsigned char
would invoke undefined behaviour. The cast is thus a safety measure to
prevent accidental invocation of UB.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Mar 24 '07 #3
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
> But the ability to pass the value EOF to the
> is*() functions is fairly obscure, and it's not something I've ever
> seen a use for.

I suppose if you have a series of tests like

    c = getchar();
    if(isupper(c))
        ...;
    else if(isdigit(c))
        ...;
    else if(c == '*')
        ...;
    else if(c == EOF)
        ...;

you can do it without worrying about the order of the tests, just as if
it only had equality tests.
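
A sketch of that order independence, assuming hypothetical names (enum kind, classify_eof_last, classify_eof_first) and assuming the is*() functions report false for EOF. The two functions differ only in where the EOF test sits, and they should agree for every value getchar() can return:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical labels for the branches of the test chain. */
enum kind { KIND_UPPER, KIND_DIGIT, KIND_STAR, KIND_EOF, KIND_OTHER };

/* c must be a value as returned by getchar(): an unsigned char
   converted to int, or EOF. The EOF test comes last here. */
enum kind classify_eof_last(int c)
{
    if (isupper(c))
        return KIND_UPPER;
    else if (isdigit(c))
        return KIND_DIGIT;
    else if (c == '*')
        return KIND_STAR;
    else if (c == EOF)
        return KIND_EOF;
    return KIND_OTHER;
}

/* Same tests with EOF checked first; the result should not change,
   because no is*() test accepts EOF. */
enum kind classify_eof_first(int c)
{
    if (c == EOF)
        return KIND_EOF;
    else if (c == '*')
        return KIND_STAR;
    else if (isdigit(c))
        return KIND_DIGIT;
    else if (isupper(c))
        return KIND_UPPER;
    return KIND_OTHER;
}
```

If the is*() functions could not take EOF, the first version would need the EOF test moved ahead of every classification call.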

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Mar 24 '07 #4
Mark McIntyre wrote, On 24/03/07 00:04:
> On 23 Mar 2007 16:30:13 -0700, in comp.lang.c,
> Fr************@googlemail.com wrote:
>> I've read that you should always cast the argument you pass to
>> isupper(), isalnum(), etc. to unsigned char, even though their
>> signature is int is...(int).
>>
>> This confuses me, for the following reason. The is...() functions can
>> either accept a character, or EOF. But now suppose (as is common) that
>> EOF==(int) -1. Then (unsigned char) EOF will be 255, which is a valid
>> character value! So this casting destroys the possibility to pass EOF
>> to is...(), and in fact gives misleading results in this case.
>
> While you can pass EOF to these functions, it serves no useful purpose
> to do so that I can think of. I suspect it's there because getchar()
> and its ilk can return it.

I can see a useful purpose. On the assumption that EOF is the rare case
you can produce efficient code with
    while ((c = getchar()) && isspace(c) && !(c == EOF)) continue;
for skipping white space. There are times when this is both efficient
and convenient. It is efficient because normally when the loop
terminates it is because of isspace failing. I'm not sure what isspace
returns if the input is EOF, it might mean you don't even need the last
test!
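
A sketch of that whitespace-skipping idea as a function, assuming a hypothetical helper skip_space that reads from a FILE * rather than stdin. It relies on isspace(EOF) being false (EOF is not a whitespace character), so the loop needs no separate EOF test:

```c
#include <ctype.h>
#include <stdio.h>

/* Read and discard whitespace from a stream.
   Returns the first non-whitespace character, or EOF at end of input.
   skip_space is a hypothetical name used for illustration. */
int skip_space(FILE *in)
{
    int c;  /* must be int so that EOF can be represented */

    do {
        c = getc(in);
    } while (isspace(c));  /* false for EOF, so the loop ends there too */

    return c;
}
```

The caller would then compare the result against EOF once, after the loop, instead of on every character.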

> On the other hand, any other value outside the range of unsigned char
> would invoke undefined behaviour. The cast is thus a safety measure to
> prevent accidental invocation of UB.

The cast is a safety measure when the argument is not an int value that
is the result of getchar.
--
Flash Gordon
Mar 24 '07 #5
Richard Tobin wrote:
>
> .... snip ...
>
> I suppose if you have a series of tests like
>
>     c = getchar();
>     if(isupper(c))
>         ...;
>     else if(isdigit(c))
>         ...;
>     else if(c == '*')
>         ...;
>     else if(c == EOF)
>         ...;
>
> you can do it without worrying about the order of the tests, just
> as if it only had equality tests.

You can do this BECAUSE getchar (and fgetc and getc) return the int
equivalent of an unsigned char, or EOF. Note that c above MUST be
an int.
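
A small sketch of why a char will not do, using hypothetical helper names and assuming 8-bit bytes. It takes the value getchar() returns for the byte 0xFF, squeezes it into a plain char, and checks whether the result is still distinguishable from EOF:

```c
#include <limits.h>
#include <stdio.h>

/* Nonzero when plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Does the byte value 255, once stored in a plain char, compare
   equal to EOF? (Hypothetical demonstration function.) */
int byte_255_looks_like_eof(void)
{
    int from_getchar = 255;            /* what getchar() returns for byte 0xFF */
    char narrow = (char)from_getchar;  /* implementation-defined if char is signed */

    return narrow == EOF;
}
```

Where plain char is signed and EOF is -1, the function typically returns 1, meaning a legitimate byte would be mistaken for end of file. Where plain char is unsigned it returns 0, but then a char could never compare equal to EOF and a read loop would not terminate. Keeping the value in an int avoids both problems.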

Stylewar note: if is not a function, so follow it with a blank.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Mar 24 '07 #6
On Mar 23, 6:51 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
> <major snippage>
>
> Stylewar note: if is not a function, so follow it with a blank.

SILENCE, NUMBER TWO!!

Mark F. Haigh
mf*****@sbcglobal.net

Mar 24 '07 #7
On Fri, 23 Mar 2007 21:51:18 -0500, CBFalconer <cb********@yahoo.com>
wrote:
> Richard Tobin wrote:
>>
>> ... snip ...
>>
>> I suppose if you have a series of tests like
>>
>>     c = getchar();
>>     if(isupper(c))
>>         ...;
>>     else if(isdigit(c))
>>         ...;
>>     else if(c == '*')
>>         ...;
>>     else if(c == EOF)
>>         ...;
>>
>> you can do it without worrying about the order of the tests, just
>> as if it only had equality tests.
>
> You can do this BECAUSE getchar (and fgetc and getc) return the int
> equivalent of an unsigned char, or EOF. Note that c above MUST be
> an int.
>
> Stylewar note: if is not a function, so follow it with a blank.

Yes! And neither are else, switch, for, or while functions.

--
jay
Mar 24 '07 #8
CBFalconer said:
> Stylewar note: if is not a function, so follow it with a blank.

Why? (The stated reason is considered insufficient.)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Mar 24 '07 #9
jaysome <ja*****@hotmail.com> writes:
> On Fri, 23 Mar 2007 21:51:18 -0500, CBFalconer <cb********@yahoo.com>
> wrote:
>> Richard Tobin wrote:
[...]
>>>     else if(isdigit(c))
>>>         ...;
>>>     else if(c == '*')
[...]
>> Stylewar note: if is not a function, so follow it with a blank.
>
> Yes! And neither are else, switch, for, or while functions.

True, but else is seldom a problem. I don't think I've ever seen an
else immediately followed by a left parenthesis. At least, I hadn't
until a couple of minutes ago, when I wrote this silly little program:

    #include <stdio.h>
    int main(int argc, char **argv)
    {
        if (argc == 1)
            puts("No arguments");
        else(puts("One or more arguments"));
        return 0;
    }

(Or I could have added a cast to void rather than enclosing the entire
call in parentheses.)

But I agree with your actual point.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 24 '07 #10
