casting to unsigned char for is*() and to*() functions

I have been reading about the practise of casting values to unsigned
char while using the <ctype.h> functions. For example,

c = toupper ((unsigned char) c);

Now I understand that the standard says this about the <ctype.h>
functions:

"The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the
value of which shall be representable as an unsigned char or shall
equal the value of the macro EOF. If the argument has any other value,
the behavior is undefined."

I am having a hard time formulating my question - basically it's like
this though - Some people say cast to unsigned char (as in the above
example), whereas I have seen some people argue that casting to
unsigned char is unnecessary, and if it is done, then a recast back to
int is necessary, because functions like toupper() expect an int, e.g.,

toupper( (int)((unsigned char) c) );

So what is the right thing to do? Cast to unsigned char? Cast to
unsigned char and back to int?

Nov 15 '05 #1
IIRC, the C standard says that all members of the execution character set
will be expressible as positive values.

Since the use of toupper() only makes sense within the scope of the
execution character set, and not for arbitrary char values outside of
that range, it is safe to say that toupper() only works on positive char
values, or EOF (which is a specific, often negative, char value).

Casting the parameter to an unsigned char
- may change the interpretation of the value of the parameter, if it is
  (a negative value) EOF
- has no effect on proper members of the execution character set

Casting this back to int, while properly correcting the type of the
parameter to int, otherwise has (to my knowledge) no other effect. The
damage has been done by the cast to unsigned char, and cannot be
corrected by the recasting to int.

In the end, toupper() was originally meant to take an int (as in the
return value of fgetc()), and that's what you should give it. If you are
working with char data items, convert them to int first, but otherwise
don't cast. I.e.
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char datum = 'c', uc_datum;
    int file_datum, uc_file_datum;

    file_datum = fgetc(stdin);
    uc_file_datum = toupper(file_datum);

    uc_datum = toupper((int)datum);
    return 0;
}

--

Lew Pitcher, IT Specialist, Enterprise Data Systems
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed here are my own, not my employer's)
Nov 15 '05 #2


If `c' is a plain `char', cast it to `unsigned char'.
The further cast to `int' is harmless but unnecessary: since
<ctype.h> provides a prototype that says toupper() takes an
`int' argument, the compiler will do the conversion anyhow.

The reason you need the cast is that converting directly
from plain `char' to `int' might not produce what toupper()
needs. Specifically, if `char' is a signed type and `c' has
a negative value, direct conversion will produce a negative
`int'. If this negative `int' happens to equal EOF, toupper()
will just return the EOF unaltered, and this might not be the
upper-case equivalent of `c'. If the negative `int' is
something other than EOF, all bets are off and you are in the
perilous realm of Undefined Behavior.

If `c' is an `int' obtained from something like getc(),
just pass it along without casting. getc() and its ilk
already return either EOF or a non-negative `unsigned char'
value, which is what toupper() et al. require.
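
The rule described above can be sketched as a pair of small helpers. The names `upper_char` and `upper_string` are invented for this illustration and do not come from the thread; the only library call used is toupper() from <ctype.h>.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case one plain char.
   The cast to unsigned char makes the argument non-negative even on
   implementations where plain char is signed. The further conversion
   to int happens automatically because of the prototype. */
static char upper_char(char c)
{
    return (char)toupper((unsigned char)c);
}

/* Hypothetical helper: upper-case a NUL-terminated string in place. */
static void upper_string(char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        s[i] = upper_char(s[i]);
    }
}
```

Wrapping the cast in one helper keeps it out of the calling code, so it cannot be forgotten at an individual call site.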

--
Er*********@sun.com

Nov 15 '05 #3
Since toupper is undefined for values which are not representable
as unsigned char (other than EOF), cast to unsigned char whenever
the cast would change the value; if it would not, don't bother.

fputc, by which all file output is described,
converts its int argument to unsigned char before output.

So, if you have a negative integer value like:
#define NEG_a ('a' - 1 - (unsigned char)-1)
you know that
putchar(NEG_a);
will output the 'a' character.

To make that negative number work with toupper:
putchar(toupper((unsigned char)NEG_a));
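
A minimal sketch of the arithmetic behind NEG_a, assuming an ordinary implementation where int is wider than char. The helper names `as_ctype_arg` and `upper_of_neg_a` are hypothetical; NEG_a is the macro from the post.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* The negative value from the post: 'a' - (UCHAR_MAX + 1). */
#define NEG_a ('a' - 1 - (unsigned char)-1)

/* Hypothetical helper: convert an int that holds a character in
   "negative" form into the value a <ctype.h> function accepts.
   Conversion to unsigned char reduces the value modulo UCHAR_MAX + 1. */
static int as_ctype_arg(int value)
{
    int result = (unsigned char)value;
    assert(result >= 0 && result <= UCHAR_MAX);
    return result;
}

/* Upper-case the character that NEG_a stands for. */
static int upper_of_neg_a(void)
{
    return toupper(as_ctype_arg(NEG_a));
}
```

With 8-bit characters and an ASCII execution set, NEG_a is 97 - 256 = -159, and the conversion adds 256 back to give 97, which is 'a'.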

--
pete
Nov 15 '05 #4
Eric Sosman wrote:

> mr**********@hotmail.com wrote:
>> I have been reading about the practise of casting values to unsigned
>> char while using the <ctype.h> functions. For example,
>>
>> c = toupper ((unsigned char) c);
>>
>> Now I understand that the standard says this about the <ctype.h>
>> functions: [...] So what is the right thing to do? Cast to unsigned
>> char? Cast to unsigned char and back to int?
>
> If `c' is a plain `char', cast it to `unsigned char'.
> The further cast to `int' is harmless but unnecessary: since
> <ctype.h> provides a prototype that says toupper() takes an
> `int' argument, the compiler will do the conversion anyhow.
>
> The reason you need the cast is that converting directly
> from plain `char' to `int' might not produce what toupper()
> needs. Specifically, if `char' is a signed type and `c' has
> a negative value, direct conversion will produce a negative
> `int'. If this negative `int' happens to equal EOF toupper()
> will just return the EOF unaltered, and this might not be the
> upper-case equivalent of `c'. If the negative `int' is
> something other than EOF, all bets are off and you are in the
> perilous realm of Undefined Behavior.
>
> [...]

For example, suppose toupper were defined as:

#define toupper(c) ( ((c)==EOF) ? EOF : toupper_xlate[c] )

where "toupper_xlate[]" is an array of the convert-to-upper-case values.
If c is a signed char, then in addition to 0xFF probably being equivalent
to EOF, you also have 0x80 through 0xFE being sign-extended into negative
subscripts into the array. (BTDT)
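
The subscript problem described above can be made concrete with a small sketch. The function names are hypothetical, and `signed char` is used explicitly so the result does not depend on whether plain char is signed on a given compiler.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: the subscript that a lookup such as toupper_xlate[c]
   would use when a signed char is passed through unchanged.
   Integer promotion keeps the sign, so a negative char stays negative. */
static int index_without_cast(signed char c)
{
    return c;
}

/* The same lookup after casting to unsigned char first:
   the subscript always lands in 0..UCHAR_MAX. */
static int index_with_cast(signed char c)
{
    int i = (unsigned char)c;
    assert(i >= 0 && i <= UCHAR_MAX);
    return i;
}
```

With 8-bit characters, a signed char holding -23 corresponds to the byte 0xE9: without the cast the table would be indexed at -23, and with the cast at 233.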

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody        | www.hvcomputer.com |                             |
| kenbrody/at\spamcop.net | www.fptech.com     | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Nov 15 '05 #5


Lew Pitcher wrote:

> IIRC, the C standard says that all members of the execution character set
> will be expressible as positive values.

You seem to have missed the difference between "execution
character set" and "basic execution character set," described
in section 5.2.1. Section 6.2.5/3 guarantees that all the
basic characters are positive[*], but no such guarantee applies
to the "extended execution character set."

[*] Is this a defect in the Standard? '\0' is a member
of the basic execution character set, yet it is not positive.

> Since the use of toupper() only makes sense within the scope of the
> execution character set, and not for arbitrary char values outside of
> that range, it is safe to say that toupper() only works on positive char
> values, or EOF (which is a specific, often negative, char value).

toupper() applies only to the execution character set and
to EOF, true. But toupper() is not restricted to the basic
execution character set; it also applies to extended characters.
If you want to translate æ to Æ or ñ to Ñ or å to Å, you are
dealing with extended characters and must consider that they
could be negative.

By the way, EOF is not "often" negative but "always" negative,
and is not a `char' value but an `int' value. See 7.19.1/3.

> Casting the parameter to an unsigned char
> - may change the interpretation of the value of the parameter, if it is
>   (a negative value) EOF

Changing the value from possibly negative to guaranteed
positive is the purpose of the cast. I'm not sure why you
mention EOF here.

> - has no effect on proper members of the execution character set

Has no effect on members of the basic execution set, but
can affect extended characters.

> Casting this back to int, while properly correcting the type of the
> parameter to int, otherwise has (to my knowledge) no other effect. The
> damage has been done by the cast to unsigned char, and cannot be
> corrected by the recasting to int.

... except "damage" is the wrong word, and there's nothing
that needs to be "corrected."

> In the end, toupper() was originally meant to take an int (as in the
> return value of fgetc(), and that's what you should give it. If you are
> working with char data items, convert them to int first, but otherwise
> don't cast. I.e.
>
> #include <ctype.h>
> #include <stdio.h>
>
> {
> char datum = 'c', uc_datum;
> int file_datum, uc_file_datum;
>
> file_datum = fgetc(stdin);
> uc_file_datum = toupper(file_datum);
>
> uc_datum = toupper((int)datum);

No; that's both pointless and wrong. "Pointless" because
the compiler would perform this conversion anyhow without the
cast, and "wrong" because a negative `datum' would invoke
undefined behavior unless it just happened to equal EOF.

--
Er*********@sun.com

Nov 15 '05 #6


Kenneth Brody wrote:

> For example, suppose toupper were defined as:
>
> #define toupper(c) ( ((c)==EOF) ? EOF : toupper_xlate[c] )
>
> where "toupper_xlate[]" is an array of the convert-to-upper-case values.
> If c is a signed char, then in addition to 0xFF probably equivalent to
> EOF, you also have 0x80 through 0xFE being sign-extended as negative
> subscripts into the array. (BTDT)

That particular definition wouldn't work because it
isn't safe from side-effects -- consider `toupper(*p++)'.
The usual rescue is something like

#define toupper(c) _toupper_xlate[(c) - EOF]

... using a table that's "offset" by -EOF (usually 1)
positions.
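
The offset-table idea can be sketched as follows. This is only an illustration, not how any particular C library implements it: the names `my_upper_table`, `my_upper_init`, `MY_TOUPPER` and `my_toupper` are invented here, and the table is filled from the library's own toupper() rather than from hand-written data.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical table: slot 0 is for EOF itself, and slot (c - EOF)
   holds the mapping for each c in 0..UCHAR_MAX. EOF is a negative
   integer constant expression, so the size is a compile-time constant. */
static int my_upper_table[UCHAR_MAX + 1 - EOF];

/* Fill the table once before first use. */
static void my_upper_init(void)
{
    my_upper_table[0] = EOF;
    for (int c = 0; c <= UCHAR_MAX; c++) {
        my_upper_table[c - EOF] = toupper(c);
    }
}

/* Evaluates its argument exactly once, so MY_TOUPPER(*p++) is safe. */
#define MY_TOUPPER(c) (my_upper_table[(c) - EOF])

/* Function form with a debug check on the permitted argument range. */
static int my_toupper(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return MY_TOUPPER(c);
}
```

Note that the macro form still indexes out of bounds for a negative argument other than EOF, which is exactly the undefined behavior the cast to unsigned char is meant to prevent.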

--
Er*********@sun.com

Nov 15 '05 #7
On 29 Jun 2005 07:28:23 -0700, mr**********@hotmail.com wrote in
comp.lang.c:

> I have been reading about the practise of casting values to unsigned
> char while using the <ctype.h> functions. For example,
>
> c = toupper ((unsigned char) c);
>
> Now I understand that the standard says this about the <ctype.h>
> functions:
>
> "The header <ctype.h> declares several functions useful for classifying
> and mapping characters. In all cases the argument is an int, the
> value of which shall be representable as an unsigned char or shall
> equal the value of the macro EOF. If the argument has any other value,
> the behavior is undefined."
>
> I am having a hard time formulating my question - basically its like
> this though - Some people say cast to unsigned char (as in the above
> example), whereas I have seen some people argue that casting to
                                ^^^^^^^^^^^

Who are these "some people"? What are their qualifications to offer
advice on this subject?

On an implementation where plain char is signed (which is quite
common, especially on x86 systems, meaning Windows, most Linux, and
soon to be Macintosh), this:

char ch = CHAR_MIN;

int uc = toupper(ch);

...produces undefined behavior, unless the macros CHAR_MIN and EOF
happen to be equal, which is not likely. The very words you quoted above
specifically say so.

So your "some people" are very foolish if they are offering advice
based on a fundamental misconception.

> unsigned char is unnecessary, and if it is done, then a recast back to
> int is necessary, because functions like toupper() expect an int, eg,

If the same foolish "some people" are saying this, then they are
actually beyond misconception and have arrived at ignorance. The only
advice they are qualified to give about C is perhaps how to spell the
name of the language. And even then I would double check their
answer.

> toupper( (int)((unsigned char) c) );
>
> So what is the right thing to do? Cast to unsigned char? Cast to
> unsigned char and back to int?


--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 15 '05 #8
Eric Sosman wrote:

... snip ...

> If `c' is an `int' obtained from something like getc(),
> just pass it along without casting. getc() and its ilk
> already return either EOF or a non-negative `unsigned char'
> value, which is what toupper() et al. require.

I think the point is that getc and friends do not return a char,
they all return an int. So unless the OP makes the beginner's
mistake of storing that value in a char, all is correct without any
special effort. Thus the prototype for filling a char array is:

while (EOF != (ch = getc(...))) {
    /* make tests on ch */
    /* optionally store ch in a char array */
}

and the tests on the intermediate storage ch need no special care,
provided that it is of type int.
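
The loop pattern above could be filled in along these lines. The function name `read_upper` and its parameters are assumptions made for this sketch; only getc() and toupper() come from the discussion.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap - 1 characters from `in`,
   store their upper-case forms in `buf`, and NUL-terminate.
   Returns the number of characters stored. */
static size_t read_upper(FILE *in, char *buf, size_t cap)
{
    int ch;        /* int, not char, so EOF stays distinguishable */
    size_t n = 0;

    assert(in != NULL && buf != NULL && cap > 0);
    while (n + 1 < cap && EOF != (ch = getc(in))) {
        /* ch came straight from getc(), so it is already a
           non-negative unsigned char value: no cast is needed. */
        buf[n++] = (char)toupper(ch);
    }
    buf[n] = '\0';
    return n;
}
```

The cast appears only when the result is stored back into the char array; if those stored chars are later passed to a <ctype.h> function, they need the unsigned char cast again.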

--
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Nov 15 '05 #9
> The reason you need the cast is that converting directly
> from plain `char' to `int' might not produce what toupper()
> needs.

Does the standard guarantee that casting a signed char with a negative,
non-EOF value to unsigned char will produce the expected character? It
seems to me that unless this guarantee is provided, the cast would give
you defined behavior but garbage results. That's only marginally better
than undefined behavior. As such, wouldn't it be better to simply avoid
the operation if the value is out of range?

if (c == EOF || (c >= 0 && c <= UCHAR_MAX))
    c = toupper(c);
else {
    /* Special treatment for c */
}
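
One way to probe the question is to compare the two ways of viewing a negative char. This is only a sketch with hypothetical function names. The value conversion itself is fully defined by the standard (the result is the value reduced modulo UCHAR_MAX + 1); whether it matches the byte as read through an unsigned char lvalue depends on the representation of negative numbers, and the two agree on two's complement machines.

```c
#include <assert.h>
#include <limits.h>

/* Value conversion: defined as c modulo (UCHAR_MAX + 1). */
static int by_conversion(signed char c)
{
    int r = (unsigned char)c;
    assert(r >= 0 && r <= UCHAR_MAX);
    return r;
}

/* Byte reinterpretation: the same object read as an unsigned char,
   which is how fgetc() is described as obtaining characters. */
static int by_reinterpretation(signed char c)
{
    return *(unsigned char *)&c;
}

/* Returns 1 if both views agree for every signed char value. */
static int views_agree(void)
{
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++) {
        signed char c = (signed char)v;
        if (by_conversion(c) != by_reinterpretation(c)) {
            return 0;
        }
    }
    return 1;
}
```

Where `views_agree()` returns 1, the cast yields the same value getc() would have returned for that byte, so the result is the expected character rather than garbage.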

Nov 15 '05 #10
