Should I use "char" or "unsigned char" for strings?

Hi,

I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

This all worked OK for a long time, but a recent update to the
compiler on my system has resulted in a lot of errors such as:

"pointer targets in passing argument 1 of 'strcpy' differ in
signedness"

Basically, the compiler is now protesting about me passing strings of
"unsigned char" to standard library functions that expect "char"
(which seems to be most of them).

I can rewrite my code to use plain chars. Or, I can cast the string
pointers in the standard library function calls. Both of these will
need quite a lot of (fairly trivial) changes. Or I expect I can turn
the warnings off.
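
For reference, the casting option can be confined to a couple of wrappers instead of being repeated at every call site. This is only a minimal sketch: the names `ustrcpy` and `ustrlen` are hypothetical, not part of any library, and the replies below recommend switching to plain `char` instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: strcpy() for unsigned char strings.
   The pointer casts only silence the signedness diagnostic;
   the bytes that get copied are exactly the same. */
static unsigned char *ustrcpy(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    return (unsigned char *)strcpy((char *)dst, (const char *)src);
}

/* Hypothetical wrapper: strlen() for unsigned char strings. */
static size_t ustrlen(const unsigned char *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}
```

With wrappers like these the casts live in one place, at the cost of maintaining a parallel set of string functions.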

(I would think this topic must be beaten to death, but I did not see
anything in the FAQ!).

Thanks,

--

John Devereux
Nov 14 '05 #1
John Devereux wrote on 28/03/05 :
I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.


Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++

Nov 14 '05 #2


Emmanuel Delahaye wrote:
John Devereux wrote on 28/03/05 :
I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.


ITYM -128..-1 -- but the advice is sound.
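
To make the range correction concrete, here is a small sketch, assuming an eight-bit `char`. The helper names `char_code` and `code_to_char` are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns the 0..UCHAR_MAX code of a character held in a plain char.
   Conversion to unsigned char is defined as reduction modulo
   UCHAR_MAX + 1, so this works whether plain char is signed or not. */
static int char_code(char c)
{
    return (unsigned char)c;
}

/* Stores a 0..UCHAR_MAX code in a plain char. If char is signed and
   the code exceeds CHAR_MAX, the result is implementation-defined;
   common two's complement compilers wrap it to a negative value. */
static char code_to_char(int code)
{
    assert(code >= 0 && code <= UCHAR_MAX);
    return (char)(unsigned char)code;
}
```

On a typical system where plain `char` is signed and eight bits wide, `code_to_char(200)` holds -56, and `char_code` maps it back to 200.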

The possible signedness of `char' is, IMHO, one of
the nagging infelicities of C. It's an imperfection we
simply have to live with, and attempts to get around it
by type-punning with `unsigned char' aren't satisfactory.
As Emmanuel says, use plain `char' when dealing with
characters -- but when using the <ctype.h> functions,
take care to cast where needed:

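    #include <ctype.h>

    const char *skip_whitespace(const char *string) {
        /* isspace() needs a value representable as unsigned char
           (or EOF); a negative plain char is undefined behavior. */
        while (isspace((unsigned char)*string))
            ++string;
        return string;
    }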

Despite appearances, the cast is required if there's any
chance at all of "extended" characters in the strings. I
can't think of any other standard library functions that
require such ugliness, so if you switch to plain `char'
strings there shouldn't be too many places where you need
to insert casts.

--
Er*********@sun.com

Nov 14 '05 #3
Eric Sosman <er*********@sun.com> writes:
Emmanuel Delahaye wrote:
John Devereux wrote on 28/03/05 :
I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.


ITYM -128..-1 -- but the advice is sound.


OK, should cover any machine I am likely to encounter.

The possible signedness of `char' is, IMHO, one of
the nagging infelicities of C. It's an imperfection we
simply have to live with, and attempts to get around it
by type-punning with `unsigned char' aren't satisfactory.
As Emmanuel says, use plain `char' when dealing with
characters -- but when using the <ctype.h> functions,
take care to cast where needed:

    #include <ctype.h>

    const char *skip_whitespace(const char *string) {
        while (isspace((unsigned char)*string))
            ++string;
        return string;
    }

Despite appearances, the cast is required if there's any
chance at all of "extended" characters in the strings. I
can't think of any other standard library functions that
require such ugliness, so if you switch to plain `char'
strings there shouldn't be too many places where you need
to insert casts.


Great, I rarely use these anyway.

What about conversion to and from an "int" I wonder? Some of my
functions process a string character by character, calling another
function with that character. This will presumably get promoted to an
int, right? And then probably converted back to a "char" again in the
function. As I understand it, an in-range negative "int" is guaranteed
to get converted to the same negative "char" value. So we should be
OK.
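
A quick way to check this assumption on a given compiler is sketched below. It assumes `int` is wider than `char`, and the names `through_int` and `roundtrip_ok` are invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical callee that receives the character as an int and
   narrows it back to char. The value is in range, so it is unchanged. */
static char through_int(int c)
{
    assert(c >= CHAR_MIN && c <= CHAR_MAX);
    return (char)c;
}

/* Returns 1 if every char value survives char -> int -> char.
   Assumes int is wider than char, so the loop counter cannot overflow. */
static int roundtrip_ok(void)
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        char c = (char)v;
        if (through_int(c) != c)
            return 0;
    }
    return 1;
}
```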

--

John Devereux
Nov 14 '05 #4


John Devereux wrote:
[...]
What about conversion to and from an "int" I wonder? Some of my
functions process a string character by character, calling another
function with that character. This will presumably get promoted to an
int, right? And then probably converted back to a "char" again in the
function. As I understand it, an in-range negative "int" is guaranteed
to get converted to the same negative "char" value. So we should be
OK.


There are three cases:

If the function is prototyped to take a `char' argument,
the `char' value you provide is passed to the function without
conversion or promotion, and received just as you passed it.
There may be behind-the-scenes magic involved (e.g., passing
an eight-bit value in a 32-bit register), but the effect must
be "as if" nothing happens.

If the `char' you provide is passed to an old-style
function (no prototype) that expects a `char' argument, the
provided value is promoted, passed, and then "demoted" upon
receipt. Again, the value arrives unscathed even though the
representation may change on "exotic" hardware: if you provide
a negative zero the function might receive a positive zero,
but it will in any case receive a zero.

If the `char' argument corresponds to part of the `...'
of a variable-argument function, the value is promoted just
as for prototypeless functions. In this case, though, you
actually need to know the promoted type when you fetch the
argument: `va_arg(ap, char)' is incorrect. A `char' will
promote to `int' if `int' can represent all possible values
a `char' might have, or to `unsigned int' otherwise. From
your earlier posts it appears you're assuming an eight-bit
`char' (values between -128 and 255), which fits comfortably
in the range of `int' (at least -32767..32767, perhaps wider).
Some systems, though, have sizeof(int)==sizeof(char)==1, and
if `char' is unsigned on such a system it will promote to
`unsigned int' instead of `int'.
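
The variable-argument case can be sketched as follows, assuming the usual situation where `int` can represent every `char` value. The function `collect_chars` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic helper: stores `count` characters, passed as
   trailing arguments, into `out` and NUL-terminates the result.
   `out` must have room for count + 1 chars. */
static void collect_chars(char *out, size_t count, ...)
{
    va_list ap;

    assert(out != NULL);
    va_start(ap, count);
    for (size_t i = 0; i < count; ++i) {
        /* Each char argument arrives promoted, so it is fetched as
           int (the promoted type here), never as char. */
        out[i] = (char)va_arg(ap, int);
    }
    va_end(ap);
    out[count] = '\0';
}
```

A call such as `collect_chars(buf, 3, a, b, c)` with three `char` variables passes three promoted `int` values, which the function narrows back to `char`.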

--
Er*********@sun.com

Nov 14 '05 #5
