Should I use "char" or "unsigned char" for strings?

John Devereux

Hi,

I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

This all worked OK for a long time, but a recent update to the
compiler on my system has resulted in a lot of errors such as:

"pointer targets in passing argument 1 of 'strcpy' differ in
signedness"

Basically, the compiler is now protesting about me passing strings of
"unsigned char" to standard library functions that expect "char"
(which seems to be most of them).

I can rewrite my code to use plain chars. Or, I can cast the string
pointers in the standard library function calls. Both of these will
need quite a lot of (fairly trivial) changes. Or I expect I can turn
the warnings off.
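
For illustration, here is a minimal sketch of the first two options (casting at each call versus switching to plain char). The helper names copy_label_cast and copy_label_plain are hypothetical and do not come from the original code.

```c
#include <string.h>

/* Option 1: keep unsigned char buffers and cast at each library call.
   The casts silence the signedness diagnostic; strcpy copies the same
   bytes either way. */
size_t copy_label_cast(unsigned char *dst, const unsigned char *src)
{
    strcpy((char *)dst, (const char *)src);
    return strlen((const char *)dst);
}

/* Option 2: declare the strings as plain char, so no cast is needed. */
size_t copy_label_plain(char *dst, const char *src)
{
    strcpy(dst, src);
    return strlen(dst);
}
```

Both functions assume the caller supplies a destination buffer large enough for the source string and its terminator.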

(I would think this topic must be beaten to death, but I did not see
anything in the FAQ!).

Thanks,

--

John Devereux

Nov 14 '05 #1


Emmanuel Delahaye

John Devereux wrote on 28/03/05 :

I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.
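
Whether plain char is signed is up to the implementation, and some compilers offer a switch to change it (GCC's -funsigned-char is one example). A small sketch of how a program could check this for itself follows; the function name plain_char_is_signed is made up for illustration.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```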

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++

Nov 14 '05 #2

Eric Sosman

Emmanuel Delahaye wrote:

John Devereux wrote on 28/03/05 :
I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.

ITYM -128..-1 -- but the advice is sound.
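
To make the corrected range concrete, here is a small sketch of the mapping, assuming an eight-bit two's-complement char. The function name as_signed_8bit is hypothetical, and the mapping is written with plain arithmetic so that it does not rely on any implementation-defined conversion.

```c
/* Value an 8-bit two's-complement signed char would hold for a
   character code in 0..255: codes 0..127 are unchanged, codes
   128..255 map to -128..-1. */
int as_signed_8bit(int code)
{
    return code >= 128 ? code - 256 : code;
}
```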

The possible signedness of `char' is, IMHO, one of
the nagging infelicities of C. It's an imperfection we
simply have to live with, and attempts to get around it
by type-punning with `unsigned char' aren't satisfactory.
As Emmanuel says, use plain `char' when dealing with
characters -- but when using the <ctype.h> functions,
take care to cast where needed:

    #include <ctype.h>
    /* Return a pointer to the first non-whitespace character. */
    const char *skip_whitespace(const char *string) {
        while (isspace((unsigned char)*string))
            ++string;
        return string;
    }

Despite appearances, the cast is required if there's any
chance at all of "extended" characters in the strings. I
can't think of any other standard library functions that
require such ugliness, so if you switch to plain `char'
strings there shouldn't be too many places where you need
to insert casts.
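
The same cast applies to the other <ctype.h> functions. As a further sketch (the name upcase_in_place is invented here, not part of the thread), converting a plain `char' string to upper case might look like this:

```c
#include <ctype.h>

/* Upper-case a plain-char string in place. The cast to unsigned char
   keeps the argument to toupper() within the range the <ctype.h>
   functions require (an unsigned char value or EOF), even when plain
   char is signed. */
char *upcase_in_place(char *string)
{
    char *p;

    for (p = string; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return string;
}
```

Which characters beyond a-z are converted depends on the current locale.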

--
Er*********@sun.com

Nov 14 '05 #3

John Devereux

Eric Sosman <er*********@sun.com> writes:

Emmanuel Delahaye wrote:
John Devereux wrote on 28/03/05 :
I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.

ITYM -128..-1 -- but the advice is sound.

OK, should cover any machine I am likely to encounter.

The possible signedness of `char' is, IMHO, one of
the nagging infelicities of C. It's an imperfection we
simply have to live with, and attempts to get around it
by type-punning with `unsigned char' aren't satisfactory.
As Emmanuel says, use plain `char' when dealing with
characters -- but when using the <ctype.h> functions,
take care to cast where needed:

    #include <ctype.h>
    /* Return a pointer to the first non-whitespace character. */
    const char *skip_whitespace(const char *string) {
        while (isspace((unsigned char)*string))
            ++string;
        return string;
    }

Despite appearances, the cast is required if there's any
chance at all of "extended" characters in the strings. I
can't think of any other standard library functions that
require such ugliness, so if you switch to plain `char'
strings there shouldn't be too many places where you need
to insert casts.

Great, I rarely use these anyway.

What about conversion to and from an "int" I wonder? Some of my
functions process a string character by character, calling another
function with that character. This will presumably get promoted to an
int, right? And then probably converted back to a "char" again in the
function. As I understand it, an in-range negative "int" is guaranteed
to get converted to the same negative "char" value. So we should be
OK.

--

John Devereux

Nov 14 '05 #4

Eric Sosman

John Devereux wrote:

[...]
What about conversion to and from an "int" I wonder? Some of my
functions process a string character by character, calling another
function with that character. This will presumably get promoted to an
int, right? And then probably converted back to a "char" again in the
function. As I understand it, an in-range negative "int" is guaranteed
to get converted to the same negative "char" value. So we should be
OK.

There are three cases:

If the function is prototyped to take a `char' argument,
the `char' value you provide is passed to the function without
conversion or promotion, and received just as you passed it.
There may be behind-the-scenes magic involved (e.g., passing
an eight-bit value in a 32-bit register), but the effect must
be "as if" nothing happens.

If the `char' you provide is passed to an old-style
function (no prototype) that expects a `char' argument, the
provided value is promoted, passed, and then "demoted" upon
receipt. Again, the value arrives unscathed even though the
representation may change on "exotic" hardware: if you provide
a negative zero the function might receive a positive zero,
but it will in any case receive a zero.

If the `char' argument corresponds to part of the `...'
of a variable-argument function, the value is promoted just
as for prototypeless functions. In this case, though, you
actually need to know the promoted type when you fetch the
argument: `va_arg(ap, char)' is incorrect. A `char' will
promote to `int' if `int' can represent all possible values
a `char' might have, or to `unsigned int' otherwise. From
your earlier posts it appears you're assuming an eight-bit
`char' (values between -128 and 255), which fits comfortably
in the range of `int' (at least -32767..32767, perhaps wider).
Some systems, though, have sizeof(int)==sizeof(char)==1, and
if `char' is unsigned on such a system it will promote to
`unsigned int' instead of `int'.
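
The third case can be sketched as follows, assuming the common situation where `char' promotes to `int'. The function build_string is a hypothetical helper written for this illustration.

```c
#include <stdarg.h>

/* Copies `count` characters passed through `...` into dst and
   terminates the string. Each char argument arrives promoted, so it
   is fetched as int and then converted back to char. */
void build_string(char *dst, int count, ...)
{
    va_list ap;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; ++i)
        dst[i] = (char)va_arg(ap, int);   /* not va_arg(ap, char) */
    va_end(ap);
    dst[count] = '\0';
}
```

The caller must supply a buffer with room for count characters plus the terminator. On a system where `char' promotes to `unsigned int', the va_arg type would have to change to match.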

--
Er*********@sun.com

Nov 14 '05 #5
