Macro for supplying memset with an unsigned char

Martin Wells

I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

I'll take a sample system before I begin:

CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude

Therefore we have:

UCHAR_MAX == 65535
INT_MIN = -32767
INT_MAX = 32767

Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:

memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.

Therefore we need to supply memset with an int value which, when
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one.
If we start off with a negative number like -1, then here's what will
happen:

char unsigned c = -1;

is equal to:

infinite_range_int x = -1; /* Let's pretend we have a signed
                              int type that can hold any number */

while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;

char unsigned c = x;

So on our own system, this is:

while (0 > x || 65535 < x) x += 65536;

Clearly, if x = -1, then it only takes one iteration of the loop to
yield 65535, i.e. UCHAR_MAX.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.
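The first rows of the table below can be checked mechanically. This is only a minimal sketch; the function names `to_uchar` and `check_first_table_rows` are made up for illustration and do not come from the post.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert an int to unsigned char. For negative
   input the standard rule adds UCHAR_MAX + 1 until the value is in range. */
static unsigned char to_uchar(int value)
{
    return (unsigned char)value;
}

/* Check the pattern -1 -> UCHAR_MAX, -2 -> UCHAR_MAX-1, ... for the
   first twelve rows of the table. */
static void check_first_table_rows(void)
{
    int k;

    for (k = 1; k <= 12; ++k)
        assert(to_uchar(-k) == UCHAR_MAX - (k - 1));
}
```

Because the conversion is defined on values rather than on representations, this part holds whatever number system int uses.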

The entire set of data looks something like:

int char unsigned
-1 65535
-2 65534
-3 65533
-4 65532
-5 65531
-6 65530
-7 65529
-8 65528
-9 65527
-10 65526
-11 65525
-12 65524
....
....
-32764 32772
-32765 32771
-32766 32770
-32767 32769
-32768 32768 <--

Now I've just realised a problem. An unsigned char can store 65536
different combinations (i.e. 0 through 65535), but an int can only
store 65535 different combinations (i.e. -32767 through 32767) if we're
using something other than two's complement. I don't know what I'll do
about that, but for now I'll try to continue with the other two number
systems:

#if NUMBER_SYSTEM != SIGN_MAGNITUDE

#define UC_AS_INT(x) /* Whatever we're going to do */

#endif

My first thought is something like:

#define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

#define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                ? -(int)(UCHAR_MAX - x) - 1 \
                                : (int)x )

Anyway it's Friday and I've stuff to do, but if anyone wants to finish
it off then feel free! :)

If we can't get all 65536 combinations out of one's complement or sign-
magnitude, then we can just have a macro that changes it to:

char unsigned *p = data;
char unsigned const *const pover = data + sizeof data;
while (pover != p) *p++ = c;
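Wrapped up as a function, the fallback loop might look like the sketch below. The name `uc_fill` is hypothetical; the point is only that writing through an unsigned char pointer never passes through int, so no out-of-range signed conversion can occur.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store val in each of the len bytes starting at pv
   and return pv, mirroring memset's return convention. */
static void *uc_fill(void *pv, unsigned char val, size_t len)
{
    unsigned char *p = pv;
    unsigned char *const pover = p + len;   /* one past the last byte */

    assert(pv != NULL || len == 0);

    while (p != pover)
        *p++ = val;

    return pv;
}
```

A call such as `uc_fill(data, UCHAR_MAX - 4, sizeof data)` would then replace the memset call, at the cost of whatever optimisations the library's memset offers.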

Martin

Sep 28 '07 #1

pete

Martin Wells wrote:

> I'm trying to come up with a fully-portable macro for supplying memset
> with an unsigned char rather than an int. I'm going to think out loud
> as I go along. . .
>
> I'll take a sample system before I begin:
>
> CHAR_BIT == 16
> sizeof(short) == sizeof(int) == 1
> Assume none of the integer types have padding bits
> Sign-magnitude
>
> Therefore we have:
>
> UCHAR_MAX == 65535
> INT_MIN = -32767
> INT_MAX = 32767
>
> Let's say we have an array of bytes and we want to set every byte to
> 65000. We CANNOT use:
>
> memset(data, 65000, sizeof data);
>
> because the conversion from unsigned integer types to signed integer
> types "is implementation-defined or an implementation-defined signal
> is raised" if the number is out of range.

Whether or not you can set an unsigned char to 65000
is implementation defined,
so there's nothing wrong
with an implementation defined way of doing it.

--
pete

Sep 28 '07 #2

Martin Wells

pete:

> Whether or not you can set an unsigned char to 65000
> is implementation defined,
> so there's nothing wrong
> with an implementation defined way of doing it.

The reason I mentioned concrete figures like 65535 instead of
UCHAR_MAX is that I think people find it easier to understand and
grasp.

The point wasn't whether we could assign 65000 to an int, but rather
whether we could assign (UCHAR_MAX - some_small_number) to an int and
have the same results on every implementation conceivable.

For clarity, I'll rewrite my original post taking out the concrete
numbers. Remember again that the code is being written in the context
of it being FULLY portable (e.g. 97-bit chars and sign-magnitude):

Let's say we have an array of bytes and we want to set every byte to
(UCHAR_MAX - 4). We CANNOT use:

memset(data, UCHAR_MAX - 4, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range. (So in the context of fully
portable programming, the resultant int could have pretty much any
value because UCHAR_MAX might be bigger than INT_MAX).

Therefore we need to supply memset with an int value which, when
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that
one. If we start off with a negative number like -1, then here's what
will happen:

char unsigned c = -1;

is equal to:

infinite_range_int x = -1; /* Let's pretend we have a signed
                              int type that can hold any number */
while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;
char unsigned c = x;

So here's a few samples of what will happen on different systems:

while (0 > x || 255 < x) x += 256;
while (0 > x || 65535 < x) x += 65536;
while (0 > x || 4294967295 < x) x += 4294967296;
while (0 > x || 18446744073709551615 < x) x += 18446744073709551616;

If x = -1, then it only takes one iteration of the loop to
yield UCHAR_MAX on any implementation.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.
The entire set of data looks something like:
int char unsigned
-1 UCHAR_MAX
-2 UCHAR_MAX-1
-3 UCHAR_MAX-2
-4 UCHAR_MAX-3
-5 UCHAR_MAX-4
-6 UCHAR_MAX-5
-7 UCHAR_MAX-6
-8 UCHAR_MAX-7
-9 UCHAR_MAX-8
-10 UCHAR_MAX-9
-11 UCHAR_MAX-10
-12 UCHAR_MAX-11
....
....
Now I've just realised a problem. Imagine a system where unsigned char
has the range 0 through 65535 and where int has -32767 through 32767.
The former has 65536 possible combinations while the latter only has
65535 combinations. We might have to resort to a loop if working with
something other than two's complement, but I'm not sure yet.

Anyway here's the code I have at the moment, I robbed some of it from
old posts of yours pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif
#if NUM_SYS != TWOS /* ----------- */

#include <stddef.h>

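static void *uc_memset(void *const pv,char unsigned const val,
                       size_t const len)
{
    /* Write through char unsigned so val is stored without any
       conversion to a possibly signed plain char. */
    char unsigned *p = pv;
    char unsigned const *const pover = p + len;

    while (pover != p) *p++ = val;

    return pv;
}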

#define UC_MEMSET(p,uc,len) (uc_memset(p,uc,len))

#else /* ------------ */

#include <string.h>

#define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

#define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                ? -(int)(UCHAR_MAX - x) - 1 \
                                : (int)x )

#define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

#endif /* ----------- */

#include <limits.h>

int main(void)
{
char unsigned data[24];

UC_MEMSET(data, UCHAR_MAX, sizeof data);

return 0;
}

Feel free to make alterations if you see a better way of doing it!

Martin

Sep 29 '07 #3

Charlie Gordon

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@r29g2000hsg.googlegroups.com...
<snip>

> Now I've just realised a problem. Imagine a system where unsigned char
> has the range 0 through 65535 and where int has -32767 through 32767.
> The former has 65536 possible combinations while the latter only has
> 65535 combinations. We might have to resort to a loop if working with
> something other than two's complement, but I'm not sure yet.

For this and other similar reasons, it would be difficult if not impossible
to implement a fully conformant hosted C environment on an architecture with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.

> Anyway here's the code I have at the moment, I robbed some of it from
> old posts of yours pete:
>
> #define SIGNMAG 0
> #define ONES 1
> #define TWOS 2
>
> #if -1 & 3 == 1
> #define NUM_SYS SIGNMAG
> #elif -1 & 3 == 2
> #define NUM_SYS ONES
> #else
> #define NUM_SYS TWOS
> #endif

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
all platforms.

* There is no guarantee that the preprocessing be performed with the same
representation as the target architecture. As a matter of fact, embedded
targets with unusual arithmetic are often targeted by cross compilers
running on different machines.
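The precedence point can be seen in a few lines of ordinary C. This is a sketch only, with invented function names, and it inspects the run-time representation of int, which (per the second point) need not be what the preprocessor uses.

```c
#include <assert.h>

/* Low two bits of -1 in the target's run-time int representation:
   1 for sign-magnitude, 2 for ones' complement, 3 for two's complement. */
static int low_bits_of_minus_one(void)
{
    return -1 & 3;
}

static void check_precedence(void)
{
    /* == binds tighter than &, so "-1 & 3 == 1" is parsed like this,
       and 3 == 1 is 0, so the whole expression is 0. */
    assert((-1 & (3 == 1)) == 0);

    /* With explicit parentheses the comparison sees the masked value. */
    assert(((-1 & 3) == 1) == (low_bits_of_minus_one() == 1));
}
```

Writing the tests as `(-1 & 3) == 1` and `(-1 & 3) == 2` fixes the first problem but not the second.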

It is a sad fact that integer representation cannot be adequately tested at
the preprocessing stages. sizeof(int) == 1 cannot be evaluated by the
preprocessor.

One can only test the macros from <limits.h>:

#if INT_MIN == -INT_MAX
/* we are targeting a non twos-complement architecture */
# if INT_MAX < UCHAR_MAX
/* Houston, we have a problem! */
# define MEMSET_IS_INADEQUATE 1
# endif
# define NUM_SYS ONES_OR_SIGNMAG
#else
# define NUM_SYS TWOS
#endif
--
Chqrlie.

Sep 29 '07 #4

Martin Wells

Chqrlie:

> For this and other similar reasons, it would be difficult if not impossible
> to implement a fully conformant hosted C environment on an architecture with
> non twos-complement representation and sizeof(int) == 1 at the same time.
>
> Luckily, non twos-complement architectures can only be found in museums
> today.

Unless it's prevented by the "laws of mathematics" or something like
that, I allow for every possibility when writing portable code. (A
little ridiculous at times, I admit, but hey I don't make a sacrifice
unless it's a sacrifice worth making).

>> Anyway here's the code I have at the moment, I robbed some of it from
>> old posts of yours pete:
>>
>> #define SIGNMAG 0
>> #define ONES 1
>> #define TWOS 2
>>
>> #if -1 & 3 == 1
>> #define NUM_SYS SIGNMAG
>> #elif -1 & 3 == 2
>> #define NUM_SYS ONES
>> #else
>> #define NUM_SYS TWOS
>> #endif
>
> These tests are incorrect for two reasons:
>
> * ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
> all platforms.

Wups.

> * There is no guarantee that the preprocessing be performed with the same
> representation as the target architecture. As a matter of fact, embedded
> targets with unusual arithmetic are often targeted by cross compilers
> running on different machines.

Now I may be mistaken, but I think the requirement with C99 is that
the preprocessor int types be the same as the actual C int types
(including their use of number systems). Not sure if this applies to
C89.

> It is a sad fact that integer representation cannot be adequately tested at
> the preprocessing stages. sizeof(int) == 1 cannot be evaluated by the
> preprocessor.
>
> One can only test the macros from <limits.h>:
>
> #if INT_MIN == -INT_MAX
> /* we are targeting a non twos-complement architecture */
> # if INT_MAX < UCHAR_MAX
> /* Houston, we have a problem! */
> # define MEMSET_IS_INADEQUATE 1
> # endif
> # define NUM_SYS ONES_OR_SIGNMAG
> #else
> # define NUM_SYS TWOS
> #endif

Great idea! What about the following then:

#include <limits.h>
#include <stddef.h>
#include <string.h>

#if INT_MAX >= UCHAR_MAX

/* Normal memset will work just fine */
# define UC_MEMSET(p,uc,len) (memset((p),(char unsigned)(uc),(len)))

#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

# define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                 ? -(int)(UCHAR_MAX - x) - 1 \
                                 : (int)x )

# define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )
# define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

#else

/* int hasn't got enough unique value combinations, we can't use
   memset :( */

static void *uc_memset(void *const pv,char unsigned const val,
                       size_t const len)
{
    char unsigned *p = pv;
    char unsigned const *const pover = p + len;

    while (pover != p) *p++ = val;

    return pv;
}

# define UC_MEMSET(p,uc,len) (uc_memset((p),(uc),(len)))

#endif

int main(void)
{
    char unsigned data[24];

    UC_MEMSET(data, UCHAR_MAX, sizeof data);

    return 0;
}
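One way to gain some confidence in the middle branch is a round-trip check: every unsigned char value, mapped to int and converted back, should come out unchanged. The sketch below uses a hypothetical function `uc_as_int` in place of the macro, and assumes that whenever UCHAR_MAX exceeds INT_MAX the implementation has INT_MIN == -INT_MAX - 1 and UCHAR_MAX == 2*INT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical function form of UC_AS_INT: return an int that yields uc
   again when memset converts it to unsigned char. */
static int uc_as_int(unsigned char uc)
{
#if UCHAR_MAX <= INT_MAX
    return uc;                          /* every value already fits in int */
#else
    if (uc > INT_MAX)
        return -(int)(UCHAR_MAX - uc) - 1;
    return (int)uc;
#endif
}

/* Walk every unsigned char value once and check the round trip. */
static void check_round_trip(void)
{
    unsigned char uc = 0;

    do {
        assert((unsigned char)uc_as_int(uc) == uc);
    } while (uc++ != UCHAR_MAX);        /* stop after testing UCHAR_MAX */
}
```

The exhaustive loop is only practical when unsigned char is narrow; on a 32-bit or wider byte it would need sampling instead.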
Martin

Sep 29 '07 #5

pete

Martin Wells wrote:

> #elif INT_MIN != -INT_MAX
>
> /* We've got two's complement, we can still use memset */

The preprocessor directive is correct, but the comment is wrong.
What really matters is whether or not INT_MIN equals -INT_MAX.
INT_MIN is allowed to equal -INT_MAX on
implementations that use two's complement.

--
pete

Sep 29 '07 #6

Harald van Dĳk

On Sat, 29 Sep 2007 16:26:57 -0400, pete wrote:

> Martin Wells wrote:
>> #elif INT_MIN != -INT_MAX
>>
>> /* We've got two's complement, we can still use memset */
>
> The preprocessor directive is correct, but the comment is wrong. What
> really matters is whether or not INT_MIN equals -INT_MAX. INT_MIN is
> allowed to equal -INT_MAX on implementations that use two's complement.

Right, but INT_MIN is not allowed to differ from -INT_MAX on
implementations that don't use two's complement. So if the #elif block is
entered, you know you're dealing with two's complement. That info is not
actually useful, for the reason you stated, but it's not wrong either.

Sep 29 '07 #7

Army1987

On Fri, 28 Sep 2007 10:09:38 -0700, Martin Wells wrote:

> I'm trying to come up with a fully-portable macro for supplying memset
> with an unsigned char rather than an int. I'm going to think out loud
> as I go along. . .

If you want to set every byte of an object to a value (other than
0 or a character constant in the basic character set), you know
what that does on that object. And since that depends on the
implementation, why do you want to do it fully-portably?
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 30 '07 #8

Charlie Gordon

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@w3g2000hsg.googlegroups.com...

> Chqrlie:
>
>> For this and other similar reasons, it would be difficult if not
>> impossible to implement a fully conformant hosted C environment on an
>> architecture with non twos-complement representation and
>> sizeof(int) == 1 at the same time.
>>
>> Luckily, non twos-complement architectures can only be found in museums
>> today.
>
> Unless it's prevented by the "laws of mathematics" or something like
> that, I allow for every possibility when writing portable code. (A
> little ridiculous at times, I admit, but hey I don't make a sacrifice
> unless it's a sacrifice worth making).

Well there are more important battles to be fought than this one.

>> * There is no guarantee that the preprocessing be performed with the same
>> representation as the target architecture. As a matter of fact, embedded
>> targets with unusual arithmetic are often targeted by cross compilers
>> running on different machines.
>
> Now I may be mistaken, but I think the requirement with C99 is that
> the preprocessor int types be the same as the actual C int types
> (including their use of number systems). Not sure if this applies to
> C89.

Chapter and Verse ?

6.10.1p4 says that, for the purpose of evaluating preprocessing constant
expressions (#if / #elif), all signed and unsigned integer types act as if
they have the same representation as intmax_t and uintmax_t respectively.
It leaves it implementation-defined whether character constants convert to
the same numeric value in preprocessing constant expressions and in actual
compilation. Could it be possible that intmax_t uses two's complement and
int uses sign/magnitude?

I think the Standard is not precise enough on this issue, and I don't even
have a copy of C89 to check if it applies there.

As for your ultimate proposal, I am still analysing it, but I don't think
you can refer to unsigned char as ``char unsigned''
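For what it is worth, the C grammar allows type specifiers in any order (C99 6.7.2p2), so ``char unsigned'' should be accepted as another spelling of unsigned char. The short sketch below, with an invented function name, is one way to convince oneself on a given compiler.

```c
#include <assert.h>
#include <limits.h>

/* Declares an object with the reversed spelling and reads it back through
   a pointer declared with the usual spelling. No cast is needed because
   both spellings name the same type. */
static char unsigned max_via_both_spellings(void)
{
    char unsigned a = UCHAR_MAX;
    unsigned char *p = &a;

    assert(sizeof(char unsigned) == sizeof(unsigned char));
    return *p;
}
```

Whether the reversed order is good style is a separate question from whether it is valid.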

--
Chqrlie.

Sep 30 '07 #9

Martin Wells

Army1987:

> If you want to set every byte of an object to a value (other than
> 0 or a character constant in the basic character set), you know
> what that does on that object. And since that depends on the
> implementation, why do you want to do it fully-portably?

I'm writing portable code for an embedded system. The microcontroller
will output a byte value via ports consisting of individual pins which
will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
be easily able to set all ports to a given pattern (e.g. all zeros,
all ones, alternating ones and zeros, two zeros then a one, etc.).

Of course, the code that actually sets the pin values will be
microcontroller, library and compiler specific, but there's no reason
to deportify the guts of the program.

Martin

Sep 30 '07 #10

Charlie Gordon

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@o80g2000hse.googlegroups.com...

> Army1987:
>
>> If you want to set every byte of an object to a value (other than
>> 0 or a character constant in the basic character set), you know
>> what that does on that object. And since that depends on the
>> implementation, why do you want to do it fully-portably?
>
> I'm writing portable code for an embedded system. The microcontroller
> will output a byte value via ports consisting of individual pins which
> will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
> be easily able to set all ports to a given pattern (e.g. all zeros,
> all ones, alternating ones and zeros, two zeros then a one, etc.).
>
> Of course, the code that actually sets the pin values will be
> microcontroller, library and compiler specific, but there's no reason
> to deportify the guts of the program.

For the specific cases all bits 0 and all bits 1, the solution is simple:

memset(array, 0, sizeof array); /* all bits 0 */
memset(array, -1, sizeof array); /* all bits 1 */
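These two calls work everywhere because memset converts its int argument to unsigned char, and -1 always converts to UCHAR_MAX. A minimal check, with a made-up function name and an arbitrary buffer size:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Fill a small buffer with each pattern and inspect every byte. */
static void check_memset_patterns(void)
{
    unsigned char array[16];
    size_t i;

    memset(array, 0, sizeof array);             /* all bits 0 */
    for (i = 0; i < sizeof array; ++i)
        assert(array[i] == 0);

    memset(array, -1, sizeof array);            /* all bits 1 */
    for (i = 0; i < sizeof array; ++i)
        assert(array[i] == UCHAR_MAX);
}
```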

For arbitrary bit patterns, it may not be possible with memset on
architectures with non twos-complement arithmetic and sizeof(int) == 1.
But discussing these is a form of mental masturbation as they do not exist
in the real world. Most regulars here indulge in it almost daily, but only
in forums like this one, not in production code. Obfuscating calls to
memset to ensure portability to the DS9K is exactly that: obfuscation. It
makes your program harder to write, harder to read, more prone to bugs.

--
Chqrlie.

Oct 1 '07 #11

Martin Wells

Chqrlie:

> For the specific cases all bits 0 and all bits 1, the solution is simple:
>
> memset(array, 0, sizeof array); /* all bits 0 */
> memset(array, -1, sizeof array); /* all bits 1 */
>
> For arbitrary bit patterns, it may not be possible with memset on
> architectures with non twos-complement arithmetic and sizeof(int) == 1.

The UC_MEMSET macro takes care of that by calling a function which has
a loop. If you ask me though, the C89 Standard is broken in that it
doesn't provide a UC_MEMSET itself. But then again, it makes more fun
for us to patch over the broken stuff :D

> But discussing these is a form of mental masturbation as they do not exist
> in the real world. Most regulars here indulge in it almost daily, but only
> in forums like this one, not in production code.

Yes I can agree that if time is money, you're not going to be very
productive by accommodating sign-magnitude machines, but it still is a
bit of fun to make your code 100% portable to a certain standard. I'm
doing an embedded systems project at the moment, and most people would
start off as non-portable and keep getting more and more non-portable.
Instead I've decided to go the portable route... and it's
going well so far :D

> Obfuscating calls to
> memset to ensure portability to the DS9K is exactly that: obfuscation. It
> makes your program harder to write, harder to read, more prone to bugs.

Not if you hide the funky stuff in header files:

#include "broken_int_uc_fixes.h"

int main(void)
{
    char unsigned whatever[24];

    UC_MEMSET(whatever,UCHAR_MAX,sizeof whatever);

    return 0;
}

Martin

Oct 1 '07 #12

Richard

"Charlie Gordon" <ne**@chqrlie.org> writes:

> "Martin Wells" <wa****@eircom.net> wrote in message news:
> 11**********************@o80g2000hse.googlegroups.com...
>> Army1987:
>>
>>> If you want to set every byte of an object to a value (other than
>>> 0 or a character constant in the basic character set), you know
>>> what that does on that object. And since that depends on the
>>> implementation, why do you want to do it fully-portably?
>>
>> I'm writing portable code for an embedded system. The microcontroller
>> will output a byte value via ports consisting of individual pins which
>> will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
>> be easily able to set all ports to a given pattern (e.g. all zeros,
>> all ones, alternating ones and zeros, two zeros then a one, etc.).
>>
>> Of course, the code that actually sets the pin values will be
>> microcontroller, library and compiler specific, but there's no reason
>> to deportify the guts of the program.
>
> For the specific cases all bits 0 and all bits 1, the solution is simple:
>
> memset(array, 0, sizeof array); /* all bits 0 */
> memset(array, -1, sizeof array); /* all bits 1 */
>
> For arbitrary bit patterns, it may not be possible with memset on
> architectures with non twos-complement arithmetic and sizeof(int) == 1.
> But discussing these is a form of mental masturbation as they do not exist
> in the real world. Most regulars here indulge in it almost daily, but only
> in forums like this one, not in production code. Obfuscating calls to
> memset to ensure portability to the DS9K is exactly that: obfuscation. It
> makes your program harder to write, harder to read, more prone to
> bugs.

Well said.

Oct 1 '07 #13
