Macro for supplying memset with an unsigned char

Martin Wells

I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

I'll take a sample system before I begin:

CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude

Therefore we have:

UCHAR_MAX == 65535
INT_MIN = -32767
INT_MAX = 32767

Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:

memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.

Therefore we need to supply memset with an int value, which, went
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one.
If we start off with a negative number like -1, then here's what will
happen:

char unsigned c = -1;

is equal to:

infinite_range_ int x = -1; /* Let's pretend we have a signed
int type that can hold any number */

while (0 x || UCHAR_MAX < x) x += UCHAR_MAX +
(infinite_range _int)1;

char unsigned c = x;

So on our own system, this is:

while (0 x || 65535 < x) x += 65536;

Clearly, if x = -1, then it only takes one iteration of the loop to
yield 65535, i.e. UCHAR_MAX.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.

The entire set of data looks something like:

int char unsigned
-1 65535
-2 65534
-3 65533
-4 65532
-5 65531
-6 65530
-7 65529
-8 65528
-9 65527
-10 65526
-11 65525
-12 65524
....
....
-32764 32772
-32765 32771
-32766 32770
-32767 32769
-32768 32768 <--

Now I've just realised a problem. An unsigned char can store 65536
different combinations (i.e. 0 through 65535), but an int can only
store 65535 different combination (i.e. -32767 through 32767) if we're
using something other than two's complement. I don't know what I'll do
about that, but for now I'll try continue with the other two number
systems:

#if NUMBER_SYSTEM != SIGN_MAGNITUDE

#define UC_AS_INT(x) /* Whatever we're going to do */

#endif

My first thought is something like:

#define UC_AS_INT(x) UC_AS_INT_Inter nal( (char unsigned)(x) )

#define UC_AS_INT_Inter nal(x) ( x INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )

Anyway it's Friday an I've stuff to do, but if anyone wants to finish
it off then feel free! :)

If we can't get all 65536 combinations out of one's complement or sign-
magnitude, then we can just have a macro that changes it to:

char unsigned *p = data;
char unsigned const *const pover = data + sizeof data;
while (pover != p) *p++ = c;

Martin

Sep 28 '07 #1

Subscribe Reply

3559

pete

Martin Wells wrote:

>
I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

I'll take a sample system before I begin:

CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude

Therefore we have:

UCHAR_MAX == 65535
INT_MIN = -32767
INT_MAX = 32767

Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:

memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.

Whether or not you can set an unsigned char to 65000
is implementation defined,
so there's nothing wrong
with an implementation defined way of doing it.

--
pete

Sep 28 '07 #2

Martin Wells

pete:

Whether or not you can set an unsigned char to 65000
is implementation defined,
so there's nothing wrong
with an implementation defined way of doing it.

The reason I mentioned concrete figures like 65535 instead of
UCHAR_MAX is that I think people find it easier to understand and
grasp.

The point wasn't whether we could assign 65000 to an int, but rather
whether we could assign (UCHAR_MAX - some_small_numb er) to an int and
have the same results on every implementation conceivable.

For clarity, I'll rewrite my original post taking out the concrete
numbers. Remember again, that the code is being written in the context
of it being FULLY portable (e.g. 97-Bit char's and sign-magnitude):
Let's say we have an array of bytes and we want to set every byte to
(UCHAR_MAX - 4). We CANNOT use:
memset(data, UCHAR_MAX - 4, sizeof data);
because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range. (So in the context of fully
portable programming, the resultant int could have pretty much any
value because UCHAR_MAX might be bigger than INT_MAX).

Therefore we need to supply memset with an int value, which, went
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that
one.
If we start off with a negative number like -1, then here's what will
happen:
char unsigned c = -1;
is equal to:
infinite_range_ int x = -1; /* Let's pretend we have a signed
int type that can hold any number */
while (0 x || UCHAR_MAX < x) x += UCHAR_MAX +
(infinite_range _int)1;
char unsigned c = x;
So here's a few samples of what will happen on different systems:
while (0 x || 255 < x) x += 256;
while (0 x || 65535 < x) x += 65536;
while (0 x || 4294967295 < x) x += 4294967296;
while (0 x || 184467440737095 51615 < x) x +=
184467440737095 51616;

If x = -1, then it only takes one iteration of the loop to
yield UCHAR_MAX on any implementation.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.
The entire set of data looks something like:
int char unsigned
-1 UCHAR_MAX
-2 UCHAR_MAX-1
-3 UCHAR_MAX-2
-4 UCHAR_MAX-3
-5 UCHAR_MAX-4
-6 UCHAR_MAX-5
-7 UCHAR_MAX-6
-8 UCHAR_MAX-7
-9 UCHAR_MAX-8
-10 UCHAR_MAX-9
-11 UCHAR_MAX-10
-12 UCHAR_MAX-11
....
....
Now I've just realised a problem. Imagine a system where unsigned char
has the range 0 through 65535 and where int has -32767 through 32767.
The former has 65536 possible combinations while the latter only has
65535 combinations. We might have to resort to a loop if working with
something other than two's complement, but I'm not sure yet.

Anyway here's the code I have at the moment, I robbed some of it from
old posts of yours pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif
#if NUM_SYS != TWOS /* ----------- */

#include <stddef.h>

static void *uc_memset(void *const pv,char unsigned const val,size_t
const len)
{
char *p = pv;
char const *const pover = p + len;

while (pover != p) *p++ = val;

return pv;
}

#define UC_MEMSET(p,uc, len) (uc_memset(p,uc ,len))

#else /* ------------ */

#include <string.h>

#define UC_AS_INT(x) UC_AS_INT_Inter nal( (char unsigned)(x) )

#define UC_AS_INT_Inter nal(x) ( x INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )

#define UC_MEMSET(p,uc, len) (memset((p),UC_ AS_INT((uc)),(l en)))

#endif /* ----------- */

#include <limits.h>

int main(void)
{
char unsigned data[24];

UC_MEMSET(data, UCHAR_MAX, sizeof data);

return 0;
}

Feel free to make alterations if you see a better way of doing it!

Martin

Sep 29 '07 #3

Charlie Gordon

"Martin Wells" <wa****@eircom. neta écrit dans le message de news:
11************* *********@r29g2 00...legr oups.com...
<snip>

>
Now I've just realised a problem. Imagine a system where unsigned char
has the range 0 through 65535 and where int has -32767 through 32767.
The former has 65536 possible combinations while the latter only has
65535 combinations. We might have to resort to a loop if working with
something other than two's complement, but I'm not sure yet.

For this and other similar reasons, it would be difficult if not impossible
to implement a fully conformant hosted C envirinment on an architecture with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.

Anyway here's the code I have at the moment, I robbed some of it from
old posts of yours pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
all platforms.

* There is no guarantee that the preprocessing be performed with the same
representation as the target architecture. As a matter of fact, embedded
targets with unusual arithmetics are often targetted by cross compilers
running on different machines.

It is a sad fact that integer representation cannot be adequately tested at
the preprocessing stages. sizeof(int) == 1 cannot be evaluated be the
preprocessor.

One can only test the macros from <limits.h>:

#if INT_MIN == -INT_MAX
/* we are targetting a non twos-complement architecture */
# if INT_MAX < UCHAR_MAX
/* Houston, we have a problem! */
# define MEMSET_IS_INADE QUATE 1
# endif
# define NUM_SYS ONES_OR_SIGNMAG
#else
# define NUM_SYS TWOS
#endif
--
Chqrlie.

Sep 29 '07 #4

Martin Wells

Chqrlie:

For this and other similar reasons, it would be difficult if not impossible
to implement a fully conformant hosted C envirinment on an architecture with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.

Unless it's prevented by the "laws of mathematics" or something like
that, I allow for every possiblity when writing portable code. (A
little ridiculous at times, I admit, but hey I don't make a sacrifice
unless it's a sacrifice worth making).

Anyway here's the code I have at the moment, I robbed some of it from
old posts of yours pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
all platforms.

Wups.

* There is no guarantee that the preprocessing be performed with the same
representation as the target architecture. As a matter of fact, embedded
targets with unusual arithmetics are often targetted by cross compilers
running on different machines.

Now I may be mistaken, but I think the requirement with C99 is that
the preprocessor int types be the same as the actual C int types
(including their use of number systems). Not sure if this applies to
C89.

It is a sad fact that integer representation cannot be adequately tested at
the preprocessing stages. sizeof(int) == 1 cannot be evaluated be the
preprocessor.

One can only test the macros from <limits.h>:

#if INT_MIN == -INT_MAX
/* we are targetting a non twos-complement architecture */
# if INT_MAX < UCHAR_MAX
/* Houston, we have a problem! */
# define MEMSET_IS_INADE QUATE 1
# endif
# define NUM_SYS ONES_OR_SIGNMAG
#else
# define NUM_SYS TWOS
#endif

Great idea! What about the following then:

#include <limits.h>

#if INT_MAX >= UCHAR_MAX

/* Normal memset will work just fine */
# define UC_MEMSET(p,uc, len) (memset((p),(ch ar unsigned)(uc),
(len)))

#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

# include <string.h>

# define UC_AS_INT_Inter nal(x) ( x INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )

# define UC_AS_INT(x) UC_AS_INT_Inter nal( (char unsigned)
(x) )
# define UC_MEMSET(p,uc, len) (memset((p),UC_ AS_INT((uc)),(l en)))
#else

/* int hasn't got enough unique value combinations, we can't use
memset :( */

# include <stddef.h>

static void *uc_memset(void *const pv,char unsigned const
val,size_t const len)
{
char *p = pv;
char const *const pover = p + len;

while (pover != p) *p++ = val;

return pv;
}
# define UC_MEMSET(p,uc, len) (uc_memset(p,uc ,len))
#endif
int main(void)
{
char unsigned data[24];
UC_MEMSET(data, UCHAR_MAX, sizeof data);
return 0;
}
Martin

Sep 29 '07 #5

pete

Martin Wells wrote:

>

#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

The preprocessor directive is correct, but the comment is wrong.
What really matters is whether or not INT_MIN equals -INT_MAX.
INT_MIN is allowed to equal -INT_MAX on
implementations that use two's complement.

--
pete

Sep 29 '07 #6

=?iso-2022-kr?q?Harald_van_D=0E=29=26=0Fk?=

On Sat, 29 Sep 2007 16:26:57 -0400, pete wrote:

Martin Wells wrote:
>#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

The preprocessor directive is correct, but the comment is wrong. What
really matters is whether or not INT_MIN equals -INT_MAX. INT_MIN is
allowed to equal -INT_MAX on implementations that use two's complement.

Right, but INT_MIN is not allowed to differ from -INT_MAX on
implementations that don't use two's complement. So if the #elif block is
entered, you know you're dealing with two's complement. That info is not
actually useful, for the reason you stated, but it's not wrong either.

Sep 29 '07 #7

Army1987

On Fri, 28 Sep 2007 10:09:38 -0700, Martin Wells wrote:

>
I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

If you want to set every byte of an object to a value (other than
0 or a character constant in the basic character set), you know
what that does on that object. And since that depends on the
implementation, why do you want to do it fully-portably?
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 30 '07 #8

Charlie Gordon

"Martin Wells" <wa****@eircom. neta écrit dans le message de news:
11************* *********@w3g20 00...legro ups.com...

Chqrlie:

>For this and other similar reasons, it would be difficult if not
impossible
to implement a fully conformant hosted C envirinment on an architecture
with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.

Unless it's prevented by the "laws of mathematics" or something like
that, I allow for every possiblity when writing portable code. (A
little ridiculous at times, I admit, but hey I don't make a sacrifice
unless it's a sacrifice worth making).

Well there are more important battles to be faught than this one.

>* There is no guarantee that the preprocessing be performed with the same
representati on as the target architecture. As a matter of fact, embedded
targets with unusual arithmetics are often targetted by cross compilers
running on different machines.

Now I may be mistaken, but I think the requirement with C99 is that
the preprocessor int types be the same as the actual C int types
(including their use of number systems). Not sure if this applies to
C89.

Chapter and Verse ?

6.10.1p4 says for the purpose of evaluating preprocessing constant
expressions (#if / #elif)preproces sing numbers act as if they have the same
representation as intmax_t (or uintmax_t for unsigned variants). They leave
it implementation defined if character constants convert to the same numeric
value for proprocessing constant expressions and actual compilation. Could
it be possible that intmax_t use twos-complement and int use sign/magnitude
?

I think the Standard is not precise enough on this issue, and I don't even
have a copy of C89 to check if it applies there.

As for your ultimate proposal, I am still analysing it, but I don't think
you can refer to unsigned char as ``char unsigned''

--
Chqrlie.

Sep 30 '07 #9

Martin Wells

Army1987:

If you want to set every byte of an object to a value (other than
0 or a character constant in the basic character set), you know
what that does on that object. And since that depends on the
implementation, why do you want to do it fully-portably?

I'm writing portable code for an embedded system. The microcontroller
will output a byte value via ports consisting of individual pins which
will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
be easily able to set all ports to a given pattern (e.g. all zeros,
all ones, alternating ones and zeros, two zeros then a one, etc.).

Of course, the code that actually sets the pins values will be
micrcontroller, library and compiler specific, but there's no reason
to deportify the guts of the program.

Martin

Sep 30 '07 #10

Similar topics

6786

memset() and portability

by: copx | last post by:

I want to fill char/int arrays with zeros. Is something like this portable/save: char myArray; memset(myArray,0,(30*80) * sizeof(char)); copx

C / C++

8201

passing a (const void *) into memset()

by: bob_jenkins | last post by:

{ const void *p; (void)memset((void *)p, ' ', (size_t)10); } Should this call to memset() be legal? Memset is of type void *memset(void *, unsigned char, size_t) Also, (void *) is the generic pointer type. My real question is, is (void *) such a generic pointer type that it

C / C++

2077

Are foo[bar] = "" (at declaration) and memset (foo, '\0', bar) the same?

by: adelfino | last post by:

I mean: unsigned char foo = ""; is the same as: unsigned char foo; memset (foo, '\0', bar); ?

C / C++

6330

Is memset used only for strings?

by: Frederick Ding | last post by:

Hi, guys,I met a problem, Please look at the problem below: int* bit = (int*)malloc(10000*sizeof(int)); memset(bit, 1, 10000*sizeof(int)); printf("%d %d %d\n", bit,bit, bit); Output: 16843009 16843009 16843009 Obviously I set the bit to bit to 1, but it outputs are not 1's.

C / C++

8718

Checking memset

by: jacob navia | last post by:

Many compilers check printf for errors, lcc-win32 too. But there are other functions that would be worth to check, specially memset. Memset is used mainly to clear a memory zone, receiving a pointer to the start, the value (most of the time zero) and the size of the memory array to clear. Problems appear when the size given is not the...

C / C++

5032

understanding memset

by: pauldepstein | last post by:

This is copy-pasted from cplusplus.com: BEGINNING OF QUOTE void * memset ( void * buffer, int c, size_t num ); Fill buffer with specified character. Sets the first num bytes pointed by buffer to the value specified by c parameter

C / C++

7902

macro vs. template???

by: aaragon | last post by:

Hi everyone. A very simple question. I would like to know what is better in terms of performance. I want to use a simple function to obtain the minimum of two values. One way could be using a macro: #define min(a,b) ((a<b)?a:b) I thought that another way could be to use a template function: template <class T> T min<T a, T b>

C / C++

3898

Doubt in memcpy() and memset()

by: Shhnwz.a | last post by:

Hi, I want to know some of the situations , when to use memcpy() and when to memset(); Thanx in Advance..

C / C++

321

Reformulating a macro to use argument just once

by: Francois Grieu | last post by:

Consider this macro // check if x, assumed of type unsigned char, is in range #define ISVALID(x) ((x)>=0x20 && (x)<=0x7E) Of course, this can't be safely used as in if (ISVALID(*p++)) foo(); where p is a pointer ot unsigned char.

C / C++

7618

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...

Windows Server

8132

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...

Online Marketing

7678

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...

Windows Server

7982

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...

General

6286

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...

Career Advice

5222

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...

C# / C Sharp

3656

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...

Networking - Hardware / Configuration

1226

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

944

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

General