
# Macro for supplying memset with an unsigned char

I'm trying to come up with a fully portable macro for supplying memset with an unsigned char rather than an int. I'm going to think out loud as I go along...

I'll take a sample system before I begin:

    CHAR_BIT == 16
    sizeof(short) == sizeof(int) == 1
    Assume none of the integer types have padding bits
    Sign-magnitude

Therefore we have:

    UCHAR_MAX == 65535
    INT_MIN == -32767
    INT_MAX == 32767

Let's say we have an array of bytes and we want to set every byte to 65000. We CANNOT use:

    memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer types "is implementation-defined or an implementation-defined signal is raised" if the number is out of range.

Therefore we need to supply memset with an int value which, when converted to unsigned char, will yield the value we want. The rules for converting from signed to unsigned are as follows:

    | If the new type is unsigned, the value is converted
    | by repeatedly adding or subtracting one more than
    | the maximum value that can be represented in the
    | new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one. If we start off with a negative number like -1, then here's what will happen:

    char unsigned c = -1;

is equal to:

    infinite_range_int x = -1;  /* Let's pretend we have a signed int type that can hold any number */

    while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;

    char unsigned c = x;

So on our own system, this is:

    while (0 > x || 65535 < x) x += 65536;

Clearly, if x = -1, then it only takes one iteration of the loop to yield 65535, i.e. UCHAR_MAX. Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2. For UCHAR_MAX-2, we'd use (int)-3. The entire set of data looks something like:

    int       char unsigned
    -1        65535
    -2        65534
    -3        65533
    -4        65532
    -5        65531
    -6        65530
    -7        65529
    -8        65528
    -9        65527
    -10       65526
    -11       65525
    -12       65524
    ....      ....
    -32764    32772
    -32765    32771
    -32766    32770
    -32767    32769
    -32768    32768   <--

Now I've just realised a problem. An unsigned char can store 65536 different combinations (i.e. 0 through 65535), but an int can only store 65535 different combinations (i.e. -32767 through 32767) if we're using something other than two's complement. I don't know what I'll do about that, but for now I'll try to continue with the other two number systems:

    #if NUMBER_SYSTEM != SIGN_MAGNITUDE

    #define UC_AS_INT(x) /* Whatever we're going to do */

    #endif

My first thought is something like:

    #define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

    #define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                    ? -(int)(UCHAR_MAX - x) - 1 \
                                    : (int)x )

Anyway it's Friday and I've stuff to do, but if anyone wants to finish it off then feel free! :)

If we can't get all 65536 combinations out of one's complement or sign-magnitude, then we can just have a macro that changes it to:

    char unsigned *p = data;
    char unsigned const *const pover = data + sizeof data;

    while (pover != p) *p++ = c;

Martin

Sep 28 '07 #1
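The conversion rule quoted in the post can be tried out by hand. The following is only a sketch: `to_uchar_by_rule` is a hypothetical helper name, and `long long` stands in for the post's imaginary `infinite_range_int`, which is an assumption that only holds on systems where `long long` is wider than `unsigned char`.

```c
#include <limits.h>

/* Hypothetical helper: applies the "repeatedly add or subtract
   UCHAR_MAX + 1" rule by hand. long long plays the role of the
   thread's infinite_range_int (assumption: it is wider than
   unsigned char on the system at hand). */
static unsigned char to_uchar_by_rule(long long x)
{
    const long long modulus = (long long)UCHAR_MAX + 1;

    while (x < 0)
        x += modulus;          /* bring negative values up into range */
    while (x > UCHAR_MAX)
        x -= modulus;          /* bring large values down into range */

    return (unsigned char)x;   /* x is now in 0..UCHAR_MAX, so this is exact */
}
```

Calling `to_uchar_by_rule(-1)` should give `UCHAR_MAX` and `to_uchar_by_rule(-2)` should give `UCHAR_MAX - 1`, matching the first rows of the table in the post, and the result should agree with a plain `(unsigned char)` cast for any argument.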
12 Replies

Martin Wells wrote:

> I'm trying to come up with a fully portable macro for supplying memset with an unsigned char rather than an int. I'm going to think out loud as I go along...
>
> I'll take a sample system before I begin:
>
>     CHAR_BIT == 16
>     sizeof(short) == sizeof(int) == 1
>     Assume none of the integer types have padding bits
>     Sign-magnitude
>
> Therefore we have:
>
>     UCHAR_MAX == 65535
>     INT_MIN == -32767
>     INT_MAX == 32767
>
> Let's say we have an array of bytes and we want to set every byte to 65000. We CANNOT use:
>
>     memset(data, 65000, sizeof data);
>
> because the conversion from unsigned integer types to signed integer types "is implementation-defined or an implementation-defined signal is raised" if the number is out of range.

Whether or not you can set an unsigned char to 65000 is implementation-defined, so there's nothing wrong with an implementation-defined way of doing it.

-- pete

Sep 28 '07 #2

pete:

> Whether or not you can set an unsigned char to 65000 is implementation-defined, so there's nothing wrong with an implementation-defined way of doing it.

The reason I mentioned concrete figures like 65535 instead of UCHAR_MAX is that I think people find them easier to understand and grasp. The point wasn't whether we could assign 65000 to an int, but rather whether we could assign (UCHAR_MAX - some_small_number) to an int and have the same results on every implementation conceivable.

For clarity, I'll rewrite my original post taking out the concrete numbers. Remember again that the code is being written in the context of it being FULLY portable (e.g. 97-bit chars and sign-magnitude):

Let's say we have an array of bytes and we want to set every byte to (UCHAR_MAX - 4). We CANNOT use:

    memset(data, UCHAR_MAX - 4, sizeof data);

because the conversion from unsigned integer types to signed integer types "is implementation-defined or an implementation-defined signal is raised" if the number is out of range. (So in the context of fully portable programming, the resultant int could have pretty much any value because UCHAR_MAX might be bigger than INT_MAX.)

Therefore we need to supply memset with an int value which, when converted to unsigned char, will yield the value we want. The rules for converting from signed to unsigned are as follows:

    | If the new type is unsigned, the value is converted
    | by repeatedly adding or subtracting one more than
    | the maximum value that can be represented in the
    | new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one. If we start off with a negative number like -1, then here's what will happen:

    char unsigned c = -1;

is equal to:

    infinite_range_int x = -1;  /* Let's pretend we have a signed int type that can hold any number */

    while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;

    char unsigned c = x;

So here are a few samples of what will happen on different systems:

    while (0 > x || 255 < x) x += 256;

    while (0 > x || 65535 < x) x += 65536;

    while (0 > x || 4294967295 < x) x += 4294967296;

    while (0 > x || 18446744073709551615 < x) x += 18446744073709551616;

If x = -1, then it only takes one iteration of the loop to yield UCHAR_MAX on any implementation. Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2. For UCHAR_MAX-2, we'd use (int)-3. The entire set of data looks something like:

    int       char unsigned
    -1        UCHAR_MAX
    -2        UCHAR_MAX-1
    -3        UCHAR_MAX-2
    -4        UCHAR_MAX-3
    -5        UCHAR_MAX-4
    -6        UCHAR_MAX-5
    -7        UCHAR_MAX-6
    -8        UCHAR_MAX-7
    -9        UCHAR_MAX-8
    -10       UCHAR_MAX-9
    -11       UCHAR_MAX-10
    -12       UCHAR_MAX-11
    ....      ....

Now I've just realised a problem. Imagine a system where unsigned char has the range 0 through 65535 and where int has -32767 through 32767. The former has 65536 possible combinations while the latter only has 65535 combinations. We might have to resort to a loop if working with something other than two's complement, but I'm not sure yet.

Anyway here's the code I have at the moment. I robbed some of it from old posts of yours, pete:

    #define SIGNMAG 0
    #define ONES 1
    #define TWOS 2

    #if -1 & 3 == 1
    #define NUM_SYS SIGNMAG
    #elif -1 & 3 == 2
    #define NUM_SYS ONES
    #else
    #define NUM_SYS TWOS
    #endif

    #if NUM_SYS != TWOS /* ----------- */

    #include <stddef.h>

    static void *uc_memset(void *const pv,char unsigned const val,size_t const len)
    {
        char *p = pv;
        char const *const pover = p + len;

        while (pover != p) *p++ = val;

        return pv;
    }

    #define UC_MEMSET(p,uc,len) (uc_memset(p,uc,len))

    #else /* ------------ */

    #include <string.h>

    #define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

    #define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                    ? -(int)(UCHAR_MAX - x) - 1 \
                                    : (int)x )

    #define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

    #endif /* ----------- */

    #include <limits.h>

    int main(void)
    {
        char unsigned data[24];

        UC_MEMSET(data, UCHAR_MAX, sizeof data);

        return 0;
    }

Feel free to make alterations if you see a better way of doing it!

Martin

Sep 29 '07 #3

"Martin Wells" wrote:

> Now I've just realised a problem. Imagine a system where unsigned char has the range 0 through 65535 and where int has -32767 through 32767. The former has 65536 possible combinations while the latter only has 65535 combinations. We might have to resort to a loop if working with something other than two's complement, but I'm not sure yet.

For this and other similar reasons, it would be difficult if not impossible to implement a fully conformant hosted C environment on an architecture with non twos-complement representation and sizeof(int) == 1 at the same time. Luckily, non twos-complement architectures can only be found in museums today.

> Anyway here's the code I have at the moment. I robbed some of it from old posts of yours, pete:
>
>     #define SIGNMAG 0
>     #define ONES 1
>     #define TWOS 2
>
>     #if -1 & 3 == 1
>     #define NUM_SYS SIGNMAG
>     #elif -1 & 3 == 2
>     #define NUM_SYS ONES
>     #else
>     #define NUM_SYS TWOS
>     #endif

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for all platforms.
* There is no guarantee that the preprocessing be performed with the same representation as the target architecture. As a matter of fact, embedded targets with unusual arithmetic are often targeted by cross compilers running on different machines.

It is a sad fact that integer representation cannot be adequately tested at the preprocessing stage. sizeof(int) == 1 cannot be evaluated by the preprocessor. One can only test the macros from <limits.h>:

    #if INT_MIN == -INT_MAX
    /* we are targeting a non twos-complement architecture */
    #  if INT_MAX < UCHAR_MAX
    /* Houston, we have a problem! */
    #    define MEMSET_IS_INADEQUATE 1
    #  endif
    #  define NUM_SYS ONES_OR_SIGNMAG
    #else
    #  define NUM_SYS TWOS
    #endif

-- Chqrlie.

Sep 29 '07 #4
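The precedence point in the first bullet can be checked at run time. This is a rough sketch with hypothetical function names; it spells out the grouping the compiler applies and the grouping the original test presumably intended. It says nothing about what the preprocessor of a cross compiler would compute, which is the second objection above.

```c
/* How the expression from the post is actually parsed:
   == binds tighter than &, so it is -1 & (3 == 1), i.e. -1 & 0. */
static int as_parsed(void)
{
    return -1 & (3 == 1);
}

/* The grouping the test was presumably meant to have. */
static int as_intended(void)
{
    return (-1 & 3) == 1;
}

/* The low two bits of -1, which differ by representation:
   3 on two's complement, 2 on one's complement, 1 on sign-magnitude. */
static int low_two_bits_of_minus_one(void)
{
    return -1 & 3;
}
```

`as_parsed()` is 0 regardless of representation, which is why every branch of the original `#if` chain fell through to `TWOS`.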

Chqrlie:

> For this and other similar reasons, it would be difficult if not impossible to implement a fully conformant hosted C environment on an architecture with non twos-complement representation and sizeof(int) == 1 at the same time. Luckily, non twos-complement architectures can only be found in museums today.

Unless it's prevented by the "laws of mathematics" or something like that, I allow for every possibility when writing portable code. (A little ridiculous at times, I admit, but hey, I don't make a sacrifice unless it's a sacrifice worth making.)

> > Anyway here's the code I have at the moment. I robbed some of it from old posts of yours, pete:
> >
> >     #define SIGNMAG 0
> >     #define ONES 1
> >     #define TWOS 2
> >
> >     #if -1 & 3 == 1
> >     #define NUM_SYS SIGNMAG
> >     #elif -1 & 3 == 2
> >     #define NUM_SYS ONES
> >     #else
> >     #define NUM_SYS TWOS
> >     #endif
>
> These tests are incorrect for two reasons:
>
> * ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for all platforms.

Wups.

> * There is no guarantee that the preprocessing be performed with the same representation as the target architecture. As a matter of fact, embedded targets with unusual arithmetic are often targeted by cross compilers running on different machines.

Now I may be mistaken, but I think the requirement with C99 is that the preprocessor int types be the same as the actual C int types (including their use of number systems). Not sure if this applies to C89.

> It is a sad fact that integer representation cannot be adequately tested at the preprocessing stage. sizeof(int) == 1 cannot be evaluated by the preprocessor. One can only test the macros from <limits.h>:
>
>     #if INT_MIN == -INT_MAX
>     /* we are targeting a non twos-complement architecture */
>     #  if INT_MAX < UCHAR_MAX
>     /* Houston, we have a problem! */
>     #    define MEMSET_IS_INADEQUATE 1
>     #  endif
>     #  define NUM_SYS ONES_OR_SIGNMAG
>     #else
>     #  define NUM_SYS TWOS
>     #endif

Great idea!

What about the following then:

    #include <limits.h>

    #if INT_MAX >= UCHAR_MAX
    /* Normal memset will work just fine */

    #  define UC_MEMSET(p,uc,len) (memset((p),(char unsigned)(uc),(len)))

    #elif INT_MIN != -INT_MAX
    /* We've got two's complement, we can still use memset */

    #  include <string.h>

    #  define UC_AS_INT_Internal(x) ( x > INT_MAX \
                                      ? -(int)(UCHAR_MAX - x) - 1 \
                                      : (int)x )

    #  define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

    #  define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

    #else
    /* int hasn't got enough unique value combinations, we can't use memset :( */

    #  include <stddef.h>

    static void *uc_memset(void *const pv,char unsigned const val,size_t const len)
    {
        char *p = pv;
        char const *const pover = p + len;

        while (pover != p) *p++ = val;

        return pv;
    }

    #  define UC_MEMSET(p,uc,len) (uc_memset(p,uc,len))

    #endif

    int main(void)
    {
        char unsigned data[24];

        UC_MEMSET(data, UCHAR_MAX, sizeof data);

        return 0;
    }

Martin

Sep 29 '07 #5
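On ordinary platforms `INT_MAX >= UCHAR_MAX`, so the middle branch above is never compiled and the arithmetic in `UC_AS_INT_Internal` cannot be exercised directly. As a rough sanity check, the same formula can be tried one size down, with `signed char` standing in for the too-narrow `int`. The names `uc_as_narrow` and `narrow_round_trip_ok` are hypothetical, and the sketch assumes `UCHAR_MAX < INT_MAX` so that the arithmetic and the loop counter fit in `int`.

```c
#include <limits.h>

/* Hypothetical scaled-down analogue of UC_AS_INT: signed char plays the
   role of the too-narrow int. The arithmetic is done in int, which is
   assumed to be wider than unsigned char here. */
static int uc_as_narrow(unsigned char x)
{
    return x > SCHAR_MAX
        ? -(int)(UCHAR_MAX - x) - 1   /* upper half maps onto negatives */
        : (int)x;                     /* lower half is unchanged */
}

/* Returns 1 if every unsigned char value maps to something inside the
   two's complement range -SCHAR_MAX-1..SCHAR_MAX and converts back to
   the value it started from; returns 0 otherwise. */
static int narrow_round_trip_ok(void)
{
    int v;

    for (v = 0; v <= UCHAR_MAX; ++v) {
        int narrow = uc_as_narrow((unsigned char)v);

        if (narrow < -SCHAR_MAX - 1 || narrow > SCHAR_MAX)
            return 0;
        if ((unsigned char)narrow != v)
            return 0;
    }
    return 1;
}
```

The mapping needs the extra negative value that two's complement provides: `SCHAR_MAX + 1` maps to `-SCHAR_MAX - 1`, which a symmetric range could not hold. That is the same one-value shortfall the thread identifies for `int`.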

Martin Wells wrote:

>     #elif INT_MIN != -INT_MAX
>     /* We've got two's complement, we can still use memset */

The preprocessor directive is correct, but the comment is wrong. What really matters is whether or not INT_MIN equals -INT_MAX. INT_MIN is allowed to equal -INT_MAX on implementations that use two's complement.

-- pete

Sep 29 '07 #6

On Sat, 29 Sep 2007 16:26:57 -0400, pete wrote:

> Martin Wells wrote:
>
> >     #elif INT_MIN != -INT_MAX
> >     /* We've got two's complement, we can still use memset */
>
> The preprocessor directive is correct, but the comment is wrong. What really matters is whether or not INT_MIN equals -INT_MAX. INT_MIN is allowed to equal -INT_MAX on implementations that use two's complement.

Right, but INT_MIN is not allowed to differ from -INT_MAX on implementations that don't use two's complement. So if the #elif block is entered, you know you're dealing with two's complement. That info is not actually useful, for the reason you stated, but it's not wrong either.

Sep 29 '07 #7
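The asymmetry described in the last two posts can be written down as a small classifier. This is only a sketch; the enum and function names are hypothetical and simply label what the `<limits.h>` comparison can and cannot establish.

```c
#include <limits.h>

/* Hypothetical labels for what the <limits.h> comparison can tell us. */
enum int_range_kind {
    RANGE_ASYMMETRIC,  /* INT_MIN != -INT_MAX: must be two's complement */
    RANGE_SYMMETRIC    /* INT_MIN == -INT_MAX: representation undetermined */
};

static enum int_range_kind classify_int_range(void)
{
#if INT_MIN != -INT_MAX
    return RANGE_ASYMMETRIC;
#else
    return RANGE_SYMMETRIC;
#endif
}
```

`RANGE_SYMMETRIC` deliberately does not claim one's complement or sign-magnitude, since a two's complement implementation is also allowed to have `INT_MIN == -INT_MAX`.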

On Fri, 28 Sep 2007 10:09:38 -0700, Martin Wells wrote:

> I'm trying to come up with a fully portable macro for supplying memset with an unsigned char rather than an int. I'm going to think out loud as I go along...

If you want to set every byte of an object to a value (other than 0 or a character constant in the basic character set), you know what that does on that object. And since that depends on the implementation, why do you want to do it fully portably?

-- Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing. Nothing is better than eternal happiness. Therefore, a hamburger is better than eternal happiness.

Sep 30 '07 #8

"Martin Wells" wrote:

> > For this and other similar reasons, it would be difficult if not impossible to implement a fully conformant hosted C environment on an architecture with non twos-complement representation and sizeof(int) == 1 at the same time. Luckily, non twos-complement architectures can only be found in museums today.
>
> Unless it's prevented by the "laws of mathematics" or something like that, I allow for every possibility when writing portable code. (A little ridiculous at times, I admit, but hey, I don't make a sacrifice unless it's a sacrifice worth making.)

Well, there are more important battles to be fought than this one.

> > * There is no guarantee that the preprocessing be performed with the same representation as the target architecture. As a matter of fact, embedded targets with unusual arithmetic are often targeted by cross compilers running on different machines.
>
> Now I may be mistaken, but I think the requirement with C99 is that the preprocessor int types be the same as the actual C int types (including their use of number systems). Not sure if this applies to C89.

Chapter and verse?

6.10.1p4 says that, for the purpose of evaluating preprocessing constant expressions (#if / #elif), preprocessing numbers act as if they have the same representation as intmax_t (or uintmax_t for unsigned variants). It is left implementation-defined whether character constants convert to the same numeric value in preprocessing constant expressions and in actual compilation.

Could it be possible that intmax_t uses twos-complement and int uses sign/magnitude? I think the Standard is not precise enough on this issue, and I don't even have a copy of C89 to check whether it applies there.

As for your ultimate proposal, I am still analysing it, but I don't think you can refer to unsigned char as ``char unsigned''.

-- Chqrlie.

Sep 30 '07 #9

Army1987:

> If you want to set every byte of an object to a value (other than 0 or a character constant in the basic character set), you know what that does on that object. And since that depends on the implementation, why do you want to do it fully portably?

I'm writing portable code for an embedded system. The microcontroller will output a byte value via ports consisting of individual pins which will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to be easily able to set all ports to a given pattern (e.g. all zeros, all ones, alternating ones and zeros, two zeros then a one, etc.).

Of course, the code that actually sets the pin values will be microcontroller, library and compiler specific, but there's no reason to deportify the guts of the program.

Martin

Sep 30 '07 #10

"Martin Wells" wrote:

> > If you want to set every byte of an object to a value (other than 0 or a character constant in the basic character set), you know what that does on that object. And since that depends on the implementation, why do you want to do it fully portably?
>
> I'm writing portable code for an embedded system. The microcontroller will output a byte value via ports consisting of individual pins which will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to be easily able to set all ports to a given pattern (e.g. all zeros, all ones, alternating ones and zeros, two zeros then a one, etc.).
>
> Of course, the code that actually sets the pin values will be microcontroller, library and compiler specific, but there's no reason to deportify the guts of the program.

For the specific cases all bits 0 and all bits 1, the solution is simple:

    memset(array, 0, sizeof array);  /* all bits 0 */
    memset(array, -1, sizeof array); /* all bits 1 */

For arbitrary bit patterns, it may not be possible with memset on architectures with non twos-complement arithmetic and sizeof(int) == 1. But discussing these is a form of mental masturbation as they do not exist in the real world. Most regulars here indulge in it almost daily, but only in forums like this one, not in production code.

Obfuscating calls to memset to ensure portability to the DS9K is exactly that: obfuscation. It makes your program harder to write, harder to read, more prone to bugs.

-- Chqrlie.

Oct 1 '07 #11
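The two portable cases mentioned above can be checked with a short sketch. The helper names are hypothetical; the point is only that `memset` converts its `int` argument to `unsigned char`, and that `-1` converts to `UCHAR_MAX` under the modular rule quoted at the start of the thread, whatever the representation of `int`.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Fills buf using -1 and reports whether every byte reads back as
   UCHAR_MAX, i.e. all value bits set. Returns 1 on success. */
static int fill_all_ones_ok(unsigned char *buf, size_t len)
{
    size_t i;

    memset(buf, -1, len);
    for (i = 0; i < len; ++i)
        if (buf[i] != UCHAR_MAX)
            return 0;
    return 1;
}

/* Fills buf using 0 and reports whether every byte reads back as 0.
   Returns 1 on success. */
static int fill_all_zeros_ok(unsigned char *buf, size_t len)
{
    size_t i;

    memset(buf, 0, len);
    for (i = 0; i < len; ++i)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

With `unsigned char data[24];`, both `fill_all_ones_ok(data, sizeof data)` and `fill_all_zeros_ok(data, sizeof data)` should return 1.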

Chqrlie:

> For the specific cases all bits 0 and all bits 1, the solution is simple:
>
>     memset(array, 0, sizeof array);  /* all bits 0 */
>     memset(array, -1, sizeof array); /* all bits 1 */
>
> For arbitrary bit patterns, it may not be possible with memset on architectures with non twos-complement arithmetic and sizeof(int) == 1.

The UC_MEMSET macro takes care of that by calling a function which has a loop. If you ask me though, the C89 Standard is broken in that it doesn't provide a UC_MEMSET itself. But then again, it makes more fun for us to patch over the broken stuff :D

> But discussing these is a form of mental masturbation as they do not exist in the real world. Most regulars here indulge in it almost daily, but only in forums like this one, not in production code.

Yes, I can agree that if time is money, you're not going to be very productive by accommodating sign-magnitude machines, but it still is a bit of fun to make your code 100% portable to a certain standard.

I'm doing an embedded systems project at the moment, and most people would start off non-portable and keep getting more and more non-portable. Instead I've decided to go the portable route... and it's going well so far :D

> Obfuscating calls to memset to ensure portability to the DS9K is exactly that: obfuscation. It makes your program harder to write, harder to read, more prone to bugs.

Not if you hide the funky stuff in header files:

    #include "broken_int_uc_fixes.h"

    int main(void)
    {
        UC_MEMSET(whatever,UCHAR_MAX,sizeof whatever);
    }

Martin

Oct 1 '07 #12

"Charlie Gordon" wrote:

> > Army1987:
> >
> > > If you want to set every byte of an object to a value (other than 0 or a character constant in the basic character set), you know what that does on that object. And since that depends on the implementation, why do you want to do it fully portably?
> >
> > I'm writing portable code for an embedded system. The microcontroller will output a byte value via ports consisting of individual pins which will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to be easily able to set all ports to a given pattern (e.g. all zeros, all ones, alternating ones and zeros, two zeros then a one, etc.).
> >
> > Of course, the code that actually sets the pin values will be microcontroller, library and compiler specific, but there's no reason to deportify the guts of the program.
>
> For the specific cases all bits 0 and all bits 1, the solution is simple:
>
>     memset(array, 0, sizeof array);  /* all bits 0 */
>     memset(array, -1, sizeof array); /* all bits 1 */
>
> For arbitrary bit patterns, it may not be possible with memset on architectures with non twos-complement arithmetic and sizeof(int) == 1. But discussing these is a form of mental masturbation as they do not exist in the real world. Most regulars here indulge in it almost daily, but only in forums like this one, not in production code.
>
> Obfuscating calls to memset to ensure portability to the DS9K is exactly that: obfuscation. It makes your program harder to write, harder to read, more prone to bugs.

Well said.

Oct 1 '07 #13
