In article <fe**************************@posting.google.com>
Mohit Sindhwani <em*********@yahoo.com.sg> writes:

>>The operation of invert & add 1 is exactly what negating a number
>>does. So, you can actually combine the steps by doing something like
>>    checksum = -checksum;
>
>My understanding is that you can only do this if the data type is
>signed. However, with GCC/Windows, it seems to work fine with an
>unsigned variable. Not sure what the standard says on this. Help,
>anyone?

In C, when x has type unsigned int, -x is just the "mathematically
correct" value of UINT_MAX + 1 - x.

Narrow types like "unsigned char" and "unsigned short" are widened
under "the usual arithmetic conversions", and -- in a fit of
stupidity -- these narrow unsigned types generally widen to *signed*
int, so using -x is in some ways dangerous, but probably no worse
than ~x + 1U in practice. You can avoid all possibilities of
undefined behavior by computing the sum in "unsigned int", though.

>This is the code I used:
>
>#include <stdio.h>
>
>int main (void)
>{
>    char data[4];
>    unsigned char a;
>    unsigned char b;
>
>    data[0] = 0x01;
>    data[1] = 0x0E;
>    data[2] = 0x09;
>    data[3] = 0x0F;
>
>    a = data[0] + data[1] + data[2] + data[3];

Note that the right hand side of the "=" operation consists
exclusively of signed items, and signed arithmetic can potentially
overflow. Changing the "data" array to "unsigned char" helps a
little bit but still leaves the arithmetic itself signed if unsigned
char widens to signed int, i.e., if UCHAR_MAX <= INT_MAX. (Whether
UCHAR_MAX <= INT_MAX is an implementation-specific detail, although
implementations on which it is false -- where UCHAR_MAX is say
65535, and INT_MAX is 32767 -- are rare. The X3J11 committee folks
had some specific situations with "surprising results" in mind when
they made up this particular widening rule, but now those situations
even depend on whether UCHAR_MAX and USHRT_MAX exceed INT_MAX,
making them unpredictable instead of "predictably surprising".
The *sane* choice, of having unsigned char widen to unsigned int
in all cases, would have been much more reasonable. Still, we
often must work with what we are given.)

In this particular case, the four values being summed are 1, 14, 9,
and 15, all of which are comfortably positive even in "char" and
quite definitely positive in "int"; the result is 39, which is
definitely smaller than all of CHAR_MAX, UCHAR_MAX, and INT_MAX.
(CHAR_MAX must be at least 127, UCHAR_MAX must be at least 255,
and INT_MAX must be at least 32767, in every C implementation.)
So this sets "a" to 39, stored in an object of type "unsigned char".

>    printf ("Before Invert = %2X\n", a);
>
>    a = -a;

Here "-a" widens "a" to int (if UCHAR_MAX <= INT_MAX) or to
unsigned int (if UCHAR_MAX > INT_MAX), and in either case, the
value continues to be 39. Then -a is either -39 (if the type
is signed int) or UINT_MAX + 1 - 39 (if the type is unsigned
int).

Consider the latter case for a moment; suppose UINT_MAX is 65535
-- i.e., we have 2-octet "char"s and 2-octet "int"s and sizeof(int)==1.
Then -(unsigned int)39, or -39U, is 65536 - 39, or 65497U, or
0xffd9U. In fact, we are guaranteed that UINT_MAX is one less than
some power of two, and is at least 65535, so the possible values
are 0xffd9U, 0x1ffd9U, 0x3ffd9U, 0x7ffd9U, 0xfffd9U, 0x1fffd9U,
0x3fffd9U, .... The key thing to note here is that *every bit
pattern ends in "d9"*. Mathematically, we can say that every one
of these "is congruent to 0xd9 mod 256". This is an absolutely
*wonderful* result, as it guarantees that we can get the answer we
want.

On the other hand, perhaps we have a more typical implementation
where UCHAR_MAX < INT_MAX, so that "-a" has type signed int. In
this case, because we started with the known result of 39, the value
is definitely -39. Storing this back in a variable of type "unsigned
char" sets that variable to UCHAR_MAX + 1 - 39. This is quite
similar to what we just saw with the unsigned ints, except that
UCHAR_MAX is allowed to be as small as 255, so that the value stored
back into "a" is 0xd9, or perhaps 0x1d9, or possibly 0x3d9, or
0x7d9, 0xfd9, 0x1fd9, 0x3fd9, 0x7fd9, 0xffd9, 0x1ffd9, etc., ad
nauseam. The same "wonderful thing" has happened: the value is
always congruent to 0xd9 mod 256.

Thus, in this case, "a = -a" is fine, and the low 8 bits of the
result are always what we wanted. But note that all of this depends
on the assumption that "a" is initially 39. What if it has some
other value?

Suppose in particular that "a" had the value 32768 (remember that
"a" is "unsigned char" and UCHAR_MAX might be 65535, as is the case
on some C systems on digital signal processor CPUs). Suppose
further that INT_MAX is 32767 and UINT_MAX is 65535. Then unsigned
char widens to unsigned int, and we are OK. This particular
implementation forces the "unsigned widens to unsigned" case, so
that the error made back in the 1980s -- choosing the ridiculous
"unsigned widens to signed except when it doesn't" rule -- does
not affect us. Only if UCHAR_MAX is (say) 32767 will this system
have "unsigned char widens to signed int", and in that case, -a
can be no greater in magnitude than -32767, which still fits
(INT_MIN must also be at least 32767 in magnitude).

So where *can* things go wrong? As I implied earlier, the real
problem lies in the summation of a "data" array, which in
"real code" is probably going to do something like:

    unsigned char data[N], sum;
    int i;

    ... /* fill in the "data" array */
    sum = 0;
    for (i = 0; i < N; i++)
        sum += data[i];

If sum is "unsigned char" and data[i] is also "unsigned char", but
UCHAR_MAX is 32767 while INT_MAX is also 32767, then the arithmetic
happens in *signed* int, and we may attempt to compute (signed
int)32767 + (signed int)32767. This is allowed to produce a runtime
"overflow" trap and cause the code to abort. If we want to be
100% paranoid, we must write:

    sum = (unsigned int)sum + (unsigned int)data[i];

which forces the arithmetic to be done in "unsigned" fashion,
with its nice mathematical properties.

In practice, real implementations do not have this problem, because
UCHAR_MAX is almost always 255 and INT_MAX is guaranteed to be at
least 32767. In this case, the unsigned char values widen to signed
int, but the sum itself cannot exceed 510 (255+255), which certainly
fits in an int. Storing the result back into the "unsigned char"
in "sum" reduces it mod 256, making everything work on the next
trip through the loop. Things can only go wrong when UCHAR_MAX
(a) does not exceed INT_MAX, yet (b) is sufficiently close to
INT_MAX so that UCHAR_MAX + UCHAR_MAX > INT_MAX. Even then, the
implementation must do something unusual on integer overflow.

If, back in the 1980s, the X3J11 committee had used the sane rules
that the Bell Labs compilers used (i.e., "narrow unsigned types
widen to unsigned int"), even this remote possibility would be
ruled out. And as long as you only ever use 2s complement systems
that ignore signed integer overflow, none of this matters anyway.

-- 
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.