Representation of integers

Mantorok Redgormor

I always see posts that involve the representation of integers, where
some poster claims that the unerlyding representation of an integer
doesn't have to reflect on the actual integer, for example:

int foo = 0;

0 can be all zeros 0x00000000 or 00000000 00000000 00000000 00000000

Then someone chimes in and says 0 doesn't have to contain all zeros..

So what is that about?

Also I have heard the same for (void *)0 or NULL, repreatedly..

How can an integer have a different underlying representation but
refer to that integer that it does not reflect?

That is, is it possible to have something like:

int foo = 0;

And the underlying representation of that zero is actually:

0x10000010 ? or some crazy nonsense like that.

Nov 13 '05 #1

Subscribe Post Reply

3211

Arthur J. O'Dwyer

On Sat, 12 Sep 2003, Mantorok Redgormor wrote:

I always see posts that involve the representation of integers, where
some poster claims that the unerlyding representation of an integer
doesn't have to reflect on the actual integer, for example:

int foo = 0;

0 can be all zeros 0x00000000 or 00000000 00000000 00000000 00000000
Sort-of-correct. In the C programming language,

0 == 000 /* octal */
0 == 0x000000 /* hex */
0 == '\0' /* character constant */

Then someone chimes in and says 0 doesn't have to contain all zeros..
Sort-of-correct. At the machine level, the mathematical concept of
"zero" can be represented by any number of bit patterns. Sure, it's
common (these days) to represent "zero" as 32 contiguous bits aligned
in a particular way and all set to the low state; but "zero" on some
machines might be 24 bits with the high-order 6 bits all set to 1, for
parity checking; or something even more unconventional.

The C language really doesn't care too much about the machine level.

But the C standard library includes a few functions that *do* deal
directly with the machine representation of concepts like "zero".
For example, 'calloc'.

So what is that about?
Also I have heard the same for (void *)0 or NULL, repreatedly..
Same idea. Some machines treat a pointer value with all-machine-level-
-bits-zero as a "null pointer" that can't be dereferenced without
a segfault or something similar. Other machines treat address "zero"
just the same as any other address, and use a different "null pointer"
representation at the machine level.

Again, C really doesn't care. In a C program, if you write 0 or
(void*)0 or NULL in a pointer context, you *will* get whatever "null
pointer representation" is appropriate for your particular machine.
You don't have to worry about this.
How can an integer have a different underlying representation but
refer to that integer that it does not reflect?
"Does not reflect"? How does 00001100 "reflect" the concept of "twelve"
more or less than 01110110? Two's complement is convenient for
arithmetic, sure; but that doesn't mean it's convenient for *everything*.
That is, is it possible to have something like:

int foo = 0;

And the underlying representation of that zero is actually:

0x10000010 ? or some crazy nonsense like that.

Yes. Except that of course you didn't mean "0x10000010"; the
prefix "0x" conventionally indicates hexadecimal notation. What
you probably meant was, "the bits 10000010." (Not that you
*couldn't* have 64-bit integers with weird representations...)
Anyway, AFAIK having an integer constant that big may invoke
undefined behavior ('int' is only guaranteed to have 16 bits
or more). So, in the spirit of pedantry, I must point out
that

if (0 == 0x10000010)
{
printf("Undefined behavior can do weird things to you.\n");
}

Bottom line: All-bits-zero does *not* mean value-zero. One
concept is much lower-level than the other. Avoid 'memset'
and 'calloc' for now, and I think you'll be okay.

HTH,
-Arthur

Nov 13 '05 #2

Tom Zych

"Arthur J. O'Dwyer" wrote:

Sort-of-correct. At the machine level, the mathematical concept of
"zero" can be represented by any number of bit patterns. Sure, it's
common (these days) to represent "zero" as 32 contiguous bits aligned
in a particular way and all set to the low state; but "zero" on some
machines might be 24 bits with the high-order 6 bits all set to 1, for
parity checking; or something even more unconventional. Bottom line: All-bits-zero does *not* mean value-zero. One
concept is much lower-level than the other. Avoid 'memset'
and 'calloc' for now, and I think you'll be okay.
This came up about a week ago, in the context of using memset() to
zero an array, and Jack Klein posted this:

[snip] From 7.18.1.1 Exact-width integer types: <quote>
The typedef name intN_t designates a signed integer type with width N,
no padding bits, and a two's complement representation. Thus, int8_t
denotes a signed integer type with a width of exactly 8 bits. The typedef name uintN_t designates an unsigned integer type with
width N. Thus, uint24_t denotes an unsigned integer type with a width
of exactly 24 bits.
<end quote> Since there are no sign bits an any unsigned int type, all that there
can be are value and padding bits. Since the exact width types
specifically have no padding, every bit in the unsigned form is a
value bit and all bits 0 in any exact width unsigned type must be the
one and only representation of the value 0 for that type.

[snip]

I'm not sure about his conclusion, though. I don't see anything in
the quote that says zero has to be represented by all zero bits. It
has to be a one-to-one mapping, but theoretically all zero bits
could represent something else and zero could be represented by
something else.

--
Tom Zych
This email address will expire at some point to thwart spammers.
Permanent address: echo 'g******@cbobk.pbz' | rot13

Nov 13 '05 #3

Arthur J. O'Dwyer

On Sat, 13 Sep 2003, Tom Zych wrote:

"Arthur J. O'Dwyer" wrote:
Sort-of-correct. At the machine level, the mathematical concept of
"zero" can be represented by any number of bit patterns. Sure, it's
common (these days) to represent "zero" as 32 contiguous bits aligned
in a particular way and all set to the low state; but "zero" on some
machines might be 24 bits with the high-order 6 bits all set to 1, for
parity checking; or something even more unconventional.
Bottom line: All-bits-zero does *not* mean value-zero. One
concept is much lower-level than the other. Avoid 'memset'
and 'calloc' for now, and I think you'll be okay.

This came up about a week ago, in the context of using memset() to
zero an array, and Jack Klein posted this:

[snip]
I'm not sure about his conclusion, though. I don't see anything in
the quote that says zero has to be represented by all zero bits. It
has to be a one-to-one mapping, but theoretically all zero bits
could represent something else and zero could be represented by
something else.

For the uint_XXt types defined in C99, yes, all-zero-bits means
"zero". [That's because unsigned types are guaranteed to be
represented in a binary notation; see N869, sec. 6.2.6.2#1.]
This is *not* true for signed types (consider ones'-complement),
the normal unsigned types (consider padding bits), pointers (well
duh), floating-point types (well duh), or whatever I've missed
in that list.

All-zero-bits does mean "zero" for 'unsigned char', and for those
special C99 types. But not in general.

HTCTU,
-Arthur

Nov 13 '05 #4

Glen Herrmannsfeldt

"Arthur J. O'Dwyer" <aj*@andrew.cmu.edu> wrote in message
news:Pi***********************************@unix46. andrew.cmu.edu...

On Sat, 13 Sep 2003, Tom Zych wrote:

"Arthur J. O'Dwyer" wrote:
Sort-of-correct. At the machine level, the mathematical concept of
"zero" can be represented by any number of bit patterns. Sure, it's
common (these days) to represent "zero" as 32 contiguous bits aligned
in a particular way and all set to the low state; but "zero" on some
machines might be 24 bits with the high-order 6 bits all set to 1, for
parity checking; or something even more unconventional.

Bottom line: All-bits-zero does *not* mean value-zero. One
concept is much lower-level than the other. Avoid 'memset'
and 'calloc' for now, and I think you'll be okay.

This came up about a week ago, in the context of using memset() to
zero an array, and Jack Klein posted this:

[snip]

I'm not sure about his conclusion, though. I don't see anything in
the quote that says zero has to be represented by all zero bits. It
has to be a one-to-one mapping, but theoretically all zero bits
could represent something else and zero could be represented by
something else.

There does seem to be a strong tendency in hardware design for the all zero
bits to represent numerical zero. Consider that the biased representation
commonly used for floating point exponents is not used for integer
representation (on any machine that I know of).

It is also common for the all zero bits floating point representation to be
numerical zero. In IEEE floating point, which uses a hidden one, zero must
have a special representation, I believe with all bits zero.

In the traditional IBM S/360 floating point, without a hidden one, there are
a variety of represenations with the numerical value of zero, but only one
has the correct arithmetic properties. The Fortran AINT function which
truncates the fractional part of a floating point number, but keeps it in
floating point form, on S/360 consists of adding the constant 0*16**7, that
is mantissa 0 and hexadecimal exponent of 7. Prenormalization will shift
out any fraction bits, nothing will be added, and post normalization will be
performed.

For pointers, C doesn't require the NULL pointer representation to have all
zero bits, but does make it more convenient. In protected mode x86 with
large model, there must be some value for the segment selector. The
hardware actually treats zero special in this case as the null selector. I
would expect any processor built with the expectation of running C code to
follow this convention, so that all zero bits would be the NULL pointer.

On any processor that didn't happen to have all bits zero as its NULL
representation, my choice would be to XOR the value with the desired
representation on any conversions between (int) and pointer types. That
would satisfy the NULL constant, and NULL pointer logical tests. It would
fail for the use of memset(), but the standard doesn't guarantee that,
anyway.

-- glen

Nov 13 '05 #5

Jack Klein

On Sat, 13 Sep 2003 13:32:53 -0400 (EDT), "Arthur J. O'Dwyer"
<aj*@andrew.cmu.edu> wrote in comp.lang.c:

On Sat, 13 Sep 2003, Tom Zych wrote:

"Arthur J. O'Dwyer" wrote:
Sort-of-correct. At the machine level, the mathematical concept of
"zero" can be represented by any number of bit patterns. Sure, it's
common (these days) to represent "zero" as 32 contiguous bits aligned
in a particular way and all set to the low state; but "zero" on some
machines might be 24 bits with the high-order 6 bits all set to 1, for
parity checking; or something even more unconventional.

Bottom line: All-bits-zero does *not* mean value-zero. One
concept is much lower-level than the other. Avoid 'memset'
and 'calloc' for now, and I think you'll be okay.

This came up about a week ago, in the context of using memset() to
zero an array, and Jack Klein posted this:

[snip]

I'm not sure about his conclusion, though. I don't see anything in
the quote that says zero has to be represented by all zero bits. It
has to be a one-to-one mapping, but theoretically all zero bits
could represent something else and zero could be represented by
something else.

For the uint_XXt types defined in C99, yes, all-zero-bits means
"zero". [That's because unsigned types are guaranteed to be
represented in a binary notation; see N869, sec. 6.2.6.2#1.]
This is *not* true for signed types (consider ones'-complement),
the normal unsigned types (consider padding bits), pointers (well
duh), floating-point types (well duh), or whatever I've missed
in that list.

All-zero-bits does mean "zero" for 'unsigned char', and for those
special C99 types. But not in general.

HTCTU,
-Arthur

The are exactly two and only two situations in C where any integer
type with a value of 0 can contain non-zero bits.

1. The representation has padding bits, in which case one or more of
the padding bits might be non-zero.

2. The implementation uses 1's complement or signed magnitude
representation for a signed integer type and that signed integer type
contains the value of -0. Note that -0 may be a trap representation
on such implementations rather than a valid value.

The meaning from the standard is quite unmistakable:

========
6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter). If there are N
value bits, each bit shall represent a different power of 2 between 1
and 2N-1, so that objects of that type shall be capable of
representing values from 0 to 2N - 1 using a pure binary
representation; this shall be known as the value representation. The
values of any padding bits are unspecified.
========

The quotation above clearly states that the one and only possible
representation for 0 in any unsigned integer type must be all value
bits 0.

========
2 For signed integer types, the bits of the object representation
shall be divided into three groups: value bits, padding bits, and the
sign bit. There need not be any padding bits; there shall be exactly
one sign bit. Each bit that is a value bit shall have the same value
as the same bit in the object representation of the corresponding
unsigned type (if there are M value bits in the signed type and N in
the unsigned type, then M <= N). If the sign bit is zero, it shall not
affect the resulting value. If the sign bit is one, the value shall be
modified in one of the following ways:

— the corresponding value with sign bit 0 is negated (sign and
magnitude);

— the sign bit has the value -(2N) (two’s complement);

— the sign bit has the value -(2N - 1) (one’s complement).

Which of these applies is implementation-defined, as is whether the
value with sign bit 1 and all value bits zero (for the first two), or
with sign bit and all value bits 1 (for one’s complement), is a trap
representation or a normal value. In the case of sign and
magnitude and one’s complement, if this representation is a normal
value it is called a negative zero.
========

....and the second quotation clearly states that any signed integer
type with a positive value must have the same bit representation as
the corresponding unsigned type in its value bits.

So any signed or unsigned integer type with all its value bits 0 has
the value of 0. A signed integer type with a value of 0 (as opposed
to -0, on non 2's complement systems if they support -0) must also
have its sign bit 0.

So only padding bits, if any, may be non-zero in the representation of
any C integer type with a value of 0.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Nov 13 '05 #6

Kevin D. Quitt

On Sun, 14 Sep 2003 03:43:08 GMT, Jack Klein <ja*******@spamcop.net>
wrote:

1 For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter). If there are N
value bits, each bit shall represent a different power of 2 between 1
and 2N-1, so that objects of that type shall be capable of
representing values from 0 to 2N - 1 using a pure binary
representation; this shall be known as the value representation. The
values of any padding bits are unspecified.

Damn - there goes my hope to port C99 to my ternary computer.
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Per the FCA, this address may not be added to any commercial mail list

Nov 13 '05 #7

LibraryUser

"Kevin D. Quitt" wrote:

Jack Klein <ja*******@spamcop.net> wrote:
1 For unsigned integer types other than unsigned char, the bits
of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the
latter). If there are N value bits, each bit shall represent a
different power of 2 between 1 and 2N-1, so that objects of
that type shall be capable of representing values from 0 to 2N
- 1 using a pure binary representation; this shall be known as
the value representation. The values of any padding bits are
unspecified.

Damn - there goes my hope to port C99 to my ternary computer.

No, you can do it. You may have to restrict the actual trits you
use. The problem is similar to implementing BCD arithmetic on
hexadecimal units. :-) For example excess 3 encoding will use
the hex values 3 through 0x0c only, and others will be trap
values.

You may lose some efficiency though. :-[

--
Replies should be to the newsgroup
Chuck Falconer, on vacation.

Nov 13 '05 #8

Johan Aurér

On Sat, 13 Sep 2003, Arthur J. O'Dwyer wrote:

On Sat, 12 Sep 2003, Mantorok Redgormor wrote:
That is, is it possible to have something like:

int foo = 0;

And the underlying representation of that zero is actually:

0x10000010 ? or some crazy nonsense like that.

Yes. Except that of course you didn't mean "0x10000010"; the
prefix "0x" conventionally indicates hexadecimal notation. What
you probably meant was, "the bits 10000010." (Not that you
*couldn't* have 64-bit integers with weird representations...)
Anyway, AFAIK having an integer constant that big may invoke
undefined behavior ('int' is only guaranteed to have 16 bits
or more). So, in the spirit of pedantry, I must point out
that

if (0 == 0x10000010)
{
printf("Undefined behavior can do weird things to you.\n");
}

What? Hexadecimal 10000010 is invariably less than ULONG_MAX, so there's
no problem (even with 16-bit ints).

--
au***@axis.com

Nov 13 '05 #9

Arthur J. O'Dwyer

On Tue, 16 Sep 2003, [ISO-8859-1] Johan Aurér wrote:

On Sat, 13 Sep 2003, Arthur J. O'Dwyer wrote:
On Sat, 12 Sep 2003, Mantorok Redgormor wrote:
That is, is it possible to have something like:
int foo = 0;
And the underlying representation of that zero is actually:
0x10000010 ? or some crazy nonsense like that.

Yes. Except that of course you didn't mean "0x10000010"; the
prefix "0x" conventionally indicates hexadecimal notation. What
you probably meant was, "the bits 10000010." (Not that you
*couldn't* have 64-bit integers with weird representations...)
Anyway, AFAIK having an integer constant that big may invoke
undefined behavior ('int' is only guaranteed to have 16 bits
or more). So, in the spirit of pedantry, I must point out
that

if (0 == 0x10000010)
{
printf("Undefined behavior can do weird things to you.\n");
}

What? Hexadecimal 10000010 is invariably less than ULONG_MAX, so there's
no problem (even with 16-bit ints).

Whoops. I forgot about the promotion to 'long' if it won't fit in
an 'int'. So just tack a few more zeroes on the end of that number.

-Arthur

Nov 13 '05 #10

Kevin D. Quitt

On Tue, 16 Sep 2003 09:20:33 -0400, LibraryUser
<de**********@made.invalid> wrote:

No, you can do it. You may have to restrict the actual trits you
use. The problem is similar to implementing BCD arithmetic on
hexadecimal units. :-) For example excess 3 encoding will use
the hex values 3 through 0x0c only, and others will be trap
values.

You may lose some efficiency though. :-[

That *does* explain why my Pentium is so slow and requires so much memory.
Thanks!
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Per the FCA, this address may not be added to any commercial mail list

Nov 13 '05 #11

Representation of integers

Similar topics