FAQ-Question

Till Crueger wrote:

Hi,

I stumbled upon the following code to determine byte ordering in the FAQ:

union {
int i;
char c[sizeof(int)];
} x;
/* do stuff */

In this case it seems that the union was used to convert a memory section
from int to char[]. However I've learned that you couldn't use unions to
reinterpret a memory area, because you can't always be sure that the
fields really do use the same positions. So why is this code portable?

It's not portable, but not for the reason you mentioned. There cannot
be padding before the members of a union; the first byte of a union is
the first byte of each of its members, always. The problem is that
retrieving the value of a union element is only defined when the member
retrieved was the member most recently assigned, so assigning i and
then examining c is not defined.

Robert Gamble

Jun 20 '06 #2

Dann Corbit

"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...

Hi,

I stumbled upon the following code to determine byte ordering in the FAQ:

union {
int i;
char c[sizeof(int)];
} x;
/* do stuff */

In this case it seems that the union was used to convert a memory section
from int to char[]. However I've learned that you couldn't use unions to
reinterpret a memory area, because you can't always be sure that the
fields really do use the same positions. So why is this code portable?

From : ISO/IEC 9899:1999 (E) (which is the current standard) we have this:
5 One special guarantee is made in order to simplify the use of unions: if a
union contains several structures that share a common initial sequence (see
below), and if the union object currently contains one of these structures,
it is permitted to inspect the common initial part of any of them anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence of
one or more initial members.
Thanks,
Till

--
Please add "Salt and Pepper" to the subject line to bypass my spam filter

Jun 20 '06 #3

Christopher Benson-Manica

Till Crueger <Ti****@gmx.net> wrote:

union {
int i;
char c[sizeof(int)];

unsigned char c[sizeof(int)];

I'm surprised that it's apparently not like that in the FAQ.

} x;
/* do stuff */

Please add "Salt and Pepper" to the subject line to bypass my spam filter

You could always just use Google mail :-)

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Jun 20 '06 #4

Dann Corbit wrote:

"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...
Hi,

I stumbled upon the following code to determine byte ordering in the FAQ:

union {
int i;
char c[sizeof(int)];
} x;
/* do stuff */

In this case it seems that the union was used to convert a memory section
from int to char[]. However I've learned that you couldn't use unions to
reinterpret a memory area, because you can't always be sure that the
fields really do use the same positions. So why is this code portable?

From : ISO/IEC 9899:1999 (E) (which is the current standard) we have this:
5 One special guarantee is made in order to simplify the use of unions: if a
union contains several structures that share a common initial sequence (see
below), and if the union object currently contains one of these structures,
it is permitted to inspect the common initial part of any of them anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence of
one or more initial members.

That does not have any bearing on the presented example though as there
are no structures involved.

Robert Gamble

Jun 20 '06 #5

Till Crueger posted:

Hi,

I stumbled upon the following code to determine byte ordering in the FAQ:
union {
int i;
char c[sizeof(int)];
} x;
/* do stuff */

(I come from a background in C++, so I apologise if anything forthcoming
which I write amounts to misinformation)

All elements of a union have the same address.

However, I believe that the Standard forbids you to write to one member
of a union, and then to subsequently access a different member.

An unsigned integer type in C can have "trapping bits", which are bits
which don't take part in the representation of the variable's value, so
you can't rely on having sizeof(unsigned) amount of bytes to work with.

If there were no trapping bits, code akin to the following might be
acceptable:

#include <limits.h>
#include <stdio.h>

typedef unsigned UType;

int main(void)
{
    UType guinea_pig = 0;
    unsigned i;
    const unsigned char *p = (const unsigned char *)&guinea_pig;
    const unsigned char *p_over;

    /* Store the value i in the i-th least significant byte. */
    for (i = 0; i != sizeof guinea_pig; ++i)
    {
        guinea_pig |= (UType)i << (i * CHAR_BIT);
    }

    /* One past the last byte of the object. */
    p_over = (const unsigned char *)(&guinea_pig + 1);

    /* Print the bytes in address order. */
    for ( ; p != p_over; ++p)
    {
        printf("%u\n", (unsigned)*p);
    }

    return 0;
}

--

Frederick Gotham

Jun 20 '06 #6

Jack Klein

On Tue, 20 Jun 2006 19:31:22 GMT, Frederick Gotham
<fg*******@SPAM.com> wrote in comp.lang.c:

Till Crueger posted:
Hi,

I stumbled upon the following code to determine byte ordering in the FAQ:

union {
int i;
char c[sizeof(int)];
} x;
/* do stuff */

(I come from a background in C++, so I apologise if anything forthcoming
which I write amounts to misinformation)

All elements of a union have the same address.

However, I believe that the Standard forbids you to write to one member
of a union, and then to subsequently access a different member.

No, not actually "forbids". The closest you could get to "forbids" in
C, or C++ for that matter, is a constraint violation, which
requires that the compiler issue a diagnostic. What you get when you
do this is undefined behavior, in either language.

But there is one exception. Any object may be inspected as an array
of unsigned characters, as unsigned chars have no trap representations
in either language.

Here are paragraphs 4 and 5 of section 6.2.6 Representations of types,
6.2.6.1 General of the C standard.

"Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [n] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value. Values stored in
bit-fields consist of m bits, where m is the size specified for the
bit-field. The object representation is the set of m bits the
bit-field comprises in the addressable storage unit holding it. Two
values (other than NaNs) with the same object representation compare
equal, but values that compare equal may have different object
representations.

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called
a trap representation."

Note that the C standard still uses the phrase "character type", even
though C specifically allows signed char to have trap representations,
so unsigned char is the only safe type to use.

Both C and C++ allow any memory legally accessible to a program to be
accessed as an array of unsigned chars.

An unsigned integer type in C can have "trapping bits", which are bits
which don't take part in the representation of the variable's value, so
you can't rely on having sizeof(unsigned) amount of bytes to work with.

You are a little confused here. An object occupies the space
equivalent to an array of unsigned chars of the length sizeof(object).
The fact that there might be padding bits (not "trapping bits") that
do not contribute to the value of the object does not change this
fact. And the fact that some combination of the padding bits,
which you might set while mucking around with the object
representation, could create a trap representation is not relevant if you do
not again access the object by its own type after you modify its bits.

But you can, for example, with your own loop copy the representation
of an object one unsigned char at a time into another object of the
same type, and the destination object will then have the same value as
the original.

Finally, as for the notion of unsigned, or even signed, integer types
having padding bits, I wouldn't actually worry about it. While such a
representation is allowed by both language standards (other than for
the character types), it just isn't a factor in the real world. There
may be a very few systems with such obsolete architectures around, but
the chances of any particular C programmer ever writing code for one
are extremely remote.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Jun 20 '06 #7

Jack Klein posted:

"Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that
type, in bytes.

That doesn't make sense... because we all know that an unsigned integer
type can have trapping bits (except unsigned char of course), so maybe
the quote was taken out of context?

You are a little confused here.

I don't believe that I am.

Finally, as for the notion of unsigned, or even signed, integer types
having padding bits, I wouldn't actually worry about it.

I refer to them as trapping bits. Assume we're working with the following
machine:

CHAR_BIT == 8
sizeof(unsigned) == 4

Therefore, an unsigned integer consists of 32 bits in memory:

0000 0000 0000 0000 0000 0000 0000 0000

If I asked you, "Make a guess how many unique values can be stored in an
unsigned int on this platform?", then a good guess would be 2^32.

However, that was just a guess. Just because the "object representation"
of an unsigned int consists of 32 bits, that doesn't mean that the "value
representation" consists of 32 bits.

Maybe this system only uses 30 bits for the value representation, ie.:

1100 0000 0000 0000 0000 0000 0000 0000
^^
Trapping bits

Let's attempt to set the LSB to zero, the next LSB to 1, the next LSB to
2, and the MSB to 3 (as my previous code snippet attempts to do). This
could very well turn out as:

1100 0011 0000 0010 0000 0001 0000 0000
^^
Trapping bits

If we then use an unsigned char* to analyse each byte, we get:

0
1
2
195

The MSB's value is 195, when we obviously want it to be 3. The code I
posted elsewhere in the thread is now broken, and it's all because of
trapping bits.

While such a
representation is allowed by both language standards (other than for
the character types), it just isn't a factor in the real world.

It's also allowed in C++, which has the most modern standard of them all.
If it really wasn't a factor in the real world, then the standard would
forbid an implementation to use trapping bits.
--

Frederick Gotham

Jun 20 '06 #8

Dann Corbit

"Frederick Gotham" <fg*******@SPAM.com> wrote in message
news:Xn*************************@194.125.133.14...

Jack Klein posted:

"Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that
type, in bytes.

That doesn't make sense... because we all know that an unsigned integer
type can have trapping bits (except unsigned char of course), so maybe
the quote was taken out of context?

You are a little confused here.

I don't believe that I am.

Finally, as for the notion of unsigned, or even signed, integer types
having padding bits, I wouldn't actually worry about it.

I refer to them as trapping bits. Assume we're working with the following
machine:

CHAR_BIT == 8
sizeof(unsigned) == 4

Therefore, an unsigned integer consists of 32 bits in memory:

0000 0000 0000 0000 0000 0000 0000 0000

If I asked you, "Make a guess how many unique values can be stored in an
unsigned int on this platform?", then a good guess would be 2^32.

However, that was just a guess. Just because the "object representation"
of an unsigned int consists of 32 bits, that doesn't mean that the "value
representation" consists of 32 bits.

Maybe this system only uses 30 bits for the value representation, ie.:

1100 0000 0000 0000 0000 0000 0000 0000
^^
Trapping bits

Let's attempt to set the LSB to zero, the next LSB to 1, the next LSB to
2, and the MSB to 3 (as my previous code snippet attempts to do). This
could very well turn out as:

1100 0011 0000 0010 0000 0001 0000 0000
^^
Trapping bits

If we then use an unsigned char* to analyse each byte, we get:

0
1
2
195

The MSB's value is 195, when we obviously want it to be 3. The code I
posted elsewhere in the thread is now broken, and it's all because of
trapping bits.

While such a
representation is allowed by both language standards (other than for
the character types), it just isn't a factor in the real world.

It's also allowed in C++, which has the most modern standard of them all.
If it really wasn't a factor in the real world, then the standard would
forbid an implementation to use trapping bits.

The example initializes the integer with a value that is not a trap
representation [IIRC]. How did the integer in your example get the trapping
bits set? (If zero is allowed to represent a trapping bit rather than one,
that would severely limit the usefulness of memset() -- in such a case the
standard should clearly state that memset() can only be used safely on
unsigned char data type, not to mention calloc() etc.).

The O.P. said the example came from the C-FAQ, but I do not see it in there.
I don't have the C-FAQ book handy {it's at home}, so maybe it is in the book
version?

The rtfm.mit.edu C-FAQ does not use the union example (though it does
mention unions as a possible solution), so I am not sure where it [the OP's
example] came from. This is the endian test in the C-FAQ:

20.9: How can I determine whether a machine's byte order is big-endian
or little-endian?

A: One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
printf("little-endian\n");
else printf("big-endian\n");

It's also possible to use a union.

See also question 10.16.

References: H&S Sec. 6.1.2 pp. 163-4.

Here is 10.16:

10.16: How can I use a preprocessor #if expression to tell if a machine
is big-endian or little-endian?

A: You probably can't. (Preprocessor arithmetic uses only long
integers, and there is no concept of addressing.) Are you
sure you need to know the machine's endianness explicitly?
Usually it's better to write code which doesn't care.
See also question 20.9.

References: ISO Sec. 6.8.1; H&S Sec. 7.11.1 p. 225.


Jun 20 '06 #9

Frederick Gotham <fg*******@SPAM.com> writes:

Jack Klein posted:
"Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that
type, in bytes.

That doesn't make sense... because we all know that an unsigned integer
type can have trapping bits (except unsigned char of course), so maybe
the quote was taken out of context?

You are a little confused here.

I don't believe that I am.

Finally, as for the notion of unsigned, or even signed, integer types
having padding bits, I wouldn't actually worry about it.

I refer to them as trapping bits.

Why on Earth would you do that? The standard calls them padding bits.
The term "trapping bits" is misleading; they don't necessarily cause
any kind of trap.

The representation of an unsigned type consists of CHAR_BIT*sizeof(type)
bits. These are divided into value bits and padding bits (there may
or may not be any of the latter). Each representation either
represents a value (which is determined entirely by the value bits) or
is a trap representation; if there are no padding bits, there are no
trap representations. The manner in which padding bits can create
trap representations is unspecified.

It's also allowed in C++, which has the most modern standard of them all.
If it really wasn't a factor in the real world, then the standard would
forbid an implementation to use trapping bits.

Not necessarily. There are plenty of things in the C standard that
are intended to allow for *possibilities* that may be unlikely to
appear on modern machines. It's a good thing to allow conforming C
implementations on older systems. It's also a good thing to allow for
future developments.

For example, I'm currently logged into a system with the following
characteristics:

CHAR_BIT = 8
sizeof(short) = 8 (64 bits)
sizeof(int) = 8 (64 bits)
sizeof(long) = 8 (64 bits)

SHRT_MAX = 2147483647 (32 padding bits)
USHRT_MAX = 4294967295 (32 padding bits)

INT_MAX = 35184372088831 (18 padding bits)
UINT_MAX = 18446744073709551615 (no padding bits)

LONG_MAX = 9223372036854775807 (no padding bits)
ULONG_MAX = 18446744073709551615 (no padding bits)

(It's a Cray Y/MP EL running Unicos 9.0, basically an obsolete
supercomputer.)

I'm sure that some code would break when ported to this system, but
there's plenty of code that works just fine. The need to assume that
a 64-bit type has 64 value bits is not as great as you might think;
usually all you care about is the range.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 20 '06 #10

Dann Corbit

"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11*********************@p79g2000cwp.googlegro ups.com...

Dann Corbit wrote:
"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...
> Hi,
>
> I stumbled upon the following code to determine byte ordering in the
> FAQ:
>
> union {
> int i;
> char c[sizeof(int)];
> } x;
> /* do stuff */
>
> In this case it seems that the union was used to convert a memory
> section
> from int to char[]. However I've learned that you couldn't use unions
> to
> reinterpret a memory area, because you can't always be sure that the
> fields really do use the same positions. So why is this code portable?
From : ISO/IEC 9899:1999 (E) (which is the current standard) we have
this:
5 One special guarantee is made in order to simplify the use of unions:
if a
union contains several structures that share a common initial sequence
(see
below), and if the union object currently contains one of these
structures,
it is permitted to inspect the common initial part of any of them
anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence of
one or more initial members.

That does not have any bearing on the presented example though as there
are no structures involved.

Rather astounding.

That would mean that this is perfectly fine:
#include <stdio.h>

int main(void)
{
typedef union foo_u {
struct a {
unsigned char carr[sizeof(unsigned int)];
} aa;
struct b {
unsigned int ui;
} bb;
} foo;

foo bar;
bar.bb.ui = 1;
printf("%d\n", (unsigned)bar.aa.carr[0]);
return 0;
}

But this is not:

#include <stdio.h>

int main(void)
{
typedef union foo_u {
unsigned char carr[sizeof(unsigned int)];
unsigned int ui;
} foo;

foo bar;
bar.ui = 1;
printf("%d\n", (unsigned)bar.carr[0]);
return 0;
}

I think that there must be a defect lurking there somewhere.

Jun 21 '06 #11

dcorbit

"Dann Corbit" <dc*****@connx.com> wrote in message
news:e7**********@nntp.aioe.org...

"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11*********************@p79g2000cwp.googlegro ups.com...
Dann Corbit wrote:
"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...
> Hi,
>
> I stumbled upon the following code to determine byte ordering in the
> FAQ:
>
> union {
> int i;
> char c[sizeof(int)];
> } x;
> /* do stuff */
>
> In this case it seems that the union was used to convert a memory
> section
> from int to char[]. However I've learned that you couldn't use unions
> to
> reinterpret a memory area, because you can't always be sure that the
> fields really do use the same positions. So why is this code portable?

From : ISO/IEC 9899:1999 (E) (which is the current standard) we have
this:
5 One special guarantee is made in order to simplify the use of unions:
if a
union contains several structures that share a common initial sequence
(see
below), and if the union object currently contains one of these
structures,
it is permitted to inspect the common initial part of any of them
anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence
of
one or more initial members.
That does not have any bearing on the presented example though as there
are no structures involved.

Rather astounding.

That would mean that this is perfectly fine:
#include <stdio.h>

int main(void)
{
typedef union foo_u {
struct a {
unsigned char carr[sizeof(unsigned int)];
} aa;
struct b {
unsigned int ui;
} bb;
} foo;

foo bar;
bar.bb.ui = 1;
printf("%d\n", (unsigned)bar.aa.carr[0]);

%u, of course, for both.
return 0;
}

But this is not:

#include <stdio.h>

int main(void)
{
typedef union foo_u {
unsigned char carr[sizeof(unsigned int)];
unsigned int ui;
} foo;

foo bar;
bar.ui = 1;
printf("%d\n", (unsigned)bar.carr[0]);
see above
return 0;
}

I think that there must be a defect lurking there somewhere.

Jun 21 '06 #12

jjf

Dann Corbit wrote:

"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11*********************@p79g2000cwp.googlegro ups.com...
Dann Corbit wrote:
"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...
>
> union {
> int i;
> char c[sizeof(int)];
> } x;
> /* do stuff */
>
> In this case it seems that the union was used to convert a memory
> section
> from int to char[]. However I've learned that you couldn't use unions
> to
> reinterpret a memory area, because you can't always be sure that the
> fields really do use the same positions. So why is this code portable?

From : ISO/IEC 9899:1999 (E) (which is the current standard) we have
this:
5 One special guarantee is made in order to simplify the use of unions:
if a
union contains several structures that share a common initial sequence
(see
below), and if the union object currently contains one of these
structures,
it is permitted to inspect the common initial part of any of them
anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence of
one or more initial members.

That does not have any bearing on the presented example though as there
are no structures involved.

Rather astounding.

That would mean that this is perfectly fine:
#include <stdio.h>

int main(void)
{
typedef union foo_u {
struct a {
unsigned char carr[sizeof(unsigned int)];
} aa;
struct b {
unsigned int ui;
} bb;
} foo;

foo bar;
bar.bb.ui = 1;
printf("%d\n", (unsigned)bar.aa.carr[0]);
return 0;
}

Are "array of unsigned char" and "unsigned int" compatible types?
Unless I really misunderstand the concept, they are not; in which case
the quoted section does not apply here and this is not fine.

Jun 21 '06 #13

Dann Corbit posted:

I think that there must be a defect lurking there somewhere.

Depends how you look at it. Consider how you can't return an array by
value, but the following is perfectly okay:
struct ArrayWrapper {
int array[64];
};
struct ArrayWrapper ReturnByValue(void);

--

Frederick Gotham

Jun 21 '06 #14

Dann Corbit posted:

The example initializes the integer with a value that is not a trap
representation [IIRC]. How did the integer in your example get the
trapping bits set?

Maybe they're set all the time? The Standard doesn't specify how the
implementation must use any "superfluous" bits (i.e. bits which don't
take part in the value representation).

(If zero is allowed to represent a trapping bit
rather than one, that would severely limit the usefulness of memset()

(It appears that you know this already, but just to be explicit:)

The Standard specifies that "all bits zero" is a valid zero value for all
unsigned integer types. (By saying "all bits zero", I refer to both the
value representation bits and any superfluous bits.)

20.9: How can I determine whether a machine's byte order is big-endian
or little-endian?

A: One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
printf("little-endian\n");
else printf("big-endian\n");

If you've got 4 bytes in an int, then there's 24 possible byte-order
systems -- not just Little-endian and Big-endian

0123 1023 2013 3012
0132 1032 2031 3021
0213 1203 2103 3102
0231 1230 2130 3120
0312 1302 2301 3201
0321 1320 2310 3210

--

Frederick Gotham

Jun 21 '06 #15

Dann Corbit wrote:

"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11*********************@p79g2000cwp.googlegro ups.com...
Dann Corbit wrote:
"Till Crueger" <Ti****@gmx.net> wrote in message
news:pa****************************@gmx.net...
> Hi,
>
> I stumbled upon the following code to determine byte ordering in the
> FAQ:
>
> union {
> int i;
> char c[sizeof(int)];
> } x;
> /* do stuff */
>
> In this case it seems that the union was used to convert a memory
> section
> from int to char[]. However I've learned that you couldn't use unions
> to
> reinterpret a memory area, because you can't always be sure that the
> fields really do use the same positions. So why is this code portable?

From : ISO/IEC 9899:1999 (E) (which is the current standard) we have
this:
5 One special guarantee is made in order to simplify the use of unions:
if a
union contains several structures that share a common initial sequence
(see
below), and if the union object currently contains one of these
structures,
it is permitted to inspect the common initial part of any of them
anywhere
that a declaration of the complete type of the union is visible. Two
structures share a common initial sequence if corresponding members have
compatible types (and, for bit-fields, the same widths) for a sequence of
one or more initial members.
That does not have any bearing on the presented example though as there
are no structures involved.

Rather astounding.

If you say so.
That would mean that this is perfectly fine:
#include <stdio.h>

int main(void)
{
typedef union foo_u {
struct a {
unsigned char carr[sizeof(unsigned int)];
} aa;
struct b {
unsigned int ui;
} bb;
} foo;

foo bar;
bar.bb.ui = 1;
printf("%d\n", (unsigned)bar.aa.carr[0]);
return 0;
}

I don't think so. aa and bb do not "share a common initial sequence"
as aa.carr and bb.ui are not compatible types so the exception rule
once again does not apply.

Robert Gamble

Jun 21 '06 #16

Frederick Gotham wrote:

Dann Corbit posted:

The example initializes the integer with a value that is not a trap
representation [IIRC]. How did the integer in your example get the
trapping bits set?

Maybe they're set all the time? The Standard doesn't specify how the
implementation must use any "superfluous" bits (i.e. bits which don't
take part in the value representation).

(If zero is allowed to represent a trapping bit
rather than one, that would severely limit the usefulness of memset()

(It appears that you know this already, but just to be explicit:)

The Standard specifies that "all bits zero" is a valid zero value for all
unsigned integer types. (By saying "all bits zero", I refer to both the
value representation bits and any superfluous bits.)

Actually, the Standard guarantees this for all integer types, not just
unsigned.

20.9: How can I determine whether a machine's byte order is big-endian
or little-endian?

A: One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
printf("little-endian\n");
else printf("big-endian\n");

If you've got 4 bytes in an int, then there's 24 possible byte-order
systems -- not just Little-endian and Big-endian

0123 1023 2013 3012
0132 1032 2031 3021
0213 1203 2103 3102
0231 1230 2130 3120
0312 1302 2301 3201
0321 1320 2310 3210

The question is clearly addressing the case where the machine is known
to be either little-endian or big-endian.

Robert Gamble

Jun 21 '06 #17

Jack Klein

On Tue, 20 Jun 2006 23:00:34 GMT, Frederick Gotham
<fg*******@SPAM.com> wrote in comp.lang.c:

Jack Klein posted:

"Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that
type, in bytes.

That doesn't make sense... because we all know that an unsigned integer
type can have trapping bits (except unsigned char of course), so maybe
the quote was taken out of context?

You took the quotation out of context. I quoted two paragraphs of the
C standard exactly, and in their entirety.

There is no such thing as a "trapping bit". The term does not exist
in either the C or C++ standard. The defined terms are padding bits
and trap representation. They may be related.

You are a little confused here.

I don't believe that I am.

You certainly are. The object representation of a type in both C and
C++ contains an integral number of bytes. If the type has padding
bits, they are contained within the bytes of the object
representation.

Finally, as for the notion of unsigned, or even signed, integer types
having padding bits, I wouldn't actually worry about it.

I refer to them as trapping bits. Assume we're working with the following
machine:

Then you have invented a meaningless term that is not in either
language standard. Making up your own terms outside of those defined
by the standard is futile, and makes serious discussion in language
groups difficult to impossible.

There are padding bits, and some combinations of those padding bits, if
they exist, might all by themselves make the object representation a
trap representation for a given object type. Also, in the signed
integer types, it is possible that there is exactly one combination of
value bits, with the sign bit set to 1, that might be a trap
representation.

CHAR_BIT == 8
sizeof(unsigned) == 4

Therefore, an unsigned integer consists of 32 bits in memory:

0000 0000 0000 0000 0000 0000 0000 0000

If I asked you, "Make a guess how many unique values can be stored in an
unsigned int on this platform?", then a good guess would be 2^32.

I wouldn't guess. That's what <limits.h> is for.

However, that was just a guess. Just because the "object representation"
of an unsigned int consists of 32 bits, that doesn't mean that the "value
representation" consists of 32 bits.

It was your guess. I looked at the implementation's <limits.h> [1]
and determined that UINT_MAX was 1073741823, and quickly deduced that
there were 30 value bits in an unsigned int. No guesswork involved.

Maybe this system only uses 30 bits for the value representation, ie.:

1100 0000 0000 0000 0000 0000 0000 0000
^^
Trapping bits

No, padding bits. We don't make up our own terms here, we use the
ones defined by the standard. Your notion may have value in some
context, but not here.

Let's attempt to set the LSB to zero, the next LSB to 1, the next LSB to
2, and the MSB to 3 (as my previous code snippet attempts to do). This
could very well turn out as:

1100 0011 0000 0010 0000 0001 0000 0000
^^
Trapping bits

If we then use an unsigned char* to analyse each byte, we get:

0
1
2
195
The MSB's value is 195, when we obviously want it to be 3. The code I
posted elsewhere in the thread is now broken, and it's all because of
trapping bits.
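
To make the arithmetic in this example concrete, here is a sketch that models the hypothetical machine with an ordinary 32-bit number. VALUE_MASK, raw_byte and value_byte are names invented for the illustration, and the mask assumes the top two bits are the padding bits, as in the diagram. It does not show how a real implementation with padding bits behaves; it only shows why inspecting the object representation and working on the value can give different answers.

```c
#include <stdint.h>

/* Assumption taken from the diagram: the top two bits of the 32-bit
   object are padding, the low 30 bits are value bits. */
#define VALUE_MASK UINT32_C(0x3FFFFFFF)

/* Byte `index` (0 = lowest-order) as it appears in the raw object
   representation, modelled here as a plain 32-bit number. */
unsigned raw_byte(uint32_t object_bits, unsigned index)
{
    return (unsigned)((object_bits >> (8 * index)) & 0xFFu);
}

/* The same byte computed from the value only: the padding bits are
   masked off first, so they never contribute. */
unsigned value_byte(uint32_t object_bits, unsigned index)
{
    uint32_t value = object_bits & VALUE_MASK;
    return (unsigned)((value >> (8 * index)) & 0xFFu);
}
```

For the pattern 1100 0011 0000 0010 0000 0001 0000 0000 the raw top byte is 195 while the value-based one is 3. In real C code, shifting and masking an unsigned value operates on the value, so padding bits are never seen that way.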

You keep repeating this meaningless term.

While such a
representation is allowed by both language standards (other than for
the character types), it just isn't a factor in the real world.

It's also allowed in C++, which has the most modern standard of them all.
If it really wasn't a factor in the real world, then the standard would
forbid an implementation to use trapping bits.

Apparently you did not read my post closely enough to realize that by
"both language standards" I meant both C and C++. As for the ISO C++
standard being the "most modern standard of them all", both language
standards have had TCs since their creation, but the year of the base
current C standard is 1999, that of the C++ standard 1998.

In fact, C++ does not yet incorporate the "long long" minimum 64 bit
integer type, although it will someday.

And finally, as to the usefulness of worrying about padding bits in
integer types, I read Keith's post about working on a Cray else
thread. I never have, and Keith didn't say, but I'd be willing to bet
that the padding bits are simply ignored, and never cause any sort of
trap or unexpected error in a calculation.

The whole thing about padding bits being able to cause some sort of
trap seems to date from the era of some very early, and now completely
obsolete, mainframes. These beasts did not have integer arithmetic
hardware, only floating point. So some particular combination of bits
in a value told the hardware that it was dealing with an integer type
instead of a floating point type. Naturally, swizzling those bits
could lead to catastrophic results.

Homework assignment: A 500 word essay on the reasons not to make up
terms not defined by the language standard.

Research project: Locate at least one non-obsolete platform where a
combination of padding bits in an unsigned integer type actually
causes a trap or calculation error.

When you've finished the research project successfully, please post
the results here.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Jun 21 '06 #18

Jack Klein posted:

Homework assignment: A 500 word essay on the reasons not to make up
terms not defined by the language standard.

If that wasn't so humourous, I'd think you're being patronising.

I mentioned elsewhere on the newsgroup that I stand corrected as regards
the naming of any "superfluous bits".
--

Frederick Gotham

Jun 21 '06 #19

Frederick Gotham <fg*******@SPAM.com> writes:
[...]

The Standard specifies that "all bits zero" is a valid zero value for all
unsigned integer types. (By saying "all bits zero", I refer to both value
representation bits and any superfluous bits)

The C99 standard doesn't say this; it was added in a TC. But yes,
that guarantee is effectively part of the standard.
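
A small sketch of what that guarantee permits. zero_via_memset is a hypothetical name. memset clears every byte of the object representation, so it clears any padding bits as well as the value bits; given the all-bits-zero guarantee, the resulting object holds the value 0.

```c
#include <string.h>

/* Clear the whole object representation of an unsigned int and
   return the value that results. */
unsigned zero_via_memset(void)
{
    unsigned u;
    memset(&u, 0, sizeof u);   /* every bit, padding included, becomes 0 */
    return u;
}
```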

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 21 '06 #20

CBFalconer

Jack Klein wrote:

.... snip ...
Finally, as for the notion of unsigned, or even signed, integer
types having padding bits, I wouldn't actually worry about it.
While such a representation is allowed by both language standards
(other than for the character types), it just isn't a factor in
the real world. There may be a very few systems with such
obsolete architectures around, but the chances of any particular
C programmer ever writing code for one are extremely remote.

Actually there are many systems that have these 'hidden bits'.
Think of any system with ECC memory. In general no access is given
to the ECC bits when objects are treated as arrays of unsigned
chars, instead the hardware manipulates the larger word to make it
byte accessible in some magic manner or other, so the programmer is
never aware of it. But the hardware designer is aware.

I dare say in excess of 80% of the C programmers here have no idea
what I am talking about, but that 80% of the embedded system
programmers understand very well.

--
"I don't know where bin Laden is. I have no idea and really
don't care. It's not that important." - G.W. Bush, 2002-03-13
"No, we've had no evidence that Saddam Hussein was involved
with September the 11th." - George Walker Bush 2003-09-17

Jun 21 '06 #21

Old Wolf

Jack Klein wrote:

Research project: Locate at least one non-obsolete platform where a
combination of padding bits in an unsigned integer type actually
causes a trap or calculation error.

Does this include systems where each byte has a parity bit,
and the hardware traps if it detects a parity error in any
particular byte?

Jun 21 '06 #22

Jack Klein <ja*******@spamcop.net> writes:
[...]

And finally, as to the usefulness of worrying about padding bits in
integer types, I read Keith's post about working on a Cray else
thread. I never have, and Keith didn't say, but I'd be willing to bet
that the padding bits are simply ignored, and never cause any sort of
trap or unexpected error in a calculation.

I actually don't know one way or the other. (I might look into it if
I get sufficiently motivated.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>

Jun 21 '06 #23

"Old Wolf" <ol*****@inspire.net.nz> writes:

Jack Klein wrote:

Research project: Locate at least one non-obsolete platform where a
combination of padding bits in an unsigned integer type actually
causes a trap or calculation error.

Does this include systems where each byte has a parity bit,
and the hardware traps if it detects a parity error in any
particular byte?

Probably not. If a bit isn't visible when you view an object as an
array of unsigned char, it's not a padding bit (it's not a bit at all
as far as the C standard is concerned).
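
As an illustration of "viewing an object as an array of unsigned char", the sketch below copies an unsigned int's object representation into a byte buffer and back. object_bytes and from_object_bytes are hypothetical helper names. Every bit the C standard knows about is among those sizeof(unsigned) * CHAR_BIT bits; parity or ECC bits maintained by the hardware never show up in the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Copy the object representation of *u into out, which must have
   room for sizeof(unsigned) bytes. */
void object_bytes(const unsigned *u, unsigned char *out)
{
    memcpy(out, u, sizeof *u);
}

/* Rebuild an unsigned int from bytes previously obtained with
   object_bytes. */
unsigned from_object_bytes(const unsigned char *in)
{
    unsigned u;
    memcpy(&u, in, sizeof u);
    return u;
}
```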

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>

Jun 21 '06 #24

Keith Thompson posted:

(It's a Cray Y/MP EL running Unicos 9.0, basically an obsolete
supercomputer.)

I think I read elsewhere on the newsgroup the unsigned integer types have
padding because the machine can really only do floating point
arithmetic...
So I'm curious, how would it deal with the following code?
int main(void)
{
    unsigned char array[50];
    const unsigned char * const p_over =
        array + sizeof(array) / sizeof(*array);

    const unsigned char* p = array;

    do
    {
        *p++ = 46;
    } while ( p != p_over );
}

Jun 21 '06 #25

Frederick Gotham <fg*******@SPAM.com> writes:

Keith Thompson posted:
(It's a Cray Y/MP EL running Unicos 9.0, basically an obsolete
supercomputer.)

I think I read elsewhere on the newsgroup the unsigned integer types have
padding because the machine can really only do floating point
arithmetic...
So I'm curious, how would it deal with the following code?
int main(void)
{
    unsigned char array[50];
    const unsigned char * const p_over =
        array + sizeof(array) / sizeof(*array);

    const unsigned char* p = array;

    do
    {
        *p++ = 46;
    } while ( p != p_over );
}

Probably by printing a compile-time error message. You declare p as a
pointer to const unsigned char, then attempt to modify what it points
to.

If you drop the "const" in the declaration of p, it will simply set
each element of array to 46 on any conforming implementation --
assuming the entire program isn't optimized away because it produces
no output.

I haven't tried it on the Cray, and I don't see the point of the
question. (The standard explicitly says that unsigned char has no
padding bits, but even if it did, the program doesn't have anything to
do with that issue.)
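
For reference, a version with the const dropped might look like the sketch below. fill_with_46 is a hypothetical name, and a while loop is used in place of the original do/while so that an empty range is also handled; otherwise the logic is the same.

```c
#include <stddef.h>

/* Set each of the n elements of array to 46, walking up to a
   one-past-the-end pointer. */
void fill_with_46(unsigned char *array, size_t n)
{
    const unsigned char *const p_over = array + n;
    unsigned char *p = array;   /* not const-qualified: we write through it */

    while (p != p_over) {
        *p++ = 46;
    }
}
```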

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>

Jun 21 '06 #26

Barry Schwarz

On Wed, 21 Jun 2006 19:43:56 GMT, Keith Thompson <ks***@mib.org>
wrote:

Frederick Gotham <fg*******@SPAM.com> writes:
Keith Thompson posted:
(It's a Cray Y/MP EL running Unicos 9.0, basically an obsolete
supercomputer.)

I think I read elsewhere on the newsgroup the unsigned integer types have
padding because the machine can really only do floating point
arithmetic...
So I'm curious, how would it deal with the following code?
int main(void)
{
    unsigned char array[50];
    const unsigned char * const p_over =
        array + sizeof(array) / sizeof(*array);

    const unsigned char* p = array;

    do
    {
        *p++ = 46;
    } while ( p != p_over );
}

Probably by printing a compile-time error message. You declare p as a
pointer to const unsigned char, then attempt to modify what it points
to.

If you drop the "const" in the declaration of p, it will simply set
each element of array to 46 on any conforming implementation --
assuming the entire program isn't optimized away because it produces
no output.

Since p points to an element of array and none of those elements have
been initialized, this invokes undefined behavior at each iteration.
Remove del for email

Jun 22 '06 #27

Barry Schwarz <sc******@doezl.net> writes:

On Wed, 21 Jun 2006 19:43:56 GMT, Keith Thompson <ks***@mib.org>
wrote:
Frederick Gotham <fg*******@SPAM.com> writes: [...]
So I'm curious, how would it deal with the following code?
int main(void)
{
    unsigned char array[50];
    const unsigned char * const p_over =
        array + sizeof(array) / sizeof(*array);

    const unsigned char* p = array;

    do
    {
        *p++ = 46;
    } while ( p != p_over );
}

Probably by printing a compile-time error message. You declare p as a
pointer to const unsigned char, then attempt to modify what it points
to.

If you drop the "const" in the declaration of p, it will simply set
each element of array to 46 on any conforming implementation --
assuming the entire program isn't optimized away because it produces
no output.

Since p points to an element of array and none of those elements have
been initialized, this invokes undefined behavior at each iteration.

I don't think so. The statement
*p++ = 46;
assigns a value to the array element and increments p (the pointer,
not the array element).
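
The parsing can be checked with a tiny sketch (store_and_advance is a hypothetical helper). Postfix ++ binds tighter than unary *, so *p++ = 46 means *(p++) = 46: the store goes through the old value of p, nothing is read from the array, and only the pointer is incremented.

```c
/* Store v through p, then return the advanced pointer. */
unsigned char *store_and_advance(unsigned char *p, unsigned char v)
{
    *p++ = v;   /* parsed as *(p++) = v: write to the old p, then p moves on */
    return p;
}
```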

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>

Jun 22 '06 #28

Barry Schwarz

On Thu, 22 Jun 2006 01:59:24 GMT, Keith Thompson <ks***@mib.org>
wrote:

Barry Schwarz <sc******@doezl.net> writes:
On Wed, 21 Jun 2006 19:43:56 GMT, Keith Thompson <ks***@mib.org>
wrote:
Frederick Gotham <fg*******@SPAM.com> writes:
[...]
So I'm curious, how would it deal with the following code?
int main(void)
{
    unsigned char array[50];
    const unsigned char * const p_over =
        array + sizeof(array) / sizeof(*array);

    const unsigned char* p = array;

    do
    {
        *p++ = 46;
    } while ( p != p_over );
}

Probably by printing a compile-time error message. You declare p as a
pointer to const unsigned char, then attempt to modify what it points
to.

If you drop the "const" in the declaration of p, it will simply set
each element of array to 46 on any conforming implementation --
assuming the entire program isn't optimized away because it produces
no output.

Since p points to an element of array and none of those elements have
been initialized, this invokes undefined behavior at each iteration.

I don't think so. The statement
*p++ = 46;
assigns a value to the array element and increments p (the pointer,
not the array element).

You're right. For some reason I read it as +=. I think I'll blame my
glasses.
Remove del for email

Jun 22 '06 #29

Keith Thompson posted:

Probably by printing a compile-time error message. You declare p as a
pointer to const unsigned char, then attempt to modify what it points
to.

I rarely check over my code before posting to Usenet, so you can expect
to see the odd error or oversight here and there. More to the point, why
shift the focus away from the topic at hand, impertinently focusing the
attention on a misplaced "const"? It makes the exchange less enjoyable.

If you drop the "const" in the declaration of p, it will simply set
each element of array to 46 on any conforming implementation --
assuming the entire program isn't optimized away because it produces
no output.

Again, nothing to do with the actual topic at hand.

I haven't tried it on the Cray, and I don't see the point of the
question. (The standard explicitly says that unsigned char has no
padding bits, but even if it did, the program doesn't have anything to
do with that issue.)

Here's what I was getting at:

If the machine in question can only do floating-point arithmetic, and
if it achieves integral arithmetic via usage of padding bits... then how
could it possibly do arithmetic on an unsigned char (because an unsigned
char is guaranteed to be absent of padding bits)?
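
One point the language itself settles, whatever the hardware does: arithmetic is never carried out in unsigned char. The integer promotions convert both operands to int (or to unsigned int if int cannot hold every unsigned char value) before the addition. This sketch, with hypothetical function names, only demonstrates the promotion; it says nothing about how the Cray implements integer arithmetic.

```c
#include <stddef.h>

/* Size of the type in which unsigned char + unsigned char is
   evaluated: the promoted type, not unsigned char. */
size_t size_of_uchar_sum(void)
{
    unsigned char a = 200, b = 100;
    return sizeof(a + b);   /* sizeof does not evaluate a + b, only types it */
}

/* The sum is computed in the promoted type, so it is not reduced
   modulo UCHAR_MAX + 1. */
int uchar_sum(void)
{
    unsigned char a = 200, b = 100;
    return a + b;
}
```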

Jun 22 '06 #30

CBFalconer

Frederick Gotham wrote:

.... snip ...
I rarely check over my code before posting to Usenet, so you can
expect to see the odd error or oversight here and there. More to
the point, why shift the focus away from the topic at hand,
impertinently focusing the attention on a misplaced "const"? It
makes the exchange less enjoyable.

Because, around here, we value accuracy, and we don't suffer the
goofs to live. Maybe you should do more checking before posting.
I don't think any regular has failed to be corrected at times. It
does tend to sharpen the critical faculties.


Jun 22 '06 #31

CBFalconer posted:

Frederick Gotham wrote:

... snip ...

I rarely check over my code before posting to Usenet, so you can
expect to see the odd error or oversight here and there. More to
the point, why shift the focus away from the topic at hand,
impertinently focusing the attention on a misplaced "const"? It
makes the exchange less enjoyable.

Because, around here, we value accuracy, and we don't suffer the
goofs to live. Maybe you should do more checking before posting.
I don't think any regular has failed to be corrected at times. It
does tend to sharpen the critical faculties.

I have no problem with a simple correction:

"Hey, you've a misplaced const there".

But why proceed to center the post around an obvious oversight? If anything
it just demonstrates the poster's lack of proficiency to notice that it was
simply an oversight.

Jun 22 '06 #32

Richard Heathfield

Frederick Gotham said:

I have no problem with a simple correction:

"Hey, you've a misplaced const there".

But why proceed to center the post around an obvious oversight?

To get you out of the habit of posting obvious oversights, maybe? There is a
strong case to be made out for /only/ pointing out the /first/ error in any
code.

If
anything it just demonstrates the poster's lack of proficiency to notice
that it was simply an oversight.

That's the second time at least that you've attacked proficient C
programmers for criticising your C code's incorrectness. That isn't a
bright direction to be heading in, in my opinion.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jun 22 '06 #33

Frederick Gotham <fg*******@SPAM.com> writes:

CBFalconer posted:
Frederick Gotham wrote:

... snip ...

I rarely check over my code before posting to Usenet, so you can
expect to see the odd error or oversight here and there. More to
the point, why shift the focus away from the topic at hand,
impertinently focusing the attention on a misplaced "const"? It
makes the exchange less enjoyable.

Because, around here, we value accuracy, and we don't suffer the
goofs to live. Maybe you should do more checking before posting.
I don't think any regular has failed to be corrected at times. It
does tend to sharpen the critical faculties.

I have no problem with a simple correction:

"Hey, you've a misplaced const there".

But why proceed to center the post around an obvious oversight? If anything
it just demonstrates the poster's lack of proficiency to notice that it was
simply an oversight.

I certainly did not "center the post around an obvious oversight". I
pointed it out, then answered your question assuming the "const" is
deleted.

As I said in my previous response, I honestly didn't know what the
point of the program was. Next time, please check your code before
posting it, and if you're making a point, please state it rather than
assuming it's clear from the code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>

Jun 22 '06 #34