memcpy() and endianness

Case

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

Nov 14 '05 #1


Lew Pitcher

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Case wrote:

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);
First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)
2) the standard doesn't specify how an integer is to map into a character array.
It doesn't specify a particular endianness for integers.

}

/* Thanks for listening! Case */

- --
Lew Pitcher
IT Consultant, Enterprise Application Architecture,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)

iD8DBQFAn5YjagVFX4UWr64RAgNYAKCGonjwOnfElYsZrCbrxp SzMS+rdgCg0oeE
3mzpLbH2n9S6Pv2gfAIfvTs=
=hmVd
-----END PGP SIGNATURE-----

Nov 14 '05 #2

Martin Dickopp

Case <no@no.no> writes:

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule on how the bits have to be
arranged.

For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possible ways to arrange the bits. Three are so popular among
implementors that they have special names: little, big, and mixed
endian. The remaining 263130836933693530167218012159999997 don't have
any endianness.

Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/

Nov 14 '05 #3

Case

Lew Pitcher wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Case wrote:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.

Yes, I know. That's why I said i is '4-byte == 4-char'.

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)

It's 4 as I said (see above). And, doesn't the C standard say that
'global' data (as i is) is initialized to 0?!
2) the standard doesn't specify how an integer is to map into a character array.
It doesn't specify a particular endianness for integers.

Nov 14 '05 #4

Case

Martin Dickopp wrote:

Case <no@no.no> writes:

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule on how the bits have to be
arranged.

For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possible ways to arrange the bits. Three are so popular among
implementors that they have special names: little, big, and mixed
endian. The remaining 263130836933693530167218012159999997 don't have
any endianness.

Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have, given the code above? By value I
mean a number at C level, not implementation level.

Nov 14 '05 #5

Case

Lew Pitcher wrote:

Case wrote:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)
2) the standard doesn't specify how an integer is to map into a character array.
It doesn't specify a particular endianness for integers.

In terms of implementation, what mappings are common?

Nov 14 '05 #6

CBFalconer

Case wrote:

Lew Pitcher wrote:
Case wrote:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

First off, sizeof(i) may not be equal to 4. So, this may or may
not do what you expect it to do.

Yes, I know. That's why I said i is '4-byte == 4-char'.

.... snip ...

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known
quantity. If sizeof(i) is greater than 4, then you didn't set
i's storage completely, and if sizeof(i) is less than 4, then
some of your initialization was not used to set i (and
overwrote something else instead)

It's 4 as I said (see above). And, doesn't the C standard say
that 'global' data (as i is) is initialized to 0?!

The fact that 'you said' doesn't make it so. The initialization
doesn't matter, because you have no idea what bits or bytes belong
where. This newsgroup deals only with portable standard C, so
your particular platform is of no interest whatsoever.

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - Bush.
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.

Nov 14 '05 #7

Eric Sosman

Case wrote:

[code setting the bytes of a four-byte `int' to:]
char data[] = { 0x78, 0x56, 0x34, 0x12 };

In terms of implementation, what mappings are common?

"Big-Endian:" the value is 0x78563412

"Little-Endian:" the value is 0x12345678

"Middle-Endian:" the value is 0x56781234

Other formats are possible, of course, and permitted by the
C Standard. Also, the latest C99 Standard permits an `int' to
have "trap representations" somewhat like an IEEE signalling NaN:
some arrangements of bits may signify "erroneous data" rather than
encoding a numeric value. It's at least possible that storing
these four bytes in an integer could produce such a result.

For what it's worth, I've never encountered a machine that
used trap representations in integers or that used an "endian"
arrangement other than the three listed above. YMMV.

--
Er*********@sun.com

Nov 14 '05 #8

Martin Dickopp

Case <no@no.no> writes:

Martin Dickopp wrote:
Case <no@no.no> writes:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule on how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possible ways to arrange the bits. Three are so popular among
implementors that they have special names: little, big, and mixed
endian. The remaining 263130836933693530167218012159999997 don't have
any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have, given the code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.
By value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/

Nov 14 '05 #9

Christian Bau

In article <40*********************@news.xs4all.nl>, Case <no@no.no>
wrote:

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

Nothing.

Nov 14 '05 #10

Malcolm

"Case" <no@no.no> wrote in message

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

How many different values can i have, given the code above? By
value I mean a number at C level, not implementation level.

In terms of existing implementations, probably about a dozen. Usually
numbers will be big- or little- endian and in two's complement notation, so
for practical purposes the answer is two. However you could run into
non-two's complement machines, machines where there are 9 bits in a byte,
and all sorts of other wonderful variations.

Nov 14 '05 #11

Stephen L.

Christian Bau wrote:

In article <40*********************@news.xs4all.nl>, Case <no@no.no>
wrote:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

Nothing.

I agree.

I believe what is missing in all of the
discussions is what endianness _is_.

In simple terms, it is the relationship between the CPU
and its memory. The above code example will, on any
architecture/platform it's run on, ALWAYS do the
following (assuming sizeof (int) == 4 for sake of argument):

*((char *)(&i) + 0) = data[ 0 ];
*((char *)(&i) + 1) = data[ 1 ];
*((char *)(&i) + 2) = data[ 2 ];
*((char *)(&i) + 3) = data[ 3 ];

However, how the CPU interprets the bits now contained
in the variable "i" is where the concept of its endianness
comes in. An Intel CPU will see the ordering of the
bits _differently_ than a SPARC CPU (or a 68040, etc.).

The code snippet will produce identical results _in
memory_ on all architectures where the sizeof (int) is four,
however, there is nothing to say that each architecture
will interpret the arrangement of the bits in the same way.

See man htonl(), etc. for more details.
HTH...

Stephen

Nov 14 '05 #12

Lew Pitcher

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Case wrote:
| Lew Pitcher wrote:
|
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate that
sizeof(int) == 4

|>
|>> /*
|>> * Thinking about endianness, what can be said about
|>> * the value of i according to the C-spec?
|>> */
|>
|>
|> Nothing can be said about the value of i.
|> 1) you may or may not have set the value of i to a known quantity. If
|> sizeof(i) is greater than 4, then you didn't set i's storage completely,
|> and if sizeof(i) is less than 4, then some of your initialization was
|> not used to set i (and overwrote something else instead)
|
|
| It's 4 as I said (see above).

See above. It's not 4 on your word.

| And, doesn't the C standard say that
| 'global' data (as i is) is initialized to 0?!

So? We're not talking about /before/ you memcpy(). We're talking about /after/
you memcpy()

Think of it this way. If, unlike you, your compiler believes that
sizeof(int) == 2, then your memcpy() of 4 bytes over a 2-byte int just wiped
out two additional bytes somewhere. Your int only holds the first two bytes of
the 4 byte array that you used to init with, and that value might be
interpreted /either/ in big-endian /or/ little-endian format.

OTOH, if (unlike you) your compiler believes that sizeof(int) == 8, then your
memcpy() of 4 bytes over an 8-byte int only placed data into four of the eight
bytes. The other four bytes are not touched. So, we now have an int in which
four bytes are known quantities, but that can be interpreted in one of 8! ways
(big-endian and little-endian being two of those ways). So, even knowing the 4
bytes (and by inference from the rules, all 8 bytes) we can't tell what the
value of your int is.

|> 2) the standard doesn't specify how an integer is to map into a
|> character array.
|> It doesn't specify a particular endianness for integers.
|
|
- --
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAoEBOagVFX4UWr64RAmnTAKDaJ1lt0cW8WHF753pjcG WQHMHChACbBSsD
miBERGc25WSOMfhSWfdQi28=
=woxR
-----END PGP SIGNATURE-----

Nov 14 '05 #13

Case

Lew Pitcher wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Case wrote:
| Lew Pitcher wrote:
|
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate
that sizeof(int) == 4

Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

Assuming my question is clear now, how should I have coded my example
unambiguously (without the use of comments)?

|>
|>> /*
|>> * Thinking about endianness, what can be said about
|>> * the value of i according to the C-spec?
|>> */
|>
|>
|> Nothing can be said about the value of i.
|> 1) you may or may not have set the value of i to a known quantity. If
|> sizeof(i) is greater than 4, then you didn't set i's storage completely,
|> and if sizeof(i) is less than 4, then some of your initialization was
|> not used to set i (and overwrote something else instead)
|
|
| It's 4 as I said (see above).

See above. It's not 4 on your word.

| And, doesn't the C standard say that
| 'global' data (as i is) is initialized to 0?!

So? We're not talking about /before/ you memcpy(). We're talking about
/after/ you memcpy()

Think of it this way. If, unlike you, your compiler believes that
sizeof(int) == 2, then your memcpy() of 4 bytes over a 2-byte int just
wiped out two additional bytes somewhere. Your int only holds the first
two bytes of the 4 byte array that you used to init with, and that value
might be interpreted /either/ in big-endian /or/ little-endian format.

OTOH, if (unlike you) your compiler believes that sizeof(int) == 8,
then your memcpy() of 4 bytes over an 8-byte int only placed data into
four of the eight bytes. The other four bytes are not touched. So, we
now have an int in which four bytes are known quantities, but that can
be interpreted in one of 8! ways (big-endian and little-endian being two
of those ways). So, even knowing the 4 bytes (and by inference from the
rules, all 8 bytes) we can't tell what the value of your int is.

|> 2) the standard doesn't specify how an integer is to map into a
|> character array.
|> It doesn't specify a particular endianness for integers.
|
|
- --
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAoEBOagVFX4UWr64RAmnTAKDaJ1lt0cW8WHF753pjcG WQHMHChACbBSsD
miBERGc25WSOMfhSWfdQi28=
=woxR
-----END PGP SIGNATURE-----

Nov 14 '05 #14

Case

Martin Dickopp wrote:

Case <no@no.no> writes:

Martin Dickopp wrote:
Case <no@no.no> writes:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule on how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possible ways to arrange the bits. Three are so popular among
implementors that they have special names: little, big, and mixed
endian. The remaining 263130836933693530167218012159999997 don't have
any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.
How many different values can i have, given the code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.

So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?

By value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".

At "C level" the bits have a fixed position, for example 0x00000001
can be used to get the least significant bit (bit 0) of a 4-byte int;
at implementation level there are (as I understand it from you) 32
possible positions for this bit.

Nov 14 '05 #15

Richard Bos

Case <no@no.no> wrote:

Lew Pitcher wrote:
Case wrote:
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate
that sizeof(int) == 4

Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

No. It is _explicitly_ defined to be one byte by the Standard.

Richard

[ BTW, please learn to snip. ]

Nov 14 '05 #16

Martin Dickopp

Case <no@no.no> writes:

BTW, why doesn't anyone question the sizeof char in my example? Is
char perhaps *silently* assumed to be a byte?

Yes, `char' *always* has a size of one byte, so `sizeof(char) == 1' is
always true. However, a byte can have more than 8 bits.

Note that my other answer to you in this thread deals with the special
case that seems to apply to your implementation: 8 bit bytes, 4 byte
`int's with no padding bits.

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/

Nov 14 '05 #17

Case

Richard Bos wrote:
....snip...

[ BTW, please learn to snip. ]

Thanks for the info about char size.

Kees

Nov 14 '05 #18

Martin Dickopp

Case <no@no.no> writes:

Martin Dickopp wrote:
Case <no@no.no> writes:
Martin Dickopp wrote:

Case <no@no.no> writes:
>#include <string.h>
>
>int i; /* 4-byte == 4-char */
>char data[] = { 0x78, 0x56, 0x34, 0x12 };
>
>int main()
>{
> memcpy(&i, data, 4);
>
> /*
> * Thinking about endianness, what can be said about
> * the value of i according to the C-spec?
> */
>}
>
>/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule on how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possible ways to arrange the bits. Three are so popular among
implementors that they have special names: little, big, and mixed
endian. The remaining 263130836933693530167218012159999997 don't have
any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have, given the code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.

So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?

Yes. Although in reality, I have never seen a machine which didn't
either use big endian, little endian, or mixed endian bit order, the
C standard certainly allows others.

By value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".

At "C level" the bits have a fixed position, for example 0x00000001
can be used to get the least significant bit (bit 0) of a 4-byte int;
at implementation level there are (as I understand it from you) 32
possible positions for this bit.

I see. These are usually referred to as "value" and "representation",
respectively. Note that the `memcpy' call sets the /representation/
of `i'.

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/

Nov 14 '05 #19

Sam Dennis

Richard Bos wrote:

Case <no@no.no> wrote:
Is char perhaps *silently* assumed to be a byte?

No. It is _explicitly_ defined to be one byte by the Standard.

<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.

--
++acr@,ka"

Nov 14 '05 #20

Kevin Bracey

In message <40**************@sun.com>
Eric Sosman <er*********@sun.com> wrote:

Other formats are possible, of course, and permitted by the
C Standard. Also, the latest C99 Standard permits an `int' to
have "trap representations" somewhat like an IEEE signalling NaN:
some arrangements of bits may signify "erroneous data" rather than
encoding a numeric value. It's at least possible that storing
these four bytes in an integer could produce such a result.

For what it's worth, I've never encountered a machine that
used trap representations in integers or that used an "endian"
arrangement other than the three listed above. YMMV.

I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1223 503458
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

Nov 14 '05 #21

Dan Pop

In <2f****************@tematic.com> Kevin Bracey <ke**********@tematic.com> writes:

I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.

And what happens in your implementation when a trap representation is
evaluated?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #22

James Kanze

"Malcolm" <ma*****@55bank.freeserve.co.uk> writes:

|> "Case" <no@no.no> wrote in message
|> > >>#include <string.h>

|> > >>int i; /* 4-byte == 4-char */
|> > >>char data[] = { 0x78, 0x56, 0x34, 0x12 };

|> > >>int main()
|> > >>{
|> > >> memcpy(&i, data, 4);

|> > >> /*
|> > >> * Thinking about endianness, what can be said about
|> > >> * the value of i according to the C-spec?
|> > >> */
|> > >>}

|> > >>/* Thanks for listening! Case */

|> > How many different values can i have, given the code above? By value
|> > I mean a number at C level, not implementation level.

|> In terms of existing implementations, probably about a dozen.
|> Usually numbers will be big- or little- endian and in two's
|> complement notation, so for practical purposes the answer is two.
|> However you could run into non-two's complement machines, machines
|> where there are 9 bits in a byte, and all sorts of other wonderful
|> variations.

Why be so exotic? I've used machines on which int was 16 bits, so the
memcpy becomes undefined behavior, and anything is possible.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Nov 14 '05 #23

Joe Wright

Sam Dennis wrote:

Richard Bos wrote:
Case <no@no.no> wrote:
Is char perhaps *silently* assumed to be a byte?
No. It is _explicitly_ defined to be one byte by the Standard.

<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

So, if a byte is an octet, is a nybble a quartet?

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.

I'm sorry. I just couldn't stop myself. :-)

--
Joe Wright mailto:jo********@comcast.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Nov 14 '05 #24

Christian Bau

In article <40*********************@news.xs4all.nl>, Case <no@no.no>
wrote:

Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

sizeof (char) is always equal to one. A char is a byte. However, a byte
is _not_ an octet; a byte could have more than eight bits. And there are
C compilers where a char is 32 bits, and sizeof (int) is one.

(Eight bits are called an octet. Whatever number of bits you need for a
char is called a byte. C requires that a byte has at least eight bits,
so a byte is at least as large as an octet, but never less).

Nov 14 '05 #25

Kevin Bracey

In message <c7**********@sunnews.cern.ch>
Da*****@cern.ch (Dan Pop) wrote:

In <2f****************@tematic.com> Kevin Bracey <ke**********@tematic.com> writes:
I think trap representations for C99 _Bools are likely, at least. I
suspect that may have been one of the motivations for adding them. My
implementation has _Bool having the same representation as an unsigned
char, with any contents other than 0x00 or 0x01 being a trap
representation.

And what happens in your implementation when a trap representation is
evaluated?

Well, you'd get undefined behaviour, basically. Here's an example of what
could happen (off the top of my head):

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b, b2, b3, b4;
    unsigned char c;
    int i;

    struct { _Bool b:1; } s;

    c = 2;
    memcpy(&b, &c, 1);  /* on this implementation, a trap representation */

    i = b;
    printf("i = %d\n", i);

    s.b = b;
    printf("s.b = %d\n", s.b);

    b2 = b;
    printf("b2 = %d\n", b2);

    b3 = i;
    printf("b3 = %d\n", b3);

    b4 = !b;
    printf("b4 = %d\n", b4);

    return 0;
}

This would output:

i = 2
s.b = 0
b2 = 2
b3 = 1
b4 = 3

The knowledge that a bool "cannot" have a value other than 0 or 1 is used to
eliminate the "!= 0" test that would otherwise be inserted, as illustrated
by b2 and b3 there.

Internally, this is handled by having a hidden "boolean" attribute of a type
that indicates that its value is known to be 0 or 1. For example, the
expressions "!x" and "x && y" have type "boolean int".

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1223 503458
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

Nov 14 '05 #26

Dan Pop

In <0b****************@tematic.com> Kevin Bracey <ke**********@tematic.com> writes:

In message <c7**********@sunnews.cern.ch>
Da*****@cern.ch (Dan Pop) wrote:
In <2f****************@tematic.com> Kevin Bracey <ke**********@tematic.com> writes:
> I think trap representations for C99 _Bools are likely, at least. I
> suspect that may have been one of the motivations for adding them. My
> implementation has _Bool having the same representation as an unsigned
> char, with any contents other than 0x00 or 0x01 being a trap
> representation.

And what happens in your implementation when a trap representation is
evaluated?

Well, you'd get undefined behaviour, basically.

There is no such thing where a *concrete* implementation is concerned.
Even if not documented, the behaviour is defined by the code generated
by the compiler (and if the compiler generates random garbage, by the
algorithm used to generate it and by the way the processor handles it).

Thanks for the example: it's a reasonable optimisation.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #27
