memcpy() and endianness

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

Nov 14 '05

"Case" <no@no.no> wrote in message
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */


How many different values can i have given code above? With
value I mean a number at C level, not implementation level.

In terms of existing implementations, probably about a dozen. Usually
numbers will be big- or little- endian and in two's complement notation, so
for practical purposes the answer is two. However you could run into
non-two's complement machines, machines where there are 9 bits in a byte,
and all sorts of other wonderful variations.
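As a rough sketch of the "two for practical purposes" case: assuming 8-bit bytes and a 32-bit two's complement int, the two common byte orders would read the thread's four bytes as follows. The helper names are made up for illustration, and the shifts work on values, so the result does not depend on the machine that runs them.

```c
/* Hypothetical helpers: combine four 8-bit bytes in a fixed, stated order.
   Shifts operate on values, so the host's own byte order does not matter. */
static unsigned long from_little_endian(const unsigned char b[4])
{
    return  (unsigned long)b[0]
         | ((unsigned long)b[1] << 8)
         | ((unsigned long)b[2] << 16)
         | ((unsigned long)b[3] << 24);
}

static unsigned long from_big_endian(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24)
         | ((unsigned long)b[1] << 16)
         | ((unsigned long)b[2] << 8)
         |  (unsigned long)b[3];
}
```

With the bytes 0x78, 0x56, 0x34, 0x12, the first helper gives 0x12345678 and the second gives 0x78563412.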
Nov 14 '05 #11
Christian Bau wrote:

In article <40*********************@news.xs4all.nl>, Case <no@no.no>
wrote:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}


Nothing.


I agree.

I believe what is missing in all of the
discussions is what endianness _is_.

In simple terms, it is the relationship between the CPU
and its memory. The above code example will, on any
architecture/platform it's run on, ALWAYS do the
following (assuming sizeof (int) == 4 for sake of argument):

*((char *)(&i) + 0) = data[ 0 ];
*((char *)(&i) + 1) = data[ 1 ];
*((char *)(&i) + 2) = data[ 2 ];
*((char *)(&i) + 3) = data[ 3 ];

However, how the CPU interprets the bits now contained
in the variable "i" is where the concept of its endianness
comes in. An Intel CPU will see the ordering of the
bits _differently_ than a SPARC CPU (or a 68040, etc.).

The code snippet will produce identical results _in
memory_ on all architectures where the sizeof (int) is four,
however, there is nothing to say that each architecture
will interpret the arrangement of the bits in the same way.
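The "identical in memory" part can be checked without ever reading the int as an int. The following is only a sketch; copy_bytes and same_representation are names invented here, not anything from the standard library.

```c
#include <string.h>

/* Hypothetical helper: copy n bytes one at a time, like the expansion above. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t k;

    for (k = 0; k < n; k++)
        d[k] = s[k];
}

/* Returns 1 if the n bytes stored at obj equal the n bytes stored at bytes. */
static int same_representation(const void *obj, const void *bytes, size_t n)
{
    return memcmp(obj, bytes, n) == 0;
}
```

After copy_bytes(&i, data, 4) or memcpy(&i, data, 4), same_representation(&i, data, 4) is 1 on any platform where sizeof (int) is at least 4. What number those bytes stand for is a separate question.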

See man htonl(), etc. for more details.
HTH...

Stephen
Nov 14 '05 #12
Case wrote:
| Lew Pitcher wrote:
|
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you
|> expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate that
sizeof(int) == 4

|>
|>> /*
|>> * Thinking about endianness, what can be said about
|>> * the value of i according to the C-spec?
|>> */
|>
|>
|> Nothing can be said about the value of i.
|> 1) you may or may not have set the value of i to a known quantity. If
|> sizeof(i)
|> is greater than 4, then you didn't set i's storage completely, and if
|> sizeof(i)
|> is less than 4, then some of your initialization was not used to set i
|> (and
|> overwrote something else instead)
|
|
| It's 4 as I said (see above).

See above. It's not 4 on your word.

| And, doesn't the C standard say that
| 'global' data (as i is) is initialized to 0?!

So? We're not talking about /before/ you memcpy(). We're talking about /after/
you memcpy()

Think of it this way. If, unlike you, your compiler believes that
sizeof(int) == 2, then your memcpy() of 4 bytes over a 2-byte int just wiped
out two additional bytes somewhere. Your int only holds the first two bytes of
the 4 byte array that you used to init with, and that value might be
interpreted /either/ in big-endian /or/ little-endian format.

OTOH, if (unlike you) your compiler believes that sizeof(int) == 8, then your
memcpy() of 4 bytes over an 8-byte int only placed data into four of the eight
bytes. The other four bytes are not touched. So, we now have an int in which
four bytes are known quantities, but that can be interpreted in one of 8! ways
(big-endian and little-endian being two of those ways). So, even knowing the 4
bytes (and by inference from the rules, all 8 bytes) we can't tell what the
value of your int is.

|> 2) the standard doesn't specify how an integer is to map into a
|> character array.
|> It doesn't specify a particular endianness for integers.
|
|
- --
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
Nov 14 '05 #13
Lew Pitcher wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Case wrote:
| Lew Pitcher wrote:
|
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you
|> expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate that
sizeof(int) == 4

Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

Assuming my question is clear now, how should I have coded my example
unambiguously (without the use of comments)?
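One possible way to put the assumption into the code itself, offered as a sketch rather than the canonical answer: let the compiler reject the program when the assumption fails, and size the copy from the object instead of a literal 4. The typedef name and load_int are invented for this example.

```c
#include <limits.h>
#include <string.h>

/* Compile-time check (pre-C11 idiom): the array size is -1, and compilation
   fails, unless int is exactly four 8-bit bytes on this implementation. */
typedef char int_is_four_octets[(sizeof(int) == 4 && CHAR_BIT == 8) ? 1 : -1];

static const unsigned char data[sizeof(int)] = { 0x78, 0x56, 0x34, 0x12 };

/* Hypothetical helper: copy exactly sizeof(int) bytes, never more or fewer. */
static int load_int(void)
{
    int i;

    memcpy(&i, data, sizeof i);
    return i;
}
```

This only settles the size question. The value load_int() returns is still whatever the implementation makes of those four bytes.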
Nov 14 '05 #14
Martin Dickopp wrote:
Case <no@no.no> writes:

Martin Dickopp wrote:
Case <no@no.no> writes:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.
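The arithmetic can be reproduced with a short sketch. It assumes a C99 compiler for unsigned long long, because the intermediate product exceeds 32 bits; count_ones and choose are names picked for this example.

```c
/* Count the one-bits in the low 8 bits of b. */
static int count_ones(unsigned int b)
{
    int n = 0;

    for (b &= 0xFFu; b != 0; b >>= 1)
        n += (int)(b & 1u);
    return n;
}

/* Binomial coefficient "n choose k". After step j the running value is
   C(n - k + j, j), so every division is exact. Assumes no overflow. */
static unsigned long long choose(unsigned int n, unsigned int k)
{
    unsigned long long r = 1;
    unsigned int j;

    for (j = 1; j <= k; j++)
        r = r * (n - k + j) / j;
    return r;
}
```

The four bytes 0x78, 0x56, 0x34 and 0x12 contribute 4, 4, 3 and 2 one-bits, 13 in total, and choose(32, 13) evaluates to 347373600.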


So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?

With value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".


At "C level" the bits have a fixed position, for example 0x00000001
can be used to get least significant bit (bit 0) of a 4 byte int;
at implementation level there are (as I understand it from you) 32
possible positions this bit could be.

Nov 14 '05 #15
Case <no@no.no> wrote:
Lew Pitcher wrote:
Case wrote:
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate
that sizeof(int) == 4


Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?


No. It is _explicitly_ defined to be one byte by the Standard.

Richard

[ BTW, please learn to snip. ]
Nov 14 '05 #16
Case <no@no.no> writes:
BTW, why doesn't anyone question the sizeof char in my example? Is
char perhaps *silently* assumed to be a byte?


Yes, `char' *always* has a size of one byte, so `sizeof(char) == 1' is
always true. However, a byte can have more than 8 bits.
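In code, the distinction shows up through CHAR_BIT from <limits.h>, which is the number of bits in a byte and is at least 8. A small sketch, with bits_in as a made-up helper name:

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in the object representation of an object whose sizeof
   is size_in_bytes. sizeof counts bytes; CHAR_BIT gives the bits per byte. */
static size_t bits_in(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

bits_in(sizeof(char)) equals CHAR_BIT everywhere, while bits_in(sizeof(int)) is 32 only on implementations like the one assumed in this thread.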

Note that my other answer to you in this thread deals with the special
case that seems to apply to your implementation: 8 bit bytes, 4 byte
`int's with no padding bits.

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/
Nov 14 '05 #17
Richard Bos wrote:
....snip...
[ BTW, please learn to snip. ]


Thanks for the info about char size.

Kees

Nov 14 '05 #18
Case <no@no.no> writes:
Martin Dickopp wrote:
Case <no@no.no> writes:
Martin Dickopp wrote:

Case <no@no.no> writes:
>#include <string.h>
>
>int i; /* 4-byte == 4-char */
>char data[] = { 0x78, 0x56, 0x34, 0x12 };
>
>int main()
>{
> memcpy(&i, data, 4);
>
> /*
> * Thinking about endianness, what can be said about
> * the value of i according to the C-spec?
> */
>}
>
>/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.


So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?


Yes. Although in reality, I have never seen a machine which didn't
either use big endian, little endian, or mixed endian bit order, the
C standard certainly allows others.
With value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".


At "C level" the bits have a fixed position, for example 0x00000001
can be used to get least significant bit (bit 0) of a 4 byte int;
at implementation level there are (as I understand it from you) 32
possible positions this bit could be.


I see. These are usually referred to as "value" and "representation",
respectively. Note that the `memcpy' call sets the /representation/
of `i'.
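A small sketch of the two levels side by side, assuming unsigned int has no padding bits (the function names are invented for this example): masking works on the value, while looking at the bytes through unsigned char shows the representation.

```c
#include <string.h>

/* Value level: the least significant bit, whatever the byte order in memory. */
static int low_bit(unsigned int v)
{
    return (int)(v & 1u);
}

/* Representation level: index of the first nonzero byte in the stored form of
   the value 1. Assuming no padding bits, that byte holds the low bit: index 0
   on a little-endian machine, sizeof(unsigned int) - 1 on a big-endian one. */
static size_t byte_holding_low_bit(void)
{
    unsigned int one = 1u;
    unsigned char rep[sizeof one];
    size_t k;

    memcpy(rep, &one, sizeof one);
    for (k = 0; k < sizeof one; k++)
        if (rep[k] != 0)
            return k;
    return sizeof one; /* not expected: the value 1 has a nonzero value bit */
}
```

low_bit gives the same answer on every implementation; byte_holding_low_bit is the part that varies.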

Martin
--
,--. Martin Dickopp, Dresden, Germany ,= ,-_-. =.
/ ,- ) http://www.zero-based.org/ ((_/)o o(\_))
\ `-' `-'(. .)`-'
`-. Debian, a variant of the GNU operating system. \_/
Nov 14 '05 #19
Richard Bos wrote:
Case <no@no.no> wrote:
Is char perhaps *silently* assumed to be a byte?


No. It is _explicitly_ defined to be one byte by the Standard.


<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.

--
++acr@,ka"
Nov 14 '05 #20
