
Access individual bytes of a 4 byte long (optimization)

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >>8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}
My feeling is that Version C is best.

What can be said about the alignment of array f_b and mylong in
Version B?
(I think that in Version B the array f_b might not be aligned the same
way as mylong, in which case it is slower than C. If f_b and mylong
are aligned in Version B, is Version B then identical to Version C?)

..
..
..

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

/**** Version A2 ******/
{
long mylong = -1;
unsigned char b0, b1, b2, b3;

b0 = (unsigned char) mylong;
b1 = (unsigned char) (mylong >>8);
b2 = (unsigned char) (mylong >>16);
b3 = (unsigned char) (mylong >>24);

// access: b0, b1, b2, b3
}

/**** Version B2 ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

// access: f_b[0], f_b[1], f_b[2], f_b[3]
}

/**** Version C2 ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_b[0], four.four_b[1], four.four_b[2],
//         four.four_b[3]
}

/**** Version D2 ******/
{
struct four_struct {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};

union align_array_and_long {
struct four_struct four_s;
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_s.byte0, four.four_s.byte1,
//         four.four_s.byte2, four.four_s.byte3

}

My feeling is that Version D2 is best: mylong is loaded into four in
one shot (no shifts etc. as in A2).

And in D2 the compiler always knows that we specify exactly which byte
we want:
four.four_s.byte0
This is different in C2: four.four_b[which_byte]
Or is it really different?
Are these two equivalent: four.four_s.byte0 <--> four.four_b[0] ???

..
..
..

Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given
platform.

Thanks.

anon.asdf

Aug 10 '07 #1
16 Replies


an*******@gmail.com wrote:
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.
>
> Is version A, version B, or version C better?

Measure them and find out.

> Now what if one needs to access the individual bytes the *whole time*?
> Is A2, B2, C2 or D2 faster?

Measure them and find out.

--
Chris "performance is nothing without measurement" Dollin

Hewlett-Packard Limited registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Aug 10 '07 #2

an*******@gmail.com wrote:
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.
>
> Is version A, version B, or version C better?

Mu.

Rule one of micro-optimisation:
Don't Do It.
Rule two of micro-optimisation (for experts only!):
Don't Do It Yet.
Rule three of micro-optimisation (only under duress):
Measure, Measure, Measure.

Unless you _know_ that it matters, assume that it doesn't, and write the
clearest code. If you think you do know that it matters, first gather
evidence. Only by measuring which is the fastest will you know which is
the fastest - on your machine, using your implementation, in your
project, under your optimisation settings. And don't be surprised to
find out that you were wrong, and the difference is no more than 0.5%,
with an error of 1%.

Richard
Aug 10 '07 #3
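
One way to act on the "measure" advice is a small timing harness. What follows is only a sketch under stated assumptions: the names sum_bytes_shift, sum_bytes_memcpy and time_candidate are made up for illustration, clock() is a coarse timer, and an optimizing compiler may transform both loops so heavily that the numbers say little about your real code. Summing the bytes gives both candidates the same checksum whatever the byte order, which makes it easy to confirm they do equivalent work.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate 1: pull the bytes out with shifts. */
unsigned sum_bytes_shift(unsigned long v)
{
    unsigned sum = 0;
    size_t i;
    for (i = 0; i < sizeof v; i++)
        sum += (unsigned)((v >> (i * CHAR_BIT)) & UCHAR_MAX);
    return sum;
}

/* Hypothetical candidate 2: copy the object into a byte array. */
unsigned sum_bytes_memcpy(unsigned long v)
{
    unsigned char b[sizeof v];
    unsigned sum = 0;
    size_t i;
    memcpy(b, &v, sizeof v);
    for (i = 0; i < sizeof v; i++)
        sum += b[i];
    return sum;
}

/* Run fn on the values 0..n-1 and return the CPU seconds used.
   The checksum is stored in *sink so the loop has an observable result. */
double time_candidate(unsigned (*fn)(unsigned long),
                      unsigned long n, unsigned *sink)
{
    unsigned acc = 0;
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < n; i++)
        acc += fn(i);
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_candidate once per candidate with the same n, compare the two checksums, and repeat the runs several times at the optimisation level you actually ship with before trusting any difference.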

an*******@gmail.com writes:
> Hi!
>
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.
>
> Is version A, version B, or version C better? Are there other
> alternatives?
>
> /**** Version A ******/
> {
> long mylong = -1;
>
> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
> (unsigned char) mylong , \
> (unsigned char) (mylong >>8), \
> (unsigned char) (mylong >>16), \
> (unsigned char) (mylong >>24));
> }
>
> /**** Version B ******/
> {
> long mylong = -1;
> unsigned char f_b[4];
>
> *((long *)&f_b) = mylong;
>
> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
> f_b[3]);
> }
>
> /**** Version C ******/
> {
> union align_array_and_long {
> unsigned char four_b[4];
> long dummy;
> };
>
> long mylong = -1;
> union align_array_and_long four;
>
> four = (union align_array_and_long) mylong;
>
> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
> four.four_b[0], \
> four.four_b[1], \
> four.four_b[2], \
> four.four_b[3]);
> }
> My feeling is that Version C is best.

For the fastest, try:

printf("0x%08lx\n", mylong); /* :-) */

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof long] */

There is no need to lie about having an array. Version C is very
likely to work, but the standard does not guarantee accesses to any
union member other than the last one assigned to (barring the special
exception for "common initial members").

Similar comments apply to your other code fragments.

--
Ben.
Aug 10 '07 #4
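
For readers who still want the union spelling, here is a hedged sketch. A cast to a union type is not standard C (GCC documents it as an extension), so the value is stored through a member instead. The names long_bytes and union_byte are invented for this example. Whether reading the byte array after writing the long is guaranteed was a matter of debate when this thread was written; to my understanding, later revisions of the standard describe it as reinterpreting the stored bytes, but check your own compiler's documentation before relying on it.

```c
#include <stddef.h>

/* Overlay a long with an array of its bytes. */
union long_bytes {
    long value;
    unsigned char bytes[sizeof(long)];
};

/* Return byte i (in memory order) of v; i must be < sizeof(long). */
unsigned char union_byte(long v, size_t i)
{
    union long_bytes u;
    u.value = v;          /* assign to a member; no cast to the union type */
    return u.bytes[i];    /* read the same storage as unsigned char */
}
```

Sizing the array with sizeof(long) rather than 4 keeps the overlay complete on platforms where long is eight bytes.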

an*******@gmail.com wrote:
> Hi!
>
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.

1) Your question is about your environment -- machine(s),
compiler(s), O/S(es), etc. -- and not about C. Seek a forum
where the experts on your environment hang out.

2) If "as fast as possible" is really your goal, you
should not be using C, nor even assembly. Custom-built
hardware is the way to go. Seek a forum where chip designers
hang out.

3) This is the second time in recent days that you've
given "I want" as the only reason for doing something. You
may not understand it yet, but the context of the "I want"
can often have a huge influence on the speed of whatever code
you wind up with. For example: Is this long just sitting
around in memory, or is it the result of a recent computation
and perhaps still available in a register? Seek a forum where
compiler experts hang out.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Aug 10 '07 #5

> For the fastest, try:
>
> printf("0x%08lx\n", mylong); /* :-) */
>
> Versions B and C, invoke undefined behaviour. The defined way to do
> version B is:
>
> void *vp = &mylong;
> unsigned char *cp = vp;
> /* now do what you want with cp[0] to cp[sizeof long] */

Very good comment, about using a pointer that way!!
Thanks!
anon.asdf

Aug 10 '07 #6

On Fri, 10 Aug 2007 03:35:38 -0700, anon.asdf wrote:
> Hi!
>
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.
>
> Is version A, version B, or version C better? Are there other
> alternatives?
>
> /**** Version A ******/
> {
> long mylong = -1;
>
> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
> (unsigned char) mylong , \
> (unsigned char) (mylong >>8), \
> (unsigned char) (mylong >>16), \
> (unsigned char) (mylong >>24));

Bitwise right shifts of negative integers are implementation-defined,
and that needn't have anything to do with endianness.

> }
>
> /**** Version B ******/
> {
> long mylong = -1;
> unsigned char f_b[4];
>
> *((long *)&f_b) = mylong;

#include <string.h>
memcpy(f_b, &mylong, 4);

This does the same thing you were trying to do, without the risk
of disasters if f_b doesn't happen to be correctly aligned for a
long.

> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
> f_b[3]);
> }
>
> /**** Version C ******/
> {
> union align_array_and_long {
> unsigned char four_b[4];
> long dummy;
> };
>
> long mylong = -1;
> union align_array_and_long four;
>
> four = (union align_array_and_long) mylong;

You meant four.dummy = -1? You can only cast to a scalar type,
which a union is not.
Also, accessing a member of a union other than the last one
written to is UB, so I think the compiler is allowed to optimize
away an assignment to four.dummy if its value is not used.

> printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
> four.four_b[0], \
> four.four_b[1], \
> four.four_b[2], \
> four.four_b[3]);
> }

[snip]

> Version A and A2 are portable in terms of endianness, but the
> question is not about portability - it's about optimization for a
> given platform.

Whoever implemented memcpy on your platform is likely to know better
than you do what is most efficient on that specific platform.
--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 10 '07 #7
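
The memcpy suggestion above can be sketched as a pair of helpers. The function names are hypothetical, and the size is taken from sizeof rather than hard-coded as 4, on the assumption that the caller's buffer holds sizeof(long) bytes. The order of the bytes in the buffer is whatever the platform uses in memory.

```c
#include <string.h>

/* Copy the object representation of v into out (sizeof(long) bytes). */
void long_to_bytes(long v, unsigned char *out)
{
    memcpy(out, &v, sizeof v);
}

/* Rebuild a long from bytes previously produced by long_to_bytes. */
long bytes_to_long(const unsigned char *in)
{
    long v;
    memcpy(&v, in, sizeof v);
    return v;
}
```

Because memcpy places no alignment requirement on either pointer, the byte buffer can live anywhere, which is the point made about Version B.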

In article <87************@bsb.me.uk>,
Ben Bacarisse <be********@bsb.me.uk> wrote:
> The defined way to do
> version B is:
>
> void *vp = &mylong;
> unsigned char *cp = vp;
> /* now do what you want with cp[0] to cp[sizeof long] */

Make that cp[sizeof long - 1]
--
Programming is what happens while you're busy making other plans.
Aug 10 '07 #8

ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
> In article <87************@bsb.me.uk>,
> Ben Bacarisse <be********@bsb.me.uk> wrote:
>> The defined way to do
>> version B is:
>>
>> void *vp = &mylong;
>> unsigned char *cp = vp;
>> /* now do what you want with cp[0] to cp[sizeof long] */
>
> Make that cp[sizeof long - 1]

Of course, thanks.

--
Ben.
Aug 10 '07 #9
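
With the corrected upper bound, the unsigned char pointer approach might look like the sketch below. Note that in real code sizeof needs parentheses when applied to a type name, so the last valid index is cp[sizeof(long) - 1], or equivalently cp[sizeof mylong - 1]. The helper name format_bytes is made up for this example, and it assumes 8-bit bytes so that each byte prints as two hex digits.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the n bytes of *obj as "0x12 0x34 ..." into out.
   Returns the number of characters written (excluding the '\0').
   Assumes 8-bit bytes, so every byte takes exactly 5 characters. */
size_t format_bytes(const void *obj, size_t n, char *out, size_t outsz)
{
    const unsigned char *cp = obj;   /* valid indices are cp[0] .. cp[n - 1] */
    size_t used = 0, i;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    /* each byte needs 5 characters plus room for the terminator */
    for (i = 0; i < n && used + 6 <= outsz; i++)
        used += (size_t)snprintf(out + used, outsz - used,
                                 "0x%02x ", (unsigned)cp[i]);
    return used;
}
```

A call such as format_bytes(&mylong, sizeof mylong, buf, sizeof buf) walks every byte of the object in memory order, however wide long happens to be.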

In article <11*********************@j4g2000prf.googlegroups.com>
<an*******@gmail.com> wrote:
> On a machine of *given architecture* ...

OK, I give you "MIPS" as the architecture (using the MIPS compilers).

> ... I want to access the individual bytes of a long (*once-off*)
> as fast as possible.

Oops, now you have to decide whether this is a 32-bit MIPS (ILP32
model) or a 64-bit MIPS (I32LP64 model -- i.e., long is eight 8-bit
bytes long).

> Is version A, version B, or version C better?

[where A is shift-and-mask, and B and C go through RAM]

On most compilers, version A will be *far* faster than almost
anything else. In fact, since your original code fragment had the
variable set to a constant, if you compile with optimization, the
four or eight extracted sub-parts will also be constants.

Interesting side note: if the architecture is changed to the original
DEC (now Compaq) Alpha, "byte" accesses to RAM are handled in the
compiler by doing full 8-byte machine-word accesses and then using
shift-and-mask instructions, because that is how the machine *has*
to do it. (There are special instructions like "zap" for working
with the eight 8-bit "byte fields" of a register, but loads and
stores are always full 64-bit operations.)

(The MIPS architecture is a lot more common though, as it is found
in various home gaming systems.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 10 '07 #10
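
The shift-and-mask approach can be written so that it does not care whether long is four or eight bytes. This is a minimal sketch with invented names (byte_of, byte_of_long); it converts to unsigned long before shifting, which sidesteps the implementation-defined right shift of negative values mentioned earlier in the thread, and it assumes unsigned long has no padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Return byte k of v, where k = 0 is the least significant byte.
   k must be < sizeof(unsigned long). */
unsigned char byte_of(unsigned long v, size_t k)
{
    return (unsigned char)((v >> (k * CHAR_BIT)) & UCHAR_MAX);
}

/* Same for a (possibly negative) long: convert first, then shift. */
unsigned char byte_of_long(long v, size_t k)
{
    return byte_of((unsigned long)v, k);
}
```

Here k counts by significance, not by memory address, so the result is the same on big-endian and little-endian machines. When the argument is a compile-time constant, an optimizing compiler is free to fold the whole expression, as the post above points out.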

an*******@gmail.com wrote:
>
> Hi!
>
> On a machine of *given architecture* (in terms of endianness etc.), I
> want to access the individual bytes of a long (*once-off*) as fast as
> possible.

/* BEGIN new.c */

#include <stdio.h>

int main (void)
{
long mylong = 0x12345678;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */
--
pete
Aug 10 '07 #11

In article <46**********@mindspring.com>, pete <pf*****@mindspring.com> wrote:
> an*******@gmail.com wrote:
>
> #include <stdio.h>
>
> int main (void)
> {
> long mylong = 0x12345678;
>
> printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
> ((unsigned char *)&mylong)[0],
> ((unsigned char *)&mylong)[1],
> ((unsigned char *)&mylong)[2],
> ((unsigned char *)&mylong)[3]);
> return 0;
> }

What if sizeof(long) > 4 ?
--
"It is important to remember that when it comes to law, computers
never make copies, only human beings make copies. Computers are given
commands, not permission. Only people can be given permission."
-- Brad Templeton
Aug 10 '07 #12

Walter Roberson wrote:
>
> In article <46**********@mindspring.com>, pete <pf*****@mindspring.com> wrote:
>> an*******@gmail.com wrote:
>> #include <stdio.h>
>> int main (void)
>> {
>> long mylong = 0x12345678;
>>
>> printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
>> ((unsigned char *)&mylong)[0],
>> ((unsigned char *)&mylong)[1],
>> ((unsigned char *)&mylong)[2],
>> ((unsigned char *)&mylong)[3]);
>> return 0;
>> }
>
> What if sizeof(long) > 4 ?

/* BEGIN new.c */

#include <stdio.h>
#include <assert.h>

int main (void)
{
long mylong = 0x12345678;

assert(sizeof(long) == 4);
printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
((unsigned char *)&mylong)[0],
((unsigned char *)&mylong)[1],
((unsigned char *)&mylong)[2],
((unsigned char *)&mylong)[3]);
return 0;
}

/* END new.c */

--
pete
Aug 10 '07 #13
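
If exactly four bytes are wanted, an alternative to asserting on sizeof(long) at run time is to use a fixed-width type and check the size when compiling. The sketch below assumes a C11 compiler (for _Static_assert) and a platform that provides uint32_t from <stdint.h>; neither existed in the C89 dialect most of this thread is written in. The name split_u32 is invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check (C11): the build fails instead of the program aborting. */
_Static_assert(sizeof(uint32_t) == 4, "this sketch assumes 8-bit bytes");

/* Copy the four bytes of v, in memory order, into out[0..3]. */
void split_u32(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}
```

The trade-off is that the value has to be a uint32_t to begin with; converting a wider long to uint32_t silently drops its upper bits.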

On Fri, 10 Aug 2007 22:43:12 +0000 (UTC), (Walter Roberson) wrote:
> In article, pete wrote:
>> #include <stdio.h>
>> int main (void)
>> {
>> long mylong = 0x12345678;
>>
>> printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
>> ((unsigned char *)&mylong)[0],
>> ((unsigned char *)&mylong)[1],
>> ((unsigned char *)&mylong)[2],
>> ((unsigned char *)&mylong)[3]);
>> return 0;
>> }
>
> What if sizeof(long) > 4 ?

what if this?

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef, prova=0xFF12;
unsigned char *a; int i;

if(CHAR_BIT!=8) return 0;

a= (char*) &mylong;
printf("Valore 0X%x\n",
(unsigned) ((unsigned char*)&prova)[sizeof(long)-1]);

if( ((unsigned char*)&prova)[sizeof(long)-1] == 0x12)
{for(i=0; i<sizeof(long); ++i)
printf("0x%02x ", (unsigned) a[i]);
}
else {for(i=sizeof(long)-1; i>=0; --i)
printf("0x%02x ", (unsigned) a[i]);
}

printf("\n");
return 0;
}

Or this? How many instances of UB do you find?
I find one in the first example and none in the one below.

#include <stdio.h>
#include <limits.h>

int main (void)
{long mylong=0x123456789abcdef;
unsigned long prova, r;
unsigned char *a;
int i;

if(CHAR_BIT!=8) return 0;
prova=0xFF;
for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
{r=((unsigned long)mylong & prova)>>(i*8);
printf("0x%02x ", r);
}

printf("\n");
return 0;
}
Aug 11 '07 #14
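
The two programs above try to print the most significant byte first whatever the machine's byte order. A hedged sketch of the same idea follows, using only unsigned arithmetic; the names is_little_endian and bytes_msb_first are hypothetical, and the code assumes unsigned long has no padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Return 1 if the least significant byte of an unsigned long is stored
   first in memory, 0 otherwise. */
int is_little_endian(void)
{
    unsigned long one = 1UL;
    return *(const unsigned char *)&one == 1;
}

/* Store the bytes of v into out, most significant first, regardless of
   how the machine lays them out in memory.
   out must hold sizeof(unsigned long) bytes. */
void bytes_msb_first(unsigned long v, unsigned char *out)
{
    size_t n = sizeof v, i;
    for (i = 0; i < n; i++)
        out[i] = (unsigned char)((v >> ((n - 1 - i) * CHAR_BIT)) & UCHAR_MAX);
}
```

Since bytes_msb_first works on values rather than memory, it never needs the endianness test; is_little_endian is shown only because the first program above performs a similar probe.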

On Sat, 11 Aug 2007 09:36:17 +0200, "a\\/b" <al@f.g> wrote:

> Or this? How many instances of UB do you find?
> I find one in the first example and none in the one below.

UB in the sense that the implementation gives either the correct
result or nothing (CHAR_BIT != 8).

> #include <stdio.h>
> #include <limits.h>
>
> int main (void)
> {long mylong=0x123456789abcdef;
> unsigned long prova, r;
> unsigned char *a;
> int i;
>
> if(CHAR_BIT!=8) return 0;
> prova=0xFF;
> for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i)
> {r=((unsigned long)mylong & prova)>>(i*8);
> printf("0x%02x ", r);

OK, OK: printf("0x%02x ", (unsigned) r);

> }
>
> printf("\n");
> return 0;
> }

Don't take it all too seriously; it is summertime and I had to say
something :)
Aug 11 '07 #15

On Fri, 10 Aug 2007 22:43:12 +0000, Walter Roberson wrote:
> In article <46**********@mindspring.com>, pete <pf*****@mindspring.com> wrote:
>> an*******@gmail.com wrote:
>> #include <stdio.h>
>> int main (void)
>> {
>> long mylong = 0x12345678;
>>
>> printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
>> ((unsigned char *)&mylong)[0],
>> ((unsigned char *)&mylong)[1],
>> ((unsigned char *)&mylong)[2],
>> ((unsigned char *)&mylong)[3]);
>> return 0;
>> }
>
> What if sizeof(long) > 4 ?

#include <stdio.h>
int main(void)
{
    long mylong = 0x12345678;
    unsigned char *ptr;
    /* both bounds are cast: a long * does not convert implicitly
       to an unsigned char * */
    for (ptr = (unsigned char *)&mylong;
         ptr < (unsigned char *)(&mylong + 1); ptr++)
        printf("0x%02x ", *ptr);
    putchar('\n');
    return 0;
}
--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 11 '07 #16

a\/b wrote:
>
> On Sat, 11 Aug 2007 09:36:17 +0200, "a\\/b" <al@f.g> wrote:
>> Or this? How many instances of UB do you find?
>> I find one in the first example and none in the one below.
>
> UB in the sense that the implementation gives either the correct
> result or nothing (CHAR_BIT != 8).

That's implementation-defined behavior.

> #include <stdio.h>
> #include <limits.h>
>
> int main (void)
> {long mylong=0x123456789abcdef;

The result of assigning a 57-bit integer value to a long that
cannot represent it is also implementation-defined.

--
pete
Aug 11 '07 #17
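
Whether a constant such as 0x123456789abcdef fits in a long depends on the platform: it does where long is 64 bits wide and does not where long is 32 bits wide. A small range check, sketched below with the made-up name fits_in_long and assuming a C99 compiler for unsigned long long, makes that dependency explicit rather than leaving it to the implementation-defined conversion.

```c
#include <limits.h>

/* Return 1 if v can be stored in a long without changing its value. */
int fits_in_long(unsigned long long v)
{
    return v <= (unsigned long long)LONG_MAX;
}
```

The same comparison can be done by the preprocessor with #if LONG_MAX >= ..., which rejects the program at build time on platforms where the value would not fit.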
