
# Access individual bytes of a 4 byte long (optimization)

 P: n/a Hi! On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. Is version A, version B, or version C better? Are there other alternatives? /**** Version A ******/ { long mylong = -1; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ (unsigned char) mylong , \ (unsigned char) (mylong >>8), \ (unsigned char) (mylong >>16), \ (unsigned char) (mylong >>24)); } /**** Version B ******/ { long mylong = -1; unsigned char f_b[4]; *((long *)&f_b) = mylong; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2], f_b[3]); } /**** Version C ******/ { union align_array_and_long { unsigned char four_b[4]; long dummy; }; long mylong = -1; union align_array_and_long four; four = (union align_array_and_long) mylong; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ four.four_b[0], \ four.four_b[1], \ four.four_b[2], \ four.four_b[3]); } My feeling is that Version C is best. What can be said about the alignment of array f_b and mylong in Version B? (I think that in Version B the alignment of array f_b and mylong might be skewed, in which case it is slower than C. If f_b and mylong are aligned in Version B, is Version B then identical to Version C?) .. .. .. Now what if one needs to access the individual bytes the *whole time*? Is A2, B2, C2 or D2 faster? 
/**** Version A2 ******/ { long mylong = -1; unsigned char b0, b1, b2, b3; b0 = (unsigned char) mylong; b1 = (unsigned char) (mylong >>8); b2 = (unsigned char) (mylong >>16); b3 = (unsigned char) (mylong >>24); // access: b0, b1, b2, b3 } /**** Version B2 ******/ { long mylong = -1; unsigned char f_b[4]; *((long *)&f_b) = mylong; // access: f_b[0], f_b[1], f_b[2], f_b[3] } /**** Version C2 ******/ { union align_array_and_long { unsigned char four_b[4]; long dummy; }; long mylong = -1; union align_array_and_long four; four = (union align_array_and_long) mylong; // access: four.four_b[0], four.four_b[1], four.four_b[2], four.four_b[3] } /**** Version D2 ******/ { struct four_struct { unsigned char byte0; unsigned char byte1; unsigned char byte2; unsigned char byte3; }; union align_array_and_long { struct four_struct four_s; long dummy; }; long mylong = -1; union align_array_and_long four; four = (union align_array_and_long) mylong; // access: four.four_s.byte0, four.four_s.byte1, four.four_s.byte2, four.four_s.byte3 } My feeling is that Version D2 is best: mylong is loaded into four in one shot (no shifts etc. as in A2). And in D2 the compiler always knows exactly which byte we want: four.four_s.byte0. This is different in C2: four.four_b[which_byte]. Or is it really different? Are these two equivalent: four.four_s.byte0 <--> four.four_b[0] ??? .. .. .. Versions A and A2 are portable in terms of endianness, but the question is not about portability - it's about optimization for a given platform. Thanks. anon.asdf Aug 10 '07 #1
16 Replies

 P: n/a an*******@gmail.com wrote: On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. Is version A, version B, or version C better? Measure them and find out. Now what if one needs to access the individual bytes the *whole time*? Is A2, B2, C2 or D2 faster? Measure them and find out. -- Chris "performance is nothing without measurement" Dollin Hewlett-Packard Limited, registered no: 690597, registered office: Cain Road, Bracknell, Berks RG12 1HN, England Aug 10 '07 #2

 P: n/a an*******@gmail.com wrote: On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. Is version A, version B, or version C better? Mu. Rule one of micro-optimisation: Don't Do It. Rule two of micro-optimisation (for experts only!): Don't Do It Yet. Rule three of micro-optimisation (only under duress): Measure, Measure, Measure. Unless you _know_ that it matters, assume that it doesn't, and write the clearest code. If you think you do know that it matters, first gather evidence. Only by measuring which is the fastest will you know which is the fastest - on your machine, using your implementation, in your project, under your optimisation settings. And don't be surprised to find out that you were wrong, and the difference is no more than 0.5%, with an error of 1%. Richard Aug 10 '07 #3

 P: n/a an*******@gmail.com writes: Hi! On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. Is version A, version B, or version C better? Are there other alternatives? /**** Version A ******/ { long mylong = -1; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ (unsigned char) mylong , \ (unsigned char) (mylong >>8), \ (unsigned char) (mylong >>16), \ (unsigned char) (mylong >>24)); } /**** Version B ******/ { long mylong = -1; unsigned char f_b[4]; *((long *)&f_b) = mylong; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2], f_b[3]); } /**** Version C ******/ { union align_array_and_long { unsigned char four_b[4]; long dummy; }; long mylong = -1; union align_array_and_long four; four = (union align_array_and_long) mylong; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ four.four_b[0], \ four.four_b[1], \ four.four_b[2], \ four.four_b[3]); } My feeling is the Version C is best. For the fastest, try: printf("0x%08lx\n", mylong); /* :-) */ Versions B and C invoke undefined behaviour. The defined way to do version B is: void *vp = &mylong; unsigned char *cp = vp; /* now do what you want with cp[0] to cp[sizeof long] */ There is no need to lie about having an array. Version C is very likely to work, but the standard does not guarantee accesses to any union member other than the last one assigned to (barring the special exception for "common initial members"). Similar comments apply to your other code fragments. -- Ben. Aug 10 '07 #4
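
A minimal sketch of the pointer approach described in the reply above, assuming only standard C. The helper name `byte_at` is hypothetical and does not come from the thread. It reads one byte of an object's representation through an `unsigned char` pointer; valid indices run from 0 to size - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Return byte i of the object representation of *obj, which is
 * `size` bytes long.  Reading any object through an unsigned char
 * pointer is well defined; which part of the value each byte holds
 * depends on the platform's byte order. */
unsigned char byte_at(const void *obj, size_t size, size_t i)
{
    const unsigned char *cp = obj;  /* void * converts implicitly in C */
    assert(i < size);               /* valid indices are 0 .. size - 1 */
    return cp[i];
}
```

For example, `byte_at(&mylong, sizeof mylong, 0)` yields the lowest-addressed byte of `mylong`, which is the least significant byte on a little-endian machine and the most significant one on a big-endian machine.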

 P: n/a an*******@gmail.com wrote: Hi! On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. 1) Your question is about your environment -- machine(s), compiler(s), O/S(es), etc. -- and not about C. Seek a forum where the experts on your environment hang out. 2) If "as fast as possible" is really your goal, you should not be using C, nor even assembly. Custom-built hardware is the way to go. Seek a forum where chip designers hang out. 3) This is the second time in recent days that you've given "I want" as the only reason for doing something. You may not understand it yet, but the context of the "I want" can often have a huge influence on the speed of whatever code you wind up with. For example: Is this long just sitting around in memory, or is it the result of a recent computation and perhaps still available in a register? Seek a forum where compiler experts hang out. -- Eric Sosman es*****@ieee-dot-org.invalid Aug 10 '07 #5

 P: n/a For the fastest, try: printf("0x%08lx\n", mylong); /* :-) */ Versions B and C, invoke undefined behaviour. The defined way to do version B is: void *vp = &mylong; unsigned char *cp = vp; /* now do what you want with cp[0] to cp[sizeof long] */ Very good comment, about using a pointer that way!! Thanks! anon.asdf Aug 10 '07 #6

 P: n/a On Fri, 10 Aug 2007 03:35:38 -0700, anon.asdf wrote: Hi! On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. Is version A, version B, or version C better? Are there other alternatives? /**** Version A ******/ { long mylong = -1; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ (unsigned char) mylong , \ (unsigned char) (mylong >>8), \ (unsigned char) (mylong >>16), \ (unsigned char) (mylong >>24)); Right shifts of negative integers are implementation-defined, and that needn't have anything to do with endianness. } /**** Version B ******/ { long mylong = -1; unsigned char f_b[4]; *((long *)&f_b) = mylong; #include <string.h> memcpy(f_b, &mylong, 4); This does the same thing you were trying to do, without the risk of disasters if f_b doesn't happen to be correctly aligned for a long. printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2], f_b[3]); } /**** Version C ******/ { union align_array_and_long { unsigned char four_b[4]; long dummy; }; long mylong = -1; union align_array_and_long four; four = (union align_array_and_long) mylong; Did you mean four.dummy = -1? You can only cast into a scalar type, which a union is not. Also, accessing a member of a union other than the last one written to is UB, so I think the compiler is allowed to optimize away an assignment to four.dummy if its value is not used. printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \ four.four_b[0], \ four.four_b[1], \ four.four_b[2], \ four.four_b[3]); } [snip] Version A and A2 are portable in terms of endianness, but the question is not about portability - it's about optimization for a given platform. Whoever implemented memcpy on your platform is likely to know better than you do what is most efficient on that specific platform. -- Army1987 (Replace "NOSPAM" with "email") No-one ever won a game by resigning. -- S. Tartakower Aug 10 '07 #7
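
The memcpy suggestion above could be packaged roughly as follows. This is a sketch under stated assumptions, not code from the thread: the function names `long_to_bytes` and `bytes_to_long` are hypothetical, and the buffer is sized with `sizeof(long)` rather than a hard-coded 4 so that it also works where long is wider.

```c
#include <string.h>

/* Copy the object representation of v into out, which must have room
 * for sizeof(long) bytes.  memcpy imposes no alignment requirement on
 * either argument, so out may be any unsigned char buffer. */
void long_to_bytes(long v, unsigned char out[sizeof(long)])
{
    memcpy(out, &v, sizeof v);
}

/* Inverse operation: rebuild a long from bytes stored by long_to_bytes. */
long bytes_to_long(const unsigned char in[sizeof(long)])
{
    long v;
    memcpy(&v, in, sizeof v);
    return v;
}
```

The order of the bytes in the buffer still follows the platform's endianness; the round trip through both functions returns the original value regardless. Optimizing compilers commonly turn such fixed-size memcpy calls into plain loads and stores, but that should be measured rather than assumed.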

 P: n/a In article <87************@bsb.me.uk>, Ben Bacarisse wrote: The defined way to do version B is: void *vp = &mylong; unsigned char *cp = vp; /* now do what you want with cp[0] to cp[sizeof long] */ Make that cp[sizeof long - 1] -- Programming is what happens while you're busy making other plans. Aug 10 '07 #8

 P: n/a ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes: In article <87************@bsb.me.uk>, Ben Bacarisse wrote: >The defined way to do version B is: void *vp = &mylong; unsigned char *cp = vp; /* now do what you want with cp[0] to cp[sizeof long] */ Make that cp[sizeof long - 1] Of course, thanks. -- Ben. Aug 10 '07 #9

 P: n/a In article <11*********************@j4g2000prf.googlegroups.com> an*******@gmail.com wrote: >On a machine of *given architecture* ... OK, I give you "MIPS" as the architecture (using the MIPS compilers). >... I want to access the individual bytes of a long (*once-off*) as fast as possible. Oops, now you have to decide whether this is a 32-bit MIPS (ILP32 model) or a 64-bit MIPS (I32LP64 model -- i.e., long is eight 8-bit bytes long). >Is version A, version B, or version C better? [where A is shift-and-mask, and B and C go through RAM] On most compilers, version A will be *far* faster than almost anything else. In fact, since your original code fragment had the variable set to a constant, if you compile with optimization, the four or eight extracted sub-parts will also be constants. Interesting side note: if the architecture is changed to the original DEC (now Compaq) Alpha, "byte" accesses to RAM are handled in the compiler by doing full 8-byte machine-word accesses and then using shift-and-mask instructions, because that is how the machine *has* to do it. (There are special instructions like "zap" for working with the eight 8-bit "byte fields" of a register, but loads and stores are always full 64-bit operations.) (The MIPS architecture is a lot more common though, as it is found in various home gaming systems.) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: forget about it http://web.torek.net/torek/index.html Reading email is like searching for food in the garbage, thanks to spammers. Aug 10 '07 #10
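
A sketch of the shift-and-mask variant (version A) that works whether long is four or eight bytes. The name `value_byte` is hypothetical. Converting to `unsigned long` first avoids the implementation-defined right shift of a negative value, and the code assumes that `unsigned long` has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Return byte i of the value v, where byte 0 is the least significant.
 * This depends only on the value, not on how it is stored in memory,
 * so the result is the same on big- and little-endian machines.
 * Shifting an unsigned type is well defined only while the shift count
 * is smaller than the type's width, hence the assertion. */
unsigned char value_byte(unsigned long v, unsigned i)
{
    assert(i < sizeof v);
    return (unsigned char)((v >> (i * CHAR_BIT)) & UCHAR_MAX);
}
```

A call such as `value_byte((unsigned long)mylong, 1)` corresponds to `(unsigned char)(mylong >> 8)` in the original post when CHAR_BIT is 8. With a constant argument and optimization enabled, a compiler can usually fold the whole expression to a constant, as noted in the reply above.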

 P: n/a an*******@gmail.com wrote: > Hi! On a machine of *given architecture* (in terms of endianness etc.), I want to access the individual bytes of a long (*once-off*) as fast as possible. /* BEGIN new.c */ #include <stdio.h> int main (void) { long mylong = 0x12345678; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0; } /* END new.c */ -- pete Aug 10 '07 #11

 P: n/a In article <46**********@mindspring.com>, pete wrote: an*******@gmail.com wrote: >#include <stdio.h> >int main (void){ long mylong = 0x12345678; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0;} What if sizeof(long) 4 ? -- "It is important to remember that when it comes to law, computers never make copies, only human beings make copies. Computers are given commands, not permission. Only people can be given permission." -- Brad Templeton Aug 10 '07 #12

 P: n/a Walter Roberson wrote: > In article <46**********@mindspring.com>, pete wrote: int main (void) { long mylong = 0x12345678; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0; } What if sizeof(long) 4 ? /* BEGIN new.c */ #include <stdio.h> #include <assert.h> int main (void) { long mylong = 0x12345678; assert(sizeof(long) == 4); printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0; } /* END new.c */ -- pete Aug 10 '07 #13

 P: n/a On Fri, 10 Aug 2007 22:43:12 +0000 (UTC), (Walter Roberson) wrote: >In article, pete wrote: >>#include <stdio.h> >>int main (void){ long mylong = 0x12345678; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0;} What if sizeof(long) 4 ? What about this? #include <stdio.h> #include <limits.h> int main (void) {long mylong=0x123456789abcdef, prova=0xFF12; unsigned char *a; int i; if(CHAR_BIT!=8) return 0; a= (char*) &mylong; printf("Valore 0X%x\n", (unsigned) ((unsigned char*)&prova)[sizeof(long)-1]); if( ((unsigned char*)&prova)[sizeof(long)-1] == 0x12) {for(i=0; i=0; --i) printf("0x%02x ", (unsigned) a[i]); } printf("\n"); return 0; } Or this? How many instances of UB do you find? I find one in the first example and none in the one below. #include <stdio.h> #include <limits.h> int main (void) {long mylong=0x123456789abcdef; unsigned long prova, r; unsigned char *a; int i; if(CHAR_BIT!=8) return 0; prova=0xFF; for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i) {r=((unsigned long)mylong & prova)>>(i*8); printf("0x%02x ", r); } printf("\n"); return 0; } Aug 11 '07 #14

 P: n/a On Sat, 11 Aug 2007 09:36:17 +0200, "¬a\\/b" wrote: or this? How many UB do you find? i find one in first example none in the below UB in the sense that the implementation gives either the correct result or nothing (CHAR_BIT != 8). >#include <stdio.h> #include <limits.h> int main (void){long mylong=0x123456789abcdef; unsigned long prova, r; unsigned char *a; int i; if(CHAR_BIT!=8) return 0; prova=0xFF; for(i=sizeof(long)-1, prova<<=i*8; i>=0 ; prova>>=8, --i) {r=((unsigned long)mylong & prova)>>(i*8); printf("0x%02x ", r); OK, OK: printf("0x%02x ", (unsigned) r); } printf("\n"); return 0;} Don't take it all too seriously; it's summertime and I had to say something :) Aug 11 '07 #15

 P: n/a On Fri, 10 Aug 2007 22:43:12 +0000, Walter Roberson wrote: In article <46**********@mindspring.com>, pete wrote: >an*******@gmail.com wrote: >>#include <stdio.h> >>int main (void){ long mylong = 0x12345678; printf("0x%02x 0x%02x 0x%02x 0x%02x\n", ((unsigned char *)&mylong)[0], ((unsigned char *)&mylong)[1], ((unsigned char *)&mylong)[2], ((unsigned char *)&mylong)[3]); return 0;} What if sizeof(long) 4 ? #include <stdio.h> int main(void) { long mylong = 0x12345678; unsigned char *ptr; for (ptr = (unsigned char *)&mylong; ptr < (unsigned char *)(&mylong + 1); ptr++) printf("0x%02x ", *ptr); putchar('\n'); return 0; } -- Army1987 (Replace "NOSPAM" with "email") No-one ever won a game by resigning. -- S. Tartakower Aug 11 '07 #16

 P: n/a ¬a\/b wrote: > On Sat, 11 Aug 2007 09:36:17 +0200, "¬a\\/b" wrote: #include <stdio.h> #include <limits.h> int main (void) {long mylong=0x123456789abcdef; The result of assigning a 57-bit integer value to a long that is too narrow to hold it is also implementation-defined. -- pete Aug 11 '07 #17
