Bytes IT Community

Working with non byte aligned data values in memory

P: 14
I'm working with a data stream of 8 bytes in an embedded application. In most cases the data is byte aligned so I can define a structure and then memcpy the data directly to the structure elements. There are, however, a few cases where the 16 bit data values span 2 or 3 bytes. I came up with a few macros to handle these cases. It's working fine, but I'm wondering if anyone can point out some obvious flaws or code optimizations to make the macros more efficient.

The sample code is:

```c
//--------------------------------------------------------------------------
// M_EXTRACT
// This macro extracts a value from within a block of memory.  The value
// extracted starts from and includes the start bit and ends with the end
// bit.  The first bit is bit 0.  The maximum number of bits for a value
// is 16.
//--------------------------------------------------------------------------

#include <stdio.h>

#define M_EXTRACT16(sbit, ebit, data)   ( (M_BYTENUM(sbit)) == (M_BYTENUM(ebit)) ? M_EXTRACT_1(sbit, ebit, data) : ( (M_BYTENUM(ebit)) - (M_BYTENUM(sbit)) == 1) ? M_EXTRACT_2(sbit, ebit, data) : M_EXTRACT_3(sbit, ebit, data) )

#define M_BYTENUM(x) ((x) >> 3)              // Determine which byte the bit resides in
#define M_BYTEVAL(x) ( M_BYTENUM(x) << 3 )   // Determine starting bit 0 for a given byte... for example, bit 0 of byte 2 is bit 16 in the data set.

#define M_EXTRACT_1(sbit, ebit, data) ( M_LOW(sbit, data) & ( (1 << (ebit - sbit + 1)) - 1) )              // If start & end are within one byte, use this.
#define M_EXTRACT_2(sbit, ebit, data) ( M_LOW(sbit, data) | M_HIGH(sbit, ebit, data) )                     // If start & end span 2 bytes, use this.
#define M_EXTRACT_3(sbit, ebit, data) ( M_LOW(sbit, data) | M_MID(sbit, data) | M_HIGH(sbit, ebit, data) ) // If start & end span 3 bytes, use this.

#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1 << (ebit - M_BYTEVAL(ebit) + 1)) - 1 ) ) << (( M_BYTEVAL(ebit) - sbit) & 0x0f ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((M_BYTEVAL(( M_BYTENUM(sbit)+1 )) - sbit ) & 0x0f) )                               // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> ((sbit - M_BYTEVAL(sbit)) & 0x0f) )                                                   // Lower portion of value

int main(void)
{
    unsigned int i, sb, eb;
    unsigned char d[8];

    d[0] = 0xa7;
    d[1] = 0xc2;
    d[2] = 0xd6;
    d[3] = 0xe6;
    d[4] = 0xa3;
    d[5] = 0x5a;
    d[6] = 0xa5;
    d[7] = 0x1c;

    sb =    0; eb =  4; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 10; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 22; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 29; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 30; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 43; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 50; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 60; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb = eb+1; eb = 63; i = M_EXTRACT16(sb, eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);

    return 0;
}
```
The output of the sample is:

```
sb = 00 | eb = 04 | i = 7
sb = 05 | eb = 10 | i = 21
sb = 11 | eb = 22 | i = 2776
sb = 23 | eb = 29 | i = 77
sb = 30 | eb = 30 | i = 1
sb = 31 | eb = 43 | i = 5447
sb = 44 | eb = 50 | i = 85
sb = 51 | eb = 60 | i = 916
sb = 61 | eb = 63 | i = 0
```
Jan 8 '10 #1
9 Replies


Banfa
Expert Mod 5K+
P: 8,916
Erm, I suspect you might actually be better off using functions (or at least some functions) instead of macros. I would have thought the compiler/optimiser would have a better chance with a few small statements rather than a single enormous statement, particularly if you are calling the top-level macro with variables rather than constants, and particularly if you are using C++ and can inline your functions. It would certainly be easier to debug that way. If you consider the expansion the compiler faces for a call to M_EXTRACT16(sbit, ebit, data), the line of code is horrendous.

Also, the lack of casts concerns me. Are you sure you are getting the right values? Consider this:

```c
char cv = 0x48;
int iv = cv << 2;
```

Does iv have the value 0x0120 or 0x0020? Does the shift happen in integer or byte arithmetic? I should know the answer to this question and don't, but I do know that whenever I shift bytes like that I always cast them first to the output type.
Jan 10 '10 #2

P: 14
I did everything with macros to save on the overhead associated with a function call. My thinking was that the preprocessor would do all of the constant math at compile time, thus producing a condensed line of code rather than an expanded mess.

As for getting the right values: all of my test cases return correct values, so I'm fairly confident the casts aren't needed. This, however, may be compiler-dependent.

I did run into a problem with the shift values. For some reason the result of (sbit - M_BYTEVAL(sbit)) would front-fill with all 1's, causing a huge shift. Bitwise ANDing the result with 0x0f took care of that.
Jan 11 '10 #3

Banfa
Expert Mod 5K+
P: 8,916
I have a feeling that shifting by the number of bits in the lhs or more results in undefined behaviour. For example, where int is 32 bits,

int a = 5;
a << N;

for any N >= 32 is undefined behaviour. I'm not sure if this affects your code.

The constant arithmetic would be done by the compiler as long as, like I said, you actually call the top-level macro with constants rather than variables.

Saving the function call overhead is all well and good, but in its place you have left code that will be very hard for anyone following you to understand (or even for yourself in a few years). Are you sure this is a bottleneck in your program?

This code appears to me to be a hack to get round the issue of function call overhead being too much. Sometimes that sort of hack is required, and I've done them myself; however, I am not convinced that you wrote your code and then discovered that the function call was a bottleneck, for instance through profiling. So I believe you may be guilty of premature optimising (which you can google), that is, optimising before you really know that the thing you are spending time and effort on is actually a program performance bottleneck.

There is quite a lot of material out there on premature optimising, but I like this, which also deals with the issue of not optimising at all, i.e. it explains what the originator of the term meant rather than what the term has slowly come to mean.

If you have already written and profiled your program and determined this is a bottleneck, then fine; but if not, I would suggest writing a clear and easy-to-understand function and, once your code is finished, profiling it. Then, if the function is a bottleneck in the program, you can come back to these macros.
Jan 12 '10 #4

P: 14
Left shifting by too many bits would definitely produce inaccurate results. As the name implies, the macro is designed to extract a value of up to 16 bits from some location in a byte array. No error checking is done to see if (ebit - sbit + 1) > 16; I've left it up to the programmer to enter correct values from the start.

As you mentioned, I may be prematurely optimizing code here as I've not measured to establish a benchmark. As it is now I've not run into any bottlenecks with the message processing.

More than anything, I'm trying to design the code to be lean up front to avoid problems in the future, utilizing bit operations, for example, that might save on cycles. Something like this simplification:

y - x - 1 = ~x + y

may save a few cycles due to fewer operations.

If there are any obvious bit tricks like this that can be done with the above-mentioned macro (or even if it were in function form), then it may be worth putting them in the code.
Jan 18 '10 #5

P: 14
Found a bug.

M_MID should be:

```c
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((((M_BYTENUM(sbit)+1) << 3) - sbit ) & 0x0f) )  // Middle byte value when value spans three bytes
```
Jan 20 '10 #6

P: 14
Just an update here... I wrote a function to perform the same task and compared it to the macro version. On average, the macro ran 40x faster than the function version.

Also, I was able to optimize the code slightly:

```c
#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<((ebit&0x07) + 1)) - 1 ) ) << ( (ebit&0xf8) - sbit ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ( ((~sbit)&0x07) + 1 ) )                                   // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> (sbit&0x07) )                                                // Lower portion of value
```
Jan 29 '10 #7

Banfa
Expert Mod 5K+
P: 8,916
Did you try it as an inlined function?
Jan 29 '10 #8

P: 14
I did try it as an inline function and yes, the function version still ran 40x slower...

However, I just tried something different. Up until now I was compiling with VC++ 2008 Express Edition in Debug mode. I've since switched to Release mode, and now I can't even register a single clock tick to provide a measurable difference between using a function or a macro. I've tried up to 8 nested loops, each going from 0 to ULONG_MAX, and the end result is still 0 clock ticks.
Feb 3 '10 #9

Banfa
Expert Mod 5K+
P: 8,916
That is easy to explain: inlining is normally switched off in debug mode to facilitate debugging, as it is very hard to step through a program when the code is all jumbled up. You should always do speed tests using release mode with full optimisation.

I am a little surprised that you were unable to register any time at all with such nested loops, which suggests a silly mistake on your part somewhere. You need to make sure that the compiler is unable to tell that your code does nothing, because if it can, it will optimise it away. For example:

```cpp
#include <iostream>
#include <ctime>
#include <climits>

using namespace std;

int total = 0;

int main()
{
    int test = 0;
    clock_t start, finish;

    start = clock();
    for (unsigned long ix = 1; ix < ULONG_MAX; ix++)
    {
//        total += ix;
        test = 0;
    }
    finish = clock();

    cout << start << "," << finish << ","
         << double((finish - start) / (CLOCKS_PER_SEC / 1000)) / 1000
         << "," << test << endl;
}
```

This program reports 0 time used: the compiler can tell the loop does nothing and gets rid of it. If you uncomment the total += ix; line, the compiler can no longer do that because the loop now alters global data, and the program takes approximately 2.6 seconds to run.

Also, figures from measurements like this on a multi-threaded OS, particularly one running on a multi-core processor, are not very reliable. They might give you an idea of which piece of code is faster, but anything other than orders of magnitude is not really significant.
Feb 3 '10 #10
