Working with non byte aligned data values in memory

I'm working with a data stream of 8 bytes in an embedded application. In most cases the data is byte aligned so I can define a structure and then memcpy the data directly to the structure elements. There are, however, a few cases where the 16 bit data values span 2 or 3 bytes. I came up with a few macros to handle these cases. It's working fine, but I'm wondering if anyone can point out some obvious flaws or code optimizations to make the macros more efficient.

The sample code is:

//------------------------------------------------------------------------------------------------------------------------------------------------
// M_EXTRACT
// This macro extracts a value from within a block of memory.  The value extracted starts from and includes the start bit and ends with the
// end bit.  The first bit is bit 0.  The maximum number of bits for a value is 16.
//------------------------------------------------------------------------------------------------------------------------------------------------


#include <stdio.h>

#define M_EXTRACT16(sbit, ebit, data)   ( (M_BYTENUM(sbit)) == (M_BYTENUM(ebit)) ? M_EXTRACT_1(sbit, ebit, data) : ( (M_BYTENUM(ebit)) - (M_BYTENUM(sbit)) == 1) ? M_EXTRACT_2(sbit, ebit, data) : M_EXTRACT_3(sbit, ebit, data) )

#define M_BYTENUM(x) (x>>3)                  // Determine which byte the bit resides in
#define M_BYTEVAL(x) ( M_BYTENUM(x) << 3 )   // Determine starting bit 0 for a given byte... for example bit 0 of byte 2 is bit 16 in the data set.

#define M_EXTRACT_1(sbit, ebit, data) ( M_LOW(sbit, data) & ( (1 << (ebit - sbit + 1)) - 1) )              // If start & end are within one byte, use this.
#define M_EXTRACT_2(sbit, ebit, data) ( M_LOW(sbit, data) | M_HIGH(sbit, ebit, data) )                     // If start & end span 2 bytes, use this.
#define M_EXTRACT_3(sbit, ebit, data) ( M_LOW(sbit, data) | M_MID(sbit, data) | M_HIGH(sbit, ebit, data) ) // If start & end span three bytes, use this.

#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<(ebit - M_BYTEVAL(ebit) + 1)) - 1 )  ) << (( M_BYTEVAL(ebit) - sbit)&0x0f ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((M_BYTEVAL(( M_BYTENUM(sbit)+1 )) - sbit )&0x0f) )                               // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> ((sbit - M_BYTEVAL(sbit))&0x0f) )                                                   // Lower portion of value

int main(void)
{
    unsigned int i, sb, eb;
    unsigned char d[8];

    d[0] = 0xa7;
    d[1] = 0xc2;
    d[2] = 0xd6;
    d[3] = 0xe6;
    d[4] = 0xa3;
    d[5] = 0x5a;
    d[6] = 0xa5;
    d[7] = 0x1c;

    // Walk through the 64-bit stream, extracting consecutive fields of varying width.
    sb =     0; eb =  4; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 10; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 22; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 29; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 30; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 43; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 50; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 60; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 63; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);

    return 0;
}

The output of the sample is:

sb = 00 | eb = 04 | i = 7
sb = 05 | eb = 10 | i = 21
sb = 11 | eb = 22 | i = 2776
sb = 23 | eb = 29 | i = 77
sb = 30 | eb = 30 | i = 1
sb = 31 | eb = 43 | i = 5447
sb = 44 | eb = 50 | i = 85
sb = 51 | eb = 60 | i = 916
sb = 61 | eb = 63 | i = 0
Jan 8 '10 #1
Banfa
9,065 Expert Mod 8TB
Erm, I suspect you might actually be better off using functions (or at least some functions) instead of macros. I would have thought the compiler/optimiser would have a better chance with a few statements than with a single enormous statement, particularly if you are calling the top-level macro with variables rather than constants, and particularly if you are using C++ and can inline your functions. It would certainly be easier to debug that way. If you consider the expansion a compiler has to face for a call to M_EXTRACT16(sbit, ebit, data), the line of code is horrendous.

Also, the lack of casts concerns me. Are you sure you are getting the right values? Consider this:

char cv = 0x48;
int iv = cv << 2;
Does iv have the value 0x0120 or 0x0020? Does the shift happen in integer or byte arithmetic? I should know the answer to this question and don't. But I do know whenever I shift bytes like that I always cast them first to the output type.
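
For what it's worth, the question can be settled with a small sketch. In C the integer promotions convert an operand narrower than int to int before the shift is applied, so the shift happens in integer arithmetic and the high bits survive. The helper names `shift_promoted` and `shift_truncated` are hypothetical and exist only for this illustration.

```c
/* Shifts a byte-sized value left by two bits. The operand is promoted
   to int before the shift, so bits moved past bit 7 are kept. */
static int shift_promoted(unsigned char cv)
{
    return cv << 2;                    /* cv is promoted to int first */
}

/* Only an explicit conversion back to unsigned char truncates the
   shifted value to its low 8 bits. */
static int shift_truncated(unsigned char cv)
{
    return (unsigned char)(cv << 2);
}
```

Calling `shift_promoted(0x48)` gives 0x0120, while `shift_truncated(0x48)` gives 0x0020. Casting to the output type first is still a reasonable habit, since it makes the intended width explicit.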
Jan 10 '10 #2
RichG
14
I did everything with macros to save on the overhead associated with a function call. My thinking was the preprocessor would do all of the constant math at compile time thus coming up with a condensed line of code rather than an expanded mess.

As for getting the right values, all of my test cases return correct values so I'm fairly confident the casts aren't needed. This, however, may be compiler dependent.

I did run into a problem with the shifting values. For some reason the result of (sbit - M_BYTEVAL(sbit)) would front-fill with all 1's thus causing a huge shift. Bitwise ANDing the result with 0x0f took care of that.
Jan 11 '10 #3
Banfa
9,065 Expert Mod 8TB
I have a feeling that shifting by the number of bits in the lhs or more results in undefined behaviour. For example where int is 32 bits then

int a = 5;
a << N;

for any N >= 32 is undefined behaviour. I'm not sure if this affects your code.
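
One way to stay clear of that undefined behaviour is to test the shift count before shifting. The sketch below is only an illustration: `shl_checked` is a hypothetical helper, and returning 0 for an oversized count is a design choice made here, not something the language requires. It also assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper: left shift that returns 0 instead of invoking
   undefined behaviour when the count reaches the width of the type. */
static unsigned int shl_checked(unsigned int value, unsigned int count)
{
    const unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);

    return (count < width) ? (value << count) : 0u;
}
```

With this, `shl_checked(5u, 1u)` yields 10, and a count equal to the width of unsigned int yields 0 rather than an unpredictable result.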

The constant arithmetic would be done by the compiler as long as, like I said, you actually called the top-level macro with constants rather than variables.

Saving the function call overhead is all well and good, but in its place you have left code that will be very hard for anyone following to understand (or even yourself in a few years). Are you sure this is a bottleneck in your program?

This code appears to me to be a hack to get round the issue of function call overhead being too much. Sometimes that sort of hack is required, and I've done them myself. However, I am not convinced that you wrote your code and then discovered, for instance through profiling, that the function call is a bottleneck. So I believe you may be guilty of premature optimisation (which you can google), that is, optimising before you really know that the thing you are spending time and effort on is actually a performance bottleneck.

There is quite a lot of material out there on premature optimisation, but I like this, which also deals with the issue of not optimising at all, i.e. it explains what the originator of the term meant rather than what the term has slowly come to mean.

If you have already written and profiled your program and determined this is a bottleneck, then fine. If not, I would suggest writing a clear and easy-to-understand function and, once your code is finished, profiling it. Then, if the function is a bottleneck in the program, you can come back to these macros.
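
As a rough idea of what such a function could look like, here is a minimal sketch. The name `extract_bits` and its signature are hypothetical, not taken from the original code. It assumes the same bit numbering as the macros (bit 0 is the least significant bit of data[0]), that sbit <= ebit, that the field is at most 16 bits wide, and that the caller keeps ebit inside the buffer.

```c
#include <stdint.h>

/* Hypothetical function form of M_EXTRACT16: returns the bits from sbit
   to ebit inclusive. No range checking is done, as with the macros. */
static uint16_t extract_bits(const uint8_t *data, unsigned sbit, unsigned ebit)
{
    unsigned nbits = ebit - sbit + 1u;   /* field width, 1..16 */
    unsigned first = sbit >> 3;          /* byte holding the start bit */
    unsigned last  = ebit >> 3;          /* byte holding the end bit */
    uint32_t acc   = 0;
    unsigned i;

    /* Gather the one to three bytes the field touches, low byte first. */
    for (i = first; i <= last; ++i) {
        acc |= (uint32_t)data[i] << (8u * (i - first));
    }

    acc >>= (sbit & 7u);                 /* drop bits below the start bit */
    return (uint16_t)(acc & ((1ul << nbits) - 1u));
}
```

Run against the sample buffer from the first post, `extract_bits(d, 31, 43)` returns 5447 and `extract_bits(d, 11, 22)` returns 2776, matching the macro output.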
Jan 12 '10 #4
RichG
14
Left shifting by too many bits would definitely produce inaccurate results. As the name implies, the macro is designed to extract a 16-bit value from some location in a byte array. No error checking is done to see if (ebit-sbit)>16. I've left it up to the programmer to enter correct values from the start.

As you mentioned, I may be prematurely optimizing code here as I've not measured to establish a benchmark. As it is now I've not run into any bottlenecks with the message processing.

More than anything I'm trying to design the code to be lean up front to avoid problems in the future. Utilizing bit operations, for example, that might save on cycles. Something like this simplification:

y-x-1 = ~x+y

May save a few cycles due to fewer operations.
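
The identity itself holds for unsigned operands because unsigned arithmetic wraps modulo 2^N, where ~x equals -x - 1. A minimal sketch to check it, using two hypothetical helper names:

```c
/* Both helpers are hypothetical names used only for this sketch. */

/* The straightforward form: two subtractions. */
static unsigned sub_then_dec(unsigned x, unsigned y)
{
    return y - x - 1u;
}

/* The rewritten form: one complement and one addition. */
static unsigned not_then_add(unsigned x, unsigned y)
{
    return ~x + y;
}
```

Both return 6 for x = 3, y = 10, and they also agree when the subtraction wraps (for example x = 10, y = 3). Whether the second form is actually faster depends on the compiler and target, so it is worth measuring before relying on it.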

If there are any obvious bit tricks like this that can be done with the above-mentioned macro, or even if it were in function form, then it may be worth putting them in the code.
Jan 18 '10 #5
RichG
14
Found bug.

M_MID should be:

#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((((M_BYTENUM(sbit)+1) << 3) - sbit ) &0x0f) )                                    // Middle byte value when value spans three bytes
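
To illustrate why the sample run did not expose this, the sketch below writes out the two shift amounts as the macros expand, as far as I can tell from expanding them by hand. The function names are hypothetical. In the original, M_BYTEVAL is applied to a byte number rather than a bit number, so for an 8-byte stream it evaluates to 0 and the shift becomes (0 - sbit) & 0x0f.

```c
/* Shift amount used by the original M_MID, written out as the macros
   expand: M_BYTEVAL is applied to a byte number, not a bit number. */
static unsigned mid_shift_original(unsigned sbit)
{
    return (((((sbit >> 3) + 1u) >> 3) << 3) - sbit) & 0x0fu;
}

/* Shift amount used by the corrected M_MID: distance from the start
   bit to the first bit of the following byte. */
static unsigned mid_shift_fixed(unsigned sbit)
{
    return ((((sbit >> 3) + 1u) << 3) - sbit) & 0x0fu;
}
```

For sbit = 31, the only three-byte case in the sample, both versions give a shift of 1, which is why the output looked right. For sbit = 17 the original gives 15 while the corrected version gives 7.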
Jan 20 '10 #6
RichG
14
Just an update here... I wrote a function to perform the same task and compared it to the Macro version. On average, the Macro ran 40x faster than the function version.

Also, I was able to optimize the code slightly:

#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<((ebit&0x07) + 1)) - 1 )  ) << (( (ebit&0xf8) - sbit) ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << (  ((~sbit)&0x07) +1) )                                    // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> (sbit&0x07) )                                                   // Lower portion of value
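
A quick way to gain some confidence in the shortened expressions is to compare them with the earlier ones for every bit position in the stream. This is only a sketch: `shifts_agree` is a hypothetical helper, and it covers bit positions 0 to 63 only. Note that the `& 0xf8` mask matches M_BYTEVAL only while the bit number stays below 256.

```c
/* Returns 1 if the shortened shift expressions agree with the earlier
   ones for every bit position in an 8-byte stream, 0 otherwise. */
static int shifts_agree(void)
{
    unsigned bit;

    for (bit = 0; bit < 64; ++bit) {
        unsigned byteval = (bit >> 3) << 3;   /* same as M_BYTEVAL(bit) */

        /* M_LOW: offset of the start bit within its byte */
        if (((bit - byteval) & 0x0fu) != (bit & 0x07u)) return 0;

        /* M_HIGH: number of bits taken from the last byte */
        if ((bit - byteval + 1u) != ((bit & 0x07u) + 1u)) return 0;

        /* M_HIGH: first bit of the last byte */
        if (byteval != (bit & 0xf8u)) return 0;

        /* M_MID: distance from the start bit to the next byte boundary */
        if ((((((bit >> 3) + 1u) << 3) - bit) & 0x0fu)
                != (((~bit) & 0x07u) + 1u)) return 0;
    }
    return 1;
}
```

The function returns 1 for this range, which suggests the rewrite preserves the shift amounts for an 8-byte stream.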
Jan 29 '10 #7
Banfa
9,065 Expert Mod 8TB
Did you try it as an inlined function?
Jan 29 '10 #8
RichG
14
I did try it as an inline function and yes, it still ran 40x slower...

However, I just tried something different. Up until now I was compiling with VC++ 2008 Express Edition in Debug mode. I've since switched to Release mode and I can't register even a single clock tick of difference between using a function and using the macro. I've tried up to 8 nested loops, each going from 0 to ULONG_MAX, and the end result is still 0 clock ticks.
Feb 3 '10 #9
Banfa
9,065 Expert Mod 8TB
That is easy to explain: inlining is normally switched off in debug mode to facilitate debugging, since it is very hard to step through a program when the code is all jumbled up. You should always do speed tests using release mode with full optimisation.

I am a little surprised that you were unable to register any time at all with 4 such nested loops, which suggests a silly mistake on your part somewhere. You need to make sure that the compiler is unable to tell that your code does nothing, because if it can, it will optimise it away. For example:

  1. #include <iostream>
  2. #include <ctime>
  3. #include <climits>
  4.  
  5. using namespace std;
  6.  
  7. int total = 0;
  8.  
  9. int main()
  10. {
  11.     int test = 0;
  12.     clock_t start, finish;
  13.  
  14.     start = clock();
  15.     for(unsigned long ix=1; ix<ULONG_MAX; ix++)
  16.     {
  17. //        total += ix;
  18.         test = 0;
  19.     }
  20.     finish = clock();
  21.  
  22.     cout << start << "," << finish << "," << double((finish - start)/(CLOCKS_PER_SEC/1000))/1000 << "," << test << endl;
  23. }
  24.  
This program reports 0 time used: the compiler can tell the loop does nothing and gets rid of it. If you uncomment line 17, the compiler can no longer do that because the loop now alters global data, and the program takes approximately 2.6 seconds to run.

Also figures from measurements like this on a multi-threaded OS particularly running on a multi-core processor are not very reliable. They might be able to give you an idea of which piece of code is faster but anything other than orders of magnitude is not really significant.
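
The same idea can be sketched in C by giving the loop a side effect the compiler has to preserve. This is one possible approach, not the only one: accesses to a volatile object count as observable behaviour, so the loop cannot be removed. The names `sink` and `time_loop` are hypothetical.

```c
#include <time.h>

/* Accesses to a volatile object must be performed as written, so a
   loop that updates it survives even at full optimisation. */
static volatile unsigned long sink;

/* Hypothetical helper: runs `iterations` loop passes and returns the
   CPU time consumed, in seconds, as reported by clock(). */
static double time_loop(unsigned long iterations)
{
    clock_t start = clock();
    unsigned long ix;

    for (ix = 0; ix < iterations; ++ix) {
        sink += ix;                   /* observable side effect on every pass */
    }

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare a macro with a function, the body of the loop would call the code under test and store its result in `sink`. The volatile access adds a cost of its own on every pass, so the figures are only useful for comparing two variants measured the same way.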
Feb 3 '10 #10
