Working with non byte aligned data values in memory

14 New Member
I'm working with a data stream of 8 bytes in an embedded application. In most cases the data is byte aligned so I can define a structure and then memcpy the data directly to the structure elements. There are, however, a few cases where the 16 bit data values span 2 or 3 bytes. I came up with a few macros to handle these cases. It's working fine, but I'm wondering if anyone can point out some obvious flaws or code optimizations to make the macros more efficient.

The sample code is:

    //------------------------------------------------------------------------------------------------------------------------------------------------
    // M_EXTRACT
    // This macro extracts a value from within a block of memory.  The value extracted starts from and includes the start bit and ends with the
    // end bit.  The first bit is bit 0.  The maximum number of bits for a value is 16.
    //------------------------------------------------------------------------------------------------------------------------------------------------

    #include <stdio.h>

    #define M_EXTRACT16(sbit, ebit, data)   ( (M_BYTENUM(sbit)) == (M_BYTENUM(ebit)) ? M_EXTRACT_1(sbit, ebit, data) : ( (M_BYTENUM(ebit)) - (M_BYTENUM(sbit)) == 1) ? M_EXTRACT_2(sbit, ebit, data) : M_EXTRACT_3(sbit, ebit, data) )

    #define M_BYTENUM(x) (x>>3)                  // Determine which byte the bit resides in
    #define M_BYTEVAL(x) ( M_BYTENUM(x) << 3 )   // Determine starting bit 0 for a given byte... for example bit 0 of byte 2 is bit 16 in the data set.

    #define M_EXTRACT_1(sbit, ebit, data) ( M_LOW(sbit, data) & ( (1 << (ebit - sbit + 1)) - 1) )              // If start & end are within one byte, use this.
    #define M_EXTRACT_2(sbit, ebit, data) ( M_LOW(sbit, data) | M_HIGH(sbit, ebit, data) )                     // If start & end span 2 bytes, use this.
    #define M_EXTRACT_3(sbit, ebit, data) ( M_LOW(sbit, data) | M_MID(sbit, data) | M_HIGH(sbit, ebit, data) ) // if start & end span three bytes, use this.

    #define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<(ebit - M_BYTEVAL(ebit) + 1)) - 1 )  ) << (( M_BYTEVAL(ebit) - sbit)&0x0f ) ) // Upper portion of value
    #define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((M_BYTEVAL(( M_BYTENUM(sbit)+1 )) - sbit )&0x0f) )                               // Middle byte value when value spans three bytes
    #define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> ((sbit - M_BYTEVAL(sbit))&0x0f) )                                                   // Lower portion of value

    void main()
    {
        unsigned int i, sb, eb;
        unsigned char d[8];

        d[0] = 0xa7;
        d[1] = 0xc2;
        d[2] = 0xd6;
        d[3] = 0xe6;
        d[4] = 0xa3;
        d[5] = 0x5a;
        d[6] = 0xa5;
        d[7] = 0x1c;

        sb =     0; eb =  4; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 10; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 22; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 29; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 30; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 43; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 50; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 60; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
        sb =  eb+1; eb = 63; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02d | eb = %02d | i = %d\n", sb, eb, i);
    }

The output of the sample is:

    sb = 00 | eb = 04 | i = 7
    sb = 05 | eb = 10 | i = 21
    sb = 11 | eb = 22 | i = 2776
    sb = 23 | eb = 29 | i = 77
    sb = 30 | eb = 30 | i = 1
    sb = 31 | eb = 43 | i = 5447
    sb = 44 | eb = 50 | i = 85
    sb = 51 | eb = 60 | i = 916
    sb = 61 | eb = 63 | i = 0
Jan 8 '10 #1
Banfa
9,065 Recognized Expert Moderator Expert
Erm, I suspect you might actually be better off using functions (or at least some functions) instead of macros. I would have thought the compiler/optimiser would have a better chance with a few statements than with a single enormous statement, particularly if you are calling the top-level macro with variables rather than constants, and particularly if you are using C++ and can inline your functions. It would certainly be easier to debug that way. If you consider the expansion a compiler has to face for a call to M_EXTRACT16(sbit, ebit, data), the line of code is horrendous.

Also, the lack of casts concerns me. Are you sure you are getting the right values? Consider this:

    char cv = 0x48;
    int iv = cv << 2;

Does iv have the value 0x0120 or 0x0020? Does the shift happen in integer or byte arithmetic? I should know the answer to this question and don't. But I do know whenever I shift bytes like that I always cast them first to the output type.
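
For what it's worth, C's integer promotions settle the question: a char operand is promoted to int before the shift, so the shift happens in int arithmetic and iv ends up as 0x0120. Below is a minimal sketch to check this; the helper names shift_promoted and shift_cast_first are made up for illustration and are not part of the posted code. Casting first remains a sensible habit, because the promoted type is a signed int and a large enough shift can overflow it.

```c
#include <stdint.h>

/* Hypothetical helper: shifts a char exactly as the snippet above does. */
static int shift_promoted(char cv)
{
    /* cv is promoted to int before the shift, so the high bits survive. */
    return cv << 2;
}

/* Hypothetical helper: casts to a wide unsigned type before shifting.
   Assumes count is less than 16. */
static uint16_t shift_cast_first(uint8_t byte, unsigned count)
{
    return (uint16_t)((uint32_t)byte << count);
}
```

With these, shift_promoted(0x48) gives 0x0120, not 0x0020.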
Jan 10 '10 #2
RichG
14 New Member
I did everything with macros to save on the overhead associated with a function call. My thinking was the preprocessor would do all of the constant math at compile time thus coming up with a condensed line of code rather than an expanded mess.

As for getting the right values, all of my test cases return correct values so I'm fairly confident the casts aren't needed. This, however, may be compiler dependent.

I did run into a problem with the shifting values. For some reason the result of (sbit - M_BYTEVAL(sbit) ) would front-fill with all 1's thus causing a huge shift. Bitwise ANDing the result with 0x0f took care of that.
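
One plausible cause, assuming the operands are unsigned int as sb and eb are in the sample: when a subtraction's mathematical result is negative, the unsigned result wraps around to a very large value whose upper bits are all 1s. The sketch below (helper names are invented for illustration) shows the wrap and what the 0x0f mask does to it. The mask does not undo the wrap; it only yields the difference modulo 16.

```c
/* Hypothetical helper: plain unsigned subtraction, which wraps when b > a. */
static unsigned raw_difference(unsigned a, unsigned b)
{
    return a - b;
}

/* Hypothetical helper: the same subtraction with the 0x0f mask applied,
   which keeps only the difference modulo 16. */
static unsigned masked_difference(unsigned a, unsigned b)
{
    return (a - b) & 0x0fu;
}
```

For example, raw_difference(0, 31) is a huge number, while masked_difference(0, 31) and masked_difference(32, 31) are both 1.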
Jan 11 '10 #3
Banfa
9,065 Recognized Expert Moderator Expert
I have a feeling that shifting by the number of bits in the lhs or more results in undefined behaviour. For example where int is 32 bits then

int a = 5;
a << N;

for any N >= 32 is undefined behaviour. I'm not sure if this affects your code.
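
The C standard does make this undefined: the shift count must be non-negative and less than the width of the promoted left operand. Here is a small sketch of a mask builder that stays inside that rule; low_mask is a hypothetical helper, not something from the posted macros.

```c
#include <stdint.h>

/* Returns a mask with the low nbits bits set.
   Counts of 32 or more saturate to all ones instead of shifting by
   the full width, which would be undefined behaviour. */
static uint32_t low_mask(unsigned nbits)
{
    if (nbits >= 32u)
        return UINT32_MAX;
    return ((uint32_t)1 << nbits) - 1u;
}
```

A related point for embedded targets: where int is only 16 bits, shifting a promoted byte left by 8 or more can already overflow the signed int, so casting to a wider unsigned type before the shift is the safer choice.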

The constant arithmetic would be done by the compiler as long as, like I said, you actually call the top-level macro with constants rather than variables.

Saving the function call overhead is all well and good, but in its place you have left code that will be very hard for anyone following to understand (or even yourself in a few years). Are you sure this is a bottleneck in your program?

This code appears to me to be a hack to get round function call overhead being too much. Sometimes that sort of hack is required, and I've done them myself. However, I am not convinced that you wrote your code and then discovered, for instance through profiling, that the function call is a bottleneck. So I believe you may be guilty of premature optimising (which you can google), that is, optimising before you really know that the thing you are spending time and effort on is actually a performance bottleneck.

There is quite a lot of material out there on premature optimising, but I like this, which also deals with the issue of not optimising at all, i.e. it explains what the originator of the term meant rather than what the term has slowly come to mean.

If you have already written and profiled your program and determined this is a bottleneck, then fine. If not, I would suggest writing a clear and easy-to-understand function and, once your code is finished, profiling it. Then, if the function is a bottleneck in the program, you can come back to these macros.
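
A sketch of what such a function might look like, assuming the same convention as the macros: bit 0 is the least significant bit of byte 0, fields are at most 16 bits wide, and the caller passes sbit <= ebit with both inside the buffer. The name extract_bits is made up here. It is one possible approach, not a drop-in replacement verified on any particular target.

```c
#include <stdint.h>

/* Extracts bits sbit..ebit (inclusive) from a byte array.
   Assumes sbit <= ebit and a field width of 1 to 16 bits. */
static uint16_t extract_bits(const uint8_t *data, unsigned sbit, unsigned ebit)
{
    unsigned first = sbit >> 3;          /* byte holding the start bit */
    unsigned last = ebit >> 3;           /* byte holding the end bit   */
    unsigned width = ebit - sbit + 1u;   /* number of bits in the field */
    uint32_t acc = 0;
    unsigned i;

    /* Gather the 1 to 3 bytes the field touches, lowest byte first. */
    for (i = first; i <= last; i++)
        acc |= (uint32_t)data[i] << (8u * (i - first));

    /* Drop the bits below the start bit, then keep only the field. */
    acc >>= (sbit & 7u);
    return (uint16_t)(acc & (((uint32_t)1 << width) - 1u));
}
```

Fed the eight sample bytes from the first post, this reproduces the values listed there, for example 7 for bits 0 to 4 and 5447 for bits 31 to 43.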
Jan 12 '10 #4
RichG
14 New Member
Left shifting by too many bits would definitely produce inaccurate results. As the name implies, the macro is designed to extract a value of up to 16 bits from some location in a byte array. No error checking is done to see whether the field is wider than that, i.e. whether (ebit - sbit + 1) > 16. I've left it up to the programmer to enter correct values from the start.

As you mentioned, I may be prematurely optimizing code here as I've not measured to establish a benchmark. As it is now I've not run into any bottlenecks with the message processing.

More than anything I'm trying to design the code to be lean up front to avoid problems in the future, for example by using bit operations that might save cycles. Something like this simplification:

y-x-1 = ~x+y

May save a few cycles due to fewer operations.
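
The identity follows from two's complement, where ~x equals -x - 1, so ~x + y = y - x - 1; for unsigned operands it holds modulo 2^N. A quick check, with a function name that is illustrative only:

```c
/* Returns 1 when y - x - 1 and ~x + y agree for the given unsigned operands.
   Unsigned arithmetic wraps, so both sides are computed modulo 2^N. */
static int identity_holds(unsigned x, unsigned y)
{
    return (y - x - 1u) == (~x + y);
}
```

Whether the rewritten form saves cycles is a separate question, since an optimising compiler is free to choose either form itself.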

If there are any obvious bit tricks like this that can be done with the above-mentioned macro, or even if it were in function form, then it may be worth putting them in the code.
Jan 18 '10 #5
RichG
14 New Member
Found bug.

M_MID should be:

    #define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((((M_BYTENUM(sbit)+1) << 3) - sbit ) &0x0f) )                                    // Middle byte value when value spans three bytes

Jan 20 '10 #6
RichG
14 New Member
Just an update here... I wrote a function to perform the same task and compared it to the Macro version. On average, the Macro ran 40x faster than the function version.

Also, I was able to optimize the code slightly:

    #define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<((ebit&0x07) + 1)) - 1 )  ) << (( (ebit&0xf8) - sbit) ) ) // Upper portion of value
    #define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << (  ((~sbit)&0x07) +1) )                                    // Middle byte value when value spans three bytes
    #define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> (sbit&0x07) )                                                   // Lower portion of value

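
The rewritten M_MID shift relies on ((~sbit) & 0x07) + 1 being equal to 8 - (sbit & 0x07), which is the number of bits the first byte contributes to the value. A small sketch comparing the two forms; both function names are hypothetical.

```c
/* Straightforward form: bits contributed by the first byte. */
static unsigned mid_shift_plain(unsigned sbit)
{
    return 8u - (sbit & 7u);
}

/* Bit-trick form used in the optimized macro: (~sbit & 7) is 7 - (sbit & 7). */
static unsigned mid_shift_tricky(unsigned sbit)
{
    return ((~sbit) & 7u) + 1u;
}
```

Both return the same shift count for every start bit, so the choice between them is mostly about readability.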
Jan 29 '10 #7
Banfa
9,065 Recognized Expert Moderator Expert
Did you try it as an inlined function?
Jan 29 '10 #8
RichG
14 New Member
I did try it as an inline function and yes, it still ran 40x slower...

However, I just tried something different. Up until now I was compiling with VC++ 2008 Express Edition in Debug mode. I've since switched to Release mode, and I can't register even a single clock tick of difference between using a function and using the macro. I've tried up to 8 nested loops, each going from 0 to ULONG_MAX, and the end result is still 0 clock ticks.
Feb 3 '10 #9
Banfa
9,065 Recognized Expert Moderator Expert
That is easy to explain: inlining is normally switched off in debug mode to facilitate debugging, since it is very hard to step through a program when the code is all jumbled up. You should always do speed tests in release mode with full optimisation.

I am a little surprised that you were unable to register any time at all with 4 such nested loops, which suggests a silly mistake on your part somewhere. You need to make sure that the compiler is unable to tell that your code does nothing, because if it can, it will optimise it away. For example:

  1. #include <iostream>
  2. #include <ctime>
  3. #include <climits>
  4.  
  5. using namespace std;
  6.  
  7. int total = 0;
  8.  
  9. int main()
  10. {
  11.     int test = 0;
  12.     clock_t start, finish;
  13.  
  14.     start = clock();
  15.     for(unsigned long ix=1; ix<ULONG_MAX; ix++)
  16.     {
  17. //        total += ix;
  18.         test = 0;
  19.     }
  20.     finish = clock();
  21.  
  22.     cout << start << "," << finish << "," << double((finish - start)/(CLOCKS_PER_SEC/1000))/1000 << "," << test << endl;
  23. }
  24.  
This program reports 0 time used: the compiler can tell the loop does nothing and gets rid of it. If you uncomment line 17, the compiler can no longer do that because the loop now alters global data, and the program takes approximately 2.6 seconds to run.
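
A C sketch of the same idea, assuming a volatile sink is acceptable in the benchmark: every write to a volatile object is an observable side effect, so the compiler has to keep the loop body. The names sink and time_loop are made up for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Volatile sink: stores to it cannot be optimised away. */
static volatile uint32_t sink;

/* Runs a loop with an observable side effect and returns the CPU time
   it took, in seconds, as measured by clock(). */
static double time_loop(unsigned long iterations)
{
    clock_t start = clock();
    unsigned long i;

    for (i = 0; i < iterations; i++)
        sink += (uint32_t)i;

    clock_t finish = clock();
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
```

The code under test would go in place of the sink update, with its result written to sink so that the work stays observable.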

Also, figures from measurements like this on a multi-threaded OS, particularly one running on a multi-core processor, are not very reliable. They might give you an idea of which piece of code is faster, but anything less than an order-of-magnitude difference is not really significant.
Feb 3 '10 #10
