Working with non-byte-aligned data values in memory

RichG

New Member

I'm working with an 8-byte data stream in an embedded application. In most cases the data is byte-aligned, so I can define a structure and memcpy the data directly into the structure elements. There are, however, a few cases where 16-bit data values span 2 or 3 bytes. I came up with a few macros to handle these cases. They work fine, but I'm wondering if anyone can point out obvious flaws or optimizations that would make the macros more efficient.

The sample code is:


//------------------------------------------------------------------------------------------------------------------------------------------------
// M_EXTRACT16
// This macro extracts a value from within a block of memory.  The value extracted starts from and includes the start bit and ends with the
// end bit.  The first bit is bit 0.  The maximum number of bits for a value is 16.
//------------------------------------------------------------------------------------------------------------------------------------------------

#include <stdio.h>

#define M_EXTRACT16(sbit, ebit, data)   ( (M_BYTENUM(sbit)) == (M_BYTENUM(ebit)) ? M_EXTRACT_1(sbit, ebit, data) : ( (M_BYTENUM(ebit)) - (M_BYTENUM(sbit)) == 1) ? M_EXTRACT_2(sbit, ebit, data) : M_EXTRACT_3(sbit, ebit, data) )

#define M_BYTENUM(x) (x>>3)                  // Determine which byte the bit resides in
#define M_BYTEVAL(x) ( M_BYTENUM(x) << 3 )   // Determine starting bit 0 for a given byte... for example bit 0 of byte 2 is bit 16 in the data set.

#define M_EXTRACT_1(sbit, ebit, data) ( M_LOW(sbit, data) & ( (1 << (ebit - sbit + 1)) - 1) )              // If start & end are within one byte, use this.
#define M_EXTRACT_2(sbit, ebit, data) ( M_LOW(sbit, data) | M_HIGH(sbit, ebit, data) )                     // If start & end span 2 bytes, use this.
#define M_EXTRACT_3(sbit, ebit, data) ( M_LOW(sbit, data) | M_MID(sbit, data) | M_HIGH(sbit, ebit, data) ) // If start & end span three bytes, use this.

#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<(ebit - M_BYTEVAL(ebit) + 1)) - 1 )  ) << (( M_BYTEVAL(ebit) - sbit)&0x0f ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((M_BYTEVAL(( M_BYTENUM(sbit)+1 )) - sbit )&0x0f) )                               // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> ((sbit - M_BYTEVAL(sbit))&0x0f) )                                                   // Lower portion of value

int main(void)
{
    unsigned int i, sb, eb;
    unsigned char d[8];

    d[0] = 0xa7;
    d[1] = 0xc2;
    d[2] = 0xd6;
    d[3] = 0xe6;
    d[4] = 0xa3;
    d[5] = 0x5a;
    d[6] = 0xa5;
    d[7] = 0x1c;

    sb =     0; eb =  4; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 10; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 22; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 29; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 30; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 43; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 50; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 60; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);
    sb =  eb+1; eb = 63; i = M_EXTRACT16( sb,  eb, d);    printf("sb = %02u | eb = %02u | i = %u\n", sb, eb, i);

    return 0;
}

The output of the sample is:


sb = 00 | eb = 04 | i = 7
sb = 05 | eb = 10 | i = 21
sb = 11 | eb = 22 | i = 2776
sb = 23 | eb = 29 | i = 77
sb = 30 | eb = 30 | i = 1
sb = 31 | eb = 43 | i = 5447
sb = 44 | eb = 50 | i = 85
sb = 51 | eb = 60 | i = 916
sb = 61 | eb = 63 | i = 0

Jan 8 '10 #1


Banfa

9,065

Recognized Expert Moderator Expert

Erm, I suspect you might actually be better off using functions (or at least some functions) instead of macros. I would have thought the compiler/optimiser would have a better chance with a few statements rather than a single enormous statement, particularly if you are calling the top-level macro with variables rather than constants, and particularly if you are using C++ and can inline your functions. It would certainly be easier to debug that way. If you consider the expansion a compiler has to face for a call to M_EXTRACT16(sbit, ebit, data);, the line of code is horrendous.

Also the lack of casts concerns me; are you sure you are getting the right values? Consider this:


char cv = 0x48;
int iv = cv << 2;

Does iv have the value 0x0120 or 0x0020? Does the shift happen in integer or byte arithmetic? I should know the answer to this question and don't. But I do know whenever I shift bytes like that I always cast them first to the output type.

Jan 10 '10 #2

RichG

New Member

I did everything with macros to save on the overhead associated with a function call. My thinking was the preprocessor would do all of the constant math at compile time thus coming up with a condensed line of code rather than an expanded mess.

As for getting the right values, all of my test cases return correct values so I'm fairly confident the casts aren't needed. This, however, may be compiler dependent.

I did run into a problem with the shifting values. For some reason the result of (sbit - M_BYTEVAL(sbit) ) would front-fill with all 1's thus causing a huge shift. Bitwise ANDing the result with 0x0f took care of that.

Jan 11 '10 #3

Banfa

9,065

Recognized Expert Moderator Expert

I have a feeling that shifting by a count equal to or greater than the number of bits in the (promoted) left-hand operand results in undefined behaviour. For example, where int is 32 bits, then

int a = 5;
a << N;

is undefined behaviour for any N >= 32. I'm not sure whether this affects your code.

The constant arithmetic would be done by the compiler, as long as, like I said, you actually called the top-level macro with constants rather than variables.

Saving the function call overhead is all well and good, but in its place you have left code that will be very hard for anyone following you to understand (or even for yourself in a few years). Are you sure this is a bottleneck in your program?

This code appears to me to be a hack to get round the issue of function call overhead being too much. Sometimes that sort of hack is required, and I've done them myself; however, I am not convinced that you have written your code and then discovered, for instance through profiling, that the function call is a bottleneck. So I believe that you may be guilty of premature optimising (which you can google), that is, optimising before you really know that the thing you are spending time and effort optimising is actually a program performance bottleneck.

There is quite a lot of material out there on premature optimising, but I like this, which also deals with the issue of not optimising at all, i.e. it explains what the originator of the term meant rather than what the term has slowly come to mean.

If you have already written and profiled your program and determined that this is a bottleneck, then fine; if not, I would suggest writing a clear, easy-to-understand function and, once your code is finished, profiling it. Then, if the function turns out to be a bottleneck in the program, you can come back to these macros.

Jan 12 '10 #4

RichG

New Member

Left shifting by too many bits would definitely produce inaccurate results. As the name implies, the macro is designed to extract a value of up to 16 bits from some location in a byte array. No error checking is done to see if (ebit-sbit+1)>16; I've left it up to the programmer to enter correct values from the start.

As you mentioned, I may be prematurely optimizing code here as I've not measured to establish a benchmark. As it is now I've not run into any bottlenecks with the message processing.

More than anything, I'm trying to design the code to be lean up front to avoid problems in the future, for example by using bit operations that might save cycles. Something like this simplification:

y-x-1 = ~x+y

May save a few cycles due to fewer operations.

If there are any obvious bit tricks like this that could be applied to the above-mentioned macro (or even to a function version), it may be worth putting them in the code.

Jan 18 '10 #5

RichG

New Member

Found bug.

M_MID should be:


#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << ((((M_BYTENUM(sbit)+1) << 3) - sbit ) &0x0f) )                                    // Middle byte value when value spans three bytes

Jan 20 '10 #6

RichG

New Member

Just an update here... I wrote a function to perform the same task and compared it to the Macro version. On average, the Macro ran 40x faster than the function version.

Also, I was able to optimize the code slightly:


#define M_HIGH(sbit, ebit, data) ( (data[M_BYTENUM(ebit)] & ( (1<<((ebit&0x07) + 1)) - 1 )  ) << (( (ebit&0xf8) - sbit) ) ) // Upper portion of value
#define M_MID(sbit, data)        ( data[M_BYTENUM(sbit)+1] << (  ((~sbit)&0x07) +1) )                                    // Middle byte value when value spans three bytes
#define M_LOW(sbit, data)        ( data[M_BYTENUM(sbit)] >> (sbit&0x07) )                                                   // Lower portion of value

Jan 29 '10 #7

Banfa

9,065

Recognized Expert Moderator Expert

Did you try it as an inlined function?

Jan 29 '10 #8

RichG

New Member

I did try it as an inline function and yes, it still ran 40x slower...

however, I just tried something different. Up until now I was compiling with VC++ 2008 Express Edition in Debug mode. I've since turned on Release mode and I can't even register a single clock tick to provide a measurable difference between using a function or Macro. I've tried up to 8 nested loops each going from 0 to ULONG_MAX and the end result is still 0 clock ticks.

Feb 3 '10 #9

Banfa

9,065

Recognized Expert Moderator Expert

That is easy to explain, inlining is normally switched off in debug mode to facilitate debugging, it is very hard to step through a program when the code is all jumbled up. You should always do speed tests using release mode with full optimisation.

I am a little surprised that you were unable to register any time at all with 4 such nested loops, which suggests a silly mistake on your part somewhere. You need to make sure that the compiler is unable to tell that your code does nothing, because if it can, it will optimise it away. For example:


 
#include <iostream>
#include <ctime>
#include <climits>

using namespace std;

int total = 0;

int main()
{
    int test = 0;
    clock_t start, finish;

    start = clock();
    for(unsigned long ix=1; ix<ULONG_MAX; ix++)
    {
//        total += ix;
        test = 0;
    }
    finish = clock();

    cout << start << "," << finish << "," << double((finish - start)/(CLOCKS_PER_SEC/1000))/1000 << "," << test << endl;
}

This program reports 0 time used: the compiler can tell the loop does nothing and removes it. If you uncomment the line total += ix; (line 17), the compiler can no longer do that, because the loop now alters global data, and the program takes approximately 2.6 seconds to run.

Also, figures from measurements like this on a multi-threaded OS, particularly one running on a multi-core processor, are not very reliable. They might give you an idea of which piece of code is faster, but anything short of an order-of-magnitude difference is not really significant.

Feb 3 '10 #10

Similar topics

Byte Alignment

by: Shashi

Can somebody explain how the byte alignment for structures work, taking the following example and considering: byte of 1 Byte word of 2 Bytes dword of 4 Bytes typedef struct { byte a; word b;

C / C++

Address on x- byte boundary

by: Taran

Hi all, I was wondering how does address alignment to x byte boundary is done. For example, if I say "adjust the requested size to be on a 4-byte boundary" or for that matter 8 byte boundary. How is this adjustment/alignment done? I goolged around alot and wans't able to find how is it done, all it said was what is byte alignment and byte padding.

C / C++

"new byte[132]" alignment on 16 bytes

by: Olaf Baeyens

I am porting some of my buffer class code for C++ to C#. This C++ class allocates a block of memory using m_pBuffer=new BYTE; But since the class is also used for pointers for funtions that uses raw MMX and SSE power, the starting pointer MUST be starting at a 16 byte memory boundary. In C++ I allocate more memory than needed, and in a second phase I search for the address that starts on a 16 byte boundary. And I then use that new...

C# / C Sharp

Byte ordering and array access

by: Benjamin M. Stocks

Hello all, I've heard differing opinions on this and would like a definitive answer on this once and for all. If I have an array of 4 1-byte values where index 0 is the least signficant byte of a 4-byte value. Can I use the arithmatic shift operators to hide the endian-ness of the underlying processor when assembling a native 4-byte value like follows: unsigned int integerValue; unsigned char byteArray;

C / C++

char * can point to any byte?!

by: Chad

On the following sites faq http://c-faq.com/strangeprob/ptralign.html "By converting a char * (which can point to any byte) to an int * or long int *, and then indirecting on it, you can end up asking the processor to fetch a multibyte value from an unaligned address, which it isn't willing to do. " How can char * point to any byte, but int * can't? Can someone clarify

C / C++

Casting a byte array to allow assignment

by: quantumred

I found the following code floating around somewhere and I'd like to get some comments. unsigned char a1= { 5,10,15,20}; unsigned char a2= { 25,30,35,40}; *(unsigned int *)a1=*(unsigned int *)a2; // now a1=a2, a1=a2, etc.

C / C++

Convert from byte[] to structure value

by: moni

Hey, My buffer contains a short int, some char, and a structure in form of a byte array. Read the string as: TextBox4.Text = System.Text.Encoding.ASCII.GetString(buffer1, 0, 31); Read the int as:

C# / C Sharp

malloc(allocator) aligned by 8 or 16 byte

by: shaanxxx

why malloc (allocator) guarantees that address return by them will be aligned by 8 byte ( on 32bit machin ) or 16 byte (64 bit machin) ?

C / C++

Eight-byte alignment

by: glchin

Does a compiler guarantee that the variable w below is placed on an eight-byte aligned address? void myFunction( long iFreq ) { const double w = two_pi * iFreq; ... ... }

C / C++
