
Multibyte add & subtract

Hi guys,

(rather lengthy...)

I'm trying to speed up the time spent on a postfilter for video.
YUV 4:2:0 data, each pixel is 1 byte (0-255)

The basic idea is to filter one pixel on each side of an 8-pixel border.
The filter used is a variant of (1,1,-4,1,1).

In the example below I do a vertical filtering of line n and the
diff for pixel c1 is calculated as
diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
c2 as
diff(c2) = a2+b2+(c2<<2)+d2+e2
etc.

Pixel   1  2  3  4
------------------
n-2     a1 a2 a3 a4
n-1     b1 b2 b3 b4
n       c1 c2 c3 c4
----- pixel border -----
n+1     d1 d2 d3 d4
n+2     e1 e2 e3 e4

The current implementation reads the values of a1,b1,c1,d1,e1 one byte
at a time, does the calculation and writes back the filtered value for c1.
I.e. something close to the code below:
imdifftmp = *(ImageSrc_p-w2);
imdiff2 = *(ImageSrc_p-w2+1);
...
imdiff8 = *(ImageSrc_p-w2+7);
imdifftmp += *(ImageSrc_p-width);
imdiff2 += *(ImageSrc_p-width+1);
...
imdiff8 += *(ImageSrc_p-width+7);
imdifftmp -= (*(ImageSrc_p)) << 2;
imdiff2 -= (*(ImageSrc_p+1)) << 2;
...
imdiff8 -= (*(ImageSrc_p+7)) << 2;
imdifftmp += *(ImageSrc_p+width);
imdiff2 += *(ImageSrc_p+width+1);
...
imdiff8 += *(ImageSrc_p+width+7);
imdifftmp += *(ImageSrc_p+w2);
imdiff2 += *(ImageSrc_p+w2+1);
...
imdiff8 += *(ImageSrc_p+w2+7);
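
For reference, a minimal scalar sketch of what the byte-at-a-time code above computes. The names filter_diff and filter_diff8 are hypothetical, width is assumed to be the line stride in bytes, and the centre tap is assumed to be -4 (as in the code above and in the filter taps (1,1,-4,1,1)):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: filter response for one pixel in a column.
 * src points at the pixel on line n; width is the line stride in bytes.
 * Assumes taps (1,1,-4,1,1), i.e. the centre pixel is subtracted. */
static int filter_diff(const uint8_t *src, ptrdiff_t width)
{
    return src[-2 * width] + src[-width] - 4 * src[0]
         + src[width] + src[2 * width];
}

/* Response for 8 neighbouring columns, still one byte at a time. */
static void filter_diff8(const uint8_t *src, ptrdiff_t width, int out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = filter_diff(src + i, width);
}
```

Because the arithmetic is done in int, the result can lie anywhere in -1020 to 1020, which is the value a packed version has to reproduce.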

Not very efficient on a 32-bit machine! What I'm trying to achieve is
to read a 32-bit word containing 4 pixel values, do the calculation
on a whole word and write back a word. After some googling I found
the book "Hacker's Delight" by Henry S. Warren, Jr. He presents such
a method implemented by the two macros below:

//Multibyte Add of 4 1-byte integers packed into a word
#define MBA(x, y, s)\
do{\
s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
s = (((x)^(y))&0x80808080)^s; \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
}while(0)

//Multibyte Subtract of 4 1-byte integers packed into a word
#define MBS(x, y, d)\
do{\
d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
d = ~((((x)^(y))|0x7f7f7f7f)^d); \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
}while(0)

He also states that the operation below gives the carry into each
position (where ⊕ denotes bitwise exclusive or, ^ in C):
(x + y) ⊕ x ⊕ y
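
A small sketch of that carry expression, written as (x + y) ^ x ^ y like in the commented-out printf of the macros. The function names carry_in and byte_carries are made up for illustration, and uint32_t is assumed in place of long so the word is exactly 32 bits:

```c
#include <stdint.h>

/* Carry into each bit position of x + y: the full sum XORed with the
 * carry-less sum (x ^ y) leaves exactly the bits that received a carry. */
static uint32_t carry_in(uint32_t x, uint32_t y)
{
    return (x + y) ^ x ^ y;
}

/* Bits 8, 16 and 24 are the carries that crossed a byte boundary.
 * The carry out of the top byte (bit 32) is lost in a 32-bit word. */
static uint32_t byte_carries(uint32_t x, uint32_t y)
{
    return carry_in(x, y) & 0x01010100u;
}
```

With the sample values further down, carry_in(0xc7c8c9ca, 0xc8c9cacb) is 0x9F939794, the first "carry" line of the output. Note that this describes an ordinary 32-bit add; MBA deliberately stops those carries at the byte boundaries.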

These macros work great for small values! The problem is how to handle
the carry so that the correct values after the calculations in (1)
can be extracted. My question (finally!) is:
How can I (if it is possible) handle the carry to recreate the correct
signed integer value after the calculations above?

Some sample code below:

void main(void){
long a1 = 0xc7c8c9ca;
long b1 = 0xc8c9cacb;
long c1 = 0xddc8c9ca;
long d1 = 0xcacbcccd;
long e1 = 0xcbcccdce;

MBA(a1,b1,s1); //a+b
MBA(s1,d1,s2); //+d
MBA(s2,e1,s1); //+e
MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
MBS(s2,c1,s1); //-c
MBS(s1,c1,s2); //-c
MBS(s2,c1,s1); //-c

//Extract MSB Byte (B0) and add carry stuff...

printf("\nvalue after macros %08lX, value after calc %08lX\n", s1,
0xc7+0xc8-(0xdd<<2)+0xca+0xcb);
}

Gives:
carry 9F939794
carry 1F073F3A
carry B7B9BF9C
carry F8101000
carry BF81879C
carry FF313730
carry 3B818384
value after macros B0080808, value after calc FFFFFFB0
                   --                         --------
                   ^                              ^
                   |------------------------------|
                                  |
                   Same value for different methods

Cheers
//Fredrik

Apr 28 '06 #1
2 Replies


va*****@linuxmail.org wrote:
I'm trying to speed up the time spent on a postfilter for video.
YUV 4:2:0 data, each pixel is 1 byte (0-255)

In the example below I do a vertical filtering of line n and the
diff for pixel c1 is calculated as
diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
...
Not very efficient on a 32-bit machine! What I'm trying to achieve is
to read a 32-bit word containing 4 pixel values, do the calculation
on a whole word and write back a word. After some googling I found
the book "Hacker's Delight" by Henry S. Warren, Jr. He presents such
a method implemented by the two macros below:

//Multibyte Add of 4 1-byte integers packed into a word
#define MBA(x, y, s)\
do{\
s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
s = (((x)^(y))&0x80808080)^s; \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
}while(0)

//Multibyte Subtract of 4 1-byte integers packed into a word
#define MBS(x, y, d)\
do{\
d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
d = ~((((x)^(y))|0x7f7f7f7f)^d); \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
}while(0)

Each of the 8-bit fields is added mod 2^8.

These macros work great for small values! The problem is how to handle
the carry so that the correct values after the calculations in (1)
can be extracted. My question (finally!) is:
How can I (if it is possible) handle the carry to recreate the correct
signed integer value after the calculations above?

Some sample code below:

void main(void){
long a1 = 0xc7c8c9ca;
long b1 = 0xc8c9cacb;
long c1 = 0xddc8c9ca;
long d1 = 0xcacbcccd;
long e1 = 0xcbcccdce;

MBA(a1,b1,s1); //a+b
MBA(s1,d1,s2); //+d
MBA(s2,e1,s1); //+e
MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
MBS(s2,c1,s1); //-c
MBS(s1,c1,s2); //-c
MBS(s2,c1,s1); //-c
...
value after macros B0080808, value after calc FFFFFFB0
                   --                         --------
                   ^                              ^
                   |------------------------------|
                                  |
                   Same value for different methods


As you note, the 8 lsbs are correct. The result is the sum of four
differences from c, so if you can guarantee that the pixel values at
points a - e differ from each other by less than 32, the result stays
within -124 to 124 and you can simply use the msb of each byte as the
sign bit. In your example the msb of B0 = 1, so sign extend the bit.
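
A minimal sketch of that sign extension, assuming the packed result sits in a uint32_t and that the true result of each lane fits in -128 to 127. The helper name lane_signed is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: return byte `lane` (0 = least significant) of a
 * packed word, reinterpreted as a signed 8-bit value. */
static int lane_signed(uint32_t word, int lane)
{
    int v = (int)((word >> (8 * lane)) & 0xffu);
    return v >= 0x80 ? v - 0x100 : v;   /* sign-extend bit 7 */
}
```

For the result B0080808 above, lane 3 gives -80 (hex FFFFFFB0 as a 32-bit int) and lanes 0 to 2 give 8. If the true value falls outside -128 to 127, the byte alone cannot recover it.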

If you make no assumptions about value range in the group, then the
range of the computed value is -4*255 to 4*255 (-1020 to 1020). That
requires 11 bits to uniquely represent each value. You could represent
each pixel as 11 bits, with the initial 3 msbs = 0. You could thus pack
2 pixels in a 32-bit word or 5 pixels in a 64-bit word. If you can
guarantee a pixel value difference of less than 128 in each 5 point
group, you could get by with 10 bits/pixel, packing 3 pixels per
32-bit word.

If you choose to use two 11 bit pixels in a 32-bit word, you might as
well pack 2 16-bit values per 32-bit word, which gives easier packing
and unpacking.
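
A sketch of the two-pixels-per-word idea with 16-bit lanes. Everything here is an assumption for illustration: the names pack2, filter2_biased, lane_lo and lane_hi are made up, and a bias of 1024 per lane is added so that no lane ever goes negative and no borrow crosses from one lane into the other.

```c
#include <stdint.h>

#define LANE_BIAS 0x04000400u  /* 1024 in each 16-bit lane (assumed bias) */

/* Spread two pixels into two 16-bit lanes: lo in bits 0..15, hi in 16..31. */
static uint32_t pack2(uint8_t lo, uint8_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* a + b - 4c + d + e for two pixels at once.
 * Each lane of the return value holds (result + 1024), range 4..2044,
 * so plain 32-bit add, shift and subtract are safe. */
static uint32_t filter2_biased(uint32_t a, uint32_t b, uint32_t c,
                               uint32_t d, uint32_t e)
{
    return a + b + d + e + LANE_BIAS - (c << 2);
}

/* Remove the bias to get the signed result of each lane. */
static int lane_lo(uint32_t r) { return (int)(r & 0xffffu) - 1024; }
static int lane_hi(uint32_t r) { return (int)(r >> 16) - 1024; }
```

With the first two columns of the sample data (hi lane = column 1, lo lane = column 2), lane_hi gives -80 and lane_lo gives 8, with no range restriction on the pixel values. The cost is that only two pixels are handled per word and the bytes have to be unpacked first.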

--
Thad
Apr 29 '06 #2


<va*****@linuxmail.org> wrote in message
news:11**********************@e56g2000cwe.googlegroups.com...

Valinor,

I've made some corrections; don't let those get to you. There are also
some useful comments below that aren't corrections.

I'm trying to speed up the time spent on a postfilter for video.
YUV 4:2:0 data, each pixel is 1 byte (0-255)

The basic idea is to filter one pixel on each side of an 8-pixel border.
The filter used is a variant of (1,1,-4,1,1).

In the example below I do a vertical filtering of line n and the
diff for pixel c1 is calculated as
diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
c2 as
diff(c2) = a2+b2+(c2<<2)+d2+e2
etc.

Pixel   1  2  3  4
------------------
n-2     a1 a2 a3 a4
n-1     b1 b2 b3 b4
n       c1 c2 c3 c4
----- pixel border -----
n+1     d1 d2 d3 d4
n+2     e1 e2 e3 e4
<snip>
//Multibyte Add of 4 1-byte integers packed into a word
#define MBA(x, y, s)\
do{\
s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
s = (((x)^(y))&0x80808080)^s; \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
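
The backslash at the end of a // comment line continues the comment onto
the next line (GCC warns about a multi-line comment), so the }while(0)
that follows is commented out as well. Rewrite like so: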

/* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */ \
}while(0)

//Multibyte Subtract of 4 1-byte integers packed into a word
#define MBS(x, y, d)\
do{\
d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
d = ~((((x)^(y))|0x7f7f7f7f)^d); \
// printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\

Same problem here: the trailing backslash extends the // comment over the
next line, and GCC warns about a multi-line comment. Rewrite like so:

/* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */ \
}while(0)
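
Putting both corrections together, here is a sketch of the two macros with /* */ comments only. Using uint32_t instead of long is my assumption (it keeps the word at exactly 32 bits on any platform), and the wrappers mb_add and mb_sub are hypothetical helpers added only to make the macros easy to call:

```c
#include <stdint.h>

/* Multibyte add of four 1-byte integers packed into a 32-bit word.
 * s must not be the same variable as x or y. */
#define MBA(x, y, s)                                       \
    do {                                                   \
        /* add the low 7 bits of each byte */              \
        s = ((x) & 0x7f7f7f7fu) + ((y) & 0x7f7f7f7fu);     \
        /* fold in bit 7 without carrying out */           \
        s = (((x) ^ (y)) & 0x80808080u) ^ s;               \
    } while (0)

/* Multibyte subtract of four 1-byte integers packed into a 32-bit word.
 * d must not be the same variable as x or y. */
#define MBS(x, y, d)                                       \
    do {                                                   \
        /* bit 7 set in x stops borrows between bytes */   \
        d = ((x) | 0x80808080u) - ((y) & 0x7f7f7f7fu);     \
        d = ~((((x) ^ (y)) | 0x7f7f7f7fu) ^ d);            \
    } while (0)

static uint32_t mb_add(uint32_t x, uint32_t y)
{
    uint32_t s;
    MBA(x, y, s);
    return s;
}

static uint32_t mb_sub(uint32_t x, uint32_t y)
{
    uint32_t d;
    MBS(x, y, d);
    return d;
}
```

Each byte is still computed mod 256, so this fixes the macro syntax but not the range problem.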

He also states that the operation below gives the carry into each
position
(where in this case denotes bitwise exclusive or (^):
(xy)xy

These macros work great for small values! The problem is how to handle
the carry so that the correct values after the calculations in (1)
can be extracted. My question (finally!) is:
How can I (if it is possible) handle the carry to recreate the correct
signed integer value after the calculations above?

The MBS macro _appears_ (you'll need to confirm) to be calculating two's
complement correctly. This means that the values _should_ be correctly
signed when you extract each byte and cast it from an unsigned type to
a signed one, as long as the true result fits in a signed byte (-128 to
127). This works because most machines use two's complement for
negative integers.

Some sample code below:

void main(void){
#include <stdio.h>
#include <stdlib.h>
int main(void) { /* corrected */
long a1 = 0xc7c8c9ca;
long b1 = 0xc8c9cacb;
long c1 = 0xddc8c9ca;
long d1 = 0xcacbcccd;
long e1 = 0xcbcccdce;

long s1,s2; /* missing */
MBA(a1,b1,s1); //a+b
MBA(s1,d1,s2); //+d
MBA(s2,e1,s1); //+e
MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
MBS(s2,c1,s1); //-c
MBS(s1,c1,s2); //-c
MBS(s2,c1,s1); //-c

//Extract MSB Byte (B0) and add carry stuff...

printf("\nvalue after macros %08lX, value after calc %08lX\n", s1,
0xc7+0xc8-(0xdd<<2)+0xca+0xcb);
return(EXIT_SUCCESS); /* corrected */
}

In (1) above, you _add_ (c1<<2), but here you _subtract_ (c1<<2). Did you
want MBS() or MBA()?
MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
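Yes. Replace the four MBS lines that together subtract (c1<<2) with a
single call (if you wanted MBS; otherwise change to MBA):

MBS(s1,((c1&0x3f3f3f3f)<<2),s2); /* -(c<<2) in one step */

The mask clears the top two bits of each byte so the shift cannot spill
into the neighbouring byte. Note that the result now ends up in s2, not s1.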
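
A sketch that checks the masked shift against four separate subtractions. The function names are hypothetical, uint32_t is assumed for the packed word, and mb_sub is the MBS operation written as a function:

```c
#include <stdint.h>

/* Per-byte subtract modulo 256 (same operation as the MBS macro). */
static uint32_t mb_sub(uint32_t x, uint32_t y)
{
    uint32_t d = (x | 0x80808080u) - (y & 0x7f7f7f7fu);
    return ~(((x ^ y) | 0x7f7f7f7fu) ^ d);
}

/* Per-byte 4*c modulo 256: clear the top two bits of every byte first
 * so the shift cannot move them into the next byte. */
static uint32_t mb_times4(uint32_t c)
{
    return (c & 0x3f3f3f3fu) << 2;
}

/* Four subtractions of c, as in the original sample code. */
static uint32_t sub4_slow(uint32_t s, uint32_t c)
{
    for (int i = 0; i < 4; i++)
        s = mb_sub(s, c);
    return s;
}

/* One subtraction of the pre-multiplied value. */
static uint32_t sub4_fast(uint32_t s, uint32_t c)
{
    return mb_sub(s, mb_times4(c));
}
```

Both give the same word because every lane is computed mod 256 either way. For the sample data, a+b+d+e packs to 0x24282c30 and subtracting 4*c1 yields 0xB0080808 with either function.
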
Gives:
carry 9F939794
carry 1F073F3A
carry B7B9BF9C
carry F8101000
carry BF81879C
carry FF313730
carry 3B818384
value after macros B0080808, value after calc FFFFFFB0


Sorry, I didn't check these.
Rod Pemberton
Apr 29 '06 #3
