
Hi guys,

(rather lengthy...)

I'm trying to reduce the time spent in a postfilter for video. The data is YUV 4:2:0 and each pixel is 1 byte (0-255). The basic idea is to filter one pixel on each side of an 8-pixel border. The filter used is a variant of (1,1,-4,1,1). In the example below I do a vertical filtering of line n, and the diff for pixel c1 is calculated as

    diff(c1) = a1+b1-(c1<<2)+d1+e1    (1)

c2 as

    diff(c2) = a2+b2-(c2<<2)+d2+e2

etc.

    Pixel  1 2 3 4
    ------------------------
    n-2    a1a2a3a4
    n-1    b1b2b3b4
    n      c1c2c3c4
    ----- pixel border -----
    n+1    d1d2d3d4
    n+2    e1e2e3e4

The current implementation reads the values of a1, b1, c1, d1, e1 one byte at a time, does the calculation and writes back the filtered value for c1. I.e. something close to the code below:

    imdifftmp  = *(ImageSrc_p-w2);
    imdiff2    = *(ImageSrc_p-w2+1);
    ...
    imdiff8    = *(ImageSrc_p-w2+7);

    imdifftmp += *(ImageSrc_p-width);
    imdiff2   += *(ImageSrc_p-width+1);
    ...
    imdiff8   += *(ImageSrc_p-width+7);

    imdifftmp -= (*(ImageSrc_p)) << 2;
    imdiff2   -= (*(ImageSrc_p+1)) << 2;
    ...
    imdiff8   -= (*(ImageSrc_p+7)) << 2;

    imdifftmp += *(ImageSrc_p+width);
    imdiff2   += *(ImageSrc_p+width+1);
    ...
    imdiff8   += *(ImageSrc_p+width+7);

    imdifftmp += *(ImageSrc_p+w2);
    imdiff2   += *(ImageSrc_p+w2+1);
    ...
    imdiff8   += *(ImageSrc_p+w2+7);

Not very efficient on a 32-bit machine! What I'm trying to achieve is to read a 32-bit word containing 4 pixel values, do the calculation on a whole word and write back a word. After some googling I found the book "Hacker's Delight" by Henry S. Warren, Jr.
He presents such a method, implemented by the two macros below:

    //Multibyte Add of 4 1-byte integers packed into a word
    #define MBA(x, y, s)\
    do{\
        s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
        s = (((x)^(y))&0x80808080)^s; \
        /* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */\
    }while(0)

    //Multibyte Subtract of 4 1-byte integers packed into a word
    #define MBS(x, y, d)\
    do{\
        d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
        d = ~((((x)^(y))|0x7f7f7f7f)^d); \
        /* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */\
    }while(0)

He also states that the operation below gives the carry into each bit position (where ^ denotes bitwise exclusive or):

    (x+y)^x^y

These macros work great for small values! The problem is how to handle the carry so that the correct values after the calculations in (1) can be extracted.

My question (finally!) is: How can I (if it is possible) handle the carry to recreate the correct signed integer value after the calculations above?

Some sample code below:

    #include <stdio.h>

    int main(void){
        unsigned long a1 = 0xc7c8c9ca;
        unsigned long b1 = 0xc8c9cacb;
        unsigned long c1 = 0xddc8c9ca;
        unsigned long d1 = 0xcacbcccd;
        unsigned long e1 = 0xcbcccdce;
        unsigned long s1, s2;

        MBA(a1,b1,s1); //a+b
        MBA(s1,d1,s2); //+d
        MBA(s2,e1,s1); //+e
        MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?)
        MBS(s2,c1,s1); //-c
        MBS(s1,c1,s2); //-c
        MBS(s2,c1,s1); //-c

        //Extract MSB Byte (B0) and add carry stuff...
        printf("\nvalue after macros %08lX, value after calc %08lX\n",
               s1, (unsigned long)(0xc7+0xc8-(0xdd<<2)+0xca+0xcb));
        return 0;
    }

With the printf lines in the macros enabled, this gives:

    carry 9F939794
    carry 1F073F3A
    carry B7B9BF9C
    carry F8101000
    carry BF81879C
    carry FF313730
    carry 3B818384
    value after macros B0080808, value after calc FFFFFFB0
                       --                               --
                       ^                                ^
                       |--------------------------------|
                       Same value for different methods

Cheers
//Fredrik

Apr 28 '06 #1
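One way to look at the carry problem: the result of (1) ranges from -1020 to +1020, which needs 11 bits plus a sign, so a byte-wide lane can only ever hold the result modulo 256 (the B0 above). Below is a minimal sketch of an alternative that is not from the book: widen each byte to a 16-bit lane, two pixels per 32-bit word, and add a bias of 1024 per lane so that no lane goes negative and no borrow crosses a lane boundary. The names `LANE_MASK`, `LANE_BIAS`, `filter_lanes` and `filter4` are made up for this illustration, and `out[0]` is assumed to correspond to the least significant byte of each input word.

```c
#include <assert.h>
#include <stdint.h>

#define LANE_MASK 0x00FF00FFu  /* keeps bytes 0 and 2, one per 16-bit lane */
#define LANE_BIAS 0x04000400u  /* +1024 per lane so a lane never goes negative */

/* Hypothetical helper: biased filter sum for two pixels in 16-bit lanes.
   Every input lane must hold a value in 0..255. */
static uint32_t filter_lanes(uint32_t a, uint32_t b, uint32_t c,
                             uint32_t d, uint32_t e)
{
    /* Per lane: a+b+d+e <= 1020 and 4*c <= 1020, so the biased result
       stays within 4..2044 and cannot carry or borrow across lanes. */
    uint32_t r = a + b + d + e + LANE_BIAS - (c << 2);
    assert((r & 0xF800F800u) == 0);  /* each lane fits in 11 bits */
    return r;
}

/* Hypothetical helper: signed diff for the 4 pixels packed in each word.
   out[0] belongs to the least significant byte, out[3] to the most. */
static void filter4(uint32_t a, uint32_t b, uint32_t c,
                    uint32_t d, uint32_t e, int out[4])
{
    /* Even bytes (0 and 2) go into one word, odd bytes (1 and 3) into another. */
    uint32_t even = filter_lanes(a & LANE_MASK, b & LANE_MASK, c & LANE_MASK,
                                 d & LANE_MASK, e & LANE_MASK);
    uint32_t odd  = filter_lanes((a >> 8) & LANE_MASK, (b >> 8) & LANE_MASK,
                                 (c >> 8) & LANE_MASK, (d >> 8) & LANE_MASK,
                                 (e >> 8) & LANE_MASK);

    /* Remove the bias to recover the signed per-pixel result. */
    out[0] = (int)(even & 0xFFFFu) - 1024;
    out[1] = (int)(odd  & 0xFFFFu) - 1024;
    out[2] = (int)(even >> 16)     - 1024;
    out[3] = (int)(odd  >> 16)     - 1024;
}
```

Called with the five sample words from the post, `filter4` fills `out[3]` with -80 (the FFFFFFB0 of the plain calculation) and `out[0]` to `out[2]` with 8. The cost is two word operations per row instead of one, plus the unpacking, so whether it beats the byte-at-a-time loop would have to be measured on the target.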