473,405 Members | 2,210 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Help in optimizing branches

Hi,

I ran a profiler against this complex app that I'm trying to opimize.
This is an application I'm doing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running times! Both these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.

Thanks,
John
const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;

for( x = 0; x < mhi->cols; x++ )
{
>>> if( mhi_row[x] == cts && mask8u_row[x] == 0 ) THE LINE ABOVE TAKES 20% of the time
uchar* mask_row = mask->data.ptr + mask->step;
for( i = 1; i <= size.height; i++, mask_row += mask->step )
{>>> mask_row[0] = mask_row[size.width+1] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time
Jul 19 '05 #1
4 1682
John Malek wrote:
Hi,

I ran a profiler against this complex app that I'm trying to opimize.
This is an application I'm doing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running times! Both these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.
This question is off-topic for comp.std.c++ - try comp.programming.

However, start by posting the whole routine or send us ap pointer to the
library.

As for some possible answers:-

You could try folding some constants out of the loop but it's possible
that the optimizer has already done this. The other thing is loop
unrolling. Finally you need to look at cache.

const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;
for( x = 0; x < mhi->cols; x++ )
{
>>>> if( mhi_row[x] == cts && mask8u_row[x] == 0 )
THE LINE ABOVE TAKES 20% of the time
uchar* mask_row = mask->data.ptr + mask->step;
add:
uchar* mask_row_plus_width_plus_1 = mask_row + size.width+1;
int step = mask->step;
for( i = 1; i <= size.height; i++, mask_row += mask->step ) // comparison with 0 is faster - i is only used to limit the loop
// count so you can reverse the loop.
replace:

for(
i = size.height;
i >= 0;
i--,
mask_row += step,
mask_row_plus_width_plus_1 += step
)
// you could theoretically also unroll this loop easily
{
>> mask_row[0] = mask_row[size.width+1] = (uchar)1;

replace:
mask_row_plus_width_plus_1[0] = (uchar)1;
mask_row[0] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time


Both of these seem like cache thrashers depending on the value of
"step". Basically the cache optimizations are changing the algorithm to
limit the memory footprint to "blocks" at a time. Hence the name
"blocking". This may require a change in the data structure. That's
why many image algorithms work with "tiles".

Jul 19 '05 #2
"John Malek" <jo********@hotmail.com> wrote in message
news:9e**************************@posting.google.c om...
Hi,

I ran a profiler against this complex app that I'm trying to opimize.
This is an application I'm doing to test image processing. Even
though it does a lot of computation, two simple lines take 30% of the
running times! Both these lines are from Intel's OpenCV library.
Note that mhi, mask8u, and mask are arrays with one entry per pixel in
a 640x480 image.

If anyone has any hints on how to optimize this, it would be greatly
appreciated.


John,

I don't think you can expect much improvement.

Those lines are in nested loops and will get executed many times. You can
try to do some constant propagation, loop invariant code motions by hand but
probably the compiler can do that too if the code's not too complicated.

Nick
Jul 19 '05 #3
> If anyone has any hints on how to optimize this, it would be greatly
appreciated.

Thanks,
John
const int cts = (int&)ts;
for( y = 0; y < mhi->rows; y++ )
{
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
uchar* mask8u_row = mask8u->data.ptr + (y+1)*mask8u->step + 1;

for( x = 0; x < mhi->cols; x++ )
{
>>>> if( mhi_row[x] == cts && mask8u_row[x] == 0 ) THE LINE ABOVE TAKES 20% of the time
uchar* mask_row = mask->data.ptr + mask->step;
for( i = 1; i <= size.height; i++, mask_row += mask->step )
{>>>> mask_row[0] = mask_row[size.width+1] = (uchar)1;

THE LINE ABOVE TAKES 10% of the time


I'm wondering if moving the pointer dereference out of the loop logic
would help. I assume that mhi->cols is not changing within the loop?
Try moving this out of the for loop. The compiler may already be doing
this for you. A quick look at the asm output would tell whether this
could buy you some time.
Jul 19 '05 #4

"John Malek" <jo********@hotmail.com> wrote in message news:9e**************************@posting.google.c om...
int* mhi_row = (int*)(mhi->data.ptr + y*mhi->step);
Just what is mhi->data.ptr? Hopefully, it's something that converts
to int* nicely.

>>>> if( mhi_row[x] == cts && mask8u_row[x] == 0 )

Depending on the platform, using pointers might be faster:
if (*mhi_row++ == cts && *mask8u_row++ == 0)
you may need to put the increments elsewhere depending on what the rest
of the code does with these values.
uchar* mask_row = mask->data.ptr + mask->step;
for( i = 1; i <= size.height; i++, mask_row += mask->step )
{>>>> mask_row[0] = mask_row[size.width+1] = (uchar)1;


You might precompute size.width+1 into a local variable rather than preforming
the addition each time, this looks invariant to the loop. There's no reason to cast
the literal 1.

Jul 19 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Pekka Niiranen | last post by:
Hi there, I have tree-like dictionary data = {p_name: {p_keyp: {p_cost: }}} which I update by creating nonexisting branches when necessary. If the full branch already exists I update its...
0
by: Juraj Longauer | last post by:
Hi, I need to concatenate rows with description of account branch into one column. Example: I have created temporary table to retrieve client and branches for his accounts.(5 mil. rows) ...
0
by: Vikram Vaswani | last post by:
Hi all, I have the following two tables: mysql> SELECT * FROM clients; +-----+-----------------------------+ | cid | cname | +-----+-----------------------------+ |...
3
by: Stewart Allen | last post by:
Hi there, I'm trying to create a query that will display all branches in a company even if that branch hasn't made a claim. The main manufacturing company makes the machines and distributes the...
3
by: Alwin | last post by:
Hey All! I am currently designing a database model for (at first sight) a simple order entry program. The problem I'm currently facing is the exchange of data between the databases of each branch...
4
JRBower
by: JRBower | last post by:
The SQL below displays an employee count for 14 branches. What I'd like to do is keep the individual counts for branches 1 through 11 but have branch 12 be the sum of branch 12, 13, 14. ...
10
by: anon.asdf | last post by:
Here's a reminder of duff's device: /*************************************/ #include <stdio.h> #define STEP 8 #define MAX_LEN STEP*4+1 #define SOURCE_LEN 28
3
by: =?iso-8859-1?Q?Tine_M=FCller?= | last post by:
On this site http://www.tinemuller.dk/sikkerbyg/ the dropdown is functioning the way it should for the different part in Denmark but now I want to show the same informations for the hole Denmark...
6
by: Jeff Newman | last post by:
Hello, Could anyone explain to me why the following class's destructor shows up as having multiple branches? (At least as judged by gcov 4.1.2 when compiled with gcc 4.1.2 ): struct blah {...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.