472,145 Members | 1,607 Online

# Tips on optimizing these functions

Hello everyone,

I wrote a bunch of recursive functions to operate on multi-dimensional
matrices. The matrices are allocated dynamically in a non-contiguous way,
i.e. as an array of pointers pointing to arrays of data,or other pointers
if the matrix has more than 2 dimensions.

The parameters passed to these functions are:
- current_dimension: counts (from 0 to dimensions-1) the matrix
dimension on which the function is working, it's the variable passed on
the stack by the recursion
- dimensions: number of matrix's dimensions
- elem_size: size of the matrix's elements
- dimensions_sizes: a vector containing the 'size' of each dimension
For example, to work on a 10x20 matrix of integers, following the
ordering of above, we would pass:
(0,2,sizeof(int),(unsigned int ){10,20})
for a 10x20x15 one, we would pass
(0,3,sizeof(int),(unsigned int ){10,20,15})

The functions work fast for allocation and freeing, 'cause calls to
malloc and free take up most of the execution time. They're somewhat slow
at copying or initialising matrices. For initialization I mean assign a
scalar value to the elements of the matrix.

I've done some benchmarks with copying and initialisation. Compared to a
specific-nested-loop solution, the functions take up to twice the time.
However, turning on some optimization flags, specifically '-O3' with gcc,
the gap between the recursive and the specific solution reduces to 20%.

Other suggestions are welcomeas well.

TIA

Andrea

Here follows the copying function. The initialising function is almost
identical

NB: to better understand the code you should imagine to work with a bi-
dimensional matrix (implemented as a pointer to pointer in the code). The
recursive step casts either the matrix to a vector, if the function
reached the elements' dimension, ending recursion, or the rows of the
matrix to a bi-dimensional matrix (again, pointer to pointer), continuing
recursion.

//////////////////////////////////

typedef unsigned char byte;

// this one copy one row of the matrix. The row is supposed to store the
value of elements, not pointers
void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
int n)
{
unsigned short length;

byte* d1,*d2;

d1 = (byte*)dest;
d2 = (byte*)src;

// copy byte to byte
while (n 0)
{
for (length = 0; length < elem_size; length++)
{
(*d1) = (*d2);
d1++;
d2++;
};
n--;
};
}

// this is the recursive function
void _vec_copy(byte current_dimension, byte dimensions,unsigned short
elem_size, unsigned int* dimensions_size, void** restrict dest, void**
restrict src)
{
int i; // row index

if (current_dimension < dimensions)
{
if (current_dimension == dimensions -1)
{
_copy_row((void*)dest, (void*)src, elem_size,dimensions_size
[current_dimension]);
}
else
{
for (i = 0; i < dimensions_size[current_dimension]; i++)
_vec_copy(current_dimension+1, dimensions,
elem_size,dimensions_size, (void**)dest[i], (void**)src[i]);
};
};

Sep 27 '08 #1
2 1662 Andrea Taverna wrote:
I've done some benchmarks with copying and initialisation. Compared to a
specific-nested-loop solution, the functions take up to twice the time.
However, turning on some optimization flags, specifically '-O3' with gcc,
the gap between the recursive and the specific solution reduces to 20%.

Other suggestions are welcomeas well.
typedef unsigned char byte;

// this one copy one row of the matrix. The row is supposed to store the
value of elements, not pointers
void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
int n)
{
unsigned short length;

byte* d1,*d2;

d1 = (byte*)dest;
d2 = (byte*)src;

// copy byte to byte
while (n 0)
{
for (length = 0; length < elem_size; length++)
{
(*d1) = (*d2);
d1++;
d2++;
};
n--;
};
}
This is so dependent on the platform that we could justifiably argue you
should choose one, and go to a forum associated with that platform.
Do any of the compilers you use take advantage of restrict?
If elem_size happens to match frequently the size of a stdint type, you
will need to switch case the code so as to remove the inner loop for those
cases.
Some compilers automatically substitute a run-time library copy function
which invokes all the usual memcpy() optimizations (align destination,
move groups of bytes per instruction).
If you wrote memcpy() in line, that would work well with certain
compilers, not so well with others (possibly depending on command line
options and which run time library you choose). If you are somehow
prohibited from using restrict, writing in memcpy() makes the same assertion.
Sep 27 '08 #2
On Sat, 27 Sep 2008 14:13:50 +0200 (CEST), Andrea Taverna
<a.****@libero.itwrote:

snip discussion of matrix philosophy
>typedef unsigned char byte;

// this one copy one row of the matrix. The row is supposed to store the
value of elements, not pointers
void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
int n)
{
unsigned short length;

byte* d1,*d2;

d1 = (byte*)dest;
d2 = (byte*)src;

// copy byte to byte
while (n 0)
{
for (length = 0; length < elem_size; length++)
{
(*d1) = (*d2);
d1++;
d2++;
};
n--;
};
}
Each element consists of elem_size contiguous bytes. Each row
consists of n contiguous elements. Therefore, each row must consist
of n*elem_size contiguous bytes.

The entire body of your function can be replaced with
memcpy(dest, src, (size_t)n*elem_size);

In fact, the entire function can be deleted and any call to the
function replaced with the above statement.

Either substitution will have the additional benefit of not invoking
undefined behavior if any of the elements are indeterminate.

--
Remove del for email
Sep 27 '08 #3

### This discussion thread is closed

Replies have been disabled for this discussion.

### Similar topics

 7 posts views Thread by Andreas Paasch | last post: by reply views Thread by Mike Chirico | last post: by 4 posts views Thread by J. Campbell | last post: by 8 posts views Thread by Hagen | last post: by 7 posts views Thread by eyh5 | last post: by 1 post views Thread by code | last post: by 2 posts views Thread by Jack | last post: by 4 posts views Thread by Got2Go | last post: by reply views Thread by Miguel Perez | last post: by reply views Thread by isladogs | last post: by reply views Thread by antdb | last post: by reply views Thread by pddon | last post: by reply views Thread by leo001 | last post: by reply views Thread by antdb | last post: by 3 posts views Thread by Bright1Light | last post: by 7 posts views Thread by bounthong | last post: by 5 posts views Thread by Bright1Light | last post: by 1 post views Thread by Computer0300 | last post: by