Tips on optimizing these functions

Hello everyone,

I wrote a bunch of recursive functions to operate on multi-dimensional
matrices. The matrices are allocated dynamically in a non-contiguous way,
i.e. as an array of pointers pointing to arrays of data, or other pointers
if the matrix has more than 2 dimensions.

The parameters passed to these functions are:
- current_dimension: counts (from 0 to dimensions-1) the matrix
dimension on which the function is working, it's the variable passed on
the stack by the recursion
- dimensions: number of matrix's dimensions
- elem_size: size of the matrix's elements
- dimensions_sizes: a vector containing the 'size' of each dimension
For example, to work on a 10x20 matrix of integers, following the order
above, we would pass:
(0,2,sizeof(int),(unsigned int [2]){10,20})
For a 10x20x15 one, we would pass:
(0,3,sizeof(int),(unsigned int [3]){10,20,15})
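
To make the layout concrete, here is a minimal sketch of how such a non-contiguous matrix could be allocated and freed recursively with the same parameter order. The names matrix_alloc and matrix_free are hypothetical and are not the functions from the post; the sketch also assumes every dimension size is non-zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees a (possibly partially built) pointer tree. */
void matrix_free(void *m, unsigned int current_dimension,
                 unsigned int dimensions, const unsigned int *dimensions_size)
{
    if (m == NULL)
        return;
    assert(current_dimension < dimensions);

    if (current_dimension < dimensions - 1) {
        /* Inner level: an array of pointers to sub-matrices. */
        void **level = m;
        for (unsigned int i = 0; i < dimensions_size[current_dimension]; i++)
            matrix_free(level[i], current_dimension + 1,
                        dimensions, dimensions_size);
    }
    free(m);
}

/* Hypothetical helper: allocates one level of the pointer tree.
   The last dimension is a row of elements; every other dimension
   is an array of pointers, one per sub-matrix. */
void *matrix_alloc(unsigned int current_dimension, unsigned int dimensions,
                   size_t elem_size, const unsigned int *dimensions_size)
{
    assert(current_dimension < dimensions);
    unsigned int n = dimensions_size[current_dimension];

    if (current_dimension == dimensions - 1)
        return calloc(n, elem_size);        /* zero-initialised row of data */

    void **level = malloc(n * sizeof *level);
    if (level == NULL)
        return NULL;
    for (unsigned int i = 0; i < n; i++)
        level[i] = NULL;                    /* lets matrix_free handle partial trees */

    for (unsigned int i = 0; i < n; i++) {
        level[i] = matrix_alloc(current_dimension + 1, dimensions,
                                elem_size, dimensions_size);
        if (level[i] == NULL) {             /* roll back on failure */
            matrix_free(level, current_dimension, dimensions, dimensions_size);
            return NULL;
        }
    }
    return level;
}
```

A 10x20x15 matrix of int would then be requested as matrix_alloc(0, 3, sizeof(int), (unsigned int [3]){10,20,15}), which performs 1 + 10 + 10*20 separate allocations.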

The functions are fast enough for allocation and freeing, because the
calls to malloc and free take up most of the execution time. They're
somewhat slow at copying or initialising matrices. By initialisation I
mean assigning a scalar value to every element of the matrix.

I've done some benchmarks with copying and initialisation. Compared to a
specific-nested-loop solution, the functions take up to twice the time.
However, turning on some optimization flags, specifically '-O3' with gcc,
the gap between the recursive and the specific solution reduces to 20%.

So, have you got any advice about optimizing this code?
Other suggestions are welcome as well.

TIA

Andrea

Here follows the copying function. The initialising function is almost
identical.

NB: to better understand the code, imagine working with a bi-dimensional
matrix (implemented as a pointer to pointer in the code). The recursive
step either casts the matrix to a vector, if the function has reached the
elements' dimension, which ends the recursion, or casts the rows of the
matrix to a bi-dimensional matrix (again, pointer to pointer), which
continues the recursion.

//////////////////////////////////

typedef unsigned char byte;

// Copies one row of the matrix. The row is supposed to store the
// values of the elements, not pointers.
void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned int n)
{
    unsigned short length;
    byte *d1, *d2;

    d1 = (byte*)dest;
    d2 = (byte*)src;

    // copy byte by byte
    while (n > 0)
    {
        for (length = 0; length < elem_size; length++)
        {
            (*d1) = (*d2);
            d1++;
            d2++;
        }
        n--;
    }
}

// this is the recursive function
void _vec_copy(byte current_dimension, byte dimensions, unsigned short elem_size,
               unsigned int* dimensions_size, void** restrict dest,
               void** restrict src)
{
    unsigned int i; // row index; unsigned to match dimensions_size[]

    if (current_dimension < dimensions)
    {
        if (current_dimension == dimensions - 1)
        {
            // last dimension: dest and src point to rows of elements
            _copy_row((void*)dest, (void*)src, elem_size,
                      dimensions_size[current_dimension]);
        }
        else
        {
            // otherwise they point to arrays of pointers: recurse on each
            for (i = 0; i < dimensions_size[current_dimension]; i++)
                _vec_copy(current_dimension + 1, dimensions, elem_size,
                          dimensions_size, (void**)dest[i], (void**)src[i]);
        }
    }
}

Sep 27 '08 #1
Andrea Taverna wrote:
>I've done some benchmarks with copying and initialisation. Compared to a
>specific-nested-loop solution, the functions take up to twice the time.
>However, turning on some optimization flags, specifically '-O3' with gcc,
>the gap between the recursive and the specific solution reduces to 20%.
>
>So, have you got any advice about optimizing this code?
>Other suggestions are welcome as well.
>
>typedef unsigned char byte;
>
>// this one copy one row of the matrix. The row is supposed to store the
>// value of elements, not pointers
>void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
>int n)
>{
>    unsigned short length;
>
>    byte* d1,*d2;
>
>    d1 = (byte*)dest;
>    d2 = (byte*)src;
>
>    // copy byte to byte
>    while (n > 0)
>    {
>        for (length = 0; length < elem_size; length++)
>        {
>            (*d1) = (*d2);
>            d1++;
>            d2++;
>        };
>        n--;
>    };
>}

This is so dependent on the platform that we could justifiably argue you
should choose one and go to a forum associated with that platform.
Do any of the compilers you use take advantage of restrict?
If elem_size frequently matches the size of a stdint type, you will need
to switch on it and special-case the code, so as to remove the inner loop
for those cases.
Some compilers automatically substitute a run-time library copy function
which applies all the usual memcpy() optimizations (aligning the
destination, moving groups of bytes per instruction).
If you wrote memcpy() inline, that would work well with certain compilers
and not so well with others (possibly depending on command-line options
and which run-time library you choose). If you are somehow prohibited
from using restrict, writing memcpy() makes the same assertion (that the
source and destination do not overlap).
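
One way to read the switch suggestion above is sketched below. The name copy_row_switch, the COPY_FIXED macro and the choice of widths 1, 2, 4 and 8 are assumptions made for illustration. Each special case copies an element with a memcpy() of constant size, which many compilers reduce to a single load and store while avoiding alignment and aliasing problems; whether that actually beats the byte loop depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies n elements of a compile-time constant width,
   one fixed-size memcpy per element. */
#define COPY_FIXED(d, s, n, width)                                   \
    do {                                                             \
        for (size_t i_ = 0; i_ < (n); i_++)                          \
            memcpy((d) + i_ * (width), (s) + i_ * (width), (width)); \
    } while (0)

/* Hypothetical replacement for the row copy: dispatches on elem_size
   so the common widths no longer need the inner byte loop. */
void copy_row_switch(void *restrict dest, const void *restrict src,
                     size_t elem_size, size_t n)
{
    assert(dest != NULL && src != NULL);
    unsigned char *d = dest;
    const unsigned char *s = src;

    switch (elem_size) {
    case 1: memcpy(d, s, n);        break;
    case 2: COPY_FIXED(d, s, n, 2); break;
    case 4: COPY_FIXED(d, s, n, 4); break;
    case 8: COPY_FIXED(d, s, n, 8); break;
    default:
        /* Uncommon widths: fall back to a plain byte loop. */
        for (size_t k = 0; k < n * elem_size; k++)
            d[k] = s[k];
        break;
    }
}
```

The restrict qualifiers carry the same promise as memcpy(): the caller must not pass overlapping rows.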
Sep 27 '08 #2
On Sat, 27 Sep 2008 14:13:50 +0200 (CEST), Andrea Taverna
<a.****@libero.it> wrote:

<snip discussion of matrix philosophy>

>typedef unsigned char byte;
>
>// this one copy one row of the matrix. The row is supposed to store the
>// value of elements, not pointers
>void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
>int n)
>{
>    unsigned short length;
>
>    byte* d1,*d2;
>
>    d1 = (byte*)dest;
>    d2 = (byte*)src;
>
>    // copy byte to byte
>    while (n > 0)
>    {
>        for (length = 0; length < elem_size; length++)
>        {
>            (*d1) = (*d2);
>            d1++;
>            d2++;
>        };
>        n--;
>    };
>}

Each element consists of elem_size contiguous bytes. Each row
consists of n contiguous elements. Therefore, each row must consist
of n*elem_size contiguous bytes.

The entire body of your function can be replaced with
memcpy(dest, src, (size_t)n*elem_size);

In fact, the entire function can be deleted and any call to the
function replaced with the above statement.
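
As a sketch of that substitution applied to the recursive copy, the leaf case becomes a single memcpy() call. The name vec_copy_memcpy and the wider parameter types are assumptions, not the original poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the recursive copy: rows of elements are
   copied with one memcpy() each, pointer levels are walked recursively. */
void vec_copy_memcpy(unsigned int current_dimension, unsigned int dimensions,
                     size_t elem_size, const unsigned int *dimensions_size,
                     void *dest, const void *src)
{
    assert(current_dimension < dimensions);
    unsigned int n = dimensions_size[current_dimension];

    if (current_dimension == dimensions - 1) {
        /* Last dimension: n contiguous elements of elem_size bytes. */
        memcpy(dest, src, (size_t)n * elem_size);
        return;
    }

    /* Any other dimension: dest and src are arrays of n pointers. */
    void *const *d = dest;
    void *const *s = src;
    for (unsigned int i = 0; i < n; i++)
        vec_copy_memcpy(current_dimension + 1, dimensions,
                        elem_size, dimensions_size, d[i], s[i]);
}
```

The recursion now costs one call per row rather than per element, so most of the remaining time should be spent inside memcpy() itself.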

Either substitution will have the additional benefit of not invoking
undefined behavior if any of the elements are indeterminate.

--
Remove del for email
Sep 27 '08 #3
