
Multidimensional array performance

I'm working with displaying and manipulating very large image sets. The
program handles anything from 2D images to 4D RGB volumes in a
time-series. I've been using dynamically allocated arrays to accomplish
this, but having recently added the ability to load and display 4D
data, I've run into some significant performance issues.

For a dataset of size 256x176x176x1, it takes about 30 seconds to
allocate the array. Loading the data into the array from 176 files only
takes a few seconds. Accessing the array is fine; I'm able to loop
through the entire volume in a couple of seconds. When it's time to
deallocate the memory, it takes a couple of minutes. This is on a P4
2.4GHz with 1GB RAM. I'm using the code below to allocate and
deallocate the memory.

I've read about alternate methods for multidimensional arrays,
including templates. I've also read about memory pools, but haven't
seen anything about how to allocate a multidimensional array inside of
the pool. I also don't think a pool is ideal, because I'll only be
(de)allocating memory when the user loads or unloads a dataset.
I've also thought about just declaring a single array in memory and
accessing elements by multiplication (i.e. array[row*col*plane]).

What method would be best to quickly allocate and deallocate large
chunks of memory?

Thanks,
Greg
-------------- DECLARATION -----------------
unsigned short int ****imgDataINT16U; /* unsigned 16-bit */

-------------- ALLOCATION ------------------
int i, j, k;

/* setup the image buffer */
imgData->imgDataINT16U = new unsigned short int***[YSize];
for (i = 0; i < YSize; i++)
    imgData->imgDataINT16U[i] = new unsigned short int**[XSize];
for (i = 0; i < YSize; i++) {
    for (j = 0; j < XSize; j++)
        imgData->imgDataINT16U[i][j] = new unsigned short int*[ZSize];
}
for (i = 0; i < YSize; i++) {
    for (j = 0; j < XSize; j++)
        for (k = 0; k < ZSize; k++)
            imgData->imgDataINT16U[i][j][k] = new unsigned short int[TSize];
}

-------------- DEALLOCATION ----------------
for (i = 0; i < imgInfo->GetVolumeYSize(); i++) {
    for (j = 0; j < imgInfo->GetVolumeXSize(); j++)
        for (k = 0; k < imgInfo->GetVolumeZSize(); k++)
            delete[] imgDataINT16U[i][j][k];
    progDlg.Update(i, "Unloading Data");
}
for (i = 0; i < imgInfo->GetVolumeYSize(); i++) {
    for (j = 0; j < imgInfo->GetVolumeXSize(); j++)
        delete[] imgDataINT16U[i][j];
}
for (i = 0; i < imgInfo->GetVolumeYSize(); i++)
    delete[] imgDataINT16U[i];
delete[] imgDataINT16U;

-------------- GENERAL ACCESS --------------
voxelValue = imgData->imgDataINT16U[col][row][slice][0];

Sep 30 '06 #1


<Gr************@gmail.com> wrote in message
news:11**********************@m73g2000cwd.googlegroups.com...
: I'm working with displaying and manipulating very large image sets. The
: program handles anything from 2D images to 4D RGB volumes in a
: time-series. I've been using dynamically allocated arrays to accomplish
: this, but having recently added the ability to load and display 4D
: data, I've run into some significant performance issues.
:
: For a dataset of size 256x176x176x1, it takes about 30 seconds to
: allocate the array. Loading the data into the array from 176 files only
: takes a few seconds.
....
: -------------- DECLARATION -----------------
: unsigned short int ****imgDataINT16U; /* unsigned 16-bit */
: --------------- ALLOCATION -------------------
: int i, j, k;
: /* setup the image buffer */
: imgData->imgDataINT16U = new unsigned short int***[YSize];
: for (i=0; i < YSize;i++)
: imgData->imgDataINT16U[i] = new unsigned short int**[XSize];
: for (i=0; i < YSize;i++) {
: for (j=0; j < XSize;j++)
: imgData->imgDataINT16U[i][j] = new unsigned short int*[ZSize];
: }
: for (i=0; i < YSize;i++) {
: for (j=0; j < XSize;j++)
: for (k=0; k < ZSize;k++)
: imgData->imgDataINT16U[i][j][k] = new unsigned short int[TSize];
: }
Think of it: if TSize=1, then every single unsigned short 'voxel'
will be allocated in its own memory block! That is a space overhead
of a pointer plus several bytes of allocator bookkeeping (depending
on your library implementation, maybe 16!) for every 2 bytes of data.
This of course also makes the initial allocations expensive.
Data manipulation suffers as well: performance will be degraded by
the many scattered, cache-unfriendly memory accesses.
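
To put rough numbers on it (a back-of-the-envelope estimate, assuming a
4-byte pointer and ~16 bytes of heap overhead per allocation): your
256x176x176x1 volume means 256*176*176 = 7,929,856 separate 2-byte
allocations for the voxels alone. That is about 15 MB of actual data
dragging along on the order of 150 MB of pointers and allocator
bookkeeping, plus close to 8 million calls to new now and to delete[]
later. No wonder allocation and deallocation dominate your run time.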
: What method would be best to quickly allocate and deallocate large
: chunks of memory?

A first step would be to modify your code to use something like:

class Buf4D_Int16U {
public:
    Buf4D_Int16U(int xSize, int ySize, int zSize, int tSize)
      : p_( new unsigned short[xSize*ySize*zSize*tSize] ) // one contiguous block
      , xSize_(xSize)
      , ySize_(ySize)
      , zSize_(zSize)
      , tSize_(tSize)
    { }

    ~Buf4D_Int16U() { delete[] p_; }

    // Element access (non-const and const)
    unsigned short& operator()(int x, int y, int z, int t)
    { return p_[((x*ySize_ + y)*zSize_ + z)*tSize_ + t]; }
    unsigned short operator()(int x, int y, int z, int t) const
    { return p_[((x*ySize_ + y)*zSize_ + z)*tSize_ + t]; }

private:
    unsigned short* p_;
    int xSize_, ySize_, zSize_, tSize_;
    Buf4D_Int16U(Buf4D_Int16U const&);   // disabled copy-ctor
    void operator=(Buf4D_Int16U const&); // disabled copy-assign
};

: ------------ GENERAL ACCESS ------------
: voxelValue = imgData->imgDataINT16U[col][row][slice][0];

Becomes:
voxelValue = imgData(col,row,slice,0);

Pass a reference to the imgData object instead of the
unsigned short**** pointers when passing the buffer to functions.
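
For illustration only, a minimal usage sketch (it assumes the
Buf4D_Int16U class above is in scope; the helper function name and the
sizes are made up, not taken from your program):

#include <iostream>

void dumpVoxel(const Buf4D_Int16U& img, int x, int y, int z) // by reference
{
    std::cout << img(x, y, z, 0) << '\n';
}

int main()
{
    Buf4D_Int16U volume(176, 256, 176, 1); // xSize, ySize, zSize, tSize
    volume(10, 20, 30, 0) = 1234;          // write one voxel
    dumpVoxel(volume, 10, 20, 30);         // prints 1234
    return 0;                              // destructor frees the whole block
}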

Note that you probably want to use a std::vector<unsigned short> instead
of the naked pointer p_ (but this may have a performance cost,
especially in non-optimized builds...).
Many further improvements are possible, but this first step
should be a no-brainer...
Cheers,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <http://www.brainbench.com>

Sep 30 '06 #2

That looks like an excellent solution. I never thought about putting
the indexing math into a function.
How does the performance of using [n][m][...] notation compare to a
function call when accessing elements?
: Many further improvements are possible, but this first step
: should be a no-brainer...
What other ways are there of improving large multidimensional arrays?

-Greg

Oct 1 '06 #3

<Gr************@gmail.com> wrote in message
news:11**********************@c28g2000cwb.googlegroups.com...
: That looks like an excellent solution. I never thought about putting
: the indexing math into a function.
: How does the performance of using [n][m][...] notation compare to a
: function call when accessing elements?
You should check using a profiler on your specific platform.
Some reasoning though:
- In Debug mode/unoptimized builds, you will suffer the overhead
of a function call. This may or may not matter.
It may actually be useful to check, within the accessor function,
at the expense of some additional runtime overhead, that all indices
are within range (e.g. ASSERT(x<xSize) if x and xSize are unsigned);
see the sketch after this list. In release mode, this should not matter.
- On a modern HW architecture, the contiguous (and cache-friendly)
memory accesses to [xSize,ySize,zSize], and the multiplications
to compute an index should be less expensive than the multiple
dereferencing of pointers. But take this as a wild guess.
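
Such a range check could look like this sketch (it assumes the
Buf4D_Int16U class from my previous post, with #include <cassert> added
at the top of the file; assert() compiles away when NDEBUG is defined,
as it usually is in release builds):

    // Checked variant of the in-class, non-const accessor:
    unsigned short& operator()(int x, int y, int z, int t)
    {
        assert(x >= 0 && x < xSize_);  // active in debug builds only
        assert(y >= 0 && y < ySize_);
        assert(z >= 0 && z < zSize_);
        assert(t >= 0 && t < tSize_);
        return p_[((x*ySize_ + y)*zSize_ + z)*tSize_ + t];
    }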

: Many further improvements are possible, but this first step
: should be a no-brainer...
:
: What other ways are there of improving large multidimensional arrays?

For very large data sets, it can be useful to tile the data in
memory (e.g. store small sub-cubes of the data in one contiguous
memory block). This can, for example, improve the performance
of some multi-planar reconstructions (memory locality, can
be more cache-friendly, especially if some of the data does
not fit in main memory).
But this will complicate the implementation of most algorithms.
Since your program managed to run even with all the allocation overhead
you had, running out of main memory should not be an issue for you.

For many algorithms, you will save time by using relative offsets to
reach the neighbors of a pixel rather than to recompute each index.
E.g.: tStep=1, zStep=tSize, yStep=zStep*zSize, xStep=yStep*ySize;
When you want to access the neighbors of a voxel at pv,
you can directly address pv[+zStep], pv[-zStep] etc...
Even better, by adding a 'margin' around your data set,
you can linearly scan and transform the data, stepping from
begin() to end(), without any full-index computation.
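
As a sketch of the stride idea (the function name is made up; it assumes
the same flat ((x*ySize + y)*zSize + z)*tSize + t layout as the class
above, and a voxel that is not on the border of the volume):

// Average the six face-neighbors of the voxel at (x,y,z,t), using
// precomputed strides instead of recomputing a full index each time.
unsigned short neighborAverage(const unsigned short* p,
                               int ySize, int zSize, int tSize,
                               int x, int y, int z, int t)
{
    const int tStep = 1;
    const int zStep = tSize;
    const int yStep = zStep * zSize;
    const int xStep = yStep * ySize;

    const unsigned short* pv = p + x*xStep + y*yStep + z*zStep + t*tStep;
    long sum = pv[-xStep] + pv[+xStep]
             + pv[-yStep] + pv[+yStep]
             + pv[-zStep] + pv[+zStep];
    return (unsigned short)(sum / 6);
}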

In a different vein, many algorithms today can be
reimplemented to leverage the graphics processor
of your display card:
http://www.google.com/search?q=GPU+processing

But remember the saying about premature optimization...
hth
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <http://www.brainbench.com>

Oct 1 '06 #4

Thanks! All your information will be helpful.
-Greg
Oct 1 '06 #5
