Vectorization of template functions

Hello all,

I am attempting to vectorize a few template functions with the Intel
compiler, but without much success so far. Granted, this question
is not 100% C++, but it is related enough that I felt I could post it
here. I also asked in the Intel forums, without much success, and
maybe some C++ coders here are familiar with the Intel compiler.

The code below highlights the problem I have. Essentially, I have a
template class that handles images (i.e. large matrices). The template
parameter represents the data type held by the image (e.g. uint8,
float, ...).

I have many functions that apply certain filters to these images, and
I would like to vectorize them. Obviously, since the argument of the
functions is an instance of a template class (the image), the function
itself is a template. The name of the problematic function is 'test',
defined towards the bottom of the code. 'test' is called in 2
different ways.

First, in main, I define 2 images, 'gray' and 'tmp', and call
'test' with them as arguments. When I compile with -QxN, the inner
loop within 'test' is vectorized. Good.

However, I cannot always have the definition of the template function
in the same file as main, because there are simply too many
template functions defined. Hence I define the template in another
source file, but I have to instantiate it explicitly with the actual
parameters it is going to be used with, otherwise the linker will not
find the code. I tried to reproduce this organization in a single file.
Just below the definition of 'test', I inserted a line that instantiates
'test' for the parameters with which it is going to be used. The compiler
understands that correctly, but now says that it cannot vectorize the
inner loop! The message is: "loop was not vectorized: dereference too
complex". Why is the dereference now too complex, when the compiler
handled it just fine for the other instantiation in main?!
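
For readers following along, the file organization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Buffer`, `add_one`, `filters.h` and `filters.cpp` are hypothetical names that do not appear in the original code, and everything is shown in one listing with comments marking where each part would live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would live in a header, e.g. filters.h (hypothetical name) ----

// Simplified stand-in for the image class in this thread.
template <typename T>
struct Buffer {
    std::vector<T> data;
    explicit Buffer(std::size_t n) : data(n) {}
};

// Declaration only: callers see the signature, not the body.
template <typename T>
void add_one(const Buffer<T>& input, Buffer<T>& output);

// ---- would live in a source file, e.g. filters.cpp (hypothetical name) ----

// Definition. Precondition: output holds at least as many elements as input.
template <typename T>
void add_one(const Buffer<T>& input, Buffer<T>& output)
{
    for (std::size_t i = 0; i < input.data.size(); ++i)
        output.data[i] = static_cast<T>(input.data[i] + 1);
}

// Explicit instantiations: the only versions the linker will find,
// so every element type used elsewhere must be listed here.
template void add_one<float>(const Buffer<float>&, Buffer<float>&);
template void add_one<unsigned char>(const Buffer<unsigned char>&,
                                     Buffer<unsigned char>&);
```

With this layout, a translation unit that only includes the header cannot inline `add_one`, because it never sees the body. The compiler has to optimize the function on its own, without knowledge of any particular call site.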

Any help would be much appreciated.

Alex

##### CUT #####

#include <windows.h>
#include <malloc.h>   // _aligned_malloc, _aligned_free (MSVC / Intel on Windows)
#include <stdio.h>
#include <math.h>

typedef unsigned char u8;
typedef float f32;

////////////////////
///// MEMORY ROUTINES
////////////////////

enum { MemoryAlignment = 64 };

void* AllocateMemory(size_t size)
{
    return _aligned_malloc(size, MemoryAlignment);
}

void ReleaseMemory(void* memblock)
{
    _aligned_free(memblock);
}

// Rounds width up to a multiple of (MemoryAlignment / sizeof(float)) elements.
int ComputeAlignedWidth(int width)
{
    int alignment_needed = MemoryAlignment / sizeof(float);
    return (int)ceil((float)width / (float)alignment_needed) * alignment_needed;
}



////////////////////
///// CLASS DECLARATION
////////////////////

template <typename T>
struct Image
{
public: // members
    // std information
    int width, height, depth;

    // actual width of the buffer:
    // the buffer holding image data is padded to be a multiple
    // of MemoryAlignment for optimisation purposes
    int width_padded;

    // dimensions helper
    int firstRow, lastRow, firstCol, lastCol;

    // pointer to the image data
    T* data;

public: // methods
    // ctor
    Image():
        width(0), height(0), depth(0),
        width_padded(0),
        firstRow(0), lastRow(0), firstCol(0), lastCol(0),
        data(NULL)
    {
    }

    // dtor (does not free data; Release() must be called explicitly)
    ~Image()
    {
    }

    // memory management
    void Allocate() { data = static_cast<T*>(AllocateMemory(width_padded * height * depth * sizeof(T))); }
    void Release()  { ReleaseMemory(data); }

    // pixel access
    // virtual T& operator() (int row, int col)

    // dimensions management
    void SetDimensions(int h, int w, int d) {
        height = h;
        width = w;
        depth = d;
        width_padded = ComputeAlignedWidth(width);
        firstRow = 0;
        firstCol = 0;
        lastRow = height - 1;
        lastCol = width - 1;
    }

    // size information
    int GetTotalSize(bool padded = false) {
        if (padded) return width_padded * height * depth * sizeof(T);
        else        return width * height * depth * sizeof(T);
    }

    int GetImageSize(bool padded = false) {
        if (padded) return width_padded * height * depth;
        else        return width * height * depth;
    }

    int GetPlaneSize(bool padded = false) {
        if (padded) return width_padded * height;
        else        return width * height;
    }
};



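template <typename T>
struct GrayImage : public Image<T>
{
public: // methods
    // ctor
    GrayImage():
        Image<T>()
    {
        // 'this->' is required by standard C++ to reach members of a
        // dependent base class; some compilers accept the bare name.
        this->depth = 1;
    }

    // pixel access
    T& operator() (int row, int col)
    {
        return this->data[row * this->width_padded + col];
    }
};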

template <typename T>
void test(GrayImage<T>& input, GrayImage<T>& output)
{
    int lastR = input.lastRow, firstR = input.firstRow;
    int lastC = input.lastCol, firstC = input.firstCol;

    // Alternative loop headers, left commented out:
    //for(int row=input.firstRow ; row<=input.lastRow ; ++row){
    //  for(int col=input.firstCol ; col<input.lastCol; ++col){
    for (int row = firstR; row <= lastR; ++row) {
#pragma ivdep
        for (int col = firstC; col <= lastC; ++col) {
            output(row, col) = input(row, col) + 1;
        }
    }
}

// Explicit instantiation for the parameters 'test' is used with.
template void test<f32>(GrayImage<f32>& input, GrayImage<f32>& output);

int main(int argc, char* argv[])
{
    GrayImage<f32> gray, tmp;

    gray.SetDimensions(2000, 2000, 1); gray.Allocate();
    tmp.SetDimensions(gray.height, gray.width, 1); tmp.Allocate();

    test(gray, tmp);

    gray.Release(); tmp.Release();
    return 0;
}

##### CUT #####

Jun 22 '07 #1
vectorizor wrote:
> [..] The compiler
> understands that correctly, but now says that it cannot vectorize the
> inner loop! The message is : "loop was not vectorized: dereference too
> complex". Why is the dereference now too complex, when the compiler
> handled it just fine for the other instantiation in the main?!
> [..]

The vectorization of loops is (AFAIUI) an optimization technique your
compiler has and employs if possible. Apparently, if not possible, it
doesn't employ it. But all this really has nothing to do with the C++
language. You need to talk to Intel technical support to learn more
about the availability of different optimization methods and in what
circumstances the compiler can or cannot use them.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 22 '07 #2
jg
On Jun 22, 8:22 am, vectorizor <vectori...@googlemail.com> wrote:
> Hello all,
> the parameters with which it is going to be used. The compiler
> understands that correctly, but now says that it cannot vectorize the
> inner loop! The message is : "loop was not vectorized: dereference too
> complex". Why is the dereference now too complex, when the compiler
> handled it just fine for the other instantiation in the main?!
>
> Any help would be much appreciated.

I guess it is related to inlining of your function template.
If the call site and the template function definition are in the
same file, the compiler can inline the call. After inlining, it may
do a better alias analysis, and your code can be vectorized.
Without inlining, the reference parameters of your template function
are internally implemented as pointers, which requires more
pointer analysis to determine whether the loop can be vectorized.
Your best bet is to ask Intel.

JG

Jun 22 '07 #3
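
If the aliasing explanation above is right, one commonly used workaround is to hoist each row's data pointer into a local, restrict-qualified raw pointer, so that the inner loop only indexes plain arrays. The sketch below is a guess at how that might look, not a confirmed fix for the Intel diagnostic. `PaddedImage` and `add_one_rows` are hypothetical names, and `__restrict` is a compiler extension (accepted by MSVC, GCC, Clang and the Intel compiler) rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for GrayImage<T>: row-major storage whose row
// stride (width_padded) may exceed the visible width.
template <typename T>
struct PaddedImage {
    int width, height, width_padded;
    std::vector<T> storage;

    PaddedImage(int h, int w, int pad)
        : width(w), height(h), width_padded(w + pad),
          storage(static_cast<std::size_t>(h) * (w + pad)) {}

    T* row(int r) {
        return storage.data() + static_cast<std::size_t>(r) * width_padded;
    }
    const T* row(int r) const {
        return storage.data() + static_cast<std::size_t>(r) * width_padded;
    }
};

// Assumes input and output have the same dimensions and do not share
// storage; __restrict promises the compiler exactly that.
template <typename T>
void add_one_rows(const PaddedImage<T>& input, PaddedImage<T>& output)
{
    const int rows = input.height;
    const int cols = input.width;
    for (int r = 0; r < rows; ++r) {
        // Resolve the row pointers once, outside the inner loop.
        const T* __restrict src = input.row(r);
        T* __restrict dst = output.row(r);
        for (int c = 0; c < cols; ++c)
            dst[c] = static_cast<T>(src[c] + 1);
    }
}

// Explicit instantiation, as in the original post.
template void add_one_rows<float>(const PaddedImage<float>&,
                                  PaddedImage<float>&);
```

The design choice is to give the optimizer the simplest possible loop body: two local pointers and one index, with no member access or `operator()` call left inside the loop. Whether a given compiler then vectorizes it still has to be checked in its vectorization report.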
