How can a Vector class be optimized?

Hi everyone

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it has changed so
that colors are now defined as spectra. The size of the vector went from
3 (RGB) to 151 (400 nm to 700 nm, with a sample every 2 nm). The variables
use a simple Vector class defined as follows:

template<typename T, int Depth>
class Vector
{ ...
};
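The figure of 151 is the number of samples from 400 nm to 700 nm inclusive at a 2 nm step: (700 - 400) / 2 + 1 = 151. A small sketch of that arithmetic; the names kMinNm, kMaxNm, kStepNm, kSamples and wavelength_nm() are hypothetical and not part of the original code:

```cpp
// Hypothetical constants describing the spectral sampling in the post.
const int kMinNm  = 400; // first sample, in nanometres
const int kMaxNm  = 700; // last sample, in nanometres (inclusive)
const int kStepNm = 2;   // spacing between samples, in nanometres

// Number of samples, counting both end points: (700 - 400) / 2 + 1 = 151.
const int kSamples = (kMaxNm - kMinNm) / kStepNm + 1;

// Wavelength in nanometres of sample index i (0 <= i < kSamples).
inline int wavelength_nm(int i) {
    return kMinNm + i * kStepNm;
}
```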

Since the move from the RGB version of the code to the spectral version,
the application has slowed down significantly. I ran a test comparing the
Vector class with plain arrays of 151 floats, performing the same
operations on each 1 million times.

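#include <time.h>   // clock(), clock_t, CLOCKS_PER_SEC
#include <string.h> // memset()
#include <iostream> // cerr, endl
using namespace std;

int maxIter = static_cast<int>( 1e+6 );

clock_t c1, c0 = clock();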

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
memset( v, 0, sizeof( float ) * 151 );
memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nfloat[ 151 ]" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
Vector<float, 151> v( 12.0 );
Vector<float, 151> v2( -12.0 );
v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = clock();

cerr << "\nSuperVector class" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;
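Each variant in the benchmark repeats the same clock()/CLOCKS_PER_SEC bookkeeping. A minimal sketch of a helper that factors it out; cpu_seconds is a hypothetical name, and like the original code it reports processor time from std::clock() rather than wall-clock time:

```cpp
#include <ctime>

// Hypothetical helper: calls body() `iterations` times and returns the
// processor time consumed, in seconds, as measured by std::clock().
template <typename Body>
double cpu_seconds(Body body, int iterations) {
    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Whatever body does should leave an observable result (for example, by writing to a variable that is printed afterwards); otherwise an optimizing compiler may remove the very work being timed.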

Here are the results
// RGB version, Vector<float, 3>
end CPU time : 390000
elapsed CPU time : 0.39

// Spectral Version Vector<float, 151>
end CPU time : 10510000
elapsed CPU time : 10.12

// Using arrays of 151 floats
end CPU time : 13230000
elapsed CPU time : 2.72

This clearly shows that using the Vector class slows the application down
considerably, especially as the size of the Vector increases, and that it
is not as efficient as doing the operations on arrays of floats directly.
So my question is: is there a way of optimising it?

I do realise that doing:
Vector<float, 151> result = Vector<float, 151>( 0.1 ) * 0.1 * 100.0;

is not the same as doing:
float result[ 151 ], temp [ 151 ];
for ( int i = 0; i < 151; ++i ) {
temp[ i ] = 0.1f;
result[ i ] = temp[ i ] * 0.1 * 100.0;
}

But isn't there a way I can make the Vector class as efficient as the
second option (doing the math operations directly on arrays of floats)?
Or, if speed is a priority, is writing C-style code the only way to get
the performance back once the vector size becomes an issue?

Thanks for your help -

#include <cstring> // std::memset

template<typename T, int Size>
class SuperVector
{
public:
    T w[ Size ];

    SuperVector()
    { std::memset( w, 0, sizeof( T ) * Size ); }

    SuperVector( const T &real )
    {
        for ( int i = 0; i < Size; ++i ) {
            (*this).w[ i ] = real;
        }
    }

    // element-wise product, returned as a new vector
    inline SuperVector<T, Size> operator * ( const SuperVector<T, Size>
        &v ) const
    {
        SuperVector<T, Size> sv;
        for ( int i = 0; i < Size; ++i ) {
            sv.w[ i ] = (*this).w[ i ] * v.w[ i ];
        }
        return sv;
    }

    // product with a scalar, returned as a new vector (*this is unchanged)
    inline SuperVector<T, Size> operator * ( const T &real ) const
    {
        SuperVector<T, Size> sv;
        for ( int i = 0; i < Size; ++i ) {
            sv.w[ i ] = (*this).w[ i ] * real;
        }
        return sv;
    }

    // element-wise sum, returned as a new vector
    inline SuperVector<T, Size> operator + ( const SuperVector<T, Size>
        &v ) const
    {
        SuperVector<T, Size> sv;
        for ( int i = 0; i < Size; ++i ) {
            sv.w[ i ] = (*this).w[ i ] + v.w[ i ];
        }
        return sv;
    }

    // in-place element-wise sum; no temporary is created
    inline SuperVector<T, Size>& operator += ( const SuperVector<T, Size>
        &v )
    {
        for ( int i = 0; i < Size; ++i ) {
            (*this).w[ i ] += v.w[ i ];
        }
        return *this;
    }
};
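Every binary operator in SuperVector builds and returns a complete Size-element vector. One way to get closer to the raw-array loops without expression templates is to express the update with in-place operations, so that no 151-element temporaries are returned by value. The class SpecVec and the helper mix_into are hypothetical names; this is a sketch of the idea, not the poster's class:

```cpp
#include <cstddef>

// Hypothetical fixed-size vector that only offers in-place operations.
template <typename T, std::size_t N>
struct SpecVec {
    T w[N];

    explicit SpecVec(T value = T()) {
        for (std::size_t i = 0; i < N; ++i) w[i] = value;
    }

    // Scale every element in place: no temporary vector is created.
    SpecVec& operator*=(T s) {
        for (std::size_t i = 0; i < N; ++i) w[i] *= s;
        return *this;
    }

    // Element-wise add in place.
    SpecVec& operator+=(const SpecVec& o) {
        for (std::size_t i = 0; i < N; ++i) w[i] += o.w[i];
        return *this;
    }

    // Add the same scalar to every element in place.
    SpecVec& add_scalar(T s) {
        for (std::size_t i = 0; i < N; ++i) w[i] += s;
        return *this;
    }
};

// Hypothetical helper: v = a * (1 - t) + v * t, computed in a single pass.
template <typename T, std::size_t N>
void mix_into(SpecVec<T, N>& v, const SpecVec<T, N>& a, T t) {
    for (std::size_t i = 0; i < N; ++i)
        v.w[i] = a.w[i] * (T(1) - t) + v.w[i] * t;
}
```

With this shape, the two statements in the benchmark loop become `mix_into(v, v2, 0.5f);` followed by `v.add_scalar(10.0f * real * anotherReal);`, each a single loop over the 151 floats. The trade-off is a less natural notation than operator expressions.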

Nov 5 '06 #1
12 Replies


ma*****@yahoo.com wrote:
[original post snipped]

Read up on expression templates. Or, better, use a linear algebra library.
Best

Kai-Uwe Bux
Nov 6 '06 #2
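As an example of using existing library code rather than a hand-written class: the standard std::valarray<float> (header <valarray>) supports element-wise arithmetic with scalars and with other valarrays. The standard permits functions returning a valarray to return a different, expression-like type, and libstdc++ uses expression templates there, but whether temporaries are actually avoided depends on the library implementation, so it is worth measuring. The function name spectral_step is hypothetical:

```cpp
#include <valarray>

// Hypothetical helper: one iteration of the benchmark's Vector update,
// written with std::valarray instead of a custom class.
inline void spectral_step(std::valarray<float>& v,
                          const std::valarray<float>& v2,
                          float real, float anotherReal) {
    // v = v2 * (1 - 0.5) + v * 0.5, element by element
    v = v2 * 0.5f + v * 0.5f;
    // Add the same scalar, 10 * real * anotherReal, to every element.
    v += 10.0f * real * anotherReal;
}
```

Note the argument order of the fill constructor: `std::valarray<float> v(12.0f, 151)` takes the value first and the count second, the reverse of std::vector.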


ma*****@yahoo.com wrote:
[original post snipped]

In the Vector loop of your benchmark I used

std::vector<float> v(151, 12.0);
std::vector<float> v2(151, -12.0);

instead, with exactly the same index-based (random-access) calculations
as the array version above. See the clock results below:

float[ 151 ]
end CPU time : 2620000
elapsed CPU time : 2.62

std::vector class
end CPU time : 4680000
elapsed CPU time : 2.06
> So basically my question is: is there a way of optimising it?

Yes: use resize() to specify the container's size explicitly.

void resize(n, t = T())
- Inserts or erases elements at the end such that the size becomes n
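Presumably the point of this advice is to size the container once, up front, and then reuse it, rather than constructing fresh vectors inside the timed loop. A sketch under that assumption, using std::vector's resize(n, value) and the index-based loops from the array version; the function names prepare and step are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: size the working buffers once, outside any hot loop.
// resize(n, value) changes size() to n; any new elements are copies of value.
inline void prepare(std::vector<float>& v, std::vector<float>& v2) {
    v.resize(151, 12.0f);
    v2.resize(151, -12.0f);
}

// Hypothetical: one iteration of the update using plain indexing,
// reusing the already-sized buffers.
inline void step(std::vector<float>& v, const std::vector<float>& v2,
                 float real, float anotherReal) {
    const float add = 10.0f * real * anotherReal; // same for every element
    for (std::size_t j = 0; j < v.size(); ++j) {
        v[j] = v2[j] * 0.5f + v[j] * 0.5f + add;
    }
}
```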
Nov 6 '06 #3

"ma*****@yahoo.com" <ma*****@yahoo.com> wrote:
v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;
Optimize your vector class by removing the op* and op+. Too many
temporaries are being created.
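The temporaries can be made visible by counting them. The sketch below is a stripped-down, hypothetical stand-in for SuperVector (the name CountedVec and its counter are not from the original) whose operators return new vectors by value, like the posted ones; a static counter records each result vector the operators construct, while copies that the compiler may or may not elide are deliberately not counted:

```cpp
#include <cstddef>

// Hypothetical stand-in for SuperVector: operators return new vectors by value.
struct CountedVec {
    static int results_built; // result vectors constructed inside operators
    float w[151];

    CountedVec() {
        for (std::size_t i = 0; i < 151; ++i) w[i] = 0.0f;
    }
    explicit CountedVec(float value) {
        for (std::size_t i = 0; i < 151; ++i) w[i] = value;
    }

    CountedVec operator*(float s) const {
        CountedVec r;            // one 151-float temporary per call
        ++results_built;
        for (std::size_t i = 0; i < 151; ++i) r.w[i] = w[i] * s;
        return r;
    }

    CountedVec operator+(const CountedVec& o) const {
        CountedVec r;            // and another one here
        ++results_built;
        for (std::size_t i = 0; i < 151; ++i) r.w[i] = w[i] + o.w[i];
        return r;
    }
};

int CountedVec::results_built = 0;
```

Evaluating `v = v2 * 0.5f + v * 0.5f;` with this class constructs three result vectors (two products and one sum), each filled by its own loop, before the final copy into v; the raw-array version writes straight into v.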

Here is an interesting exercise:

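#include <algorithm> // transform
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
#include <iostream>  // cerr, endl
#include <vector>
using namespace std;

int maxIter = static_cast<int>( 1e+6 );

clock_t c1, c0 = clock();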

struct binary_op
{
float operator()( float lhs, float rhs ) const
{
return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
}
};

struct unary_op
{
unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
float operator()( float v ) const {
return v + 10.0 * real * anotherReal;
}
const float real, anotherReal;
};

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
vector<float> v( 151, 12.0 );
vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nManual iteration" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

for ( int i = 0; i < 151; ++i ) {
v[i] = 12.0;
v2[i] = -12.0;
}

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
cerr << "\nAlgorithm Use" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;
}

My output:

Manual iteration
end CPU time : 174
elapsed CPU time : 1.74

Algorithm Use
end CPU time : 265
elapsed CPU time : 0.91

--
To send me email, put "sheltie" in the subject.
Nov 6 '06 #4


Daniel T. wrote:
[code snipped]

My output:

Manual iteration
end CPU time : 174
elapsed CPU time : 1.74

Algorithm Use
end CPU time : 265
elapsed CPU time : 0.91
Perhaps I'm missing something here; nonetheless, the output for Manual
Iteration (dumping v.front() and v.back()) is all zeros, while that of
the algorithm version isn't. Digging deeper, I realized that the value
returned by unary_op and the summation in the manual iteration aren't the
same. So I modified the source as follows:

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
std::vector<float> v( 151, 12.0 );
std::vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] + 10 * real * anotherReal;
}

}
c1 = clock();

std::cerr << "\nManual iteration" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;
//std::cout << " ### " << v.front() << std::endl;
//std::cout << " ### " << v.back() << std::endl;

for ( int i = 0; i < 151; ++i ) {
v[i] = 12.0;
v2[i] = -12.0;
}
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
std::transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
std::transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
std::cerr << "\nAlgorithm Use" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;

//std::cout << " ## " << v.front() << std::endl;
//std::cout << " ## " << v.back() << std::endl;

}
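The general lesson of this observation is to check that two variants compute the same values before timing them. A minimal sketch, assuming the binary_op and unary_op definitions from Daniel T.'s program (repeated here so the block compiles on its own); manual_step, algorithm_step and variants_agree are hypothetical names. A tolerance is used because unary_op does its arithmetic in double while the manual loop uses float:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct binary_op {   // as in Daniel T.'s program
    float operator()(float lhs, float rhs) const {
        return lhs * (1.0 - 0.5) + rhs * 0.5;
    }
};

struct unary_op {    // as in Daniel T.'s program
    unary_op(float r, float r2) : real(r), anotherReal(r2) {}
    float operator()(float v) const { return v + 10.0 * real * anotherReal; }
    const float real, anotherReal;
};

// Hand-written loops, matching the modified main() above.
void manual_step(std::vector<float>& v, const std::vector<float>& v2,
                 float real, float anotherReal) {
    for (std::size_t j = 0; j < v.size(); ++j)
        v[j] = v2[j] * (1.0 - 0.5) + v[j] * 0.5;
    for (std::size_t j = 0; j < v.size(); ++j)
        v[j] = v[j] + 10 * real * anotherReal;
}

// The std::transform version from Daniel T.'s program.
void algorithm_step(std::vector<float>& v, const std::vector<float>& v2,
                    float real, float anotherReal) {
    std::transform(v2.begin(), v2.end(), v.begin(), v.begin(), binary_op());
    std::transform(v.begin(), v.end(), v.begin(), unary_op(real, anotherReal));
}

// True if both variants agree within a relative tolerance after `steps`
// iterations starting from the same inputs.
bool variants_agree(int steps, float tol = 1e-5f) {
    const float real = 1.245f, anotherReal = 20.43492342f;
    std::vector<float> a(151, 12.0f), b(151, 12.0f), v2(151, -12.0f);
    for (int i = 0; i < steps; ++i) {
        manual_step(a, v2, real, anotherReal);
        algorithm_step(b, v2, real, anotherReal);
    }
    for (std::size_t j = 0; j < a.size(); ++j) {
        float scale = std::max(1.0f, std::fabs(a[j]));
        if (std::fabs(a[j] - b[j]) > tol * scale) return false;
    }
    return true;
}
```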

P4 3.2 GHz, MSVC .NET, -O3 optimization

Manual iteration
end CPU time : 7570
elapsed CPU time : 7.57

Algorithm Use
end CPU time : 28453
elapsed CPU time : 20.883

Press any key to continue . . .

Nov 7 '06 #5

"ma740988" <ma******@gmail.com> wrote:
Perhaps, I'm missing something here, nonetheless, the output for Manual
Iteration ( dump v.front() and v.back() ) results in all zeros, while
that of the algorithm doesn't. Digging deeper I realize that the
return from unary_op and the summation in the manual iteration aren't
the same. So I modified the source...

P4 3.2Ghz MSVC.NET -O3 optimization

Manual iteration
end CPU time : 7570
elapsed CPU time : 7.57

Algorithm Use
end CPU time : 28453
elapsed CPU time : 20.883

Press any key to continue . . .
Odd. I used your main() and it, of course, sped up the manual iteration
considerably:

PowerPC G5 1.6GHz g++-4.0

Manual iteration
end CPU time : 96
elapsed CPU time : 0.96

Algorithm Use
end CPU time : 187
elapsed CPU time : 0.91

cpp_sandbox has exited with status 0.

Of course the fact that my computer is faster than yours isn't relevant,
it's the percentage difference in the numbers that surprises. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?

Nov 7 '06 #6


Daniel T. wrote:

Of course the fact that my computer is faster than yours isn't relevant,
it's the percentage difference in the numbers that surprises. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?
I'll try full optimization and then observe the difference. My initial
test was done with -O3 (MSVC .NET 2005) optimization. What compiler are
you using?

Nov 8 '06 #7

"ma740988" <ma******@gmail.com> wrote:
Daniel T. wrote:

Of course the fact that my computer is faster than yours isn't
relevant, it's the percentage difference in the numbers that
surprises. I'm showing a 5% increase in speed for the algorithm
use, whereas you show a 176% *decrease*. Are you sure you compiled
with full optimizations for speed?

I'll try full optimization and then observe the difference. My initial
test was done with -O3 (MSVC .NET 2005) optimization. What compiler are
you using?
PowerPC G5 1.6GHz g++ - 4.0

Nov 8 '06 #8

ma*****@yahoo.com wrote:
[original post snipped]

Here is an illustration of how expression templates reduce the number of
temporaries:

#include <cstdlib> // std::size_t
#include <iostream>

/*
We define many things that are not meant to show up
in client code:
*/
namespace DO_NOT_USE {

// the basic container:
template < typename ValueType, std::size_t Size >
class VectorData {

ValueType the_data [ Size ];

public:

VectorData ( ValueType val = ValueType() )
{
for ( std::size_t i = 0; i < Size; ++ i ) {
the_data[ i ] = val;
}
}

ValueType operator[] ( std::size_t i ) const {
return ( the_data[i] );
}

ValueType & operator[] ( std::size_t i ) {
return ( the_data[i] );
}

};

template < typename ValueType, std::size_t Size, typename Expr >
struct VectorExpr : public Expr {

VectorExpr ( void ) : Expr() {}

VectorExpr ( Expr const & a ) : Expr(a) {}

};
template < typename ValueType, std::size_t Size, typename Expr >
std::ostream &
operator<< ( std::ostream & o_str,
VectorExpr< ValueType, Size, Expr > const & a ) {
if ( Size > 0 ) {
std::size_t i = 0;
while( i < Size-1 ) {
o_str << a[i] << " ";
++i;
}
o_str << a[i];
}
return ( o_str );
}

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
struct VectorPlusVector {

ExprA const & a;
ExprB const & b;

VectorPlusVector ( ExprA const & a_, ExprB const & b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[i] + b[i] );
}

};

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
VectorExpr< ValueType, Size, VectorPlusVector< ValueType, Size, ExprA,
ExprB > >
operator+ ( VectorExpr< ValueType, Size, ExprA > const & a,
VectorExpr< ValueType, Size, ExprB > const & b ) {
return ( VectorPlusVector< ValueType, Size, ExprA, ExprB >( a, b ) );
}
template < typename ValueType, std::size_t Size, typename ExprA >
struct VectorTimesScalar {

ExprA const & a;
ValueType b;

VectorTimesScalar ( ExprA const & a_, ValueType b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[i] * b );
}

};

template < typename ValueType, std::size_t Size, typename ExprA >
VectorExpr< ValueType, Size, VectorTimesScalar< ValueType, Size, ExprA > >
operator* ( VectorExpr< ValueType, Size, ExprA > const & a,
ValueType b ) {
return ( VectorTimesScalar< ValueType, Size, ExprA >( a, b ) );
}

template < typename ValueType, std::size_t Size >
class la_vect
: public VectorExpr< ValueType, Size, VectorData< ValueType, Size > >
{
public:

la_vect ( ValueType val = ValueType() )
: VectorExpr< ValueType, Size, VectorData< ValueType, Size > >( val )
{}

template < typename Expr >
la_vect & operator= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] = rhs[ i ];
}
return ( *this );
}

template < typename Expr >
la_vect & operator+= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] += rhs[ i ];
}
return ( *this );
}

};

}

using DO_NOT_USE::la_vect;
int maxIter = static_cast<int>( 100000 );

#include <ctime>
#include <cstring> // std::memset

int main ( void ) {
std::clock_t c1, c0 = std::clock();

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
std::memset( v, 0, sizeof( float ) * 151 );
std::memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = std::clock();

std::cerr << "\nfloat[ 151 ]" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
la_vect<float, 151> v( 12.0 );
la_vect<float, 151> v2( -12.0 );
v = v2 * float( 1.0 - 0.5 ) + v * float(0.5);
v += la_vect<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = std::clock();

std::cerr << "\nla_vect class" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

}

design> a.out

float[ 151 ]
end CPU time : 540000
elapsed CPU time : 0.54

la_vect class
end CPU time : 1160000
elapsed CPU time : 0.62
Also note that the computation using raw arrays and loops does not do the
same thing as the computation using SuperVector. This might explain the
remaining difference: theoretically, an optimizing compiler could eliminate
almost all temporaries.
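For a like-for-like comparison, the raw-array loop should compute exactly what the Vector and la_vect statements compute. A sketch of such a loop, fused into a single pass over the data; the function name fused_update is hypothetical:

```cpp
#include <cstddef>

// Hypothetical: performs, in one pass over n floats,
//   v  = v2 * (1.0 - 0.5) + v * 0.5;
//   v += Vector(10.0) * real * anotherReal;
// i.e. the same arithmetic as the Vector/la_vect benchmark statements.
inline void fused_update(float* v, const float* v2, std::size_t n,
                         float real, float anotherReal) {
    const float add = 10.0f * real * anotherReal; // hoisted out of the loop
    for (std::size_t j = 0; j < n; ++j) {
        v[j] = v2[j] * 0.5f + v[j] * 0.5f;
        v[j] += add;
    }
}
```

Timing this loop against the la_vect version isolates the cost of the abstraction from differences in the arithmetic being performed.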
Best

Kai-Uwe Bux
Nov 8 '06 #9

Daniel T. wrote:
Of course the fact that my computer is faster than yours isn't relevant,
it's the percentage difference in the numbers that surprises. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?
Timing is one of those things that often puzzles me when using the .NET
compiler (.NET 05). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.

Nov 10 '06 #10


ma740988 wrote:
Daniel T. wrote:
Of course the fact that my computer is faster than yours isn't relevant,
it's the percentage difference in the numbers that surprises. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?
Timing is one of those things that often puzzles me when using the .NET
compiler (.NET 05). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.
Could you give us the command-line arguments here? Also tell us if you
have removed the extra checking in Visual Studio 2005.
/Peter
Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.
Nov 10 '06 #11

ma740988 wrote:
Timing is one of those things that often puzzles me when using the .NET
compiler (.NET 05). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.
Even more confusing: These numbers are *slower* than when you didn't
compile with full optimization. Maybe you are optimizing for smaller
size rather than faster speed?

Nov 10 '06 #12


peter koch wrote:
Could you give us the command-line arguments here? Also tell us if you
have removed the extra checking in Visual Studio 2005.
/Peter
/Ox /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
/EHsc /MDd /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /TP
/errorReport:prompt

Nov 10 '06 #13
