
# How can a Vector class be optimized?

Hi everyone,

I am working on some code that uses colors. Until recently this code represented colors as three floats (RGB format), but it was recently changed so that colors are now defined as spectra. The size of the vector went from 3 (RGB) to 151 (400 nm to 700 nm, with a sample every 2 nm). The variables use a simple Vector class defined as follows:

```cpp
template < typename T, int Size > class Vector { ... };
```

Since the move from the RGB version of the code to the spectral version, the application has slowed down significantly. I ran a test comparing the Vector class with a straight use of arrays of 151 floats, performing the same operations 1 million times:

```cpp
#include <ctime>

int maxIter = static_cast<int>( 1e+6 );

clock_t c1, c0 = clock();

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
  float real = 1.245;
  float anotherReal = 20.43492342;
  float v[ 151 ];
  float v2[ 151 ];
  memset( v, 0, sizeof( float ) * 151 );
  memset( v2, 0, sizeof( float ) * 151 );
  // mixing
  for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
  }
  // summing up & *
  for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * real;
  }
  // summing up & *
  for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * anotherReal;
  }
  // summing up & *
  for ( int j = 0; j < 151; ++j ) {
    v[ j ] += v[ j ];
  }
}
c1 = clock();
cerr << "\nfloat[ 151 ]" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC << endl;

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
  float real = 1.245;
  float anotherReal = 20.43492342;
  Vector< float, 151 >( 10.0 ) * real * anotherReal;
}
c1 = clock();
cerr << "\nSuperVector class" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC << endl;
```

Here are the results:

```
// RGB version, Vector
end CPU time : 390000
elapsed CPU time : 0.39

// Spectral version, Vector
end CPU time : 10510000
elapsed CPU time : 10.12

// Using arrays of 151 floats
end CPU time : 13230000
elapsed CPU time : 2.72
```

Basically, this shows that using the Vector class really slows down the application, especially as the size of the vector increases, and that it is not as efficient as doing the operations on arrays of floats directly. So my question is: is there a way of optimising it? I do realise that doing:

```cpp
Vector< float, 151 >( 0.1 ) * 0.1 * 100.0;
```

is not the same as doing:

```cpp
float result[ 151 ], temp[ 151 ];
for ( int i = 0; i < 151; ++i ) {
  temp[ i ] = 0.1f;
  result[ i ] = temp[ i ] * 0.1 * 100.0;
}
```

But isn't there a way I can make the Vector class as efficient as the second option (doing the math directly on arrays of floats)? Or, if speed is a priority, is writing C-style code the only way to get the performance back once the vector size becomes an issue?

Thanks for your help.

```cpp
template < typename T, int Size >
class SuperVector {
public:
  T w[ Size ];
public:
  SuperVector() { memset( w, 0, sizeof( T ) * Size ); }
  SuperVector( const T &real ) {
    for ( int i = 0; i < Size; ++i ) {
      (*this).w[ i ] = real;
    }
  }
  // ... (inline operator* and operator+ definitions)
  SuperVector& operator += ( const SuperVector &v ) {
    for ( int i = 0; i < Size; ++i ) {
      (*this).w[ i ] += v.w[ i ];
    }
    return *this;
  }
};
```

Nov 5 '06 #1
12 Replies

ma*****@yahoo.com wrote:
> Hi everyone, I am working on some code that uses colors. [...] Since the move from the RGB version of the code to the spectral version, the application has slowed down significantly. I ran a test comparing the Vector class with a straight use of arrays of 151 floats, performing the same operations 1 million times.
> [code snipped]

Read up on expression templates. Or, better, use a linear algebra library.

Best

Kai-Uwe Bux

Nov 6 '06 #2

ma*****@yahoo.com wrote:
> [original question and benchmark snipped]

Results:

```
float[ 151 ]
end CPU time : 2620000
elapsed CPU time : 2.62

std::vector class
end CPU time : 4680000
elapsed CPU time : 2.06
```

> So my question is: is there a way of optimising it?

Yes: use resize() to specify the container's size manually.

`void resize(n, t = T())`: inserts or erases elements at the end such that the size becomes n.

Nov 6 '06 #3
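Reply #3 suggests switching to `std::vector` and sizing it with `resize()`. A minimal sketch of that idea, assuming the spectra can be allocated once and then reused across iterations (the names `make_spectrum` and `mix_and_scale` are hypothetical, not from the thread): `resize()` fixes the size up front, and the per-iteration function only reads and writes existing elements. The four loops of the original `float[ 151 ]` benchmark are fused into one pass here, which is safe because each step only touches element `j`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build a buffer once. resize(n, value) appends
// copies of `value` until the size is n (0.0f by default here).
inline std::vector<float> make_spectrum(std::size_t n, float value = 0.0f)
{
    std::vector<float> s;
    s.resize(n, value);
    return s;
}

// Hypothetical per-iteration work on buffers already sized by the caller,
// so nothing is allocated or freed inside the hot loop. The arithmetic
// mirrors the float[ 151 ] benchmark from the original post.
void mix_and_scale(std::vector<float>& v, const std::vector<float>& v2,
                   float real, float anotherReal)
{
    assert(v2.size() >= v.size());   // v2 must cover every element of v
    for (std::size_t j = 0; j < v.size(); ++j) {
        v[j] = v2[j] * (1.0 - 0.5) + v[j] * 0.5;   // mixing
        v[j] = v[j] * real * anotherReal;          // both scalings
        v[j] += v[j];                              // doubling
    }
}
```

A caller would create `v` and `v2` with `make_spectrum(151)` outside the timing loop and call `mix_and_scale` inside it.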

"ma*****@yahoo.com" wrote:
> [...] ( 10.0 ) * real * anotherReal;

Optimize your vector class by removing the op* and op+. Too many temporaries are being created. Here is an interesting exercise:

```cpp
int maxIter = static_cast<int>( 1e+6 );
clock_t c1, c0 = clock();

struct binary_op {
  float operator()( float lhs, float rhs ) const {
    return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
  }
};

struct unary_op {
  unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
  float operator()( float v ) const {
    return v + 10.0 * real * anotherReal;
  }
  const float real, anotherReal;
};

int main() {
  float real = 1.245;
  float anotherReal = 20.43492342;
  vector
```
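The advice above is to drop the value-returning `operator*` and `operator+`, because each one builds a new 151-element temporary. A minimal sketch of the alternative, assuming compound assignment is acceptable at the call sites (the class name `Spectrum` is hypothetical): every operator updates the existing storage in place and returns a reference, so no intermediate vectors are created.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size spectrum offering only compound-assignment
// operators: each operation modifies *this instead of returning a copy.
template <typename T, std::size_t Size>
class Spectrum {
public:
    T w[Size];

    explicit Spectrum(const T& value = T()) {
        for (std::size_t i = 0; i < Size; ++i) w[i] = value;
    }

    // Bounds-checked (in debug builds) element access.
    T& operator[](std::size_t i) { assert(i < Size); return w[i]; }
    const T& operator[](std::size_t i) const { assert(i < Size); return w[i]; }

    // Scale every sample in place.
    Spectrum& operator*=(const T& s) {
        for (std::size_t i = 0; i < Size; ++i) w[i] *= s;
        return *this;
    }

    // Add another spectrum sample by sample, in place.
    Spectrum& operator+=(const Spectrum& v) {
        for (std::size_t i = 0; i < Size; ++i) w[i] += v.w[i];
        return *this;
    }
};
```

Instead of `Vector< float, 151 >( 10.0 ) * real * anotherReal`, a caller would write `Spectrum<float, 151> s(10.0f); s *= real; s *= anotherReal;`. The cost is less natural notation; expression templates are the usual way to keep `a * b * c` syntax without the temporaries.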


"ma740988" wrote:
> Manual iteration
> end CPU time : 174
> elapsed CPU time : 1.74
> Algorithm Use
> end CPU time : 265
> elapsed CPU time : 0.91
>
> Perhaps I'm missing something here; nonetheless, the output for Manual Iteration (dumping v.front() and v.back()) is all zeros, while that of the algorithm version isn't. Digging deeper, I realized that the return value of unary_op and the summation in the manual iteration aren't the same. So I modified the source...
>
> P4 3.2 GHz, MSVC.NET, -O3 optimization
> Manual iteration
> end CPU time : 7570
> elapsed CPU time : 7.57
> Algorithm Use
> end CPU time : 28453
> elapsed CPU time : 20.883
> Press any key to continue . . .

Odd. I used your main and it did, of course, speed up the manual iteration considerably:

```
PowerPC G5 1.6GHz g++-4.0
Manual iteration
end CPU time : 96
elapsed CPU time : 0.96
Algorithm Use
end CPU time : 187
elapsed CPU time : 0.91
cpp_sandbox has exited with status 0.
```

Of course, the fact that my computer is faster than yours isn't relevant; it's the percentage difference in the numbers that surprises me. I'm showing a 5% increase in speed for the algorithm use, whereas you show a 176% *decrease*. Are you sure you compiled with full optimizations for speed?

--
To send me email, put "sheltie" in the subject.

Nov 7 '06 #6
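As noted above, the manual loop and the `unary_op` version originally computed different things, so their timings were not comparable. A minimal sketch of a like-for-like setup, assuming both versions should apply exactly the same per-element math (the functors are copied from Daniel T.'s post; `manual_version` and `algorithm_version` are hypothetical names): both functions call the same functors, so they produce identical output and differ only in how the loop is expressed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Functors as posted by Daniel T.
struct binary_op {
    float operator()( float lhs, float rhs ) const {
        return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
    }
};

struct unary_op {
    unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
    float operator()( float v ) const { return v + 10.0 * real * anotherReal; }
    const float real, anotherReal;
};

// Hand-written loop applying the same two steps to each element.
void manual_version(std::vector<float>& v, const std::vector<float>& v2,
                    float real, float anotherReal)
{
    assert(v.size() == v2.size());
    binary_op mix;
    unary_op add(real, anotherReal);
    for (std::size_t j = 0; j < v.size(); ++j) {
        v[j] = add(mix(v2[j], v[j]));
    }
}

// The same computation with std::transform (two passes over v).
void algorithm_version(std::vector<float>& v, const std::vector<float>& v2,
                       float real, float anotherReal)
{
    assert(v.size() == v2.size());
    std::transform(v2.begin(), v2.end(), v.begin(), v.begin(), binary_op());
    std::transform(v.begin(), v.end(), v.begin(), unary_op(real, anotherReal));
}
```

Checking that both versions produce equal vectors before timing them rules out the "all zeros" discrepancy described above.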


"ma740988" wrote:
> Daniel T. wrote:
>> Of course, the fact that my computer is faster than yours isn't relevant; it's the percentage difference in the numbers that surprises me. I'm showing a 5% increase in speed for the algorithm use, whereas you show a 176% *decrease*. Are you sure you compiled with full optimizations for speed?
>
> I'll try full optimization and then observe the difference. My initial test was done with O3 (MSVC.NET 05) optimization. What compiler are you using?

PowerPC G5 1.6GHz, g++ 4.0.

Nov 8 '06 #8

ma*****@yahoo.com wrote:
> [original question and SuperVector code snipped]

Here is an illustration of how expression templates reduce the number of temporaries:

```cpp
#include <cstddef>
#include <iostream>

/*
  We define many things that are not meant to show up in client code:
*/
namespace DO_NOT_USE {

  // the basic container:
  template < typename ValueType, std::size_t Size >
  class VectorData {

    ValueType the_data [ Size ];

  public:

    VectorData ( ValueType val = ValueType() ) {
      for ( std::size_t i = 0; i < Size; ++ i ) {
        the_data[ i ] = val;
      }
    }

    ValueType operator[] ( std::size_t i ) const {
      return ( the_data[i] );
    }

    ValueType & operator[] ( std::size_t i ) {
      return ( the_data[i] );
    }

  };

  template < typename ValueType, std::size_t Size, typename Expr >
  struct VectorExpr : public Expr {

    VectorExpr ( void )
      : Expr()
    {}

    VectorExpr ( Expr const & a )
      : Expr(a)
    {}

  };

  template < typename ValueType, std::size_t Size, typename Expr >
  std::ostream & operator<< ( std::ostream & o_str,
                              VectorExpr< ValueType, Size, Expr > const & a ) {
    if ( Size > 0 ) {
      std::size_t i = 0;
      while( i < Size-1 ) {
        o_str << a[i] << " ";
        ++i;
      }
      o_str << a[i];
    }
    return ( o_str );
  }

  template < typename ValueType, std::size_t Size, typename ExprA, typename ExprB >
  struct VectorPlusVector {

    ExprA const & a;
    ExprB const & b;

    VectorPlusVector ( ExprA const & a_, ExprB const & b_ )
      : a ( a_ )
      , b ( b_ )
    {}

    ValueType operator[] ( std::size_t i ) const {
      return ( a[i] + b[i] );
    }

  };

  template < typename ValueType, std::size_t Size, typename ExprA, typename ExprB >
  VectorExpr< ValueType, Size, VectorPlusVector< ValueType, Size, ExprA, ExprB > >
  operator+ ( VectorExpr< ValueType, Size, ExprA > const & a,
              VectorExpr< ValueType, Size, ExprB > const & b ) {
    return ( VectorPlusVector< ValueType, Size, ExprA, ExprB >( a, b ) );
  }

  template < typename ValueType, std::size_t Size, typename ExprA >
  struct VectorTimesScalar {

    ExprA const & a;
    ValueType b;

    VectorTimesScalar ( ExprA const & a_, ValueType b_ )
      : a ( a_ )
      , b ( b_ )
    {}

    ValueType operator[] ( std::size_t i ) const {
      return ( a[i] * b );
    }

  };

  template < typename ValueType, std::size_t Size, typename ExprA >
  VectorExpr< ValueType, Size, VectorTimesScalar< ValueType, Size, ExprA > >
  operator* ( VectorExpr< ValueType, Size, ExprA > const & a,
              ValueType b ) {
    return ( VectorTimesScalar< ValueType, Size, ExprA >( a, b ) );
  }

  template < typename ValueType, std::size_t Size >
  class la_vect
    : public VectorExpr< ValueType, Size, VectorData< ValueType, Size > > {
  public:

    la_vect ( ValueType val = ValueType() )
      : VectorExpr< ValueType, Size, VectorData< ValueType, Size > >( val )
    {}

    template < typename Expr >
    la_vect & operator= ( VectorExpr< ValueType, Size, Expr > const & rhs ) {
      for ( std::size_t i = 0; i < Size; ++i ) {
        (*this)[ i ] = rhs[ i ];
      }
      return ( *this );
    }

    template < typename Expr >
    la_vect & operator+= ( VectorExpr< ValueType, Size, Expr > const & rhs ) {
      for ( std::size_t i = 0; i < Size; ++i ) {
        (*this)[ i ] += rhs[ i ];
      }
      return ( *this );
    }

  };

} // namespace DO_NOT_USE

using DO_NOT_USE::la_vect;

int maxIter = static_cast<int>( 100000 );

#include <ctime>
#include <cstring>

int main ( void ) {
  std::clock_t c1, c0 = std::clock();

  c0 = std::clock();
  for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    float v[ 151 ];
    float v2[ 151 ];
    std::memset( v, 0, sizeof( float ) * 151 );
    std::memset( v2, 0, sizeof( float ) * 151 );
    // mixing
    for ( int j = 0; j < 151; ++j ) {
      v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    }
    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
      v[ j ] = v[ j ] * real;
    }
    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
      v[ j ] = v[ j ] * anotherReal;
    }
    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
      v[ j ] += v[ j ];
    }
  }
  c1 = std::clock();
  std::cerr << "\nfloat[ 151 ]" << std::endl;
  std::cerr << "end CPU time : " << (long)c1 << std::endl;
  std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC << std::endl;

  c0 = std::clock();
  for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    la_vect< float, 151 >( 10.0 ) * real * anotherReal;
  }
  c1 = std::clock();
  std::cerr << "\nla_vect class" << std::endl;
  std::cerr << "end CPU time : " << (long)c1 << std::endl;
  std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC << std::endl;
}
```

```
a.out

float[ 151 ]
end CPU time : 540000
elapsed CPU time : 0.54

la_vect class
end CPU time : 1160000
elapsed CPU time : 0.62
```

Also note that the computation using raw arrays and loops does not do the same thing as the computation using SuperVector. This might explain the remaining difference: theoretically, an optimizing compiler could eliminate almost all temporaries.

Best

Kai-Uwe Bux

Nov 8 '06 #9
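In the benchmark above, the value of `la_vect< float, 151 >( 10.0 ) * real * anotherReal` is discarded. With expression templates, `operator*` only builds a proxy object and the element-wise arithmetic runs when the expression is read through `operator[]`, typically on assignment; as far as the posted code shows, that loop runs the constructor's fill but none of the multiplications. A compact sketch that forces evaluation, assuming single-precision elements (the names `Expr`, `Vec`, `Scaled` and `Sum` are hypothetical; this is not the `la_vect` code): the whole right-hand side is computed in one loop inside `Vec::operator=`, with no intermediate vectors.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base so the operators accept any vector expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete storage for N floats.
template <std::size_t N>
struct Vec : Expr<Vec<N> > {
    float data[N];

    explicit Vec(float v = 0.0f) {
        for (std::size_t i = 0; i < N; ++i) data[i] = v;
    }
    float operator[](std::size_t i) const { assert(i < N); return data[i]; }
    float& operator[](std::size_t i) { assert(i < N); return data[i]; }

    // Evaluation happens here: one loop, no temporary Vec objects.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& x = e.self();
        for (std::size_t i = 0; i < N; ++i) data[i] = x[i];
        return *this;
    }
};

// Proxy for "expression * scalar": holds a reference, computes on demand.
template <typename E>
struct Scaled : Expr<Scaled<E> > {
    const E& e;
    float s;
    Scaled(const E& e_, float s_) : e(e_), s(s_) {}
    float operator[](std::size_t i) const { return e[i] * s; }
};

// Proxy for "expression + expression".
template <typename A, typename B>
struct Sum : Expr<Sum<A, B> > {
    const A& a;
    const B& b;
    Sum(const A& a_, const B& b_) : a(a_), b(b_) {}
    float operator[](std::size_t i) const { return a[i] + b[i]; }
};

template <typename E>
Scaled<E> operator*(const Expr<E>& e, float s) {
    return Scaled<E>(e.self(), s);
}

template <typename A, typename B>
Sum<A, B> operator+(const Expr<A>& a, const Expr<B>& b) {
    return Sum<A, B>(a.self(), b.self());
}
```

Usage: `Vec<151> v(10.0f), w(1.0f), out; out = v * 2.0f * 0.5f + w;` fills `out` with 11. Because the proxies store references to their operands (some of them temporaries), an expression must be consumed in the same statement that builds it, as in this assignment.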

Daniel T. wrote:
> Are you sure you compiled with full optimizations for speed?

Timing is one of those things that often puzzles me when using the .NET compiler (.NET 05). Algorithms almost always seem so much slower than conventional loops. Full optimization produces a similar result:

```
Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366
```

I'm confused.

Nov 10 '06 #10

ma740988 wrote:
> Timing is one of those things that often puzzles me when using the .NET compiler (.NET 05). Algorithms almost always seem so much slower than conventional loops. Full optimization produces a similar result.

Could you give us the command-line arguments here? Also, tell us whether you have removed the extra checking in Visual Studio 2005.

/Peter

Nov 10 '06 #11

ma740988 wrote:
> Full optimization produces a similar result.
>
> Manual iteration
> end CPU time : 7653
> elapsed CPU time : 7.653
> Algorithm Use
> end CPU time : 29034
> elapsed CPU time : 21.366
>
> I'm confused.

Even more confusing: these numbers are *slower* than when you didn't compile with full optimization. Maybe you are optimizing for smaller size rather than faster speed?

Nov 10 '06 #12

peter koch wrote:
> Could you give us the command-line arguments here? Also, tell us whether you have removed the extra checking in Visual Studio 2005.
>
> /Peter

```
/Ox /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /MDd /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /TP /errorReport:prompt
```

Nov 10 '06 #13
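The command line above defines `_DEBUG` and links the debug runtime (`/MDd`), so it is a Debug configuration with `/Ox` added. Assuming Visual C++ 2005, that configuration leaves the standard library's iterator debugging switched on, and checked iterators (`_SECURE_SCL`) are enabled by default even in Release builds; both add run-time checks to iterator operations, which can make `std::transform`-style code look much slower than a plain loop. A hedged configuration sketch for benchmarking, assuming VC++ 8.0 semantics for these two macros; building a Release configuration (`/D "NDEBUG"` with `/MD` instead of `/D "_DEBUG"` with `/MDd`) is the more usual starting point for timing.

```cpp
// Assumption: Visual C++ 2005 (VC++ 8.0). These macros must be defined
// before any standard header is included (or passed with /D on the
// command line), and must have the same value in every translation unit.
#define _SECURE_SCL 0              // disable checked iterators (default: 1)
#define _HAS_ITERATOR_DEBUGGING 0  // disable iterator debugging (default: 1 when _DEBUG is defined)

#include <algorithm>
#include <vector>
```

Other compilers ignore these macros, so the fragment is harmless when the same benchmark is built with g++.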
