Bytes IT Community

Operator overloading, C++ performance crappiness

Is there any way to get to the left-hand side of an operator? Consider
the following (this is not meant to be perfect code, just an example of
the problem):

class Matrix
{
public:
int data[1024];

Matrix() {}

Matrix(int value)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
data[i] = value;
}

void add(const Matrix& obj, Matrix* output) const
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output->data[i] = data[i] + obj.data[i];
}

Matrix operator +(const Matrix& obj) const
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
temp.data[i] = data[i] + obj.data[i];

return temp; // "unnecessary" extra copy of output
}
};

For nice looking syntax you _really_ want to use the operator+ like:
matrix3 = matrix1 + matrix2;

However, that is some 50% slower than the _much_ uglier:
matrix1.add(matrix2, &matrix3);

If only there were a way to get to the left-hand argument of the
operator+ then it could be fast and easy to use. Consider the following
code which is not valid C++ and will not compile for this example:

Matrix as M
operator+(const Matrix& obj)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
M.data[i] = data[i] + obj.data[i];
}

That would be fast and clean to use. Is there any way to accomplish
this? Otherwise the situation is just ugly and there is no point in
using operator overloading for these types of situations (which really
defeats the purpose of operator overloading in the first place).

Thanks! Jo
Aug 17 '05 #1
51 Replies


"Jojo" <jo**@pleasenomorespamicanttakeitanymore.com> wrote in message
news:43***********************@authen.white.readfreenews.net...
[snip]
You could just use operator+=
matrix1 += matrix2;
matrix1 += matrix3;

or rewrite add()
matrix1.add(matrix2);
matrix1.add(matrix3);

or you could write the function add using an ellipsis:
add(...);

matrix1.add(&matrix2, &matrix3, &matrix4, &matrix5);
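Fleshing out the operator+= suggestion, a minimal sketch (an editorial illustration reusing the data layout from the original post, not code anyone in the thread posted):

```cpp
#include <cassert>
#include <cstddef>

class Matrix {
public:
    int data[1024];

    explicit Matrix(int value = 0) {
        for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
            data[i] = value;
    }

    // Accumulate rhs into *this element-wise; no temporary Matrix involved.
    Matrix& operator+=(const Matrix& rhs) {
        for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
            data[i] += rhs.data[i];
        return *this; // return a reference so calls can be chained
    }
};
```

Note that += modifies its left-hand operand in place, which is the limitation raised in the follow-up reply.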

Greets Chris
Aug 17 '05 #2

Christian Meier wrote:
You could just use operator+=
matrix1 += matrix2;
matrix1 += matrix3;

or rewrite add()
matrix1.add(matrix2);
matrix1.add(matrix3);

or you could write the function add using ellipses:
add(...);

matrix1.add(&matrix2, &matrix3, &matrix4, &matrix5);

Greets Chris


"+=" does not accomplish the same thing, and neither does add(Matrix).
There is a third variable involved which is the result of adding two
other variables that you don't want to modify.

As I mentioned the "matrix.add()" syntax certainly works and it is fast
but it is extremely awkward to use and makes for some nasty looking code.

Jo
Aug 17 '05 #3

> Is there any way to get to the left-hand side of an operator? [snip]

If that code is slow, that's because it is not well written. In production
code you probably don't want to return a full-blown matrix object from
operator + because that'd make a temporary. Instead, a small object of some
other type is returned that records the operation and operands. The
evaluation happens eventually in the assignment operation.

class MatrixOp;

MatrixOp operator + (const Matrix&, const Matrix&);
Matrix& Matrix::operator = (const MatrixOp&); // operator= has to be a member

Ben
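A minimal sketch of that idea (an editorial illustration; MatrixOp and its members are placeholder names, not an existing library API). The proxy stores only references, so building it is cheap, and the single addition loop runs inside Matrix's assignment from the proxy:

```cpp
#include <cassert>
#include <cstddef>

class Matrix;

// Proxy recording a pending addition; it holds references, not copies.
struct MatrixOp {
    const Matrix& lhs;
    const Matrix& rhs;
    MatrixOp(const Matrix& l, const Matrix& r) : lhs(l), rhs(r) {}
};

class Matrix {
public:
    int data[1024];

    explicit Matrix(int value = 0) {
        for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
            data[i] = value;
    }

    Matrix& operator=(const MatrixOp& op); // evaluation happens here
};

inline MatrixOp operator+(const Matrix& a, const Matrix& b) {
    return MatrixOp(a, b); // cheap: just two references
}

inline Matrix& Matrix::operator=(const MatrixOp& op) {
    for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
        data[i] = op.lhs.data[i] + op.rhs.data[i]; // one loop, no Matrix copy
    return *this;
}
```

With this, matrix3 = matrix1 + matrix2; performs a single loop and copies no Matrix objects, at the cost that a MatrixOp must not outlive the operands it refers to.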
Aug 17 '05 #4

"Jojo" <jo**@pleasenomorespamicanttakeitanymore.com> wrote in message
news:43***********************@authen.white.readfreenews.net...
[snip]
"+=" does not accomplish the same thing, and neither does add(Matrix).
There is a third variable involved which is the result of adding two
other variables that you don't want to modify.


Why not? After your statements an object has the value of the sum of two
others.
obj1 = obj2 + obj3;
obj1 += obj2;
obj1 += obj3;

If you need to reset obj1 first, you can write a reset() function. Or assign
0 before the operator+= calls.

As I mentioned the "matrix.add()" syntax certainly works and it is fast
but it is extremely awkward to use and makes for some nasty looking code.

Jo
Is sizeof(data)/sizeof(int) not the same for these three objects?
If not, then you have problems with your function add():
void add(const Matrix& obj, Matrix* output)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output->data[i] = data[i] + obj.data[i];
}


You do no range checking here for output->data and obj.data.

Greets Chris
Aug 17 '05 #5

benben wrote:
Instead, a small object of some other type is returned that records the
operation and operands. The evaluation happens eventually in the
assignment operation.
[snip]


You are still going to incur a performance penalty from copying data
into the MatrixOp variable. This penalty is certainly much smaller than
that of copying the objects in my example but as I said in the beginning
my example is to illustrate the problem and not be an example of perfect
code used in production.

My very last pseudo code for operator+ would still be faster than using
a MatrixOp temp variable, and the "Matrix.add(const Matrix&, Matrix*
output)" is faster as well (although extremely ugly as I already mentioned).

Jo
Aug 17 '05 #6

Jojo wrote:
[snip]
If only there were a way to get to the left-hand argument of the
operator+ then it could be fast and easy to use.
You would want to get to the left-hand argument of operator=, but then how
do you know it's there?

what about:

foo( a + b );

operator+ would need to know about the context it's used in then ...
[snip]

Aug 17 '05 #7

> You are still going to incur a performance penalty from copying data into
> the MatrixOp variable. [snip]


MatrixOp can be constructed without copying any matrix at all. A reference
or pointer will do.
Ben
Aug 17 '05 #8

benben wrote:
[snip]
MatrixOp can be constructed without copying any matrix at all. A reference
or pointer will do.


I didn't say anything to the contrary. The "add(const Object& obj,
Object* output)" method in my example is still faster. MatrixOp adds
overhead no matter how you look at it. Like I said before, it's a
smaller overhead than copying a large object but it _does_ add overhead.

Consider if we're working with millions of "Vector" objects that only
have two or three float members. MatrixOp would have about the same
complexity as the Vector object itself. There would be no benefit to
using it and it would be slower than the ever so ugly:

void add(const Vector& obj, Vector* output)
{
// sketch, assuming hypothetical members x and y:
output->x = x + obj.x;
output->y = y + obj.y;
}

Jo
Aug 17 '05 #9

Christian Meier wrote:
Why not? After your statements an object has the value of the sum of two
others.
obj1 = obj2 + obj3;
obj1 += obj2;
obj1 += obj3;
That is not going to be any faster than the plain slow operator+ in my
example. First you "add" (copy) obj2 into obj1, then you add obj3.
This is slower than just adding two variables straight out into a third.
If you need to reset obj1 first, you can write a reset() function. Or assign
0 before the +=operator calls.


Again, another performance hit that is going to be slower than just
adding two objects straight out into a (uninitialized) third.

Jo
Aug 17 '05 #10

> > obj1 += obj2;
obj1 += obj3;


That is not going to be any faster than the plain slow operator+ in my
example. First you "add" (copy) obj2 into obj1, then you add obj3.
This is slower than just adding two variables straight out into a third.


Who mentioned the word copy?
Matrix& Matrix::operator+=(const Matrix& rMatrix)
{}

No copy is created.
Aug 17 '05 #11

Kyle wrote:
you would want to get to the left-hand argument of operator=, but then how
do you know it's there?
Good point but the compiler can handle that. If no left-hand variable
is present then the compiler would generate a transient temp variable of
the return type.
what about:

foo( a + b );

operator+ would need to know about the context it's used in then ...


That is still a lot uglier than "var3 = var1 + var2".

Jo
Aug 17 '05 #12

Jojo wrote:
[snip]
That would be fast and clean to use. Is there any way to accomplish
this?

Use expression templates. Refer to
http://osl.iu.edu/~tveldhui/papers/E.../exprtmpl.html

Aug 17 '05 #13

Christian Meier wrote:
obj1 += obj2;
obj1 += obj3;


That is not going to be any faster than the plain slow operator+ in my
example. First you "add" (copy) obj2 into obj1, then you add obj3.
This is slower than just adding two variables straight out into a third.

Who mentioned the word copy?
Matrix& Matrix::operator+=(const Matrix& rMatrix)
{}

No copy is created.


LOL... That's why I said "add (copy)". The first addition is equivalent
to a copy. See:

1. obj1 is in an "empty" state (this has overhead because it would need to
be initialized to all 0).

2. obj1 += obj2. This is essentially copying obj2 into obj1 because
obj1 now has the value of obj2. Overhead again.

3. obj1 += obj3. Now you finally do the operation you want, where you
add obj3 to obj1 (remember obj1 == obj2 at this point). This is really
where all the work is and the only required overhead.

That is slower than:

1. obj1 is in a completely uninitialized or otherwise unknown state (no overhead)

2. Directly set obj1 to the value of obj2 + obj3
Aug 17 '05 #14

Jojo wrote:
Is there any way to get to the left-hand side of an operator? [snip]
You must have a compiler which is really bad at optimization.
In the following code, the timing is as follows:

operator+: 180 clock ticks
function add(): 240 clock ticks

So (thanks to the optimizer, which optimizes away the total overhead
of the temporary) the code using operator+ is actually *faster* than
your specialized function.

#include <iostream>
#include <ctime>

using namespace std;

class Matrix
{
public:
int data[1024];

Matrix() {}
Matrix(int value)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
data[i] = value;
}

void add(const Matrix& obj, Matrix& output)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output.data[i] = data[i] + obj.data[i];
}

friend Matrix operator+( const Matrix& lhs, const Matrix& rhs );
};

inline Matrix operator +( const Matrix& lhs, const Matrix& rhs )
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(lhs.data)/sizeof(int); i++)
temp.data[i] = lhs.data[i] + rhs.data[i];

return temp;
}

int main()
{
Matrix a(2), b(3), c;
clock_t start, end;

start = clock();
for( int j = 0; j < 100000; ++j )
c = a + b;
end = clock();

cout << end - start << endl;

start = clock();
for( int j = 0; j < 100000; ++j )
a.add( b, c );
end = clock();

cout << end - start << endl;
}

--
Karl Heinz Buchegger, GASCAD GmbH
Teichstrasse 2
A-4595 Waldneukirchen
Tel ++43/7258/7545-0 Fax ++43/7258/7545-99
email: kb******@gascad.at Web: www.gascad.com

For very large values of 2: 2 + 2 = 5
Aug 17 '05 #15

"Jojo" <jo**@pleasenomorespamicanttakeitanymore.com> wrote in message
news:43***********************@authen.white.readfreenews.net...
[snip]

aaah, now I start to understand your "copy" :-)
OK, back to the beginning.
Your function add() is not totally bad. But the name "add" is a bit
confusing with your two parameters. What about something like "setToSum"?
Now to your parameters. Why don't you use *this for the object to be set? Is
there a reason why *this has to be unchanged?

void setToSum(const Matrix& first, const Matrix& second)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); ++i)
data[i] = first.data[i] + second.data[i];
}

This is perhaps a bit cleaner.

Greets Chris
Aug 17 '05 #16

Karl Heinz Buchegger wrote:
[snip]
So (thanks to the optimizer, which optimizes away the total overhead
of the temporary) the code using operator+ is actually *faster* than
your specialized function.


Yes a good compiler could do that. I have not found one that can.

So the big question is what compiler are you using?

I'm using GCC (I tried versions 3.3 and 4.0).

Jo
Aug 17 '05 #17

Christian Meier wrote:
aaah, now I start to understand your "copy" :-)
OK, back to the beginning.
Your function add() is not totally bad. But the name "add" is a bit
confusing with your two parameters. What about something like "setToSum"?
Now to your parameters. Why don't you use *this for the object to be set? Is
there a reason why *this has to be unchanged?

void setToSum(const Matrix& first, const Matrix& second)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); ++i)
data[i] = first.data[i] + second.data[i];
}

This is perhaps a bit cleaner.

Greets Chris


True, but you can name it anything and the code will still look really
bad when things get complicated.

Nothing works as clean as "var3 = var1 + var2". There just doesn't seem
to be a way to do that efficiently in C++ (of course unless you have a
really good compiler as mentioned in another post; show me that compiler!).

Jo
Aug 17 '05 #18

"Jojo" <jo**@pleasenomorespamicanttakeitanymore.com> wrote in message
news:43***********************@authen.white.readfreenews.net...
[snip]
True, but you can name it anything and the code will still look really
bad when things get complicated.


Also true, but a bit better than add(const Matrix&, Matrix*);

Nothing works as clean as "var3 = var1 + var2". There just doesn't seem
to be a way to do that efficiently in C++ (of course unless you have a
really good compiler as mentioned in another post; show me that compiler!).

IMHO optimizing away a temporary return value is supported by many compilers....
Aug 17 '05 #19

Karl Heinz Buchegger wrote:
inline Matrix operator +( const Matrix& lhs, const Matrix& rhs )
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(lhs.data)/sizeof(int); i++)
temp.data[i] = lhs.data[i] + rhs.data[i];

return temp;
}


OK, _this_ is the answer I think I have been looking for. This method
is not the method I posted and this is indeed faster in GCC.

My method was:
Matrix operator+(const Matrix&)

Your method is:
operator+(const Matrix&, const Matrix&)

I did not know there was a two parameter version of operator+.

Thanks! Jo
Aug 17 '05 #20

Karl Heinz Buchegger wrote:
[snip]
So (thanks to the optimizer, which optimizes away the total overhead
of the temporary) the code using operator+ is actually *faster* than
your specialized function.

Oh. I forgot. This was measured with VC++ 6.0. A fairly old compiler.
If VC++ can do it, any other compiler younger than 8 years can do it
also.
Aug 17 '05 #21

Jojo wrote:
[snip]
My method was:
Matrix operator+(const Matrix&)


That is the member function 'operator+'
Your method is:
operator+(const Matrix&, const Matrix&)

That is the free standing function 'operator+'
I did not know there was a two parameter version of operator+.


The free standing version usually is the preferred one for operator+,
operator-, operator*, ... (all operators that return a temporary object
and not a reference to *this).
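For what it's worth, the conventional way to provide both operators is to write the member operator+= once and build the free-standing operator+ on top of it; a sketch (editorial, not code from this thread):

```cpp
#include <cassert>
#include <cstddef>

class Matrix {
public:
    int data[1024];

    explicit Matrix(int value = 0) {
        for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
            data[i] = value;
    }

    Matrix& operator+=(const Matrix& rhs) {
        for (std::size_t i = 0; i < sizeof(data) / sizeof(int); ++i)
            data[i] += rhs.data[i];
        return *this;
    }
};

// Free-standing operator+: lhs is taken by value, so the copy that would
// otherwise be the explicit temporary doubles as the result object.
inline Matrix operator+(Matrix lhs, const Matrix& rhs) {
    lhs += rhs;
    return lhs;
}
```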

Even if I change the free standing function back into a member function,
the operator+ version is still faster (by roughly the same amount) than
the add() version.

So either your measurement code is incorrect or you did not turn on
the optimizer for your measurements, since I can't believe that VC++ 6.0
outperforms gcc in terms of code optimization.
Aug 17 '05 #22

Jojo wrote:
[snip]
My method was:
Matrix operator+(const Matrix&)

Your method is:
operator+(const Matrix&, const Matrix&)

I did not know there was a two parameter version of operator+.


Not as a member there isn't. A single-argument variation of operator+
actually does have two operands (and two arguments). The other (left)
argument is the hidden argument of any non-static member function --
the object for which the function is called. What book are you reading
that doesn't explain the difference between member-based and
non-member-based operator overloading?
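To make the two forms concrete, an editorial sketch (it uses a different operator for each form only because defining both a member and a free operator+ for the same class would make calls ambiguous):

```cpp
#include <cassert>

struct M {
    int v;

    // Member form: the left operand is the hidden *this argument.
    M operator-(const M& rhs) const {
        M r = { v - rhs.v };
        return r;
    }
};

// Non-member form: both operands are explicit parameters.
inline M operator+(const M& lhs, const M& rhs) {
    M r = { lhs.v + rhs.v };
    return r;
}
```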

V
Aug 17 '05 #23

Jojo wrote:
[snip]
Consider if we're working with millions of "Vector" objects that only
have two or three float members. MatrixOp would have about the same
complexity as the Vector object itself. There would be no benefit to
using it and it would be slower than the ever so ugly add().

Here is an experiment:

#include <iostream>
#include <ctime>

unsigned long const dim = 1024;
unsigned long const runs = 1234567;

struct Vector;

struct VectorSum {

Vector const & a;
Vector const & b;

VectorSum( Vector const & a_, Vector const & b_ )
: a ( a_ )
, b ( b_ )
{}

}; // struct VectorSum;

struct Vector {

double entry [dim];

Vector ( void ) {
for ( unsigned long i = 0; i < dim; ++i ) {
entry[i] = 2.718281828459;
}
}

Vector & operator= ( Vector const & other ) {
for ( unsigned int i = 0; i < dim; ++i ) {
this->entry[i] = other.entry[i];
}
return( *this );
}

Vector & operator= ( VectorSum const & sum ) {
sum.a.add( sum.b, *this );
return( *this );
}

void add ( Vector const & other, Vector & result ) const {
for( unsigned int i = 0; i < dim; ++i ) {
result.entry[i] = this->entry[i] + other.entry[i];
}
}

VectorSum operator+ ( Vector const & b ) const {
return( VectorSum( *this, b ) );
}

}; // struct Vector;

int main ( void ) {
{
Vector a;
Vector b;
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a = a + b;
}
}
std::cout << std::clock() - ticks << '\n';
}
{
Vector a;
Vector b;
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a.add( b, a );
}
}
std::cout << std::clock() - ticks << '\n';
}
}
news_group> a.out
90000
120000
This was with all optimizations of g++ turned on.
Best

Kai-Uwe Bux
Aug 17 '05 #24

Karl Heinz Buchegger wrote:
[snip]
So either your measurement code is incorrect or you did not turn on
the optimizer for your measurements, since I can't believe that VC++ 6.0
outperforms gcc in terms of code optimization.


Thanks. I don't think there is any difference in speed between the
add() method and the overloaded operator though. I think the difference
you are seeing is from already having the output loaded on the second
run. The CPU will optimize away some of the value changes because they
are exactly the same on the second run. If you separate out the two
versions and run two separate instances, the timing will be exactly the same.

Jo
Aug 17 '05 #25

Karl Heinz Buchegger wrote:
[snip]
Oh. I forgot. This was measured with VC++ 6.0. A fairly old compiler.
If VC++ can do it, any other compiler younger than 8 years can do it
also.

Don't bet on it. VC++ v6 has a very decent optimizer. VC++ v7.1 has
a better one, of course, but even v6 could outperform many others, even
more recent ones.

V
Aug 17 '05 #26

Kai-Uwe Bux wrote:
news_group> a.out
90000
120000
This was with all optimizations of g++ turned on.
Best

Kai-Uwe Bux


Another good find! I knew the compiler could optimize out the temporary
object when creating a new object, I didn't think of using it that way
though! I'll have to try this out.

Again though, you can't get accurate timing by running nearly the same
code twice in the same execution. You have to run it two separate times
with each different set of code. Otherwise the CPU can optimize out
some of the value changes because they don't change at all.

Jo
Aug 17 '05 #27

Jojo wrote:
[snip]
So the big question is what compiler are you using?

I'm using GCC (I tried versions 3.3 and 4.0).

Hi,

I compiled Karl Heinz Buchegger's code with gcc-4.0.0, all optimizations
turned on:

news_group> a.out
60000
90000
So, I would venture the conjecture that Mr Buchegger might have used
gcc, too.
Best

Kai-Uwe Bux
Aug 17 '05 #28

Karl Heinz Buchegger wrote:
[snip]
int main()
{
Matrix a(2), b(3), c;
time_t start, end;

start = clock();
for( int j = 0; j < 100000; ++j )
c = a + b;
end = clock();
Are you sure the compiler did not optimize away all of the loop?
After all, neither a nor b nor c are changing.
cout << end - start << endl;

start = clock();
for( int j = 0; j < 100000; ++j )
a.add( b, c );
end = clock();
Same here.

cout << end - start << endl;
}

Best

Kai-Uwe Bux

Aug 17 '05 #29

Victor Bazarov wrote:
Not as a member there isn't. A single-argument variation of operator+
actually does have two operands (and two arguments). The other (left)
argument is the hidden argument of any non-static member function --
the object for which the function is called. What book are you reading
that doesn't explain the difference between a member-based and non-member
based operator overloading?

V


Embarrassed to say but I've been programming in C++ for over 12 years.
I just never had to use such functionality. You learn something new
every day I guess (a testament to the annoying complexity yet extreme
power of C++).

Jo
Aug 17 '05 #30

Jojo wrote:
Kai-Uwe Bux wrote:
news_group> a.out
90000
120000
This was with all optimizations of g++ turned on.
Best

Kai-Uwe Bux


Another good find! I knew the compiler could optimize out the temporary
object when creating a new object, I didn't think of using it that way
though! I'll have to try this out.

Again though, you can't get accurate timing by running nearly the same
code twice in the same execution. You have to run it two separate times
with each different set of code. Otherwise the CPU can optimize away
some of the stores because the values being written never actually change.

Jo


Hi,

I think in my code the loops are a = a+b; and a.add( b, a ).
Thus the value of a changes. Also note that a is reset before the
second loop starts. I do not see how a CPU could be cleverly reusing
any information gained in the first run to speed up the second.

I am more puzzled that there is a difference at all: the compiler is
allowed (and expected) to inline the operator calls and the calls to add.
In that case, the two loops should have identical assembler code.
Best

Kai-Uwe Bux

Aug 17 '05 #31

Karl Heinz Buchegger wrote:
Jojo wrote:
Is there any way to get to the left-hand side of an operator? Consider
the following (this is not meant to be perfect code, just an example of
the problem):

You must have a compiler which is really bad at optimization.
In the following code, the timing is as follows:

operator+: 180 clock ticks
function add(): 240 clock ticks

So (thanks to the optimizer, which optimizes away the total overhead
of the temporary) the code using operator+ is actually *faster* than
your specialized function.


After more testing I found this to not be the case. You have to use the
data otherwise the compiler will do extra optimization. Here is a full
code example based on your code:

#include <iostream>
#include <ctime>

using namespace std;

class Matrix
{
public:
int data[1024];

Matrix() {}

Matrix(int value)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
data[i] = value;
}

void add(const Matrix& obj, Matrix& output)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output.data[i] = data[i] + obj.data[i];
}

friend Matrix operator+(const Matrix& lhs, const Matrix& rhs);
};

inline Matrix operator +(const Matrix& lhs, const Matrix& rhs)
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(lhs.data)/sizeof(int); i++)
temp.data[i] = lhs.data[i] + rhs.data[i];

return temp;
}

int main()
{
Matrix a(2), b(3), c;
time_t start, end;

start = clock();
for( int j = 0; j < 5000000; ++j )
c = a + b;
end = clock();

cout << a.data[0] << " " << b.data[0] << " " << c.data[0]
<< " " << end-start << endl;

start = clock();
for( int j = 0; j < 5000000; ++j )
a.add( b, c );
end = clock();

cout << a.data[0] << " " << b.data[0] << " " << c.data[0]
<< " " << end-start << endl;

return 0;
}

----------------------------------

$ g++-4.0 -Wall -O2 test2.cpp
$ ./a.out
2 3 5 6620000
2 3 5 8920000

So we are back to where I started, the add() method is still 50% faster.

If you take out the printing of the array values then the timing becomes
similar.

Jo
Aug 17 '05 #32

> $ g++-4.0 -Wall -O2 test2.cpp
$ ./a.out
2 3 5 6620000
2 3 5 8920000


Oops, those timings are inverted from the code I posted. Please note
that the 6620000 time is for "a.add()" and the 8920000 is for "c = a + b"

Jo
Aug 17 '05 #33

Maxim Yegorushkin wrote:
Jojo wrote:

[]

That would be fast and clean to use. Is there any way to accomplish
this? Otherwise the situation is just ugly and there is no point in
using operator overloading for these types of situations (which really
defeats the purpose of operator overloading in the first place).

Use expression templates. Refer to
http://osl.iu.edu/~tveldhui/papers/E.../exprtmpl.html


Even that is not as fast as my add() method. Nothing beats that. If
only we could access the left-hand side of the expression then the
syntax would be clean.

Jo
Aug 17 '05 #34

Jojo wrote:
Karl Heinz Buchegger wrote:
Jojo wrote:
Is there any way to get to the left-hand side of an operator? Consider
the following (this is not meant to be perfect code, just an example of
the problem):
You must have a compiler which is really bad at optimization.
In the following code, the timing is as follows:

operator+: 180 clock ticks
function add(): 240 clock ticks

So (thanks to the optimizer, which optimizes away the total overhead
of the temporary) the code using operator+ is actually *faster* than
your specialized function.

After more testing I found this to not be the case. You have to use the
data otherwise the compiler will do extra optimization. Here is a full
code example based on your code:

#include <iostream>
#include <ctime>

using namespace std;

class Matrix
{
public:
int data[1024];

Matrix() {}

Matrix(int value)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
data[i] = value;
}

void add(const Matrix& obj, Matrix& output)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output.data[i] = data[i] + obj.data[i];
}

friend Matrix operator+(const Matrix& lhs, const Matrix& rhs);


There is no need in this 'friend' declaration.
};

inline Matrix operator +(const Matrix& lhs, const Matrix& rhs)
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(lhs.data)/sizeof(int); i++)
temp.data[i] = lhs.data[i] + rhs.data[i];

return temp;
}

int main()
{
Matrix a(2), b(3), c;
time_t start, end;

start = clock();
for( int j = 0; j < 5000000; ++j )
c = a + b;
end = clock();

cout << a.data[0] << " " << b.data[0] << " " << c.data[0]
<< " " << end-start << endl;

start = clock();
for( int j = 0; j < 5000000; ++j )
a.add( b, c );
end = clock();

cout << a.data[0] << " " << b.data[0] << " " << c.data[0]
<< " " << end-start << endl;

return 0;
}

----------------------------------

$ g++-4.0 -Wall -O2 test2.cpp
$ ./a.out
2 3 5 6620000
2 3 5 8920000

So we are back to where I started, the add() method is still 50% faster.
Huh? It seems that the first one (using the operator+) is actually faster on
your system. In my book six is smaller than eight.
If you take out the printing of the array values then the timing becomes
similar.


What do you mean by that?

Built with Visual C++ v8 Beta 2, I get the output

2 3 5 22069
2 3 5 6741

which suggests that 'add' function works better. However, since the body
of either loop doesn't use the argument, it's quite possible that the
optimizer does something drastic, like calling each function only once...
Not the best test case, IOW.

V
Aug 17 '05 #35

Jojo wrote:
$ g++-4.0 -Wall -O2 test2.cpp
$ ./a.out
2 3 5 6620000
2 3 5 8920000

Oops, those timings are inverted from the code I posted. Please note
that the 6620000 time is for "a.add()" and the 8920000 is for "c = a + b"


That's why copy-and-paste should always be used instead of typing.
Aug 17 '05 #36

Victor Bazarov wrote:
Huh? It seems that the first one (using the operator+) is actually faster on
your system. In my book six is smaller than eight.
No, I had the output inverted from the code I posted.
If you take out the printing of the array values then the timing
becomes similar.

What do you mean by that?


If you do not print the values a.data[0], b.data[0], etc then the timing
becomes similar. Accessing the array prevents the compiler from
performing extra optimization (at least with gcc).

Built with Visual C++ v8 Beta 2, I get the output

2 3 5 22069
2 3 5 6741

which suggests that 'add' function works better. However, since the body
of either loop doesn't use the argument, it's quite possible that the
optimizer does something drastic, like calling each function only once...
Not the best test case, IOW.

V


Something isn't right with your code. Did you change the values in one
for-loop and forget to change the other?

Using VS 7.1

C:\>cl /Ox test2.cpp
C:\>test2
2 3 5 6703
2 3 5 11484

6703 is for add() method
11484 is for "c = a + b"

Jo
Aug 17 '05 #37

Victor Bazarov wrote:
Jojo wrote:
$ g++-4.0 -Wall -O2 test2.cpp
$ ./a.out
2 3 5 6620000
2 3 5 8920000


Oops, those timings are inverted from the code I posted. Please note
that the 6620000 time is for "a.add()" and the 8920000 is for "c = a + b"

That's why copy-and-paste should always be used instead of typing.


I did cut and paste. I had just changed the code because I was making
sure that reversing the order of the operations did not affect the timing.

Jo
Aug 17 '05 #38

Jojo wrote:
Victor Bazarov wrote:
Huh? It seems that the first one (using the operator+) is actually
faster on
your system. In my book six is smaller than eight.

No, I had the output inverted from the code I posted.
If you take out the printing of the array values then the timing
becomes similar.


What do you mean by that?

If you do not print the values a.data[0], b.data[0], etc then the timing
becomes similar. Accessing the array prevents the compiler from
performing extra optimization (at leat with gcc).

Built with Visual C++ v8 Beta 2, I get the output

2 3 5 22069
2 3 5 6741

which suggests that 'add' function works better. However, since the body
of either loop doesn't use the argument, it's quite possible that the
optimizer does something drastic, like calling each function only once...
Not the best test case, IOW.

V

Something isn't right with your code.


That's *your* code. And I gave the output as appears from the code
_as_posted_.
Did you change the values in one
for-loop and forget to change the other?
No. I copied the code straight out of your post.
Using VS 7.1

C:\>cl /Ox test2.cpp
C:\>test2
2 3 5 6703
2 3 5 11484

6703 is for add() method
11484 is for "c = a + b"


Yes, I believe that. I've tested on several compilers/systems and all
pretty much give the same result, 'add' is two-three times better. The
difference undoubtedly comes from the fact that the matrix needs to be
copied to and fro.

V
Aug 17 '05 #39

Hi,

<skip>

So we are back to where I started, the add() method is still
50% faster.

If you take out the printing of the array values then the
timing becomes similar.

Well, a simple blas-style plain C function add_matrix(double*
pdest, double* plh, double* prh) may be even faster...;)

BTW, have you seen http://www.oonumerics.org/blitz/?

--
Serge
Aug 17 '05 #40

Kai-Uwe Bux wrote:

Karl Heinz Buchegger wrote:
[snip]
int main()
{
Matrix a(2), b(3), c;
time_t start, end;

start = clock();
for( int j = 0; j < 100000; ++j )
c = a + b;
end = clock();


Are you sure the compiler did not optimize away all of the loop?
After all, neither a nor b nor c are changing.


That's what I initially thought also.
But then I tried an experiment: I increased the number
of iterations and with that the time increased. For me
this is evidence enough that the loop is not
optimized away entirely.
--
Karl Heinz Buchegger
kb******@gascad.at
Aug 18 '05 #41

Jojo wrote:

Karl Heinz Buchegger wrote:
The free standing version usually is the preferred one for operator+, operator-,
operator*, ... (all operators that return a temporary object and not a reference
to *this).

Even if I change the free standing function back into a member function,
the operator+ version is still faster (by roughly the same amount) than
the add() version.

So either your measurement code is incorrect or you did not turn on
the optimizer for your measurements, since I can't believe that VC++ 6.0
outperforms gcc in terms of code optimization.
Thanks. I don't think there is any difference in speed between the
add() method and the overloaded operator though. I think the difference
you are seeing is from already having the output loaded on the second
run.


Then the second run should actually be faster, don't you think?
But in reality quite the contrary is true. The second run through
the loop, using the add() function is slower.
The CPU will optimize away some of the value changes because they
are exactly the same on the second run. If you separate out the two
versions and run two separate instances the timing will be exactly the same.


Just switch the 2 code parts and see if the results are the same.
Hint: They are the same.

--
Karl Heinz Buchegger
kb******@gascad.at
Aug 18 '05 #42

Jojo wrote:
Is there any way to get to the left-hand side of an operator? Consider
the following (this is not meant to be perfect code, just an example of
the problem):

class Matrix
{
public:
int data[1024];

Matrix() {}

Matrix(int value)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
data[i] = value;
}

void add(const Matrix& obj, Matrix* output)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
output->data[i] = data[i] + obj.data[i];
}

Matrix operator +(const Matrix& obj)
{
Matrix temp; // "unnecessary" creation of temp variable

for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
temp.data[i] = data[i] + obj.data[i];

return temp; // "unnecessary" extra copy of output
}
};

For nice looking syntax you _really_ want to use the operator+ like:
matrix3 = matrix1 + matrix2;

However, that is some 50% slower than the _much_ uglier:
matrix1.add(matrix2, &matrix3);

If only there were a way to get to the left-hand argument of the
operator+ then it could be fast and easy to use. Consider the following
code which is not valid C++ and will not compile for this example:

Matrix as M
operator+(const Matrix& obj)
{
for (unsigned i = 0; i < sizeof(data)/sizeof(int); i++)
M.data[i] = data[i] + obj.data[i];
}

That would be fast and clean to use. Is there any way to accomplish
this? Otherwise the situation is just ugly and there is no point in
using operator overloading for these types of situations (which really
defeats the purpose of operator overloading in the first place).


I guess somebody else already mentioned expression templates. Here is some
code for experimentation:

unsigned long const rank = 1024;
unsigned long const runs = 12345;

template < typename VectorExprA,
typename VectorExprB >
struct VectorSum {

typedef typename VectorExprA::value_type value_type;

VectorExprA const & a;
VectorExprB const & b;

VectorSum( VectorExprA const & a_,
VectorExprB const & b_ )
: a ( a_ )
, b ( b_ )
{}

value_type operator[] ( unsigned long i ) const {
return( a[i] + b[i] );
}

}; // struct VectorSum<>;

template < typename arithmetic_type,
unsigned long dim >
class Vector {
public:

typedef arithmetic_type value_type;

private:

value_type entry [dim];

public:

Vector ( value_type val = value_type() ) {
for ( unsigned long i = 0; i < dim; ++i ) {
entry[i] = val;
}
}

Vector & operator= ( Vector const & other ) {
for ( unsigned int i = 0; i < dim; ++i ) {
this->entry[i] = other.entry[i];
}
return( *this );
}

template < typename VectorExpr >
Vector & operator= ( VectorExpr const & expr ) {
for ( unsigned long i = 0; i < dim; ++i ) {
this->entry[i] = expr[i];
}
return( *this );
}

value_type const & operator[] ( unsigned long i ) const {
return( entry[i] );
}

value_type & operator[] ( unsigned long i ) {
return( entry[i] );
}

void add ( Vector const & other, Vector & result ) const {
for( unsigned int i = 0; i < dim; ++i ) {
result.entry[i] = this->entry[i] + other.entry[i];
}
}

}; // class Vector<>;

template < typename VectorExprA,
typename VectorExprB >
inline
VectorSum< VectorExprA, VectorExprB > operator+ ( VectorExprA const & a,
VectorExprB const & b ) {
return( VectorSum< VectorExprA, VectorExprB >( a, b ) );
}


#include <iostream>
#include <ctime>
#include <cstdlib>

void test_operator1 ( void ) {
Vector< double, rank > a ( 0 );
Vector< double, rank > b ( 1 );
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a = a + b;
}
}
ticks = std::clock() - ticks;
std::cout << "opp1: a[0] = " << a[0] << " time: " << ticks << '\n';
}

void test_operator2 ( void ) {
Vector< double, rank > a ( 0 );
Vector< double, rank > b ( 1 );
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a = a + b + b;
}
}
ticks = std::clock() - ticks;
std::cout << "opp2: a[0] = " << a[0] << " time: " << ticks << '\n';
}
void test_add1 ( void ) {
Vector< double, rank > a ( 0 );
Vector< double, rank > b ( 1 );
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a.add( b, a );
}
}
ticks = std::clock() - ticks;
std::cout << "add1: a[0] = " << a[0] << " time: " << ticks << '\n';
}

void test_add2 ( void ) {
Vector< double, rank > a ( 0 );
Vector< double, rank > b ( 1 );
std::clock_t ticks = std::clock();
{
for ( unsigned int i = 0; i < runs; ++i ) {
a.add( b, a );
a.add( b, a );
}
}
ticks = std::clock() - ticks;
std::cout << "add2: a[0] = " << a[0] << " time: " << ticks << '\n';
}

int main ( void ) {
std::srand( 24163 );
while ( true ) {
switch ( std::rand() % 4 ) {
case 0 :
test_operator1();
break;
case 1 :
test_operator2();
break;
case 2 :
test_add1();
break;
case 3 :
test_add2();
break;
}
}
}
On my machine (OS: Linux; CPU: Intel; CC: g++-4.0.1, all optimizations turned
on), opp1 and add1 are essentially equivalent and opp2 beats add2.
Best

Kai-Uwe Bux

Aug 18 '05 #43


Victor Bazarov wrote:
Jojo wrote:
6703 is for add() method
11484 is for "c = a + b"


Yes, I believe that. I've tested on several compilers/systems and all
pretty much give the same result, 'add' is two-three times better. The
difference undoubtedly comes from the fact that the matrix needs to be
copied to and fro.


can you compare result with

#include <iostream>

using namespace std;

const int dim=4;

struct matrix {
int * data;
int size;

explicit matrix () : data (new int[dim]), size (dim) { }

explicit matrix (int value) : data (new int[dim]), size (dim) {
for ( int i=0; i<size; ++i ) data[i]=value;
}

~matrix () { delete [] data; }

matrix (const matrix &);

matrix & operator = (const matrix & x) {
matrix * ptr=const_cast<matrix *>(&x);
delete [] this->data;
this->data=ptr->data;
this->size=ptr->size;
ptr->data=0; ptr->size=0;
return (*this);
}

int & operator [] (int i) { return data[i]; }
const int & operator [] (int i) const { return data[i]; }

};
matrix operator + ( const matrix & x, const matrix & y ) {
matrix temp;

for (int i=0; i<temp.size; ++i) temp[i]=x[i]+y[i];

return temp;
}

int main () {

matrix a(2);
matrix b(3);
matrix c;

c=a+b;

return 0;
}

Aug 18 '05 #44


Jojo wrote:
<snip>

If you really want to add performance you might find pointer arithmetic
is faster than using array access. Thus:

class Matrix
{
public:
enum { numElements = 1024 };

int data[numElements];
Matrix() {}
explicit Matrix(int value)
{
int* begin( data );
int* end( data + numElements );
while ( begin != end )
{
*begin = value;
++begin;
}
}

Matrix& operator+=( const Matrix& rhs )
{
int* begin( data );
int* end( data + numElements );
const int* srcBegin( rhs.data );
while ( begin != end )
{
*begin += *srcBegin;
++begin;
++srcBegin;
}
return *this;
}
};

Matrix operator+( const Matrix& lhs, const Matrix& rhs )
{
Matrix temp( lhs );
return temp += rhs;
}
If you want to do operator+ without copy-construction then

Matrix operator+( const Matrix& lhs, const Matrix& rhs )
{
Matrix temp;
int* destBegin ( temp.data );
const int* src1Begin( lhs.data );
const int* src2Begin( rhs.data );
int* destEnd( destBegin + Matrix::numElements );
while ( destBegin != destEnd )
{
*destBegin = *src1Begin + *src2Begin;
++destBegin;
++src1Begin;
++src2Begin;
}
return temp;
}

Note, you could also use post-increment on the pointers, which appears
to use fewer lines of code but in reality would probably generate the
same assembly code here (with regular pointers) and could generate
more code (using iterators).

(And you can change the use of the word Begin to something like Pos if
you think Begin is a misnomer).

Note: you'll probably find that STL implements its algorithms similarly
to how I just did, and you might want to actually use them.

Aug 18 '05 #45

Aleksey Loginov wrote:
Victor Bazarov wrote:
Jojo wrote:
6703 is for add() method
11484 is for "c = a + b"


Yes, I believe that. I've tested on several compilers/systems and all
pretty much give the same result, 'add' is two-three times better. The
difference undoubtedly comes from the fact that the matrix needs to be
copied to and fro.

can you compare result with

[...]


I just went ahead and merged the OP's 'Matrix' code with your 'matrix',
and then ran 5 million loops of additions for one and then for the other.
The results are interesting, implementing crude "move" semantics does
give an advantage:

2 3 5 21804
2 3 5 13142

(the first is for the 'Matrix' additions, the latter is for 'matrix').

I had to adjust your 'dim' to 1024 to match the original code. I did
implement the copy constructor (although some compilers didn't seem to
need it), just in case.

V
Aug 18 '05 #46


Victor Bazarov wrote:
Aleksey Loginov wrote:
Victor Bazarov wrote:
Jojo wrote:

6703 is for add() method
11484 is for "c = a + b"

Yes, I believe that. I've tested on several compilers/systems and all
pretty much give the same result, 'add' is two-three times better. The
difference undoubtedly comes from the fact that the matrix needs to be
copied to and fro.

can you compare result with

[...]


I just went ahead and merged the OP's 'Matrix' code with your 'matrix',
and then ran 5 million loops of additions for one and then for the other.
The results are interesting, implementing crude "move" semantics does
give an advantage:

2 3 5 21804
2 3 5 13142

(the first is for the 'Matrix' additions, the latter is for 'matrix').


what time for add() method?

I had to adjust your 'dim' to 1024 to match the original code. I did
implement the copy constructor (although some compilers didn't seem to
need it), just in case.


thanks.
can you try this code, please?

#include <iostream>

using namespace std;

const int dim=1024;

struct matrix {
int * data;
int size;

struct matrix_ref {
matrix * ptr;
matrix_ref ( matrix * x ) : ptr (x) { }
matrix * operator -> () { return ptr; }
};

explicit matrix () : data (new int[dim]), size (dim) { }

explicit matrix (int value) : data (new int[dim]), size (dim) {
for ( int i=0; i<size; ++i ) data[i]=value;
}

explicit matrix (const matrix &);

~matrix () { delete [] data; }
matrix (matrix_ref ptr) : data(0), size(0) {
this->reset (ptr);
}

matrix & operator = (matrix & x) {
this->reset (x);
return (*this);
}

matrix & operator = (matrix_ref ptr) {
this->reset (ptr);
return (*this);
}

operator matrix_ref () { return matrix_ref (this); }
void reset ( matrix_ref ptr ) {
delete [] this->data;
this->data=ptr->data;
this->size=ptr->size;
ptr->data=0; ptr->size=0;
}

int & operator [] (int i) { return data[i]; }
const int & operator [] (int i) const { return data[i]; }

};
matrix operator + ( const matrix & x, const matrix & y ) {
matrix temp;

for (int i=0; i<temp.size; ++i) temp[i]=x[i]+y[i];

return temp;
}

int main () {

matrix a(2);
matrix b(3);
matrix c;

c=a+b;

return 0;
}

Aug 18 '05 #47

Aleksey Loginov wrote:
Victor Bazarov wrote:
Aleksey Loginov wrote:
Victor Bazarov wrote:
Jojo wrote:
>6703 is for add() method
>11484 is for "c = a + b"

Yes, I believe that. I've tested on several compilers/systems and all
pretty much give the same result, 'add' is two-three times better. The
difference undoubtedly comes from the fact that the matrix needs to be
copied to and fro.
can you compare result with

[...]


I just went ahead and merged the OP's 'Matrix' code with your 'matrix',
and then ran 5 million loops of additions for one and then for the other.
The results are interesting, implementing crude "move" semantics does
give an advantage:

2 3 5 21804
2 3 5 13142

(the first is for the 'Matrix' additions, the latter is for 'matrix').

what time for add() method?


Your class 'matrix' doesn't have an 'add' method.
I had to adjust your 'dim' to 1024 to match the original code. I did
implement the copy constructor (although some compilers didn't seem to
need it), just in case.

thanks.
can you try this code, please?

[..]


Sorry, I don't have time at this point. Maybe over the weekend if you are
still interested by then.

V
Aug 18 '05 #48


Victor Bazarov wrote:
Aleksey Loginov wrote:
Victor Bazarov wrote:
Aleksey Loginov wrote:

Victor Bazarov wrote:
>Jojo wrote:
>
>
>>6703 is for add() method
>>11484 is for "c = a + b"
here

>Yes, I believe that. I've tested on several compilers/systems and all
>pretty much give the same result, 'add' is two-three times better. The
>difference undoubtedly comes from the fact that the matrix needs to be
>copied to and fro.
can you compare result with

[...]

I just went ahead and merged the OP's 'Matrix' code with your 'matrix',
and then ran 5 million loops of additions for one and then for the other.
The results are interesting, implementing crude "move" semantics does
give an advantage:

2 3 5 21804
2 3 5 13142

(the first is for the 'Matrix' additions, the latter is for 'matrix').

what time for add() method?


Your class 'matrix' doesn't have 'add' method.


I thought the OP's did... no matter.
I had to adjust your 'dim' to 1024 to match the original code. I did
implement the copy constructor (although some compilers didn't seem to
need it), just in case.

thanks.
can you try this code, please?

[..]


Sorry, I don't have time at this point. Maybe over the weekend if you are
still interested by then.


I can wait, it's not a problem.

I work with gcc 3.2, so I can't get real results by myself...

Aug 19 '05 #49

Aleksey Loginov wrote:
[...]
I work with gcc 3.2, so I can't get real results by myself...


Hmm... I didn't know gcc 3.2 was unable to produce real results...
Aug 19 '05 #50
