By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
457,885 Members | 1,191 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 457,885 IT Pros & Developers. It's quick & easy.

std::copy versus pointer copy

P: n/a
Hello,

I am wondering which of these two methods is the fastest: std::copy, which
is included in the standard library, or a manually written pointer copy?
Do any of you have any experience with this?

I would think that the library function std::copy would perform optimally,
as it is a library function, and therefore the writers of this function
would know best how to optimize it ... but some tests seem to indicate
that my pointer copy function can do a lot better, especially when its
loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
-march=pentiumiii -xK -tpp6 as compiler options.

Or is there another way to look at this? For completeness, I include my
test program:

<code>
#include <ctime>
#include <algorithm>
#include <iostream>

using namespace std;

typedef double Element;

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ]; size = newsize; }
~Vector() { if ( array ) delete( array ), array = 0; }

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] = i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout << array[ i ] << " "; cout << endl; }

friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }

unsigned int size;
Element *array;
};

int main()
{
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();

iCopy( b.array, a.array, 7 );
cout << "b after iCopy : "; b.print();

sCopy( c.array, a.array, 7 );
cout << "c after sCopy : "; c.print();

clock_t start, stop, taken;

int VECTOR_SIZE = 5000000;

Vector d( VECTOR_SIZE ), e( VECTOR_SIZE );
d.init();

start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < 10; i++ ) sCopy( e.array, d.array, VECTOR_SIZE-1 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < 10; i++ ) sCopy( e.array, d.array, VECTOR_SIZE-1 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for sCopy : " << taken << endl;

return 0;
}
</code>

Thanks for any reply.

Regards,

Franky B.
Jul 19 '05 #1
Share this Question
Share on Google+
30 Replies


P: n/a
fr***************@ua.ac.be wrote:
Hello,

I am wondering which of these two methods is the fastest: std::copy,
which is included in the standard library, or a manually written pointer
copy?
Do any of you have any experience with this?


Yes. The answer is: It depends. On the compiler, the flags, the library
implementation etc.

The rule of thumb: do not optimize, unless you have to. Go for the easiest
solution and if it proves to be a bottlneck, then try to find out the cause.

--
WW aka Attila
Jul 19 '05 #2

P: n/a
On Sun, 14 Sep 2003 23:40:23 +0200, franky.backeljauw wrote:
Hello,

I am wondering which of these two methods is the fastest: std::copy, which
is included in the standard library, or a manually written pointer copy?
Do any of you have any experience with this?

friend void iCopy( Element *dst, const Element *src, int n )
{ int i = n + 1; while ( i-- >dst++ = *src++; }
friend void sCopy( Element *dst, const Element *src, int n )
{ std::copy(src, src+n+1, dst ); }
I think that std::copy is going to be /at least/ as fast as the hand-coded
copy. Alternatively, if your standard library implementation is clever
enough to meta-programatically determine that Element is POD, std::copy
could simply call std::memcpy, which might be faster.
I would think that the library function std::copy would perform optimally,
as it is a library function, and therefore the writers of this function
would know best how to optimize it ... but some tests seem to indicate
that my pointer copy function can do a lot better, especially when its
loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
-march=pentiumiii -xK -tpp6 as compiler options.
YMMV.
start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;


Why are you dividing by 1000? Either divide by CLOCKS_PER_SEC or nothing
at all. Also, std::clock_t is integral; you probably want to be doing
floating-point division.

You didn't post your results. When I tested with GCC 3.3, I get basically
the same results for the two copies.

Josh
Jul 19 '05 #3

P: n/a
fr***************@ua.ac.be wrote:
Hello,

I am wondering which of these two methods is the fastest: std::copy, which
is included in the standard library, or a manually written pointer copy?
Do any of you have any experience with this?

I would think that the library function std::copy would perform optimally,
as it is a library function, and therefore the writers of this function
would know best how to optimize it
I think you would probably find that it is not very optimized at all. It
is designed to be generic, so it can apply to objects that need to be
copied via their assignment operators (where a bit-wise copy would break
the objects horribly).
... but some tests seem to indicate
that my pointer copy function can do a lot better, especially when its
loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
-march=pentiumiii -xK -tpp6 as compiler options.

Or is there another way to look at this? For completeness, I include my
test program:

<code>
#include <ctime>
#include <algorithm>
#include <iostream>

using namespace std;

typedef double Element;

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ]; size = newsize; }
~Vector() { if ( array ) delete( array ), array = 0; }
This is broken. delete is not a function, it's an operator. You also
must use the proper 'delete' operator for the 'new' operator you used,
so you need to use delete[] here.

As for the rest of it, it seems quite unnecessary... delete works fine
with null pointers, and 'array' is disappearing as soon as the
destructor completes, so setting it to 0 is rather pointless.

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] = i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout << array[ i ] << " "; cout << endl; }

friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }


This is bizarre (putting friend definitions inside the class), and I'm
not sure if it's legal.

Also, you are copying one too many elements with your std::copy. The
second argument should be src+n.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #4

P: n/a
Hello,

let me add another one: just a normal member copy, like dst[i]=src[i]. As
it seems from my test program, this method seems to be the fastest of all
.... any ideas why? Can it be optimized better by the compiler, as it is
easier to see what's going on? Or is there some penalty for the other
methods?

Thanks again for any reply.

<code>
#include <ctime>
#include <algorithm>
#include <iostream>

using namespace std;

typedef double Element;

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ]; size = newsize; }
~Vector() { if ( array ) delete( array ), array = 0; }

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] = i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout << array[ i ] << " "; cout << endl; }

friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void mCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) dst[ i ] = src[i]; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }

unsigned int size;
Element *array;
};

int main()
{
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();

iCopy( b.array, a.array, 7 );
cout << "b after iCopy : "; b.print();

sCopy( c.array, a.array, 7 );
cout << "c after sCopy : "; c.print();

clock_t start, stop, taken;

int VECTOR_SIZE = 5000000;

Vector d( VECTOR_SIZE ), e( VECTOR_SIZE );
d.init();

start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < 10; i++ ) mCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < 10; i++ ) sCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < 10; i++ ) mCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < 10; i++ ) sCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for sCopy : " << taken << endl;

return 0;
}
</code>

Regards,

Franky B.
Jul 19 '05 #5

P: n/a
On Sun, 14 Sep 2003, Josh Sebastian wrote:
I would think that the library function std::copy would perform optimally,
as it is a library function, and therefore the writers of this function
would know best how to optimize it ... but some tests seem to indicate
that my pointer copy function can do a lot better, especially when its
loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
-march=pentiumiii -xK -tpp6 as compiler options.
YMMV.


What does "YMMV" mean? Never heard it before ...
start = clock();
for ( int i = 0 ; i < 10; i++ ) iCopy( e.array, d.array, VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / 1000;
cerr << "time for iCopy : " << taken << endl;


Why are you dividing by 1000? Either divide by CLOCKS_PER_SEC or nothing
at all. Also, std::clock_t is integral; you probably want to be doing
floating-point division.


Okay, I shall do that.
You didn't post your results. When I tested with GCC 3.3, I get basically
the same results for the two copies.


Here are some results for the three methods (see my second message):

<results>
$ icc stdcopy2.cpp -o stdcopy2 -O2 -march=pentiumiii -tpp6 -xK :
/opt/intel/compiler70/ia32/include/xlocnum(102) : (col. 64) remark: LOOP
WAS VECTORIZED.
/opt/intel/compiler70/ia32/include/xlocnum(103) : (col. 69) remark: LOOP
WAS VECTORIZED.
/opt/intel/compiler70/ia32/include/xlocnum(104) : (col. 67) remark: LOOP
WAS VECTORIZED.
$ ./stdcopy2
time for iCopy : 2090
time for mCopy : 2090
time for sCopy : 3090
time for iCopy : 2100
time for mCopy : 2090
time for sCopy : 3090

$ icc stdcopy2.cpp -o stdcopy2 -O2
$ ./stdcopy2
time for iCopy : 2100
time for mCopy : 2090
time for sCopy : 3090
time for iCopy : 2110
time for mCopy : 2100
time for sCopy : 3100

$ g++ stdcopy2.cpp -o stdcopy2 -O2
$ ./stdcopy2
time for iCopy : 2970
time for mCopy : 2090
time for sCopy : 3090
time for iCopy : 3010
time for mCopy : 1990
time for sCopy : 3070
</results>

According to the results I got, std::copy is the slowest, followed closely
by a pointer copy, but the member copy wins by quite a margin. Could you
run my second test program and check your results ... thanks!

Regards,

Franky B.
Jul 19 '05 #6

P: n/a
On Mon, 15 Sep 2003, White Wolf wrote:
Yes. The answer is: It depends. On the compiler, the flags, the
library implementation etc. The rule of thumb: do not optimize, unless
you have to. Go for the easiest solution and if it proves to be a
bottlneck, then try to find out the cause.


I am looking for the most optimized and thus fastest solution ...

Regards,

Franky B.
Jul 19 '05 #7

P: n/a
On Sun, 14 Sep 2003, Kevin Goodsell wrote:
I think you would probably find that it is not very optimized at all. It
is designed to be generic, so it can apply to objects that need to be
copied via their assignment operators (where a bit-wise copy would break
the objects horribly).
In an earlier thread somebody was saying that std::copy was the fastest
way of copying, so that's why I wanted to test it, as it is very important
for what I'm doing - a lot of copying, basically :-). But my results seem
to indicate that is not very optimized indeed ...
~Vector() { if ( array ) delete( array ), array = 0; }


This is broken. delete is not a function, it's an operator. You also
must use the proper 'delete' operator for the 'new' operator you used,
so you need to use delete[] here.


Indeed - sorry, my mistake - blame it on the moonlight!
As for the rest of it, it seems quite unnecessary... delete works fine
with null pointers, and 'array' is disappearing as soon as the
destructor completes, so setting it to 0 is rather pointless.
True, once again ...
friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }


This is bizarre (putting friend definitions inside the class), and I'm
not sure if it's legal.


Maybe, but it's not of concern to this mail.
Also, you are copying one too many elements with your std::copy. The
second argument should be src+n.


The test in the beginning (copying a to b and c) seemed to show that
std::copy needed the +1 to copy the same amount of elements ...

Regards,

Franky B.
Jul 19 '05 #8

P: n/a
In article <Pi*******************************@leibniz.ruca.ua .ac.be>, fr***************@ua.ac.be wrote:
Hello,

I am wondering which of these two methods is the fastest: std::copy, which
is included in the standard library, or a manually written pointer copy?
Do any of you have any experience with this?

I would think that the library function std::copy would perform optimally,
as it is a library function, and therefore the writers of this function
would know best how to optimize it ... but some tests seem to indicate
that my pointer copy function can do a lot better, especially when its
loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
-march=pentiumiii -xK -tpp6 as compiler options.

[-]
I seriously doubt your tests as std::copy() isn't a function but a template
and its actual implementation tends to result in either a simple loop or,
where possible, even a call to memmove() or some similar, optimized function.

You'd look what actually happens in algorithm,
Juergen

--
\ Real name : Juergen Heinzl \ no flames /
\ EMail Private : ju*****@manannan.org \ send money instead /
Jul 19 '05 #9

P: n/a
On Mon, 15 Sep 2003 00:03:15 +0200, franky.backeljauw wrote:
On Sun, 14 Sep 2003, Josh Sebastian wrote:
> I would think that the library function std::copy would perform optimally,
> as it is a library function, and therefore the writers of this function
> would know best how to optimize it ... but some tests seem to indicate
> that my pointer copy function can do a lot better, especially when its
> loop is vectorized. I have used g++ and icc with -O2, and icc with -O2
> -march=pentiumiii -xK -tpp6 as compiler options.
YMMV.


What does "YMMV" mean? Never heard it before ...


Your mileage may vary.
According to the results I got, std::copy is the slowest, followed closely
by a pointer copy, but the member copy wins by quite a margin. Could you
run my second test program and check your results ... thanks!


I added a couple of tests: s2Copy (which is std::copy with the arithmetic
operations taken out of the timing) and bCopy (which calls memcpy to do a
bitwise copy). Here's my results:

curien@balar:~/prog$ g++ blah.cpp
curien@balar:~/prog$ ./a.out
a : 0 1 2 3 4 5 6 7 8 9
b : 0 0 0 0 0 0 0 0 0 0
c : 0 0 0 0 0 0 0 0 0 0
b after iCopy : 0 1 2 3 4 5 6 7 0 0
c after sCopy : 0 1 2 3 4 5 6 7 0 0
time for iCopy : 1600
time for mCopy : 1920
time for sCopy : 1500
time for s2Copy : 1490
time for bCopy : 1500
time for iCopy : 1580
time for mCopy : 1920
time for sCopy : 1490
time for s2Copy : 1500
time for bCopy : 1500

curien@balar:~/prog$ g++ -O1 blah.cpp
curien@balar:~/prog$ ./a.out
a : 0 1 2 3 4 5 6 7 8 9
b : 0 0 0 0 0 0 0 0 0 0
c : 0 0 0 0 0 0 0 0 0 0
b after iCopy : 0 1 2 3 4 5 6 7 0 0
c after sCopy : 0 1 2 3 4 5 6 7 0 0
time for iCopy : 1540
time for mCopy : 1430
time for sCopy : 1500
time for s2Copy : 1520
time for bCopy : 1490
time for iCopy : 1490
time for mCopy : 1330
time for sCopy : 1480
time for s2Copy : 1490
time for bCopy : 1490

curien@balar:~/prog$ g++ -O2 blah.cpp
curien@balar:~/prog$ ./a.out
a : 0 1 2 3 4 5 6 7 8 9
b : 0 0 0 0 0 0 0 0 0 0
c : 0 0 0 0 0 0 0 0 0 0
b after iCopy : 0 1 2 3 4 5 6 7 0 0
c after sCopy : 0 1 2 3 4 5 6 7 0 0
time for iCopy : 1510
time for mCopy : 1460
time for sCopy : 1490
time for s2Copy : 1480
time for bCopy : 1490
time for iCopy : 1480
time for mCopy : 1460
time for sCopy : 1510
time for s2Copy : 1490
time for bCopy : 1470

As you can see, different methods are faster, depending on optimization
settings, but the different is miniscule. Interestingly, I also show that
mCopy is the fastest.

Josh
Jul 19 '05 #10

P: n/a
fr***************@ua.ac.be wrote:
On Mon, 15 Sep 2003, White Wolf wrote:
Yes. The answer is: It depends. On the compiler, the flags, the
library implementation etc. The rule of thumb: do not optimize,
unless you have to. Go for the easiest solution and if it proves to
be a bottlneck, then try to find out the cause.


I am looking for the most optimized and thus fastest solution ...


Then again: you are looking at the last steps of any possible optimization
(step #4). Did you do the previous step? Measure that this copy is really
a bottleneck of your application?

And the answer is still the same. It depends. It depends on the quality of
your compiler, your standard library implementation, the quality of their
interaction etc. In theory the standard library copy should be the fastest
possible implementation. If it is not, you may roll your own as well as
report to the implementors that their is non-optimal and what can they do to
make it better. But while doing so you will wonder into
implementation-dependent-land. Meaning that if you change compiler/standard
libarry implementation you may find that now your loop is inferior and
std::copy is faster. That is why I am saying: if it is an academic try just
staying with this answer: it depends. If you really really (swear on your
grave) have this as a serious bottleneck, you will need to find out the
cause of it. Since g++ is GNU, if you have found the cause and fixed it
professionally it (I am sure) will make it into the implementation. Until
it does, you may stick with a home-made solution.

--
WW aka Attila
Jul 19 '05 #11

P: n/a
fr***************@ua.ac.be wrote:

<code>
#include <ctime>
#include <algorithm>
#include <iostream>

using namespace std;

typedef double Element;

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ]; size = newsize; }
~Vector() { if ( array ) delete( array ), array = 0; }
This is still wrong.

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] = i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout << array[ i ] << " "; cout << endl; }

friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void mCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) dst[ i ] = src[i]; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }
These are all going 1 element too far.

unsigned int size;
Element *array;
};

int main()
{
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();
b and c have uninitialized memory that you are trying to print. This
gives undefined behavior.

<snipped>

return 0;
}
</code>


I had to make a number of changes to get sensible output. The result was
being truncated to 0. Here are the results (Visual C++ 6.0, default
project settings, original library with a few of the fixes published by
Dinkumware, none of which seem likely to affect this test):

No optimizations (debug build)
----
time for iCopy : 0.828
time for mCopy : 0.812
time for sCopy : 0.797
time for iCopy : 0.813
time for mCopy : 0.812
time for sCopy : 0.797

Normal optimizations (release build)
----
time for iCopy : 0.735
time for mCopy : 0.671
time for sCopy : 0.672
time for iCopy : 0.703
time for mCopy : 0.672
time for sCopy : 0.687

Not a terribly significant difference in any case. I wouldn't expect
this to be a bottleneck

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #12

P: n/a
fr***************@ua.ac.be wrote:
YMMV.
What does "YMMV" mean? Never heard it before ...


The FAQ is your friend.

http://www.parashift.com/c++-faq-lit...t.html#faq-5.1

and in there you see:

http://www.astro.umd.edu/~marshall/abbrev.html
According to the results I got, std::copy is the slowest, followed
closely by a pointer copy, but the member copy wins by quite
a margin.


As you see: it depends. Now if this is really a serious problem in your
app, look at the code of std::copy in the g++ implementation and try to find
out why is it slow.

--
WW aka Attila
Jul 19 '05 #13

P: n/a
fr***************@ua.ac.be wrote:
On Sun, 14 Sep 2003, Kevin Goodsell wrote:
Also, you are copying one too many elements with your std::copy. The
second argument should be src+n.

The test in the beginning (copying a to b and c) seemed to show that
std::copy needed the +1 to copy the same amount of elements ...


It's the other way around. The other tests are copying too many elements.

As for your issue, I would recommend writing your copy function to call
std::copy. Later, when everything's working, you can replace it with a
faster function if you feel it's necessary. But the "faster function"
might be slower on some systems, so you have to watch that. Simply using
std::copy is probably a good compromise anyway.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #14

P: n/a
On Mon, 15 Sep 2003, White Wolf wrote:
I am looking for the most optimized and thus fastest solution ...
Then again: you are looking at the last steps of any possible
optimization (step #4). Did you do the previous step? Measure that
this copy is really a bottleneck of your application?


As the application I am writing involves a large amount of array copies,
it is very important to make it work as fast as possible ...
And the answer is still the same. It depends. It depends on the
quality of your compiler, your standard library implementation, the
quality of their interaction etc. In theory the standard library copy
should be the fastest possible implementation. If it is not, you may
roll your own as well as report to the implementors that their is
non-optimal and what can they do to make it better.
Indeed, that was my first thought, namely that the implementors of the
library would know best how to make their function work optimally, as they
developed the library themselves.

On the other hand, as somebody else replied, it might not be very optimal,
since it is written with generality in mind, that is, it should work on
classes too ... while if you would write your own copying routine, it is
optimized for the application at hand (in this case, an array copy).
But while doing so you will wonder into implementation-dependent-land.
Meaning that if you change compiler/standard libarry implementation you
may find that now your loop is inferior and std::copy is faster. That
is why I am saying: if it is an academic try just staying with this
answer: it depends. If you really really (swear on your grave) have
this as a serious bottleneck, you will need to find out the cause of it.
Since g++ is GNU, if you have found the cause and fixed it
professionally it (I am sure) will make it into the implementation.
Until it does, you may stick with a home-made solution.


Yet, even in the case of g++, I am sure it is written as optimal as much
as it is made generic ...

Regards,

Franky B.
Jul 19 '05 #15

P: n/a
In article <Pi*******************************@leibniz.ruca.ua .ac.be>, fr***************@ua.ac.be wrote:
On Mon, 15 Sep 2003, White Wolf wrote:
> I am looking for the most optimized and thus fastest solution ...


Then again: you are looking at the last steps of any possible
optimization (step #4). Did you do the previous step? Measure that
this copy is really a bottleneck of your application?


As the application I am writing involves a large amount of array copies,
it is very important to make it work as fast as possible ...
And the answer is still the same. It depends. It depends on the
quality of your compiler, your standard library implementation, the
quality of their interaction etc. In theory the standard library copy
should be the fastest possible implementation. If it is not, you may
roll your own as well as report to the implementors that their is
non-optimal and what can they do to make it better.


Indeed, that was my first thought, namely that the implementors of the
library would know best how to make their function work optimally, as they
developed the library themselves.

On the other hand, as somebody else replied, it might not be very optimal,
since it is written with generality in mind, that is, it should work on
classes too ... while if you would write your own copying routine, it is
optimized for the application at hand (in this case, an array copy).

[-]
A compiler can detect whether std::copy is applied to some POD type or
something more complex and choose the best template for each case.
But while doing so you will wonder into implementation-dependent-land.
Meaning that if you change compiler/standard libarry implementation you
may find that now your loop is inferior and std::copy is faster. That
is why I am saying: if it is an academic try just staying with this
answer: it depends. If you really really (swear on your grave) have
this as a serious bottleneck, you will need to find out the cause of it.
Since g++ is GNU, if you have found the cause and fixed it
professionally it (I am sure) will make it into the implementation.
Until it does, you may stick with a home-made solution.


Yet, even in the case of g++, I am sure it is written as optimal as much
as it is made generic ...

[-]
There're more that one copy templates behind std::copy(). The source is
there, so why make assumptions ?

Juergen

--
\ Real name : Juergen Heinzl \ no flames /
\ EMail Private : ju*****@manannan.org \ send money instead /
Jul 19 '05 #16

P: n/a
On Sun, 14 Sep 2003, Josh Sebastian wrote:
What does "YMMV" mean? Never heard it before ...
Your mileage may vary.


I never would have guessed it 8-|
I added a couple of tests: s2Copy (which is std::copy with the arithmetic
operations taken out of the timing) and bCopy (which calls memcpy to do a
bitwise copy). Here's my results:

[..]
curien@balar:~/prog$ g++ -O2 blah.cpp
curien@balar:~/prog$ ./a.out
a : 0 1 2 3 4 5 6 7 8 9
b : 0 0 0 0 0 0 0 0 0 0
c : 0 0 0 0 0 0 0 0 0 0
b after iCopy : 0 1 2 3 4 5 6 7 0 0
c after sCopy : 0 1 2 3 4 5 6 7 0 0
time for iCopy : 1510
time for mCopy : 1460
time for sCopy : 1490
time for s2Copy : 1480
time for bCopy : 1490
time for iCopy : 1480
time for mCopy : 1460
time for sCopy : 1510
time for s2Copy : 1490
time for bCopy : 1470

As you can see, different methods are faster, depending on optimization
settings, but the different is miniscule. Interestingly, I also show that
mCopy is the fastest.


Thanks,

Franky B.
Jul 19 '05 #17

P: n/a

<fr***************@ua.ac.be>
YMMV.

What does "YMMV" mean? Never heard it before ...

</>

YMMV Your Mileage May Vary; literally, "your experience may differ".

Until I looked it up I thought it meant

You Make Me Vomit :-)

Nothing personal... Look it up at

http://www.astro.umd.edu/~marshall/abbrev.html

it's fun.

-X
Jul 19 '05 #18

P: n/a
On Sun, 14 Sep 2003, Juergen Heinzl wrote:
A compiler can detect whether std::copy is applied to some POD type or
something more complex and choose the best template for each case.
[..]
There're more that one copy templates behind std::copy(). The source is
there, so why make assumptions ?


Because my results show that an ordinary member copy (and even the pointer
copy) are faster than the std::copy case ... so that would mean that
either the correct template isn't used, or that the code is not optimally
written. I still need to check the code however, to see what method of
copying is used.

Regards,

Franky B.
Jul 19 '05 #19

P: n/a
On Mon, 15 Sep 2003, White Wolf wrote:
The FAQ is your friend.
http://www.parashift.com/c++-faq-lit...t.html#faq-5.1
Know this one ...
and in there you see:
http://www.astro.umd.edu/~marshall/abbrev.html
Didn't quite see that one ... YAB! (*)
As you see: it depends. Now if this is really a serious problem in your
app, look at the code of std::copy in the g++ implementation and try to
find out why is it slow.


Yes, I need to check the code to see what's going on ... but I would not
be asking this question if it didn't matter of if it wasn't a serious
problem!

Regards,

Franky B.
Jul 19 '05 #20

P: n/a
fr***************@ua.ac.be wrote:
On Mon, 15 Sep 2003, White Wolf wrote:
I am looking for the most optimized and thus fastest solution ...
Then again: you are looking at the last steps of any possible
optimization (step #4). Did you do the previous step? Measure that
this copy is really a bottleneck of your application?


As the application I am writing involves a large amount of array
copies, it is very important to make it work as fast as possible ...


http://magnonel.guild.net/~schwern/t...imization.html

http://tinyurl.com/ncpc
Indeed, that was my first thought, namely that the implementors of the
library would know best how to make their function work optimally, as
they developed the library themselves.
In case of g++ also the compiler.
On the other hand, as somebody else replied, it might not be very
optimal, since it is written with generality in mind, that is, it
should work on classes too ... while if you would write your own
copying routine, it is optimized for the application at hand (in this
case, an array copy).
This is only half true. Nothing whatsoever prevents the library implentor
to specialize its templates for cases dealing with POD types etc. There are
many kinds of magic which can be used to achieve that.
Yet, even in the case of g++, I am sure it is written as optimal as
much as it is made generic ...


As above: it can be made generic *and* optimal. The magic of templates.

--
WW aka Attila
Jul 19 '05 #21

P: n/a
fr***************@ua.ac.be wrote:
As you see: it depends. Now if this is really a serious problem in
your app, look at the code of std::copy in the g++ implementation
and try to find out why is it slow.


Yes, I need to check the code to see what's going on ... but I would
not be asking this question if it didn't matter of if it wasn't a
serious problem!


Did you *measure*? Not little test programs outside, but *inside* your
application. Does the profiler show these copy operations to be a
bottleneck?

--
WW aka Attila
Jul 19 '05 #22

P: n/a
<franky.backeljauw>
http://www.astro.umd.edu/~marshall/abbrev.html


Didn't quite see that one ... YAB! (*)


(*) also see
http://www.winternet.com/~mikelr/flame1.html

-X
Jul 19 '05 #23

P: n/a
On Sun, 14 Sep 2003, Kevin Goodsell wrote:
~Vector() { if ( array ) delete( array ), array = 0; } This is still wrong.


Yes, as I had e-mailed it before I had read your message ...
friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void mCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) dst[ i ] = src[i]; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }

These are all going 1 element too far.


No, this depends on which n is given ...
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();

b and c have uninitialized memory that you are trying to print. This
gives undefined behavior.


Why? Must a double be initialized, in order to get it's memory location
initialized?
I had to make a number of changes to get sensible output. The result was
being truncated to 0. Here are the results (Visual C++ 6.0, default
project settings, original library with a few of the fixes published by
Dinkumware, none of which seem likely to affect this test):

Normal optimizations (release build)
----
time for iCopy : 0.735
time for mCopy : 0.671
time for sCopy : 0.672
time for iCopy : 0.703
time for mCopy : 0.672
time for sCopy : 0.687

Not a terribly significant difference in any case. I wouldn't expect
this to be a bottleneck


No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?

Regards,

Franky B.
Jul 19 '05 #24

P: n/a
On Sun, 14 Sep 2003, Kevin Goodsell wrote:
As for your issue, I would recommend writing your copy function to call
std::copy. Later, when everything's working, you can replace it with a
faster function if you feel it's necessary. But the "faster function"
might be slower on some systems, so you have to watch that. Simply using
std::copy is probably a good compromise anyway.


It is not, at least on my system, because an ordinary member copy takes
only two thirds of the time that std::copy takes to copy the same array.

Regards,

Franky B.
Jul 19 '05 #25

P: n/a
On Mon, 15 Sep 2003, White Wolf wrote:
Then again: you are looking at the last steps of any possible
optimization (step #4). Did you do the previous step? Measure that
this copy is really a bottleneck of your application?


As the application I am writing involves a large amount of array
copies, it is very important to make it work as fast as possible ...


http://magnonel.guild.net/~schwern/t...imization.html
http://tinyurl.com/ncpc


Aha, so now I get to know what step #4 is all about ... I'll read it first
thing tomorrow.

Regards,

Franky B.
Jul 19 '05 #26

P: n/a
On Mon, 15 Sep 2003, Agent Mulder wrote:
http://www.astro.umd.edu/~marshall/abbrev.html


Didn't quite see that one ... YAB! (*)


(*) also see
http://www.winternet.com/~mikelr/flame1.html


I just meant : Yet Another Bookmark ...

Regards,

Franky B.
Jul 19 '05 #27

P: n/a
fr***************@ua.ac.be wrote:

These are all going 1 element too far.

No, this depends on which n is given ...


OK, but I submit that it's very odd to use an function interface where
the size parameter actually means "One less than the size".

b and c have uninitialized memory that you are trying to print. This
gives undefined behavior.

Why? Must a double be initialized, in order to get it's memory location
initialized?


I don't understand your second question. But C and C++ both forbid
"examining" an object that has not been given a value. The following
program has undefined behavior:

int main()
{
int i;
i;
}


No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?


The answer is simple: Unspecified.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Jul 19 '05 #28

P: n/a
fr***************@ua.ac.be wrote:
On Mon, 15 Sep 2003, Agent Mulder wrote:
http://www.astro.umd.edu/~marshall/abbrev.html

Didn't quite see that one ... YAB! (*)


(*) also see
http://www.winternet.com/~mikelr/flame1.html


I just meant : Yet Another Bookmark ...


I gave up bookmarking. I have too many and too many broken. My bookmark
nowadays is few stable sites (FAQs etc.) and Google is for the rest. :-)

--
WW aka Attila
Jul 19 '05 #29

P: n/a
On Mon, 15 Sep 2003 01:10:53 +0200, fr***************@ua.ac.be wrote:
No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?


It depends on the compiler. Good candidates are std::copy,
std::memcpy, memberwise loop, and, often the winner for poorer
compilers, Duff's Device.

In my tests, the results depend very much on the precise optimization
settings. With loop unrolling, iCopy, mCopy and dCopy are generally
better than cCopy (memcpy) and sCopy (std::copy). Without, dCopy is
generally best, followed by the rest all about even.

Here are my results (all with max optimization), followed by the code:

GCC 2.95 (-O3 -funroll-loops):
time for iCopy : 0.546
time for mCopy : 0.625
time for sCopy : 1.11
time for cCopy : 1.093
time for dCopy : 0.61
time for iCopy : 0.531
time for mCopy : 0.703
time for sCopy : 1.094
time for cCopy : 1.094
time for dCopy : 0.593

GCC 3.2 (-O3 -funroll-loops -mcpu=athlon):
time for iCopy : 0.516
time for mCopy : 0.609
time for sCopy : 1.156
time for cCopy : 1.032
time for dCopy : 0.64
time for iCopy : 0.531
time for mCopy : 0.625
time for sCopy : 1.204
time for cCopy : 1.046
time for dCopy : 0.641

VC7.1 (/Ox /Og /Oi /Ot /Oy /G7 /GA):
time for iCopy : 0.813
time for mCopy : 0.625
time for sCopy : 1.093
time for cCopy : 1.032
time for dCopy : 0.656
time for iCopy : 0.812
time for mCopy : 0.672
time for sCopy : 1.11
time for cCopy : 1.031
time for dCopy : 0.656
#include <time.h>
#include <algorithm>
#include <iostream>
#include <string.h>

using namespace std;

typedef double Element;
void iCopy( Element *dst, const Element *src, int n ) { while ( n-- >
0 ) *dst++ = *src++; }
void mCopy( Element *dst, const Element *src, int n ) { while ( n-- >
0 ) dst[n] = src[n]; }
void sCopy( Element *dst, const Element *src, int n ) { std::copy(src,
src+n, dst); }
void cCopy( Element *dst, const Element *src, int n ) { memcpy(dst,
src, n*sizeof *dst); }
void dCopy( Element *dst, const Element *src, int n )
{
const Element* const end = src + n;
if (n <= 0)
return;
switch(n % 8)
{
while(src != end)
{
case 0:
*dst++ = *src++;
case 7:
*dst++ = *src++;
case 6:
*dst++ = *src++;
case 5:
*dst++ = *src++;
case 4:
*dst++ = *src++;
case 3:
*dst++ = *src++;
case 2:
*dst++ = *src++;
case 1:
*dst++ = *src++;
}
}
}

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ];
size = newsize; }
~Vector() { delete[] array; }

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] =
i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout <<
array[ i ] << " "; cout << endl; }

unsigned int size;
Element *array;
};

int main()
{
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();

iCopy( b.array, a.array, 7 );
cout << "b after iCopy : "; b.print();

sCopy( c.array, a.array, 7 );
cout << "c after sCopy : "; c.print();

clock_t start, stop;
double taken;

int VECTOR_SIZE = 100;
int LOOP_COUNT = 5000000;

Vector d( VECTOR_SIZE ), e( VECTOR_SIZE );
d.init();

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) iCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) mCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) sCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) cCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for cCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) dCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for dCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) iCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) mCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) sCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) cCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for cCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) dCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for dCopy : " << taken << endl;

return 0;
}
Jul 19 '05 #30

P: n/a
On Mon, 15 Sep 2003, tom_usenet wrote:
It depends on the compiler. Good candidates are std::copy,
std::memcpy, memberwise loop, and, often the winner for poorer
compilers, Duff's Device.

In my tests, the results depend very much on the precise optimization
settings. With loop unrolling, iCopy, mCopy and dCopy are generally
better than cCopy (memcpy) and sCopy (std::copy). Without, dCopy is
generally best, followed by the rest all about even.


Thanks! I did not yet test with the maximum optimizations ...

Regards,

Franky B.
Jul 19 '05 #31

This discussion thread is closed

Replies have been disabled for this discussion.