memcpy() vs. for() performance

#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>
*/

Nov 14 '05 #1
memcpy implementations tend to be very optimized and well done,
especially for machines that have a block move instruction.

On the other hand, a very clever compiler may recognize
that you are doing a memory move and replace the whole
"for" loop with a block move instruction, if one is available.

There is no way to know without measuring the relative
performance on your machine, with your compiler and
options.

Contrary to what many people think, measuring speeds is not
a waste of time. It provides you with concrete data concerning
your choice.

Why depend on what some "gurus" tell you in C.L.C?

Better to find out exactly what is best: measure it.
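
For instance, a crude harness along these lines gives concrete numbers
(just a sketch: SIZE, REPS and the use of clock() are arbitrary choices,
and an aggressive optimizer may still elide copies whose results it can
prove are never used):

#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE 100
#define REPS 1000000UL

static char a[SIZE], b[SIZE];

int main(void)
{
    unsigned long r;
    size_t n;
    clock_t t0, t1;

    t0 = clock();
    for (r = 0; r < REPS; r++)
        memcpy(b, a, sizeof a);
    t1 = clock();
    /* printing b[0] keeps the copies observable */
    printf("memcpy:   %.3f s (b[0]=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, b[0]);

    t0 = clock();
    for (r = 0; r < REPS; r++)
        for (n = 0; n < sizeof a; n++)
            b[n] = a[n];
    t1 = clock();
    printf("for loop: %.3f s (b[0]=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, b[0]);

    return 0;
}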

jacob
Nov 14 '05 #2
Case wrote:
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
Don't you dare forget #include <string.h> when using memcpy(),
especially when posting to 'bloodhound' comp.lang.c. :-)
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>
*/


Nov 14 '05 #3
"Case" <no@no.no> wrote in message
news:40***********************@news.xs4all.nl...
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?


I would always use memcpy(). Using a loop instead is a last-resort
optimisation (after a performance problem has been found, and attempts to
reduce the need failed or were rejected).

In practice I would expect the loop to be slower for anything more than a
few bytes, as memcpy() is likely to be implemented efficiently (more so than
can possibly be done in standard C).

Alex
Nov 14 '05 #4
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.
<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movl (%edx), %eax
movl %eax, (%ecx)
movl 4(%edx), %eax
movl %eax, 4(%ecx)
popl %ebp
ret
.size foo, .-foo
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.3"

Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.
</OT>
*/


Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #5
In <cb**********@news-reader5.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:
memcpy implementations tend to be very optimized and well done,
especially for machines that have a block move instruction.


They tend to be very optimised and well done for machines without a
block move instruction, too. Been there, done that.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #6
Here are some guidelines for copying data (objects).
1. For small, intrinsic types, use assignment.*
2. For small amounts of data use a "for" loop. **
3. For large amounts of data prefer memcpy. **
4. For large amounts of data don't copy, use pointers.
Copying pointers takes less time.
5. For huge amounts of data, seek hardware assistance.
[Yep, this is not portable.]

* Repeated assignments may be faster and more efficient
than a small "for" loop. Many processors execute
data-processing instructions faster than branch
instructions. For example, four straight-line
assignments may be faster than a loop that executes
one assignment four times.

Also try to use your processor's native integer
size. For example, if your processor likes 32-bit
quantities, copy 32 bits at a time rather than
8 bits.

** The threshold of when to use "for" vs. memcpy
depends on how your compiler handles memcpy. An
inlined version has less overhead; an out-of-line
memcpy call has at least the overhead of the
calling and return sequences. Measure this
overhead, then determine how many copy statements
can be executed within that time frame. That is
your threshold for choosing memcpy vs. a for loop.
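
To illustrate the native-integer-size advice above, a word-at-a-time
copy might look like the sketch below. It is deliberately non-portable:
it assumes both pointers are suitably aligned for unsigned long, and it
bends the aliasing rules in a way a library memcpy is allowed to but
portable C is not.

#include <stddef.h>
#include <string.h>

static void copy_words(void *dst, const void *src, size_t len)
{
    unsigned long *d = dst;
    const unsigned long *s = src;
    size_t nwords = len / sizeof *d;
    size_t tail = len % sizeof *d;

    /* move one machine word at a time ... */
    while (nwords--)
        *d++ = *s++;

    /* ... and let memcpy handle any leftover bytes */
    if (tail)
        memcpy(d, s, tail);
}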

I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

The best you can do is to profile. Is the copy
the bottleneck of your system? Is it executed
often?

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #7

On Wed, 30 Jun 2004, Dan Pop wrote:

Case <no@no.no> writes:

Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.


Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck... :)
<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.


gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s

[...] Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.
Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
int i;
for (i=0; i < 2; ++i)
q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
> .file "test2.c"

%
One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .

-Arthur
Nov 14 '05 #8

On Wed, 30 Jun 2004, Arthur J. O'Dwyer wrote:

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .


And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed. :)
Okay, I think that's clearer.

-Arthur
Nov 14 '05 #9
Thomas Matthews wrote:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).


Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

--
luc wastiaux
Nov 14 '05 #10
luc wastiaux <du*******@airpost.net> wrote:
Thomas Matthews wrote:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).


Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?


In ISO C, you don't. It all depends on the architecture, and therefore
will differ between, say, an Intel machine and a Sparc.

Richard
Nov 14 '05 #11
luc wastiaux wrote:
Thomas Matthews wrote:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #12
Arthur J. O'Dwyer wrote:
Case <no@no.no> writes:

A well implemented memcpy() can use many tricks to accelerate its
operation.


Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.


Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main()
{

int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int *to = x;
int *from = &x[1];

memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

return 0;
}

Nov 14 '05 #13
In <Pi**********************************@unix49.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:

On Wed, 30 Jun 2004, Arthur J. O'Dwyer wrote:

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .


And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed. :)
Okay, I think that's clearer.


Indeed. foo() was introduced for the sole reason of having a minimal
translation unit ;-)

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #14
In <Pi**********************************@unix49.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
int i;
for (i=0; i < 2; ++i)
q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
.file "test2.c"%


Which shows that the memcpy version is still at least as good as the
for loop ;-)
One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .


To me, the memcpy alternative is more readable than the other: it
consists of a single, very simple, even idiomatic (for objects that can't
be directly assigned) function call, which I wouldn't hide behind a
function in real C code: either use it as such, inline, or hide it behind
a macro.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #15
In <hoCEc.914888$Pk3.851808@pd7tw1no> Edmund Bacon <eb****@SpamMeNot.onesystem.com> writes:
Arthur J. O'Dwyer wrote:
Case <no@no.no> writes:

A well implemented memcpy() can use many tricks to accelerate its
operation.
Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.


Aren't there issues with memcpy and overlapping memory locations?


Yes, there are.
In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main()
{

int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int *to = x;
int *from = &x[1];

memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

return 0;
}


Use memmove() in such cases. It has well defined behaviour for
overlapping memory blocks. Depending on the nature of the overlap,
it will either perform an ordinary memcpy() or a copy in the opposite
direction.
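
Applied to the program above, the fix is a one-word change (a minimal
sketch; the expected output is shown in the comment):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    /* Shift the elements left by one. Source and destination overlap,
       so memmove() is required; memcpy() would be undefined behaviour. */
    memmove(x, &x[1], sizeof x - sizeof *x);

    printf("%d %d\n", x[0], x[9]);   /* prints "2 10" */
    return 0;
}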

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #16
In <2w*****************@newssvr32.news.prodigy.com> Thomas Matthews <Th****************************@sbcglobal.net> writes:
luc wastiaux wrote:
Thomas Matthews wrote:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).


Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.


By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #17
Edmund Bacon wrote:
Arthur J. O'Dwyer wrote:

Case <no@no.no> writes:

A well implemented memcpy() can use many tricks to accelerate its
operation.


Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.

Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?
[snip example with overlapping source and destination]


Yes: The behavior of memcpy() is not defined if the
source and destination objects overlap. If that's a
possibility, use memmove() instead.

--
Er*********@sun.com

Nov 14 '05 #18
"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> wrote:

ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.


Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck... :)


I have a slight aversion to memcpy, because of one compiler I had to
use, which would copy 65535 bytes if you called it with a third
argument of 0. (I think this is not standard-conforming, but
unfortunately the real world rears its ugly head sometimes).

FWIW this was Hitech C for the Z80. I guess the problem came
about because the Z80's block-move instruction behaves this way if you
pass 0 as the length (it decrements and then checks the zero flag),
and the implementers must not have been aware of this behaviour.
Nov 14 '05 #19
Dan Pop wrote:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?


I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.


But how do you know when the transfer is complete then ? I assume that
even in synchronous mode, using DMA for large transfers can be beneficial.

--
luc wastiaux
Nov 14 '05 #20

"luc wastiaux" <du*******@airpost.net> wrote in message
news:cb*********@news1.newsguy.com...
But how do you know when the transfer is complete then ? I assume that
even in synchronous mode, using DMA for large transfers can be beneficial.

DMA engines usually generate an interrupt or provide a status register
or some such to indicate completion.
Nov 14 '05 #21
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.


Thanks Dan, I've moved over to always using memcpy(). And as
you say in a later post, it's shorter/more elegant too; this is
an important thing (for me) too.

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.

gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movl (%edx), %eax
movl %eax, (%ecx)
movl 4(%edx), %eax
movl %eax, 4(%ecx)
popl %ebp
ret
.size foo, .-foo
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.3"

Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.


Yes, this clearly makes the point!

</OT>
*/

Dan


Nov 14 '05 #22
On Wed, 30 Jun 2004 11:51:18 +0200, Case <no@no.no> wrote:
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}
While the two techniques are equivalent for char, they are not for any
type where sizeof(type) is not 1. You can change the limit check in
the for loop from n<sizeof(a) to n<SIZE to eliminate this restriction.
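
A sketch with a wider element type makes the distinction explicit (the
helper names are made up for illustration; the loop bound counts
elements, while memcpy's third argument counts bytes):

#include <string.h>

#define SIZE 100

/* Loop version: the bound is the number of elements. */
void copy_loop(int dst[SIZE], const int src[SIZE])
{
    int n;
    for (n = 0; n < SIZE; n++)
        dst[n] = src[n];
}

/* memcpy version: the length is the number of bytes. */
void copy_memcpy(int dst[SIZE], const int src[SIZE])
{
    memcpy(dst, src, SIZE * sizeof *src);
}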

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?


The call to memcpy has a certain amount of overhead. The break-even
point is when this overhead balances out the "extra efficiency" that
may be built into memcpy. The only practical way to tell is to run
some tests.

<<Remove the del for email>>
Nov 14 '05 #23
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?


ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.


I did some tests myself, and found out that this is only true
when the block size is fixed/known. Neither GCC nor Sun-CC
'inlines/optimizes' the memcpy() when the size is a variable.
Unfortunately, at many places in my code, the size is variable.
Although my understanding of this issue has increased, I must
admit this was a flaw in my initial question: an oversimplification.

I'd be interested to hear comments/insights about this variable
case.

Case

Nov 14 '05 #24
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?


ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.


I did some tests myself, and found out that this is only true
when the block size is fixed/known. Neither GCC nor Sun-CC
'inlines/optimizes' the memcpy() when the size is a variable.
Unfortunately, at many places in my code, the size is variable.
Although my understanding of this issue has increased, I must
admit this was a flaw in my initial question: an oversimplification.

I'd be interested to hear comments/insights about this variable
case.


It would be *very* helpful if you didn't mix up things. Inlining is one
thing and providing a highly optimised library version of memcpy is a
completely different one.

When the size is unknown at compile time (or too large), the compiler
cannot or won't inline the memcpy call; it will call the library version.
But the library version can still be much faster than the code generated
by the compiler from a for loop. Especially when dealing with arrays of
characters.
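
As a small illustration of the variable-size case (a sketch; what is
actually emitted depends on the compiler version, options and target),
compiling something like the following with gcc -O2 -S typically shows
a call to the library memcpy rather than the fixed-size expansion seen
earlier in the thread:

#include <string.h>

/* With a size known only at run time, the compiler generally cannot
   expand the copy inline and falls back on the library version. */
void copy_n(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}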

If you want ultimate answers, benchmark the two versions yourself.
Keep in mind that they cannot be extrapolated to other implementations.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #25

"Alex Fraser" <me@privacy.net> wrote in message news:2k************@uni-berlin.de...
[snip]
In practice I would expect the loop to be slower for anything more than a
few bytes, as memcpy() is likely to be implemented efficiently (more so than
can possibly be done in standard C).

[snip]

Some results of performance measurement for several str- and mem-functions can be seen at:
* http://groups.google.com/groups?selm....uni-berlin.de

--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Nov 14 '05 #26
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.
I did some tests myself, and found out that this is only true
when the block size is fixed/known. Neither GCC nor Sun-CC
'inlines/optimizes' the memcpy() when the size is a variable.
Unfortunately, at many places in my code, the size is variable.
Although my understanding of this issue has increased, I must
admit this was a flaw in my initial question: an oversimplification.

I'd be interested to hear comments/insights about this variable
case.

It would be *very* helpful if you didn't mix up things. Inlining is one
thing and providing a highly optimised library version of memcpy is a
completely different one.


I know the difference. What the compiler does looks like (in my eyes)
a form of inlining (the function call is replaced). But at the same
time the code that is inserted is highly optimized for the particular
block size; it's not just inserting a standard piece of memcpy code.
That's why I wrote 'inline/optimize', and quoted the expression to
mark it as not to be taken too literally, because it's a combination.

When the size is unknown at compile time (or too large), the compiler
cannot or won't inline the memcpy call; it will call the library version.
If I had to choose between the two terms, I would call it
optimization. I'm surprised that you seem to prefer the
term inlining. Why?
But the library version can still be much faster than the code generated
by the compiler from a for loop. Especially when dealing with arrays of
characters.
Agreed. And, for simplicity, I'd rather use one approach all the
time, instead of choosing context-dependently (either at coding
time or even at run time) between a couple of alternatives.
Otherwise this will easily fall within the famous 97%.

If you want ultimate answers, benchmark the two versions yourself.
Keep in mind that they cannot be extrapolated to other implementations.


Yep, one other good reason to always use memcpy(). However, how was
the saying again .... "Never say always!" :-)

Thanks,

Case

Nov 14 '05 #27
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Dan Pop wrote:

When the size is unknown at compile time (or too large), the compiler
cannot or won't inline the memcpy call; it will call the library version.


If I had to choose between the two terms, I would call it
optimization. I'm surprised that you seem to prefer the
term inlining. Why?


Because this is the specific name of that particular optimisation.
What is so difficult to understand?

As I said, inlining is NOT the only way an implementation can optimise
a memcpy call. There are plenty of optimisations that can be applied
to the library version of memcpy (especially if it's not written in C).
If you want ultimate answers, benchmark the two versions yourself.
Keep in mind that they cannot be extrapolated to other implementations.


Yep, one other good reason to always use memcpy(). However, how was
the saying again .... "Never say always!" :-)


Another failed attempt at humour...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #28
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Yep, one other good reason to always use memcpy(). However, how was
the saying again .... "Never say always!" :-)

Another failed attempt at humour...


Humour is in the eye of the beholder.

Nov 14 '05 #29
In <40**********************@dreader2.news.tiscali.nl > Case - <no@no.no> writes:
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Yep, one other good reason to always use memcpy(). However, how was
the saying again .... "Never say always!" :-)

Another failed attempt at humour...


Humour is in the eye of the beholder.


Only when a large enough number of beholders perceive it as such.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #30
Case - <no@no.no> wrote in message news:<40**********************@dreader2.news.tiscali.nl>...
Humour is in the eye of the beholder.


Would that be vitreous humor or aqueous humor :-)
Nov 14 '05 #31
re********@yahoo.com (red floyd) writes:
Case - <no@no.no> wrote in message
news:<40**********************@dreader2.news.tiscali.nl>...
Humour is in the eye of the beholder.


Would that be vitreous humor or aqueous humor :-)


checking for [OT] tag ... ok

Yes, the eye certainly lens itself to puns. But enough of this
ocularity. If the jokes get any cornea, I'll give you 40 lashes.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #32
Dan Pop wrote:
In <40**********************@dreader2.news.tiscali.nl > Case - <no@no.no> writes:

Dan Pop wrote:

In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Yep, one other good reason to always use memcpy(). However, how was
the saying again .... "Never say always!" :-)
Another failed attempt at humour...


Humour is in the eye of the beholder.

Only when a large enough number of beholders perceive it as such.


No, on the contrary! Needing only the personal (i.e.,
individual) observation, is at the heart of the original
'beholder-saying'.

Case

Nov 14 '05 #33
In <40**********************@dreader2.news.tiscali.nl > Case - <no@no.no> writes:
Dan Pop wrote:
In <40**********************@dreader2.news.tiscali.nl > Case - <no@no.no> writes:

Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

>Yep, one other good reason to always use memcpy(). However, how was
>the saying again .... "Never say always!" :-)
Another failed attempt at humour...

Humour is in the eye of the beholder.

Only when a large enough number of beholders perceive it as such.


No, on the contrary! Needing only the personal (i.e.,
individual) observation, is at the heart of the original
'beholder-saying'.


Which is why it doesn't apply to humour ;-)

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #34
