Bytes | Developer Community

Program performance/optimisation

Hi,
I was exploring the effect of cache on program
performance/optimisation. Is it solely the compiler's responsibility to
consider this kind of optimisation, or can the programmer do his bit
here as well?
Reading through the "Expert C Programming" text, it mentions how the
program below can be made efficient by taking the cache details into account.

The program below can be run with either version of the copy in turn,
timing the executable with the time command on Unix to see the
difference. As expected, the slowdown happens with DUMBCOPY.

#include <stdio.h>
#include <string.h>

#define DUMBCOPY for (i = 0; i < 65536; i++) \
                     destination[i] = source[i]

#define SMARTCOPY memcpy(destination, source, 65536)

int main(void)
{
    char source[65536], destination[65536];
    int i, j;
    for (j = 0; j < 100; j++)
        SMARTCOPY;
        /* DUMBCOPY; */
    return 0;
}

Below is the reasoning given:
The slowdown happens because the source and destination are an exact
multiple of the cache size apart. The particular mapping algorithm used
happens to fill the same cache line for main-memory addresses that are
exact multiples of the cache size apart.

In this particular case both the source and the destination use the same
cache line, causing every memory reference to miss the cache and stall
the processor while it waits for regular memory to deliver. The library
memcpy() routine is especially tuned for high performance.
It unrolls the loop to read a whole cache line and then write it, which
avoids the problem. Using the smart copy, we were able to get a huge
performance improvement. This also shows the folly of drawing
conclusions from simple-minded benchmark programs.

I don't fully understand the above two paragraphs, so I would appreciate
a better explanation, along with any helpful pointers.

This might not be directly related to C, but I thought I would get
better answers in this newsgroup, hence the posting.

-TIA
Sep 12 '06 #1
On Tue, 12 Sep 2006 21:19:34 +0530, grid <pr******@gmail.com> wrote in
comp.lang.c:
> Hi,
> I was exploring the affect of cache on program
> performance/optimisation.Is it the compilers responsibility only to
> consider this kind of optimisation or the programmer can do his bit in
> this case ?
Any special effort a particular compiler makes to use cache, or any
other hardware feature of the platform, is completely a QOI (Quality
Of Implementation) issue, not a language one. The C language and its
standard define the operation of a correctly written program. They
make no mention of, nor do they place any requirements on, the speed
or efficiency of any program.
> Reading through the "Expert C Programming" text,it mentions how the
> below program can be efficient taking the cache details into accont.
>
> The below program can be executed using the two versions of copy
> alternatively and running the time command on the executable on Unix,to
> see the difference.As obvious,the slowdown happens in DUMBCOPY.
>
> #include<stdio.h>
> #include<string.h>
>
> #define DUMBCOPY for (i = 0; i < 65536; i++) \
> destination[i] = source[i]
>
> #define SMARTCOPY memcpy(destination, source, 65536)
>
> int main()
> {
> char source[65536], destination[65536];
> int i, j;
> for (j = 0; j < 100; j++)
> SMARTCOPY;
> /* DUMBCOPY; */
> return 0;
> }
>
> Below are the reasonings :
> The slowdown happens because the source and destination are an exact
> multiple of the cache size apart.The particular algorithm used happens
> to fill the same line for main memory addresses that are exact multiples
> of the cache size apart.
>
> In this particular case both the source and destination use the same
> cache line, causing every memory reference to miss the cache and stall
> the processor while it waited for regular memory to deliver. The library
> memcpy() routine is especially tuned for high performance.
> It unrolls the loop to read for one cache line and then write, which
> avoids the problem.Using the smart copy, we were able to get a huge
> performance improvement. This also shows the folly of drawing
> conclusions from simple-minded benchmark programs.
>
> I dont fully understand the above 2 paragraphs,so if someone could give
> a better explanation.Would also appreciate any helpful pointers.
>
> This might not be something directly related to C,but I thought I would
> get better answers in this newsgroup and hence the posting.
Actually, your question is completely off-topic here. As far as C is
concerned, there is no such thing as a cache, cache line, or processor
stall. This is all quite hardware and architecture dependent.

You need to ask questions about this in some sort of platform specific
newsgroup. The moderated group news:comp.lang.asm.x86 is a good place
to discuss the behavior of such things as cache on x86 processors.
You'll have to look to find an appropriate group for other processor
architectures.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Sep 12 '06 #2
