Hi,
I was exploring the effect of the cache on program
performance/optimisation. Is it only the compiler's responsibility to
consider this kind of optimisation, or can the programmer do their bit
too?
Reading through the "Expert C Programming" text, it mentions how the
program below can be made efficient by taking the cache details into
account. The program can be run with each of the two copy versions in
turn, timing the executable with the time command on Unix to see the
difference. As expected, the slowdown happens with DUMBCOPY.
#include <stdio.h>
#include <string.h>

#define DUMBCOPY  for (i = 0; i < 65536; i++) \
                      destination[i] = source[i]

#define SMARTCOPY memcpy(destination, source, 65536)

int main(void)
{
    char source[65536], destination[65536];
    int i, j;

    for (j = 0; j < 100; j++)
        SMARTCOPY;
        /* DUMBCOPY; */

    return 0;
}
Here is the book's reasoning:
The slowdown happens because the source and destination are an exact
multiple of the cache size apart. The particular cache-mapping
algorithm used happens to map main-memory addresses that are exact
multiples of the cache size apart onto the same cache line.
In this particular case both the source and the destination use the
same cache line, causing every memory reference to miss the cache and
stall the processor while it waits for regular memory to deliver. The
library memcpy() routine is specially tuned for high performance.
It unrolls the loop to read one full cache line and then write it,
which avoids the problem. Using the smart copy, we were able to get a
huge performance improvement. This also shows the folly of drawing
conclusions from simple-minded benchmark programs.
I don't fully understand the above two paragraphs, so I'd be grateful
if someone could give a better explanation. I would also appreciate any
helpful pointers.
This might not be directly related to C, but I thought I would get
better answers in this newsgroup, hence the posting.
-TIA