Help | Site Map
Connecting Tech Pros Worldwide
 
 
LinkBack Thread Tools
  #1  
Old September 25th, 2006, 04:15 PM
Thomas Kowalski
Guest
 
Posts: n/a
Default Cache

Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.

Thanks in advance,
Thomas Kowalski

  #2  
Old September 25th, 2006, 04:45 PM
Victor Bazarov
Guest
 
Posts: n/a
Default Re: Cache

Thomas Kowalski wrote:
Quote:
I would like to know which high level techniques can improve the
number of cache hits. I am not trying to rewrite some code in
assembler, just want to know which things might be considered to help
a good compiler "understanding" my code better. Some examples or
links would be very helpful.
Last time I came across the recommendations of this sort it was mostly
to place objects in such a way that when you follow the references (not
C++ references, but generally using pointers, indices, etc) from one to
another, the objects found (resolved) would be close-by.

Beware, those are always platform-specific, and so is the advice you'll
receive, I am fairly certain. Try a good book, like "Efficient C++".

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


  #3  
Old September 25th, 2006, 04:55 PM
Alan Woodland
Guest
 
Posts: n/a
Default Re: Cache

Thomas Kowalski wrote:
Quote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
My reply is perhaps slightly OT here: Have you tried profiling it with
valgrind's cachegrind, or the frontend to it, kcachegrind? I can't give
many hard and fast rules (except for the case of linear algebra), but I
imagine kcachegrind could give you a somewhat better view of what's
going on.

Alan
  #4  
Old September 25th, 2006, 08:35 PM
Jacek Dziedzic
Guest
 
Posts: n/a
Default Re: Cache

Thomas Kowalski wrote:
Quote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance

int arr[1000][1000];
long int sum=0;

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];

would probably run significantly faster than

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];

because in the first case the memory occupied by arr is
traversed linearly. Therefore it is sometimes beneficial
to have a transposed copy of an array that you need to
access in the latter fashion.

<OT>
Other than that, I remember that aligning objects to
some boundaries may speed things up. Try looking up
"alignment" or __declspec(align(n)) in your compiler's
manual. For example on the Itanium 2 it is beneficial
to align variables at 16-byte boundaries. The cache
also usually displays this property that certain
alignments are ill advised, because they interfere
with the cache's internal hashing algorithm. For
instance on the Itanium 2 accessing variables in
succession that all lie at addresses divisible by 2048
causes very poor performance.
</OT>

As a last note -- definitely use a profiler to
check where the bottleneck lies. It quite often
lies far from where you'd expect it to be by
inspecting the source code.

HTH,
- J.
  #5  
Old September 25th, 2006, 08:55 PM
Bart
Guest
 
Posts: n/a
Default Re: Cache

Thomas Kowalski wrote:
Quote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
As others have pointed out, it's obviously very platform-specific.
Ideally, the part of your data structure that you are accessing should
fit into cache. There are various tricks already mentioned by others
like alignment, but again, these are just too specific to the
particular processor you're using.

Regards,
Bart.

  #6  
Old September 25th, 2006, 09:05 PM
Bart
Guest
 
Posts: n/a
Default Re: Cache

Jacek Dziedzic wrote:
Quote:
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance
>
int arr[1000][1000];
long int sum=0;
>
for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];
>
would probably run significantly faster than
>
for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];
>
because in the first case the memory occupied by arr is
traversed linearly.
That's entirely dependent on the situation. If the whole structure fits
into cache then that wouldn't be the case. Okay, arguably not too many
computers have 1MB of cache, but the point is that you have to analyze
each case separately. General advice on this particular topic is rarely
useful and gets obsolete very quickly.

Regards,
Bart.

  #7  
Old September 25th, 2006, 09:05 PM
Bart
Guest
 
Posts: n/a
Default Re: Cache

Bart wrote:
Quote:
Jacek Dziedzic wrote:
Quote:
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance

int arr[1000][1000];
long int sum=0;

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];

would probably run significantly faster than

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];

because in the first case the memory occupied by arr is
traversed linearly.
>
That's entirely dependent on the situation. If the whole structure fits
into cache then that wouldn't be the case. Okay, arguably not too many
computers have 1MB of cache,
For some reason, I'm somewhat careless today. I meant 1 million ints
but wrote 1MB instead.

Sorry for that,

Regards,
Bart.

  #8  
Old September 25th, 2006, 10:25 PM
Gianni Mariani
Guest
 
Posts: n/a
Default Re: Cache

Thomas Kowalski wrote:
Quote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
>
One of the most effective techniques is "blocking" or taking chunks of
your data and operating on the chunks.

e.g. If I was doing this:

for ( i = 0; i < 1000; ++ i ) {
for ( j = 0; j < 1000; ++ j ) {
magic....
}
}

it can be turned into :

for ( i = 0; i < 1000; i += 10 ) {
for ( j = 0; j < 1000; j += 10) {
type temp_data[10];
for ( ii = i; ii < i + 10; ++i ) {
for ( jj = j; jj < j + 10; ++j ) {
tmp_data[ii-i] += magic ....
}
}
}
}

This is very simplistic, but, basically the idea is to keep as much data
in cache. I have seen 100x speed increases using this technique on
large data.


 

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles