473,386 Members | 1,758 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Cache

Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.

Thanks in advance,
Thomas Kowalski

Sep 25 '06 #1
7 2391
Thomas Kowalski wrote:
I would like to know which high level techniques can improve the
number of cache hits. I am not trying to rewrite some code in
assembler, just want to know which things might be considered to help
a good compiler "understanding" my code better. Some examples or
links would be very helpful.
Last time I came across the recommendations of this sort it was mostly
to place objects in such a way that when you follow the references (not
C++ references, but generally using pointers, indices, etc) from one to
another, the objects found (resolved) would be close-by.

Beware, those are always platform-specific, and so is the advice you'll
receive, I am fairly certain. Try a good book, like "Efficient C++".

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Sep 25 '06 #2
Thomas Kowalski wrote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
My reply is perhaps slightly OT here: Have you tried profiling it with
valgrind's cachegrind, or the frontend to it, kcachegrind? I can't give
many hard and fast rules (except for the case of linear algebra), but I
imagine kcachegrind could give you a somewhat better view of what's
going on.

Alan
Sep 25 '06 #3
Thomas Kowalski wrote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance

int arr[1000][1000];
long int sum=0;

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];

would probably run significantly faster than

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];

because in the first case the memory occupied by arr is
traversed linearly. Therefore it is sometimes beneficial
to have a transposed copy of an array that you need to
access in the latter fashion.

<OT>
Other than that, I remember that aligning objects to
some boundaries may speed things up. Try looking up
"alignment" or __declspec(align(n)) in your compiler's
manual. For example on the Itanium 2 it is beneficial
to align variables at 16-byte boundaries. The cache
also usually displays this property that certain
alignments are ill advised, because they interfere
with the cache's internal hashing algorithm. For
instance on the Itanium 2 accessing variables in
succession that all lie at addresses divisible by 2048
causes very poor performance.
</OT>

As a last note -- definitely use a profiler to
check where the bottleneck lies. It quite often
lies far from where you'd expect it to be by
inspecting the source code.

HTH,
- J.
Sep 25 '06 #4
Thomas Kowalski wrote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
As others have pointed out, it's obviously very platform-specific.
Ideally, the part of your data structure that you are accessing should
fit into cache. There are various tricks already mentioned by others
like alignment, but again, these are just too specific to the
particular processor you're using.

Regards,
Bart.

Sep 25 '06 #5
Jacek Dziedzic wrote:
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance

int arr[1000][1000];
long int sum=0;

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];

would probably run significantly faster than

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];

because in the first case the memory occupied by arr is
traversed linearly.
That's entirely dependent on the situation. If the whole structure fits
into cache then that wouldn't be the case. Okay, arguably not too many
computers have 1MB of cache, but the point is that you have to analyze
each case separately. General advice on this particular topic is rarely
useful and gets obsolete very quickly.

Regards,
Bart.

Sep 25 '06 #6
Bart wrote:
Jacek Dziedzic wrote:
I imagine this would be hard on the source level, but one
advice springs to mind (similar to what Victor had pointed out)
-- it is usually beneficial to access memory addresses that
are close together, in an increasing order. For instance

int arr[1000][1000];
long int sum=0;

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[i][j];

would probably run significantly faster than

for(int i=0; i<1000; ++i)
for(int j=0; j<1000; ++j)
sum+=arr[j][i];

because in the first case the memory occupied by arr is
traversed linearly.

That's entirely dependent on the situation. If the whole structure fits
into cache then that wouldn't be the case. Okay, arguably not too many
computers have 1MB of cache,
For some reason, I'm somewhat careless today. I meant 1 million ints
but wrote 1MB instead.

Sorry for that,

Regards,
Bart.

Sep 25 '06 #7
Thomas Kowalski wrote:
Hi,
I would like to know which high level techniques can improve the number
of cache hits. I am not trying to rewrite some code in assembler, just
want to know which things might be considered to help a good compiler
"understanding" my code better. Some examples or links would be very
helpful.
One of the most effective techniques is "blocking" or taking chunks of
your data and operating on the chunks.

e.g. If I was doing this:

for ( i = 0; i < 1000; ++ i ) {
for ( j = 0; j < 1000; ++ j ) {
magic....
}
}

it can be turned into :

for ( i = 0; i < 1000; i += 10 ) {
for ( j = 0; j < 1000; j += 10) {
type temp_data[10];
for ( ii = i; ii < i + 10; ++i ) {
for ( jj = j; jj < j + 10; ++j ) {
tmp_data[ii-i] += magic ....
}
}
}
}

This is very simplistic, but, basically the idea is to keep as much data
in cache. I have seen 100x speed increases using this technique on
large data.
Sep 25 '06 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: martin | last post by:
Hi, I am storing a dataset in cache, which is happening fine. I can easily retrive it at postback from the cache, cast it to a dataset and reuse it. However I have specified that the cache...
5
by: Darrel | last post by:
I thought this warranted a new thread. Yesterday I asked about access relatively static content...is it better to read from the DB, or just grab a text file. It was suggested that I use the DB...
14
by: Tom.PesterDELETETHISSS | last post by:
Hi, I think this question requires an in depth understanding of how a browser cache works. I hope I can reach an expert here. I may have found a quirk in the asp.net documentation or I don't...
1
by: William Sullivan | last post by:
I'm trying to nail down some issues with the cache in my application. Currently, I have an object that stands between my business logic and database logic called CacheLogic (cute, no?). ...
13
by: Andrew Morton | last post by:
I am caching some data in VB.NET using System.Web.Caching, is it possible to lock the cache so that other sessions attempting to access the same cache wait when it is being updated? I have the...
26
by: Ed L. | last post by:
Here's some of my current notions on pgsql performance tuning strictly as it relates to pgsql tuning parameters in the context of a dedicated linux or hpux server. I'm particularly focusing on...
18
by: siddharthkhare | last post by:
Hi All, what is the diference between these two cache control header. no-cache and no-store. I have read the w3.org explanation. So lets say I am using only no-cache ....my understanding is...
0
by: mateipuiu | last post by:
When a try to run a client build on 2005, which uses the Microsoft.ApplicationBlocks.Cache.dll reference, when using a Microsoft.ApplicationBlocks.Cache.dll created on Debug mode, the client works...
5
by: Stan SR | last post by:
Hi, Some newbie questions.. :-) First, what is the namespace to use for the Cache class ? When I use this bit of code I get an error if (Cache==null) Cache.Insert("myUserList",userlist);...
0
by: =?Utf-8?B?YmlqYXk=?= | last post by:
The type initializer for 'Microsoft.ApplicationBlocks.Cache.CacheService' threw an exception. We migrated our windows application from 1.1 to 2.0. The debug and Release mode of the application...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.