g++ loop unrolling performance

Hi all

I am using the boost::array template class to generalize my handcrafted
vector specializations for dimensions 2 (class vec2), 3 (class vec3) etc.

As performance is of the greatest importance, I have written an initial
benchmark that tests how well g++ can unroll loops whose number of
iterations can be determined at compile time or upon entry to the loop.
The gcc switch "-funroll-loops" should do just that. The test program
calculates the dot product of two four-dimensional arrays of int
10 million times and looks as follows:

#include <cstddef>   // std::size_t
#include <iostream>  // std::cout, std::endl

// Project-local headers (not shown in this post).
#include "../array.hh"
#include "../Timer.hh"

using boost::array;
using std::cout;
using std::endl;

// Generic dot product: the trip count N is a compile-time constant,
// so the loop is a candidate for full unrolling.
template <typename T, std::size_t N>
inline T general_dot(const array<T, N> & a, const array<T, N> & b)
{
    T c = 0;
    for (std::size_t i = 0; i < N; i++)
    {
        c += a[i] * b[i];
    }
    return c;
}

// Hand-unrolled dot product for N == 4.
template <typename T>
inline T special_dot(const array<T, 4> & a, const array<T, 4> & b)
{
    return (a[0] * b[0] +
            a[1] * b[1] +
            a[2] * b[2] +
            a[3] * b[3]);
}

int main(int argc, char * argv[])
{
    typedef array<int, 4> T;

    T a(3);

    cout << "a: " << a << endl;

    a[0] = 11;
    a[1] = 13;
    a[2] = 17;
    a[3] = 19;

    cout << "a: " << a << endl;

    T b = a;

    Timer t;

    const unsigned int nloops = 10000000;

    // Time the generic version.
    unsigned int sum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        sum += general_dot(a, b);
    }
    t.read();
    cout << "general: " << t << endl;

    // Time the hand-unrolled version.
    unsigned int tum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        tum += special_dot(a, b);
    }
    t.read();
    cout << "special: " << t << endl;

    // Both versions must produce the same checksum.
    if (sum == tum)
    {
        cout << "Checksums are equal. OK" << endl;
    }
    else
    {
        cout << "Checksums are not equal. NOT OK" << endl;
    }

    return 0;
}

The calculation is performed with a general and a specialized version of
the dot product: general_dot() and special_dot() respectively.

However, the performance of general_dot() is terrible compared to
special_dot(): around 35 times slower when I compile with gcc-3.3.2 using
the switches "-O3 -funroll-all-loops".
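
For comparison, a third variant could force the unrolling at compile time
through template recursion, so that it does not depend on the compiler's
loop unroller at all. The following is only a minimal sketch under stated
assumptions: the names unrolled_dot and meta_dot are hypothetical, it works
on built-in arrays rather than boost::array to stay self-contained, and it
is not part of the benchmark above and has not been timed.

```cpp
#include <cstddef>

// Hypothetical helper (not part of boost::array or the benchmark above):
// sums a[i] * b[i] for i in [0, N) by compile-time recursion on N,
// so the source contains no runtime loop for the compiler to unroll.
template <typename T, std::size_t N>
struct unrolled_dot
{
    static T eval(const T * a, const T * b)
    {
        return a[N - 1] * b[N - 1] + unrolled_dot<T, N - 1>::eval(a, b);
    }
};

// Base case: the empty sum is zero; this specialization ends the recursion.
template <typename T>
struct unrolled_dot<T, 0>
{
    static T eval(const T *, const T *)
    {
        return T();
    }
};

// Convenience wrapper (hypothetical name) that deduces N from two
// built-in arrays of equal length.
template <typename T, std::size_t N>
inline T meta_dot(const T (&a)[N], const T (&b)[N])
{
    return unrolled_dot<T, N>::eval(a, b);
}
```

Calling meta_dot(a, b) on two int[4] arrays instantiates the chain
unrolled_dot<int, 4> down to unrolled_dot<int, 0>. Whether the result is as
fast as special_dot() still depends on the compiler inlining every eval()
call, so it would have to be measured rather than assumed.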

Is gcc really that lame or have I forgotten something?
Many thanks in advance,

Per Nordlöw
Swedish Defence Research Agency
Linköping
Sweden

Jul 22 '05 #1
On Tue, 31 Aug 2004 08:51:12 +0200, Per Nordlöw <pe*@foi.se> wrote in
comp.lang.c++:
> Hi all
>
> I am using the boost::array template class trying to generalize my
> handcrafted
> vector specialization for dimensions 2 (class vec2), 3 (class vec3) etc.
>
> As performance is of greatest importance I have written an initial
> benchmarker that tests how well g++ can unroll loops whose number of
> iterations
> can be determined at compile time or upon entry to the loop. The gcc switch
> "-funroll-loops" should do just that. The test program calculates the
> dotproduct of two four-dimensional arrays of int 10 million times and
> looks like follows:

[snip]

> The calculation is performed with a general and a specialized version of
> the dot product: general_dot() and special_dot() respectively.
>
> However the performance of the general_dot() is terrible compared to the
> special_dot(). Around 35 times slower when I compile it with gcc-3.3.2 using
> the switches "-O3 -funroll-all-loops".
>
> Is gcc really that lame or have I forgotten something?

Questions about gcc and specific options should be addressed to one of
the news:gnu.gcc.* groups. The C++ language does not define
optimization options at all, not to mention those of specific
compilers.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Jul 22 '05 #2
