g++ loop unrolling performance

Per Nordlöw

Hi all

I am using the boost::array template class trying to generalize my
handcrafted
vector specialization for dimensions 2 (class vec2), 3 (class vec3) etc.

As performance is of greatest importance I have written an initial
benchmarker that tests how well g++ can unroll loops whose number of
iterations
can be determined at compile time or upon entry to the loop. The gcc switch
"-funroll-loops" should do just that. The test program calculates the
dotproduct of two four-dimensional arrays of int 10 million times and
looks like follows:

#include "../array.hh"
#include "../Timer.hh"

using boost::array;
using std::cout;
using std::endl;

template <typename T, std::size_t N>
inline T general_dot(const array<T, N> & a, const array<T, N> & b)
{
T c = 0;
for (size_t i = 0; i < N; i++)
{
c += a[i] * b[i];
}
return c;
}

template <typename T>
inline T special_dot(const array<T, 4> & a, const array<T, 4> & b)
{
return (a[0] * b[0] +
a[1] * b[1] +
a[2] * b[2] +
a[3] * b[3]);
}

int main(int argc, char * argv[])
{
typedef array<int, 4> T;

T a(3);

cout << "a: " << a << endl;

a[0] = 11;
a[1] = 13;
a[2] = 17;
a[3] = 19;

cout << "a: " << a << endl;

T b = a;

Timer t;

const unsigned int nloops = 10000000;

unsigned int sum = 0;
t.reset();
for (unsigned int i = 0; i < nloops; i++)
{
sum += general_dot(a, b);
}
t.read();
cout << "general: " << t << endl;

unsigned int tum = 0;
t.reset();
for (unsigned int i = 0; i < nloops; i++)
{
tum += special_dot(a, b);
}
t.read();
cout << "special: " << t << endl;

if (sum == tum)
{
cout << "Checksums are equal. OK" << endl;
}
else
{
cout << "Checksums are not equal. NOT OK" << endl;
}

return 0;
}

The calculation is performed with a general and a specialized version of
the dot product: general_dot() and special_dot() respectively.

However the performance of the general_dot() is terrible compared to the
special_dot(). Around 35 times slower when I compile it with gcc-3.3.2 using
the switches "-O3 -funroll-all-loops".

Is gcc really that lame or have I forgotten something?
Many thanks in advance,

Per Nordlöw
Swedish Defence Research Agency
Linköping
Sweden

Jul 22 '05 #1

Subscribe Post Reply

2928

Jack Klein

On Tue, 31 Aug 2004 08:51:12 +0200, Per Nordlöw <pe*@foi.se> wrote in
comp.lang.c++:

Hi all

I am using the boost::array template class trying to generalize my
handcrafted
vector specialization for dimensions 2 (class vec2), 3 (class vec3) etc.

As performance is of greatest importance I have written an initial
benchmarker that tests how well g++ can unroll loops whose number of
iterations
can be determined at compile time or upon entry to the loop. The gcc switch
"-funroll-loops" should do just that. The test program calculates the
dotproduct of two four-dimensional arrays of int 10 million times and
looks like follows:
[snip]
The calculation is performed with a general and a specialized version of
the dot product: general_dot() and special_dot() respectively.

However the performance of the general_dot() is terrible compared to the
special_dot(). Around 35 times slower when I compile it with gcc-3.3.2 using
the switches "-O3 -funroll-all-loops".

Is gcc really that lame or have I forgotten something?

Questions about gcc and specific options should be addressed to one of
the news:gnu.gcc.* groups. The C++ language does not define
optimization options at all, not to mention those of specific
compilers.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Jul 22 '05 #2

by: John Edwards | last post by:

Hello, I have sort of a newbie question. I'm trying to optimize a loop by breaking it into two passes. Example: for(i = 0; i < max; i++) {

C / C++

Unwinding a loop: why can't .NET add 1 + 2 + 3 ... ?

by: Mountain Bikn' Guy | last post by:

Take some standard code such as shown below. It simply loops to add up a series of terms and it produces the correct result. // sum numbers with a loop public int DoSumLooping(int iterations) {...

.NET Framework

loop performance question

by: pertheli | last post by:

Hello, I have a large array of pointer to some object. I have to run test such that every possible pair in the array is tested. eg. if A,B,C,D are items of the array, possible pairs are AB, AC,...

C / C++

duff's device / loop unriolling

by: Jan Richter | last post by:

Hi there, the Code below shows DJBs own implementation of strlen (str_len): unsigned int str_len(char *s) { register char *t; t = s; for (;;) { if (!*t) return t - s; ++t;

C / C++

How to optimize for loop?

by: Bhan | last post by:

I heard for(i=0;i<20;i++) { do-something; } Can be optimized. Is this can really optimized by an equivalent for loop or with while or do while loops?

C / C++

unrolling nested for-loop

by: V | last post by:

Hello: Consider the following nested for loop: uint64 TABLE; for (i=0; i<=7; i++) for (j=1; j<=7; j++) for (k=1; k<=(1<<j)-1; k++) TABLE = (TABLE) ^ (TABLE);

C / C++

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

g++ loop unrolling performance

Similar topics