gcc 3.4.3 performance problem illustrated

I have noticed significantly worse performance in some of my C++ code compiled with gcc 3.4.3
as compared to gcc 3.3.4. I have boiled it down to one relatively short program that illustrates
the problem. It seems to be an issue of excessive cache misses in certain pointer lookup
operations in gcc 3.4.3 binaries. BTW, are there any tools to actually count cache misses?

If anyone has a few minutes to compile and run the following code, I would be interested in
knowing if you experience the same problem. I'm running an AMD64 Athlon 3200 with 1024 KB of
cache. I compiled with

g++ -O3 -Wall -march=k8

Compiled with gcc 3.3.4 average run time: 2.0 seconds
Compiled with gcc 3.4.3 average run time: 2.9 seconds

I've noticed even more dramatic differences in larger programs that actually do something.

I would be interested in answers to the following questions:

1) is this observed only on AMD64, or also on x86?
2) how does gcc 4.0.0 do?
3) are there compiler options that would improve performance? (none that I've tried did)
4) what changed between gcc 3.3 and 3.4 to cause this?

If you have any spare time, I think this is an interesting example, and worth the effort for
someone to figure out. I'm afraid my compiler expertise is not sufficient, so I am asking for
some help. Thanks.

Code:

// Run time is anywhere from 33 to 50% longer when compiled with gcc 3.4.3 compared to 3.3.4.
// Compiled with g++ -O3 -Wall -march=k8 (same performance lag observed with -O2).
//
// Objects are created in a hierarchy of classes.
// When referenced, it seems that the pointer lookups
// must cause more cache misses in gcc 3.4.3 binaries.

#include <stdio.h>
#include <time.h>   // clock(), clock_t, CLOCKS_PER_SEC
#include <vector>

class mytype_A {
public:
    int id;
    mytype_A() : id(0) {}
};

class mytype_B {
public:
    mytype_A* A;
    mytype_B(mytype_A* p) : A(p) {}
};

class mytype_C {
public:
    mytype_B* B;
    mytype_C(mytype_B* p) : B(p) {}
};

class mytype_D {
public:
    // mytype_C* C[2];  // less performance difference if we use simple arrays
    std::vector<mytype_C*> C;
    int junk[3];  // affects performance (must cause cache misses)

public:
    // note: a1 is never used; both entries end up pointing at a0
    mytype_D(mytype_A* a0, mytype_A* a1) {
        // C[0] = new mytype_C(new mytype_B(a0));
        // C[1] = new mytype_C(new mytype_B(a0));
        C.push_back(new mytype_C(new mytype_B(a0)));
        C.push_back(new mytype_C(new mytype_B(a0)));
    }
};

int main() {
    int k = 5000;       // run-time not linear in k
    mytype_A* A[k+1];   // k+1 slots: the loops below touch A[0] through A[k]
    mytype_D* D[k];
    for (int i=0;i<=k;i++)
        A[i] = new mytype_A();
    for (int i=0;i<k;i++)
        D[i] = new mytype_D(A[i],A[k-i]);  // intentionally make some pointers farther apart

    clock_t before = clock();

    int k0 = 0;
    for (int i=0;i<k;i++) {
        k0 = 0;
        for (int j=0;j<k;j++) {  // run through list of D's, and reference pointers
            mytype_D* d = D[j];
            if (d->C[0]->B->A->id) k0++;
            if (d->C[1]->B->A->id) k0++;
        }
    }
    printf("%d\n",k0);  // don't allow compiler to optimize away k0

    printf("time: %f\n",(double)(clock()-before)/CLOCKS_PER_SEC);

    return 0;
}
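
One way to check the cache-miss guess without a profiler might be to record the addresses the inner loop dereferences, in the order it visits them, and compare how far apart they sit under each compiler. The following is only a sketch: `StrideStats` and `stride_stats` are hypothetical names that are not part of the program above, and the 64-byte default is an assumed cache line size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the original program): summarises the
// distance in bytes between consecutive addresses in traversal order.
struct StrideStats {
    double mean_abs_stride;  // average |addr[i+1] - addr[i]| in bytes
    std::size_t far_jumps;   // number of strides larger than one cache line
};

StrideStats stride_stats(const std::vector<const void*>& addrs,
                         std::size_t cache_line = 64) {  // 64 is an assumption
    assert(cache_line > 0);
    StrideStats s = {0.0, 0};
    if (addrs.size() < 2) return s;  // no strides to measure

    double total = 0.0;
    for (std::size_t i = 0; i + 1 < addrs.size(); ++i) {
        std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addrs[i]);
        std::uintptr_t b = reinterpret_cast<std::uintptr_t>(addrs[i + 1]);
        std::uintptr_t d = a > b ? a - b : b - a;  // absolute distance
        total += static_cast<double>(d);
        if (d > cache_line) ++s.far_jumps;
    }
    s.mean_abs_stride = total / static_cast<double>(addrs.size() - 1);
    return s;
}
```

In the test program, one could push `d->C[0]`, `d->C[0]->B` and `d->C[0]->B->A` for each `D[j]` into the vector, then print both fields from the 3.3.4 build and the 3.4.3 build. Clearly larger strides in the slower build would support the memory-layout explanation; similar strides would point elsewhere.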

--
Kenneth Massey
http://www.masseyratings.com
Jul 23 '05 #1
Kenneth Massey wrote:
I was noticing significantly worse performance in some of my C++
codes compiled with gcc 3.4.3 as compared to gcc 3.3.4. I have boiled
it down [...]

I would be interested in answering the following questions:

1) is this observed only on AMD64, or also x86 ?
2) how does gcc 4.0.0 do?
3) are there compiler options that would improve performance (none
that I've tried did)
4) what changed between gcc 3.3 and 3.4 to cause this?

If you have any spare time, I think this is an interesting example,
and worth the effort for someone to figure out. I'm afraid my
compiler expertise is not sufficient, so I am asking for some help.
[...]


Please re-post this to gnu.g++.help. This is all very compiler-specific
and as such not a C++ *language* issue but rather a compiler issue. You
should be able to get much better help in the newsgroup for your compiler.

Thanks.

V
Jul 23 '05 #2
Kenneth Massey wrote:
I was noticing significantly worse performance in some of my C++ codes
compiled with gcc 3.4.3 as compared to gcc 3.3.4. I have boiled it down
into one relatively short code that illustrates. It seems to be an issue
of excessive cache misses in certain pointer lookup operations in gcc
3.4.3 binaries. BTW, are there any tools to actually count cache misses?

If anyone has a few minutes to compile and run the following code, I would
be interested in knowing if you experience the same problems. I'm running
AMD64 athlon 3200 with 1024KB cache. I compiled with

g++ -O3 -Wall -march=k8

Compiled with gcc 3.3.4 average run time: 2.0 seconds
Compiled with gcc 3.4.3 average run time: 2.9 seconds

[snip]

My results:
~/Projects/stl_string> ./mytest
0
time: 5.210000

Compiled as:
g++ -O3 -Wall -march=athlon -o mytest main.cpp

My specs:
AMD Athlon 1800+
1GB PC2700 DDR
SuSE 9.1 Pro

~/Projects/stl_string> g++ -v
Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.3/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada
--disable-checking --libdir=/usr/lib --enable-libgcj
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
--with-system-zlib --enable-shared --enable-__cxa_atexit i586-suse-linux
Thread model: posix
gcc version 3.3.3 (SuSE Linux)
Hope this helps.

Alvin

Jul 23 '05 #3
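
The timings in this thread come from different machines and, in the last post, from a single run, so they are hard to compare directly. A minimal sketch of a repeat-and-take-the-fastest timer built on the same `clock()` call the test program uses is shown here; `best_cpu_seconds` is a hypothetical name, and the choice of reporting the fastest run rather than the average is an assumption about what is least sensitive to background load.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: calls work() `repeats` times and returns the CPU time
// of the fastest call, in seconds, as measured by std::clock().
template <typename F>
double best_cpu_seconds(F work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;  // sentinel: no run recorded yet
    for (int r = 0; r < repeats; ++r) {
        std::clock_t before = std::clock();
        work();
        double secs =
            static_cast<double>(std::clock() - before) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) best = secs;
    }
    return best;
}
```

Wrapping the doubly nested lookup loop in a function and passing it to this helper on the same machine, once per compiler version, would give figures that can be set side by side.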
