Performance measurement and optimization levels


For instance, we need to measure the performance
of the assignment 'ch1 = ch2', where ch1 and ch2 are of type char.
We need to do that for different optimization levels of the same compiler.
Here is a test program.
Environment
-----------
Windows 2000
Intel (R) Celeron (R) CPU 1.70 GHz
GNU g++ 3.3.1 (cygming special), MINGW

========== C++ code : foo.cpp : BEGIN ==========
// Note. To simplify this demo program
// the clock() return value isn't checked
// ---------------------------------------------
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    clock_t t0, tn;
    unsigned long i = 0;
    char ch;

#define REPETITIONS 100000000

    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) {}
    tn = clock ();
    cout << "Do noting : " << (tn - t0) << " ticks" << endl;

    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) ch = 'a';
    tn = clock ();

    cout << "Do something : " << (tn - t0) << " ticks" << endl;

    return 0;
}
========== C++ code : foo.cpp : END ============
========= Compilation : BEGIN =========

$ g++ --version
g++ (GCC) 3.3.1 (cygming special)
[---omitted---]

$ g++ -mno-cygwin foo.cpp -o a0

$ g++ -mno-cygwin -O1 foo.cpp -o a1

$ g++ -mno-cygwin -O2 foo.cpp -o a2

$ g++ -mno-cygwin -O3 foo.cpp -o a3

$ wc *.exe
394 5333 424460 a0.exe
398 5294 424460 a1.exe
397 5293 424460 a2.exe
396 5303 424478 a3.exe
1585 21223 1697858 total

========= Compilation : END ===========
========= Run : BEGIN =========

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 produces believable results.
Most probably, in a1, a2 and a3 the assignment ch = 'a' is performed without a loop.
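
To put the tick counts above into perspective, here is a minimal sketch of the arithmetic that turns two clock() readings into a cost per iteration. The function name ns_per_iteration is hypothetical, and the figure of 1000 ticks per second used in the note below is an assumption about the MinGW runtime, not something stated in the post.

```cpp
#include <cassert>
#include <ctime>

// Convert a pair of clock() measurements into an estimated cost per
// iteration, in nanoseconds. The empty-loop time is subtracted as a baseline.
// ticks_per_second defaults to CLOCKS_PER_SEC of the machine compiling this
// sketch, which may differ from the machine that produced the numbers.
double ns_per_iteration(std::clock_t loop_ticks,
                        std::clock_t baseline_ticks,
                        unsigned long repetitions,
                        double ticks_per_second = CLOCKS_PER_SEC)
{
    assert(repetitions > 0 && ticks_per_second > 0);
    const double seconds =
        static_cast<double>(loop_ticks - baseline_ticks) / ticks_per_second;
    return seconds * 1e9 / static_cast<double>(repetitions);
}
```

Assuming 1000 ticks per second, the a0 figures (371 and 250 ticks over 100000000 iterations) work out to about 1.21 ns per assignment. The a1 figures (130 and 120 ticks) give about 0.1 ns, which is less than one clock period of a 1.70 GHz CPU (roughly 0.59 ns), so that difference is more likely timer noise than real work.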

So, how should one measure performance in the program above for optimization levels O1, O2, O3?
--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 22 '05 #1
On Wed, 21 Jul 2004 08:40:45 +0300, Alex Vinokur wrote:

For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.
Very likely this specific operation will be the same at all levels -- some
approximation of mov ch1, ch2.

....

(Source and build commands kept for context)

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
tn = clock ();
cout << "Do something : " << (tn - t0) << " ticks" << endl;

$ g++ -mno-cygwin foo.cpp -o a0
$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ g++ -mno-cygwin -O1 foo.cpp -o a1
$ a1
Do noting : 120 ticks
Do something : 130 ticks

....

$ g++ -mno-cygwin -O3 foo.cpp -o a3
$ a3
Do noting : 120 ticks
Do something : 120 ticks

We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


What, exactly, were you expecting the optimizer to do? *Not* optimize
your program?

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 22 '05 #2
Alex Vinokur wrote:
For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.
We can see that only a0 generates believable results.

a1, a2 and a3 are IMHO believable too. In fact with a good optimizer I
would expect results close to 0 ticks, because with this code the 'for'
loops can be completely eliminated.

Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


Keep in mind that code that has no observable effects can be completely
optimized away by the optimizer. Since in your code 'ch' is assigned to
but never used, the optimizer can replace the assignment with nothing.
To prevent this optimization you could, for example, output the ch
variable after the loop has completed.
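
Printing ch after the loop keeps the final assignment alive, but the optimizer may still collapse the loop into a single store. One way to keep every store, which is a different technique from the one suggested above, is to declare the variable volatile. The following is a minimal sketch under that assumption; the function name time_volatile_assignments is hypothetical.

```cpp
#include <cassert>
#include <ctime>

// Time `repetitions` assignments to a volatile char. Each write to a volatile
// object is an observable side effect, so the compiler has to emit all of
// them. The loop counter itself is not volatile, so the loop overhead can
// still shrink at higher optimization levels.
std::clock_t time_volatile_assignments(unsigned long repetitions, char value)
{
    volatile char ch = 0;
    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; ++i) {
        ch = value;  // volatile store: cannot be elided or merged
    }
    const std::clock_t tn = std::clock();
    assert(ch == value || repetitions == 0);  // reads ch back once
    return tn - t0;
}
```

Calling time_volatile_assignments(100000000, 'a') from main and printing the result would stand in for the "Do something" measurement. Note that a volatile store may cost more than an ordinary one, so the figure describes this forced store rather than whatever the optimizer would generate in real code.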

Also the 'for' loop can be replaced with something that has the same
effect (which may be nothing). For example:

for (i = 0; i < REPETITIONS; i++) ch = 'a';

Can be replaced with:

ch = 'a';

MSVC can do this optimization, and can handle even more complex cases.
For example with optimization enabled the following code:

int main()
{
int i = 10;

for(int j= 0; j < 10; ++j)
{
i += 10;
}

return i;
}

Will produce the equivalent of:

int main()
{
return 110;
}
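
The same folding can be made explicit in standard C++. This is a sketch that assumes a C++14 or later compiler (not available to the g++ 3.3.1 used in this thread); the function name accumulate_tens is hypothetical.

```cpp
// C++14 or later: a constexpr function may contain a loop.
constexpr int accumulate_tens()
{
    int i = 10;
    for (int j = 0; j < 10; ++j) {
        i += 10;
    }
    return i;
}

// Evaluated by the compiler; the build fails if the result is not 110.
static_assert(accumulate_tens() == 110, "loop folds to the constant 110");
```

Since the loop depends on nothing known only at run time, its result is a compile-time constant, which is why an optimizer is free to replace it with return 110.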

Like I said in another thread, making a good benchmark is extremely
tricky. Artificial code like the program you posted is prone to produce
non-representative benchmark results.

--
Peter van Merkerk
peter.van.merke rk(at)dse.nl
Jul 22 '05 #3
"Alex Vinokur" <al****@big-foot.com> wrote in message
news:2m******** ****@uni-berlin.de...

int main()
{
clock_t t0, tn;
unsigned long i = 0;
char ch;

#define REPETITIONS 100000000

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
A good optimizer will optimize the above out of existence. It does nothing
anyway.
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
A good compiler will optimize the above loop to { ch = 'a'; }, just one
assignment.
tn = clock ();

cout << "Do something : " << (tn - t0) << " ticks" << endl;

return 0;
}

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.
So, how should one measure performance in the program above for
optimization levels O1, O2, O3?

We need to have side effects, or fool the optimizer into thinking there are
side effects, for example by calling external functions. There might be other
ways too.
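
A rough single-file sketch of the "external function" idea: the call goes through a volatile function pointer, so the compiler cannot assume which function is called and has to perform the call on every iteration. The names consume, sink and time_opaque_calls are hypothetical, and a function defined in a separate translation unit would be the more literal reading of the suggestion.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: counts how often it was called and records the value.
static unsigned long g_calls = 0;
static char g_last = 0;
void consume(char c) { ++g_calls; g_last = c; }

// Reading a volatile pointer is itself a side effect, so the compiler must
// reload it and perform the call on every iteration; it cannot assume the
// target is still `consume` and inline the body away.
void (*volatile sink)(char) = consume;

std::clock_t time_opaque_calls(unsigned long repetitions)
{
    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; ++i) {
        char ch = 'a';
        sink(ch);  // opaque call: the assignment's result is "used"
    }
    const std::clock_t tn = std::clock();
    assert(g_calls >= repetitions);
    return tn - t0;
}
```

The measured time here is dominated by the indirect call rather than by the assignment, so it is mainly useful as a baseline to subtract or as a way to confirm that the loop really runs.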
Jul 22 '05 #4
