Performance measurement and optimization levels


For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.
Here is some test program.
Environment
-----------
Windows 2000
Intel (R) Celeron (R) CPU 1.70 GHz
GNU g++ 3.3.1 (cygming special), MINGW

========== C++ code : foo.cpp : BEGIN ==========
// Note. To simplify this demo program
// the clock() return value isn't checked
// ---------------------------------------------
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
clock_t t0, tn;
unsigned long i = 0;
char ch;

#define REPETITIONS 100000000

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
tn = clock ();

cout << "Do something : " << (tn - t0) << " ticks" << endl;

return 0;
}
========== C++ code : foo.cpp : END ============
========= Compilation : BEGIN =========

$ g++ --version
g++ (GCC) 3.3.1 (cygming special)
[---omitted---]

$ g++ -mno-cygwin foo.cpp -o a0

$ g++ -mno-cygwin -O1 foo.cpp -o a1

$ g++ -mno-cygwin -O2 foo.cpp -o a2

$ g++ -mno-cygwin -O3 foo.cpp -o a3

$ wc *.exe
394 5333 424460 a0.exe
398 5294 424460 a1.exe
397 5293 424460 a2.exe
396 5303 424478 a3.exe
1585 21223 1697858 total

========= Compilation : END ===========
========= Run : BEGIN =========

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?
--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 22 '05 #1
On Wed, 21 Jul 2004 08:40:45 +0300, Alex Vinokur wrote:

For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.
Very likely this specific operation will be the same at all levels -- some
approximation of mov ch1, ch2.

....

(Source and build commands kept for context)

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
tn = clock ();
cout << "Do something : " << (tn - t0) << " ticks" << endl;

$ g++ -mno-cygwin foo.cpp -o a0
$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ g++ -mno-cygwin -O1 foo.cpp -o a1
$ a1
Do noting : 120 ticks
Do something : 130 ticks

....

$ g++ -mno-cygwin -O3 foo.cpp -o a3
$ a3
Do noting : 120 ticks
Do something : 120 ticks

We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


What, exactly, were you expecting the optimizer to do? *Not* optimize
your program?

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 22 '05 #2
Alex Vinokur wrote:
We can see that only a0 generates believable results.
a1, a2 and a3 are IMHO believable too. In fact with a good optimizer I
would expect results close to 0 ticks, because with this code the 'for'
loops can be completely eliminated.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


Keep in mind that code that has no observable effects can be completely
optimized away by the optimizer. Since in your code 'ch' is assigned to
but never used, the optimizer can replace the assignment with nothing.
To prevent this optimization you could, for example, output the ch
variable after the loop has completed.

Also the 'for' loop can be replaced with something that has the same
effect (which may be nothing). For example:

for (i = 0; i < REPETITIONS; i++) ch = 'a';

Can be replaced with:

ch = 'a';

MSVC can do this optimization, and can handle even more complex cases.
For example with optimization enabled the following code:

int main()
{
int i = 10;

for(int j= 0; j < 10; ++j)
{
i += 10;
}

return i;
}

Will produce the equivalent of:

int main()
{
return 110;
}

As I said in another thread, making a good benchmark is extremely
tricky. Artificial code like the program you posted is prone to producing
non-representative benchmark results.

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl
Jul 22 '05 #3
"Alex Vinokur" <al****@big-foot.com> wrote in message
news:2m************@uni-berlin.de...

int main()
{
clock_t t0, tn;
unsigned long i = 0;
char ch;

#define REPETITIONS 100000000

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
A good optimizer will optimize the above out of existence. It does nothing
anyway.
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
A good compiler will optimize the above loop to { ch = 'a'; }, just one
assignment.
tn = clock ();

cout << "Do something : " << (tn - t0) << " ticks" << endl;

return 0;
}

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.
So, how should one measure performance in the program above for optimization levels O1, O2, O3?

We need the loop to have side effects, or to fool the optimizer into thinking
there are side effects, for example by calling external functions. There might
be other ways too.
Jul 22 '05 #4
