Performance measurement and optimization levels


For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.
Here is some test program.
Environment
-----------
Windows 2000
Intel (R) Celeron (R) CPU 1.70 GHz
GNU g++ 3.3.1 (cygming special), MINGW

========== C++ code : foo.cpp : BEGIN ==========
// Note. To simplify this demo program
// the clock() return value isn't checked
// ---------------------------------------------
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    clock_t t0, tn;
    unsigned long i = 0;
    char ch;

#define REPETITIONS 100000000

    // Baseline: empty loop
    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) {}
    tn = clock ();
    cout << "Do noting : " << (tn - t0) << " ticks" << endl;

    // Loop with the assignment under test
    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) ch = 'a';
    tn = clock ();

    cout << "Do something : " << (tn - t0) << " ticks" << endl;

    return 0;
}
========== C++ code : foo.cpp : END ============
========= Compilation : BEGIN =========

$ g++ --version
g++ (GCC) 3.3.1 (cygming special)
[---omitted---]

$ g++ -mno-cygwin foo.cpp -o a0

$ g++ -mno-cygwin -O1 foo.cpp -o a1

$ g++ -mno-cygwin -O2 foo.cpp -o a2

$ g++ -mno-cygwin -O3 foo.cpp -o a3

$ wc *.exe
394 5333 424460 a0.exe
398 5294 424460 a1.exe
397 5293 424460 a2.exe
396 5303 424478 a3.exe
1585 21223 1697858 total

========= Compilation : END ===========
========= Run : BEGIN =========

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.
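
To compare runs, the tick counts can be converted to seconds and the empty-loop baseline subtracted. The following is only a minimal sketch: the helper names `ticks_to_seconds` and `net_seconds_per_iteration` are hypothetical and not part of foo.cpp, while `CLOCKS_PER_SEC` is the standard macro from `<ctime>`.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: converts a clock() tick count to seconds.
double ticks_to_seconds(std::clock_t ticks)
{
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Hypothetical helper: estimated seconds per iteration after subtracting
// the empty-loop baseline from the measured loop.
double net_seconds_per_iteration(std::clock_t loop_ticks,
                                 std::clock_t baseline_ticks,
                                 unsigned long repetitions)
{
    assert(repetitions > 0);  // avoid dividing by zero
    return ticks_to_seconds(loop_ticks - baseline_ticks) / repetitions;
}
```

For the a0 run this would be called as `net_seconds_per_iteration(371, 250, 100000000)`. For a2 and a3 the two tick counts are equal, so the estimate is zero, which is exactly the symptom described above.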

So, how should one measure performance in the program above for optimization levels O1, O2, O3?
--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 22 '05 #1
On Wed, 21 Jul 2004 08:40:45 +0300, Alex Vinokur wrote:

For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.

Very likely this specific operation will be the same at all levels -- some
approximation of mov ch1, ch2.

....

(Source and build commands kept for context)

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
tn = clock ();
cout << "Do something : " << (tn - t0) << " ticks" << endl;

$ g++ -mno-cygwin foo.cpp -o a0
$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ g++ -mno-cygwin -O1 foo.cpp -o a1
$ a1
Do noting : 120 ticks
Do something : 130 ticks

....

$ g++ -mno-cygwin -O3 foo.cpp -o a3
$ a3
Do noting : 120 ticks
Do something : 120 ticks

We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


What, exactly, were you expecting the optimizer to do? *Not* optimize
your program?

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 22 '05 #2
Alex Vinokur wrote:
We can see that only a0 generates believable results.

a1, a2 and a3 are IMHO believable too. In fact, with a good optimizer I
would expect results close to 0 ticks, because with this code the 'for'
loops can be completely eliminated.

Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?


Keep in mind that code that has no observable effects can be completely
optimized away by the optimizer. Since in your code 'ch' is assigned to
but never used, the optimizer can replace the assignment with nothing.
To prevent this optimization you could, for example, output the ch
variable after the loop has completed.
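
A minimal sketch of that idea follows. The function names `timed_assign` and `report_assign` are hypothetical and not from the original program. Note the limitation: using ch afterwards keeps the final store alive, but it does not stop the compiler from collapsing the loop into a single assignment, so the tick count may still be close to zero.

```cpp
#include <cassert>
#include <ctime>
#include <iostream>

// Hypothetical helper (not in the original program): runs the assignment
// loop, stores the elapsed ticks in `elapsed`, and returns the final ch.
char timed_assign(unsigned long repetitions, std::clock_t& elapsed)
{
    assert(repetitions > 0);  // with zero iterations nothing is assigned
    char ch = 0;
    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; ++i) ch = 'a';
    elapsed = std::clock() - t0;
    return ch;
}

// Prints the timing together with ch, so the assignment has an observable effect.
void report_assign(unsigned long repetitions)
{
    std::clock_t elapsed = 0;
    const char ch = timed_assign(repetitions, elapsed);
    std::cout << "Do something : " << elapsed << " ticks, ch = " << ch << std::endl;
}
```

Calling `report_assign(100000000)` from main() would correspond to the second measurement in foo.cpp.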

Also the 'for' loop can be replaced with something that has the same
effect (which may be nothing). For example:

for (i = 0; i < REPETITIONS; i++) ch = 'a';

can be replaced with:

ch = 'a';

MSVC can do this optimization, and can handle even more complex cases.
For example with optimization enabled the following code:

int main()
{
    int i = 10;

    for (int j = 0; j < 10; ++j)
    {
        i += 10;
    }

    return i;
}

Will produce the equivalent of:

int main()
{
    return 110;
}
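
That the loop reduces to a constant (10 + 10 * 10 = 110) can be checked without reading the generated assembly. This is only a sketch and assumes a C++14 or later compiler, which did not exist when this was posted; `accumulate_tens` is a hypothetical name. It shows that the value is computable at compile time, not what any particular optimizer actually emits.

```cpp
// The same loop as above, wrapped in a constexpr function (C++14 or later).
constexpr int accumulate_tens()
{
    int i = 10;
    for (int j = 0; j < 10; ++j)
    {
        i += 10;
    }
    return i;
}

// Compiles only if the loop evaluates to 110 at compile time.
static_assert(accumulate_tens() == 110, "loop folds to the constant 110");
```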

As I said in another thread, making a good benchmark is extremely
tricky. Artificial code like the program you posted is prone to producing
non-representative benchmark results.

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl
Jul 22 '05 #3
"Alex Vinokur" <al****@big-foot.com> wrote in message
news:2m************@uni-berlin.de...

int main()
{
clock_t t0, tn;
unsigned long i = 0;
char ch;

#define REPETITIONS 100000000

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}

A good optimizer will optimize the above out of existence. It does nothing
anyway.

tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';

A good compiler will optimize the above loop to { ch = 'a'; }, just one
assignment.

tn = clock ();

cout << "Do something : " << (tn - t0) << " ticks" << endl;

return 0;
}

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========
We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.
So, how should one measure performance in the program above for optimization levels O1, O2, O3?

We need to have side effects, or fool the optimizer into thinking there are
side effects, for example by calling external functions. There might be other
ways too.
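
One such other way, sketched here as an assumption rather than something tested on g++ 3.3.1, is to assign to a volatile object. In C++, accesses to volatile objects count as observable behavior, so the compiler has to perform every store in the loop. The function name `time_volatile_stores` is hypothetical. The measured time then includes the cost of a real memory write on each iteration, which may differ from a plain char assignment kept in a register.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: times `repetitions` stores to a volatile char and
// reports the last stored value through `last`.
std::clock_t time_volatile_stores(unsigned long repetitions, char& last)
{
    assert(repetitions > 0);  // nothing to measure otherwise
    volatile char sink = 0;   // volatile: each write must actually happen
    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; ++i) sink = 'a';
    const std::clock_t tn = std::clock();
    last = sink;              // read the value back for the caller
    return tn - t0;
}
```

The same trick applies to the empty baseline loop: declaring the loop counter volatile keeps that loop from being removed as well, so both measurements stay comparable.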
Jul 22 '05 #4
