
Benchmark results unrealistic?

Hi!
I've created a benchmark tool which uses Agner Fog's asmlib to count the
clock cycles a function takes. I'm using it to measure the speed of
MersenneTwister.h.
Source code is here:
http://code.google.com/p/multirng/so...s/benchmarks.h
(the main function just calls these functions)
When I run on a P4 Prescott (MinGW with GCC4, Win XP MediaCenter) with
-O3 -fexpensive-optimizations and Prescott-specific optimizations, it
shows me that e.g. mtr.rand() takes ~1200 clock cycles. I think this is
realistic.
But when I write something like

time[0] = ReadTSC();
for(int i = 0;i < NUMTESTS;i++) rand();
time[1] = ReadTSC();

and

cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;

it shows me that it (and the other functions too) takes only 20
clock cycles. Is this realistic? I think it's OK that a single call
takes more clock cycles than the average, but 20 clock cycles for
generating a random number? Even if I set NUMTESTS to higher or lower
values, the result remains the same (apart from a difference of about
3 or 4 clock cycles).

Thanks in advance, Hans
Feb 11 '08 #1
On Feb 11, 3:30 pm, Hans Mull <deyrin...@googlemail.com> wrote:
> I've created a benchmark tool which uses Agner Fog's asmlib to count the
> clock cycles a function takes. I'm using it to measure the speed of
> MersenneTwister.h.
> Source code is here: http://code.google.com/p/multirng/so...enchmarks/benc...
> (the main function just calls these functions)
> When I run on a P4 Prescott (MinGW with GCC4, Win XP MediaCenter) with
> -O3 -fexpensive-optimizations and Prescott-specific optimizations, it
> shows me that e.g. mtr.rand() takes ~1200 clock cycles. I think this is
> realistic.
> But when I write something like
> time[0] = ReadTSC();
> for(int i = 0;i < NUMTESTS;i++) rand();
> time[1] = ReadTSC();
> and
> cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;
> it shows me that it (and the other functions too) takes only
> 20 clock cycles. Is this realistic? I think it's OK that a
> single call takes more clock cycles than the average, but
> 20 clock cycles for generating a random number?

That sounds a bit high for the usual implementations of rand(),
yes. But maybe your platform uses something better than the
usual implementations, which aren't always that good, although
on a 64-bit machine you can implement a reasonably good RNG
with only 2 cycles of computation. And of course, since it is a
function, you have the overhead of a function call in there. On
some machines, that can be several clock cycles in itself. Plus
the stores to memory, etc.
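
To illustrate how little computation a simple generator needs, here is a minimal sketch of a 64-bit linear congruential generator. The name Lcg64 is hypothetical, and the constants are the ones Knuth gives for MMIX. This is not claimed to be what any particular rand() does, and it does not by itself demonstrate the 2-cycle figure; it only shows that one step is a multiply, an add and a shift.

```cpp
#include <cstdint>

// Hypothetical illustration: a 64-bit linear congruential generator.
// One step is a single multiply-and-add on the 64-bit state; the
// arithmetic wraps modulo 2^64 because the state is unsigned.
struct Lcg64 {
    std::uint64_t state;

    explicit Lcg64(std::uint64_t seed) : state(seed) {}

    // Advance the state and return its upper 32 bits, which are
    // better distributed than the low bits in this kind of generator.
    std::uint32_t next() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return static_cast<std::uint32_t>(state >> 32);
    }
};
```

Called in a tight loop and inlined, a generator like this has no function call overhead at all, which is one reason a loop average can come out far lower than the cost of an isolated call.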

Of course, clock cycles don't really mean much on a modern
machine anyway. Most modern machines are capable of executing
several instructions in parallel, in a single clock, if there
are no dependencies, whereas a rapid sequence of memory
accesses may saturate the memory pipeline, leaving several
clocks in which no instructions can be executed. The time it
takes to execute rand() in a loop like this is probably not
typical of the time it would take to execute it in normal
program flow.
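
The loop measurement discussed above can be sketched roughly as follows. This is a hedged illustration, not the code from benchmarks.h: it assumes std::chrono::steady_clock as a portable stand-in for asmlib's ReadTSC, so it reports nanoseconds rather than clock cycles, and the name average_ns_per_call is hypothetical. The volatile store is there so that an optimizing compiler cannot discard the calls being timed.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: average elapsed nanoseconds per call of f,
// measured over `iterations` back-to-back calls in a tight loop.
// Returns 0.0 when iterations is 0 to avoid dividing by zero.
template <typename F>
double average_ns_per_call(F f, std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    using clock = std::chrono::steady_clock;
    volatile unsigned sink = 0;  // forces each result to be stored

    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = static_cast<unsigned>(f());
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

For example, average_ns_per_call([] { return std::rand(); }, NUMTESTS) (with <cstdlib> included) gives the per-call average for rand(). Timing a functor that only returns a constant gives a rough estimate of the loop and store overhead. Either way, the figure is an average over a warm, tightly packed loop, so it says little about the cost of one isolated call in normal program flow.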

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Feb 12 '08 #2
