470,594 Members | 1,523 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,594 developers. It's quick & easy.

Is there a faster hi resolution timer for diy profiling

Hi,

Im trying to time some parts of my code using the Stopwatch class,
but it is so slow, I tried the QueryPerformanceFrequency()
but this seems to be just as slow,
if I just simply call this function in a loop it takes 21 ticks
with a reported frequency of 3.5mhz
its is about 6us wich is over 12k instructions on my 2ghz cpu.
this negates the idea of having a high res timer ...

Is there any way to access the timer more directly ?
I know the cpu actually has a very high resolution timer,
it would be nice to be able to just read this value directly,
maybe il have to write a c++ library and do it in asm
thats assuming I can get the OS to grant me acces to that hardware address.

basicaly im making a diy profiler, I have a profile class
wich needs Enter(enum p) funtion at the start and a Leave()
function at the end of some functions of interest.

primarily it keeps tarck of the number of calls made,
and a lists the calling functions counts too.
it also measures the total time spent in the function and the subtracts the
time spent in
subroutins to provide an actual time too.

It works pretty quick for counts but when I use the timer it is like 20
times slower,
wich although I can live with waiting 20 minutes to collect the data it does
mean
that there is a considerable error for short functions,

I have added a fudge wich subtracts a fiddle value from the timer,
wich is increased by 21 ticks every times its called,
this might be close but its far from ideal.

Im trying to find out why/where its spending 75% of the cpu time in
mscorlib and mscorwks libraries, the amd codeanalyst identifies its spending
most of its time in a function called mscorwks::gc_heap::adjust_limit_clr

thanks
Colin =^.^=
Jan 15 '08 #1
19 4968
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:kr**************@newsfe4-win.ntli.net...
Hi,

Im trying to time some parts of my code using the Stopwatch class,
but it is so slow, I tried the QueryPerformanceFrequency()
but this seems to be just as slow,
if I just simply call this function in a loop it takes 21 ticks
with a reported frequency of 3.5mhz
its is about 6us wich is over 12k instructions on my 2ghz cpu.
this negates the idea of having a high res timer ...

Is there any way to access the timer more directly ?
I know the cpu actually has a very high resolution timer,
it would be nice to be able to just read this value directly,
maybe il have to write a c++ library and do it in asm
thats assuming I can get the OS to grant me acces to that hardware
address.

basicaly im making a diy profiler, I have a profile class
wich needs Enter(enum p) funtion at the start and a Leave()
function at the end of some functions of interest.

primarily it keeps tarck of the number of calls made,
and a lists the calling functions counts too.
it also measures the total time spent in the function and the subtracts
the time spent in
subroutins to provide an actual time too.

It works pretty quick for counts but when I use the timer it is like 20
times slower,
wich although I can live with waiting 20 minutes to collect the data it
does mean
that there is a considerable error for short functions,

I have added a fudge wich subtracts a fiddle value from the timer,
wich is increased by 21 ticks every times its called,
this might be close but its far from ideal.

Im trying to find out why/where its spending 75% of the cpu time in
mscorlib and mscorwks libraries, the amd codeanalyst identifies its
spending
most of its time in a function called mscorwks::gc_heap::adjust_limit_clr

thanks
Colin =^.^=


Please post your code, there must be something wrong with it,
QueryPerformanceFrequency()
can't take 6µsec to complete.
Note that there is no need to call QueryPerformanceFrequency(), this value
exposed by the Stopwatch class.
Willy.

Jan 15 '08 #2
I just copied the code from here
http://msdn2.microsoft.com/en-us/lib...92(VS.80).aspx

and oops I meant the QueryPerformanceCounter() call is taking so long ...

I just have a loop wich calls it repeatedly 100000 times to measure how long
it takes.

the Stopwatch also took just as long.

Colin =^.^=
>
Please post your code, there must be something wrong with it,
QueryPerformanceFrequency()
can't take 6µsec to complete.
Note that there is no need to call QueryPerformanceFrequency(), this value
exposed by the Stopwatch class.
Willy.

Jan 15 '08 #3
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ea**************@TK2MSFTNGP03.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:kr**************@newsfe4-win.ntli.net...
>Hi,

Im trying to time some parts of my code using the Stopwatch class,
but it is so slow, I tried the QueryPerformanceFrequency()
but this seems to be just as slow,
if I just simply call this function in a loop it takes 21 ticks
with a reported frequency of 3.5mhz
its is about 6us wich is over 12k instructions on my 2ghz cpu.
this negates the idea of having a high res timer ...

Is there any way to access the timer more directly ?
I know the cpu actually has a very high resolution timer,
it would be nice to be able to just read this value directly,
maybe il have to write a c++ library and do it in asm
thats assuming I can get the OS to grant me acces to that hardware
address.

basicaly im making a diy profiler, I have a profile class
wich needs Enter(enum p) funtion at the start and a Leave()
function at the end of some functions of interest.

primarily it keeps tarck of the number of calls made,
and a lists the calling functions counts too.
it also measures the total time spent in the function and the subtracts
the time spent in
subroutins to provide an actual time too.

It works pretty quick for counts but when I use the timer it is like 20
times slower,
wich although I can live with waiting 20 minutes to collect the data it
does mean
that there is a considerable error for short functions,

I have added a fudge wich subtracts a fiddle value from the timer,
wich is increased by 21 ticks every times its called,
this might be close but its far from ideal.

Im trying to find out why/where its spending 75% of the cpu time in
mscorlib and mscorwks libraries, the amd codeanalyst identifies its
spending
most of its time in a function called mscorwks::gc_heap::adjust_limit_clr

thanks
Colin =^.^=

Please post your code, there must be something wrong with it,
QueryPerformanceFrequency()
can't take 6µsec to complete.
Note that there is no need to call QueryPerformanceFrequency(), this value
exposed by the Stopwatch class.
Willy.


Well, QueryPerformanceFrequency must transition into the kernel, which is
quite expensive, (6000 instructions), so this function may well take +4
µsec on a 2Ghz machine. The same is true for QueryPerformanceCounter, so you
need to account for this overhead when profiling.

Windows Vista and WS2008 have a new API QueryThreadCycleTime which returns
the CPU cycles count.
This API calls the __rdtsc(); VC++ compilers intrinsic function, you can
implement a small C function that calls this intrinsic and call this one
from C# using PInvoke.
Willy.
Jan 15 '08 #4
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:SY****************@newsfe1-win.ntli.net...
>I just copied the code from here
http://msdn2.microsoft.com/en-us/lib...92(VS.80).aspx

and oops I meant the QueryPerformanceCounter() call is taking so long ...

I just have a loop wich calls it repeatedly 100000 times to measure how
long it takes.

the Stopwatch also took just as long.

Hmmm.. this code is meant for the compact framework (WinCE).
The exact PInvoke declaration for Windows is:

[DllImport("kernel32"),SuppressUnmanagedCodeSecurit y]
static extern int QueryPerformanceFrequency(out long freq);

but as I said in another reply QueryPerformanceFrequency is an expensive
call.

Willy.

Jan 15 '08 #5
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:eA**************@TK2MSFTNGP03.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:SY****************@newsfe1-win.ntli.net...
>>I just copied the code from here
http://msdn2.microsoft.com/en-us/lib...92(VS.80).aspx

and oops I meant the QueryPerformanceCounter() call is taking so long ...

I just have a loop wich calls it repeatedly 100000 times to measure how
long it takes.

the Stopwatch also took just as long.


Hmmm.. this code is meant for the compact framework (WinCE).
The exact PInvoke declaration for Windows is:

[DllImport("kernel32"),SuppressUnmanagedCodeSecurit y]
static extern int QueryPerformanceFrequency(out long freq);

but as I said in another reply QueryPerformanceFrequency is an expensive
call.
Ah yes i found that out after I copied it and had to go search for
coredll.dll
it seems its in kernel32 for non ce
I aded the supress security too
in the hope it might speed it up ...

seems these two calls are easy to mix up when writing them lol...
i just call frequency func once, and the counter func about 100million times

thanks
Colin =^.^=
Jan 15 '08 #6
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:eM**************@TK2MSFTNGP05.phx.gbl...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ea**************@TK2MSFTNGP03.phx.gbl...
>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:kr**************@newsfe4-win.ntli.net...
>>Hi,

Im trying to time some parts of my code using the Stopwatch class,
but it is so slow, I tried the QueryPerformanceFrequency()
but this seems to be just as slow,
if I just simply call this function in a loop it takes 21 ticks
with a reported frequency of 3.5mhz
its is about 6us wich is over 12k instructions on my 2ghz cpu.
this negates the idea of having a high res timer ...

Is there any way to access the timer more directly ?
I know the cpu actually has a very high resolution timer,
it would be nice to be able to just read this value directly,
maybe il have to write a c++ library and do it in asm
thats assuming I can get the OS to grant me acces to that hardware
address.

basicaly im making a diy profiler, I have a profile class
wich needs Enter(enum p) funtion at the start and a Leave()
function at the end of some functions of interest.

primarily it keeps tarck of the number of calls made,
and a lists the calling functions counts too.
it also measures the total time spent in the function and the subtracts
the time spent in
subroutins to provide an actual time too.

It works pretty quick for counts but when I use the timer it is like 20
times slower,
wich although I can live with waiting 20 minutes to collect the data it
does mean
that there is a considerable error for short functions,

I have added a fudge wich subtracts a fiddle value from the timer,
wich is increased by 21 ticks every times its called,
this might be close but its far from ideal.

Im trying to find out why/where its spending 75% of the cpu time in
mscorlib and mscorwks libraries, the amd codeanalyst identifies its
spending
most of its time in a function called
mscorwks::gc_heap::adjust_limit_clr

thanks
Colin =^.^=

Please post your code, there must be something wrong with it,
QueryPerformanceFrequency()
can't take 6µsec to complete.
Note that there is no need to call QueryPerformanceFrequency(), this
value exposed by the Stopwatch class.
Willy.

Well, QueryPerformanceFrequency must transition into the kernel, which is
quite expensive, (6000 instructions), so this function may well take +4
µsec on a 2Ghz machine. The same is true for QueryPerformanceCounter, so
you need to account for this overhead when profiling.

Windows Vista and WS2008 have a new API QueryThreadCycleTime which returns
the CPU cycles count.
This API calls the __rdtsc(); VC++ compilers intrinsic function, you can
implement a small C function that calls this intrinsic and call this one
from C# using PInvoke.
Willy.



// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

Jan 15 '08 #7
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.
thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

thanks
Colin =^.^=
Jan 16 '08 #8
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
>// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

Ive just used Environment.TickCount, its only 1ms resolution
but it seems to be instant.

as ive plenty enough samples it seems to give some results still ...

thanks

Colin =^.^=
Jan 16 '08 #9
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
>// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

thanks
Colin =^.^=


What exactly takes 9000 cycles?

Willy.

Jan 16 '08 #10
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:D%***************@newsfe6-win.ntli.net...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
>>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
>>// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?


Ive just used Environment.TickCount, its only 1ms resolution
but it seems to be instant.
To do what?
Using TickCount to measure something that takes less that the RTC interval
will return the same count, the reason is that TickCount is updated at the
rithm of the system clock (every 10 or 15.6 msec. or more....).

Willy.
Willy.
Willy.

Jan 16 '08 #11

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ev**************@TK2MSFTNGP06.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:D%***************@newsfe6-win.ntli.net...
>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
>>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl.. .
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message

// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?


Ive just used Environment.TickCount, its only 1ms resolution
but it seems to be instant.

To do what?
Using TickCount to measure something that takes less that the RTC interval
will return the same count, the reason is that TickCount is updated at the
rithm of the system clock (every 10 or 15.6 msec. or more....).

Willy.
well yes one would think so wich is why I wanted a high res timer,
but if you call it 1000 times and only 1 call it changes by 1 then
you know on average its taken 1us. ofc this means its subject to random
noise in the results.
the usuble resolution is timer_resolution/sqrt(number sampls)

Colin =^.^=
Jan 16 '08 #12

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ec**************@TK2MSFTNGP06.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
>>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
>>// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

thanks
Colin =^.^=
What exactly takes 9000 cycles?
int cnt=100000;
start = timer.Value;
for (int x = 0; x < cnt; x++)
{
ulong cycles = timer.value
}
timerOverhead = (timer.Value - start)/cnt;

timer.value -get{return rdtsc();}

timerOverhead comes out as about 9000;

Colin =^.^=

Jan 16 '08 #13
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:CN***************@newsfe6-win.ntli.net...
>
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ec**************@TK2MSFTNGP06.phx.gbl...
>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
>>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl.. .
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message

// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();

...
ulong cycles = rdtsc();

Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

thanks
Colin =^.^=
What exactly takes 9000 cycles?

int cnt=100000;
start = timer.Value;
for (int x = 0; x < cnt; x++)
{
ulong cycles = timer.value
}
timerOverhead = (timer.Value - start)/cnt;

timer.value -get{return rdtsc();}

timerOverhead comes out as about 9000;

Colin =^.^=
Following code:

[DllImport("rtc"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();

static void Main()

{
int cnt = 100000;
ulong cycles = 0;
ulong start = rdtsc();
for (int x = 0; x < cnt; x++)
{
cycles = rdtsc();
}
ulong timerOverhead = (cycles - start) / (ulong)cnt;
Console.WriteLine(timerOverhead);
}
}

comes out as 13 on my box.
That means 13 cycles per iteration, the total number of instructions
executed is ~35, from which ~30 is Interop call overhead, the remaining in
the increment and compare of cnt.
The called function itself is only two instructions:

10001000 0f31 rdtsc
10001002 c3 ret

Seems like your processor does not support the rdtsc feature, what CPU do
you run this on?

You can call following C function before you are trying to use the Time
Stamp Counter, this function returns true if TSC is supported else false....

extern "C"
__declspec(dllexport) bool __stdcall tsc()
{
bool tscFeature = false;
int CPUInfo[4] = {-1};
__cpuid(CPUInfo, 0x80000001);
int nFeatureInfo = CPUInfo[3];
// check tsc feature bit (bit 27) in CPUInfo[3]
if (nFeatureInfo & 0x4000000)
tscFeature = true;
return tscFeature;
}
[DllImport("yourdll"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();
Willy.

Jan 16 '08 #14
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:Or**************@TK2MSFTNGP02.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:CN***************@newsfe6-win.ntli.net...
>>
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ec**************@TK2MSFTNGP06.phx.gbl...
>>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl. ..
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message

// C function that uses an intrinsic to retrieve the Processor's cycle
counter register
// Beware your CPU may not support this!
// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
extern "C"
__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
{
return __rdtsc();
}
>
>
// C#
[DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
static extern ulong rdtsc();
>
...
ulong cycles = rdtsc();
>
Willy.

thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?

thanks
Colin =^.^=

What exactly takes 9000 cycles?

int cnt=100000;
start = timer.Value;
for (int x = 0; x < cnt; x++)
{
ulong cycles = timer.value
}
timerOverhead = (timer.Value - start)/cnt;

timer.value -get{return rdtsc();}

timerOverhead comes out as about 9000;

Colin =^.^=

Following code:

[DllImport("rtc"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();

static void Main()

{
int cnt = 100000;
ulong cycles = 0;
ulong start = rdtsc();
for (int x = 0; x < cnt; x++)
{
cycles = rdtsc();
}
ulong timerOverhead = (cycles - start) / (ulong)cnt;
Console.WriteLine(timerOverhead);
}
}

comes out as 13 on my box.
That means 13 cycles per iteration, the total number of instructions
executed is ~35, from which ~30 is Interop call overhead, the remaining in
the increment and compare of cnt.
The called function itself is only two instructions:

10001000 0f31 rdtsc
10001002 c3 ret

Seems like your processor does not support the rdtsc feature, what CPU do
you run this on?

You can call following C function before you are trying to use the Time
Stamp Counter, this function returns true if TSC is supported else
false....

extern "C"
__declspec(dllexport) bool __stdcall tsc()
{
bool tscFeature = false;
int CPUInfo[4] = {-1};
__cpuid(CPUInfo, 0x80000001);
int nFeatureInfo = CPUInfo[3];
// check tsc feature bit (bit 27) in CPUInfo[3]
if (nFeatureInfo & 0x4000000)
tscFeature = true;
return tscFeature;
}
[DllImport("yourdll"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();

thanks for trying it on your machie :D

hmm seems running from the ide debugger is making it incredibly slow
I just made completly new files and new c# console project,
managed to get 12 too by using ctrl-f5,
but when I run with just f5 its 9000 !

is there a way to avoid the overhead the debugger being attached seems to
impose ?

I only use the timer in debug mode to do profileing anyway,
but I gues I cld detect if the debuger is not atacched and dump it to a file
....

thanks
Colin =^.^=
Jan 16 '08 #15
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:p0******************@newsfe5-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
.....
>
thanks for trying it on your machie :D

hmm seems running from the ide debugger is making it incredibly slow
I just made completly new files and new c# console project,
managed to get 12 too by using ctrl-f5,
but when I run with just f5 its 9000 !

is there a way to avoid the overhead the debugger being attached seems to
impose ?

I only use the timer in debug mode to do profileing anyway,
but I gues I cld detect if the debuger is not atacched and dump it to a
file ...
I also went back and tried the

Stopwatch.GetTimestamp()

function amd found this is also much faster with no debugger attached,
the overhead is only 3 ticks instead of 13,
incidently it goes upto 39 ticks if optimization is off too.

3 ticks is still 1us and the rdtsc is much much faster,
but I could of lived with 3 ticks, it would of added 100s
to the run time instead of over 600.

thanks again
Colin =^.^=
Jan 16 '08 #16
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:p0******************@newsfe5-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:Or**************@TK2MSFTNGP02.phx.gbl...
>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:CN***************@newsfe6-win.ntli.net...
>>>
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ec**************@TK2MSFTNGP06.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:4J***************@newsfe6-win.ntli.net...
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl.. .
>"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
>
>// C function that uses an intrinsic to retrieve the Processor's
>cycle counter register
>// Beware your CPU may not support this!
>// Compile from the command line: cl /EHsc /O2 <thisfile.cpp>
>#include <intrin.h>
>#pragma intrinsic(__rdtsc)
>extern "C"
>__declspec(dllexport) unsigned __int64 __stdcall rdtsc()
>{
> return __rdtsc();
>}
>>
>>
>// C#
> [DllImport("thisfile.dll"),SuppressUnmanagedCodeSec urity]
> static extern ulong rdtsc();
>>
> ...
> ulong cycles = rdtsc();
>>
>Willy.
>
thanks that worked, however it stil takes 9000 clock cycles
wich I think is roughly about the same time ...
is that PInvoke realy so time consuming ?
what does it have todo ?
>
thanks
Colin =^.^=
>
What exactly takes 9000 cycles?

int cnt=100000;
start = timer.Value;
for (int x = 0; x < cnt; x++)
{
ulong cycles = timer.value
}
timerOverhead = (timer.Value - start)/cnt;

timer.value -get{return rdtsc();}

timerOverhead comes out as about 9000;

Colin =^.^=

Following code:

[DllImport("rtc"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();

static void Main()

{
int cnt = 100000;
ulong cycles = 0;
ulong start = rdtsc();
for (int x = 0; x < cnt; x++)
{
cycles = rdtsc();
}
ulong timerOverhead = (cycles - start) / (ulong)cnt;
Console.WriteLine(timerOverhead);
}
}

comes out as 13 on my box.
That means 13 cycles per iteration, the total number of instructions
executed is ~35, from which ~30 is Interop call overhead, the remaining
in the increment and compare of cnt.
The called function itself is only two instructions:

10001000 0f31 rdtsc
10001002 c3 ret

Seems like your processor does not support the rdtsc feature, what CPU do
you run this on?

You can call following C function before you are trying to use the Time
Stamp Counter, this function returns true if TSC is supported else
false....

extern "C"
__declspec(dllexport) bool __stdcall tsc()
{
bool tscFeature = false;
int CPUInfo[4] = {-1};
__cpuid(CPUInfo, 0x80000001);
int nFeatureInfo = CPUInfo[3];
// check tsc feature bit (bit 27) in CPUInfo[3]
if (nFeatureInfo & 0x4000000)
tscFeature = true;
return tscFeature;
}
[DllImport("yourdll"),SuppressUnmanagedCodeSecurity]
static extern ulong rdtsc();


thanks for trying it on your machie :D

hmm seems running from the ide debugger is making it incredibly slow
I just made completly new files and new c# console project,
managed to get 12 too by using ctrl-f5,
but when I run with just f5 its 9000 !
Let me see, you are running a debug version in the IDE debugger and it
returns 9000, right? Well, It will be slower but not that slow, there is
clearly something wrong (this 9000 seems like a magic number, weird) .

is there a way to avoid the overhead the debugger being attached seems to
impose ?
No there is no posibility (other than running outside the debugger) to avoid
the overhead imposed by the fact you are 1) running non optimized code 2)
inside the IDE debugger.
I only use the timer in debug mode to do profileing anyway,
but I gues I cld detect if the debuger is not atacched and dump it to a
file
You should not instrument debug builds, but if you do, you should accept the
overhead imposed by the environment (debugger, non optimized JIT code).

Willy.
Jan 16 '08 #17

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ug**************@TK2MSFTNGP04.phx.gbl...
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:p0******************@newsfe5-win.ntli.net...

Let me see, you are running a debug version in the IDE debugger and it
returns 9000, right? Well, It will be slower but not that slow, there is
clearly something wrong (this 9000 seems like a magic number, weird) .
yeah strange, maybe some who knows the internals can shed some light on
this?
>
>is there a way to avoid the overhead the debugger being attached seems to
impose ?

No there is no posibility (other than running outside the debugger) to
avoid the overhead imposed by the fact you are 1) running non optimized
code 2) inside the IDE debugger.
without actually knowing where the code is spending its time during
those 9000 clock cyclces its hard to know whats cuasing it.
>I only use the timer in debug mode to do profileing anyway,
but I gues I cld detect if the debuger is not atacched and dump it to a
file

You should not instrument debug builds, but if you do, you should accept
the overhead imposed by the environment (debugger, non optimized JIT
code).
yes good point, however I wanted to make sure the realease build was clean.
Theres little difference in the run time beteween debug and release.

I was only looking for the most time consuming parts of the code
I didnt expect a 20 times increase in time just becuase of this timer
tho lol, the increase occurs in the release or debug version,
purly determined by the debugegr being attched or not.

thanks very much for your help finaly got a workable solution anyway :)
Colin =^.^=
Jan 16 '08 #18
"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:RR***************@newsfe4-win.ntli.net...
>
"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ug**************@TK2MSFTNGP04.phx.gbl...
>"colin" <co*********@ntworld.NOSPAM.comwrote in message
news:p0******************@newsfe5-win.ntli.net...

Let me see, you are running a debug version in the IDE debugger and it
returns 9000, right? Well, It will be slower but not that slow, there is
clearly something wrong (this 9000 seems like a magic number, weird) .

yeah strange, maybe some who knows the internals can shed some light on
this?
You need to check your project/debug attributes for this project.
>>
>>is there a way to avoid the overhead the debugger being attached seems
to impose ?

No there is no posibility (other than running outside the debugger) to
avoid the overhead imposed by the fact you are 1) running non optimized
code 2) inside the IDE debugger.

without actually knowing where the code is spending its time during
those 9000 clock cyclces its hard to know whats cuasing it.
>>I only use the timer in debug mode to do profileing anyway,
but I gues I cld detect if the debuger is not atacched and dump it to a
file

You should not instrument debug builds, but if you do, you should accept
the overhead imposed by the environment (debugger, non optimized JIT
code).

yes good point, however I wanted to make sure the realease build was
clean.
Theres little difference in the run time beteween debug and release.

I was only looking for the most time consuming parts of the code
I didnt expect a 20 times increase in time just becuase of this timer
tho lol, the increase occurs in the release or debug version,
purly determined by the debugegr being attched or not.

thanks very much for your help finaly got a workable solution anyway :)
Colin =^.^=


The debug version of the code I posted, takes 230-350 cycles (that is the
time needed to service that rdtsc function)when running in the managed
debugger (IDE VS2008) on my box.
Which is quite normal, because the CLR has to signal the transition into
unmanaged code to the debugging engine (and obviously to the attached
debugger).

Willy.

Jan 17 '08 #19

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:ex*************@TK2MSFTNGP04.phx.gbl...
....
You need to check your project/debug attributes for this project.
Ive looked in all the settings and theres hardly any at all that look like
they might make a difference,
espcially compared to the options for c++,
in the build options theres optimise code wich makes a slight difference.

there may well be options I havnt discovered yet,
I had to ask how to do 1 or 2 things so far only to be told you cant do
that,
then someone says its an option in a menu id never seen before by prerssing
"ctrl-d" + e
The debug version of the code I posted, takes 230-350 cycles (that is the
time needed to service that rdtsc function)when running in the managed
debugger (IDE VS2008) on my box.
Which is quite normal, because the CLR has to signal the transition into
unmanaged code to the debugging engine (and obviously to the attached
debugger).
so im seeing an increase of 30 times greater than what your seeing, maybe
it has to do with VS2005 express ?
im using .net2.0 although i cant see that making a difference,
I do notice however that there seems to be debug versions of the .net dlls

anyway Ive got my run time down from 3000 seconds to 20 seconds wich is
acceptable for now, there seems to be no single thing wich looks like it
could easily be optimised further, although some octal tree sectioning
<mighthelp.
but this is a complication il think of when ive done some other things,
and only if its realy necessary.

although the managed directx does seem to spend quite a while copying vertex
information,
wich is an array of structs,
I cant create an array of structs in advance becuse I dont know the size in
advance,
so I make a list wich it then has to convert to an array,
then copy it to the graphics device lol.
unfortunatly I cant make the struct array very big to start with becuase it
copies the whole array and complains if I tell it the size is smaller than
its length.

il probly try to work out the size in advance or something.

Colin =^.^=
Jan 17 '08 #20

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

15 posts views Thread by ham-z | last post: by
2 posts views Thread by Michael Evans | last post: by
19 posts views Thread by John | last post: by
11 posts views Thread by cnb | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.