
Interesting results in speed comparison with C++

I wrote two trivial test programs that do a billion iterations of a
virtual method call, first in C# (Visual Studio 2005):

Thing t = new DerivedThing();
for (System.Int64 n = 0; n < 10000000000; n++)
t.method();

Then in C++ (using Visual C++ 2005):

Thing *t = new DerivedThing();
for (__int64 n = 0; n < 10000000000L; n++)
t->method();

... with appropriate declarations in each case for Thing (abstract
base class) and DerivedThing (method increments a counter).

C# took 47 seconds, C++ took 58 seconds. Both were release builds.
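
The declarations are not shown in the post, so the following is only a minimal sketch of what the C++ side of such a test might look like. The class bodies, the function name `time_virtual_calls`, and the use of `std::chrono` for timing are all assumptions made for illustration; they are not the original poster's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for the declarations the post mentions but does not show.
struct Thing {
    virtual ~Thing() = default;
    virtual void method() = 0;
};

struct DerivedThing : Thing {
    std::int64_t counter = 0;
    void method() override { ++counter; }  // "method increments a counter"
};

// Makes `iterations` calls through a base-class pointer and returns the
// elapsed wall-clock time in milliseconds. The final counter value is
// written to `count_out` so that the loop has an observable result.
long long time_virtual_calls(std::int64_t iterations, std::int64_t& count_out) {
    assert(iterations >= 0);
    DerivedThing derived;
    Thing* t = &derived;

    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t n = 0; n < iterations; ++n)
        t->method();
    const auto stop = std::chrono::steady_clock::now();

    count_out = derived.counter;
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Because the concrete type is visible inside the function, an optimizing compiler is free to devirtualize or even collapse the loop, so a number obtained this way says little about dispatch cost on its own.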

Now, given that the C++ implementation of virtual method dispatch is
very "close to the metal", this must mean that by the time the C#
version is running, there is no virtual method dispatch happening. The
CLR JIT must be inlining the method call, right? (I looked at the IL
and it's not being inlined by the C# compiler).

Then I tried moving the allocation of the DerivedThing inside the loop
- for the C++ program this also meant putting a 'delete t' after the
method call. Note that DerivedThing is a class in C#, not a struct,
and it holds some data.

C# took 13 seconds, C++ took 175 seconds. I was a bit shocked by this,
so ran both a few more times, with identical results.

I thought maybe the JIT looks at what I'm doing with the object and
realises that I'm not holding onto a reference to it outside of the
loop scope, and so it doesn't need to be allocated on the garbage
collected heap in the same way as a long-lived object. Of course to
know that, it would have to look at the code of method(), because it
could be stashing the 'this' reference somewhere.

So I modified DerivedThing's method so it stored the 'this' reference
in a static member, but only on the fourteenth time (out of a billion!)
that it was called. Now the CLR has to allocate a garbage collected
object each time around the loop, right?

But this merely increased the running time to 16 seconds, still less
than 10% of the C++ result.

So maybe it inlines method(), then looks at what it does and
completely rewrites it to produce the same effect without allocating a
billion objects?

Are there any articles that will tell me what the CLR's garbage
collected heap (and/or the JIT) is actually doing in this case? How
can it be more than ten times faster than the non-garbage collected C++
heap?

Sep 7 '07 #1
12 Replies


On 7 Sep, 12:30, Daniel Earwicker <daniel.earwic...@gmail.com> wrote:
> I wrote two trivial test programs that do a billion iterations of a
> virtual method call, first in C# (Visual Studio 2005):

Oops, I should have said: the original programs did 10 billion
iterations, the modified versions (that do heap allocation) did 1
billion. Otherwise the results really would be crazy!
Sep 7 '07 #2

"Daniel Earwicker" <da**************@gmail.com> wrote in message
news:11**********************@w3g2000hsg.googlegroups.com...
> I wrote two trivial test programs that do a billion iterations of a
> virtual method call, first in C# (Visual Studio 2005):
>
> Thing t = new DerivedThing();
> for (System.Int64 n = 0; n < 10000000000; n++)
> t.method();
>
> Then in C++ (using Visual C++ 2005):
>
> Thing *t = new DerivedThing();
> for (__int64 n = 0; n < 10000000000L; n++)
> t->method();
>
> ... with appropriate declarations in each case for Thing (abstract
> base class) and DerivedThing (method increments a counter).
>
> C# took 47 seconds, C++ took 58 seconds. Both were release builds.
>
> Now, given that the C++ implementation of virtual method dispatch is
> very "close to the metal", this must mean that by the time the C#
> version is running, there is no virtual method dispatch happening. The
> CLR JIT must be inlining the method call, right? (I looked at the IL
> and it's not being inlined by the C# compiler).

Both versions need to do a virtual dispatch, because the method called
depends on the type of the instance, not on the type of the variable. Both
compilers have an easy chance to optimize the dispatch away, because the
actual type is easily known and always the same. Maybe one of them is smarter.
We would have to look at the runtime code to know this.

Christof
Sep 7 '07 #3

"Christof Nordiek" <cn@nospam.de> wrote in message
news:eK**************@TK2MSFTNGP04.phx.gbl...
> Both compilers have an easy chance to optimize the dispatch away, because
> the actual type is easily known and always the same.

This could perhaps be prevented by putting the assignment outside of the
method and using a parameter of the derived type.
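
One possible reading of this suggestion, sketched in C++: the object is created by the caller and the timed loop lives in a separate function that only receives a reference. The sketch passes the object as the base type, which is an assumption about what was meant, and the names `call_repeatedly`, `Thing` and `DerivedThing` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Thing {
    virtual ~Thing() = default;
    virtual void method() = 0;
};

struct DerivedThing : Thing {
    std::int64_t counter = 0;
    void method() override { ++counter; }
};

// The loop only sees a Thing&, so inside this function the concrete type
// of the object is not known from the code alone.
void call_repeatedly(Thing& t, std::int64_t iterations) {
    assert(iterations >= 0);
    for (std::int64_t n = 0; n < iterations; ++n)
        t.method();
}
```

A compiler may still inline `call_repeatedly` into its caller and see the concrete type there; putting the function in a separate translation unit makes that less likely but does not guarantee anything.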

Christof
Sep 7 '07 #4

These types of comparisons are very superficial IMO. It's difficult to gauge
the overall performance of two compilers for the same language let alone two
different languages. Performance is affected by many things and will usually
have more to do with the compiler itself than the (potential) efficacy of
the language (not even considering all the optimizations which may or may
not be turned on). You may find this interesting:

http://www.research.att.com/~bs/performanceTR.pdf

For starters you may want to analyze the benchmarks beginning in section
2.3.2 and beyond (based on appendix D).
Sep 7 '07 #5

Arne Vajhøj wrote:
> Weird.
>
> When I tried making some code similar to yours I got:
>
> csc          7.1 s
> csc /o+      5.9 s
> cl /EHsc     4.5 s
> cl /EHsc /Ox 3.2 s

I think I found something.

My code was not completely identical with yours.

If I change my code to use a 64 bit loop counter instead
of a 32 bit loop counter (but same classes and same number
of iterations) I get:

csc          5.0 s
csc /o+      5.0 s
cl /EHsc     7.5 s
cl /EHsc /Ox 3.8 s

Which is interesting.

It looks as if the C# compiler & .NET JIT like a 64 bit loop counter
and the C++ native compiler does not.

Arne
Sep 9 '07 #6

"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:46***********************@news.sunsite.dk...
> Arne Vajhøj wrote:
>> Weird.
>>
>> When I tried making some code similar to yours I got:
>>
>> csc          7.1 s
>> csc /o+      5.9 s
>> cl /EHsc     4.5 s
>> cl /EHsc /Ox 3.2 s
>
> I think I found something.
>
> My code was not completely identical with yours.
>
> If I change my code to use a 64 bit loop counter instead
> of a 32 bit loop counter (but same classes and same number
> of iterations) I get:

How could you ever compile this with a count of 10000000000 using an Int32?
This value is outside the range of an Int32!

> csc          5.0 s
> csc /o+      5.0 s
> cl /EHsc     7.5 s
> cl /EHsc /Ox 3.8 s
>
> Which is interesting.

And not possible IMHO; using a long instead of an int must result in slower
code on 32 bit Windows.
Are you sure you are using 10000000000 as the count value?

Willy.

Sep 9 '07 #7

Willy Denoyette [MVP] wrote:
> "Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
> news:46***********************@news.sunsite.dk...
>> Arne Vajhøj wrote:
>>> Weird.
>>>
>>> When I tried making some code similar to yours I got:
>>>
>>> csc          7.1 s
>>> csc /o+      5.9 s
>>> cl /EHsc     4.5 s
>>> cl /EHsc /Ox 3.2 s
>>
>> I think I found something.
>>
>> My code was not completely identical with yours.
>>
>> If I change my code to use a 64 bit loop counter instead
>> of a 32 bit loop counter (but same classes and same number
>> of iterations) I get:
>
> How could you ever compile this with a count of 10000000000 using an
> Int32? This value is outside the range of an Int32!

I just used 1/10 of it.

>> csc          5.0 s
>> csc /o+      5.0 s
>> cl /EHsc     7.5 s
>> cl /EHsc /Ox 3.8 s
>>
>> Which is interesting.
>
> And not possible IMHO; using a long instead of an int must result in
> slower code on 32 bit Windows.

32 bit Windows means that addresses are 32 bit. It does not say anything
about whether operations on 64 bit integers are fast or slow.

> Are you sure you are using 10000000000 as the count value?

I am sure that I did not.

I don't think the long operations are faster than the int operations.

Something must have been running during that test.

The point is that C++ with a 64 bit counter without /Ox seems to be
inexplicably slow.

And my hypothesis is that the results the original poster was seeing
were due to the long counter without /Ox and not due to virtual method
optimizations.

Arne
Sep 9 '07 #8

"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:46***********************@news.sunsite.dk...
> Willy Denoyette [MVP] wrote:
>> And not possible IMHO, using a long instead of an int must result in
>> slower code on 32 bit windows.
>
> 32 bit windows mean that addresses are 32 bit. It does not say anything
> about whether operations on 64 bit integers are fast or slow.

This is not true; the code produced by the JIT when the counter type is a
long is not the same as the code produced when it's an int.

for(long n = 0; n < 1000000000; n++) // !!! note the 1000000000 value !!!
...

// 32 bit version:
// Increments ecx through two full overflows (that is 2 * 2^32),
// then increments ecx for the remaining 10000000000 - 2 * 2^32
// (that is the value 540BE400h)

...
001f00d8 83c101 add ecx,1
001f00db 83d600 adc esi,0
001f00de 85f6 test esi,esi
001f00e0 7f0a jg 001f00ec
001f00e2 7cf4 jl 001f00d8
001f00e4 81f900ca9a3b cmp ecx,3B9ACA00h
001f00ea 72ec jb 001f00d8
...

while a 32 bit increment like this:
for(int n = 0; n < 1000000000; n++) // !!! note the 1000000000 value !!!
produces this:

...
002100d6 83c001 add eax,1
002100d9 3d00ca9a3b cmp eax,3B9ACA00h
002100de 7cf6 jl 002100d6
...
Quite different, and considerably faster (~2X) to execute.

>> Are you sure you are using 10000000000 as count value?
>
> I am sure that I did not.
>
> I don't think the long operations are faster than the int operations.

No, long operations are in general slower than int operations on 32 bit
platforms, see above!

> Something must have been running during that test.
>
> The point is that C++ with 64 bit counter without /Ox seems to be
> unexplainable slow.
>
> And my hypothesis is that the results the original poster was seeing
> was due to the long counter without /Ox and not due to virtual methods
> optimizations.

This is guesswork as long as the OP doesn't post the whole code and the
compiler flags used to compile the programs.
The C# compiler removes the call to the method when compiled with /o+,
unless it returns a value that is used outside the loop; all you are
measuring is the time needed to increment a register value 10000000000
times.
The 32 bit C++ compiler (using the /Ox flag) does exactly the same: it
removes the call if the return value is not used outside the loop, and the
code produced is exactly the same as what the JIT32 produces.
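
A minimal sketch of the usual countermeasure, assuming a hypothetical `Counter` class whose `Method` returns a value: the return values are accumulated and handed back to the caller, so the calls feed a result that is used outside the loop. The names `Counter` and `sum_of_calls` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class: Method returns the old value of i and then increments it.
struct Counter {
    int i = 0;
    virtual ~Counter() = default;
    virtual int Method() { return i++; }
};

// Accumulates every return value and returns the total, so the result of
// each call contributes to a value that is used after the loop.
std::int64_t sum_of_calls(Counter& c, std::int64_t iterations) {
    assert(iterations >= 0);
    std::int64_t sum = 0;
    for (std::int64_t n = 0; n < iterations; ++n)
        sum += c.Method();
    return sum;
}
```

Printing the returned sum after timing keeps the value live; whether a particular compiler still simplifies the loop is something only the generated code can confirm.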

Willy.

Sep 9 '07 #9

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
> for(long n = 0; n < 1000000000; n++) !!! note the 1000000000 value!!!
> ...
>
> // 32 bit version:
> // Increments ecx two times until ecx overflow (that is 2 * 2^32).
> // increment ecx for the remaining of 10000000000 - 2 * 2^32 (that is the
> // value 540BE400h)

Well, I see that I started to explain something above like this:

// 32 bit version:
// Increments ecx through two full overflows (that is 2 * 2^32),
// then increments ecx for the remaining 10000000000 - 2 * 2^32
// (that is the value 540BE400h)

... but failed to add the code :-(
Here is the code produced for a count of 10000000000.

007a00ea 83c101 add ecx,1
007a00ed 83d600 adc esi,0
007a00f0 83fe02 cmp esi,2
007a00f3 7f0a jg 007a00ff
007a00f5 7cf3 jl 007a00ea
007a00f7 81f900e40b54 cmp ecx,540BE400h
007a00fd 72eb jb 007a00ea

You see that almost the same code "sequence" is used; this is because it is
determined by the type of the counter, and not by its value.
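
The add/adc/compare sequence above can be mimicked in C++ with two 32 bit halves. This is only an illustrative model of what the listing does, not code from the thread; `Counter64`, `increment` and `less_than` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Counter64 {
    std::uint32_t lo;  // low 32 bits  (ecx in the listing)
    std::int32_t  hi;  // high 32 bits (esi in the listing)
};

// Models "add ecx,1" followed by "adc esi,0".
void increment(Counter64& c) {
    assert(!(c.lo == 0xFFFFFFFFu && c.hi == INT32_MAX));  // high half must not overflow
    const std::uint32_t before = c.lo;
    c.lo = before + 1u;         // wraps to 0 when the low half overflows
    if (c.lo < before) ++c.hi;  // carry into the high half
}

// Models the compare sequence: a signed compare of the high halves (jg/jl),
// then an unsigned compare of the low halves (jb).
bool less_than(const Counter64& c, std::int32_t limit_hi, std::uint32_t limit_lo) {
    if (c.hi != limit_hi) return c.hi < limit_hi;
    return c.lo < limit_lo;
}

// The limit 10000000000 splits into high half 2 and low half 540BE400h.
constexpr std::uint64_t kLimit = 10000000000ULL;
static_assert((kLimit >> 32) == 2, "high half of the limit is 2");
static_assert((kLimit & 0xFFFFFFFFULL) == 0x540BE400ULL, "low half is 540BE400h");
static_assert(kLimit == 2 * (1ULL << 32) + 0x540BE400ULL, "2 * 2^32 plus the remainder");
```

The static assertions check the arithmetic behind the constants 2 and 540BE400h that appear in the disassembly.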

Willy.

Sep 9 '07 #10

Willy Denoyette [MVP] wrote:
> "Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
> news:46***********************@news.sunsite.dk...
>> Willy Denoyette [MVP] wrote:
>>> And not possible IMHO, using a long instead of an int must result in
>>> slower code on 32 bit windows.
>>
>> 32 bit windows mean that addresses are 32 bit. It does not say anything
>> about whether operations on 64 bit integers are fast or slow.
>
> This is not true

Yes. It is true.

> ,the code produced by the JIT when the counter type is a
> long is not the same as the code produced when it's an int.
>
> for(long n = 0; n < 1000000000; n++) !!! note the 1000000000 value!!!
> ...
> while a 32 bit increment like this:
> for(int n = 0; n < 1000000000; n++) !!! note the 1000000000 value!!!
> produces this:
> ...
> Quite different and quite faster (~2X) to execute ?

Very interesting.

But completely irrelevant.

We are discussing whether the speed of 64 bit operations depends on
whether Windows is 32 or 64 bit - not whether 32 bit operations are
faster than 64 bit operations.

>> The point is that C++ with 64 bit counter without /Ox seems to be
>> unexplainable slow.
>>
>> And my hypothesis is that the results the original poster was seeing
>> was due to the long counter without /Ox and not due to virtual methods
>> optimizations.
>
> This is guesswork as long as the OP doesn't post the whole code and the
> compiler flags used to compile the programs.
> The C# compiler removes the call to the method when compiled with /o+,
> unless it returns a value that is used outside the loop; all you are
> measuring is the time needed to increment a register value 10000000000
> times.
> The 32 bit C++ compiler (using the /Ox flag) does exactly the same: it
> removes the call if the return value is not used outside the loop, and the
> code produced is exactly the same as what the JIT32 produces.

It is guesswork, but I think it is the best guess so far.

Arne
Sep 9 '07 #11

"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:46***********************@news.sunsite.dk...
> Willy Denoyette [MVP] wrote:
>> "Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
>>> 32 bit windows mean that addresses are 32 bit. It does not say anything
>>> about whether operations on 64 bit integers are fast or slow.
>>
>> This is not true
>
> Yes. It is true.

Yes, it's true. Why do you ignore the code I posted for both 32 bit and 64 bit
integers? The JIT produces different and less optimal code for 64 bit types.
32 bit Windows has no 64 bit registers to use for the counter; it has to use a
32 bit register and some extra instructions to account for the overflow when
it has to deal with long counter types.

>> Quite different and quite faster (~2X) to execute ?
>
> Very interesting.
>
> But completely irrelevant.
> We are discussing whether the speed of 64 bit operations depends on
> whether Windows is 32 or 64 bit - not whether 32 bit operations are
> faster than 64 bit operations.

Where did you ever mention 64 bit Windows in this thread?
You were comparing 32 bit with 64 bit counter types, weren't you?
I clearly showed you that Int64 operations (loop counters) are slower than
Int32 on 32 bit Windows.

>> This is guesswork as long as the OP doesn't post the whole code and the
>> compiler flags used to compile the programs.
>
> It is guesswork, but I think it is the best guess so far.

No, it's not; you did not prove anything. Just post your "complete code",
else this discussion makes no sense.
Willy.
Sep 10 '07 #12

"Daniel Earwicker" <da**************@gmail.com> wrote in message
news:11**********************@w3g2000hsg.googlegroups.com...
> Are there any articles that will tell me what the CLR's garbage
> collected heap (and/or the JIT) is actually doing in this case? How
> can it be more than ten times faster than the non-garbage collected C++
> heap?

Running this on Windows 32 bit (Win2K3 SP2):

// C# code
using System;
using System.Diagnostics;
namespace Willys
{
    abstract class Thing
    {
        int i;
        internal virtual int Method()
        {
            return i++;
        }
    }
    sealed class DerivedThing : Thing
    {}
    class Program
    {
        static long oneBillion = 1000000000;
        static void Main()
        {
            Test1();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Test2();
        }
        static void Test1()
        {
            DerivedThing dt = new DerivedThing();
            Stopwatch watch = Stopwatch.StartNew();
            for (long n = 0; n < oneBillion; n++)
                dt.Method();
            Console.WriteLine ("Test1: {0} msecs.", watch.ElapsedMilliseconds);
        }
        static void Test2()
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (long n = 0; n < oneBillion; n++)
            {
                DerivedThing dt = new DerivedThing();
                dt.Method();
            }
            Console.WriteLine ("Test2: {0} msecs.", watch.ElapsedMilliseconds);
        }
    }
}

compiled with /o+, results in:
Test1: 3620 msecs.
Test2: 11325 msecs.

While running this:

// CPP code
#include <windows.h>
#include <cstdio>

class B
{
protected:
    int i;
    virtual int Method() = 0; // = 0 means pure virtual function
};

class C : B
{
public:
    virtual int Method() { return i++; }
};

static __int64 oneBillion = 1000000000;

void Test1()
{
    C *c = new C(); // value-initialized so that i starts at 0
    LARGE_INTEGER start, stop;
    QueryPerformanceCounter(&start);
    for (__int64 n = 0; n < oneBillion; n++)
        c->Method();
    QueryPerformanceCounter(&stop);
    // NOTE: dividing ticks by 10000 gives milliseconds only when the counter
    // runs at 10 MHz; QueryPerformanceFrequency reports the real rate.
    printf_s("Test1: %I64d msecs.\n", (stop.QuadPart - start.QuadPart) / 10000);
    delete c; // released after timing, so the measurement is unaffected
}

void Test2()
{
    LARGE_INTEGER start, stop;
    QueryPerformanceCounter(&start);
    for (__int64 n = 0; n < oneBillion; n++)
    {
        C *c = new C();
        c->Method();
        delete c;
    }
    QueryPerformanceCounter(&stop);
    printf_s("Test2: %I64d msecs.\n", (stop.QuadPart - start.QuadPart) / 10000);
}

int main()
{
    Test1();
    Test2();
}

compiled with /O2 or Ox, results in:

Test1: 1135 msecs.
Test2: 157780 msecs.
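
A note on the timing arithmetic in the C++ listing: dividing the tick difference by 10000 is a millisecond conversion only for a 10 MHz counter. A sketch of a conversion that takes the tick rate as an input is shown below; on Windows the rate would come from `QueryPerformanceFrequency`, and the helper name `ticks_to_ms` is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick difference into whole milliseconds, given the counter's
// frequency in ticks per second.
std::int64_t ticks_to_ms(std::int64_t ticks, std::int64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    return ticks * 1000 / ticks_per_second;
}
```

With a frequency of 10000000 this reduces to the `/ 10000` used in the listing; with any other frequency the two differ, which matters when comparing the C++ figures against the `Stopwatch` figures from the C# program.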

You see that C++ is 3X faster than C# for Test1, the reasons are:
1. some better optimized Method (5 instructions for C# vs. 4 for C++)
2. faster virtual dispatch for C++

Here are the disassemblies for both C# and C++ methods and call sites
(partial)
Method cs:
001e01f0 8b5104 mov edx,dword ptr [ecx+4]
001e01f3 8d4201 lea eax,[edx+1]
001e01f6 894104 mov dword ptr [ecx+4],eax
001e01f9 8bc2 mov eax,edx
001e01fb c3 ret

Call site cs:
....
001e0164 8b4de8 mov ecx,dword ptr [ebp-18h]
001e0167 8b01 mov eax,dword ptr [ecx]
001e0169 ff5038 call dword ptr [eax+38h]
001e016c 83c601 add esi,1
001e016f 83d700 adc edi,0
001e0172 3b3d20301600 cmp edi,dword ptr ds:[163020h]
001e0178 7f0a jg 001e0184
001e017a 7ce8 jl 001e0164
001e017c 3b351c301600 cmp esi,dword ptr ds:[16301Ch]
001e0182 72e0 jb 001e0164
....
Method cpp:
00401000 8b4104 mov eax,dword ptr [ecx+4]
00401003 8d5001 lea edx,[eax+1]
00401006 895104 mov dword ptr [ecx+4],edx
00401009 c3 ret

Call site cpp:
....
00401060 8b17 mov edx,dword ptr [edi]
00401062 8b02 mov eax,dword ptr [edx]
00401064 8bcf mov ecx,edi
00401066 ffd0 call eax
00401068 83c301 add ebx,1
0040106b 83d600 adc esi,0
0040106e 3b3504d04000 cmp esi,dword ptr [image00400000+0xd004 (0040d004)]
00401074 7cea jl image00400000+0x1060 (00401060)
00401076 7f08 jg image00400000+0x1080 (00401080)
00401078 3b1d00d04000 cmp ebx,dword ptr [image00400000+0xd000 (0040d000)]
0040107e 72e0 jb image00400000+0x1060 (00401060)
....

On the other hand, Test2 is much faster in C#; this is because of the GC,
which can delay the collection of the garbage until after thousands of
instantiations. These collections are extremely fast; the net result is that
Test2 is ~14X faster in C# despite the slightly slower code and dispatch.
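
To illustrate why deferring reclamation can be cheap, here is a minimal C++ sketch of a pointer-bump arena: each allocation just advances an offset, and the whole buffer is reclaimed at once when it fills up. This is only an analogy for the idea described above, not a description of how the CLR heap is implemented, and `BumpArena`, `Small` and `churn` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

class BumpArena {
public:
    explicit BumpArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Hands out `size` bytes aligned to `align`, or nullptr when the arena is full.
    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned > buffer_.size() || size > buffer_.size() - aligned)
            return nullptr;
        offset_ = aligned + size;
        return buffer_.data() + aligned;
    }

    // Reclaims everything at once; no per-object bookkeeping is needed.
    void reset() { offset_ = 0; }
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};

struct Small {
    int i;
    int Method() { return i++; }
};

// Creates one short-lived object per iteration and returns how many times the
// arena had to be reset, or -1 if it cannot hold even a single object.
std::int64_t churn(BumpArena& arena, std::int64_t iterations) {
    std::int64_t resets = 0;
    for (std::int64_t n = 0; n < iterations; ++n) {
        void* p = arena.allocate(sizeof(Small), alignof(Small));
        if (p == nullptr) {
            arena.reset();
            ++resets;
            p = arena.allocate(sizeof(Small), alignof(Small));
            if (p == nullptr) return -1;
        }
        Small* s = new (p) Small{0};  // trivially destructible, so no delete
        s->Method();
    }
    return resets;
}
```

Compared with a `new`/`delete` pair per iteration, the per-object cost here is an addition and a bounds check, with the reclamation cost paid once per full buffer.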

Willy.
Sep 10 '07 #13
