Interesting results in speed comparison with C++

I wrote two trivial test programs that do a billion iterations of a
virtual method call, first in C# (Visual Studio 2005):

Thing t = new DerivedThing();
for (System.Int64 n = 0; n < 10000000000; n++)
t.method();

Then in C++ (using Visual C++ 2005):

Thing *t = new DerivedThing();
for (__int64 n = 0; n < 10000000000L; n++)
t->method();

.... with appropriate declarations in each case for Thing (abstract
base class) and DerivedThing (method increments a counter).
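
The declarations were not posted, so what follows is only a guess at what the C++ side might have looked like. The names `Thing`, `DerivedThing` and `method` come from the post; the `counter` member and the `run_virtual_calls` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Abstract base class, as described in the post.
struct Thing {
    virtual ~Thing() {}
    virtual void method() = 0;
};

// Hypothetical derived class: the post only says that method()
// increments a counter.
struct DerivedThing : Thing {
    std::int64_t counter;
    DerivedThing() : counter(0) {}
    void method() override { ++counter; }
};

// Hypothetical helper: performs `iterations` calls through a base-class
// pointer and returns the final counter, so the loop has an observable
// result.
std::int64_t run_virtual_calls(std::int64_t iterations) {
    assert(iterations >= 0);
    DerivedThing* d = new DerivedThing();
    Thing* t = d;
    for (std::int64_t n = 0; n < iterations; ++n)
        t->method();
    std::int64_t result = d->counter;
    delete t;
    return result;
}
```

Returning the counter matters for any timing based on this sketch: if nothing reads the result, an optimizer is free to drop the work being measured.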

C# took 47 seconds, C++ took 58 seconds. Both were release builds.

Now, given that the C++ implementation of virtual method dispatch is
very "close to the metal", this must mean that by the time the C#
version is running, there is no virtual method dispatch happening. The
CLR JIT must be inlining the method call, right? (I looked at the IL
and it's not being inlined by the C# compiler).

Then I tried moving the allocation of the DerivedThing inside the loop
- for the C++ program this also meant putting a 'delete t' after the
method call. Note that DerivedThing is a class in C#, not a struct,
and it holds some data.
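
A rough sketch of the C++ variant with the allocation moved inside the loop might look like this. The `data` array, the global `g_calls` counter and the `run_alloc_per_iteration` helper are assumptions; the post only says that the class holds some data and that a `delete t` follows the call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global counter, used so the loop has a visible result.
static std::int64_t g_calls = 0;

struct Thing {
    virtual ~Thing() {}
    virtual void method() = 0;
};

struct DerivedThing : Thing {
    int data[4];                 // "holds some data"; the size is assumed
    DerivedThing() : data() {}   // zero-initialise the array
    void method() override { ++data[0]; ++g_calls; }
};

// Allocates, uses and frees one object per iteration, then returns the
// number of calls that were made.
std::int64_t run_alloc_per_iteration(std::int64_t iterations) {
    assert(iterations >= 0);
    g_calls = 0;
    for (std::int64_t n = 0; n < iterations; ++n) {
        Thing* t = new DerivedThing();
        t->method();
        delete t;
    }
    return g_calls;
}
```

Each iteration here pays for a general-purpose heap allocation and a matching free, which is the cost the 175 second figure would mostly reflect. Newer optimizers are allowed to remove such new/delete pairs altogether, so the generated code is worth checking before trusting any timing.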

C# took 13 seconds, C++ took 175 seconds. I was a bit shocked by this,
so ran both a few more times, with identical results.

I thought maybe the JIT looks at what I'm doing with the object and
realises that I'm not holding onto a reference to it outside of the
loop scope, and so it doesn't need to be allocated on the garbage
collected heap in the same way as a long-lived object. Of course to
know that, it would have to look at the code of method(), because it
could be stashing the 'this' reference somewhere.

So I modified DerivedThing's method so it stored the 'this' reference
in a static member, but only on the fourteenth time (out of a billion!)
that it was called. Now the CLR has to allocate a garbage collected
object each time around the loop, right?

But this merely increased the running time to 16 seconds, still less
than 10% of the C++ result.

So maybe it inlines method(), then looks at what it does and
completely rewrites it to produce the same effect without allocating a
billion objects?

Are there any articles that will tell me what the CLR's garbage
collected heap (and/or the JIT) is actually doing in this case? How
can it be more than ten times faster than the non-garbage collected C++
heap?

Sep 7 '07 #1
On 7 Sep, 12:30, Daniel Earwicker <daniel.earwic...@gmail.com> wrote:
>I wrote two trivial test programs that do a billion iterations of a
>virtual method call, first in C# (Visual Studio 2005):
Oops, I should have said: the original programs did 10 billion
iterations, the modified versions (that do heap allocation) did 1
billion. Otherwise the results really would be crazy!
Sep 7 '07 #2
"Daniel Earwicker" <da**************@gmail.com> wrote in message
news:11**********************@w3g2000hsg.googlegroups.com...
>Now, given that the C++ implementation of virtual method dispatch is
>very "close to the metal", this must mean that by the time the C#
>version is running, there is no virtual method dispatch happening. The
>CLR JIT must be inlining the method call, right? (I looked at the IL
>and it's not being inlined by the C# compiler).

Both versions need to do a virtual dispatch, because the method called
depends on the type of the instance, not on the type of the variable. Both
compilers have an easy chance to optimize the dispatch away, because the
actual type is easily known and always the same. Maybe one of them is smarter.
We would have to look at the generated machine code to know this.

Christof
Sep 7 '07 #3
"Christof Nordiek" <cn@nospam.de> wrote in message
news:eK**************@TK2MSFTNGP04.phx.gbl...
>Both compilers have an easy chance to optimize the dispatch away, because
>the actual type is easily known and always the same.

This could perhaps be prevented by doing the assignment outside of the
method that contains the loop and passing the object in as a parameter of
the derived type.
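
One way to read this suggestion, sketched in C++: create the object somewhere the loop cannot see, and hand it to the loop through a parameter. In this sketch the parameter is a reference to the base class and the concrete type is picked at run time, which is a stronger variant than the one described above. `OtherThing`, `count`, `make_thing` and `run_loop` are hypothetical names introduced only for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct Thing {
    virtual ~Thing() {}
    virtual void method() = 0;
    virtual std::int64_t count() const = 0;
};

struct DerivedThing : Thing {
    std::int64_t counter = 0;
    void method() override { ++counter; }
    std::int64_t count() const override { return counter; }
};

// Second implementation, so the call site has more than one possible target.
struct OtherThing : Thing {
    std::int64_t counter = 0;
    void method() override { counter += 2; }
    std::int64_t count() const override { return counter; }
};

// The concrete type depends on a run-time value.
std::unique_ptr<Thing> make_thing(bool use_other) {
    if (use_other)
        return std::unique_ptr<Thing>(new OtherThing());
    return std::unique_ptr<Thing>(new DerivedThing());
}

// The loop only sees a Thing&; the allocation happens elsewhere.
std::int64_t run_loop(Thing& t, std::int64_t iterations) {
    assert(iterations >= 0);
    for (std::int64_t n = 0; n < iterations; ++n)
        t.method();
    return t.count();
}
```

If `run_loop` is inlined into a caller that also contains the allocation, the compiler may still work out the concrete type, so this makes devirtualization harder rather than impossible.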

Christof
Sep 7 '07 #4
These types of comparisons are very superficial IMO. It's difficult to gauge
the overall performance of two compilers for the same language let alone two
different languages. Performance is affected by many things and will usually
have more to do with the compiler itself than the (potential) efficacy of
the language (not even considering all the optimizations which may or may
not be turned on). You may find this interesting:

http://www.research.att.com/~bs/performanceTR.pdf

For starters you may want to analyze the benchmarks beginning in section
2.3.2 and beyond (based on appendix D).
Sep 7 '07 #5
Arne Vajhøj wrote:
>Weird.
>
>When I tried making some code similar to yours I got:
>
>csc 7.1 s
>csc /o+ 5.9 s
>cl /EHsc 4.5 s
>cl /EHsc /Ox 3.2 s

I think I found something.

My code was not completely identical with yours.

If I change my code to use a 64 bit loop counter instead
of a 32 bit loop counter (but same classes and same number
of iterations) I get:

csc 5.0 s
csc /o+ 5.0 s
cl /EHsc 7.5 s
cl /EHsc /Ox 3.8 s

Which is interesting.

It looks as if the C# compiler and .NET JIT like a 64 bit loop counter
and the native C++ compiler does not.

Arne
Sep 9 '07 #6
"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:46***********************@news.sunsite.dk...
>Arne Vajhøj wrote:
>>Weird.
>>
>>When I tried making some code similar to yours I got:
>>
>>csc 7.1 s
>>csc /o+ 5.9 s
>>cl /EHsc 4.5 s
>>cl /EHsc /Ox 3.2 s
>
>I think I found something.
>
>My code was not completely identical with yours.
>
>If I change my code to use a 64 bit loop counter instead
>of a 32 bit loop counter (but same classes and same number
>of iterations) I get:

How could you ever compile this with a count of 10000000000 using an Int32?
This value is outside the range of an Int32!

>csc 5.0 s
>csc /o+ 5.0 s
>cl /EHsc 7.5 s
>cl /EHsc /Ox 3.8 s
>
>Which is interesting.

And not possible IMHO; using a long instead of an int must result in slower
code on 32 bit Windows.
Are you sure you are using 10000000000 as count value?

Willy.

Sep 9 '07 #7
Willy Denoyette [MVP] wrote:
>"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
>news:46***********************@news.sunsite.dk...
>>Arne Vajhøj wrote:
>>>Weird.
>>>
>>>When I tried making some code similar to yours I got:
>>>
>>>csc 7.1 s
>>>csc /o+ 5.9 s
>>>cl /EHsc 4.5 s
>>>cl /EHsc /Ox 3.2 s
>>
>>I think I found something.
>>
>>My code was not completely identical with yours.
>>
>>If I change my code to use a 64 bit loop counter instead
>>of a 32 bit loop counter (but same classes and same number
>>of iterations) I get:
>
>How could you ever compile this with a count of 10000000000 using an
>Int32? This value is outside the range of an Int32!

I just used 1/10 of it.

>>csc 5.0 s
>>csc /o+ 5.0 s
>>cl /EHsc 7.5 s
>>cl /EHsc /Ox 3.8 s
>>
>>Which is interesting.
>
>And not possible IMHO; using a long instead of an int must result in
>slower code on 32 bit Windows.

32 bit Windows means that addresses are 32 bit. It does not say anything
about whether operations on 64 bit integers are fast or slow.

>Are you sure you are using 10000000000 as count value?

I am sure that I did not.

I don't think the long operations are faster than the int operations.

Something must have been running during that test.

The point is that C++ with a 64 bit counter without /Ox seems to be
inexplicably slow.

And my hypothesis is that the results the original poster was seeing
were due to the long counter without /Ox and not due to virtual method
optimizations.

Arne
Sep 9 '07 #8
"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:46***********************@news.sunsite.dk...
>Willy Denoyette [MVP] wrote:
>>"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
>>news:46***********************@news.sunsite.dk...
>>>Arne Vajhøj wrote:
>>>>Weird.
>>>>
>>>>When I tried making some code similar to yours I got:
>>>>
>>>>csc 7.1 s
>>>>csc /o+ 5.9 s
>>>>cl /EHsc 4.5 s
>>>>cl /EHsc /Ox 3.2 s
>>>
>>>I think I found something.
>>>
>>>My code was not completely identical with yours.
>>>
>>>If I change my code to use a 64 bit loop counter instead
>>>of a 32 bit loop counter (but same classes and same number
>>>of iterations) I get:
>>
>>How could you ever compile this with a count of 10000000000 using an
>>Int32? This value is outside the range of an Int32!
>
>I just used 1/10 of it.
>
>>>csc 5.0 s
>>>csc /o+ 5.0 s
>>>cl /EHsc 7.5 s
>>>cl /EHsc /Ox 3.8 s
>>>
>>>Which is interesting.
>>
>>And not possible IMHO; using a long instead of an int must result in
>>slower code on 32 bit Windows.
>
>32 bit Windows means that addresses are 32 bit. It does not say anything
>about whether operations on 64 bit integers are fast or slow.

This is not true; the code produced by the JIT when the counter type is a
long is not the same as the code produced when it's an int.

for(long n = 0; n < 1000000000; n++) !!! note the 1000000000 value!!!
...

// 32 bit version:
// Increments ecx two times until ecx overflow (that is 2 * 2^32).
// increment ecx for the remaining of 10000000000 - 2(2^32) (that is the
// value 540BE400h)

...
001f00d8 83c101 add ecx,1
001f00db 83d600 adc esi,0
001f00de 85f6 test esi,esi
001f00e0 7f0a jg 001f00ec
001f00e2 7cf4 jl 001f00d8
001f00e4 81f900ca9a3b cmp ecx,3B9ACA00h
001f00ea 72ec jb 001f00d8
...

while a 32 bit increment like this:
for(int n = 0; n < 1000000000; n++) !!! note the 1000000000 value!!!
produces this:

...
002100d6 83c001 add eax,1
002100d9 3d00ca9a3b cmp eax,3B9ACA00h
002100de 7cf6 jl 002100d6
...
Quite different, and considerably faster (~2X) to execute, isn't it?
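
To make the longer listing easier to follow, here is a small C++ sketch of what the add/adc/compare sequence does: a 64 bit counter kept as two 32 bit halves, with the carry propagated by hand. The `Counter64` struct and both helper names are hypothetical; they only mirror the register roles in the listing (ecx as the low half, esi as the high half).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64 bit counter held in two 32 bit registers.
struct Counter64 {
    std::uint32_t lo;  // plays the role of ecx
    std::int32_t  hi;  // plays the role of esi
};

// Mirrors "add ecx,1" followed by "adc esi,0".
inline void increment(Counter64& c) {
    std::uint32_t old_lo = c.lo;
    c.lo = old_lo + 1;          // wraps around to 0 on overflow
    if (c.lo < old_lo)          // a wrap means the add produced a carry
        c.hi += 1;
}

// Mirrors the compare sequence: signed compare on the high half first,
// then an unsigned compare on the low half only if the high halves match.
inline bool less_than(const Counter64& c,
                      std::int32_t limit_hi, std::uint32_t limit_lo) {
    assert(limit_hi >= 0);
    if (c.hi != limit_hi)
        return c.hi < limit_hi;
    return c.lo < limit_lo;
}
```

Every iteration therefore needs two arithmetic instructions and up to two compare-and-branch steps, against one of each for the 32 bit counter.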

>Are you sure you are using 10000000000 as count value?

>I am sure that I did not.
>
>I don't think the long operations are faster than the int operations.

No, long operations are in general slower than int operations on 32 bit
platforms, see above!

>Something must have been running during that test.
>
>The point is that C++ with a 64 bit counter without /Ox seems to be
>inexplicably slow.
>
>And my hypothesis is that the results the original poster was seeing
>were due to the long counter without /Ox and not due to virtual method
>optimizations.

This is guesswork as long as the OP doesn't post the whole code and the
compiler flags used to compile the programs.
The C# compiler removes the call to the method when compiled with /o+,
unless it returns a value that is used outside the loop; all you are
measuring is the time needed to increment a value held in a register
10000000000 times.
The 32 bit C++ compiler (using the /Ox flag) does exactly the same: it
removes the call if the return value is not used outside the loop, and the
code produced is exactly the same as what the JIT32 produces.
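
As an illustration of the point about unused results, here is one possible way to keep the per-iteration work from being optimized away in a C++ micro-benchmark. It is only a sketch: `count_up` is a hypothetical name, and the `volatile` counter is an assumption of this example, not something either test program is known to have used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical benchmark body. The volatile qualifier forces a real load
// and store on every iteration, so the optimizer cannot fold the loop
// into a single assignment or drop it.
std::int64_t count_up(std::int64_t iterations) {
    assert(iterations >= 0);
    volatile std::int64_t counter = 0;
    for (std::int64_t n = 0; n < iterations; ++n)
        counter = counter + 1;
    return counter;  // the result is also handed back to the caller
}
```

The trade-off is that the volatile accesses go through memory and become part of what is timed, so numbers from such a loop are not comparable with a loop whose counter lives in a register.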

Willy.

Sep 9 '07 #9
"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Well, I see I started to explain something above like this:

// 32 bit version:
// Increments ecx two times until ecx overflow (that is 2 * 2^32).
// increment ecx for the remaining of 10000000000 - 2(2^32) (that is the
// value 540BE400h)

... but failed to add the code :-(
Here is the code produced for a count of 10000000000.

007a00ea 83c101 add ecx,1
007a00ed 83d600 adc esi,0
007a00f0 83fe02 cmp esi,2
007a00f3 7f0a jg 007a00ff
007a00f5 7cf3 jl 007a00ea
007a00f7 81f900e40b54 cmp ecx,540BE400h
007a00fd 72eb jb 007a00ea

You see, almost the same code "sequence" is used; this is because it's
determined by the type of the counter, and not by its value.
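
The two constants in the listings can be checked by splitting each loop limit into its 32 bit halves. The sketch below does that in C++; `Halves` and `split` are hypothetical names used only for this check.

```cpp
#include <cstdint>

// Hypothetical helper type: the two 32 bit halves of a 64 bit value.
struct Halves {
    std::uint32_t hi;  // compared against esi in the listing
    std::uint32_t lo;  // compared against ecx in the listing
};

// Splits a 64 bit limit into the halves a 32 bit compare sequence uses.
inline Halves split(std::uint64_t value) {
    Halves h;
    h.hi = static_cast<std::uint32_t>(value >> 32);
    h.lo = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    return h;
}
```

For 10000000000 this gives a high half of 2 and a low half of 540BE400h, matching `cmp esi,2` and `cmp ecx,540BE400h`. For 1000000000 the high half is 0 and the low half is 3B9ACA00h, which is why the earlier listing can use `test esi,esi` for the high half.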

Willy.

Sep 9 '07 #10
