Why is C# 450% slower than C++ on nested loops ??

http://www.tommti-systems.de/go.html...enchmarks.html
The above link shows that C# is 450% slower on something as simple as a nested
loop. Is this because .NET is inherently slower or does the C# compiler merely
produce code that is not as well optimized as the C++ compiler?
Dec 28 '05


"Christoph Nahr" <ch************@kynosarges.de> wrote in message
news:vd********************************@4ax.com...
On Thu, 29 Dec 2005 22:31:53 +0100, "Willy Denoyette [MVP]"
<wi*************@telenet.be> wrote:
Sorry, above is not completely true, the C# IL may differ as I only
compared
the IL of both: C++ /clr and C++ /clr:safe, and although they are identical
(only for the loop part), the results are quite different (100%).
That means that even with identical IL the perf. level may differ, and
different IL may produce equal results. The secret (especially in this
case!) lies in the metadata which 'tells' the JIT that one is verifiable
code while the other is not, so it's allowed to generate different machine
code for the same IL.


That's an interesting idea but it must be something else (see below).

The unsafe method modifier allows the C# compiler to generate
unverifiable code, too, but a quick test showed absolutely no
difference between two benchmark runs -- one unchanged, one with the
"unsafe" modifier on the nested loop test. So apparently the machine
code generated for either flag is identical.
Marcus's code posted in this thread is wrong; he changed:

x+=a+b+c+d+e+f;
into:
x++;
Please re-read his post and the follow-up.


Right, I saw his post now.
PS. find the IL for both in attachment.


You've lost me again. I've compared both IL sequences, and they are
quite different! The unverifiable version has a different calling
convention, a smaller local stack, calls to C++ library methods, and
overall a smaller IL instruction count.

My diff program can't find the identical loop part that you claim
exists because the two IL sequences are so different. Evidently the
/clr:safe switch causes the C++ compiler to generate different IL, not
just to set a different JIT flag.
--
http://www.kynosarges.de


Chris,

I guess you think both IL functions come from the same C++ code, which
isn't the case. The first comes from the original benchmark (modified to
take a __int64 for x); the second is the same function modified such that it
can be compiled in safe mode. For completeness I included both functions [1].

I never said the IL for the whole function was the same; I said the IL is
identical for the "loop part" only. Note that I did include the loop start and
end address labels in the attached file; just compare what's in between these
and you will see that the IL is the same (but don't use diff, the address
labels and local variable labels are different).
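
If you do want a mechanical diff, a quick normaliser along these lines does the
job (just a sketch; the tool name, file handling and patterns are mine). It strips
the IL_xxxx offsets and collapses the local slot names so a plain diff only sees
opcodes and operands:

using System.IO;
using System.Text.RegularExpressions;

class NormalizeIl
{
    static void Main(string[] args)
    {
        // usage: NormalizeIl clr.il safe.il   (then diff the resulting *.norm files)
        foreach (string file in args)
        {
            string text = File.ReadAllText(file);
            // drop instruction offsets and branch targets such as "IL_0042:" / "IL_0042"
            text = Regex.Replace(text, @"IL_[0-9A-Fa-f]{4}:?", "");
            // collapse local slot names such as "V_3" (they are numbered differently in the two dumps)
            text = Regex.Replace(text, @"V_\d+", "V_n");
            File.WriteAllText(file + ".norm", text);
        }
    }
}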

I know that the calling convention is different (cdecl/clrcall) and I know
that there are calls into unmanaged code like clock() and printf(), but
these don't account for the time it takes to run the loop.

And, applying the 'unsafe' modifier on the loop is a nop: there is nothing
"unsafe" happening inside the loop (unless you modified the code and used
pointer arithmetic), and the JIT is smarter than that. If you really want to
know what's happening, you'll have to attach a native debugger, set a
breakpoint before the call of the nl function, and single-step through the
JIT code (make sure you have the right symbols loaded for mscorwks.dll).
There you will see that a different path is taken for the "mixed mode"
assembly (the one compiled with /clr) based on some metadata flags.

PS. I think we really have to let this thread die, or we must move this to a
private conversation.

Willy.

[C++/CLI /clr]
__int64 nl(int n) {
double elapsedTime;
clock_t stopTime;
clock_t startTime = clock();

int a, b, c, d, e, f;
__int64 x=0;
//--- start of the loop
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x+=a+b+c+d+e+f;
// end of the loop
stopTime = clock();
elapsedTime = (stopTime - startTime) / (CLOCKS_PER_SEC / (double) 1000.0);
printf("Nested Loop elapsed time: %1.0f ms %I64d\n", elapsedTime, x);

return (__int64)elapsedTime;
}

[C++/CLI /clr:safe]
...
static void nl(int n) {
int a, b, c, d, e, f;
__int64 x = 0;
// use System::Diagnostics::StopWatch to time the loop
Stopwatch^ stopWatch = gcnew Stopwatch;
stopWatch->Start();
//--- start of the loop
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x+=a+b+c+d+e+f;
// end of the loop
stopWatch->Stop();
TimeSpan ts = stopWatch->Elapsed;

System::Console::WriteLine("Nested Loop elapsed time: {0} sec.{1} msec.
{2}\n", ts.Seconds, ts.Milliseconds, x);

}


Dec 30 '05 #51
Willy,

I hang my head in shame -- once I had edited out the different time
measuring prologue/epilogue and deleted the labels, the remaining IL
code was indeed exactly the same. I should have looked more closely,
but my diff was thrown off by the different line labels and marked so
many differences that I thought the IL was significantly different.
And, applying the 'unsafe' modifier on the loop is a nop, there is nothing
"unsafe" happening inside the loop (unless you modified the code and used
pointer arithmetic), the JIT is smarter than that.
Or dumber! If a simple metadata flag can create faster code then it
sure would be nice if the C# /unsafe option would set that flag...
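
For the record, the 'unsafe' run was nothing more exotic than this (a sketch of
the change; the method name is mine, compiled with csc /o+ /unsafe). There is no
pointer code in the body, which is presumably exactly why the modifier changed
nothing:

static unsafe long NestedLoopUnsafe(int n)
{
    int a, b, c, d, e, f;
    long x = 0;
    for (a = 0; a < n; a++)
        for (b = 0; b < n; b++)
            for (c = 0; c < n; c++)
                for (d = 0; d < n; d++)
                    for (e = 0; e < n; e++)
                        for (f = 0; f < n; f++)
                            x += a + b + c + d + e + f;  // same body as the benchmark's nl()
    return x;
}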
If you really want to
know what's happening, you'll have to attach a native debugger set a
breakpoint before the call of the nl function, and single step through the
JIT code (make sure you have the right symbols loaded for mscorwks.dll),
there you will see that a different path is taken for the "mixed mode"
assembly (the one compiled with /clr) based on some metadata flags.
Very interesting. I think I'll have to experiment a bit and see if
there's a way to force C# to emit those flags.
PS. I think we realy have to let this thread die, or we must move this to a
private conversation.


No problem, I believe you now. I'm just stunned that the JIT compiler
would create better code based on some secret flags that C# simply
doesn't emit. :-(
--
http://www.kynosarges.de
Dec 30 '05 #52
Here is an interesting article on the optimizations of the new 2005 compiler.
http://msdn.microsoft.com/msdnmag/is...5/VisualC2005/
"Christoph Nahr" <ch************@kynosarges.de> wrote in message
news:vd********************************@4ax.com...
On Thu, 29 Dec 2005 22:31:53 +0100, "Willy Denoyette [MVP]"
<wi*************@telenet.be> wrote:
Sorry, above is not completely true, the C# IL may differ as I only compared
the IL of both: C++ /clr and C++ /clr:safe, and despite they are identical
(only for the loop part), the results are quite different (100%).
That means that even with identical IL the perf. level may differ, and
different IL may produce equal results. The secret (especially in this
case!) lays in the metadata which 'tells' the JIT that one is verifiable
code while the other is not, so it's allowed to generate different machine
code for the same IL.


That's an interesting idea but it must be something else (see below).

The unsafe method modifier allows the C# compiler to generate
unverifiable code, too, but a quick test showed absolutely no
difference between two benchmark runs -- one unchanged, one with the
"unsafe" modifier on the nested loop test. So apparently the machine
code generated for either flag is identical.
Marcus code posted in this thread is wrong,he changed:

x+=a+b+c+d+e+f;
into:
x++;
, please re-read his post and the follow-up,


Right, I saw his post now.
PS. find the IL for both in attachment.


You've lost me again. I've compared both IL sequences, and they are
quite different! The unverifiable version has a different calling
convention, a smaller local stack, calls to C++ library methods, and
overall a smaller IL instruction count.

My diff program can't find the identical loop part that you claim
exists because the two IL sequences are so different. Evidently the
/clr:safe switch causes the C++ compiler to generate different IL, not
just to set a different JIT flag.
--
http://www.kynosarges.de

Dec 30 '05 #53

"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!
It must have been after the JIT, because C# beat C++ on some things, just not
the nested loops. The author of the link would know relatively obvious things
such as this.
Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..


I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could be
anything from security checks to tests for memory leaks. It doesn't say what its
purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)

Dec 30 '05 #54
http://msdn.microsoft.com/msdnmag/is...5/VisualC2005/
The above is an interesting link that deals with the latest 2005 optimizations.
If it is something more than marketing hype my problems may be solved. I already
mentioned why I am posting here. I must find out in advance whether or not my
crucial 100-line function will run fast enough in .NET, and what I can do about
it if it doesn't. The above link makes it very clear which optimizations are
performed, and when they are performed. As I already said, most of them are
performed before the MSIL is generated. Of course the crucial optimization of
register allocation must wait until after the specific machine architecture is
known. In other words it must be done by the JIT.

"Anders Borum" <an****@sphereworks.dk> wrote in message
news:O$*************@TK2MSFTNGP09.phx.gbl...
I think Mr. Olcott was busy filing patent requests instead of reading the
excellent posting by Mr. Denoyette. I must admit that I have learned some great
things about IL and optimization from this thread, but I am still wondering why
some people post here in the first place ..

--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)

Dec 30 '05 #55

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:O%****************@tk2msftngp13.phx.gbl...


"Christoph Nahr" <ch************@kynosarges.de> wrote in message
news:vd********************************@4ax.com...
On Thu, 29 Dec 2005 22:31:53 +0100, "Willy Denoyette [MVP]"
<wi*************@telenet.be> wrote:
Sorry, above is not completely true, the C# IL may differ as I only compared
the IL of both: C++ /clr and C++ /clr:safe, and despite they are identical
(only for the loop part), the results are quite different (100%).
That means that even with identical IL the perf. level may differ, and
different IL may produce equal results. The secret (especially in this
case!) lays in the metadata which 'tells' the JIT that one is verifiable
code while the other is not, so it's allowed to generate different machine
code for the same IL.
That's an interesting idea but it must be something else (see below).

The unsafe method modifier allows the C# compiler to generate
unverifiable code, too, but a quick test showed absolutely no
difference between two benchmark runs -- one unchanged, one with the
"unsafe" modifier on the nested loop test. So apparently the machine
code generated for either flag is identical.
Marcus code posted in this thread is wrong,he changed:

x+=a+b+c+d+e+f;
into:
x++;
, please re-read his post and the follow-up,


Right, I saw his post now.
PS. find the IL for both in attachment.


You've lost me again. I've compared both IL sequences, and they are
quite different! The unverifiable version has a different calling
convention, a smaller local stack, calls to C++ library methods, and
overall a smaller IL instruction count.

My diff program can't find the identical loop part that you claim
exists because the two IL sequences are so different. Evidently the
/clr:safe switch causes the C++ compiler to generate different IL, not
just to set a different JIT flag.
--
http://www.kynosarges.de


Chris,

I guess you think both IL functions comes from the same C++ code, which isn't
the case. The first comes from the original benchmark (modified to take a
__int64 for x) the second is the same function modified such that it can be
compiled in safe mode. For completeness I included both functions[1].


That would explain the 450% performance difference on a machine with a 32-bit
int. It must do the same sort of thing the old 16-bit machines did when emulating
a 32-bit int: handle it in two pieces.

I never said the IL for the whole function was the same, I said - identical IL
for the "loop part only", Note that I did include the loop start and end
address labels in the attached file, just compare what's in between these and
you will see that the IL is the same (but don't use diff, the address labels
and local variable labels are different).

I know that the calling convention is different (cdecl/clrcall) and I know
that there are call's into unmanage code like clock() and printf(), but these
don't account for the time it takes to run the loop.

And, applying the 'unsafe' modifier on the loop is a nop, there is nothing
"unsafe" happening inside the loop (unless you modified the code and used
pointer arithmetic), the JIT is smarter than that. If you really want to know
what's happening, you'll have to attach a native debugger set a breakpoint
before the call of the nl function, and single step through the JIT code (make
sure you have the right symbols loaded for mscorwks.dll), there you will see
that a different path is taken for the "mixed mode" assembly (the one compiled
with /clr) based on some metadata flags.

PS. I think we realy have to let this thread die, or we must move this to a
private conversation.

Willy.

[C++/CLI /clr]
__int64 nl(int n) {
double elapsedTime;
clock_t stopTime;
clock_t startTime = clock();

int a, b, c, d, e, f;
__int64 x=0;
//--- start of the loop
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x+=a+b+c+d+e+f;
// end of the loop
stopTime = clock();
elapsedTime = (stopTime - startTime) / (CLOCKS_PER_SEC / (double) 1000.0);
printf("Nested Loop elapsed time: %1.0f ms %I64d\n", elapsedTime, x);

return (__int64)elapsedTime;
}

[C++/CLI /clr:safe]
..
static void nl(int n) {
int a, b, c, d, e, f;
__int64 x = 0;
// use System::Diagnostics::StopWatch to time the loop
Stopwatch^ stopWatch = gcnew Stopwatch;
stopWatch->Start();
//--- start of the loop
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x+=a+b+c+d+e+f;
// end of the loop
stopWatch->Stop();
TimeSpan ts = stopWatch->Elapsed;

System::Console::WriteLine("Nested Loop elapsed time: {0} sec.{1} msec.
{2}\n", ts.Seconds, ts.Milliseconds, x);

}



Dec 30 '05 #56

"Christoph Nahr" <ch************@kynosarges.de> wrote in message
news:68********************************@4ax.com...
Willy,

I hang my head in shame -- once I had edited out the different time
measuring prologue/epilogue and deleted the labels, the remaining IL
code was indeed exactly the same. I should have looked more closely,
but my diff was thrown off by the different line labels and marked so
many differences that I thought the IL was significantly different.
And, applying the 'unsafe' modifier on the loop is a nop, there is nothing
"unsafe" happening inside the loop (unless you modified the code and used
pointer arithmetic), the JIT is smarter than that.


Or dumber! If a simple metadata flag can create faster code then it
sure would be nice if the C# /unsafe option would set that flag...
If you really want to
know what's happening, you'll have to attach a native debugger set a
breakpoint before the call of the nl function, and single step through the
JIT code (make sure you have the right symbols loaded for mscorwks.dll),
there you will see that a different path is taken for the "mixed mode"
assembly (the one compiled with /clr) based on some metadata flags.


Very interesting. I think I'll have to experiment a bit and see if
there's a way to force C# to emit those flags.
PS. I think we realy have to let this thread die, or we must move this to a
private conversation.


No problem, I believe you now. I'm just stunned that the JIT compiler
would create better code based on some secret flags that C# simply
doesn't emit. :-(
--
http://www.kynosarges.de


/clr:safe prohibits some of the optimizations that /clr:pure allows.
Dec 30 '05 #57
Peter Olcott wrote:
"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!

It must have been after the JIT, because C# beat C++ on some things, just not
the nested loops. The author of the link would know relatively obvious things
such as this.


Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..

I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could be
anything from security checks to tests for memory leaks. It doesn't say what its
purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)


I guess you should google for 'define:verify'. Verifiable benchmarks
have nothing to do with verifiable code. Anders just wanted the
benchmarks to be verifiably *valid*, which they aren't by any means. A
nested loop written the way it is in the benchmark is measuring nothing
but a compiler's ability to optimize nested loops that do more or less
nothing. Verifiable code means (in the .NET vocabulary) that the CLR can
statically verify the code to ensure it will do nothing it is not
expected to do. But enough of that, let's see what native assembly the
compilers generate for the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly
cryptic so it would do no help), but I've found the following. The C++
compiler (8.0 in my case, but I suspect 6.0 is doing the same) manages
to pre-cache the additions in the outer loops, which the C# compiler
doesn't. Thus, in the C# code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated on every single innermost iteration, while C++ does something
like the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET
2.0 and 8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on
my machine. But note that the innermost loop in the C++ is only four
instructions. This can hardly be matched to any real-life algorithm, and
I can assure you that if there was anything more than 'nothing' in the
inner loop, the performance difference would be much less than 80% (I
expect it to drop far below 10% for most real-life algorithms).

That said, I wouldn't expect C# (or .NET) to match pure native code
performance for purely computational tasks like the one you describe.
C# can win easily in cases where a lot of dynamic allocation is
involved, etc., but it will probably never outperform optimized native
C++ (or handwritten assembly) for computational tasks. If you need to
squeeze every clock of your CPU, you will probably get best results
using an optimizing compiler targeted for exactly the CPU the code will
run on.

I didn't want to write all this, but after reading through the long
discussions on this matter without touching the important points, I
just decided to do so :)

Just my 2c.

Stefan
Dec 30 '05 #58

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:e2**************@tk2msftngp13.phx.gbl...
Peter Olcott wrote:
"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!
It must have been after the JIT, because C# beat C++ on some things, just
not the nested loops. The author of the link would know relatively obvious
things such as this.

Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..

I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could
be anything from security checks to tests for memory leaks. It doesn't say
what its purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)


I guess you should google for 'define:verify'. Verifiable benchmarks have
nothing to do with verifiable code. Anders just wanted the benchmarks to be
verifiably *valid*, which they aren't by any means. A nested loop written the
way it is in the benchmark is measuring nothing but a compiler's ability to
optimize nested loops that do more or less nothing. Verifiable code means (in
the .NET vocabulary) that the CLR can statically verify the code to ensure it
will do nothing it is not expected to do. But enough of that, let's see what
native assembly the


Yet I just found a Microsoft article that says that this sort of technology is
between 20 and 50 years away. Code is not expected to be incorrect. Proving that
code is not incorrect is technology that does not yet exist. I don't know what it
is verifying, but correct code is not it. The posted benchmark was crucial to
my needs because my critical 100-line function is mostly loops.
compilers generate for the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly cryptic
so it would do no help), but I've found the following. The C++ compiler (8.0
in my case, but I suspect 6.0 is doing the same) manages to pre-cache the
additions in the outer loops, which the C# compiler doesn't. Thus, in the C#
code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated for every single innermost loop, while c++ does something like
the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0 and
8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my machine. But
note that the innermost loop in the C++ is only four instructions. This can
hardly be matched to any real-life algorithm, and I can assure you that if
there was anything more than 'nothing' in the inner loop, the performance
difference would be much less than 80% (I expect it to drop far below 10% for
most real-life algorithms).
Since this is not the same code that was posted on the benchmark, it is not a
valid test. If you are worried about arithmetic overflow then define X as a
double. I want to know why there was a 450% difference in the original
benchmark. When you change the code, this avoids rather than answers the
question. It is reported that 2005 does a much better job of optimization of
.NET code, yet only with the C++ compiler, not the C# compiler.

That said, I wouldn't expect C# (or .NET) to match pure native code
performance for purely computational tasks like the one you describe is. C#
can win easily in cases where a lot of dynamic allocation is involved, etc.,
but it will probably never outperform optimized native C++ (or handwritten
assembly) for computational tasks. If you need to squeeze every clock of your
CPU, you will probably get best results using an optimizing compiler targeted
for exactly the CPU the code will run on.
There is no reason (in theory) why .NET code would need to be any slower than
native code. The reason in practice seems to be that Microsoft just hasn't
gotten around to implementing every possible optimization in every language,
yet. The one tricky thing about optimization in .NET is that it is divided into
two different processes. Some of the optimizations at the compile step can screw
up some of the optimizations at the JIT step. The problem is that the compile
step does not know how many registers the target architecture has.

I didn't want to write all this, but after reading through the long
discussions on this matter without touching the importatnt points, I just
decided to do so :)

Just my 2c.

Stefan

Dec 31 '05 #59

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:e2**************@tk2msftngp13.phx.gbl...
Peter Olcott wrote:
"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!
It must have been after the JIT, because C# beat C++ on some things,
just not the nested loops. The author of the link would know relatively
obvious things such as this.

Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..

I searched high and low and finally found some stuff on unverified code.
The MSDN documentation mentions it yet does not discuss what it means. It
could be anything from security checks to tests for memory leaks. It
doesn't say what its purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)


I guess you should google for 'define:verify'. Verifiable benchmarks have
nothing to do with verifiable code. Anders just wanted the benchmarks to
be verifiably *valid*, which they aren't by any means. A nested loop
written the way it is in the benchmark is measuring nothing but a
compiler's ability to optimize nested loops that do more or less nothing.
Verifiable code means (in the .NET vocabulary) that the CLR can statically
verify the code to ensure it will do nothing it is not expected to do. But
enough of that, let's see what native assembly the compilers generate for
the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly
cryptic so it would do no help), but I've found the following. The C++
compiler (8.0 in my case, but I suspect 6.0 is doing the same) manages to
pre-cache the additions in the outer loops, which the C# compiler doesn't.
Thus, in the C# code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated for every single innermost loop, while c++ does something
like the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0
and 8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my
machine. But note that the innermost loop in the C++ is only four
instructions. This can hardly be matched to any real-life algorithm, and I
can assure you that if there was anything more than 'nothing' in the inner
loop, the performance difference would be much less than 80% (I expect it
to drop far below 10% for most real-life algorithms).


The snip you posted is meaningless without the value of n and the types of
the local variables, so please post the whole function.
Willy.

Dec 31 '05 #60
Willy Denoyette [MVP] wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:e2**************@tk2msftngp13.phx.gbl...
Peter Olcott wrote:
"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!

>It must have been after the JIT, because C# beat C++ on some things,
>just not the nested loops. The author of the link would know relatively
>obvious things such as this.

Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..
I searched high and low and finally found some stuff on unverified code.
The MSDN documentation mentions it yet does not discuss what it means. It
could be anything from security checks to tests for memory leaks. It
doesn't say what its purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)

I guess you should google for 'define:verify'. Verifiable benchmarks have
nothing to do with verifiable code. Anders just wanted the benchmarks to
be verifiably *valid*, which they aren't by any means. A nested loop
written the way it is in the benchmark is measuring nothing but a
compiler's ability to optimize nested loops that do more or less nothing.
Verifiable code means (in the .NET vocabulary) that the CLR can statically
verify the code to ensure it will do nothing it is not expected to do. But
enough of that, let's see what native assembly the compilers generate for
the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly
cryptic so it would do no help), but I've found the following. The C++
compiler (8.0 in my case, but I suspect 6.0 is doing the same) manages to
pre-cache the additions in the outer loops, which the C# compiler doesn't.
Thus, in the C# code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated for every single innermost loop, while c++ does something
like the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0
and 8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my
machine. But note that the innermost loop in the C++ is only four
instructions. This can hardly be matched to any real-life algorithm, and I
can assure you that if there was anything more than 'nothing' in the inner
loop, the performance difference would be much less than 80% (I expect it
to drop far below 10% for most real-life algorithms).



The snip you posted is meaningless without the value of n and the types of
the local variables, so please post the whole function.
Willy.


The code I have used is from the original benchmark from the link the OP
has mentioned so many times. The C++ code is at
http://www.tommti-systems.de/main-Da.../Benchmark.cpp

and CS code is at
http://www.tommti-systems.de/main-Da...s/Benchmark.cs

The pseudocode I've posted is just an example close to what the C++
compiler makes out of the original code, based on my analysis of the
generated x86 assembly.

Stefan
Dec 31 '05 #61
Peter Olcott wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:e2**************@tk2msftngp13.phx.gbl...
Peter Olcott wrote:
"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl...
Hello!

>It must have been after the JIT, because C# beat C++ on some things, just
>not the nested loops. The author of the link would know relatively obvious
>things such as this.

Unless somebody documents what methods have been used to measure the
benchmarks, I am disposing the data altogether. Benchmarks should be
verifiable ..
I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could
be anything from security checks to tests for memory leaks. It doesn't say
what its purpose is.
--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)

I guess you should google for 'define:verify'. Verifiable benchmarks have
nothing to do with verifiable code. Anders just wanted the benchmarks to be
verifiably *valid*, which they aren't by any means. A nested loop written the
way it is in the benchmark is measuring nothing but a compiler's ability to
optimize nested loops that do more or less nothing. Verifiable code means (in
the .NET vocabulary) that the CLR can statically verify the code to ensure it
will do nothing it is not expected to do. But enough of that, let's see what
native assembly the

Yet I just found a Microsoft article that says that this sort of technology is
between 20 and 50 years away. Code is not expected to be incorrect. Proving that
code is not incorrect is technolgy that does not yet exist. I don't know what it
is verifying, but, correct code is not it. The posted benchmark was crucial to
my needs because my critical 100-line function is mostly loops.


That's why the code must meet certain criteria in order to be verifiable
today. The C# compiler generates such code, and the C++/CLI compiler is
able to do so as well.
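
Roughly speaking (a sketch; the method names are mine): code that sticks to
managed references and ordinary array accesses stays verifiable, while raw
pointer code does not, so it only compiles with /unsafe and fails verification
(e.g. under peverify):

// Verifiable: the CLR can prove every access stays inside the array.
static long SumVerifiable(int[] data)
{
    long sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];
    return sum;
}

// Unverifiable: raw pointer arithmetic bypasses the bounds checks.
static unsafe long SumUnverifiable(int[] data)
{
    long sum = 0;
    fixed (int* p = data)
    {
        for (int i = 0; i < data.Length; i++)
            sum += *(p + i);
    }
    return sum;
}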

compilers generate for the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly cryptic
so it would do no help), but I've found the following. The C++ compiler (8.0
in my case, but I suspect 6.0 is doing the same) manages to pre-cache the
additions in the outer loops, which the C# compiler doesn't. Thus, in the C#
code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated for every single innermost loop, while c++ does something like
the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0 and
8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my machine. But
note that the innermost loop in the C++ is only four instructions. This can
hardly be matched to any real-life algorithm, and I can assure you that if
there was anything more than 'nothing' in the inner loop, the performance
difference would be much less than 80% (I expect it to drop far below 10% for
most real-life algorithms).

Since this is not the same code that was posted on the benchmark, it is not a
valid test. If you are worried about arithmetic overflow then define X as a
double. I want to know why there was a 450% difference in the original
benchmark. When you change the code, this avoids rather than answers the
question. It is reported that 2005 does a much better job of optimization of
.NET code, yet, only with the C++, not the C# compiler.


No. I gave you an exact answer *why* there is the difference. The C++
compiler translates the original code logic to something comparable to the
snippet I have posted. The C# compiler fails to do this optimization. But it
has nothing to do with nested loops per se. It is an optimization of
the addition performed in the innermost loop.

That said, I wouldn't expect C# (or .NET) to match pure native code
performance for purely computational tasks like the one you describe is. C#
can win easily in cases where a lot of dynamic allocation is involved, etc.,
but it will probably never outperform optimized native C++ (or handwritten
assembly) for computational tasks. If you need to squeeze every clock of your
CPU, you will probably get best results using an optimizing compiler targeted
for exactly the CPU the code will run on.

There is no reason (in theory) why .NET code would need to be any slower than
native code. The reason in practice seems to be that Microsoft just hasn't
gotten around to implementing every possible optimization in every language,
yet. The one tricky thing about optimization in .NET is that it is divided into
two different processes. Some of the optimizations at the compile step can screw
up some of the optimizations at the JIT step. The problem is that the compile
step does not know how many registers the target architecture has.


That's why I used the word *probably*. The JIT is much more time
constrained, so it *might* never be able to match the optimizations of
traditional compilers. It *should* certainly give better *average*
results across platforms in the future, because it can make use of whatever
hardware is actually present.

I didn't want to write all this, but after reading through the long
discussions on this matter without touching the importatnt points, I just
decided to do so :)

Just my 2c.

Stefan



If you really want to get answers to your questions, not just troll
around, you really can get them here. If you don't...

Stefan
Dec 31 '05 #62

">
The code I have used is from the original benchmark from the link the OP
has mentioned so many times. The C++ code is at
http://www.tommti-systems.de/main-Da.../Benchmark.cpp

and CS code is at
http://www.tommti-systems.de/main-Da...s/Benchmark.cs

The pseudocode I've posted is just an example close to what the C++
compiler makes out of the original code, based on my analysis of the
generated x86 assembly.

Stefan


Wait a minute, You said...
<If you compile this code in C#, the execution time is 6734 ms for .NET ...>
To me "this code" means you ran YOUR code and not the one from the above URL
(you can't possibly achieve such results using the original code), so please
post the C# function you used to achieve the results you posted.

Willy.
Dec 31 '05 #63
Willy Denoyette [MVP] wrote:
">
The code I have used is from the original benchmark from the link the OP
has mentioned so many times. The C++ code is at
http://www.tommti-systems.de/main-Da.../Benchmark.cpp

and CS code is at
http://www.tommti-systems.de/main-Da...s/Benchmark.cs

The pseudocode I've posted is just an example close to what the C++
compiler makes out of the original code, based on my analysis of the
generated x86 assembly.

Stefan

Wait a minute, You said...
<If you compile this code in C#, the execution time is 6734 ms for .NET ...>
To me "this code" means you ran YOUR code and not the one from the above URL
(you can't possibly achieve such results using the original code), so please
post the C# function you used to achieve the results you posted.

Willy.


"This code" was meant to resemble the optimizations made by the C++
compiler. I used it to demonstrate that the problem is not in nested
loops themselves, but in the optimization of the addition statement. For
the sake of completeness, here is the code. I thought it's obvious which
part of the code should be replaced, as well as that the types of the
tmp variables are int (introducing a single long in the benchmarking
loop would make it absolutely invalid for 32-bit platforms).

public static long nl(int n)
{
long elapsedMilliseconds;
DateTime startTime = DateTime.Now;

int a, b, c, d, e, f;
int tmp_ab, tmp_abc, tmp_abcd, tmp_abcde, tmp;
int x = 0;

if (n < 1) n = 1;

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++,
tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

// Console.WriteLine(x.ToString() + "\n");
DateTime stopTime = DateTime.Now;
TimeSpan elapsedTime = stopTime.Subtract(startTime);
elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
Console.WriteLine("Nested Loop elapsed time: " +
elapsedMilliseconds + " ms - " + x.ToString());

return elapsedMilliseconds;
}

Is this better?
Dec 31 '05 #64

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...

Is this better?


Somewhat...
Following are the results I obtained when running [1]; here I call both the
original function (modified to prevent x from overflowing) and your modified
function (here named nl1).

Start C# benchmark
Nested Loop elapsed time: 9952 ms - 479232000000
Nested Loop elapsed time: 13171 ms - -1804337152
End C# benchmark

Notes:
- the result (value of x) is incorrect for nl1.
<quote
(introducing a single long in the benchmarking
loop would make it absolutely invalid for 32-bit platforms).
/quote>
Don't know what is meant by this. The result of the calculation yields a
value that is > int.MaxValue, which means there are billions of arithmetic
overflows (479232000000 vs. int.MaxValue); do you consider this "valid"?
- I'm also not clear on why your run is 2X faster than mine; your CPU isn't
twice as fast as mine, I'm sure.

- Notice that the original function performs better than yours. The reason
is simple: the modified nl (where x is a long) doesn't incur a hit when x
overflows.
Your function has a better loop algorithm, but still incurs the hit because
of the overflow on the int result x. Note that all modern x86 CPUs (that I
know of) take a (tiny) hit on an integer arithmetic overflow.
- If you compile with /checked+, your code will throw.

- Following is the result of another run where x in nl1 is changed into a
long.
Note also that the result (value of x) is correct now!

Start C# benchmark
Nested Loop elapsed time: 9983 ms - 479232000000
Nested Loop elapsed time: 9998 ms - 479232000000
End C# benchmark

Willy.

[1]
// Compile this with: csc /o+ filename.cs

/* original code copyright 2004 Christopher W. Cowell-Shah
http://www.cowell-shah.com/research/benchmark/code */
/* other code portions copyright http://dada.perl.it/shootout/ and Doug
Bagley http://www.bagley.org/~doug/shootout */
/* combined, modified and fixed by Thomas Bruckschlegel -
http://www.tommti-systems.com */
using System;
using System.Collections;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;
using System.Security.Permissions;
namespace Benchmark_CSharp
{
class BenchmarkCSharp
{
static DateTime startTime;
static DateTime stopTime;
static TimeSpan elapsedTime;

[STAThread]
static void Main(string[] args)
{
Console.WriteLine("Start C# benchmark");
int n=10;
nl(n*4);
nl1(n*4);
Console.WriteLine("End C# benchmark");
}
public static void nl(int n)
{
long elapsedMilliseconds;

int a, b, c, d, e, f;
long x=0;
if (n < 1) n = 1;
startTime = DateTime.Now;

for (a=0; a!=n; a++)
for (b=0; b!=n; b++)
for (c=0; c!=n; c++)
for (d=0; d!=n; d++)
for (e=0; e!=n; e++)
for (f=0; f!=n; f++)
x+=a+b+c+d+e+f;

stopTime = DateTime.Now;
elapsedTime = stopTime.Subtract(startTime);
elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds + "
ms - "+x.ToString());
}
//end

// Your modified function
public static long nl1(int n)
{
long elapsedMilliseconds;
DateTime startTime = DateTime.Now;
int a, b, c, d, e, f;
int tmp_ab, tmp_abc, tmp_abcd, tmp_abcde;
int tmp = 0;
int x = 0;

if (n < 1) n = 1;

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++,
tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

// Console.WriteLine(x.ToString() + "\n");
DateTime stopTime = DateTime.Now;
TimeSpan elapsedTime = stopTime.Subtract(startTime);
elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds +
" ms - " + x.ToString());

return elapsedMilliseconds;
}

}
}
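
Side note: since this is .NET 2.0, the timing itself could also use
System.Diagnostics.Stopwatch (as the C++ /clr:safe version earlier in the thread
does) instead of DateTime.Now; something along these lines (a sketch, the inner
loop is just a stand-in for the nl() call):

using System;
using System.Diagnostics;

class StopwatchTiming
{
    static void Main()
    {
        int n = 40;
        long x = 0;

        Stopwatch sw = Stopwatch.StartNew();   // higher resolution than DateTime.Now
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++)
                for (int c = 0; c < n; c++)
                    x += a + b + c;            // stand-in for the real nl(n) call
        sw.Stop();

        Console.WriteLine("Nested Loop elapsed time: {0} ms - {1}", sw.ElapsedMilliseconds, x);
    }
}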


Dec 31 '05 #65
Willy Denoyette [MVP] wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...
Is this better?

Somewhat...
Following are the result I obtained when running [1], here I call both the
original function (modified to prevent x to overflow) and your modified
function (here named nl1).

Start C# benchmark
Nested Loop elapsed time: 9952 ms - 479232000000
Nested Loop elapsed time: 13171 ms - -1804337152
End C# benchmark


The original benchmark included the overflows. Your modification to
prevent the overflow makes it invalid, IMO. Overflows can sometimes be a
desired effect, for example when calculating checksums. Changing the
type of x to long introduces another problem: 64-bit addition, which is
non-trivial on 32-bit platforms.

Notes:
- the result (value of x) is incorrect for nl1.
<quote
(introducing a single long in the benchmarking
loop would make it absolutely invalid for 32-bit platforms).
/quote>
Don't know what is meant by this. The result of the calculation yiels a
value that is > int.MAX, that means that there a billions of arithmetic
overruns (479232000000 - int.Max ), you consider this "valid"?
-I'm also not clear on why your run is 2X faster than mine, your CPU isn't
twice as fast as mine I 'm sure.
The result of the calculation is not important in the benchmark, IMO.
What is important is the time the calculation takes to complete.

- Notice that the original function performs better than yours. The reason
is simple: the modified nl (where x is a long) doesn't incur a hit when x
overflows.

The performance hit of adding to a 64-bit integer should be much higher
than any hit caused by the overflow that happens once in a while.

Your function has a better loop algorithm, but still incurs the hit because
of the overflow on the int result x. Note that all modern x86 CPU's (I know
of) take a (tiny) hit on an integer arithmetic overflow.

Not sure of that, but not sure it isn't true either... But arithmetic
overflows are desirable in many cases anyway.

- If you compile with /checked+, your code will throw.

I still think that the overflow is desired in this benchmark, so I would
wrap the *whole* benchmarking code in an unchecked { } block. /checked+
hits performance quite a lot.
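
Concretely, the semantics I mean are these (a sketch; the class name and values
are only illustrative). /checked+ simply makes 'checked' the default for the whole
compilation, so the benchmark's silent wraparound turns into an exception unless
the code is explicitly wrapped in unchecked { }:

using System;

class OverflowDemo
{
    static void Main()
    {
        int a = int.MaxValue;
        unchecked
        {
            a += 1;      // wraps silently to int.MinValue -- what the benchmark relies on
        }
        Console.WriteLine(a);

        int b = int.MaxValue;
        try
        {
            checked
            {
                b += 1;  // throws System.OverflowException
            }
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}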

- Following is the result of another run where x in nl1 is changed into a
long.
Note also that the result (value of x) is correct now!

Start C# benchmark
Nested Loop elapsed time: 9983 ms - 479232000000
Nested Loop elapsed time: 9998 ms - 479232000000
End C# benchmark
Interesting, when I change the x to long, I get 25437 ms execution time.
This clearly shows the difference between 32 bit and 64 bit arithmetic.
Are you sure you don't have an x64 machine? That would explain the
difference between our results perfectly.
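
An easy way to check which case you're in is something like this (a sketch; on a
64-bit OS it tells you whether the process actually runs under the x64 JIT):

using System;

class BitnessCheck
{
    static void Main()
    {
        // IntPtr.Size is 4 in a 32-bit process (x86 JIT) and 8 in a 64-bit process (x64 JIT).
        Console.WriteLine("Pointer size: {0} bytes => {1}-bit process",
            IntPtr.Size, IntPtr.Size * 8);
    }
}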

Willy.

[1]
// Compile this with: csc /o+ filename.cs

/* original code copyright 2004 Christopher W. Cowell-Shah
http://www.cowell-shah.com/research/benchmark/code */
/* other code portions copyright http://dada.perl.it/shootout/ and Doug
Bagley http://www.bagley.org/~doug/shootout */
/* combined, modified and fixed by Thomas Bruckschlegel -
http://www.tommti-systems.com */
using System;
using System.Collections;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;
using System.Security.Permissions;
namespace Benchmark_CSharp
{
class BenchmarkCSharp
{
static DateTime startTime;
static DateTime stopTime;
static TimeSpan elapsedTime;

[STAThread]
static void Main(string[] args)
{
Console.WriteLine("Start C# benchmark");
int n=10;
nl(n*4);
nl1(n*4);
Console.WriteLine("End C# benchmark");
}
public static void nl(int n)
{
long elapsedMilliseconds;

int a, b, c, d, e, f;
long x=0;
if (n < 1) n = 1;
startTime = DateTime.Now;

for (a=0; a!=n; a++)
for (b=0; b!=n; b++)
for (c=0; c!=n; c++)
for (d=0; d!=n; d++)
for (e=0; e!=n; e++)
for (f=0; f!=n; f++)
x+=a+b+c+d+e+f;

stopTime = DateTime.Now;
elapsedTime = stopTime.Subtract(startTime);
elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds + "
ms - "+x.ToString());
}
//end

// Your modified function
public static long nl1(int n)
{
long elapsedMilliseconds;
DateTime startTime = DateTime.Now;
int a, b, c, d, e, f;
int tmp_ab, tmp_abc, tmp_abcd, tmp_abcde;
int tmp = 0;
int x = 0;

if (n < 1) n = 1;

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++,
tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

// Console.WriteLine(x.ToString() + "\n");
DateTime stopTime = DateTime.Now;
TimeSpan elapsedTime = stopTime.Subtract(startTime);
elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds +
" ms - " + x.ToString());

return elapsedMilliseconds;
}

}
}


I don't want to persuade anyone of my truth. As I've mentioned in my
first post in this endless thread, I just wanted to point out some facts
*I* considered relevant. It's everyone's right to disagree and form
their own opinion, and I enjoy having that right as well.

Stefan
Dec 31 '05 #66

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:O0*************@TK2MSFTNGP11.phx.gbl...
Peter Olcott wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:e2**************@tk2msftngp13.phx.gbl...
Peter Olcott wrote:

"Anders Borum" <an****@sphereworks.dk> wrote in message
news:ed**************@TK2MSFTNGP10.phx.gbl.. .
>Hello!
>
>
>
>>It must have been after the JIT, because C# beat C++ on some things, just
>>not the nested loops. The author of the link would know relatively obvious
>>things such as this.
>
>Unless somebody documents what methods have been used to measure the
>benchmarks, I am disposing the data altogether. Benchmarks should be
>verifiable ..
I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could
be anything from security checks to tests for memory leaks. It doesn't say
what its purpose is.
>--
>With regards
>Anders Borum / SphereWorks
>Microsoft Certified Professional (.NET MCP)
>

I guess you should google for 'define:verify'. Verifiable benchmarks have
nothing to do with verifiable code. Anders just wanted the benchmarks to be
verifiably *valid*, which they aren't by any means. A nested loop written the
way it is in the benchmark is measuring nothing but a compiler's ability to
optimize nested loops that do more or less nothing. Verifiable code means (in
the .NET vocabulary) that the CLR can statically verify the code to ensure it
will do nothing it is not expected to do. But enough of that, let's see what
native assembly the

Yet I just found a Microsoft article that says that this sort of technology
is between 20 and 50 years away. Code is not expected to be incorrect.
Proving that code is not incorrect is technolgy that does not yet exist. I
don't know what it is verifying, but, correct code is not it. The posted
benchmark was crucial to my needs because my critical 100-line function is
mostly loops.


That's why the code must meet certain criteria in order to be verifiable
today. C# compiler generates such code, and the C++/CLI compiler is able to do
so.

compilers generate for the loop and get over with it.

<...>

ha! I won't post the complete disassemblies (the C++ one is terribly cryptic
so it would do no help), but I've found the following. The C++ compiler (8.0
in my case, but I suspect 6.0 is doing the same) manages to pre-cache the
additions in the outer loops, which the C# compiler doesn't. Thus, in the C#
code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated for every single innermost loop, while c++ does something like
the following:

for (a = 0; a < n; a++)
{
for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
{
for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
{
for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
{
for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
{
for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
x += tmp;
}
}
}
}
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0
and 8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my
machine. But note that the innermost loop in the C++ is only four
instructions. This can hardly be matched to any real-life algorithm, and I
can assure you that if there was anything more than 'nothing' in the inner
loop, the performance difference would be much less than 80% (I expect it to
drop far below 10% for most real-life algorithms).

Since this is not the same code that was posted on the benchmark, it is not a
valid test. If you are worried about arithmetic overflow then define X as a
double. I want to know why there was a 450% difference in the original
benchmark. When you change the code, this avoids rather than answers the
question. It is reported that 2005 does a much better job of optimization of
.NET code, yet, only with the C++, not the C# compiler.


No. I gave you and exact answer *why* there is the difference. The C++
translates the original code logic to something comparable to the snippet I
have posted. C# compiler fails to do this optimization. But it has nothing to
do with nested loops per se. It is an optimization of the addition performed
in the innermost loop.

That said, I wouldn't expect C# (or .NET) to match pure native code
performance for purely computational tasks like the one you describe is. C#
can win easily in cases where a lot of dynamic allocation is involved, etc.,
but it will probably never outperform optimized native C++ (or handwritten
assembly) for computational tasks. If you need to squeeze every clock of your
CPU, you will probably get best results using an optimizing compiler targeted
for exactly the CPU the code will run on.

There is no reason (in theory) why .NET code would need to be any slower than
native code. The reason in practice seems to be that Microsoft just hasn't
gotten around to implementing every possible optimization in every language,
yet. The one tricky thing about optimization in .NET is that it is divided
into two different processes. Some of the optimizations at the compile step
can screw up some of the optimizations at the JIT step. The problem is that
the compile step does not know how many registers the target architecture
has.


That's why I used the word *probably*. The JIT is much more time constrained,
so it *might* never be able to match the optimizations of traditional
compilers. It *should* certainly give better *average*


http://msdn.microsoft.com/msdnmag/is...5/VisualC2005/
If you read this link you will see that this is not even the way that it works.
As I already said, most of the optimizations are accomplished by the language
compiler, not the JIT.
results across platforms in the future, because it makes use of any available
hardware present.

I didn't want to write all this, but after reading through the long
discussions on this matter without touching the importatnt points, I just
decided to do so :)

Just my 2c.

Stefan



If you really want to get answers to your questions, not just troll around,
you really can get them here. If you don't...

Stefan

Dec 31 '05 #67

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...
Willy Denoyette [MVP] wrote:
">
The code I have used is from the original benchmark from the link the OP has
mentioned so many times. The C++ code is at
http://www.tommti-systems.de/main-Da.../Benchmark.cpp

and CS code is at
http://www.tommti-systems.de/main-Da...s/Benchmark.cs

The pseudocode I've posted is just an example close to what the C++ compiler
makes out of the original code, based on my analysis of the generated x86
assembly.

Stefan

Wait a minute, You said...
<If you compile this code in C#, the execution time is 6734 ms for .NET ...>
To me "this code" means you ran YOUR code and not the one from the above URL
(you can't possibly achieve such results using the original code), so please
post the C# function you used to achieve the results you posted.

Willy.


"This code" was meant to resemble the optimizations made by the C++ compiler.
I used it to demonstrate that the problem is not in nested


Why not just post the actual code generated from the actual C++ benchmark?
loops themselves, but in the optimization of the addition statement. For the
sake of completeness, here is the code. I thought it's obvious which part of
the code should be replaced, as well as that the types of the tmp variables
are int (introducing a single long in the benchmarking loop would make it
absolutely invalid for 32-bit platforms).

public static long nl(int n)
{
    long elapsedMilliseconds;
    DateTime startTime = DateTime.Now;

    int a, b, c, d, e, f;
    int tmp_ab, tmp_abc, tmp_abcd, tmp_abcde, tmp;
    int x = 0;

    if (n < 1) n = 1;

    for (a = 0; a < n; a++)
    {
        for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
        {
            for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
            {
                for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
                {
                    for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
                    {
                        for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
                            x += tmp;
                    }
                }
            }
        }
    }

    // Console.WriteLine(x.ToString() + "\n");
    DateTime stopTime = DateTime.Now;
    TimeSpan elapsedTime = stopTime.Subtract(startTime);
    elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
    Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds + " ms - " + x.ToString());

    return elapsedMilliseconds;
}
Is this better?
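As an aside, and not part of the original benchmark: DateTime.Now has fairly coarse
resolution (typically around 10-15 ms on Windows), so for shorter runs
System.Diagnostics.Stopwatch (new in .NET 2.0) is a safer way to time the loops.
A minimal sketch of that idea, with a trivial stand-in for the real loop body:

// Sketch only: timing with Stopwatch instead of DateTime.Now.
using System;
using System.Diagnostics;

class StopwatchTiming
{
    static void Main()
    {
        int n = 40;
        long x = 0;

        Stopwatch sw = Stopwatch.StartNew();   // uses the high-resolution counter when available

        for (int a = 0; a < n; a++)            // stand-in work; substitute the full nested loops here
            for (int b = 0; b < n; b++)
                x += a + b;

        sw.Stop();
        Console.WriteLine("Elapsed: " + sw.ElapsedMilliseconds + " ms - " + x);
    }
}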

Dec 31 '05 #68

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:e$*************@tk2msftngp13.phx.gbl...

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...

Is this better?


Somewhat...
Following are the results I obtained when running [1]; here I call both the
original function (modified to prevent x from overflowing) and your modified
function (here named nl1).

Start C# benchmark
Nested Loop elapsed time: 9952 ms - 479232000000
Nested Loop elapsed time: 13171 ms - -1804337152
End C# benchmark

Notes:
- the result (value of x) is incorrect for nl1.
<quote
(introducing a single long in the benchmarking
loop would make it absolutely invalid for 32-bit platforms).
/quote>
Don't know what is meant by this. The result of the calculation yields a value
that is > int.MaxValue, which means there are billions of arithmetic overruns
(479232000000 vs. int.MaxValue); do you consider this "valid"?
- I'm also not clear on why your run is 2X faster than mine; your CPU isn't
twice as fast as mine, I'm sure.

- Notice that the original function performs better than yours. The reason is
simple: the modified nl (where x is a long) doesn't incur a hit when x
overflows.
Your function has a better loop algorithm, but still incurs the hit because of
the overflow on the int result x. Note that all modern x86 CPUs (that I know of)
take a (tiny) hit on an integer arithmetic overflow.
- If you compile with /checked+, your code will throw.

- Following is the result of another run where x in nl1 is changed into a
long.
Note also that the result (value of x) is correct now!

Start C# benchmark
Nested Loop elapsed time: 9983 ms - 479232000000
Nested Loop elapsed time: 9998 ms - 479232000000
End C# benchmark

Willy.

[1]
// Compile this with: csc /o+ filename.cs

/* original code copyright 2004 Christopher W. Cowell-Shah
http://www.cowell-shah.com/research/benchmark/code */
/* other code portions copyright http://dada.perl.it/shootout/ and Doug Bagley
http://www.bagley.org/~doug/shootout */
/* combined, modified and fixed by Thomas Bruckschlegel -
http://www.tommti-systems.com */
using System;
using System.Collections;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;
using System.Security.Permissions;
namespace Benchmark_CSharp
{
    class BenchmarkCSharp
    {
        static DateTime startTime;
        static DateTime stopTime;
        static TimeSpan elapsedTime;

        [STAThread]
        static void Main(string[] args)
        {
            Console.WriteLine("Start C# benchmark");
            int n = 10;
            nl(n * 4);
            nl1(n * 4);
            Console.WriteLine("End C# benchmark");
        }

        public static void nl(int n)
        {
            long elapsedMilliseconds;

            int a, b, c, d, e, f;
            long x = 0;
            if (n < 1) n = 1;
            startTime = DateTime.Now;

            for (a = 0; a != n; a++)
                for (b = 0; b != n; b++)
                    for (c = 0; c != n; c++)
                        for (d = 0; d != n; d++)
                            for (e = 0; e != n; e++)
                                for (f = 0; f != n; f++)
                                    x += a + b + c + d + e + f;

            stopTime = DateTime.Now;
            elapsedTime = stopTime.Subtract(startTime);
            elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
            Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds + " ms - " + x.ToString());
        }
        //end

        // Your modified function
        public static long nl1(int n)
        {
            long elapsedMilliseconds;
            DateTime startTime = DateTime.Now;
            int a, b, c, d, e, f;
            int tmp_ab, tmp_abc, tmp_abcd, tmp_abcde;
            int tmp = 0;
            int x = 0;

            if (n < 1) n = 1;

            for (a = 0; a < n; a++)
            {
                for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
                {
                    for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
                    {
                        for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
                        {
                            for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
                            {
                                for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
                                    x += tmp;
                            }
                        }
                    }
                }
            }

            // Console.WriteLine(x.ToString() + "\n");
            DateTime stopTime = DateTime.Now;
            TimeSpan elapsedTime = stopTime.Subtract(startTime);
            elapsedMilliseconds = (int)elapsedTime.TotalMilliseconds;
            Console.WriteLine("Nested Loop elapsed time: " + elapsedMilliseconds + " ms - " + x.ToString());

            return elapsedMilliseconds;
        }
    }
}

It would be better to declare x as a double than as a long, unless C++ also uses a
64-bit long. This would put C++ and C# on the same footing. Your change to
the benchmark changes its semantic meaning too much.
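For what it's worth, a sketch of that double-accumulator variant (my own adaptation of
the posted nl, not code from the benchmark); for n = 40 the running sum stays far below
2^53, so a double holds it exactly:

// Hypothetical variant: 64-bit floating-point accumulator instead of int/long.
public static void nlDouble(int n)
{
    int a, b, c, d, e, f;
    double x = 0;
    if (n < 1) n = 1;

    DateTime startTime = DateTime.Now;

    for (a = 0; a != n; a++)
        for (b = 0; b != n; b++)
            for (c = 0; c != n; c++)
                for (d = 0; d != n; d++)
                    for (e = 0; e != n; e++)
                        for (f = 0; f != n; f++)
                            x += a + b + c + d + e + f;   // int addition, then widened to double

    TimeSpan elapsed = DateTime.Now.Subtract(startTime);
    Console.WriteLine("Nested Loop elapsed time: " + (long)elapsed.TotalMilliseconds + " ms - " + x);
}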
Dec 31 '05 #69

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:%2****************@TK2MSFTNGP15.phx.gbl...
Willy Denoyette [MVP] wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...
Is this better?

Somewhat...
Following are the results I obtained when running [1]; here I call both the
original function (modified to prevent x from overflowing) and your modified
function (here named nl1).

Start C# benchmark
Nested Loop elapsed time: 9952 ms - 479232000000
Nested Loop elapsed time: 13171 ms - -1804337152
End C# benchmark


The original benchmark included the overflows. Your modification to prevent
the overflow makes it invalid, IMO. Overflows can sometimes be a desired
effect, for example when calculating checksums. Changing the type of x to long
introduces another problem - 64-bit addition, which is non-trivial on 32-bit
platforms.


Same thing that I said. He can use a double to get around both of these problems
at once.


Notes:
- the result (value of x) is incorrect for nl1.
<quote
(introducing a single long in the benchmarking
loop would make it absolutely invalid for 32-bit platforms).
/quote>
Don't know what is meant by this. The result of the calculation yields a value
that is > int.MaxValue, which means there are billions of arithmetic overruns
(479232000000 vs. int.MaxValue); do you consider this "valid"?
- I'm also not clear on why your run is 2X faster than mine; your CPU isn't
twice as fast as mine, I'm sure.
The result of the calculation is not important in the benchmark, IMO. What is
important is the time the calculation takes to complete.

Yes, I agree. Unless there is some sort of overflow detection that gets invoked
in .NET that makes it run much more slowly when overflow is detected.

- Notice that the original function performs better than yours. The reason is
simple: the modified nl (where x is a long) doesn't incur a hit when x
overflows.

The performance hit of adding to a 64-bit integer should be much higher than any
hit caused by an overflow that happens only once in a while.

Yes, a 64-bit integer on a 32-bit platform must be treated as two separate
pieces.
Your function has a better loop algorithm, but still incurs the hit because
of the overflow on the int result x. Note that all modern x86 CPUs (that I know
of) take a (tiny) hit on an integer arithmetic overflow.

Not sure of that, but not sure it isn't true either... But arithmetic
overflows are desirable in many cases anyway.
- If you compile with /checked+, your code will throw.

I still think that the overflow is desired in this benchmark, so I would wrap
the *whole* benchmarking code in an unchecked { } block. /checked+ hits
performance quite a lot.
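For reference, a minimal self-contained sketch (mine, not from the benchmark) of the
difference: the unchecked block wraps silently - which is what the benchmark's int
accumulator relies on - while the same addition under checked (what /checked+ turns on
assembly-wide) throws:

using System;

class OverflowDemo
{
    static void Main()
    {
        int x = int.MaxValue;

        unchecked           // the C# default; what the benchmark effectively relies on
        {
            x = x + 1;      // wraps silently
        }
        Console.WriteLine(x);   // prints -2147483648

        x = int.MaxValue;
        try
        {
            checked         // what /checked+ enables for the whole assembly
            {
                x = x + 1;  // throws OverflowException at run time
            }
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}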

- Following is the result of another run where x in nl1 is changed into a
long.
Note also that the result (value of x) is correct now!

Start C# benchmark
Nested Loop elapsed time: 9983 ms - 479232000000
Nested Loop elapsed time: 9998 ms - 479232000000
End C# benchmark


Interesting, when I change the x to long, I get 25437 ms execution time. This
clearly shows the difference between 32 bit and 64 bit arithmetic. Are you
sure you don't have an x64 machine? That would explain the difference between
our results perfectly.


That is what I suspected. The original C# benchmark declared x to be an int, so
this should not have caused the original 450% performance difference. Hopefully
all of these problems are solved with the new 2005 compiler; Microsoft does
indicate as much, at least for managed C++, though not for C#.

Willy.

[1]
// Compile this with: csc /o+ filename.cs
// (full benchmark listing snipped - identical to the one posted above)


I don't want to persuade anyone that my view is the only truth. As I've mentioned
in my first post in this endless thread, I just wanted to point out some facts that
*I* considered relevant. It's everyone's right to disagree and form their own
opinion, and I enjoy having that right as well.

Stefan

Dec 31 '05 #70
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse

Jan 1 '06 #71
Jesse McGrew wrote:
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse

Sure it does, but as I've said before, the long addition would be by far
the most expensive thing in the benchmark. The benchmark is supposed to
measure performance of nested loops, not 64-bit addition.
Jan 1 '06 #72

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:eq**************@TK2MSFTNGP09.phx.gbl...
Jesse McGrew wrote:
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse

Sure it does, but as I've said before, the long addition would be by far
the most expensive thing in the benchmark. The benchmark is supposed to
measure performance of nested loops, not 64-bit addition.


That's your assumption, and maybe it was supposed to be so, but it misses the
point: what takes most of the time are the additions in the innermost loop, not
the nesting of the for loops.
The original benchmark runs the innermost loop 409,600,000 times and performs
7 adds per iteration; you can hardly call this a looping benchmark.
The calculation was only added to prevent aggressive optimization of empty loops;
that is why the author included an operation that uses all of the variables in
the loop.
Again, as I stated in my first reply, these kinds of micro-benchmarks
(originating from C) have no real value: they are misleading, they produce
wrong results (ever asked yourself why the output includes the value of x?),
and they don't account for side effects.

Willy.

Jan 1 '06 #73

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:%2****************@TK2MSFTNGP15.phx.gbl...
Willy Denoyette [MVP] wrote:
"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:uu**************@TK2MSFTNGP11.phx.gbl...
The original benchmark included the overflows. Your modification to
prevent the overflow makes it invalid, IMO. Overflows can sometimes be a
desired effect, for example when calculating checksums. Changing the type
of x to long introduces another problem - 64-bit addition, which is
non-trivial on 32-bit platforms.

A desired effect, right, but not in a benchmark that was meant to measure
nested loop performance. The hit taken by the 64-bit add is largely compensated
for by the absence of the overflow handling the CPU's microcode performs in the
original code. Note also that the function using a long has 3 more instructions
than the int version (14 vs. 11), but this doesn't say much about the performance
deltas on super-scalar CPUs like the P4 and AMD.
The result of the calculation is not important in the benchmark, IMO. What
is important is the time the calculation takes to complete.

If the results don't matter, why perform a calculation that yields a plainly
wrong result at all? I'm sorry, but I disagree.

The performance hit of adding to a 64-bit integer should be much higher than
any hit caused by an overflow that happens only once in a while.

Do you have any idea of the number of overflows? Did you run the code with
/checked while counting the number of exceptions? Guess not; it would take
considerable time (minutes? hours?) to do it.
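(As an aside, a rough count doesn't actually need /checked+: since the addend
a+b+c+d+e+f is never negative, a wraparound shows up as the accumulator getting
smaller. A throwaway sketch of that idea - mine, not part of the benchmark:)

// Hypothetical helper (not benchmark code): counts int wraparounds without /checked+.
// Because the addend is always >= 0, an unchecked overflow makes x drop below its old value.
public static void CountOverflows(int n)
{
    int x = 0;
    long wraps = 0;

    for (int a = 0; a != n; a++)
        for (int b = 0; b != n; b++)
            for (int c = 0; c != n; c++)
                for (int d = 0; d != n; d++)
                    for (int e = 0; e != n; e++)
                        for (int f = 0; f != n; f++)
                        {
                            int before = x;
                            x += a + b + c + d + e + f;   // unchecked (the C# default)
                            if (x < before)               // non-negative addend, so a drop means a wrap
                                wraps++;
                        }

    Console.WriteLine("wraparounds: " + wraps + ", final x: " + x);
}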
Your function has a better loop algorithm, but still incurs the hit
because of the overflow on the int result x. Note that all modern x86
CPUs (that I know of) take a (tiny) hit on an integer arithmetic overflow.

Not sure of that, but not sure it isn't true either... But arithmetic
overflows are desirable in many cases anyway.
- If you compile with /checked+, your code will throw.

I still think that the overflow is desired in this benchmark, so I would
wrap the *whole* benchmarking code in an unchecked { } block. /checked+
hits performance quite a lot.


If it were desired, it should have been mentioned in the benchmark's accompanying
text, and as far as I can see it's not.
IMO it's just an oversight.

- Following is the result of another run where x in nl1 is changed into a
long.
Note also that the result (value of x) is correct now!

Start C# benchmark
Nested Loop elapsed time: 9983 ms - 479232000000
Nested Loop elapsed time: 9998 ms - 479232000000
End C# benchmark


Interesting, when I change the x to long, I get 25437 ms execution time.
This clearly shows the difference between 32 bit and 64 bit arithmetic.
Are you sure you don't have an x64 machine? That would explain the
difference between our results perfectly.


No, I'm running on 32-bit hardware. Why would a 32-bit ADD be slower than a 64-bit
ADD on x64 anyway?
I ran the benchmark on a PIII, a P4 (the results above), a Pentium M and, to please
you, also on an AMD64.

To illustrate what happens, consider the following results, run on a Pentium M
1.6GHz, XP SP2, Framework 2.0.50727. I took a slower machine to better
illustrate the differences; note that a PIII, while slower, shows
the same pattern.
---------- original code ----------
int  /o-    Nested Loop elapsed time: 32136 ms - -1804337152     108%
int  /o+    Nested Loop elapsed time: 29772 ms - -1804337152     100%  (reference value)
long /o-    Nested Loop elapsed time: 24595 ms - 479232000000     83%
long /o+    Nested Loop elapsed time: 20789 ms - 479232000000     70%

---------- optimized loop ----------
int  /o-    Nested Loop elapsed time: 17204 ms - -1804337152      58%   107%
int  /o+    Nested Loop elapsed time: 16073 ms - -1804337152      54%   100%
long /o-    Nested Loop elapsed time: 21230 ms - 479232000000     71%   132%
long /o+    Nested Loop elapsed time: 20219 ms - 479232000000     68%   126%

(The first percentage is relative to the original-code int /o+ reference run; the
second, for the optimized loop, is relative to the optimized int /o+ run.)
As you can see, the original benchmark (see the reference value) is slower than
the same code using a long result. That means that using a long in the
"original code" loop compensates for the overflow hit.
Inspecting the JIT-generated code for the inner loop gives 11 instructions for
the int-result version vs. 14 instructions for the long-result version, so
theoretically the long version should be slower (but it isn't). NOTE: your
mileage may vary depending on hardware and operating conditions (beware CPU
clock throttling!).
Your hand-optimized loop is in general faster than the original code,
but it slows down when using a long result.
Your modified loop compensates for the overflow hit by using a better algorithm
at the source level; that is, the number of x86 ADDs resulting from
your algorithm is lower than in the original inner loop. Because of
this, the result variable can be register allocated, while in the original
code the result must be loaded from the stack before each addition (x += ...)
and saved back to the stack afterwards, which takes a huge hit.
[Note that your inner loop is 12 instructions long!]

Your code using a long takes a small hit (2 extra x86 instructions) because of the
64-bit integer addition on 32-bit hardware, but this hit is much smaller than the
figures suggest. What happens is that the JIT generates less efficient
code for the whole function (especially the inner loop) when a long
result is used (note that this isn't the case with the original code). The register
(eax) can no longer be used to hold the result variable, so now your
code takes the load/store from the stack just like the original code (both
for int and long).
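To make the register-allocation point concrete, here is one way (my own sketch, not
any of the posted versions) to keep a 64-bit total while still giving the JIT a plain
int accumulator it can keep in a register inside the hot loop:

// Sketch only: int accumulator in the innermost loop, one 64-bit add per inner loop.
public static long NestedSum(int n)
{
    long x = 0;

    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                for (int d = 0; d < n; d++)
                    for (int e = 0; e < n; e++)
                    {
                        int baseSum = a + b + c + d + e;   // invariant across the innermost loop
                        int inner = 0;                     // small int local, easy to enregister
                        for (int f = 0; f < n; f++)
                            inner += baseSum + f;          // int-only work in the hot loop
                        x += inner;                        // only one long addition per n iterations
                    }

    return x;   // 479232000000 for n = 40 (inner never overflows at that size)
}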

I hope this clears things up a bit. Anyway, I have no ambition to spend more
time on this, nor to start another argument; if you really want to know
what's happening, run this silly thing under a native debugger, watch
the JIT-generated code, and you will understand.
PS. I ran this on X64 AMD64 VISTA build 5270 with v2.0.50727.52 (note 52 is
the latest framework build for Vista)

Original code using int resp. long as result
Nested Loop elapsed time: 16370 ms - -1804337152
Nested Loop elapsed time: 10053 ms - 479232000000

Hand-tweaked code using int resp. long as result
Nested Loop elapsed time: 16292 ms - -1804337152
Nested Loop elapsed time: 10194 ms - 479232000000

Notice that this is beta code, so this further reduces the value of the
results obtained.

Willy.
Jan 1 '06 #74

"Jesse McGrew" <jm*****@gmail.com> wrote in message
news:11*********************@o13g2000cwo.googlegroups.com...
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse


Actually, the current C standard includes long long; I read this several years
ago. The problem is not whether the language has a 64-bit integer type; the
problem is making C# use a 64-bit long on a 32-bit platform, while letting C++
use a 32-bit long on that same platform.
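For the record, C# fixes its integer sizes in the language itself, independent of the
platform, which is exactly why the choice of accumulator type matters here; a trivial
check (sizeof on the built-in types needs no unsafe context):

// int and long have fixed sizes in C#, on 32-bit and 64-bit platforms alike.
using System;

class TypeSizes
{
    static void Main()
    {
        Console.WriteLine("int  : " + sizeof(int) + " bytes");    // always 4 (System.Int32)
        Console.WriteLine("long : " + sizeof(long) + " bytes");   // always 8 (System.Int64)
    }
}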
Jan 1 '06 #75

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:eq**************@TK2MSFTNGP09.phx.gbl...
Jesse McGrew wrote:
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse

Sure it does, but as I've said before, the long addition would be by far the
most expensive thing in the benchmark. The benchmark is supposed to measure
performance of nested loops, not 64-bit addition.


I don't think it would be the most expensive thing, although it is far more
expensive than native, architecture-sized arithmetic.
Jan 1 '06 #76

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:u2**************@TK2MSFTNGP11.phx.gbl...

"Stefan Simek" <si**********@triaxis.nospam.sk> wrote in message
news:eq**************@TK2MSFTNGP09.phx.gbl...
Jesse McGrew wrote:
C++ does have a (nonstandard) 64-bit long. GNU calls it "long long"; I
think MS calls it __int64.

Jesse

Sure it does, but as I've said before, the long addition would be by far the
most expensive thing in the benchmark. The benchmark is supposed to measure
performance of nested loops, not 64-bit addition.


That's your assumption, and maybe it was supposed to be so, but it misses the
point: what takes most of the time are the additions in the innermost loop,
not the nesting of the for loops.
The original benchmark runs the innermost loop 409,600,000 times and performs
7 adds per iteration; you can hardly call this a looping benchmark.
The calculation was only added to prevent aggressive optimization of empty loops;
that is why the author included an operation that uses all of the variables in
the loop.
Again, as I stated in my first reply, these kinds of micro-benchmarks
(originating from C) have no real value: they are misleading, they produce
wrong results (ever asked yourself why the output includes the value of x?), and
they don't account for side effects.

Willy.


Although such benchmarks may be misleading in some cases, in my case,
where my critical 100-line function does little more than looping, it is a valid
benchmark. Then again, if most of the time is spent on the addition, it might not
even be valid in my case.
Jan 1 '06 #77
On Sun, 1 Jan 2006 16:55:14 +0100, "Willy Denoyette [MVP]"
<wi*************@telenet.be> wrote:

<snip>

Please don't edit subject headers unless you intend to start a new
thread. Otherwise the replies become disconnected from the main thread and make
it confusing to follow the original discussion.

Many thanks. [8-)

Ken Wilson
Seeking viable employment in Victoria, BC
Jan 5 '06 #78

This discussion thread is closed

Replies have been disabled for this discussion.
