Bytes | Developer Community

How about this syntactic candy?

Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time, so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.
Nov 25 '07 #1
37 replies, 1875 views
Hilton wrote:
for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time, so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve performance
(or at least remove the performance hit). Thoughts?
I don't like it.

I think you should write the code as:

for (int i = 0; i < list.Count; i++)

which is the most readable code and leave the optimization
to the compiler (csc & clr jit).

They should optimize that call away, not the programmer.

(I don't know if they already do)

Arne


Nov 25 '07 #2
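A hedged aside on the array case (my sketch, not from the thread): the JIT specifically recognizes the `i < arr.Length` pattern on plain arrays and can then elide per-element bounds checks, so hand-hoisting Length into a local can actually defeat an optimization:

```csharp
using System;

public class ArrayLoop
{
    // The canonical pattern: keeping "i < data.Length" inline lets the
    // JIT prove the index is in range and skip per-element bounds checks.
    public static int Sum(int[] data)
    {
        int total = 0;
        for (int i = 0; i < data.Length; i++)
        {
            total += data[i];
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new int[] { 1, 2, 3, 4 }));  // 10
    }
}
```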
I agree with Arne. Do you actually *know* that list.Count gets evaluated
each time and the end result is *actually* slower?

I don't claim to know, but suspect there is a good chance the optimisation
will sort it all out anyway. I once saw a demo of a for loop all compressed
into one line with ingenious shortcuts and contractions, and after the
optimising compiler had been through it, it generated EXACTLY the same
machine code as an alternative loop written out in full.

In fact, I vaguely remember someone explaining that code which is highly
optimised by the writer can even make the optimising compiler produce
*worse* code than a simple, but verbose, version of the algorithm. I can't
remember the product, unfortunately. Could it have been an old Delphi?

I've always believed that optimising at the written code stage is:

1/ undesirable, because it makes the code much harder to understand, debug
and maintain

2/ unlikely to work anyway

SteveT

Nov 25 '07 #3
Arne Vajhøj wrote:
Hilton wrote:
>for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each
time, so we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve
performance (or at least remove the performance hit). Thoughts?

I don't like it.

I think you should write the code as:

for (int i = 0; i < list.Count; i++)

which is the most readable code and leave the optimization
to the compiler (csc & clr jit).

They should optimize that call away, not the programmer.

(I don't know if they already do)
I made a small test. It seems as if the overhead of having
.Count in the for loop is about 0.2 nanoseconds per
iteration on my 3 year old PC. That is very little.

Arne
Nov 25 '07 #4
"Hilton" wrote:
Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time, so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.
What should a compiler say about this code (assuming all control loops would
be enhanced):

while (const list.Count > 0)
{
list.RemoveAt(0);
}

Basically, the while control could be thought of as valid, as the const
evaluates to true or false at runtime, so no compiler error. The line that
removes an item "violates" the const condition, so is this flagged as a
compiler error or a runtime error or neither, where the last case leads to
debugging nightmares?

Just a first reaction. I agree with the others, that in a for loop, the
readable version is probably optimized.

Nov 25 '07 #5
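To make that hazard concrete, here is a sketch (mine, not Mike's) of how a hand-hoisted Count, which is all the proposed "const" would amount to, indexes past a list that the loop body shrinks:

```csharp
using System;
using System.Collections.Generic;

public class CachedCountBug
{
    // Returns true if hoisting Count ahead of a mutating loop blew up.
    public static bool StaleBoundThrows()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };
        int cached = list.Count;              // the implicit "const" promise
        try
        {
            for (int i = 0; i < cached; i++)
            {
                if (list[i] % 2 == 0)
                    list.RemoveAt(i);         // Count shrinks; cached does not
            }
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;                      // indexed past the shrunken list
        }
    }

    public static void Main()
    {
        Console.WriteLine(StaleBoundThrows() ? "stale bound threw" : "no error");
    }
}
```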

"Family Tree Mike" <Fa************@discussions.microsoft.com> wrote in
message news:90**********************************@microsoft.com...
"Hilton" wrote:
>Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time,
so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve
performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.

What should a compiler say about this code (assuming all control loops
would
be enhanced):

while (const list.Count > 0)
{
list.RemoveAt(0);
}
Well, this is a bug and not what I was proposing anyway. My suggestion was
(only) to be used in "for" loops.

Basically, the while control could be thought of as valid, as the const
evaluates to true or false at runtime, so no compiler error. The line
that
removes an item "violates" the const condition, so is this flagged as a
compiler error or a runtime error or neither, where the last case leads to
debugging nightmares?
Even if it was allowed here, it definitely cannot be evaluated at compile
time, but before we go off on this tangent, I was only proposing this for a
"for" loop.

Just a first reaction. I agree with the others, that in a for loop, the
readable version is probably optimized.
I suggest you read:
http://msdn2.microsoft.com/en-us/library/ms998547.aspx
"Note that if these are fields, it may be possible for the compiler to do
this optimization automatically. If they are properties, it is much less
likely. If the properties are virtual, it cannot be done automatically."

Hilton
Nov 25 '07 #6
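That quoted point can be demonstrated with a small instrumented sketch (the `CountingList` type here is made up for illustration): a virtual Count getter really does run on every loop test, while hand-hoisting calls it exactly once:

```csharp
using System;

public class CountingList
{
    private readonly int count;
    public int GetterCalls;                       // instrumentation
    public CountingList(int c) { count = c; }
    // virtual, so the compiler cannot simply inline it away
    public virtual int Count { get { GetterCalls++; return count; } }
}

public class PropertyCost
{
    public static int CallsWithLiveCount(int size)
    {
        var list = new CountingList(size);
        for (int i = 0; i < list.Count; i++) { }  // getter runs on every test
        return list.GetterCalls;                  // size passes + 1 failing test
    }

    public static int CallsWithHoistedCount(int size)
    {
        var list = new CountingList(size);
        int n = list.Count;                       // hoisted by hand
        for (int i = 0; i < n; i++) { }
        return list.GetterCalls;                  // always 1
    }

    public static void Main()
    {
        Console.WriteLine(CallsWithLiveCount(5));    // 6
        Console.WriteLine(CallsWithHoistedCount(5)); // 1
    }
}
```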
Arne wrote:
I made a small test. It seems as if the overhead of having
.Count in the for loop is about 0.2 nanoseconds per
iteration on my 3 year old PC. That is very little.
That information by itself is meaningless. Did the loop run once or a
billion times? What were you doing in the loop? Was it entirely optimized
out by the compiler? etc etc etc...

I'm not saying that the call time is insignificant as you suggest, just that
what you wrote does not tell the full story.

Hilton
Nov 25 '07 #7
Hilton wrote:
Arne wrote:
>I made a small test. It seems as if the overhead of having
.Count in the for loop is about 0.2 nanoseconds per
iteration on my 3 year old PC. That is very little.

That information by itself is meaningless. Did the loop run once or a
billion times? What were you doing in the loop? Was it entirely optimized
out by the compiler? etc etc etc...
10 billion times.

A single if statement, but I would not expect the overhead per
iteration to change due to the content of the loop.

It was obviously not entirely optimized out.

Arne
Nov 25 '07 #8
On 2007-11-24 18:39:31 -0800, "Steve Thackery" <no****@nowhere.com> said:
I agree with Arne. Do you actually *know* that list.Count gets
evaluated each time and the end result is *actually* slower?
list.Count had _better_ get evaluated each time. There's nothing about
C# that suggests that it should assume the value in the loop continue
condition won't change. Unless the code in the loop is so simple that
it is guaranteed that the list.Count property won't change during the
execution of the loop, optimizing the expression to avoid reevaluating
list.Count would be wrong.
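A sketch of the flip side of this point: some loops are only correct because the bound is re-read on every pass, so a "const" hint, or a hand-hoisted local, would turn them into bugs:

```csharp
using System;
using System.Collections.Generic;

public class LiveCount
{
    // Remove nulls in place; the loop is only correct because
    // list.Count is re-evaluated after every RemoveAt.
    public static int Compact(List<string> list)
    {
        for (int i = 0; i < list.Count; i++)   // live bound, on purpose
        {
            if (list[i] == null)
            {
                list.RemoveAt(i);
                i--;                            // re-examine the shifted slot
            }
        }
        return list.Count;
    }

    public static void Main()
    {
        var items = new List<string> { "a", null, "b", null, null, "c" };
        Console.WriteLine(Compact(items));  // 3
    }
}
```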
I don't claim to know, but suspect there is a good chance the
optimisation will sort it all out anyway. I once saw a demo of a for
loop all compressed into one line with ingenious shortcuts and
contractions, and after the optimising compiler had been through it, it
generated EXACTLY the same machine code as an alternative loop written
out in full.
There may be a situation where the compiler cannot tell that list.Count
is constant, but where it actually is. In that case, I could see the
utility in providing the compiler with an explicit statement to that
effect, but it seems to me that the current method required is so easy
and simple, I can't imagine the point in adding something extra to the
language to support the behavior.

One thing I like very much about C# is its simplicity. I think people
can get carried away trying to add all their favorite "syntactical
sugar" to the language, weighing it down with all sorts of stuff that
just makes things harder on the compiler, the spec writers, and the
developers. Things that are added to the language should be _really_
important and provide some significant functionality. I don't think
this example applies.
In fact, I vaguely remember someone explaining that code which is
highly optimised by the writer can even make the optimising compiler
produce *worse* code than a simple, but verbose, version of the
algorithm. I can't remember the product, unfortunately. Could it have
been an old Delphi?
I think that's a general truth, regardless of language. Optimizing
compilers often can take advantage of commonly used code structures.
If you write code that's "sneaky", where you've tried to create some
optimization outside of the compiler, it's possible that the compiler
could fail to be able to actually generate optimal code.

I would expect that to be a rare situation. Compiler technology has
gotten extremely good. But I wouldn't rule it out.

The point is somewhat moot anyway. After all, you shouldn't be
optimizing code that doesn't need optimizing anyway. Write it readable
first, optimize if and when there's a specific performance issue that
needs to be fixed.
I've always believed that optimising at the written code stage is:

1/ undesirable, because it makes the code much harder to understand,
debug and maintain

2/ unlikely to work anyway
I agree. :)

Pete

Nov 25 '07 #9
On 2007-11-24 20:52:55 -0800, "Hilton" <no****@nospam.com> said:
>What should a compiler say about this code (assuming all control loops
would be enhanced):

while (const list.Count > 0)
{
list.RemoveAt(0);
}

Well, this is a bug and not what I was proposing anyway. My suggestion was
(only) to be used in "for" loops.
Well, then what would happen if you wrote this:

for ( ; const list.Count > 0; list.RemoveAt(0));

Not a syntax I'd prefer, but perfectly legal assuming "const" is a
legal keyword in a for() statement.

When you write "const" what is the compiler supposed to assume is
constant? What happens if it's not actually constant?
[...]
>Just a first reaction. I agree with the others, that in a for loop, the
readable version is probably optimized.

I suggest you read:
http://msdn2.microsoft.com/en-us/library/ms998547.aspx
"Note that if these are fields, it may be possible for the compiler to do
this optimization automatically. If they are properties, it is much less
likely. If the properties are virtual, it cannot be done automatically."
Well, it all depends. I do agree that properties are more difficult.
But in many cases they will be simple retrievals of a field, and if
it's a class for which the property can be inlined, it can be optimized
just as a field could be.

Given that the programmer would still be responsible for identifying
constant expressions manually anyway, and given the fact that this new
use of the "const" keyword would add complications for the compiler
even as it allows for one specific kind of optimization, it would be
better to just let the programmer take advantage of the language as it
exists now rather than adding a new syntax to it.

Pete

Nov 25 '07 #10
Did you use an array or a list? Anyway, here are the results on my machine
and the results are significant performance-wise:

foreach takes 3.7 seconds
list.Count takes 1.6 seconds
const list.Count takes 0.88 seconds

So, even if "const" never gets adopted, using a temporary variable
(especially for simple loops) seems to give a 2x speedup. The temporary
variable is the same as my proposed const and it is exactly what the
compiler would do anyway. I bet that most for loops loop to a constant and
that we're all losing a lot of efficiency by using "i < list.Count" in the
for condition.

Here is the code - I used an ArrayList so the length wasn't fixed as with an
array.

using System;
using System.Collections;

namespace Test
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            int LOOP_COUNT = 100000;

            IList list = new ArrayList ();

            for (int i = 0; i < 1024; i++)
            {
                list.Add (string.Empty);
            }

            int x = 0;

            DateTime d0 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                foreach (object o in list)
                {
                    if (o == null) // if (IsNull (o, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d1 = DateTime.Now;

            DateTime d2 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                for (int i = 0; i < 1024; i++)
                {
                    if (list [i] == null) // if (IsNull (i, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d3 = DateTime.Now;

            DateTime d4 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                for (int i = 0; i < list.Count; i++)
                {
                    if (list [i] == null) // if (IsNull (i, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d5 = DateTime.Now;

            Console.WriteLine ((d1 - d0).TotalSeconds.ToString ("F3"));
            Console.WriteLine ((d3 - d2).TotalSeconds.ToString ("F3"));
            Console.WriteLine ((d5 - d4).TotalSeconds.ToString ("F3"));
            Console.WriteLine (x);
        }

        static bool IsNull (object o, IList list)
        {
            return o == null;
        }

        static bool IsNull (int i, IList list)
        {
            return list [i] == null;
        }
    }
}
Nov 25 '07 #11
Peter Duniho wrote:
On 2007-11-24 20:52:55 -0800, "Hilton" <no****@nospam.com> said:
>>What should a compiler say about this code (assuming all control loops
would be enhanced):

while (const list.Count > 0)
{
list.RemoveAt(0);
}

Well, this is a bug and not what I was proposing anyway. My suggestion was
(only) to be used in "for" loops.

Well, then what would happen if you wrote this:

for ( ; const list.Count > 0; list.RemoveAt(0));

Not a syntax I'd prefer, but perfectly legal assuming "const" is a legal
keyword in a for() statement.

When you write "const" what is the compiler supposed to assume is
constant? What happens if it's not actually constant?
I never claimed to develop a syntax that prevented buggy code being written.
:) Anyway, your example would not compile - you changed my suggestion. I
suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
compiler writers could add this very very simply.

C# and other languages are full of syntactic candy. e.g. A ? B : C. What
about using foreach to loop through stuff even when it is MUCH slower than a
simple for loop? My suggestion speeds things up 2x (see my other post), is
easy to read, and adds no new keywords. For the record, I know the exact
same thing can be done with a temporary variable, but IMHO, "const" is much
easier to read.

Thanks,

Hilton
Nov 25 '07 #12
Hilton <no****@nospam.com> wrote:
Did you use an array or a list? Anyway, here are the results on my machine
and the results are significant performance-wise:

foreach takes 3.7 seconds
list.Count takes 1.6 seconds
const list.Count takes 0.88 seconds

So, even if "const" never gets adopted, using a temporary variable
(especially for simple loops) seems to give a 2x speedup. The temporary
variable is the same as my proposed const and it is exactly what the
compiler would do anyway. I bet that most for loops loop to a constant and
that we're all losing a lot of efficiency by using "i < list.Count" in the
for condition.
"A lot of efficiency"? Only if you habitually write loops where the
dominant factor is the looping part itself.

In most code this would be lost in the noise - and where possible, I'd
use foreach despite it being (gasp!) 4 times as slow in your test.

Micro-optimise where you have a bottleneck, not before.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 25 '07 #13
Now all you have to do is find an application where this "performance
hit" is measurable.

A good program is one which is readily readable and easy to change. If
you are going to waste your time trying to optimize source code in this
manner, you will never get anything done.

While this loop of yours is running, windows would be processing about
10 zillion messages, writing to the monitor, processing interrupts,
executing other programs, etc.
-----Original Message-----
From: Hilton [mailto:no****@nospam.com]
Posted At: Sunday, 25 November 2007 10:23 AM
Posted To: microsoft.public.dotnet.languages.csharp
Conversation: How about this syntactic candy?
Subject: How about this syntactic candy?

Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each
time,
so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve
performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.

Nov 25 '07 #14
On 2007-11-24 22:52:24 -0800, "Hilton" <no****@nospam.com> said:
I never claimed to develop a syntax that prevented buggy code being written.
What's the point of "const" here if it's not enforced? The only thing
I can think of that's worse than adding a rarely needed, rarely
beneficial syntax is adding one that actually allows one to lie to the
compiler in a way that _breaks_ the code.
:) Anyway, your example would not compile - you changed my suggestion. I
suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
compiler writers could add this very very simply.
What's the difference? Are you proposing that the only legal variation
of the syntax would be "[comparison operator] const"? If so, then I
submit that's an even more limited addition (and thus unwarranted) than
I'd originally thought you were talking about.

If not, then perhaps you could be more clear about what it is exactly
you're proposing. So far, you haven't been very specific.
C# and other languages are full of syntactic candy.
I never said there's never any use for it. But it should add real
value, rather than being some sort of gimmick.
e.g. A ? B : C.
Well, first of all, lots of languages do okay without that. But more
significantly, IMHO, the reason that syntax has survived so many
generations of this family of languages is that it's so commonly
useful. It _significantly_ abbreviates the code, but in a way that's
still quite readable. More important, it provides a syntax to do
something that would simply not be practical any other way.

The primary alternative would be to write a function to perform the
test. It's not just some abbreviated way to write something the
language supports via other means. It provides a way of representing
an expression with a conditional value, without the overhead of having
to call a function.

In C you could hack it together by taking advantage of a boolean
expression evaluating to 0 or 1, but it would be significantly _less_
readable that way, and much more complicated to write. In C# it gets
even worse, because you'd have to get around the language's reluctance
to treat the bool type as a numerical type in the first place.

IMHO, your suggestion doesn't even come close to solving the kind of
issues that an expression like "A ? B : C" solves.
What
about the foreach to loop through stuff even when it is MUCH slower than a
simple for loop.
What about it? It's true, IEnumerable is slower than the usual simple
mechanisms available via an indexing for() loop. So what? For one,
there's a real code-correctness and maintainability value in having a
typed enumeration of some collection, and for another it would be
unusual for the speed difference to have any real significance in your
code.

And in those rare cases when it does, then fine...write it as an
indexed for() loop instead.

But first of all, don't waste time optimizing until you have shown
there's a need. Sound familiar? And secondly, what's foreach() got to
do with this thread? If you're using foreach(), you're not going to be
able to use "const" as you suggest anyway.
My suggestion speeds things up 2x (see my other post),
It speeds things up 2x in a very specific test, and a degenerate one at
that. I doubt it would have anywhere close to that kind of difference
in code that actually did something interesting.
is easy to read,
That's definitely an "eye of the beholder" thing.
and adds no new keywords.
It does add a new interpretation of an existing keyword, and especially
in the context of C# it's a fairly orthogonal interpretation at that
("const" not having all of the same uses in C# that it has in C++).
For the record, I know the exact
same thing can be done with a temporary variable, but IMHO, "const" is much
easier to read.
Again, "eye of the beholder". I find that it obscures a key detail
with respect to the implementation, one that's obvious and hard-coded
rather than assumed when a local variable is used.

In any case, when you write code that copies a value to a local, you
are _guaranteeing_ the compiler that the value won't change. No such
guarantee is made if you insert "const" in the condition for the loop,
and that's a definite code-correctness problem.

I will pick maintainable and correct code over fast every single time. YMMV.

Pete

Nov 25 '07 #15
On 2007-11-25 01:01:26 -0800, Peter Duniho <Np*********@NnOwSlPiAnMk.com> said:
[...]
In any case, when you write code that copies a value to a local, you
are _guaranteeing_ the compiler that the value won't change. No such
guarantee is made if you insert "const" in the condition for the loop,
and that's a definite code-correctness problem.
And just to be clear, in case the way I've put this was too ambiguous:

I do understand that you can still wind up writing a bug even if you
copy the variable to a local. My point is that the "const" keyword is
traditionally used in places where the compiler can actually ensure the
"const"-ness of the affected identifier, which can't be done here. As
well, using a local variable would IMHO make such a bug much more
obvious than something that _looks_ like it's more than just a hint to
the compiler.

Pete

Nov 25 '07 #16
For info, I tweaked your sample to use a Stopwatch, and rather than
hard-coding the 1024 in the middle test, I used:

int j = list.Count;
for (int i = 0; i < j; i++) {...}

since this is what you would need to do anyway (we don't know the size
at compile-time). My results don't reproduce your 1.6 => 0.88
observations (I'm using release mode, C# 3, .NET 2.0, command line (no
debugger)):

6047316
1400371
1693672

(numbers are ticks)

OK, it is faster (between 2 & 3) but not by as much as you hinted.

Marc

Nov 25 '07 #17
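For anyone reproducing this, a Stopwatch harness along the lines Marc describes might look like the following (a sketch with the pass count shrunk, not his actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class LoopBench
{
    const int Passes = 1000;

    // Runs both loop variants, prints the tick counts, and returns the
    // total number of nulls seen so the two loops can be cross-checked.
    public static int Run()
    {
        var list = new List<string>(new string[1024]);   // 1024 nulls
        int x = 0;

        var sw = Stopwatch.StartNew();
        for (int p = 0; p < Passes; p++)
            for (int i = 0; i < list.Count; i++)         // Count re-read each test
                if (list[i] == null) x++;
        sw.Stop();
        long live = sw.ElapsedTicks;

        sw = Stopwatch.StartNew();
        for (int p = 0; p < Passes; p++)
        {
            int n = list.Count;                          // read once per pass
            for (int i = 0; i < n; i++)
                if (list[i] == null) x++;
        }
        sw.Stop();
        long hoisted = sw.ElapsedTicks;

        Console.WriteLine("live={0} ticks, hoisted={1} ticks", live, hoisted);
        return x;                                        // 2 * Passes * 1024
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```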
Peter Duniho wrote:
On 2007-11-24 22:52:24 -0800, "Hilton" <no****@nospam.com> said:
>I never claimed to develop a syntax that prevented buggy code being
written.

What's the point of "const" here if it's not enforced? The only thing I
can think of that's worse than adding a rarely needed, rarely beneficial
syntax is adding one that actually allows one to lie to the compiler in a
way that _breaks_ the code.
What are you talking about? Lying? Breaking the code?

>:) Anyway, your example would not compile - you changed my suggestion.
I
suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
compiler writers could add this very very simply.

What's the difference? Are you proposing that the only legal variation of
the syntax would be "[comparison operator] const"? If so, then I submit
that's an even more limited addition (and thus unwarranted) than I'd
originally thought you were talking about.

If not, then perhaps you could be more clear about what it is exactly
you're proposing. So far, you haven't been very specific.
In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
i < const list.Count; i++)""

>C# and other languages are full of syntactic candy.

I never said there's never any use for it. But it should add real value,
rather than being some sort of gimmick.
>e.g. A ? B : C.

Well, first of all, lots of languages do okay without that. But more
significantly, IMHO, the reason that syntax has survived so many
generations of this family of languages is that it's so commonly useful.
It _significantly_ abbreviates the code, but in a way that's still quite
readable. More important, it provides a syntax to do something that would
simply not be practical any other way.
What? Come on Peter...

x = A ? B : C can easily be written with an "if". If you look at the code,
that is exactly what compilers generally do anyway.

[I've ignored formatting here]
if (A) { x = B; } else {x = C; }
[zap]
IMHO, your suggestion doesn't even come close to solving the kind of
issues that an expression like "A ? B : C" solves.
"A ? B : C" solves no 'issues' - it is pure syntactic candy.

But first of all, don't waste time optimizing until you have shown there's
a need. Sound familiar?
Well since you don't develop for the Compact Framework, you clearly don't
need to optimize. (Yes, I really am just kidding here...)

Peter, C# has added so much new stuff in the later versions that I
personally feel that C#'s readability has taken a huge nose-dive. As you
say, that is in the eye of the beer-holder, uhh, I mean beholder. I simply
suggested one little syntax change. It really isn't such a big deal.

Hilton
Nov 25 '07 #18
Although Hilton's results disprove my arguments ("instincts" would be a
better term), I want to congratulate him on actually doing some
measurements!

There's a tongue-in-cheek expression - "Don't let the facts ruin a perfectly
good theory" - which we are probably all guilty of at some time or other.
Lots of our debates in this forum are based on opinions rather than facts,
so it's a pleasant contrast to see some real evidence in support of
someone's position!

SteveT

Nov 25 '07 #19
Lew
Hilton wrote:
In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
i < const list.Count; i++)""
As many have pointed out, the problem is that list.Count is not constant. It
can change between iterations. Adding "const" to it is therefore a lie, and
breaks the code.

--
Lew
Nov 25 '07 #20
Hilton wrote:
Did you use an array or a list? Anyway, here are the results on my machine
and the results are significant performance-wise:

foreach takes 3.7 seconds
list.Count takes 1.6 seconds
const list.Count takes 0.88 seconds

So, even if "const" never gets adopted, using a temporary variable
(especially for simple loops) seems to give a 2x speedup. The temporary
variable is the same as my proposed const and it is exactly what the
compiler would do anyway. I bet that most for loops loop to a constant and
that we're all losing a lot of efficiency by using "i < list.Count" in the
for condition.
1)

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
    for (int i = 0; i < 1024; i++)
    {
        if (list [i] == null) // if (IsNull (i, list))
        {
            x++;
        }
    }
}

is not what you suggested. You suggested:

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
    int n = list.Count;
    for (int i = 0; i < n; i++)
    {
        if (list [i] == null) // if (IsNull (i, list))
        {
            x++;
        }
    }
}

2)

Somehow I think you used .NET 1.1 and not 2.0:

C:\>csc /o+ z.cs
Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.
C:\>z
2,781
0,656
1,234
0

C:\>csc /o+ z.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
C:\>z
2,344
0,719
1,172
0

the relative gap narrows a bit with a recent CLR.

3)

The relative gap is rather uninteresting.

It is still only 5 nanoseconds per iteration.

For how many apps would that have an impact on total performance?

4)

Changing:

IList list = new ArrayList ();

to:

List<string> list = new List<string> ();

gives:

1,000
0,219
0,297
0

which is still a relative 35% difference, but the absolute gap
has been reduced to 1 nanosecond per iteration.

Using a newer CLR and a newer library is much more important
than tweaking the code manually.

Arne
Nov 25 '07 #21
On Sat, 24 Nov 2007 16:22:46 -0800, "Hilton" <no****@nospam.com>
wrote:
>Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time, so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)
You might be able to help the compiler more by adding a const to the
declaration:

const int listCount = list.Count;
for (int i = 0; i < listCount; i++)

rossum
>
How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.
Nov 25 '07 #22
Lew
rossum wrote:
You might be able to help the compiler more by adding a const to the
declaration:

const int listCount = list.Count;
for (int i = 0; i < listCount; i++)
Of course, if the list size changes during the loop this will cause trouble.
This is a fine idiom for when you know the size will not change.

--
Lew
Nov 25 '07 #23
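One further wrinkle with rossum's suggestion: a `const` local in C# must be initialized with a compile-time constant, so `const int listCount = list.Count;` is rejected outright (error CS0133). The plain local is the form that works, as a sketch:

```csharp
using System;
using System.Collections.Generic;

public class ConstLocal
{
    public static int SumWithHoistedCount()
    {
        var list = new List<int> { 10, 20, 30 };

        // const int listCount = list.Count;  // CS0133: not a constant expression
        int listCount = list.Count;           // a plain local is the working idiom
        int sum = 0;
        for (int i = 0; i < listCount; i++)
        {
            sum += list[i];
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithHoistedCount());  // 60
    }
}
```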
On 2007-11-25 03:14:01 -0800, "Hilton" <no****@nospam.com> said:
>What's the point of "const" here if it's not enforced? The only thing I
can think of that's worse than adding a rarely needed, rarely beneficial
syntax is adding one that actually allows one to lie to the compiler in a
way that _breaks_ the code.

What are you talking about? Lying? Breaking the code?
If the compiler does not enforce the "const"-ness of the expression,
then it's possible for the programmer to write "const" even when it's
not. Instant bug, due to an invalid optimization.

I'm not aware of any other use of the keyword "const" where that's
possible, at least not without some explicit "unsafe" code on the part
of the programmer (like casting away the "const" attribute).
>>:) Anyway, your example would not compile - you changed my suggestion.
I
suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
compiler writers could add this very very simply.

What's the difference? Are you proposing that the only legal variation of
the syntax would be "[comparison operator] const"? If so, then I submit
that's an even more limited addition (and thus unwarranted) than I'd
originally thought you were talking about.

If not, then perhaps you could be more clear about what it is exactly
you're proposing. So far, you haven't been very specific.

In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
i < const list.Count; i++)""
Yes, but you're using a single example to try to explain some more
general rule. What is your more general rule? Both FTM and I have
proposed other examples of code that would conform to _a_ general rule
that includes your example, but you are unsatisfied with those examples.

So what _is_ your general rule that describes what a legal syntax would
be, if not one of the rules we've inferred so far?

Describe the grammar. Providing a single code example is insufficient,
because language specifications rely on grammar descriptions, not code
examples.
>>C# and other languages are full of syntactic candy.

I never said there's never any use for it. But it should add real value,
rather than being some sort of gimmick.
>>e.g. A ? B : C.

Well, first of all, lots of languages do okay without that. But more
significantly, IMHO, the reason that syntax has survived so many
generations of this family of languages is that it's so commonly useful.
It _significantly_ abbreviates the code, but in a way that's still quite
readable. More important, it provides a syntax to do something that would
simply not be practical any other way.

What? Come on Peter...

x = A ? B : C can easily be written with an "if". If you look at the code,
that is exactly what compilers generally do anyway.
Wrong.
[I've ignored formatting here]
if (A) { x = B; } else {x = C; }
The expression is "A ? B : C". Your code is not the expression, it's a
different way of achieving the assignment to the variable "x".

It's true that there is an alternative way to represent "x = A ? B :
C", but a) that's not what was originally being discussed (you've added
the assignment), and b) the new representation isn't actually the
literal equivalent of the original. The original uses a conditional
expression in a single assignment to a variable "x", where as your
"alternative" uses a conditional expression to choose between two
different code paths assigning to the variable "x".

The difference is subtle and I don't blame you for missing it. But it
does exist, and it is important.

As Jon pointed out, that is just one example of how the expression
could be used. But an if() statement isn't an expression and can't be
used where an expression can, unlike the "?:" operation. While in the
above example, you're correct that you can essentially achieve the same
thing, there are lots of examples that cannot be successfully converted
like that, and even in this case the resulting code is not necessarily
identical.
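A minimal C# sketch of that difference (illustrative names only, not code
from the thread): because "?:" is an expression it yields a value and can sit
directly inside an argument list, whereas an "if" is a statement and needs a
temporary variable and two code paths to achieve the same assignment:

```csharp
using System;

class TernaryDemo
{
    static void Main()
    {
        int n = 7;

        // The conditional operator is an expression, so it can appear
        // anywhere an expression is allowed -- here, inside an argument:
        Console.WriteLine("n is " + (n % 2 == 0 ? "even" : "odd"));

        // An "if" is a statement; the equivalent needs a temporary
        // and two separate code paths assigning to it:
        string parity;
        if (n % 2 == 0) { parity = "even"; } else { parity = "odd"; }
        Console.WriteLine("n is " + parity);
    }
}
```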
[zap]
>IMHO, your suggestion doesn't even come close to solving the kind of
issues that an expression like "A ? B : C" solves.

"A ? B : C" solves no 'issues' - it is pure syntactic candy.
Of course it solves issues. I haven't even said your suggestion
doesn't solve an issue. It just doesn't solve a very important issue,
where as the issue addressed by "A ? B : C" is quite common, and very
important in that there really is no readable, concise equivalent
without such a syntax.
>But first of all, don't waste time optimizing until you have shown there's
a need. Sound familiar?

Well since you don't develop for the Compact Framework, you clearly don't
need to optimize. (Yes, I really am just kidding here...)

Peter, C# has added so much new stuff in the latter versions that I
personally feel that C#'s readability has taken a huge nose-dive. As you
say, that is in the eye of the beer-holder, uhh, I mean beholder. I simply
suggested one little syntax change. It really isn't such a big deal.
What's not a big deal? IMHO, it'd be pretty dumb to add it to the
language and if it were added I'd say that'd be a pretty big deal.
Fortunately, I doubt those in charge of the language would ever do
something like that.

But is your suggestion itself a big deal? No...I agree, it's not.
Frankly, I have no idea why you've invested so much effort in defending
it.

It's pretty funny, actually. It seems like the people most likely to
refuse to accept any valid commentary regarding their suggestions are
the people who write things like "Thoughts?" as if they are actually
open to commentary. It appears to me that you aren't actually looking
for commentary; you're only interested in someone agreeing with you and
if they don't, well then you just don't have any use for their
contribution.

Well, good luck with that.

Pete

Nov 25 '07 #24
Jon wrote:
The changes in C# 3 actually *vastly* improve readability once you're
used to them. While the new elements such as lambda expressions are
"strangers" to you then yes, it'll be harder to read. The
expressiveness of LINQ etc is really great though, IMO. Just wait until
you've used it for a while. I get frustrated when I write C# 2 these
days...
That's what I feel like now when I write Java. Sounds corny but C# is a
very clean and neat language. For example, "person.setAge(person.getAge()+1)"
is simply "person.Age++"; that's nice and very readable.
I really haven't got into C# 2 or C# 3 at all since I'm targeting CF1
devices, which is a pity since I like some (all?) of the new stuff, but some
of it looks C++ish, which I found to be 'ugly'. Again, just my initial
impressions of generics etc. As you point out, after using it for a while,
I'd probably change my opinion on it.

Keep well,

Hilton
Nov 25 '07 #25
Arne,

Arne Vajhøj wrote:
1)

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
for (int i = 0; i < 1024; i++)
{
if (list [i] == null) // if (IsNull (i, list))
{
x++;
}
}
}

is not what you suggested. You suggested:

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
int n = list.Count;
for (int i = 0; i < n; i++)
{
if (list [i] == null) // if (IsNull (i, list))
{
x++;
}
}
}
Good point, thanks for pointing that out and running the numbers.

2)

Somehow I think you used .NET 1.1 and not 2.0:

C:\>csc /o+ z.cs
Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.
C:\>z
2,781
0,656
1,234
0

C:\>csc /o+ z.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
C:\>z
2,344
0,719
1,172
0
Very nice that two of them got faster, but the 2nd got slower - interesting,
but of course, it could just be the accuracy of the timing.

the relative gap narrows a bit with a recent CLR.
That's good to see; I wonder why, i.e. did they speed up method calls or do
some unrolling? The unrolling would cause the code to get larger and might
narrow the gap too (as we see above).

3)

The relative gap is rather uninteresting.

It is still only 5 nanoseconds per iteration.

How many apps would that have an impact on total performance ??

4)

Changing:

IList list = new ArrayList ();

to:

List<string> list = new List<string>();

gives:

1,000
0,219
0,297
0
This is interesting. I don't have the later Frameworks installed. If you
could list the IL on these that would be interesting.

Hilton
Nov 26 '07 #26
Hilton wrote:
Arne Vajhøj wrote:
>C:\>csc /o+ z.cs
Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.
C:\>z
2,781
0,656
1,234
0

C:\>csc /o+ z.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
C:\>z
2,344
0,719
1,172
0

Very nice that two of them got faster, but the 2nd got slower - interesting,
but of course, it could just accuracy of the timing.
I don't think it is accuracy. I tried with x10 more iterations. Same
picture.

But this type of benchmarks is measuring the speed of one or two simple
constructs. They are extremely sensitive to small changes to the
optimization. A real program consist of tens of thousands of simple
constructs. An enhanced optimizer may make practically all real programs
faster, but it will almost certainly make some constructs with some
data slower.

I don't think that it say much.
This is interesting. I don't have the later Frameworks installed.
You should - 1.1 is 4 years old.

Arne
Nov 26 '07 #27
Hilton wrote:
If you
could list the IL on these that would be interesting.
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
.custom instance void [mscorlib]System.STAThreadAttribute::.ctor() =
( 01 00 00 00 )
// Code size 459 (0x1cb)
.maxstack 2
.locals init (int32 V_0,
class [mscorlib]System.Collections.Generic.List`1<string> V_1,
int32 V_2,
int32 V_3,
valuetype [mscorlib]System.DateTime V_4,
int32 V_5,
object V_6,
valuetype [mscorlib]System.DateTime V_7,
valuetype [mscorlib]System.DateTime V_8,
valuetype [mscorlib]System.DateTime V_9,
valuetype [mscorlib]System.DateTime V_10,
valuetype [mscorlib]System.DateTime V_11,
bool V_12,
valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string> V_13,
valuetype [mscorlib]System.TimeSpan V_14,
float64 V_15)
IL_0000: nop
IL_0001: ldc.i4 0x186a0
IL_0006: stloc.0
IL_0007: newobj instance void class
[mscorlib]System.Collections.Generic.List`1<string>::.ctor()
IL_000c: stloc.1
IL_000d: ldc.i4.0
IL_000e: stloc.2
IL_000f: br.s IL_0023
IL_0011: nop
IL_0012: ldloc.1
IL_0013: ldsfld string [mscorlib]System.String::Empty
IL_0018: callvirt instance void class
[mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
IL_001d: nop
IL_001e: nop
IL_001f: ldloc.2
IL_0020: ldc.i4.1
IL_0021: add
IL_0022: stloc.2
IL_0023: ldloc.2
IL_0024: ldc.i4 0x400
IL_0029: clt
IL_002b: stloc.s V_12
IL_002d: ldloc.s V_12
IL_002f: brtrue.s IL_0011
IL_0031: ldc.i4.0
IL_0032: stloc.3
IL_0033: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_0038: stloc.s V_4
IL_003a: ldc.i4.0
IL_003b: stloc.s V_5
IL_003d: br.s IL_0090
IL_003f: nop
IL_0040: nop
IL_0041: ldloc.1
IL_0042: callvirt instance valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class
[mscorlib]System.Collections.Generic.List`1<string>::GetEnumerator()
IL_0047: stloc.s V_13
.try
{
IL_0049: br.s IL_006a
IL_004b: ldloca.s V_13
IL_004d: call instance !0 valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>::get_Current()
IL_0052: stloc.s V_6
IL_0054: nop
IL_0055: ldloc.s V_6
IL_0057: ldnull
IL_0058: ceq
IL_005a: ldc.i4.0
IL_005b: ceq
IL_005d: stloc.s V_12
IL_005f: ldloc.s V_12
IL_0061: brtrue.s IL_0069
IL_0063: nop
IL_0064: ldloc.3
IL_0065: ldc.i4.1
IL_0066: add
IL_0067: stloc.3
IL_0068: nop
IL_0069: nop
IL_006a: ldloca.s V_13
IL_006c: call instance bool valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>::MoveNext()
IL_0071: stloc.s V_12
IL_0073: ldloc.s V_12
IL_0075: brtrue.s IL_004b
IL_0077: leave.s IL_0088
} // end .try
finally
{
IL_0079: ldloca.s V_13
IL_007b: constrained. valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>
IL_0081: callvirt instance void
[mscorlib]System.IDisposable::Dispose()
IL_0086: nop
IL_0087: endfinally
} // end handler
IL_0088: nop
IL_0089: nop
IL_008a: ldloc.s V_5
IL_008c: ldc.i4.1
IL_008d: add
IL_008e: stloc.s V_5
IL_0090: ldloc.s V_5
IL_0092: ldloc.0
IL_0093: clt
IL_0095: stloc.s V_12
IL_0097: ldloc.s V_12
IL_0099: brtrue.s IL_003f
IL_009b: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00a0: stloc.s V_7
IL_00a2: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00a7: stloc.s V_8
IL_00a9: ldc.i4.0
IL_00aa: stloc.s V_5
IL_00ac: br.s IL_00e7
IL_00ae: nop
IL_00af: ldc.i4.0
IL_00b0: stloc.2
IL_00b1: br.s IL_00d2
IL_00b3: nop
IL_00b4: ldloc.1
IL_00b5: ldloc.2
IL_00b6: callvirt instance !0 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Item(int32)
IL_00bb: ldnull
IL_00bc: ceq
IL_00be: ldc.i4.0
IL_00bf: ceq
IL_00c1: stloc.s V_12
IL_00c3: ldloc.s V_12
IL_00c5: brtrue.s IL_00cd
IL_00c7: nop
IL_00c8: ldloc.3
IL_00c9: ldc.i4.1
IL_00ca: add
IL_00cb: stloc.3
IL_00cc: nop
IL_00cd: nop
IL_00ce: ldloc.2
IL_00cf: ldc.i4.1
IL_00d0: add
IL_00d1: stloc.2
IL_00d2: ldloc.2
IL_00d3: ldc.i4 0x400
IL_00d8: clt
IL_00da: stloc.s V_12
IL_00dc: ldloc.s V_12
IL_00de: brtrue.s IL_00b3
IL_00e0: nop
IL_00e1: ldloc.s V_5
IL_00e3: ldc.i4.1
IL_00e4: add
IL_00e5: stloc.s V_5
IL_00e7: ldloc.s V_5
IL_00e9: ldloc.0
IL_00ea: clt
IL_00ec: stloc.s V_12
IL_00ee: ldloc.s V_12
IL_00f0: brtrue.s IL_00ae
IL_00f2: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00f7: stloc.s V_9
IL_00f9: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00fe: stloc.s V_10
IL_0100: ldc.i4.0
IL_0101: stloc.s V_5
IL_0103: br.s IL_013f
IL_0105: nop
IL_0106: ldc.i4.0
IL_0107: stloc.2
IL_0108: br.s IL_0129
IL_010a: nop
IL_010b: ldloc.1
IL_010c: ldloc.2
IL_010d: callvirt instance !0 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Item(int32)
IL_0112: ldnull
IL_0113: ceq
IL_0115: ldc.i4.0
IL_0116: ceq
IL_0118: stloc.s V_12
IL_011a: ldloc.s V_12
IL_011c: brtrue.s IL_0124
IL_011e: nop
IL_011f: ldloc.3
IL_0120: ldc.i4.1
IL_0121: add
IL_0122: stloc.3
IL_0123: nop
IL_0124: nop
IL_0125: ldloc.2
IL_0126: ldc.i4.1
IL_0127: add
IL_0128: stloc.2
IL_0129: ldloc.2
IL_012a: ldloc.1
IL_012b: callvirt instance int32 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Count()
IL_0130: clt
IL_0132: stloc.s V_12
IL_0134: ldloc.s V_12
IL_0136: brtrue.s IL_010a
IL_0138: nop
IL_0139: ldloc.s V_5
IL_013b: ldc.i4.1
IL_013c: add
IL_013d: stloc.s V_5
IL_013f: ldloc.s V_5
IL_0141: ldloc.0
IL_0142: clt
IL_0144: stloc.s V_12
IL_0146: ldloc.s V_12
IL_0148: brtrue.s IL_0105
IL_014a: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_014f: stloc.s V_11
IL_0151: ldloc.s V_7
IL_0153: ldloc.s V_4
IL_0155: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_015a: stloc.s V_14
IL_015c: ldloca.s V_14
IL_015e: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_0163: stloc.s V_15
IL_0165: ldloca.s V_15
IL_0167: ldstr "F3"
IL_016c: call instance string
[mscorlib]System.Double::ToString(string)
IL_0171: call void [mscorlib]System.Console::WriteLine(string)
IL_0176: nop
IL_0177: ldloc.s V_9
IL_0179: ldloc.s V_8
IL_017b: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_0180: stloc.s V_14
IL_0182: ldloca.s V_14
IL_0184: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_0189: stloc.s V_15
IL_018b: ldloca.s V_15
IL_018d: ldstr "F3"
IL_0192: call instance string
[mscorlib]System.Double::ToString(string)
IL_0197: call void [mscorlib]System.Console::WriteLine(string)
IL_019c: nop
IL_019d: ldloc.s V_11
IL_019f: ldloc.s V_10
IL_01a1: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_01a6: stloc.s V_14
IL_01a8: ldloca.s V_14
IL_01aa: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_01af: stloc.s V_15
IL_01b1: ldloca.s V_15
IL_01b3: ldstr "F3"
IL_01b8: call instance string
[mscorlib]System.Double::ToString(string)
IL_01bd: call void [mscorlib]System.Console::WriteLine(string)
IL_01c2: nop
IL_01c3: ldloc.3
IL_01c4: call void [mscorlib]System.Console::WriteLine(int32)
IL_01c9: nop
IL_01ca: ret
} // end of method Class1::Main
Nov 26 '07 #28


"Peter Duniho" wrote:
>
Yes, but you're using a single example to try to explain some more
general rule. What is your more general rule? Both FTM and I have
proposed other examples of code that would conform to _a_ general rule
that includes your example, but you are unsatisfied with those examples.

So what _is_ your general rule that describes what a legal syntax would
be, if not one of the rules we've inferred so far?

Describe the grammar. Providing a single code example is insufficient,
because language specifications rely on grammar descriptions, not code
examples.

I stepped away for a little bit, but the discussion looks great. What
Peter wrote was what I was trying to get at describing. With one example,
the impact on the language cannot be known. You have to have multiple
people interpret how they would use the change.
Nov 26 '07 #29
Peter Duniho wrote:
>In another thread you
said that Dictionaries would just be a very little bit slower that arrays
and therefore my suggestion was not good. But, did you post any timing,
any
code, anything? No.

Would you like me to? It would be trivial to test.
Yes please. Putting aside our disagreements, I really would be interested
to see how much slower (if any) using a dictionary is than "array[x]++".

Thanks,

Hilton
Nov 26 '07 #30
On 2007-11-26 15:11:16 -0800, "Hilton" <no****@nospam.com> said:
Peter Duniho wrote:
>>In another thread you
said that Dictionaries would just be a very little bit slower that arrays
and therefore my suggestion was not good. But, did you post any timing,
any code, anything? No.

Would you like me to? It would be trivial to test.

Yes please. Putting aside our disagreements, I really would be interested
to see how much slower (if any) using a dictionary is than "array[x]++".
I'm sorry. You appear to have missed the rest of my post. Granted,
you're a big fan of your "zap", trimming anything that you'd rather not
respond to. But I already explained my position. Frankly, nothing
you've ever written in this newsgroup gives me any confidence there
will be a point in putting together such a test.

Like I said, if I thought there was a good chance you'd admit your
original mistake when presented with the numbers, I'd be happy to
invest the time (and I have in fact done so in other discussions
regarding other topics). But there's already been plenty of other
kinds of evidence suggesting a mistake in your original claim that the
dictionary solution is "inefficient", with nary a peep from you
acknowledging it.

So, are you saying that you're prepared to admit you were wrong? If
so, why haven't you acknowledged the other information that's already
been provided? What motivation do I have for writing and running the
code, given the ample evidence that already exists in support of my
statements?

Pete

Nov 27 '07 #31
Peter Duniho wrote:
On 2007-11-26 15:11:16 -0800, "Hilton" <no****@nospam.com> said:
>Peter Duniho wrote:
>>>In another thread you
said that Dictionaries would just be a very little bit slower that
arrays
and therefore my suggestion was not good. But, did you post any
timing,
any code, anything? No.

Would you like me to? It would be trivial to test.

Yes please. Putting aside our disagreements, I really would be
interested
to see how much slower (if any) using a dictionary is than "array[x]++".

I'm sorry. You appear to have missed the rest of my post. Granted,
you're a big fan of your "zap", trimming anything that you'd rather not
respond to. But I already explained my position. Frankly, nothing you've
ever written in this newsgroup gives me any confidence there will be a
point in putting together such a test.
Frankly, I don't care what you think about me or what I write, I honestly
truly don't; I care about how much slower (if any) using a dictionary is
than "array[x]++". Because you have yet to produce any code or timings to
back up what you've claimed in this thread (I and others have), other people
would tell you to "put up or shut up", but I'd rather simply ignore your
ranting posts and poll the other readers here to see if anyone has a few
minutes to write a "dictionary versus array" time trial. [I would, but I
deliberately need to keep (only) earlier versions of the .NET Framework on my
work machine.]

Like I said, if I thought there was a good chance you'd admit your
original mistake when presented with the numbers, I'd be happy to invest
the time (and I have in fact done so in other discussions regarding other
topics). But there's already been plenty of other kinds of evidence
suggesting a mistake in your original claim that the dictionary solution
is "inefficient", with nary a peep from you acknowledging it.

So, are you saying that you're prepared to admit you were wrong?
Yes, I always am when given cold, hard facts. The nice thing about being
proven wrong is that you learn something.

Hilton
Nov 27 '07 #32
On Nov 27, 8:37 am, "Hilton" <nos...@nospam.com> wrote:
Frankly, I don't care what you think about me or what I write, I honestly
truly don't; I care about how much slower (if any) using a dictionary is
than "array[x]++". Because you have yet to produce any code or timings to
back up what you've claimed in this thread (I and others have), other people
would tell you to "put up or shut up", but I'd rather simply ignore your
ranting posts and poll the other readers here to see if anyone has a few
minutes to write a "dictionary versus array" time trial. [I would, but I
deliberately need to keep (only) earlier versions of the .NET Framework on my
work machine.]
Okay, here's a fairly quick bit of code. I don't like it because it
uses mutable structs for your array-based version, but that's what you
were suggesting. I haven't optimised the dictionary code at all.

The program takes three parameters:
1) The number of iterations to use
2) The range of values (e.g. 100 means all values are in the range
0-99)
3) The number of samples

Code is at the bottom, here are the results:
test 100000 100 100 // Range=sample size
ArrayCountSorter: 4207
DictionaryCountSorter: 4020

test 100000 100 1000 // 10 times as many samples as range
ArrayCountSorter: 8625
DictionaryCountSorter: 15900

test 10000 100 10000 // 100 times as many samples as range
ArrayCountSorter: 5053
DictionaryCountSorter: 11486

test 10000 10 100000 // 10,000 times as many samples as range
ArrayCountSorter: 45711
DictionaryCountSorter: 109260

test 100000 1000 100 // 1/10 as many samples as range
ArrayCountSorter: 36110
DictionaryCountSorter: 4999

test 10000 10000 100 // 1/100 as many samples as range
ArrayCountSorter: 50147
DictionaryCountSorter: 521

I've conducted more tests, but here's the gist of the results:
1) When the range ~= sample size, the two algorithms are roughly
equal, with the dictionary approach just about winning
2) As the sample size increases without changing the range, the array
approach is better than the dictionary approach, but by *about* a
factor of 2 (increasing slightly, but not dramatically) as the
proportion changes
3) As the range size increases without changing the sample size, the
dictionary approach is *much, much* better.

Given a choice between "twice as fast in certain situations" and
"scales well and is never *very* slow" I know which I'd say is a
better *general* option (given no further information).

Oh, and the dictionary approach has the added benefit of being generic
across types, so you could use it for string-based samples etc with
*no* code changes :)
Here's the code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

struct Pair
{
public int Key;
public int Value;

public Pair(int key)
{
Key = key;
Value = 0;
}
}

public class Test
{
const int Iterations = 100000;
const int MaxValue = 100;
const int DataSize = 100;

static void Main(string[] args)
{
int iterations = int.Parse(args[0]);
int range = int.Parse(args[1]);
int samples = int.Parse(args[2]);

Stopwatch sw = Stopwatch.StartNew();
for (int i=0; i < iterations; i++)
{
ArrayCountSorter sorter = new ArrayCountSorter(range);
sorter.CountAndSort(CreateData(range, samples));
}
sw.Stop();
Console.WriteLine ("ArrayCountSorter: {0}",
sw.ElapsedMilliseconds);

sw = Stopwatch.StartNew();
for (int i=0; i < iterations; i++)
{
DictionaryCountSorter<int> sorter = new
DictionaryCountSorter<int>();
sorter.CountAndSort(CreateData(range, samples));
}
Console.WriteLine ("DictionaryCountSorter: {0}",
sw.ElapsedMilliseconds);
sw.Stop();
}

static IEnumerable<int> CreateData(int range, int samples)
{
// Always use the same seed - always get the same data
Random rng = new Random(0);
for (int i=0; i < samples; i++)
{
yield return rng.Next(range);
}
}
}

class ArrayCountSorter
{
int range;

public ArrayCountSorter(int range)
{
this.range = range;
}

public IEnumerable<Pair> CountAndSort(IEnumerable<int> input)
{
Pair[] values = new Pair[range];
for (int i=0; i < range; i++)
{
values[i] = new Pair(i);
}
foreach(int datum in input)
{
values[datum].Value++;
}
Array.Sort(values, delegate(Pair p1, Pair p2)
{ return p2.Value.CompareTo(p1.Value); }
);
return values;
}
}

class DictionaryCountSorter<T>
{
public IEnumerable<KeyValuePair<T,int>>
CountAndSort(IEnumerable<T> input)
{
Dictionary<T,int> histogram = new Dictionary<T,int>();
foreach (T datum in input)
{
int count;
histogram.TryGetValue(datum, out count);
histogram[datum] = count+1;
}
List<KeyValuePair<T,int>> list = new
List<KeyValuePair<T,int>>(histogram);
list.Sort(delegate(KeyValuePair<T,int> p1, KeyValuePair<T,int>
p2)
{ return p2.Value.CompareTo(p1.Value); }
);
return list;
}
}

Jon
Nov 27 '07 #33
Jon,

Fantastic, thanks, very interesting. When I have some time here I'll take a
closer look at the code, but as a first pass it looks like a tie for few
samples going to about 2x for a larger sample size. With a very few
samples, the sorting then takes the majority of the time and the Dictionary
wins.

Jon, assuming you were going to write it using an array, how would you store
the info given that you don't like the struct code below?

Thanks, and thanks again for the code, numbers, and time to put this
together.

Hilton
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:6b**********************************@s36g2000prg.googlegroups.com...
<snip>

Nov 28 '07 #34
Hilton <no****@nospam.com> wrote:
Fantastic, thanks, very interesting. When I have some time here I'll take a
closer look at the code, but as a first pass it looks like a tie for few
samples going to about 2x for a larger sample size. With a very few
samples, the sorting then takes the majority of the time and the Dictionary
wins.

Jon, assuming you were going to write it using an array, how would you store
the info given that you don't like the struct code below?
I suspect I'd use two arrays instead, one for the keys and one for the
values. Or I'd try a mutable reference type. Or use KeyValuePair and
create a new one each time, copying the key.

Mutable structs are bad news. They often behave in ways which are
logical but unexpected.
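A short sketch of the sort of surprise meant here, using a hypothetical
Counter struct (not code from the thread): an array indexer exposes the
element in place, while a List<T> indexer returns a copy, so the "same"
mutation either works, fails to compile, or silently does nothing:

```csharp
using System;
using System.Collections.Generic;

struct Counter
{
    public int Value;   // mutable public field -- the problematic part
}

class MutableStructDemo
{
    static void Main()
    {
        // An array element is a variable, so mutating a field
        // of the struct in place works as expected:
        Counter[] array = new Counter[1];
        array[0].Value++;
        Console.WriteLine(array[0].Value);   // 1

        // A List<T> indexer returns a *copy* of the struct.
        // "list[0].Value++" would not even compile (error CS1612),
        // and mutating an explicit copy changes nothing in the list:
        List<Counter> list = new List<Counter>();
        list.Add(new Counter());
        Counter copy = list[0];
        copy.Value++;
        Console.WriteLine(list[0].Value);    // still 0
    }
}
```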

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 28 '07 #35
Peter Duniho <Np*********@NnOwSlPiAnMk.com> wrote:

<snip>
4) In the dictionary solution, it is faster to call Contains() and
then retrieve and reassign the value, than to call TryGetValue(). As
with the iteration difference, I have not bothered to look closely at
the .NET implementation to better understand why the difference exists.
But it does (my suspicion is that TryGetValue() basically calls
Contains(), and so calling that method simply adds an extra call to the
loop).
Really? Wow - that's a real surprise. I must investigate that further.
(Reflector shows that TryGetValue doesn't call Contains, but both call
FindEntry - so calling Contains and then the indexer to read will end
up finding the entry twice, but TryGetValue only finds the entry once.)
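For reference, the two idioms being compared look roughly like this (a toy
histogram, not code from the thread; the point is that ContainsKey plus the
indexer walks the hash buckets twice, while TryGetValue walks them once):

```csharp
using System;
using System.Collections.Generic;

class LookupDemo
{
    static void Main()
    {
        Dictionary<string, int> histogram = new Dictionary<string, int>();

        // Two lookups: ContainsKey searches the buckets,
        // then the indexer read searches them again.
        int count = histogram.ContainsKey("apple") ? histogram["apple"] : 0;
        histogram["apple"] = count + 1;

        // One lookup: TryGetValue finds (or fails to find) the entry in
        // a single pass; on a miss, "count" is set to default(int), i.e. 0.
        histogram.TryGetValue("apple", out count);
        histogram["apple"] = count + 1;

        Console.WriteLine(histogram["apple"]);   // 2
    }
}
```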

I'll look at it more tomorrow - it's time for bed now.
Anyway, I appreciate that there's at least one person around here calm
and rational enough to just provide the numbers, regardless of whether
they will be heeded or not. :)
You call me rational now... see if you still do after reading this:

http://msmvps.com/blogs/jon.skeet/ar...-t-call-us-we-
ll-call-you-push-enumerators.aspx

I blame Marc Gravell.

<snip>

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 28 '07 #36
On 2007-11-27 17:23:51 -0800, Jon Skeet [C# MVP] <sk***@pobox.com> said:
Really? Wow - that's a real surprise. I must investigate that further.
(Reflector shows that TryGetValue doesn't call Contains, but both call
FindEntry - so calling Contains and then the indexer to read will end
up finding the entry twice, but TryGetValue only finds the entry once.)
I was surprised too. The reason I even compared the two is that I had
assumed that TryGetValue would do the check for containment and
retrieval as a single operation. If anything, I expected it to be
faster. But in my tests it's not.

I would not at all be surprised if looking at the code shows that
TryGetValue does indeed appear to have more efficient code (ie
retrieving at the same time as checking for the key's existence). But
_something_ is going on to slow it down. If nothing else, it's yet
another illustration of how optimizing code is best left for when
there's an actual problem to be solved, and when some real world data
can be obtained regarding actual performance.
I'll look at it more tomorrow - it's time for bed now.
Anyway, I appreciate that there's at least one person around here calm
and rational enough to just provide the numbers, regardless of whether
they will be heeded or not. :)

You call me rational now... see if you still do after reading this:

http://msmvps.com/blogs/jon.skeet/ar...umerators.aspx
Well, even the most rational among us need to escape once in a while. :)

For what it's worth "push" enumeration isn't unheard of. I would
consider any sort of enumeration involving a callback to be an example
of that, of which there are examples in the Windows API and which I've
used in other contexts as well (I even used it in an image processing
class I wrote once).
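A tiny sketch (names invented) of what Pete means by callback-style enumeration, in the spirit of Win32's EnumWindows: instead of the consumer pulling items with MoveNext/Current, the producer pushes each item into a delegate.

```csharp
using System;

class Program
{
    // A "push" enumerator: the producer drives the loop and hands
    // each item to the callback, rather than the caller pulling them.
    static void EnumerateSquares(int count, Action<int> callback)
    {
        for (int i = 1; i <= count; i++)
            callback(i * i);
    }

    static void Main()
    {
        int sum = 0;
        EnumerateSquares(4, x => sum += x);
        Console.WriteLine(sum);  // 1 + 4 + 9 + 16 = 30
    }
}
```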

Still, I don't know enough about LINQ to really "get" what your
article is all about, but if you think the design in your article is
crazy, who am I to argue? :)
I blame Marc Gravell.
Sounds good to me. Seems like we have precious little things to blame
on him, so why not? I'm sure he can handle the load. :)

Pete

Nov 28 '07 #37
Peter Duniho <Np*********@NnOwSlPiAnMk.com> wrote:
Really? Wow - that's a real surprise. I must investigate that further.
(Reflector shows that TryGetValue doesn't call Contains, but both call
FindEntry - so calling Contains and then the indexer to read will end
up finding the entry twice, but TryGetValue only finds the entry once.)

I was surprised too. The reason I even compared the two is that I had
assumed that TryGetValue would do the check for containment and
retrieval as a single operation. If anything, I expected it to be
faster. But in my tests it's not.
I'll need to test it myself - it'll be most odd if your tests show it
to be one way and mine show something else :(
For what it's worth "push" enumeration isn't unheard of. I would
consider any sort of enumeration involving a callback to be an example
of that, of which there are examples in the Windows API and which I've
used in other contexts as well (I even used it in an image processing
class I wrote once).
Well, it was the way of using it that was particularly odd. However,
we've got a better way of doing it :)
Still, I don't know enough about LINQ to really "get" what you're
article is all about but if you think the design in your article is
crazy, who am I to argue? :)
The basic thrust is to do efficient grouped aggregations.

For instance, consider a stream of newsgroup posts, and you want to
build a "most prolific posters" list. With LINQ in its current form,
that can't be done very efficiently (in terms of memory usage). The
push stuff helps with that a lot. The new design will also allow for
multiple aggregates to be computed at once, somewhat coincidentally.
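For contrast, here is the "prolific posters" query in ordinary LINQ to Objects (sample data invented). `GroupBy` buffers every element of each group before `Count()` can run, which is precisely the memory cost Jon's push-based aggregation design is meant to avoid on a large stream:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical stream of poster names, one entry per post.
        string[] posts = { "jon", "pete", "jon", "marc", "jon", "pete" };

        // GroupBy holds all posts per poster in memory before counting --
        // fine for six posts, expensive for a whole newsgroup feed.
        var topPosters = posts
            .GroupBy(p => p)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key + ": " + g.Count());

        foreach (string line in topPosters)
            Console.WriteLine(line);
        // jon: 3
        // pete: 2
        // marc: 1
    }
}
```

With a push model, only a running count per poster need be kept, not the posts themselves.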
I blame Marc Gravell.

Sounds good to me. Seems like we have precious little things to blame
on him, so why not? I'm sure he can handle the load. :)
His generic maths stuff is evil too. I hadn't looked at it properly
before this afternoon. Ouch, but cool ouch :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 28 '07 #38
