On Jul 9, 11:58 pm, Ciaran O'Donnell
<CiaranODonn...@discussions.microsoft.com> wrote:
> I understand your point but I like to know which options perform best in
> which situations as I think it is better to code with performance in mind.
Eek. I think readability is far more important. Only a tiny proportion
of the implementation is likely to have a really significant impact on
performance, whereas *all* of the code benefits from being as readable
as possible.
> If you don't, you risk getting to the end of a development and
> realise you have to go back through it and find optimisations to
> overcome a performance issue.
I'd say you need to be aware of *architectural* performance at all
times, but not small pieces of *implementation* performance. Using a
profiler later on will usually find any bottlenecks pretty quickly -
so you can target those bottlenecks and use a more efficient solution
instead of the most readable one for that particular area.
> Some people call it premature optimizing and say it wastes time.
It's not the waste of time that I primarily object to, it's the
emphasis on performance over readability.
> If you already know certain things are faster or more memory
> efficient then it's good to use them.
So long as you know the situation well enough to be able to predict
that, of course. See later for my analysis of this particular
performance dilemma.
> It stops you having to come back to it later and it also helps
> people reading your code pick up good ways to do things.
No, it helps people reading your code pick up *fast* ways of doing
things, which:
a) may not be the most readable ways or have the clearest design
b) may slow them down when reading the code - chances are they want to
understand what's going on with the code, not learn performance tricks
c) may well not apply in situations which appear to be similar
> But each to their own and I fully understand in this case for a small array,
> it won't make a difference at all. But if I or someone reading my code got to
> a similar situation in the future with a more complex problem, having done
> the research at this point might forearm us with the knowledge of which one
> performs better.
As a similar "each to their own" I'll readily acknowledge that there
*are* situations where performance matters, and I'm more likely to go
out of my way to use an efficient solution over a readable one if I
know in advance that the code is going to be heavily used, and/or used
on a low-power device. (Library code is a separate issue, as you often
don't know what the callers will be doing.)
Also, there are some "gotchas" which are just worth knowing about to
start with: don't use exceptions for flow control, don't concatenate
strings in a loop where you can use StringBuilder to great effect,
etc. I'm all for being *aware* of performance characteristics,
particularly where the differences are between O(n) and O(n^2)
complexity, for instance. I just believe that readability is king.
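As a rough sketch of the StringBuilder point (the loop bound, separator and
variable names here are arbitrary, purely for illustration):

using System.Text;

// Concatenating with += inside the loop would copy the whole string on
// every iteration (O(n^2) overall); StringBuilder appends into a growable
// buffer instead.
var builder = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    builder.Append(i).Append(',');
}
string text = builder.ToString();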
Now, let's look at the 4 proposed pieces of code. We can ignore the
last one (using a query expression) immediately, as the previous one
(with the Where call) results in exactly the same code. So, we're down
to 3. Here they are, reproduced and labeled (somewhat arbitrarily)
for future reference:
Option 1: "Manual"
var result = new List<int>();
foreach (int i in anArray)
    if (i > 500)
        result.Add(i);
(Eek about the lack of braces, by the way...)
Option 2: "FindAll"
var resultFind = Array.FindAll(anArray, elt => elt > 500);
Option 3: "LINQ"
var resultWhere = anArray.Where(elt => elt > 500);
The first thing to consider is whether or not a list is actually
required. If it's not, then both Manual and FindAll will be wasting
time and memory building the list. This will have low impact when
there aren't many items found, but if the array becomes very large and
almost all elements are over 500, a lot of time may be wasted copying
the internal list buffer. Using a LinkedList instead in Manual would
put the algorithm back to O(n) but with a higher constant factor -
creating a node for the list is more expensive in both time and memory
than just setting an element in a backing array. If there is a
reasonable idea beforehand of how many elements will be over 500, the
list's initial size could be specified to help alleviate things - at
the cost of potentially wasting memory if the guess is significantly
higher than reality.
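For instance, a minimal sketch of specifying the initial capacity - the guess
of half the array length is purely an assumption for the example:

// Guessing that roughly half the elements will pass the test; a good guess
// avoids repeated copies of the list's backing array as it grows, a bad one
// wastes memory.
var result = new List<int>(anArray.Length / 2);
foreach (int i in anArray)
{
    if (i > 500)
    {
        result.Add(i);
    }
}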
The next thing to consider is the cost of iterating over the array.
Using "foreach" on an array compiles to different code than using it
on another IEnumerable<T>. The compiler knows how to access array
elements in an efficient way, without needing any method calls.
FindAll will likewise be able to iterate very quickly, as it knows the
backing store internally. Where operates on an arbitrary
IEnumerable<T>, which forces it to generate an IEnumerator<T> and call
MoveNext() and the Current property, which can't be inlined as the JIT
won't know what implementations it will be used with.
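Roughly speaking (this is a sketch of what the compiler produces, not the
exact generated code), "foreach" over an array becomes an index-based loop:

// The array case: no enumerator object is created, just direct indexing.
for (int index = 0; index < anArray.Length; index++)
{
    int i = anArray[index];
    // loop body
}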
Next comes the comparison. In Manual this is "inline" and probably
about as efficient as you'll get. The other two both use delegates in
the same way. I'd expect this to be less efficient than the inline
test, as the delegate call can't really be inlined as far as I can
see. One thing to watch out for is that this version doesn't capture
any variables. If the limit weren't set to 500, but the value of a
local variable, that would involve a delegate being created in a
different class, with an instance variable accessed for the limit. I
don't know whether or not that would have any significant effect, but
it might.
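To illustrate the capturing case (the local variable name here is just for
the example):

int limit = 500; // a local variable rather than a constant
// The compiler generates a hidden class to hold 'limit'; the lambda becomes
// a method on that class which reads the captured field.
var resultWhere = anArray.Where(elt => elt > limit);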
Finally there's laziness. The Where call won't actually cost anything
(of any significance) immediately. The only cost will be when the
result is iterated over. If the iteration actually stops after the
first few elements for whatever reason, then both Manual and FindAll
have done a lot of work for no benefit, whereas Where won't have done
any more work than is necessary.
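For example (Take(5) is just an arbitrary way of stopping early):

// No filtering happens at this point - the query is only defined.
var firstFew = anArray.Where(elt => elt > 500).Take(5);

// Elements are tested one at a time as they're pulled; iteration stops
// after five matches, so the rest of the array is never examined.
foreach (int value in firstFew)
{
    Console.WriteLine(value);
}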
All of these considerations will contribute to the performance based
on the actual situation, including the intended use of the result.
Whichever option ends up being the fastest for the OP's case may well
not be the fastest in a different context. That's one of the dangers
of the approach of "look at some code, learn the fastest way of doing
things." So often the answer to a performance question is "it
depends."
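If it really matters for a particular case, the honest answer is to measure
it in that context. A rough sketch of doing so (the array contents and run
count here are entirely made up):

using System;
using System.Diagnostics;
using System.Linq;

int[] anArray = Enumerable.Range(0, 100000).ToArray();

// Time each option over many runs with data that resembles the real input,
// then repeat for the other options and compare.
Stopwatch watch = Stopwatch.StartNew();
for (int run = 0; run < 100; run++)
{
    int[] resultFind = Array.FindAll(anArray, elt => elt > 500);
}
watch.Stop();
Console.WriteLine("FindAll: {0}ms", watch.ElapsedMilliseconds);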
Jon