C# compiler fails to optimize for loop same as foreach

I came across a reference on a web site
(http://www.personalmicrocosms.com/ht...htextbox_lines )
that said to speed up access to a rich text box's lines that you needed to
use a "foreach" loop instead of a "for" loop. This made absolutely no sense
to me, but the author had posted his code and timing results. The "foreach"
(a VB and other languages construct) took 0.01 seconds to access 1000 lines in
a rich text box, whereas the "for" loop (a traditional C++ construct) took an
astounding 25 seconds (on a not very fast PC).

I recreated a test file using the partial source code posted by the author
and verified that there is a SIGNIFICANT performance difference between the
two constructs (although on my PC it was 0.01 seconds vs 3.6 seconds - still
a noticeable delay). Unfortunately, there was no explanation as to why this
was the case, and I couldn't see any reason why one loop construct would
be different. Looking at the generated IL code with Lutz Roeder's Reflector
tool, I see that the real culprit is not the loop structure but the
get_Lines() call, which is pulled out of the loop in the "foreach" case and
not in the "for" case. Which leads me to post this question about the
differences in compiler code generation/optimization: is there any setting
that can change this?

Interestingly, this is true for both Debug and Release builds. The compiler
generated code that calls that function twice for each pass of the loop
(once for the loop index check and then again for the length calculation).
Pulling out unnecessary function calls is a pretty basic optimization, and
I'm surprised that the compiler didn't detect this.
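
The call pattern described above can be reproduced without a form. The following is only a sketch: FakeTextBox is a hypothetical stand-in (not the real RichTextBox) whose Lines property rebuilds its array on every read and counts how often it is read. With three lines, the "for" loop should read the property seven times (four loop tests plus three element accesses) and the "foreach" loop once.

```csharp
using System;

// Hypothetical stand-in for RichTextBox: Lines rebuilds its array on every read.
class FakeTextBox
{
    private readonly string text;
    public int GetterCalls;   // how many times Lines has been read

    public FakeTextBox(string text) { this.text = text; }

    public string[] Lines
    {
        get
        {
            GetterCalls++;
            return text.Split('\n');   // fresh array on each call
        }
    }
}

static class LoopDemo
{
    // Indexes the property directly: one read per loop test, one per element.
    public static int CountCallsWithFor(FakeTextBox box)
    {
        box.GetterCalls = 0;
        int len = 0;
        for (int i = 0; i < box.Lines.Length; i++)
        {
            len += box.Lines[i].Length;
        }
        return box.GetterCalls;
    }

    // foreach evaluates the collection expression once, before iterating.
    public static int CountCallsWithForeach(FakeTextBox box)
    {
        box.GetterCalls = 0;
        int len = 0;
        foreach (string line in box.Lines)
        {
            len += line.Length;
        }
        return box.GetterCalls;
    }

    static void Main()
    {
        var box = new FakeTextBox("a\nbb\nccc");
        Console.WriteLine(CountCallsWithFor(box));       // 7
        Console.WriteLine(CountCallsWithForeach(box));   // 1
    }
}
```

For N lines that is 2N + 1 reads against one, which fits the timings if each read has to walk the whole text.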

With the IDE's intellisense and auto completion features, the "for" loop
construct shown in the code below seems like something that someone might
actually code up, and of course who would have figured out that the get_Lines
method would be so performance intensive.

Makes me wonder if there are any other gotchas like this.

Thanks, Mike L.

--------------------------------------------------------------------------------------------

// Simple Windows form with a RichTextBox control, initialized with 1000 lines
// of text (e.g., "line #101", etc.).

private void ForLoopButton_Click(object sender, System.EventArgs e)
{
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    // Lines is read twice per pass: once in the loop test, once in the body.
    for (int i = 0; i < TheRichTextBox.Lines.Length; i++)
    {
        Len += TheRichTextBox.Lines[i].Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "for loop\r\n\r\nElapsed time = " +
        (ElapsedTime / 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}

private void ForEachLoopButton_Click(object sender, System.EventArgs e)
{
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    // Lines is read once, before the iteration starts.
    foreach (String Line in TheRichTextBox.Lines)
    {
        Len += Line.Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "foreach loop\r\n\r\nElapsed time = " +
        (ElapsedTime / 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}

private void ForLoopButton2_Click(object sender, System.EventArgs e)
{
    // Performance results are now the same as ForEachLoopButton_Click,
    // because Lines is read once into a local array.
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    string[] lines = TheRichTextBox.Lines;
    for (int i = 0; i < lines.Length; i++)
    {
        Len += lines[i].Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "for loop\r\n\r\nElapsed time = " +
        (ElapsedTime / 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}

Nov 16 '05 #1
15 Replies


Amazing! I had no idea. I sure hope someone is capable of explaining this.

/ Fredrik
Nov 16 '05 #2

Hi

I found something here that may explain this problem:
http://www.codeproject.com/csharp/foreach.asp

/ Fredrik
Nov 16 '05 #3

"Mike Lansdaal" <ml*****@newsgroup.nospam> wrote in message
news:70**********************************@microsof t.com...
[...] Looking at the generated IL code with Lutz Roeder's Reflector tool, I
see that the real culprit is not the loop structure but the get_Lines()
function that is pulled out of the loop in the "foreach" loop and not in the
"for" loop code. Which leads me to post this question about the differences
in compiler code generation/optimization and whether there is any setting
that can change this.


Ah. It's not a compiler problem. It's a property problem.

get_Lines() is expensive. Who Knew? That's the problem with properties: you
never know how much code they run.

Anyway, try this:

string[] lines = TheRichTextBox.Lines;
for (int i = 0; i < lines.Length; i++)
{
Len += lines[i].Length;
}

It should be similar to the foreach case.

David
Nov 16 '05 #4

Fredrik - Interesting article (which recommends always using for instead of
foreach, but also generated opposing views). I found this blog link in
the article comments
(http://blogs.msdn.com/brada/archive/...29/123105.aspx ), which suggests
that the code generated for the two loop types is "basically identical" and
that "foreach" is recommended for "clarity".

Thanks, Mike

"Fredrik Wahlgren" wrote:

Hi

I found something here that may explain this problem:
http://www.codeproject.com/csharp/foreach.asp

/ Fredrik

Nov 16 '05 #5

Yes, exactly. That's what I did (and that's what the foreach does). My
concern was that in one case the compiler did one thing (pulled the property
call out of the loop) and in another case didn't (in the for loop case, it's
there for the loop check and again for the calculation).

Thanks, Mike

"David Browne" wrote:

Ah. It's not a compiler problem. It's a property problem.

get_Lines() is expensive. Who Knew? That's the problem with properties: you
never know how much code they run.

Anyway, try this:

string[] lines = TheRichTextBox.Lines;
for (int i = 0; i < lines.Length; i++)
{
Len += lines[i].Length;
}

It should be similar to the foreach case.

David

Nov 16 '05 #6

"Mike Lansdaal" <ml*****@newsgroup.nospam> wrote in message
news:3E**********************************@microsof t.com...
Yes, exactly. That's what I did (and that's what the foreach does). My
concern was that in one case the compiler did one thing (pulled the property
call out of the loop) and in another case didn't (in the for loop case, it's
there for the loop check and again for the calculation).


Well in the for loop it can't pull it out. For all the compiler knows
get_Lines() might start returning a completely different array half way
through the iteration.

In the foreach case, the compiler has more information. It knows that it's
iterating the result of get_Lines().
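
To make that concrete, here is a small sketch with an invented class (JobQueue and its Pending property are hypothetical, not anything in the framework). The getter has a side effect, so reading it on every pass and reading it once give different results. A compiler that quietly hoisted the call would change what the program does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: each read of Pending returns a snapshot and then
// drops one job, so consecutive reads return different arrays.
class JobQueue
{
    private readonly List<string> jobs = new List<string> { "a", "b", "c", "d" };

    public string[] Pending
    {
        get
        {
            string[] snapshot = jobs.ToArray();
            if (jobs.Count > 0) jobs.RemoveAt(jobs.Count - 1);   // side effect
            return snapshot;
        }
    }
}

static class HoistDemo
{
    // Re-reads the property in every loop test, as the original for loop does.
    public static int ReadEveryPass(JobQueue q)
    {
        int passes = 0;
        for (int i = 0; i < q.Pending.Length; i++) passes++;
        return passes;
    }

    // Reads the property once, which is what foreach (or a local variable) does.
    public static int ReadOnce(JobQueue q)
    {
        string[] pending = q.Pending;
        int passes = 0;
        for (int i = 0; i < pending.Length; i++) passes++;
        return passes;
    }

    static void Main()
    {
        Console.WriteLine(ReadEveryPass(new JobQueue()));   // 2
        Console.WriteLine(ReadOnce(new JobQueue()));        // 4
    }
}
```

Since the compiler generally cannot prove that a getter is free of effects like this, it has to emit the call wherever the source reads the property.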
David
Nov 16 '05 #7

David - Thanks. I think I was assuming something about the context of the
iteration, but with your explanation I see that it would be impossible
for the compiler to determine that.

Thanks, Mike

"David Browne" wrote:

Well in the for loop it can't pull it out. For all the compiler knows
get_Lines() might start returning a completely different array half way
through the iteration.

In the foreach case, the compiler has more information. It knows that it's
iterating the result of get_Lines().
David

Nov 16 '05 #8

Hi Mike,

Here is an official document from MSDN. I think it will be clearer after
checking this article.

http://msdn.microsoft.com/library/de...us/dnpag/html/scalenetchapt05.asp

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 16 '05 #9

Hi Mike,

Generally, a for loop performs better than a foreach loop. In the code at
the link you provided in your first post, there are some differences between
the for and foreach loops that make the performance very different. For
example,

for (int i = 0; i < TheRichTextBox.Lines.Length; i++)

Since RichTextBox.Lines computes its result on each call, the property is
evaluated on every pass through the loop. This calculation also takes a lot
of time, since it needs to go through all the text in the RichTextBox. If
you change the code to the following, the run time will decrease
tremendously.

int a=TheRichTextBox.Lines.Length;
for (int i = 0; i < a; i++)

There are also many other differences in how the line reference is obtained
here. So I don't think this testing result is reliable. Please refer to the
official document I provided in my last post.

HTH.

Kevin Yu

Nov 16 '05 #10

Ah. It's not a compiler problem. It's a property problem.


That is true: never make/use properties that return arrays of something.
It is a guideline somewhere (just don't ask me where I've seen this). Each
time it is read, this property converts the internal data to an array of
strings.

public string[] Lines {get; set;}
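
A rough sketch of the point, using an invented Document class (an assumption for illustration, not how RichTextBox is actually implemented): a property that copies internal data hands back a new array on every read, while a method with the same body at least signals that work is being done, so callers are more likely to keep the result in a local.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical document class, not the real RichTextBox implementation.
class Document
{
    private readonly List<string> lines = new List<string> { "one", "two", "three" };

    // Looks like a cheap field access, but allocates and copies on every read.
    public string[] Lines
    {
        get { return lines.ToArray(); }
    }

    // Same work, but the method form hints at the cost.
    public string[] GetLines()
    {
        return lines.ToArray();
    }
}

static class PropertyDemo
{
    // True only if two consecutive reads hand back the same array instance.
    public static bool SameArrayTwice(Document doc)
    {
        return object.ReferenceEquals(doc.Lines, doc.Lines);
    }

    // Copies the lines once, then works on the local array.
    public static int TotalLength(Document doc)
    {
        string[] cached = doc.GetLines();
        int total = 0;
        for (int i = 0; i < cached.Length; i++)
        {
            total += cached[i].Length;
        }
        return total;
    }

    static void Main()
    {
        var doc = new Document();
        Console.WriteLine(SameArrayTwice(doc));   // False
        Console.WriteLine(TotalLength(doc));      // 11
    }
}
```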

kind regards

Alexander

Nov 16 '05 #11

Alexander Muylaert wrote:
Ah. It's not a compiler problem. It's a property problem.

That is true, never make /use properties that return arrays of something.
It is a guideline somewhere (just don't ask me where I've seen this). This
property is converting each time the internal data to an array of string.

public string[] Lines {get; set;}

kind regards

Alexander


That's an overstatement to me. How would/could you (simply) return a
list of items atomically without using lock semantics (in a
multi-threaded context) ?

Btw, enumerators, most of the time and in this particular case, are the
way to go, as they know best how to iterate over the internal data.
They might also handle thread-safety behind the scenes.

regards

Benoit
Nov 16 '05 #12

or how about :

for ( int i = TheRichTextBox.Lines.Length-1 ; i >=0 ; i-- )

Nov 16 '05 #13

"Kevin Yu [MSFT]" <v-****@online.microsoft.com> wrote in message
news:X7*************@cpmsftngxa10.phx.gbl...

int a=TheRichTextBox.Lines.Length;
for (int i = 0; i < a; i++)

In C that would be preferable, but in C#

string[] lines = TheRichTextBox.Lines;
for (int i = 0; i < lines.Length; i++)
{
    length += lines[i].Length;
}

is better since the bounds check on the array access will be eliminated.

David
Nov 16 '05 #14

"gerry" <ge**@hotmail.com> wrote in message
news:Od**************@TK2MSFTNGP09.phx.gbl...
or how about :

for ( int i = TheRichTextBox.Lines.Length-1 ; i >=0 ; i-- )

That's no good since it still calls RichTextBox.get_Lines() on each
iteration.

David
Nov 16 '05 #15

You're welcome, Mike.

Thanks for sharing your experience with all the people here. If you have
any questions, please feel free to post them in the community.

Kevin Yu

Nov 16 '05 #16
