Okay...had a chance to look through your code. I don't know if I can
cut three minutes or more from the cost while sticking with the same
basic algorithm, but there is some low-hanging fruit. You haven't
provided any information regarding which parts of your initialization are
slow, so it may or may not be that some or all of these changes are
significant. But they are still potentially issues that can be fixed.
See what happens if you clean some of these things up (and by the way,
please when posting code make sure that your indentation is
preserved...it's a lot harder to read the code without it):
Brian Cook wrote:
This reads and formats the text into the single line format;
----- Begin Code -----
using System;
using System.Drawing;
using System.IO;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
namespace TCEditor
{
class Streamer
{
#region File Stream and Format Routine
public static string LayoutInput(string input)
{
StreamReader sr = File.OpenText(input);
StringBuilder sb = new StringBuilder(input.Length);
bool firstLine = true;
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.Trim() == "")
continue;
if (line.Length < 29) { throw new InvalidOperationException("invalid input"); }
if (line[29] != ' ')
{
int txPos;
int rxPos = -1;
int len = 0;
if (firstLine)
firstLine = false;
else
sb.Append("\r\n");
if (((txPos = line.IndexOf("TX")) > -1) || ((rxPos = line.IndexOf("RX")) > 0))
Minor issue above: IndexOf() has to scan through the string each time
you call it. Worst-case is, of course, when the text you're looking for
doesn't exist.
You might do better using a regular expression (see Regex class) to
search for both "TX" and "RX" at the same time. Then you can check the
actual match (if any) to determine which matched. Though, that said, I
don't see anything that actually depends on which one matched; you seem
to treat both the same. So even more reason to just use Regex.
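For what it's worth, here's a rough sketch of the Regex idea. The class and method names (MarkerSearch, FindMarker) are made up for illustration. Note one behavioral difference from the original: the pattern finds whichever of "TX" or "RX" comes first in the line, whereas the original code prefers "TX" whenever it exists. I haven't measured this against two IndexOf() calls on your data, so treat it as something to test rather than a guaranteed win:

```csharp
using System;
using System.Text.RegularExpressions;

static class MarkerSearch
{
    // Built once and reused for every line; "TX|RX" matches whichever
    // marker occurs first in the input.
    private static readonly Regex Marker = new Regex("TX|RX", RegexOptions.Compiled);

    // Hypothetical helper: returns the index of the first "TX" or "RX"
    // in the line, or -1 if neither is present.
    public static int FindMarker(string line)
    {
        Match m = Marker.Match(line);
        return m.Success ? m.Index : -1;
    }
}
```

If you need to know which marker matched, m.Value holds the matched text.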
{
int charactersTillPoint;
if (txPos > -1)
{
charactersTillPoint = txPos;
len = line.Substring(txPos).Length;
Major issue above: this is the worst way to calculate "len". Getting
the length of the original string is fast. Doing a subtraction is fast.
Creating a whole new string just so you can see how long it is? Not fast.
Change this line to:
len = line.Length - txPos;
}
else
{
charactersTillPoint = rxPos;
len = line.Substring(rxPos).Length;
Likewise.
}
string part0 = line.Substring(0, charactersTillPoint);
string part1 = line.Substring(charactersTillPoint);
sb.Append(part0.PadRight(86));
sb.Append(part1);
Medium issue: creating new strings just to append to the StringBuilder
incurs the overhead of the instance creation. But the StringBuilder has
overloads for the Append() method to avoid that.
I did a quick-and-dirty test, and it appears to me that creating the
string almost doubles the total time it takes to append text to a
StringBuilder.
At the very least, I would change the "part1" appending so that it looks
more like this:
sb.Append(line, charactersTillPoint, len);
I suspect you would also gain a win by not using PadRight, and instead
doing that work yourself:
sb.Append(line, 0, charactersTillPoint);
sb.Append(new string(' ', 86 - charactersTillPoint));
That way, you avoid the creation of both substrings (but not the new
padding string, of course). This cuts your string instantiation in this
area of the code from three strings down to just one.
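Putting those pieces together, a minimal sketch might look like the following. The names (LineLayout, AppendSplit) are hypothetical. It goes one step further than the snippet above by using the Append(char, int) overload, which avoids even the padding string. It also clamps the pad count at zero, because PadRight() never truncates, while a negative count would throw if the marker sits past column 86:

```csharp
using System;
using System.Text;

static class LineLayout
{
    // Hypothetical helper: appends line[0..splitAt) padded with spaces to
    // `width` columns, followed by the remainder of the line, without
    // creating any intermediate strings.
    public static void AppendSplit(StringBuilder sb, string line, int splitAt, int width)
    {
        sb.Append(line, 0, splitAt);
        // Same result as PadRight(width): no padding if already wide enough.
        sb.Append(' ', Math.Max(0, width - splitAt));
        sb.Append(line, splitAt, line.Length - splitAt);
    }
}
```

In your method this would be called as LineLayout.AppendSplit(sb, line, charactersTillPoint, 86).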
}
else
sb.Append(line);
sb.Append(' ');
if (len == 12)
sb.Append(' ');
}
else
{
sb.Append(line.Substring(31));
Likewise here, use the Append() overload that extracts the substring for
you:
sb.Append(line, 31, line.Length - 31);
One last note on the substring thing: I also tested the overload that
takes a char[] instead of a string, along with the substring index and
length. It's actually even a little faster than passing a string, but
not by a lot. The big win here will be to just stop instantiating new
strings to append.
}
}
return sb.ToString();
}
#endregion
}
}
-----End Code-----
This is the routine that adds the color to the specified locations;
I call this as the next routine in the OpenFileDialog
I'm not going to provide specific comments for this method. Probably
the most costly part of it is all of the selecting and formatting that's
going on, and the best way to fix that would be to simply move the
formatting logic into the same method where you are reading the file,
and insert the necessary RTF format codes, rather than interacting with
the control directly.
That said, from a style perspective, I'd suggest that one significant
thing wrong with this method is that you wind up with two copies of the
text from the control. IMHO, since you want to process the text on a
line-by-line basis, you should just get the string[] from the Lines
property and operate on that. The RichTextBox control has methods such
as GetFirstCharIndexFromLine to allow you to determine actual character
indices for the purpose of formatting, and I would be surprised if using
that method significantly reduces the overall performance of this
method. As a result, your memory footprint will be halved, and the code
will be much closer to your intended algorithm.
A couple of other changes I'd make are to not call DoEvents(), and to
not update your progress control so often.
If you can't get the performance of this stuff down to something
acceptable for being in-line with the UI code, the correct solution is
to move the processing to a background thread. The BackgroundWorker
class is designed especially for this sort of thing and would work
nicely for you.
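A bare-bones sketch of the BackgroundWorker approach, assuming the slow work can be expressed as a function from a file path to a string (as your LayoutInput method is). The BackgroundLoad class and its callback parameters are hypothetical names for illustration:

```csharp
using System;
using System.ComponentModel;

static class BackgroundLoad
{
    // Hypothetical helper: creates a worker that runs `load` on a
    // background thread, then passes the result to `onDone`, or the
    // exception to `onError` if the load failed.
    public static BackgroundWorker Create(
        Func<string, string> load, Action<string> onDone, Action<Exception> onError)
    {
        var worker = new BackgroundWorker();
        worker.DoWork += (sender, e) => e.Result = load((string)e.Argument);
        worker.RunWorkerCompleted += (sender, e) =>
        {
            // Reading e.Result when e.Error is set would throw, so check first.
            if (e.Error != null)
                onError(e.Error);
            else
                onDone((string)e.Result);
        };
        return worker;
    }
}
```

When started from a form, RunWorkerCompleted is raised back on the UI thread, so onDone can touch controls directly. Usage would be along the lines of BackgroundLoad.Create(Streamer.LayoutInput, text => box.Text = text, ex => MessageBox.Show(ex.Message)).RunWorkerAsync(path), where box and path are placeholders for your own control and file name.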
As far as the updating of the progress control goes, the main issue
there is that it potentially generates UI updates. I haven't checked
its exact implementation, but at the very least you are calling the
control many more times than one would actually be able to perceive.
IMHO, it'd be better to set the max for the control to 100 and update it
any time you progress 1%. A possible middle-ground would be to base the
maximum on the number of lines, and only update it when you hit the code
that checks for the start of a new line. Of course, if you fix the code
to be line-based in the first place, this becomes even easier.
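To illustrate the 1% idea, here's a small sketch. ProgressThrottle and its parameters are hypothetical, and the report callback stands in for whatever actually updates your progress control (with its maximum set to 100):

```csharp
using System;

static class ProgressThrottle
{
    // Hypothetical helper: runs work(i) for i in [0, total) and calls
    // report(percent) only when the whole-number percentage changes,
    // so the control is touched at most 100 times.
    public static void ForEachWithProgress(int total, Action<int> work, Action<int> report)
    {
        int lastPercent = 0;
        for (int i = 0; i < total; i++)
        {
            work(i);
            // long arithmetic avoids overflow for very large totals
            int percent = (int)((i + 1) * 100L / total);
            if (percent != lastPercent)
            {
                lastPercent = percent;
                report(percent);
            }
        }
    }
}
```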
All of this is moot if you change the design to generate RTF text
instead. That's actually the solution that I think would provide the
best results.
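Just to give a feel for what generating the RTF yourself looks like, here's a minimal sketch. Everything in it is an assumption for illustration: the RtfSketch name, the rule of coloring each line red from a fixed column onward (your real rules would go there), and the input being plain ASCII (other characters need additional escaping that isn't shown):

```csharp
using System;
using System.Text;

static class RtfSketch
{
    // Appends text[start..start+count), escaping the three characters
    // that are special in RTF: backslash and the two braces.
    static void AppendEscaped(StringBuilder sb, string text, int start, int count)
    {
        for (int i = start; i < start + count; i++)
        {
            char c = text[i];
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\');
            sb.Append(c);
        }
    }

    // Hypothetical rule: characters from column colorStart onward are red.
    public static string Build(string[] lines, int colorStart)
    {
        var sb = new StringBuilder();
        // Header plus a color table; entry 1 is red, \cf0 is the default color.
        sb.Append(@"{\rtf1\ansi{\colortbl;\red255\green0\blue0;}");
        foreach (string line in lines)
        {
            int split = Math.Min(colorStart, line.Length);
            AppendEscaped(sb, line, 0, split);
            sb.Append(@"\cf1 ");
            AppendEscaped(sb, line, split, line.Length - split);
            sb.Append(@"\cf0\par").Append("\r\n");
        }
        sb.Append('}');
        return sb.ToString();
    }
}
```

The resulting string gets assigned to the control's Rtf property in one shot, so there's no per-line selecting or formatting against the control at all.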
Pete