* Arne Vajhøj wrote, On 20-5-2007 2:18:
Jon Skeet [C# MVP] wrote:
>Ben Voigt <rb*@nospam.nospam> wrote:
>>"Nicholas Paldino [.NET/C# MVP]" <mv*@spam.guard.caspershouse.com>
wrote in message
news:49**********************************@microsoft.com...
Absolutely, just wondering why you wouldn't take the simpler,
more maintainable (depending on who is looking at it, at least from
my point of view) approach. =)
Because your simpler method involves three complete string copies
instead of one!
Do we have any evidence that performance is an issue here? Further, do
we have evidence that regular expressions will actually make this
faster on the sample data?
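Ben's "three complete string copies instead of one" point refers to each chained Replace call producing its own intermediate string. As a rough sketch (the class and method names are hypothetical, not from anyone's test code in this thread), a single pass over the input avoids those intermediates and builds only the final result:

```csharp
using System;
using System.Text;

// Hypothetical helper: removes the given characters in one pass over the input.
public static class SinglePassStripper
{
    public static string RemoveChars(string input, char[] charsToRemove)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (charsToRemove == null) throw new ArgumentNullException(nameof(charsToRemove));

        // The output can only shrink, so the input length is a safe capacity.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Keep the character unless it appears in the removal set.
            if (Array.IndexOf(charsToRemove, c) < 0)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Array.IndexOf is adequate for three characters; for a larger set, a HashSet<char> or a boolean lookup table would be the usual next step. This variant was not part of the timings in this thread.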
I was intrigued by your results, so I expanded the test a little. My
adjusted tests also vary the amount of text to replace, because I
suspected it would affect execution speed.
And I was right.
What scares me, though, is the contrast. For String and StringBuilder
manipulation, a larger amount to remove has relatively little impact: at
1,000,000 characters, String Replace goes from 891 ms with 0% whitespace
to 2,891 ms with 100%. In fact, a StringBuilder is the best option if you
have a lot to remove.
The regular expressions, on the other hand, get much, much, much slower
as the amount to remove increases: at the same length, Regex Replace goes
from 2,859 ms to 34,203 ms. It looks like there is some very expensive
buffer copying going on in there.
You'll find my adjusted test app at the bottom of this post, with the
test results just before it. All tests were run as an x64-compiled
executable with no debugger attached and full optimization. This made
quite a difference, by the way.
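Since the build settings made a noticeable difference, it may be worth having a rerun report them. A minimal sketch, assuming a .NET 4.5 or later runtime (the RunConditions name is hypothetical). It checks the process bitness, whether a debugger is attached, and whether the entry assembly was built with JIT optimization disabled:

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper: reports whether the current run matches
// "x64, no debugger attached, optimized build".
public static class RunConditions
{
    public static string Describe()
    {
        bool is64Bit = Environment.Is64BitProcess;
        bool debuggerAttached = Debugger.IsAttached;

        // Debug builds normally carry a DebuggableAttribute that disables JIT
        // optimization; Release builds normally leave optimization enabled.
        string optimization;
        Assembly entry = Assembly.GetEntryAssembly();
        if (entry == null)
        {
            optimization = "unknown";
        }
        else
        {
            DebuggableAttribute attr = entry.GetCustomAttribute<DebuggableAttribute>();
            optimization = (attr == null || !attr.IsJITOptimizerDisabled) ? "enabled" : "disabled";
        }

        return string.Format("64-bit process: {0}; debugger attached: {1}; JIT optimization: {2}",
            is64Bit, debuggerAttached, optimization);
    }
}
```

Printing RunConditions.Describe() at the start of Main would record the conditions next to the timings.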
>>RegEx.Replace ought to do it.
>At what cost to readability though?
Actually I think the regex code is more readable.
If there were more characters to strip, say 10 or more, the regex would
quickly become the more readable option. Personally, though, I would have
chosen the following construction:
string victim = "...";
string[] stringsToRemove = new string[] { "\r", "\n", "\t" };
foreach (string stringToRemove in stringsToRemove)
{
    // Replace every occurrence with the empty string.
    victim = victim.Replace(stringToRemove, "");
    // Or a StringBuilder variant.
}
This is easier to read, the variables have meaningful names, and it is
easy to add new characters later or to switch strategy without having to
go through seven or more identical calls.
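A runnable version of the loop above, assuming the strings to remove are non-empty (both String.Replace and StringBuilder.Replace reject an empty search string). The class name is hypothetical, and the second method is one possible reading of the "StringBuilder variant" mentioned in the comment:

```csharp
using System.Text;

// Hypothetical helper class wrapping the loop-over-a-list approach.
public static class LoopStripper
{
    // Strips each string in turn; every Replace that finds a match returns a new string.
    public static string StripWithStrings(string victim, string[] stringsToRemove)
    {
        foreach (string stringToRemove in stringsToRemove)
        {
            victim = victim.Replace(stringToRemove, string.Empty);
        }
        return victim;
    }

    // Same loop, but all replacements happen in one mutable buffer and a
    // string is materialized only once at the end.
    public static string StripWithBuilder(string victim, string[] stringsToRemove)
    {
        var sb = new StringBuilder(victim);
        foreach (string stringToRemove in stringsToRemove)
        {
            sb.Replace(stringToRemove, string.Empty);
        }
        return sb.ToString();
    }
}
```

Adding a character to strip then means adding one entry to the array, whichever strategy is used.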
The regex variant I would have used would have looked like this:
Regex rx = new Regex(@"
    [\r\n\t]   (?# Characters to replace )
    ", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Or:
Regex rx = new Regex(@"
( (?# Characters to replace )
\r
| \n
| \t
)
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
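Both multi-line layouts depend on RegexOptions.IgnorePatternWhitespace: without it, the line breaks, indentation and spaces inside the verbatim string become part of the pattern. A small sketch (hypothetical class name) that builds both forms so they can be checked against each other:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper holding the two commented regex layouts.
public static class RegexStrippers
{
    // Character-class form. IgnorePatternWhitespace drops the layout
    // whitespace, and (?# ... ) is an inline comment.
    public static readonly Regex CharClass = new Regex(@"
        [\r\n\t]   (?# Characters to replace )
        ", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // Alternation form; (?: ... ) is non-capturing because nothing reads the group.
    public static readonly Regex Alternation = new Regex(@"
        (?:        (?# Characters to replace )
            \r
          | \n
          | \t
        )
        ", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // Removes every match of the given pattern.
    public static string Strip(Regex rx, string input)
    {
        return rx.Replace(input, string.Empty);
    }
}
```

The sketch uses (?: ... ) rather than a plain group; with RegexOptions.ExplicitCapture, as the test app's rxB uses, a plain ( ... ) does not capture either.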
One thing we haven't looked at so far is the pressure on the garbage
collector. Memory usage spiked to 120 MB in the most extensive test case,
which I think is quite a lot, and at another point it started yo-yoing
between 80 and 107 MB: very large swings. It was good that my system had
ample RAM and nothing else to do. Processor time isn't the only thing
that counts :).
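To put numbers on the collector pressure rather than watching overall memory usage, one option is to measure allocations per variant. A sketch assuming a modern .NET runtime (.NET Core 3.0 or later) for GC.GetAllocatedBytesForCurrentThread; the AllocationProbe name is hypothetical:

```csharp
using System;

// Hypothetical helper for measuring how much garbage a piece of work creates.
public static class AllocationProbe
{
    // Bytes allocated on the calling thread while the work ran.
    public static long MeasureAllocatedBytes(Action work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Generation-0 collections that happened while the work ran.
    public static int MeasureGen0Collections(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }
}
```

GC.CollectionCount is process-wide, so other threads can influence it; the per-thread byte count is the more precise of the two.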
OK. As promised, here are the results (test : input length in characters : elapsed time in ms):
=====================================================================
0% whitespace to replace
String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 0
Regex Replace Optimized : 1000 : 0
Regex Replace A : 1000 : 16
Regex Replace A Optimized : 1000 : 0
String Replace : 10000 : 16
StringBuilder Replace : 10000 : 47
Regex Replace : 10000 : 31
Regex Replace Optimized : 10000 : 31
Regex Replace A : 10000 : 31
Regex Replace A Optimized : 10000 : 47
String Replace : 100000 : 94
StringBuilder Replace : 100000 : 203
Regex Replace : 100000 : 344
Regex Replace Optimized : 100000 : 391
Regex Replace A : 100000 : 344
Regex Replace A Optimized : 100000 : 281
String Replace : 1000000 : 891
StringBuilder Replace : 1000000 : 1750
Regex Replace : 1000000 : 2859
Regex Replace Optimized : 1000000 : 2891
Regex Replace A : 1000000 : 2891
Regex Replace A Optimized : 1000000 : 2438
5% whitespace to replace
String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 16
Regex Replace Optimized : 1000 : 16
Regex Replace A : 1000 : 16
Regex Replace A Optimized : 1000 : 16
String Replace : 10000 : 16
StringBuilder Replace : 10000 : 31
Regex Replace : 10000 : 78
Regex Replace Optimized : 10000 : 63
Regex Replace A : 10000 : 78
Regex Replace A Optimized : 10000 : 63
String Replace : 100000 : 219
StringBuilder Replace : 100000 : 203
Regex Replace : 100000 : 688
Regex Replace Optimized : 100000 : 531
Regex Replace A : 100000 : 656
Regex Replace A Optimized : 100000 : 563
String Replace : 1000000 : 1734
StringBuilder Replace : 1000000 : 1703
Regex Replace : 1000000 : 5531
Regex Replace Optimized : 1000000 : 4406
Regex Replace A : 1000000 : 5516
Regex Replace A Optimized : 1000000 : 4500
50% whitespace to replace
String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 47
Regex Replace Optimized : 1000 : 31
Regex Replace A : 1000 : 31
Regex Replace A Optimized : 1000 : 31
String Replace : 10000 : 47
StringBuilder Replace : 10000 : 16
Regex Replace : 10000 : 281
Regex Replace Optimized : 10000 : 203
Regex Replace A : 10000 : 297
Regex Replace A Optimized : 10000 : 219
String Replace : 100000 : 281
StringBuilder Replace : 100000 : 156
Regex Replace : 100000 : 2438
Regex Replace Optimized : 100000 : 1828
Regex Replace A : 100000 : 2375
Regex Replace A Optimized : 100000 : 1828
String Replace : 1000000 : 2344
StringBuilder Replace : 1000000 : 1203
Regex Replace : 1000000 : 20609
Regex Replace Optimized : 1000000 : 14750
Regex Replace A : 1000000 : 19594
Regex Replace A Optimized : 1000000 : 14875
95% whitespace to replace
String Replace : 1000 : 16
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 63
Regex Replace Optimized : 1000 : 47
Regex Replace A : 1000 : 63
Regex Replace A Optimized : 1000 : 47
String Replace : 10000 : 47
StringBuilder Replace : 10000 : 16
Regex Replace : 10000 : 500
Regex Replace Optimized : 10000 : 359
Regex Replace A : 10000 : 469
Regex Replace A Optimized : 10000 : 344
String Replace : 100000 : 344
StringBuilder Replace : 100000 : 78
Regex Replace : 100000 : 4125
Regex Replace Optimized : 100000 : 3141
Regex Replace A : 100000 : 4047
Regex Replace A Optimized : 100000 : 2922
String Replace : 1000000 : 2859
StringBuilder Replace : 1000000 : 656
Regex Replace : 1000000 : 32750
Regex Replace Optimized : 1000000 : 24016
Regex Replace A : 1000000 : 31453
Regex Replace A Optimized : 1000000 : 23953
100% whitespace to replace
String Replace : 1000 : 0
StringBuilder Replace : 1000 : 16
Regex Replace : 1000 : 63
Regex Replace Optimized : 1000 : 47
Regex Replace A : 1000 : 63
Regex Replace A Optimized : 1000 : 47
String Replace : 10000 : 31
StringBuilder Replace : 10000 : 0
Regex Replace : 10000 : 516
Regex Replace Optimized : 10000 : 359
Regex Replace A : 10000 : 500
Regex Replace A Optimized : 10000 : 375
String Replace : 100000 : 328
StringBuilder Replace : 100000 : 78
Regex Replace : 100000 : 4172
Regex Replace Optimized : 100000 : 3406
Regex Replace A : 100000 : 4203
Regex Replace A Optimized : 100000 : 3031
String Replace : 1000000 : 2891
StringBuilder Replace : 1000000 : 625
Regex Replace : 1000000 : 34203
Regex Replace Optimized : 1000000 : 24781
Regex Replace A : 1000000 : 32672
Regex Replace A Optimized : 1000000 : 24547
=====================================================================
And the code:
=====================================================================
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        // Printed columns: test name, input length, elapsed milliseconds.
        // Arguments {2} (result length) and {3} (repetitions) are passed but not printed.
        private const string FMT = "{0,-25} : {1,-15} : {4,11:##########0}";

        private static Regex rxA = new Regex(@"[\r\n\t]",
            RegexOptions.Compiled);
        private static Regex rxB = new Regex(@"(\r|\n|\t)",
            RegexOptions.Compiled | RegexOptions.ExplicitCapture);

        private static void TestStringReplace(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            // Stopwatch resolves far finer than DateTime.Now, which typically
            // ticks in steps of about 15 ms on Windows.
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                s2 = s.Replace("\r", "").Replace("\n", "").Replace("\t", "");
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "String Replace",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void TestStringBuilderReplace(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                // A fresh builder per iteration: a single shared builder would be
                // stripped on the first pass, leaving nothing to remove afterwards.
                StringBuilder sb = new StringBuilder(s);
                s2 = sb.Replace("\r", "").Replace("\n", "").Replace("\t", "").ToString();
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "StringBuilder Replace",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void TestRegexReplace(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                // Static Regex.Replace: the pattern is looked up in the regex cache on each call.
                s2 = Regex.Replace(s, @"[\r\n\t]", "");
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "Regex Replace",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void TestRegexReplaceOptimized(string s)
        {
            int n = ComputeRepetitions(s);
            Regex re = rxA; // pre-built, compiled instance
            string s2 = null;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                s2 = re.Replace(s, "");
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "Regex Replace Optimized",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void TestRegexReplaceAlternate(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                s2 = Regex.Replace(s, @"(?:\r|\n|\t)", "", RegexOptions.None);
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "Regex Replace A",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void TestRegexReplaceOptimizedAlternate(string s)
        {
            int n = ComputeRepetitions(s);
            Regex re = rxB; // pre-built, compiled alternation
            string s2 = null;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                s2 = re.Replace(s, "");
            }
            sw.Stop();
            Console.WriteLine(String.Format(FMT, "Regex Replace A Optimized",
                s.Length, s2.Length, n, sw.ElapsedMilliseconds));
        }

        private static void CollectGarbage()
        {
            GC.Collect();
        }

        // Runs every variant on the same input, collecting garbage in between
        // so one variant's leftovers are less likely to be charged to the next.
        private static void Test(string s)
        {
            CollectGarbage();
            TestStringReplace(s);
            CollectGarbage();
            TestStringBuilderReplace(s);
            CollectGarbage();
            TestRegexReplace(s);
            CollectGarbage();
            TestRegexReplaceOptimized(s);
            CollectGarbage();
            TestRegexReplaceAlternate(s);
            CollectGarbage();
            TestRegexReplaceOptimizedAlternate(s);
        }

        // Fewer repetitions for longer inputs: about 145 for 1,000 characters,
        // about 72 for 1,000,000.
        public static int ComputeRepetitions(string s)
        {
            int n = Convert.ToInt32(1000 / Math.Log(s.Length));
            return n;
        }
        public static void Main(string[] args)
        {
            // Exercise the compiled regexes once so first-use costs are not timed.
            rxA.Replace("", "");
            rxB.Replace("", "");
            int[] whitespace = new int[] { 0, 5, 50, 95, 100 };
            int minsize = 3;
            int maxsize = 6;
            foreach (int percentage in whitespace)
            {
                Console.WriteLine("\r\n{0}% whitespace to replace\r\n", percentage);
                // Input lengths from 10^3 to 10^6 characters.
                for (int i = minsize; i <= maxsize; i++)
                {
                    int length = Convert.ToInt32(Math.Pow(10, i));
                    string test = GenerateString(length, length, percentage);
                    Test(test);
                }
            }
            Console.ReadLine();
        }
        private static readonly char[] PossibleChars = new char[]
        {
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
            '0','1','2','3','4','5','6','7','8','9',',','.','"','\'','!','?','-'
        };

        // Only '\r', '\n' and '\t' are removed by the tests; ' ' is kept.
        private static readonly char[] PossibleWhitespaceChars = new char[]
        {
            ' ', '\r', '\n', '\t'
        };
        public static Random _random = new Random(DateTime.Now.Millisecond);

        public static char GenerateRandomCharacter(char[] allowedChars)
        {
            // Random.Next(max) excludes max, so pass Length (not Length - 1)
            // to give every character, including the last one, a chance.
            int pos = _random.Next(allowedChars.Length);
            return allowedChars[pos];
        }

        public static string GenerateString(int minLength, int maxLength, int spaceChance)
        {
            int length = minLength + _random.Next(maxLength - minLength);
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++)
            {
                // Next(100) yields 0..99, so "< spaceChance" hits spaceChance percent
                // of the time. The first and last characters are never whitespace.
                if (i != 0 && i != length - 1 && _random.Next(100) < spaceChance)
                {
                    sb.Append(GenerateRandomCharacter(PossibleWhitespaceChars));
                }
                else
                {
                    sb.Append(GenerateRandomCharacter(PossibleChars));
                }
            }
            return sb.ToString();
        }
    }
}
=====================================================================
Kind Regards,
Jesse Houwing