Regex to remove \t \r \n from string

Hi, I would like to remove a number of characters from my string (\t,
\r and \n, which occur throughout the string). I know a regex can do this, but I
have no idea how. Any pointers much appreciated.

Chris

May 15 '07 #1
Chris,

Why not just use three calls to the Replace method on the String class?

string myString = input.Replace("\t", "").Replace("\r", "").Replace("\n", "");

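Note that the character overload, Replace(char, char), will not help with
removal: it can only swap one character for another, so the string overload
is the one to use here.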

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 15 '07 #2
> Why not just use three calls to the Replace method on the String class?

I am currently using the three Replace calls :). However, I have always
avoided regular expressions, and this seemed the ideal excuse to
learn them! I would also be interested in turning \r\n in a string into
just \n. I'm sure it must be possible?


May 15 '07 #3
Absolutely, just wondering why you wouldn't take the simpler, more
maintainable (depending on who is looking at it, at least from my point of
view) approach. =)

In this case, I believe you can use the regular expression "[\t\r\n]"
and call Regex.Replace, passing your input string, the pattern, and an empty
string (or whatever you want to replace any of the characters in that set
with), and it should work.
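
As a minimal sketch of that suggestion (the class and method names here are invented for illustration, not part of any library), the static Regex.Replace call covers both the removal and the \r\n to \n conversion asked about earlier in the thread:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are assumptions made for this example.
public static class WhitespaceCleaner
{
    // Removes every tab, carriage return and line feed in a single pass.
    public static string StripControlWhitespace(string input)
    {
        return Regex.Replace(input, @"[\t\r\n]", "");
    }

    // Turns Windows line endings (\r\n) into plain \n.
    // A lone \r or \n is left untouched.
    public static string NormalizeNewlines(string input)
    {
        return Regex.Replace(input, @"\r\n", "\n");
    }
}
```

Calling WhitespaceCleaner.StripControlWhitespace("a\tb\r\nc") should give "abc". For the newline conversion alone, input.Replace("\r\n", "\n") does the same job without a regex.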
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com


May 15 '07 #4
It certainly is possible. You should create a little test project and
play with it. The thing to remember about regex is to start small and
build up. It's not hard really, but it's horribly easy to assume that
things will behave differently from how they actually do.

It's been a while since I've done captures with PCRE, but for the simple
replace you are probably looking at something like this: [\r\n\t]
(no | separators are needed inside the brackets; within a character class
a | is just a literal pipe character).

.NET also has some context variable to make sure you have your
line endings localized correctly, if that's all you are trying to do.
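
One pitfall worth a quick test project: inside a character class, | is not an "or" operator. A pattern written as [\r|\n|\t] therefore also removes pipe characters from the input. The sketch below (hypothetical class and method names) shows the difference:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions made for this example.
public static class CharClassDemo
{
    // '|' is a literal inside [...], so this also strips pipe characters.
    public static string StripWithPipesInClass(string input)
    {
        return Regex.Replace(input, "[\r|\n|\t]", "");
    }

    // Only CR, LF and TAB are members of this class.
    public static string StripPlainClass(string input)
    {
        return Regex.Replace(input, "[\r\n\t]", "");
    }
}
```

With the input "a|b\tc", the first method should return "abc" (the pipe is gone) and the second "a|bc".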

May 15 '07 #5

"Nicholas Paldino [.NET/C# MVP]" <mv*@spam.guard.caspershouse.comwrote in
message news:49**********************************@microsoft.com...
> Absolutely, just wondering why you wouldn't take the simpler, more
> maintainable (depending on who is looking at it, at least from my point of
> view) approach. =)

Because your simpler method involves three complete string copies instead of
one!

Regex.Replace ought to do it.
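
For comparison, a single copy is also possible without a regex. The following is only a sketch under the assumption that exactly these three characters are to be dropped; the class and method names are made up for illustration:

```csharp
using System.Text;

// Hypothetical helper; not taken from the thread or from any library.
public static class SinglePassStripper
{
    // Walks the input once and copies every character except \t, \r and \n,
    // so only one new string is produced at the end.
    public static string Strip(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c != '\t' && c != '\r' && c != '\n')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Whether this is actually faster than three Replace calls or a regex on real data would still have to be measured.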
May 15 '07 #6
Ben Voigt <rb*@nospam.nospam> wrote:
>> Absolutely, just wondering why you wouldn't take the simpler, more
>> maintainable (depending on who is looking at it, at least from my point of
>> view) approach. =)
>
> Because your simpler method involves three complete string copies instead of
> one!

Do we have any evidence that performance is an issue here? Further, do
we have evidence that regular expressions will actually make this
faster on the sample data?

Until both of those have been determined, I'd take a default course of
the simplest code which does the job.

> Regex.Replace ought to do it.

At what cost to readability though?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 15 '07 #7

"Jon Skeet [C# MVP]" <sk***@pobox.comwrote in message
news:MP*********************@msnews.microsoft.com...
> Ben Voigt <rb*@nospam.nospam> wrote:
>>> Absolutely, just wondering why you wouldn't take the simpler, more
>>> maintainable (depending on who is looking at it, at least from my point
>>> of view) approach. =)
>>
>> Because your simpler method involves three complete string copies instead
>> of one!
>
> Do we have any evidence that performance is an issue here? Further, do
> we have evidence that regular expressions will actually make this
> faster on the sample data?
>
> Until both of those have been determined, I'd take a default course of
> the simplest code which does the job.

Well, ok, but you asked why anyone would ever choose not to do it that way,
and I gave an example.

>> Regex.Replace ought to do it.
>
> At what cost to readability though?

Admittedly, a String.Replace(Regex, String) method would be far more
readable, but it would set up a dependency from String on Regex.

May 15 '07 #8
>> Until both of those have been determined, I'd take a default course of
>> the simplest code which does the job.
>
> Well, ok, but you asked why anyone would ever choose not to do it that way,
> and I gave an example.

That's fair enough.

>>> Regex.Replace ought to do it.
>> At what cost to readability though?
>
> Admittedly, a String.Replace(Regex, String) method would be far more
> readable, but set up a dependency from string on Regex.

More importantly, it sets up a dependency on the reader understanding
regular expressions, which I've seen causing issues time and time again
in these newsgroups.

I'm all for regular expressions when their power is really needed, but
that tends to be pretty rare IME.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 15 '07 #9
Jon Skeet [C# MVP] wrote:
> Ben Voigt <rb*@nospam.nospam> wrote:
>>> Absolutely, just wondering why you wouldn't take the simpler, more
>>> maintainable (depending on who is looking at it, at least from my point of
>>> view) approach. =)
>> Because your simpler method involves three complete string copies instead of
>> one!
>
> Do we have any evidence that performance is an issue here? Further, do
> we have evidence that regular expressions will actually make this
> faster on the sample data?

A simple test seems to indicate that regex is slower.

Method                   Input chars  Output chars  Iterations    Seconds
String Replace                    15            12     6666666     6.6875
StringBuilder Replace             15            12     6666666   6.546875
Regex Replace                     15            12     6666666    27.1875
Regex Replace Optimized           15            12     6666666  15.828125
String Replace                   960           768      104166     3.3125
StringBuilder Replace            960           768      104166    2.03125
Regex Replace                    960           768      104166  17.421875
Regex Replace Optimized          960           768      104166    13.4375
String Replace                  1000          1000      100000    1.15625
StringBuilder Replace           1000          1000      100000     2.4375
Regex Replace                   1000          1000      100000    3.78125
Regex Replace Optimized         1000          1000      100000   2.703125

(see code below)
>> Regex.Replace ought to do it.
>
> At what cost to readability though?

Actually I think the regex code is more readable.

Arne

==========================================================

using System;
using System.Text;
using System.Text.RegularExpressions;

namespace E
{
    public class MainClass
    {
        // Total characters processed per test; iterations = N / input length.
        private const int N = 100000000;
        // Columns: test name, input length, output length, iterations, seconds.
        private const string FMT = "{0,-25} : {1} -{2} x {3} : {4}";

        // Three chained String.Replace calls (three intermediate strings).
        private static void TestStringReplace(string s)
        {
            int n = N / s.Length;
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = s.Replace("\r", "").Replace("\n", "").Replace("\t", "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "String Replace", s.Length,
                s2.Length, n, (dt2 - dt1).TotalSeconds));
        }

        // Note: sb is built once, outside the loop. After the first iteration
        // there is nothing left to replace, which favours this test's timing.
        private static void TestStringBuilderReplace(string s)
        {
            int n = N / s.Length;
            StringBuilder sb = new StringBuilder(s);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = sb.Replace("\r", "").Replace("\n", "").Replace("\t",
                    "").ToString();
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "StringBuilder Replace",
                s.Length, s2.Length, n, (dt2 - dt1).TotalSeconds));
        }

        // Static Regex.Replace with the pattern passed on every call.
        private static void TestRegexReplace(string s)
        {
            int n = N / s.Length;
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = Regex.Replace(s, "[\r\n\t]", "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace", s.Length,
                s2.Length, n, (dt2 - dt1).TotalSeconds));
        }

        // A single compiled Regex instance reused for every iteration.
        private static void TestRegexReplaceOptimized(string s)
        {
            int n = N / s.Length;
            Regex re = new Regex("[\r\n\t]", RegexOptions.Compiled);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = re.Replace(s, "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace Optimized",
                s.Length, s2.Length, n, (dt2 - dt1).TotalSeconds));
        }

        private static void Test(string s)
        {
            TestStringReplace(s);
            TestStringBuilderReplace(s);
            TestRegexReplace(s);
            TestRegexReplaceOptimized(s);
        }

        public static void Main(string[] args)
        {
            // Short input: 15 characters, 3 of them to be removed.
            string shortstr = "aaa\rbbb\nccc\tddd";
            Test(shortstr);
            // Long input: the short string doubled six times (960 characters).
            string longstr = shortstr;
            longstr += longstr;
            longstr += longstr;
            longstr += longstr;
            longstr += longstr;
            longstr += longstr;
            longstr += longstr;
            Test(longstr);
            // Input with nothing to remove: 1000 'A' characters.
            string nonestr = String.Empty.PadRight(1000, 'A');
            Test(nonestr);
            Console.ReadLine();
        }
    }
}
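
One caveat when reading the StringBuilder numbers: in the test above the StringBuilder is constructed once, before the loop, so from the second iteration onward there is nothing left to replace. A fairer variant would rebuild it on each pass. The sketch below assumes that change and uses Stopwatch for timing instead of DateTime.Now; the class and method names are hypothetical and no timings are claimed for it:

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical benchmark helper; not part of the posted test program.
public static class FairStringBuilderTiming
{
    // Creates a fresh StringBuilder per call so every call does real work.
    public static string StripWithFreshBuilder(string s)
    {
        return new StringBuilder(s)
            .Replace("\r", "")
            .Replace("\n", "")
            .Replace("\t", "")
            .ToString();
    }

    // Times 'iterations' calls with the high-resolution Stopwatch.
    public static TimeSpan Time(string s, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            StripWithFreshBuilder(s);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A call such as FairStringBuilderTiming.Time(longstr, 104166) could then be compared against the other three tests on the same input.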
May 20 '07 #10
Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> At what cost to readability though?
> Actually I think the regex code is more readable.

Well, it's interesting that your regex is "[\r\n\t]". I'm actually
slightly surprised this even works, as the \r, \n and \t are being
taken literally by the regex engine rather than having been escaped in
the normal way. I'd have expected "[\\r\\n\\t]" or @"[\r\n\t]" to make
it clear to the regex engine that you really meant the carriage return
etc to be part of the regex, and not incidental or for the sake of
readability (splitting the regex over several lines, as shown in
Jesse's example in another thread).

That extra level of escaping which is required in *some* cases (but
clearly not all) as well as having to understand the basic language of
regex in the first place is what makes it less readable in my opinion.
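
To make the three spellings concrete, here is a small sketch (hypothetical class and method names) that applies each of them. In the first, the C# compiler turns \r, \n and \t into the actual control characters before the regex engine sees the pattern; in the other two, the engine receives a backslash followed by a letter and interprets the escape itself. All three should remove the same characters:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions made for this example.
public static class EscapingDemo
{
    public static string[] StripAllWays(string input)
    {
        return new string[]
        {
            // Regular string: the pattern contains raw CR, LF and TAB characters.
            Regex.Replace(input, "[\r\n\t]", ""),
            // Doubled backslashes: the pattern contains the text \r \n \t.
            Regex.Replace(input, "[\\r\\n\\t]", ""),
            // Verbatim string: same pattern text as the previous line.
            Regex.Replace(input, @"[\r\n\t]", "")
        };
    }
}
```

For the input "a\rb\nc\td", each of the three results should be "abcd".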

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 20 '07 #11
Jon Skeet [C# MVP] wrote:
> Arne Vajhøj <ar**@vajhoej.dk> wrote:
>>> At what cost to readability though?
>> Actually I think the regex code is more readable.
>
> Well, it's interesting that your regex is "[\r\n\t]". I'm actually
> slightly surprised this even works, as the \r, \n and \t are being
> taken literally by the regex engine rather than having been escaped in
> the normal way. I'd have expected "[\\r\\n\\t]" or @"[\r\n\t]" to make
> it clear to the regex engine that you really meant the carriage return
> etc to be part of the regex, and not incidental or for the sake of
> readability (splitting the regex over several lines, as shown in
> Jesse's example in another thread).
>
> That extra level of escaping which is required in *some* cases (but
> clearly not all) as well as having to understand the basic language of
> regex in the first place is what makes it less readable in my opinion.

I just used the regex provided by Nicholas.

And yes there are different rules inside and outside character
classes.

And I can not see the readability problem. The intent of the
code is obvious.

You are not sure that it works correctly. But that can be
verified.

The Substring/IndexOf combo could be less obvious to read
and would still need to be verified that it works.

Arne
May 20 '07 #12
Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> That extra level of escaping which is required in *some* cases (but
>> clearly not all) as well as having to understand the basic language of
>> regex in the first place is what makes it less readable in my opinion.
>
> I just used the regex provided by Nicholas.
>
> And yes there are different rules inside and outside character
> classes.
>
> And I can not see the readability problem. The intent of the
> code is obvious.

To you, possibly. To me, even - I've done just enough regex to work out
what it means, although I wouldn't necessarily say it's obvious. To
every maintenance engineer? Not necessarily.

> You are not sure that it works correctly. But that can be
> verified.

There are lots of things that can be verified, but which are still less
obvious than writing things in a simpler way.

> The Substring/IndexOf combo could be less obvious to read
> and would still need to be verified that it works.

There's no Substring/IndexOf to be done - just three calls to Replace.
It's blindingly obvious what *they* do.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 20 '07 #13
Jon Skeet [C# MVP] wrote:
> Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> And I can not see the readability problem. The intent of the
>> code is obvious.
>
> To you, possibly. To me, even - I've done just enough regex to work out
> what it means, although I wouldn't necessarily say it's obvious. To
> every maintenance engineer? Not necessarily.

It is a feature in .NET - it is a feature in most programming
environments today.

If they don't know, then they should learn.

>> The Substring/IndexOf combo could be less obvious to read
>> and would still need to be verified that it works.
>
> There's no Substring/IndexOf to be done - just three calls to Replace.
> It's blindingly obvious what *they* do.

No Substring/IndexOf in this case. But often regex is replaced
with some string manipulation code in the worst tradition of
C str functions.

Arne
May 20 '07 #14
Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> To you, possibly. To me, even - I've done just enough regex to work out
>> what it means, although I wouldn't necessarily say it's obvious. To
>> every maintenance engineer? Not necessarily.
>
> It is a feature in .NET - it is a feature in most programming
> environments today.
>
> If they don't know, then they should learn.

I'd rather not have to check the ins and outs of regular expressions
when there's a *very* simple alternative. It's so easy to go wrong with
regular expressions - I only use them when they provide a clear
benefit, which I don't believe they do in this case.

Just because you *can* do something with a regex doesn't mean you
*should*. I'm happy to go back and be really careful with regular
expressions when there's a good reason to use them, like validating
something which is genuinely a *pattern*, but I've seen enough people
get confused by them to be wary of them myself.

>>> The Substring/IndexOf combo could be less obvious to read
>>> and would still need to be verified that it works.
>> There's no Substring/IndexOf to be done - just three calls to Replace.
>> It's blindingly obvious what *they* do.
>
> No Substring/IndexOf in this case. But often regex is replaced
> with some string manipulation code in the worst tradition of
> C str functions.

And likewise simple string manipulation code is replaced with a regex
for no reason whatsoever, sometimes introducing bugs at the same time.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 20 '07 #15
* Arne Vajhøj wrote, On 20-5-2007 2:18:
> Jon Skeet [C# MVP] wrote:
>> Ben Voigt <rb*@nospam.nospam> wrote:
>>> "Nicholas Paldino [.NET/C# MVP]" <mv*@spam.guard.caspershouse.com>
>>> wrote in message
>>> news:49**********************************@microsoft.com...
>>>> Absolutely, just wondering why you wouldn't take the simpler,
>>>> more maintainable (depending on who is looking at it, at least from
>>>> my point of view) approach. =)
>>> Because your simpler method involves three complete string copies
>>> instead of one!
>>
>> Do we have any evidence that performance is an issue here? Further, do
>> we have evidence that regular expressions will actually make this
>> faster on the sample data?

I was intrigued by your results, so I expanded the test a little more.
My adjusted tests also take into account that the amount of text to
replace affects the execution speed.

And I was right.

What scares me, though, is that for String and StringBuilder manipulation
a larger amount to remove has 'little' impact. In fact, a StringBuilder is
the best option if you have a lot to remove.

The regular expressions get much, much, much slower when the amount to
remove increases. It looks like there is some very expensive buffer
copying going on in there.

My adjusted test app and the test results are at the bottom of this post.
All tests were run as an x64-compiled executable with no debugger attached
and full optimization, which made quite a difference, by the way.
>>> Regex.Replace ought to do it.
>> At what cost to readability though?
> Actually I think the regex code is more readable.

If there were more characters to strip, say 10 or more, the regex would
quickly become the more readable option. Personally, though, I would
have chosen the following construction:

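string victim = "...";
string[] stringsToRemove = new string[] { "\r", "\n", "\t" };
foreach (string stringToRemove in stringsToRemove)
{
    // String.Remove takes character positions, not a string to search for,
    // so Replace with an empty string is used to drop each entry.
    victim = victim.Replace(stringToRemove, "");
    // Or a StringBuilder variant.
}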

This is easier to read, variables have logical names and it is easy to
add new characters later, or switch strategy without having to go
through 7 or more calls which are all the same.

The regex variant I would have used would have looked like this:

Regex rx = new Regex(@"
    [\r\n\t]   (?# Characters to replace )
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

Or:

Regex rx = new Regex(@"
( (?# Characters to replace )
\r
| \n
| \t
)
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

One thing we haven't looked at until now is the pressure generated on the
garbage collector. Memory usage spiked to 120 MB in the most extensive
test case, which I think is quite a lot. At another point it started
yo-yoing between 80 and 107 MB, which are very large deltas. It was good
that my system had ample RAM and nothing else to do. Processor time isn't
the only thing that counts :).
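
A rough way to put a number on that pressure, sketched here with hypothetical names, is to count how many generation-0 collections happen while a strategy runs. GC.CollectionCount is a coarse indicator only, and this helper is not part of the test app below:

```csharp
using System;

// Hypothetical probe; the class and method names are assumptions.
public static class GcPressureProbe
{
    // Runs 'strip' on 'input' the given number of times and returns how many
    // generation-0 garbage collections occurred in the meantime.
    public static int CountGen0Collections(
        Func<string, string> strip, string input, int iterations)
    {
        GC.Collect(); // start from a clean heap so earlier garbage is not counted
        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        return GC.CollectionCount(0) - before;
    }
}
```

It could be called, for example, as GcPressureProbe.CountGen0Collections(s => s.Replace("\r", "").Replace("\n", "").Replace("\t", ""), test, 1000) and again with a regex-based delegate to compare the two.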

OK. As promised, the results (columns: test name, input length in
characters, elapsed time in milliseconds):

=====================================================================

0% whitespace to replace

String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 0
Regex Replace Optimized : 1000 : 0
Regex Replace A : 1000 : 16
Regex Replace A Optimized : 1000 : 0
String Replace : 10000 : 16
StringBuilder Replace : 10000 : 47
Regex Replace : 10000 : 31
Regex Replace Optimized : 10000 : 31
Regex Replace A : 10000 : 31
Regex Replace A Optimized : 10000 : 47
String Replace : 100000 : 94
StringBuilder Replace : 100000 : 203
Regex Replace : 100000 : 344
Regex Replace Optimized : 100000 : 391
Regex Replace A : 100000 : 344
Regex Replace A Optimized : 100000 : 281
String Replace : 1000000 : 891
StringBuilder Replace : 1000000 : 1750
Regex Replace : 1000000 : 2859
Regex Replace Optimized : 1000000 : 2891
Regex Replace A : 1000000 : 2891
Regex Replace A Optimized : 1000000 : 2438

5% whitespace to replace

String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 16
Regex Replace Optimized : 1000 : 16
Regex Replace A : 1000 : 16
Regex Replace A Optimized : 1000 : 16
String Replace : 10000 : 16
StringBuilder Replace : 10000 : 31
Regex Replace : 10000 : 78
Regex Replace Optimized : 10000 : 63
Regex Replace A : 10000 : 78
Regex Replace A Optimized : 10000 : 63
String Replace : 100000 : 219
StringBuilder Replace : 100000 : 203
Regex Replace : 100000 : 688
Regex Replace Optimized : 100000 : 531
Regex Replace A : 100000 : 656
Regex Replace A Optimized : 100000 : 563
String Replace : 1000000 : 1734
StringBuilder Replace : 1000000 : 1703
Regex Replace : 1000000 : 5531
Regex Replace Optimized : 1000000 : 4406
Regex Replace A : 1000000 : 5516
Regex Replace A Optimized : 1000000 : 4500

50% whitespace to replace

String Replace : 1000 : 0
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 47
Regex Replace Optimized : 1000 : 31
Regex Replace A : 1000 : 31
Regex Replace A Optimized : 1000 : 31
String Replace : 10000 : 47
StringBuilder Replace : 10000 : 16
Regex Replace : 10000 : 281
Regex Replace Optimized : 10000 : 203
Regex Replace A : 10000 : 297
Regex Replace A Optimized : 10000 : 219
String Replace : 100000 : 281
StringBuilder Replace : 100000 : 156
Regex Replace : 100000 : 2438
Regex Replace Optimized : 100000 : 1828
Regex Replace A : 100000 : 2375
Regex Replace A Optimized : 100000 : 1828
String Replace : 1000000 : 2344
StringBuilder Replace : 1000000 : 1203
Regex Replace : 1000000 : 20609
Regex Replace Optimized : 1000000 : 14750
Regex Replace A : 1000000 : 19594
Regex Replace A Optimized : 1000000 : 14875

95% whitespace to replace

String Replace : 1000 : 16
StringBuilder Replace : 1000 : 0
Regex Replace : 1000 : 63
Regex Replace Optimized : 1000 : 47
Regex Replace A : 1000 : 63
Regex Replace A Optimized : 1000 : 47
String Replace : 10000 : 47
StringBuilder Replace : 10000 : 16
Regex Replace : 10000 : 500
Regex Replace Optimized : 10000 : 359
Regex Replace A : 10000 : 469
Regex Replace A Optimized : 10000 : 344
String Replace : 100000 : 344
StringBuilder Replace : 100000 : 78
Regex Replace : 100000 : 4125
Regex Replace Optimized : 100000 : 3141
Regex Replace A : 100000 : 4047
Regex Replace A Optimized : 100000 : 2922
String Replace : 1000000 : 2859
StringBuilder Replace : 1000000 : 656
Regex Replace : 1000000 : 32750
Regex Replace Optimized : 1000000 : 24016
Regex Replace A : 1000000 : 31453
Regex Replace A Optimized : 1000000 : 23953

100% whitespace to replace

String Replace : 1000 : 0
StringBuilder Replace : 1000 : 16
Regex Replace : 1000 : 63
Regex Replace Optimized : 1000 : 47
Regex Replace A : 1000 : 63
Regex Replace A Optimized : 1000 : 47
String Replace : 10000 : 31
StringBuilder Replace : 10000 : 0
Regex Replace : 10000 : 516
Regex Replace Optimized : 10000 : 359
Regex Replace A : 10000 : 500
Regex Replace A Optimized : 10000 : 375
String Replace : 100000 : 328
StringBuilder Replace : 100000 : 78
Regex Replace : 100000 : 4172
Regex Replace Optimized : 100000 : 3406
Regex Replace A : 100000 : 4203
Regex Replace A Optimized : 100000 : 3031
String Replace : 1000000 : 2891
StringBuilder Replace : 1000000 : 625
Regex Replace : 1000000 : 34203
Regex Replace Optimized : 1000000 : 24781
Regex Replace A : 1000000 : 32672
Regex Replace A Optimized : 1000000 : 24547
=====================================================================

And the code:

=====================================================================
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        // {0} = test name, {1} = input length, {4} = elapsed milliseconds.
        // The output length {2} and repetition count {3} are passed but not printed.
        private const string FMT = "{0,-25} : {1,-15} : {4,11:##########0}";
        private static Regex rxA = new Regex(@"[\r\n\t]",
            RegexOptions.Compiled);
        private static Regex rxB = new Regex(@"(\r|\n|\t)",
            RegexOptions.Compiled | RegexOptions.ExplicitCapture);

        private static void TestStringReplace(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = s.Replace("\r", "").Replace("\n",
                    "").Replace("\t", "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "String Replace",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        // Note: sb is built once, outside the loop. After the first iteration
        // there is nothing left to replace, which favours this test's timing.
        private static void TestStringBuilderReplace(string s)
        {
            int n = ComputeRepetitions(s);
            StringBuilder sb = new StringBuilder(s);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = sb.Replace("\r", "").Replace("\n",
                    "").Replace("\t", "").ToString();
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "StringBuilder Replace",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        // Static Regex.Replace with a character class.
        private static void TestRegexReplace(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = Regex.Replace(s, @"[\r\n\t]", "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        // Precompiled character-class regex (rxA).
        private static void TestRegexReplaceOptimized(string s)
        {
            int n = ComputeRepetitions(s);
            Regex re = rxA;
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = re.Replace(s, "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace Optimized",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        // Static Regex.Replace with alternation instead of a character class.
        private static void TestRegexReplaceAlternate(string s)
        {
            int n = ComputeRepetitions(s);
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = Regex.Replace(s, @"(?:\r|\n|\t)", "",
                    RegexOptions.None);
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace A",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        // Precompiled alternation regex (rxB).
        private static void TestRegexReplaceOptimizedAlternate(string s)
        {
            int n = ComputeRepetitions(s);
            Regex re = rxB;
            string s2 = null;
            DateTime dt1 = DateTime.Now;
            for (int i = 0; i < n; i++)
            {
                s2 = re.Replace(s, "");
            }
            DateTime dt2 = DateTime.Now;
            Console.WriteLine(String.Format(FMT, "Regex Replace A Optimized",
                s.Length, s2.Length, n, (dt2 - dt1).TotalMilliseconds));
        }

        private static void CollectGarbage()
        {
            GC.Collect();
        }

        // Runs every strategy on the same input, collecting garbage in between.
        private static void Test(string s)
        {
            CollectGarbage();
            TestStringReplace(s);
            CollectGarbage();
            TestStringBuilderReplace(s);
            CollectGarbage();
            TestRegexReplace(s);
            CollectGarbage();
            TestRegexReplaceOptimized(s);
            CollectGarbage();
            TestRegexReplaceAlternate(s);
            CollectGarbage();
            TestRegexReplaceOptimizedAlternate(s);
        }

        // Fewer repetitions for longer strings: 1000 / ln(length).
        public static int ComputeRepetitions(string s)
        {
            int n = Convert.ToInt32(1000 / Math.Log(s.Length));
            return n;
        }

        public static void Main(string[] args)
        {
            // Warm up the compiled regexes so their setup cost is not timed.
            rxA.Replace("", "");
            rxB.Replace("", "");

            int[] whitespace = new int[] { 0, 5, 50, 95, 100 };
            int minsize = 3;
            int maxsize = 6;
            foreach (int percentage in whitespace)
            {
                Console.WriteLine("\r\n{0}% whitespace to replace\r\n",
                    percentage);
                // String lengths from 10^3 up to 10^6 characters.
                for (int i = minsize; i <= maxsize; i++)
                {
                    int length = Convert.ToInt32(Math.Pow(10, i));
                    string test = GenerateString(length, length,
                        percentage);
                    Test(test);
                }
            }
            Console.ReadLine();
        }

        private static readonly char[] PossibleChars = new char[]
        {
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
            '0','1','2','3','4','5','6','7','8',',','.','"','\'','!','?','-'
        };

        private static readonly char[] PossibleWhitespaceChars = new char[]
        {
            ' ', '\r', '\n', '\t'
        };

        public static Random _random = new
            Random(DateTime.Now.Millisecond);

        public static char GenerateRandomCharacter(char[] allowedChars)
        {
            // Random.Next(max) excludes max, so pass Length; the posted
            // Length - 1 would never pick the last element of the array.
            int pos = _random.Next(allowedChars.Length);

            return allowedChars[pos];
        }

        // Builds a random string in which roughly spaceChance percent of the
        // characters (never the first or last) come from PossibleWhitespaceChars.
        public static string GenerateString(int minLength, int
            maxLength, int spaceChance)
        {
            int length = minLength + _random.Next(maxLength - minLength);
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++)
            {
                if (spaceChance != 0 && i != 0 && i != length - 1 &&
                    _random.Next(100) <= spaceChance)
                {
                    sb.Append(GenerateRandomCharacter(PossibleWhitespaceChars));
                }
                else
                {
                    sb.Append(GenerateRandomCharacter(PossibleChars));
                }
            }
            return sb.ToString();
        }
    }
}
=====================================================================

Kind Regards,

Jesse Houwing
May 20 '07 #16
