Regex performance issue

Hi all,

Sorry for the lengthy post, but I have learned that I should post
concise-and-complete code.

The code below shows that executing ValidateAddress consumes a
lot of time. In the test it is called 100 times, but in my real app it
could be called 50000 times or more.

So my question is whether it is possible to speed this up and, if so,
how.

Thanks a lot in advance,

Bart

------ Code -----

using System;
using System.Text.RegularExpressions;

namespace ValidateAddress_speed_test
{
    class Program
    {
        #region Regular expression strings

        private const string dbBoolAddress_pattern =
            @"^(DB)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)(DBX)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)[0-7]$";
        private const string dbMemAddress_pattern =
            @"^(DB)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)(DBB|DBW|DBD|DBR)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])$";

        private const string boolAddress_pattern =
            @"^(M|E|A)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)[0-7]$";
        private const string memAddress_pattern =
            @"^(EB|EW|ED|AB|AW|AD|MB|MW|MD|MR)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])$";

        #endregion

        private static void ValidateAddress(string address)
        {
            if (address == string.Empty)
                throw new ArgumentOutOfRangeException("The address cannot be an empty string.");

            // Each Regex below is constructed (and compiled) on every call.
            Regex dbBool_Regex = new Regex(dbBoolAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (dbBool_Regex.IsMatch(address))
                return;

            Regex dbMem_Regex = new Regex(dbMemAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (dbMem_Regex.IsMatch(address))
                return;

            Regex boolMem_Regex = new Regex(boolAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (boolMem_Regex.IsMatch(address))
                return;

            Regex Mem_Regex = new Regex(memAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (Mem_Regex.IsMatch(address))
                return;

            throw new ArgumentOutOfRangeException(string.Format("{0} is not a valid address.", address));
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Test started...");
            System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
            sw.Start();
            for (int i = 0; i < 100; i++)
            {
                //ValidateAddress("DB0.DBX0.0");
                //ValidateAddress("DB0.DBW0");
                //ValidateAddress("M0.0");
                ValidateAddress("MB0");
            }

            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " ms");
            Console.WriteLine("Press any key to quit");
            Console.ReadLine();
        }
    }
}

Oct 10 '08 #1
Hello Bart,

In your ValidateAddress function you're recompiling the same regexes over
and over again. A compiled regex matches faster than an uncompiled one, but
the compilation itself takes time.

To solve this, store each regex in a private static readonly Regex field
and reuse it. Like this:

private static readonly Regex dbBoolAddressRegex = new Regex(dbBoolAddress_pattern,
RegexOptions.Compiled | RegexOptions.IgnoreCase);

Then use this instance from your validate method.
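
For illustration, here is a minimal sketch of how the whole validate method might look once all four regexes are held in static readonly fields. The class name AddressValidator and the shortened patterns (using [0-9]{1,5} for the numeric parts) are assumptions made to keep the example short; they do not enforce the same numeric range as the original patterns, so substitute the four *_pattern constants from the original post in real code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressValidator
{
    private const RegexOptions Options = RegexOptions.Compiled | RegexOptions.IgnoreCase;

    // Stand-in patterns (assumption): simplified versions of the four
    // patterns in the question. Each Regex is built once, when the class
    // is first used, instead of on every call.
    private static readonly Regex dbBoolAddressRegex =
        new Regex(@"^DB[0-9]{1,5}\.DBX[0-9]{1,5}\.[0-7]$", Options);
    private static readonly Regex dbMemAddressRegex =
        new Regex(@"^DB[0-9]{1,5}\.(DBB|DBW|DBD|DBR)[0-9]{1,5}$", Options);
    private static readonly Regex boolAddressRegex =
        new Regex(@"^(M|E|A)[0-9]{1,5}\.[0-7]$", Options);
    private static readonly Regex memAddressRegex =
        new Regex(@"^(EB|EW|ED|AB|AW|AD|MB|MW|MD|MR)[0-9]{1,5}$", Options);

    public static void ValidateAddress(string address)
    {
        if (string.IsNullOrEmpty(address))
            throw new ArgumentException("The address cannot be null or empty.", "address");

        // Reuse the shared instances; no Regex is constructed here.
        if (dbBoolAddressRegex.IsMatch(address)) return;
        if (dbMemAddressRegex.IsMatch(address)) return;
        if (boolAddressRegex.IsMatch(address)) return;
        if (memAddressRegex.IsMatch(address)) return;

        throw new ArgumentOutOfRangeException("address", address,
            string.Format("{0} is not a valid address.", address));
    }
}
```

The test loop from the question would then call AddressValidator.ValidateAddress("MB0") unchanged; only the place where the Regex objects are created has moved.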

Be sure to read up on thread safety. I'm not sure whether you'll need to
synchronize calls to the regex instances. That is something you'll
probably find in the docs, and it may not apply to you anyway.

Jesse
--
Jesse Houwing
jesse.houwing at sogeti.nl
Oct 10 '08 #2
Further to Jesse's point - the Regex class is itself immutable; it is
my /understanding/ that methods like IsMatch etc are thread-safe. MSDN
doesn't make it very clear, though.

Marc
Oct 10 '08 #3
Compiling a regex costs some time up front and saves a little on each use.
The best way (I think) to use a compiled regex:
make a static readonly Regex variable with that compiled expression,
then use it multiple times.
This means you pay the compile cost just once and get the speed benefit
(which in my experience is not huge, but still present) every time.
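
To see the difference on your own machine, a rough timing sketch along these lines may help. The class name RegexReuseBenchmark, the helper names, and the shortened pattern are assumptions for illustration only, and the numbers will vary by runtime and hardware, so treat it as a quick comparison rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexReuseBenchmark
{
    // Stand-in pattern (assumption): a shortened form of memAddress_pattern.
    private const string Pattern = @"^(EB|EW|ED|AB|AW|AD|MB|MW|MD|MR)[0-9]{1,5}$";
    private const RegexOptions Options = RegexOptions.Compiled | RegexOptions.IgnoreCase;

    // Built once and shared by every call to MatchWithCachedInstance.
    private static readonly Regex Cached = new Regex(Pattern, Options);

    // Mirrors the original code: a new compiled Regex on every call.
    public static bool MatchWithNewInstance(string input)
    {
        return new Regex(Pattern, Options).IsMatch(input);
    }

    // Mirrors the suggested fix: reuse the static instance.
    public static bool MatchWithCachedInstance(string input)
    {
        return Cached.IsMatch(input);
    }

    // Runs the given matcher repeatedly and returns the elapsed milliseconds.
    public static long TimeMs(Func<string, bool> match, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            match("MB0");
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Call this from Main to print both timings.
    public static void Run()
    {
        Console.WriteLine("new Regex per call:  " + TimeMs(MatchWithNewInstance, 100) + " ms");
        Console.WriteLine("cached static Regex: " + TimeMs(MatchWithCachedInstance, 100) + " ms");
    }
}
```

The iteration count of 100 matches the test in the question; raising it for the cached variant only gives a better feel for the per-match cost.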

Hans Kesting
Oct 10 '08 #4
Ah, found it:

http://msdn.microsoft.com/en-us/libr...ons.regex.aspx

"The Regex class is immutable (read-only) and is inherently thread
safe. Regex objects can be created on any thread and shared between
threads."
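
As a small sketch of what that means in practice, one static Regex can be shared by several threads without locking around IsMatch. The class name SharedRegexDemo, the CountValid helper, and the shortened pattern are assumptions for illustration, and Parallel.ForEach requires .NET 4 or later, which is newer than this thread.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

public static class SharedRegexDemo
{
    // Stand-in pattern (assumption): a shortened form of boolAddress_pattern.
    private static readonly Regex BoolAddress =
        new Regex(@"^(M|E|A)[0-9]{1,5}\.[0-7]$",
                  RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Counts the valid addresses, checking them on multiple threads.
    public static int CountValid(string[] addresses)
    {
        int valid = 0;
        Parallel.ForEach(addresses, address =>
        {
            // The shared Regex needs no lock; only the counter does,
            // which Interlocked handles.
            if (BoolAddress.IsMatch(address))
                Interlocked.Increment(ref valid);
        });
        return valid;
    }
}
```

The only synchronization here protects the counter, not the Regex instance.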
Oct 10 '08 #5
> To solve this, put your regexes in a private static readonly Regex
> instance and reuse that. Like this:
>
> private static readonly Regex dbBoolAddressRegex = new
> Regex(dbBoolAddress_pattern, RegexOptions.Compiled |
> RegexOptions.IgnoreCase);

Thanks,

This is a huge performance boost :)

100000 calls now take about 763 ms.

So this is great...

Bart
Oct 10 '08 #6
Hello Bart,
You're welcome :)

--
Jesse Houwing
jesse.houwing at sogeti.nl
Oct 10 '08 #7
