Regex performance issue

Hi all,

Sorry for the lengthy post, but I have learned that I should post
concise-and-complete code.

The code below shows that executing ValidateAddress consumes a lot of
time. In the test it is called 100 times, but in my real app it could be
called 50000 times or more.

So my question is whether it is possible to speed this up and, if so, how.

Thanks a lot in advance,

Bart

------ Code -----

using System;
using System.Text.RegularExpressions;

namespace ValidateAddress_speed_test
{
    class Program
    {
        #region Regular expression strings

        private const string dbBoolAddress_pattern =
            @"^(DB)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)(DBX)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)[0-7]$";
        private const string dbMemAddress_pattern =
            @"^(DB)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)(DBB|DBW|DBD|DBR)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])$";

        private const string boolAddress_pattern =
            @"^(M|E|A)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])(\.)[0-7]$";
        private const string memAddress_pattern =
            @"^(EB|EW|ED|AB|AW|AD|MB|MW|MD|MR)([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])$";

        #endregion

        private static void ValidateAddress(string address)
        {
            if (address == string.Empty)
                throw new ArgumentOutOfRangeException("The address cannot be an empty string.");

            // Each Regex below is constructed (and compiled) again on every call.
            Regex dbBool_Regex = new Regex(dbBoolAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (dbBool_Regex.IsMatch(address))
                return;

            Regex dbMem_Regex = new Regex(dbMemAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (dbMem_Regex.IsMatch(address))
                return;

            Regex boolMem_Regex = new Regex(boolAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (boolMem_Regex.IsMatch(address))
                return;

            Regex Mem_Regex = new Regex(memAddress_pattern,
                RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (Mem_Regex.IsMatch(address))
                return;

            throw new ArgumentOutOfRangeException(
                string.Format("{0} is not a valid address.", address));
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Test started...");
            System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
            sw.Start();
            for (int i = 0; i < 100; i++)
            {
                //ValidateAddress("DB0.DBX0.0");
                //ValidateAddress("DB0.DBW0");
                //ValidateAddress("M0.0");
                ValidateAddress("MB0");
            }

            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " ms");
            Console.WriteLine("Press any key to quit");
            Console.ReadLine();
        }
    }
}

Oct 10 '08 #1
Hello Bart,

In your ValidateAddress function you're recompiling the same regexes over
and over again. A compiled regex matches faster than an uncompiled one, but
the compilation itself takes time.

To solve this, put each regex in a private static readonly Regex field
and reuse that instance. Like this:

private static readonly Regex dbBoolAddressRegex = new Regex(dbBoolAddress_pattern,
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

Then use this instance from your validate method.
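
For readers who want to see the whole method after this change, here is a minimal sketch, not Bart's final code. The names AddressValidator, Num, Options and AddressRegexes are made up for illustration; the numeric sub-pattern is copied unchanged from the original post and only factored into a constant to keep the lines short. Two small deviations from the original are assumptions of this sketch: it also rejects null, and it passes the parameter name to the exception constructor.

```csharp
using System;
using System.Text.RegularExpressions;

static class AddressValidator
{
    // Numeric sub-pattern copied unchanged from the original post.
    private const string Num =
        "([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-6][0-6][0-5][0-3][0-6])";

    private const RegexOptions Options =
        RegexOptions.Compiled | RegexOptions.IgnoreCase;

    // Constructed once, when the class is first used, then reused by every call.
    private static readonly Regex[] AddressRegexes =
    {
        new Regex(@"^(DB)" + Num + @"(\.)(DBX)" + Num + @"(\.)[0-7]$", Options),
        new Regex(@"^(DB)" + Num + @"(\.)(DBB|DBW|DBD|DBR)" + Num + "$", Options),
        new Regex(@"^(M|E|A)" + Num + @"(\.)[0-7]$", Options),
        new Regex(@"^(EB|EW|ED|AB|AW|AD|MB|MW|MD|MR)" + Num + "$", Options),
    };

    public static void ValidateAddress(string address)
    {
        // Assumption: null is rejected here too (the original only checked for "").
        if (string.IsNullOrEmpty(address))
            throw new ArgumentOutOfRangeException("address",
                "The address cannot be an empty string.");

        // No Regex is constructed here; the shared instances are only matched.
        foreach (Regex regex in AddressRegexes)
        {
            if (regex.IsMatch(address))
                return;
        }

        throw new ArgumentOutOfRangeException("address",
            string.Format("{0} is not a valid address.", address));
    }
}

class Program
{
    static void Main()
    {
        AddressValidator.ValidateAddress("MB0");        // matches the fourth pattern
        AddressValidator.ValidateAddress("DB0.DBX0.0"); // matches the first pattern
        Console.WriteLine("Both addresses are valid.");
    }
}
```

Keeping the instances in one array keeps the method short, but four separate static readonly fields, as in the snippet above, would work just as well.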

Be sure to read up on thread safety. I'm not sure whether you'll need to
synchronize calls to the regex instances. That is something you'll probably
find in the docs, and it may not apply to you anyway.

Jesse
--
Jesse Houwing
jesse.houwing at sogeti.nl
Oct 10 '08 #2
Further to Jesse's point: the Regex class is itself immutable, and it is
my /understanding/ that methods like IsMatch are thread-safe. MSDN
doesn't make it very clear, though.

Marc
Oct 10 '08 #3
Compiling a regex costs some time up front and saves a little each time
you use it.
The best way (I think) to use a compiled regex is to make a static
readonly Regex variable with that compiled expression and then use it
multiple times.
That way you pay the compile cost just once and get the speed benefit
(which in my experience is not huge, but still present) every time.
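
A rough way to see this trade-off on your own machine is to time both approaches side by side. The sketch below is only an illustration: the class name RegexReuseTiming and the shortened pattern are made up here and are not taken from Bart's code, and the numbers it prints will vary with the runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class RegexReuseTiming
{
    // Hypothetical, shortened stand-in pattern (not one of the patterns from the post).
    private const string Pattern = @"^(MB|MW|MD)[0-9]{1,5}$";

    private const RegexOptions Options =
        RegexOptions.Compiled | RegexOptions.IgnoreCase;

    // Compiled once and shared by every call to TimeSharedRegex.
    private static readonly Regex Shared = new Regex(Pattern, Options);

    // Pays the construction and compilation cost on every iteration.
    public static long TimeNewRegexPerCall(int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Regex regex = new Regex(Pattern, Options);
            regex.IsMatch("MB0");
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Only pays for matching; the Regex already exists.
    public static long TimeSharedRegex(int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Shared.IsMatch("MB0");
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("new Regex per call: " + TimeNewRegexPerCall(100) + " ms");
        Console.WriteLine("shared Regex:       " + TimeSharedRegex(100) + " ms");
    }
}
```

A Stopwatch loop like this is a coarse measurement, so treat the output as an indication rather than a benchmark.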

Hans Kesting
Oct 10 '08 #4
Ah, found it:

http://msdn.microsoft.com/en-us/libr...ons.regex.aspx

"The Regex class is immutable (read-only) and is inherently thread
safe. Regex objects can be created on any thread and shared between
threads."
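
To illustrate what the quoted sentence allows, here is a small sketch in which several threads call IsMatch on one shared Regex without any locking. The class name SharedRegexDemo, the simplified pattern and the sample addresses are assumptions for this example, and Parallel.For requires .NET 4 or later, which is newer than this thread.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

class SharedRegexDemo
{
    // Hypothetical, simplified pattern; one instance shared by all threads.
    private static readonly Regex BoolAddress = new Regex(
        @"^(M|E|A)[0-9]{1,5}\.[0-7]$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static int CountMatchesInParallel(string[] addresses)
    {
        int matches = 0;
        Parallel.For(0, addresses.Length, i =>
        {
            // No lock around IsMatch: the Regex itself is safe to share.
            // The counter is shared mutable state, so it is updated atomically.
            if (BoolAddress.IsMatch(addresses[i]))
                Interlocked.Increment(ref matches);
        });
        return matches;
    }

    static void Main()
    {
        string[] addresses = { "M0.0", "E12.7", "A3.8", "MB0", "m1.1" };
        Console.WriteLine(CountMatchesInParallel(addresses)); // prints 3
    }
}
```

Only the Regex is shared here; anything else the threads write to, such as the counter, still needs its own synchronization.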
Oct 10 '08 #5
> To solve this, put your regexes in a private static readonly Regex
> instance and reuse that. Like this:
>
> private static readonly Regex dbBoolAddressRegex = new
> Regex(dbBoolAddress_pattern, RegexOptions.Compiled |
> RegexOptions.IgnoreCase);

Thanks,

This is a huge performance boost :)

100000 calls now take about 763 ms.

So this is great...

Bart
Oct 10 '08 #6
Hello Bart,

> Thanks,
>
> This is a huge performance boost :)
>
> 100000 calls now take about 763 ms.
>
> So this is great...

You're welcome :)

--
Jesse Houwing
jesse.houwing at sogeti.nl
Oct 10 '08 #7
