Regex Novice needs help

I'm writing an app which is going to rely extremely heavily on the
usage of regular expressions. I'm reading the docs but having trouble
wrapping my head around some of this since it's all fairly new to me.
I have two questions, and I'm hoping I can get answers to at least one :)
Any help is better than no help:

1) In many cases I am checking whether a particular string matches
a particular regular expression. However, if the match happens
"inside" the string, I don't consider it a match. I need the entire
string to constitute a match. How can I force this check on the
RegEx engine?

2) Performance is going to be a big factor for this particular app. I
have about 300 pre-determined hardcoded regular expressions, and in
peak scenarios I will be matching incoming strings at a rate of about
10-15 per second. Is there a list of "guidelines" somewhere for
writing performance-aware regular expressions?

Thanks
Zach

Apr 12 '06 #1
Zach <di***********@gmail.com> wrote:
> I'm writing an app which is going to rely extremely heavily on the
> usage of regular expressions. I'm reading the docs but having trouble
> wrapping my head around some of this since it's all fairly new to me.
> I have two questions, and I'm hoping I can get answers to at least one :)
> Any help is better than no help:
>
> 1) In many cases I am checking whether a particular string matches
> a particular regular expression. However, if the match happens
> "inside" the string, I don't consider it a match. I need the entire
> string to constitute a match. How can I force this check on the
> RegEx engine?

Use ^ and $ to specify the start and end of the string.
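
A minimal sketch of the difference, written in Python's `re` module as a stand-in because the thread shows no code (the question itself is about .NET's `Regex`, where `^` and `$` are written the same way). The digit pattern and the function names are hypothetical, chosen only for illustration. One caveat worth knowing: `$` also matches just before a trailing newline, so a strict whole-string check needs `fullmatch` in Python or `\z` in .NET.

```python
import re

# Hypothetical pattern for illustration: one or more digits.
DIGITS_ANYWHERE = re.compile(r"\d+")
DIGITS_ANCHORED = re.compile(r"^\d+$")


def matches_inside(text):
    """True if the pattern is found anywhere in the string."""
    return DIGITS_ANYWHERE.search(text) is not None


def matches_anchored(text):
    """True if the string matches from ^ to $ (a final newline is tolerated)."""
    return DIGITS_ANCHORED.search(text) is not None


def matches_whole(text):
    """True only if the entire string, with nothing left over, matches."""
    return DIGITS_ANYWHERE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["123", "abc123def", "123\n"]:
        print(repr(sample), matches_inside(sample),
              matches_anchored(sample), matches_whole(sample))
```

With these definitions, "abc123def" is accepted only by the unanchored search, while "123\n" is accepted by the anchored pattern but rejected by the strict whole-string check.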

> 2) Performance is going to be a big factor for this particular app. I
> have about 300 pre-determined hardcoded regular expressions, and in
> peak scenarios I will be matching incoming strings at a rate of about
> 10-15 per second. Is there a list of "guidelines" somewhere for
> writing performance-aware regular expressions?

Do you mean you'd be running 300 regular expressions on each of 10-15
strings per second? I wouldn't like to say for *sure* without testing
it (with examples of the actual regular expressions and sample data),
but I wouldn't have thought that would be a problem.

One important thing is to make sure you build the regular expressions
ahead of time and re-use them rather than creating new ones each time.
Also, use RegexOptions.Compiled. I'm sure others will be able to help
further - but the best thing to do to start with is to work out your
regular expressions and create a good sample data set. Then measure,
measure, measure - whenever you change something, run the test data set
through again and record the change to performance. Make sure you keep
that record - don't just do it on a scrap of paper. If possible, keep
the test results in the same source control system as the source, so
you can work out *exactly* which set of test results came from which
version of the code.
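
The advice above (build the expressions once, reuse them, then measure against a fixed sample set) can be sketched roughly as follows. This uses Python's `re` and `time` modules as a stand-in; in .NET the equivalent would be holding `Regex` instances, possibly created with `RegexOptions.Compiled`, in a long-lived collection. The three patterns, `classify`, and `time_run` are all hypothetical, since the thread never shows the real expressions.

```python
import re
import time

# Hypothetical patterns; the real application would list its own ~300 here.
PATTERN_SOURCES = {
    "integer": r"-?\d+",
    "word": r"[A-Za-z]+",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
}

# Compile once at start-up and reuse the compiled objects for every input.
COMPILED = {name: re.compile(src) for name, src in PATTERN_SOURCES.items()}


def classify(text):
    """Return the names of all patterns that match the entire string."""
    return [name for name, rx in COMPILED.items() if rx.fullmatch(text)]


def time_run(samples, repeats=1000):
    """Return the average seconds per classify() call over the sample set."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            classify(sample)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))


if __name__ == "__main__":
    samples = ["2006-04-12", "-42", "hello", "hello world"]
    print("average seconds per input:", time_run(samples))
```

Running `time_run` on the same sample set after every change, and saving the printed figure alongside the code revision, gives the kind of record described above.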

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 12 '06 #2
What I meant regarding the 300 and the 10-15 numbers is that my entire
set of regular expressions consists of about 300. Sometimes I will
have around 10-15 input strings per second to check against these
regular expressions. However, each input string will never be checked
against more than 3-4 of those 300 regular expressions. So the load is
(10-15)*(3-4) = 30-60 matches per second: about 45 in the middle of
that range, and 60 in a true worst case.
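
For reference, the arithmetic works out as below. This is only a worked version of the figures quoted in this post; the function name and the per-match time budget are illustrative additions, not something measured.

```python
def matches_per_second(strings_per_sec, regexes_per_string):
    """Regex match attempts per second for a given input rate and fan-out."""
    return strings_per_sec * regexes_per_string


low = matches_per_second(10, 3)    # quietest peak case
high = matches_per_second(15, 4)   # true worst case
midpoint = (low + high) / 2        # the "45ish" figure

# Time available per match attempt if the worst case must keep up in real time.
budget_ms = 1000 / high

if __name__ == "__main__":
    print(low, high, midpoint, round(budget_ms, 1))
```

So even at the upper bound each match attempt has a budget on the order of 16 to 17 milliseconds.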

Apr 12 '06 #3
Zach <di***********@gmail.com> wrote:
> What I meant regarding the 300 and the 10-15 numbers is that my entire
> set of regular expressions consists of about 300. Sometimes I will
> have around 10-15 input strings per second to check against these
> regular expressions. However, each input string will never be checked
> against more than 3-4 of those 300 regular expressions. So the load is
> (10-15)*(3-4) = 30-60 matches per second: about 45 in the middle of
> that range, and 60 in a true worst case.

Right - that shouldn't be a problem at all. As ever though, it's worth
measuring. Of course, if the regexes are incredibly complicated, it
could take a long time.
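
One common way a regex ends up taking a long time is nested quantifiers in a backtracking engine. The sketch below, again in Python as a stand-in for .NET, compares a hypothetical nested pattern with a flat one that accepts the same strings. The patterns and the `compare` helper are made up for illustration, and no particular timings are claimed; the point is only to show how such a comparison could be measured.

```python
import re
import time

# Nested quantifiers: the engine can split a run of a's in many different ways.
NESTED = re.compile(r"(a+)+$")
# Accepts the same strings with a single quantifier.
FLAT = re.compile(r"a+$")


def time_match(rx, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = rx.match(text)
    return result, time.perf_counter() - start


def compare(n):
    """Time both patterns on n a's followed by 'b', which forces a failure."""
    text = "a" * n + "b"
    _, nested_secs = time_match(NESTED, text)
    _, flat_secs = time_match(FLAT, text)
    return nested_secs, flat_secs


if __name__ == "__main__":
    # Keep n small: the nested pattern's work grows very quickly with n.
    for n in (8, 12, 16):
        print(n, compare(n))
```

Keeping an input like this in the sample data set is a cheap way to catch a pathological pattern before it reaches production.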

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 12 '06 #4
