Bytes | Developer Community

Regex Novice needs help

I'm writing an app that is going to rely extremely heavily on regular
expressions. I'm reading the docs but having trouble wrapping my head
around some of this, since it's all fairly new to me.
I have two questions; I'm hoping I can get answers to at least one :)
Any help is better than no help:

1) In many cases I am checking whether a particular string matches a
particular regular expression. However, if the match happens "inside"
the string, I don't consider it a match: the entire string has to
match. How can I force this check on the RegEx engine?

2) Performance is going to be a big factor for this particular app. I
have about 300 predetermined, hard-coded regular expressions, and in
peak scenarios I will be matching incoming strings at a rate of about
10-15 per second. Is there a list of "guidelines" somewhere for
writing performance-aware regular expressions?

Thanks
Zach

Apr 12 '06 #1
Zach <di***********@gmail.com> wrote:
> I'm writing an app that is going to rely extremely heavily on regular
> expressions. I'm reading the docs but having trouble wrapping my head
> around some of this, since it's all fairly new to me.
> I have two questions; I'm hoping I can get answers to at least one :)
> Any help is better than no help:
>
> 1) In many cases I am checking whether a particular string matches a
> particular regular expression. However, if the match happens "inside"
> the string, I don't consider it a match: the entire string has to
> match. How can I force this check on the RegEx engine?
Use ^ and $ to anchor the match to the start and end of the string.
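To make the ^/$ advice concrete, here is a minimal sketch. It uses Python's re module as a stand-in, since no .NET code appears in the thread; the ^ and $ syntax is the same in .NET's Regex class. It also shows two details worth knowing: $ also matches just before a trailing newline, so a strict whole-string check uses \A plus an absolute-end anchor (\Z in Python, spelled \z in .NET), and an alternation has to be grouped so that both anchors apply to every alternative. The patterns and sample strings are made up for illustration.

```python
import re

# Unanchored: search() accepts a match anywhere in the string.
digits = re.compile(r"\d+")
print(bool(digits.search("abc123xyz")))  # True: the match is "inside" the string

# Anchored with ^ and $: the whole string must consist of digits.
whole = re.compile(r"^\d+$")
print(bool(whole.search("abc123xyz")))   # False
print(bool(whole.search("123")))         # True

# Caveat: $ also matches just before a final newline.
print(bool(whole.search("123\n")))       # True

# \A and Python's \Z (absolute end; .NET spells this \z) reject that newline.
strict = re.compile(r"\A\d+\Z")
print(bool(strict.search("123\n")))      # False

# Alternation binds loosely: this means (^cat) OR (dog$).
loose = re.compile(r"^cat|dog$")
print(bool(loose.search("catalog")))     # True
# Grouping makes both anchors apply to each alternative.
grouped = re.compile(r"^(?:cat|dog)$")
print(bool(grouped.search("catalog")))   # False

# Python-specific shortcut: fullmatch() requires the entire string to match.
print(bool(digits.fullmatch("123\n")))   # False
```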
> 2) Performance is going to be a big factor for this particular app. I
> have about 300 predetermined, hard-coded regular expressions, and in
> peak scenarios I will be matching incoming strings at a rate of about
> 10-15 per second. Is there a list of "guidelines" somewhere for
> writing performance-aware regular expressions?


Do you mean you'd be running 300 regular expressions on each of 10-15
strings per second? I wouldn't like to say for *sure* without testing
it (with examples of the actual regular expressions and sample data)
but I wouldn't have thought that would be a problem.

One important thing is to make sure you build the regular expressions
ahead of time and re-use them rather than creating new ones each time.
Also, use RegexOptions.Compiled. I'm sure others will be able to help
further - but the best thing to do to start with is to work out your
regular expressions and create a good sample data set. Then measure,
measure, measure - whenever you change something, run the test data set
through again and record the change to performance. Make sure you keep
that record - don't just do it on a scrap of paper. If possible, keep
the test results in the same source control system as the source, so
you can work out *exactly* which set of test results came from which
version of the code.
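A rough sketch of the "build once, reuse, and measure" advice, again using Python as a stand-in. The pattern names, sample data, CSV file name and version label are all hypothetical. Python has no counterpart to RegexOptions.Compiled (re.compile builds a reusable pattern object, and the module also caches recently used patterns), so this only illustrates reuse plus keeping a benchmark record that can be committed alongside the source; the "version" column could hold, say, a source-control revision id.

```python
import csv
import re
import time
from pathlib import Path

# Hypothetical patterns: compiled once at startup and reused for every input,
# instead of constructing a new regex object on each call.
PATTERNS = {
    "order_id": re.compile(r"\AORD-\d{6}\Z"),
    "date": re.compile(r"\A\d{4}-\d{2}-\d{2}\Z"),
    "word": re.compile(r"\A[A-Za-z]+\Z"),
}

def benchmark(samples, repeat=1000):
    """Return the mean seconds per pass of every pattern over every sample."""
    start = time.perf_counter()
    for _ in range(repeat):
        for text in samples:
            for pattern in PATTERNS.values():
                pattern.match(text)
    return (time.perf_counter() - start) / repeat

def record(version, seconds, path="regex_benchmarks.csv"):
    """Append one result row; the file can live in source control with the code."""
    new_file = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["version", "seconds_per_pass"])
        writer.writerow([version, f"{seconds:.9f}"])

if __name__ == "__main__":
    sample_data = ["ORD-123456", "2006-04-12", "hello", "not a match!"]
    elapsed = benchmark(sample_data)
    record("working-copy", elapsed)
    print(f"{elapsed * 1e6:.1f} microseconds per pass")
```

Rerunning the same script after every change and committing the CSV gives the per-version record described above.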

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 12 '06 #2
What I meant regarding the 300 and the 10-15 numbers is that my entire
set consists of about 300 regular expressions. At times I will
have around 10-15 input strings per second to check against these
regular expressions. However, each input string will never be checked
against more than 3-4 of those 300. So the true worst case is
(10-15) * (3-4) = 30-60 matches per second, call it 45 or so.
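One way to keep each input down to 3-4 checks is to group the compiled patterns by whatever decides which ones apply. The sketch below assumes, hypothetically, that each input arrives tagged with a type; the type names and patterns are invented, and the post doesn't say how the 3-4 candidates are actually chosen. The last lines just reproduce the worst-case arithmetic from the post.

```python
import re

# Hypothetical grouping: each input type maps to the few patterns relevant to it,
# so an input is never tested against the whole set of ~300.
PATTERNS_BY_TYPE = {
    "phone": [
        re.compile(r"\A\d{3}-\d{4}\Z"),
        re.compile(r"\A\(\d{3}\) \d{3}-\d{4}\Z"),
        re.compile(r"\A\d{3}-\d{3}-\d{4}\Z"),
    ],
    "zip": [
        re.compile(r"\A\d{5}\Z"),
        re.compile(r"\A\d{5}-\d{4}\Z"),
    ],
}

def first_match(input_type, text):
    """Return the index of the first pattern for this type that matches, or None."""
    for i, pattern in enumerate(PATTERNS_BY_TYPE.get(input_type, [])):
        if pattern.match(text):
            return i
    return None

# Worst-case load: (10-15 inputs per second) * (3-4 patterns per input).
low, high = 10 * 3, 15 * 4
print(low, high)  # 30 60
```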

Apr 12 '06 #3
Zach <di***********@gmail.com> wrote:
> What I meant regarding the 300 and the 10-15 numbers is that my entire
> set consists of about 300 regular expressions. At times I will
> have around 10-15 input strings per second to check against these
> regular expressions. However, each input string will never be checked
> against more than 3-4 of those 300. So the true worst case is
> (10-15) * (3-4) = 30-60 matches per second, call it 45 or so.


Right - that shouldn't be a problem at all. As ever though, it's worth
measuring. Of course, if the regexes are incredibly complicated, it
could take a long time.
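As an illustration of how a complicated (or just badly shaped) regex can take a long time, a textbook case is a nested quantifier such as (a+)+, which makes a backtracking engine try exponentially many ways of splitting the input before it can report a failed match. .NET's default engine and Python's re both backtrack in this way. The sketch below, in Python with made-up input, times that pattern against an equivalent flat one; absolute timings will vary by machine, which is one more reason to measure with the real patterns and data.

```python
import re
import time

# Nested quantifier: on a failing input the engine tries every way of
# dividing the run of a's between the inner and outer "+".
nested = re.compile(r"\A(a+)+\Z")
# Equivalent language without nesting: fails after a single linear scan.
flat = re.compile(r"\Aa+\Z")

def time_match(pattern, text):
    """Return (match object or None, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

text = "a" * 20 + "b"  # almost matches, then fails on the final "b"
for name, pattern in (("nested", nested), ("flat", flat)):
    result, seconds = time_match(pattern, text)
    print(f"{name}: matched={result is not None}, {seconds:.4f}s")
```

Rewriting the pattern so that it cannot split the same text in many ways (here, simply a+) is usually the fix.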

Apr 12 '06 #4
