By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,369 Members | 1,138 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,369 IT Pros & Developers. It's quick & easy.

Regular Expression limits?

P: n/a
I'm working on an application in C# that will perform regular expression
matching against a small string. Usually regular expressions are used
such that the text being searched is large while the regular expression
itself is relatively small. What I'm doing is exactly the opposite. The
searched string will generally be small, while the Regex is going to be
large (in terms of a regular expression pattern anyway).

The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?

Thanks, gilad
Jul 21 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
gilad <gi***@arbingerBADSPAMBOTsys.com> wrote:
I'm working on an application in C# that will perform regular expression
matching against a small string. Usually regular expressions are used
such that the text being searched is large while the regular expression
itself is relatively small. What I'm doing is exactly the opposite. The
searched string will generally be small, while the Regex is going to be
large (in terms of a regular expression pattern anyway).

The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?


Well, the easiest way to find out would be to just test it.

However, you should probably also test what happens if you just have
1000 separate patterns, and just test each of them in turn. That may be
faster or it may be slower, but in my view it would almost certainly be
easier to understand...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2

P: n/a
> The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?


I've not seen anything about limits (unlike the Linux POSIX regex API,
which has a 64K regex table limit) - try it and see.

As for performance, the regex compiler will generally do a great job
implementing the pattern you give it, but I don't believe it will do
much optimization. For example,

"^(pattern1|pattern2|pattern3)$"

may be essentially equivalent to

"^pattern1$|^pattern2$|^pattern3$"

(with RegexOptions.ExplicitCapture turned on) but will, so far as I
know, result in tow distinct regexes. I mention this because if you
have to match, say, "abe", "abet", or "abraham", you may get better
performance by sorting and building a tree

"^ab((e($|t$))|raham$)"

than by matching for

"^(abe|abet|abraham)$"

Again, try it and see.

--

www.midnightbeach.com
Jul 21 '05 #3

P: n/a
Thanks for the feedback. I was hoping there was something like

Jon Shemitz wrote:
I've not seen anything about limits (unlike the Linux POSIX regex API,
which has a 64K regex table limit) - try it and see.
that would simply tell me. I guess I'll just have to test.

Jon Skeet [C# MVP] wrote: However, you should probably also test what happens if you just have
1000 separate patterns, and just test each of them in turn. That may be
faster or it may be slower, but in my view it would almost certainly be
easier to understand...


I agree that it might be more clear to represent the patterns
individually. However, it also seems (from a logical point of view) that
instead of having to compute each of the 1000 pattern strings as a regex
individually, it would be faster to compute them all at once. Ultimately
I don't think the regex will ever be very complex--it will mostly
consist of simple patterns strung together with the OR operator. It
might just be large. Anyway, I'm off to test my assumptions...

Thanks again for the comments. gilad
Jul 21 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.