473,378 Members | 1,037 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

Regular Expression limits?

I'm working on an application in C# that will perform regular expression
matching against a small string. Usually regular expressions are used
such that the text being searched is large while the regular expression
itself is relatively small. What I'm doing is exactly the opposite. The
searched string will generally be small, while the Regex is going to be
large (in terms of a regular expression pattern anyway).

The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?

Thanks, gilad
Jul 21 '05 #1
3 1739
gilad <gi***@arbingerBADSPAMBOTsys.com> wrote:
I'm working on an application in C# that will perform regular expression
matching against a small string. Usually regular expressions are used
such that the text being searched is large while the regular expression
itself is relatively small. What I'm doing is exactly the opposite. The
searched string will generally be small, while the Regex is going to be
large (in terms of a regular expression pattern anyway).

The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?


Well, the easiest way to find out would be to just test it.

However, you should probably also test what happens if you just have
1000 separate patterns, and just test each of them in turn. That may be
faster or it may be slower, but in my view it would almost certainly be
easier to understand...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
> The reason the Regex will be large is because it is taking a large
number of disparate patterns and stringing them together using OR, e.g.

"pattern1|pattern2|pattern3..."

The string being searched could potentially match any of these patterns,
so it must be checked against them. What happens if I have a string of
1000 patterns OR'd together as a single Regex? Is performance hindered,
does Regex have limits that won't allow this?


I've not seen anything about limits (unlike the Linux POSIX regex API,
which has a 64K regex table limit) - try it and see.

As for performance, the regex compiler will generally do a great job
implementing the pattern you give it, but I don't believe it will do
much optimization. For example,

"^(pattern1|pattern2|pattern3)$"

may be essentially equivalent to

"^pattern1$|^pattern2$|^pattern3$"

(with RegexOptions.ExplicitCapture turned on) but will, so far as I
know, result in tow distinct regexes. I mention this because if you
have to match, say, "abe", "abet", or "abraham", you may get better
performance by sorting and building a tree

"^ab((e($|t$))|raham$)"

than by matching for

"^(abe|abet|abraham)$"

Again, try it and see.

--

www.midnightbeach.com
Jul 21 '05 #3
Thanks for the feedback. I was hoping there was something like

Jon Shemitz wrote:
I've not seen anything about limits (unlike the Linux POSIX regex API,
which has a 64K regex table limit) - try it and see.
that would simply tell me. I guess I'll just have to test.

Jon Skeet [C# MVP] wrote: However, you should probably also test what happens if you just have
1000 separate patterns, and just test each of them in turn. That may be
faster or it may be slower, but in my view it would almost certainly be
easier to understand...


I agree that it might be more clear to represent the patterns
individually. However, it also seems (from a logical point of view) that
instead of having to compute each of the 1000 pattern strings as a regex
individually, it would be faster to compute them all at once. Ultimately
I don't think the regex will ever be very complex--it will mostly
consist of simple patterns strung together with the OR operator. It
might just be large. Anyway, I'm off to test my assumptions...

Thanks again for the comments. gilad
Jul 21 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

14
by: Tina Li | last post by:
Hello, I've been struggling with a regular expression for parsing XML files, which keeps giving the run time error "maximum recursion limit exceeded". Here is the pattern string: ...
1
by: Kenneth McDonald | last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate feedback, suggestions, and criticism as I work towards finalizing the API and feature sets. rex is a module intended to make...
11
by: Ron Rohrssen | last post by:
Slightly off topic.... How can I write a regex that limits user input to 3 digits in the range of 1-128? I've been trying \d{1,128} but this allows for a match on more than 3 digits. ...
6
by: Ioannis Vranos | last post by:
Given the regular expression: S"^(+|+\\s+)$" 1) Isn't the "+|+" part redundant? As far as I can understand it means exactly the same as "+" alone. 2) Isn't the parenthesis grouping...
3
by: Mark | last post by:
To validate the length of a multiline textbox, I'm told that I have to use a regular expression validator. The regular expression below limits it to 25 characters in length, but if the user enters...
3
by: gilad | last post by:
I'm working on an application in C# that will perform regular expression matching against a small string. Usually regular expressions are used such that the text being searched is large while the...
7
by: Billa | last post by:
Hi, I am replaceing a big string using different regular expressions (see some example at the end of the message). The problem is whenever I apply a "replace" it makes a new copy of string and I...
25
by: Mike | last post by:
I have a regular expression (^(.+)(?=\s*).*\1 ) that results in matches. I would like to get what the actual regular expression is. In other words, when I apply ^(.+)(?=\s*).*\1 to " HEART...
2
by: Tedmond | last post by:
Dear all, I want to read a file data block by block using regular expression. The file contents is like MWH ........ ................. ..................... MWH ....................
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.