Opinion wanted on "Regex"

Hi there,

I'm seeking opinions on the use of regular expression searching. Is there
general consensus on whether it's now best practice to rely on this rather
than rolling your own (string) pattern search functions? Where performance
is an issue you can always write your own specialized routine, of course.
However, for the occasional pattern search where performance isn't an issue,
would most seasoned .NET developers rely on "Regex" and its cousins? Are there
any disadvantages I should be aware of other than possible efficiency issues
(compared to a specialized routine)? Thanks in advance.
May 17 '07 #1
16 Replies


Mark,

I think that if the work that you are doing is simple (say, replacing
one character with another), then I would not use a regular expression to do
the work. However, for more complex pattern matching (and it doesn't take
much to get to that state: processing zip codes, phone numbers,
IP addresses, etc.), I would definitely use a regular expression.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 17 '07 #2
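
A minimal C# sketch of the split Nicholas describes, assuming a US ZIP code means five digits with an optional hyphen and four more digits (the class and method names are hypothetical): a one-character substitution needs nothing more than String.Replace, while a format check is a natural fit for Regex.

```csharp
using System.Text.RegularExpressions;

public static class ZipExamples
{
    // Simple job: swap one character for another; String.Replace is enough.
    public static string DashesToSlashes(string date) => date.Replace('-', '/');

    // Pattern job, assuming the US ZIP format 12345 or 12345-6789.
    // [0-9] rather than \d: in .NET, \d also matches non-ASCII Unicode digits.
    // ^ and \z anchor the pattern to the whole input.
    private static readonly Regex Zip = new Regex(@"^[0-9]{5}(-[0-9]{4})?\z");

    public static bool IsZipCode(string s) => Zip.IsMatch(s);
}
```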

Thanks for the feedback. Yes, I'm talking strictly about pattern matching.
You always want to take the path of least resistance, so I wouldn't use it if
something simpler is readily available. However, I started writing an
elaborate routine the other day, and after about 20 lines I decided that a
regular expression would be much simpler. After years of working in C++ on
Win32 (without the luxury of "Regex"), though, I'm so used to rolling my own
that it didn't even occur to me to try it. Presumably there are few
negatives, and your own experience is obviously positive based on your
response. I'll probably rely on it myself from here on. Thanks again for
your input.
May 17 '07 #3


I usually prefer to rely on regular expressions. The engine optimizes
these string searches for you, more or less automatically. Just be very
careful with constructs like .* everywhere; those can kill performance.

The main reason I like regular expressions is that they give you a uniform
way to write string manipulation, whereas hand-written string manipulation
functions can all differ from one another, even if each one works fine. So
from a maintenance perspective, it should be easier.

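This does require you to turn on RegexOptions.IgnorePatternWhitespace and use
verbatim strings, so you can easily insert comments and such:

using System.Text.RegularExpressions;

string pattern = @"
    [a-z]           (?# any character from the alphabet)
    (
        [ ][a-z][0-9]
    )+              (?# some other nifty comment)
";
Regex regex = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

It makes the pattern much easier to read. Just make sure you write literal
spaces as [ ], as in the example, because unescaped whitespace in the
pattern is ignored in this mode.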

Jesse
May 17 '07 #4
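
A runnable C# sketch of the commented pattern above. The enclosing class, the ^ and $ anchors, and the sample inputs are assumptions added for illustration. It also shows one concrete reason for the warning about .*: a greedy .* runs to the end of the line and then backtracks, so it can match far more than intended, while a negated character class such as [^"]* stops at the first closing quote.

```csharp
using System.Text.RegularExpressions;

public static class VerbosePatternExample
{
    // With IgnorePatternWhitespace, unescaped spaces and line breaks in the
    // pattern are ignored, so a literal space has to be written as [ ].
    // Whitespace inside a character class is always taken literally.
    private const string Pattern = @"
        ^
        [a-z]               (?# one letter from the alphabet)
        (
            [ ][a-z][0-9]   (?# a space, a letter and a digit)
        )+                  (?# one or more times)
        $";

    private static readonly Regex Verbose =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    public static bool IsMatch(string s) => Verbose.IsMatch(s);

    // Greedy .* grabs everything between the first and the last quote.
    public static string GreedyQuoted(string s) => Regex.Match(s, "\".*\"").Value;

    // A negated character class stops at the first closing quote instead.
    public static string FirstQuoted(string s) => Regex.Match(s, "\"[^\"]*\"").Value;
}
```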

Thanks for the insight, and I agree. I think one drawback of regular
expressions, however, is that they can be tricky to get exactly right,
depending on the complexity. When you write your own routine you're focused
on the task at hand and are completely responsible for getting the algorithm
right. This is done using native language constructs, which developers are
usually much more comfortable with. It's therefore easier to apply, since you
just have to get the search logic itself right (granted, this isn't always
trivial). Regular expressions, on the other hand, are a language unto
themselves. Most developers probably only use them occasionally, so they
aren't as experienced with them as with their native language. It's therefore
(potentially) easier to get tripped up writing a complicated search pattern
than writing your own routine (even if it's many times longer), provided the
logic itself is straightforward.
May 17 '07 #5
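
One concrete way a pattern can trip you up, sketched in C# (the class and method names are hypothetical): Regex.IsMatch succeeds if the pattern matches anywhere in the input, so a validation pattern needs anchors, and in .NET $ also matches just before a trailing newline. The equivalent hand-written loop is included for comparison.

```csharp
using System.Text.RegularExpressions;

public static class AnchorPitfall
{
    // Looks like "is this a number?", but IsMatch succeeds if the pattern
    // matches anywhere in the input, so "abc123" passes.
    public static bool LooksNumericUnanchored(string s) => Regex.IsMatch(s, "[0-9]+");

    // Anchored at both ends: the whole input must be ASCII digits.
    // \z is used rather than $, because $ also matches before a trailing "\n".
    public static bool IsNumeric(string s) => Regex.IsMatch(s, @"^[0-9]+\z");

    // The equivalent hand-written check.
    public static bool IsNumericLoop(string s)
    {
        if (s.Length == 0) return false;
        foreach (char c in s)
            if (c < '0' || c > '9') return false;
        return true;
    }
}
```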

* Mark Chambers wrote, On 17-5-2007 15:55:
> Thanks for the insight, and I agree. I think one drawback of regular
> expressions, however, is that they can be tricky to get exactly right,
> depending on the complexity. When you write your own routine you're focused
> on the task at hand and are completely responsible for getting the algorithm
> right. This is done using native language constructs, which developers are
> usually much more comfortable with. It's therefore easier to apply, since you
> just have to get the search logic itself right (granted, this isn't always
> trivial). Regular expressions, on the other hand, are a language unto
> themselves. Most developers probably only use them occasionally, so they
> aren't as experienced with them as with their native language. It's therefore
> (potentially) easier to get tripped up writing a complicated search pattern
> than writing your own routine (even if it's many times longer), provided the
> logic itself is straightforward.

Agreed, but a developer must be pretty good at his/her language to write
performant algorithms. I'd rather leave the performance stuff to the
people who're really good at that (the guys who wrote the regex engine
for .NET).

That's why every developer we have is forced to get regex training. I'm
one of the trainers, so for me it has always been 'easy' :)

Jesse
May 17 '07 #6
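
If you do leave the heavy lifting to the engine, it still helps to construct a Regex once and reuse it rather than rebuilding it for every input. A minimal C# sketch, assuming a hypothetical log format of the form "ERROR <code>: <message>"; RegexOptions.Compiled asks .NET to compile the pattern to IL, which costs more up front and pays off only when the same pattern runs many times.

```csharp
using System.Text.RegularExpressions;

public static class LogScanner
{
    // Built once and shared by every call. Compiled trades construction time
    // for faster matching when the same pattern runs over many inputs.
    private static readonly Regex ErrorLine =
        new Regex(@"^ERROR (?<code>[0-9]+): (?<message>.*)$", RegexOptions.Compiled);

    public static int CountErrors(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
            if (ErrorLine.IsMatch(line)) count++;
        return count;
    }

    // Returns the code of the first error line, or null if there is none.
    public static string FirstErrorCode(string[] lines)
    {
        foreach (string line in lines)
        {
            Match m = ErrorLine.Match(line);
            if (m.Success) return m.Groups["code"].Value;
        }
        return null;
    }
}
```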

> Agreed, but a developer must be pretty good at his/her language to write
> performant algorithms. I'd rather leave the performance stuff to the
> people who're really good at that (the guys who wrote the regex engine for
> .NET).

Most of the time that's true. However, since regular expressions depend on
generalized algorithms, they usually won't perform better than a specialized
algorithm. The good thing is that performance isn't really an issue most of
the time.

> That's why every developer we have is forced to get regex training. I'm
> one of the trainers, so for me it has always been 'easy' :)

I'm a former C++ junkie still coping with my addiction. Do you have a
12-step program you can recommend? :)
May 17 '07 #7
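
For the cases where performance is an issue, later versions of .NET (Framework 4.5 onward) let you put an upper bound on a single match with a timeout. A C# sketch, using a deliberately risky nested quantifier as the example pattern (an illustrative choice, not from the thread); the 100 ms value is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutGuard
{
    // Nested quantifiers such as (a+)+ can backtrack exponentially on input
    // that almost matches; a match timeout bounds the damage.
    private static readonly Regex Risky =
        new Regex(@"^(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(100));

    public static TimeSpan Timeout => Risky.MatchTimeout;

    // Returns null when the engine gave up, instead of hanging the caller.
    public static bool? TryMatch(string input)
    {
        try
        {
            return Risky.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```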

* Mark Chambers wrote, On 17-5-2007 17:29:
>> Agreed, but a developer must be pretty good at his/her language to write
>> performant algorithms. I'd rather leave the performance stuff to the
>> people who're really good at that (the guys who wrote the regex engine for
>> .NET).
>
> Most of the time that's true. However, since regular expressions depend on
> generalized algorithms, they usually won't perform better than a specialized
> algorithm. The good thing is that performance isn't really an issue most of
> the time.
>> That's why every developer we have is forced to get regex training. I'm
>> one of the trainers, so for me it has always been 'easy' :)
>
> I'm a former C++ junkie still coping with my addiction. Do you have a
> 12-step program you can recommend? :)

My 12-step program was to develop SpamAssassin anti-spam rules for about
1.5 years. It's the best way to learn these things :).

But seriously, just try. Take a couple of your older algos and try to
convert them to regex. You can always ask here (or by email) for
suggestions.

I've seen the regex engine outwit dedicated string searching functions
on more than one occasion, and usually with a lot less code (except for
that one barely readable line of regex in there ;)).

Jesse
May 17 '07 #8
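
A small C# sketch of that exercise, using a hypothetical "older algorithm" that collects every run of ASCII digits from a string, next to its regex equivalent. Both should return the same list for any input, since [0-9]+ matches each maximal run of digits from left to right.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class NumberExtraction
{
    // "Older algorithm": walk the string and collect each run of digits.
    public static List<string> ByHand(string text)
    {
        var runs = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9')
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                runs.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) runs.Add(current.ToString()); // trailing run
        return runs;
    }

    // Same job as a regex: every maximal run of ASCII digits.
    public static List<string> WithRegex(string text)
    {
        var runs = new List<string>();
        foreach (Match m in Regex.Matches(text, "[0-9]+"))
            runs.Add(m.Value);
        return runs;
    }
}
```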

IMHO there is a range of problems that are best solved using Regex.

As Nicholas points out, very simple problems can be solved using
regular string routines. Solving them with Regex is like killing a fly
with a sledgehammer. That covers things like finding one string within
another, or replacing one character with another.

At the other end of the scale, I've seen posts here asking how to do
such-and-such using Regex, where the patterns are so baroque that it
takes a Regex expert here to sort them out and get them right. This
strikes me as unduly brittle and hard to maintain. If you can't figure
out for yourself how to write the correct Regex to match something,
break it down into a combination of open code and Regex, or test
multiple, simpler Regex patterns one after the other. If it takes you
hours of struggle to come up with the correct pattern, then perhaps
it's a sign that you're over-reaching and you should break the problem
into more manageable chunks.

Nonetheless, there are a huge number of string matching problems in
this mid-range: complex enough that open code becomes unwieldy, but
simple enough that I can write a Regex pattern in a few minutes to
match it.

May 17 '07 #9
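
A C# sketch of the "several simpler patterns one after the other" advice, using a hypothetical password policy (at least eight characters, with at least one digit, one upper-case and one lower-case ASCII letter). The single lookahead-based pattern and the step-by-step version agree on ordinary single-line input, but the second is easier to read and to change.

```csharp
using System.Text.RegularExpressions;

public static class PasswordRules
{
    // One dense pattern with lookaheads can express the whole policy...
    private static readonly Regex AllInOne =
        new Regex(@"^(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z]).{8,}$");

    public static bool CheckWithOnePattern(string pw) => AllInOne.IsMatch(pw);

    // ...but a few simple checks in a row say the same thing more plainly.
    public static bool CheckStepByStep(string pw)
    {
        if (pw.Length < 8) return false;
        if (!Regex.IsMatch(pw, "[0-9]")) return false;
        if (!Regex.IsMatch(pw, "[A-Z]")) return false;
        if (!Regex.IsMatch(pw, "[a-z]")) return false;
        return true;
    }
}
```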

Jesse Houwing <je***********@nospam-sogeti.nl> wrote:

<snip>
> I've seen the regex engine outwit dedicated string searching functions
> on more than one occasion. And usually with a lot less code (except for
> that one barely readable line of regex in there ;)).
And that's exactly the problem - the regex which is barely readable.
I'd rather read five or six lines of simple string manipulation than
rely on not only *my* understanding of regex (and the subtleties) but
also the understanding of whoever's reading and maintaining my code at
a later date.

Regexes are useful in their place, but they can be horribly overused.
I've seen them being used (incorrectly, even!) to check whether a
string starts with a particular string (not a pattern, just a straight
string) and whether a string has a particular length. It's crazy - just
as crazy as writing a complicated pattern matcher rather than using a
regex where appropriate.

Basically, use the right tool for the job. Sometimes that may require
writing some code to achieve a goal both with "hand-coding" and with a
regex, then looking (or asking others) to see which is more readable.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 17 '07 #10
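
A C# sketch of Jon's point. The thread doesn't say what the incorrect regexes looked like, so the bug shown here is an illustrative guess: String.StartsWith and Length say exactly what is meant, while a regex built by pasting a prefix into a pattern silently treats characters such as . as metacharacters unless the prefix goes through Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Plain string operations state the intent directly.
    public static bool HasPrefix(string s, string prefix) =>
        s.StartsWith(prefix, StringComparison.Ordinal);

    public static bool HasLength(string s, int n) => s.Length == n;

    // The same job as a regex is easy to get wrong: the prefix is pasted in
    // unescaped, so the "." in "v1.0" matches any character.
    public static bool HasPrefixBuggy(string s, string prefix) =>
        Regex.IsMatch(s, "^" + prefix);

    // Regex.Escape fixes the bug, but the StartsWith version is still clearer.
    public static bool HasPrefixEscaped(string s, string prefix) =>
        Regex.IsMatch(s, "^" + Regex.Escape(prefix));
}
```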

* Jon Skeet [C# MVP] wrote, On 17-5-2007 20:12:
> Jesse Houwing <je***********@nospam-sogeti.nl> wrote:
>> I've seen the regex engine outwit dedicated string searching functions
>> on more than one occasion. And usually with a lot less code (except for
>> that one barely readable line of regex in there ;)).
> And that's exactly the problem - the regex which is barely readable.
> I'd rather read five or six lines of simple string manipulation than
> rely on not only *my* understanding of regex (and the subtleties) but
> also the understanding of whoever's reading and maintaining my code at
> a later date.

Agreed. That's why I opted to use verbatim strings and regex comments in
combination with IgnorePatternWhitespace. A well-commented regex will do
wonders :). But then it's no longer one line :( (kidding).

> Basically, use the right tool for the job. Sometimes that may require
> writing some code to achieve a goal both with "hand-coding" and with a
> regex, then looking (or asking others) to see which is more readable.

Agreed. One approach I use very often is to find the general pattern with
a simple regex, then use string manipulation or another regex to finish
the job. MatchEvaluators are your friend there.

Building your regex in code from several well-named variables and
concatenating them at the end will also improve readability.

I've seen people write awful string manipulation code as well... Whichever
tool or language you choose, the art lies in putting it to good use
*and* making the result both 100% correct and maintainable.

Jesse
May 17 '07 #11
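
A C# sketch of both of Jesse's suggestions: the pattern is assembled from well-named pieces, the regex only finds candidates, and a MatchEvaluator finishes the job in code. The day-month-year input format and the rewrite to yyyy-MM-dd are assumptions chosen for illustration; the range check is deliberately rough (it accepts 31-2-2007, for example).

```csharp
using System.Text.RegularExpressions;

public static class DateRewriter
{
    // The pattern is assembled from well-named pieces.
    private const string Day = @"(?<day>[0-9]{1,2})";
    private const string Month = @"(?<month>[0-9]{1,2})";
    private const string Year = @"(?<year>[0-9]{4})";
    private static readonly Regex DayMonthYear =
        new Regex(@"\b" + Day + "-" + Month + "-" + Year + @"\b");

    // The regex finds candidates; the MatchEvaluator decides in code:
    // plausible dates become yyyy-MM-dd, anything else is left untouched.
    public static string ToIso(string text) =>
        DayMonthYear.Replace(text, new MatchEvaluator(m =>
        {
            int day = int.Parse(m.Groups["day"].Value);
            int month = int.Parse(m.Groups["month"].Value);
            if (month < 1 || month > 12 || day < 1 || day > 31) return m.Value;
            return m.Groups["year"].Value + "-" + month.ToString("00") + "-" + day.ToString("00");
        }));
}
```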

You should go for the solution that gives the simplest code.

That means regex for anything that would otherwise take more than two
Substring and/or IndexOf calls.

Arne
May 19 '07 #12
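
Arne's rule of thumb at its threshold, sketched in C# with a hypothetical task: splitting a "Name <address>" string like the attribution lines in this thread. The hand-written version needs two IndexOf and two Substring calls; the regex states the same rule in one pattern. For well-formed input both return the same name and address.

```csharp
using System.Text.RegularExpressions;

public static class AddressParsing
{
    // Substring/IndexOf version: two IndexOf calls and two Substring calls.
    public static bool TryParseByHand(string s, out string name, out string address)
    {
        name = address = null;
        int open = s.IndexOf('<');
        if (open < 0) return false;
        int close = s.IndexOf('>', open + 1);
        if (close < 0) return false;
        name = s.Substring(0, open).Trim();
        address = s.Substring(open + 1, close - open - 1);
        return true;
    }

    // Regex version of the same rule; the lazy name group plus \s* drops
    // whitespace before the '<'.
    private static readonly Regex NameAndAddress =
        new Regex(@"^\s*(?<name>[^<]*?)\s*<(?<address>[^>]*)>");

    public static bool TryParseWithRegex(string s, out string name, out string address)
    {
        Match m = NameAndAddress.Match(s);
        name = m.Success ? m.Groups["name"].Value : null;
        address = m.Success ? m.Groups["address"].Value : null;
        return m.Success;
    }
}
```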

On May 18, 7:22 pm, Arne Vajhøj <a...@vajhoej.dk> wrote:
> You should go for the solution that gives the simplest code.
>
> That means regex for anything that would otherwise take more than two
> Substring and/or IndexOf calls.

Sorry. I disagree in part: there's an upper limit at which one tries
to stuff too much into Regex and it becomes a delicate, unreadable
mess.

Yes, Regex for everything more complex than a few simple string
operations... up to where you can't easily understand the Regex, at
which point you should start thinking of ways to break the problem
into smaller, more manageable parts.

May 19 '07 #13

Bruce Wood wrote:
> On May 18, 7:22 pm, Arne Vajhøj <a...@vajhoej.dk> wrote:
>> You should go for the solution that gives the simplest code.
>>
>> That means regex for anything that would otherwise take more than two
>> Substring and/or IndexOf calls.
>
> Sorry. I disagree in part: there's an upper limit at which one tries
> to stuff too much into Regex and it becomes a delicate, unreadable
> mess.
>
> Yes, Regex for everything more complex than a few simple string
> operations... up to where you can't easily understand the Regex, at
> which point you should start thinking of ways to break the problem
> into smaller, more manageable parts.

Using Substring/IndexOf will be an even bigger mess for very
complex stuff.

Breaking up the problem is a solution, but that solution can be
used with both regex and Substring/IndexOf.

For really complex stuff, a scanner and parser à la lex & yacc
may be the ultimate solution.

Arne

May 19 '07 #14
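
A scanner is the first half of the lex & yacc pipeline Arne mentions. As a rough illustration (not a lex-generated scanner), a small C# tokenizer can be driven by one regex with a named group per token kind; the \G anchor makes each match start exactly where the previous token ended. The token kinds are assumptions, and a parser would consume the resulting list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TinyScanner
{
    // One alternative per token kind; \G pins each match to the current position.
    private static readonly Regex Token = new Regex(
        @"\G(?:(?<ws>\s+)|(?<number>[0-9]+)|(?<ident>[A-Za-z_][A-Za-z0-9_]*)|(?<op>[-+*/()=]))");

    private static readonly string[] Kinds = { "number", "ident", "op" };

    // Returns (kind, text) pairs; whitespace is matched but not emitted.
    public static List<(string Kind, string Text)> Scan(string input)
    {
        var tokens = new List<(string Kind, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = Token.Match(input, pos);
            if (!m.Success)
                throw new FormatException("Unexpected character at position " + pos);
            foreach (string kind in Kinds)
                if (m.Groups[kind].Success) tokens.Add((kind, m.Value));
            pos += m.Length; // every alternative consumes at least one character
        }
        return tokens;
    }
}
```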

> For really complex stuff, a scanner and parser à la lex & yacc
> may be the ultimate solution.

And we now have tools for that in C#: scanner/parser generators are
included in the latest Visual Studio SDK for Visual Studio 2005.

Jesse
May 20 '07 #15

On May 19, 2:01 pm, Arne Vajhøj <a...@vajhoej.dk> wrote:
> Using Substring/IndexOf will be an even bigger mess for very
> complex stuff.

Perhaps, perhaps not. The point is that more complex problems require
some sort of design up front. Multiple regexes may be the solution, or
perhaps some sort of state machine, or perhaps, as you stated later, a
full-blown parser.

> Breaking up the problem is a solution, but that solution can be
> used with both regex and Substring/IndexOf.

True. We agree there.

> For really complex stuff, a scanner and parser à la lex & yacc
> may be the ultimate solution.

Yes, and those may or may not use Regex internally. The point is that
the problem becomes so complex that attempting to tackle it in a
pattern is daunting, and even if you managed it, it would be
unmaintainable.

For my money, there are three groups of string-mashing problems:

1. So simple that using Regex is overkill.
2. Appropriately solved using Regex (this is a very large group).
3. So complex that they require a design phase, at which point the
technology finally employed may be string functions, regex calls, more
sophisticated approaches, or any combination of these.

So, on the one hand, people try to do things in open code that would be
simpler to tackle using Regex; on the other hand, there's the "since I've
got this great big hammer, everything looks like a nail" problem, in which
people try to use Regex for _everything_ and sometimes end up with a
baroque mess.

Nonetheless, for the vast majority of day-to-day string parsing
problems, Regex is still the best way to go.

May 21 '07 #16
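
Bruce mentions a state machine as one of the design-phase options. As an illustration (an illustrative choice of problem, not from the thread), splitting a CSV line with quoted fields is awkward as a single pattern but straightforward as a small state machine. This simplified C# sketch handles one line at a time, so quoted fields cannot span line breaks, and it accepts some malformed input leniently.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    private enum State { FieldStart, Unquoted, Quoted, QuoteInQuoted }

    // Splits one CSV line. Quoted fields may contain commas, and ""
    // inside quotes stands for a single quote character.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        var state = State.FieldStart;

        foreach (char c in line)
        {
            switch (state)
            {
                case State.FieldStart:
                    if (c == '"') state = State.Quoted;
                    else if (c == ',') fields.Add("");              // empty field
                    else { field.Append(c); state = State.Unquoted; }
                    break;
                case State.Unquoted:
                    if (c == ',') { fields.Add(field.ToString()); field.Clear(); state = State.FieldStart; }
                    else field.Append(c);
                    break;
                case State.Quoted:
                    if (c == '"') state = State.QuoteInQuoted;      // closing or escaped quote
                    else field.Append(c);
                    break;
                case State.QuoteInQuoted:
                    if (c == '"') { field.Append('"'); state = State.Quoted; }   // "" escape
                    else if (c == ',') { fields.Add(field.ToString()); field.Clear(); state = State.FieldStart; }
                    else { field.Append(c); state = State.Unquoted; }            // lenient
                    break;
            }
        }
        fields.Add(field.ToString()); // last field (possibly empty)
        return fields;
    }
}
```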

Bruce Wood wrote:
> On May 19, 2:01 pm, Arne Vajhøj <a...@vajhoej.dk> wrote:
>> Using Substring/IndexOf will be an even bigger mess for very
>> complex stuff.
>
> Perhaps, perhaps not. The point is that more complex problems require
> some sort of design up front. Multiple regexes may be the solution, or
> perhaps some sort of state machine, or perhaps, as you stated later, a
> full-blown parser.

Difficult not to agree,

:-)

> For my money, there are three groups of string-mashing problems:
>
> 1. So simple that using Regex is overkill.
> 2. Appropriately solved using Regex (this is a very large group).
> 3. So complex that they require a design phase, at which point the
> technology finally employed may be string functions, regex calls, more
> sophisticated approaches, or any combination of these.
>
> So, on the one hand, people try to do things in open code that would be
> simpler to tackle using Regex; on the other hand, there's the "since I've
> got this great big hammer, everything looks like a nail" problem, in which
> people try to use Regex for _everything_ and sometimes end up with a
> baroque mess.
>
> Nonetheless, for the vast majority of day-to-day string parsing
> problems, Regex is still the best way to go.

I think we are in violent agreement.

Arne
May 22 '07 #17
