Bytes IT Community

Search for multiple things in a string

P: n/a
Can you search for more than one string in another string?

Something like:

someString.IndexOf("something1", "something2", "something3", 0)

or would you have to do something like:

if ((someString.IndexOf("something1", 0) >= 0) ||
    (someString.IndexOf("something2", 0) >= 0) ||
    (someString.IndexOf("something3", 0) >= 0))
{
    // Do something
}

Thanks,

Tom
Nov 17 '05 #1
32 Replies


P: n/a
Tom,

Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 17 '05 #2

P: n/a
"Nicholas Paldino [.NET/C# MVP]" <mv*@spam.guard.caspershouse.com> wrote in
message news:O5*************@TK2MSFTNGP12.phx.gbl...
Tom,

Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.

Would this be preferable to the multiple if tests?

I don't know which is more efficient. Both would have to go back and test
for all the different items.

Thanks,

Tom

Nov 17 '05 #3

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.


Would this be preferable to the multiple if tests?

I don't know which is more efficient. Both would have to go back and test
for all the different items.


Personally, I'd go for the "if" tests - possibly with a helper method
using a params string array to aid readability - unless the performance
is really a problem, in which case measuring that performance and that
of the regular expressions would be an absolute necessity.

Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
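
The helper method suggested above could look something like the following. This is only a sketch: the names StringSearch and ContainsAny are made up for illustration, and the choice of ordinal (plain character-by-character) comparison is an assumption rather than anything stated in the thread.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if `source` contains at least one
    // of the given substrings. `params` lets callers list them inline.
    public static bool ContainsAny(string source, params string[] values)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (values == null) throw new ArgumentNullException(nameof(values));

        foreach (string value in values)
        {
            // Ordinal comparison: no culture rules and no pattern syntax,
            // so dots, stars and similar characters need no escaping.
            if (source.IndexOf(value, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then reads as a single condition, for example `if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }`.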

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #4

P: n/a
Jon Skeet [C# MVP] wrote:
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.


But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...

I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.

But you're right about the performance question for simple cases like
this, of course.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #5

P: n/a

"Oliver Sturm" <ol****@sturmnet.org> wrote in message
news:xn****************@msnews.microsoft.com...
Jon Skeet [C# MVP] wrote:
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...

I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex,
RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.

But you're right about the performance question for simple cases like
this, of course.


But it is nice to know the options.

BTW, what is the "@" for?

Thanks,

Tom

Nov 17 '05 #6

P: n/a
tshad wrote:
But it is nice to know the options.

BTW, what is the "@" for?


It defines a verbatim string literal, in which backslashes are not treated
as escape characters. See here (MSDN):
http://shrinkster.com/81i
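
As a rough illustration of the difference (the class and constant names here are invented for the example), the following pairs a regular literal with its verbatim equivalent. A verbatim literal may also span several lines, which is what the commented multi-line pattern earlier in the thread relies on.

```csharp
public static class VerbatimDemo
{
    // Regular literal: each backslash has to be doubled.
    public const string Regular = "C:\\temp\\log.txt";

    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"C:\temp\log.txt";

    // Inside a verbatim literal, a double quote is written as two quotes.
    public const string Quoted = @"say ""hi""";

    // Handy for regex patterns: \d stays backslash-d without doubling.
    public const string DigitPattern = @"\d+";
}
```

Here `Regular` and `Verbatim` hold exactly the same characters; only the source spelling differs.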
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #7

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
Jon Skeet [C# MVP] wrote:
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...


Until, of course, something1 etc start containing characters which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.

I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.


Well, I don't have to learn (or more importantly, remember) *any* extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8

P: n/a
Jon Skeet [C# MVP] wrote:
Until, of course, something1 etc start containing characters which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.

Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.
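
For reference, the escaping facility mentioned here is Regex.Escape. A minimal sketch of how it could be combined with alternation follows; the wrapper class RegexSearch and its ContainsAny method are hypothetical names, not something from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSearch
{
    // Hypothetical helper: builds a pattern of the form "a|b|c" from literal
    // strings, escaping each one so that metacharacters such as '.' or '*'
    // match themselves rather than acting as pattern syntax.
    public static bool ContainsAny(string source, params string[] literals)
    {
        // An empty pattern would match any input, so handle that case first.
        if (literals.Length == 0) return false;

        string[] escaped = Array.ConvertAll<string, string>(literals, Regex.Escape);
        string pattern = string.Join("|", escaped);
        return Regex.IsMatch(source, pattern);
    }
}
```

With this, `RegexSearch.ContainsAny(text, "jon.skeet")` only matches a literal dot, at the cost of one more thing to remember to call.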
Well, I don't have to learn (or more importantly, remember) any extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.


No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough? I believe that Regular
Expressions are a powerful technology well worth learning - and it's
probably good advice to stay clear of them for anything but the simplest
applications if you're not willing to put in a bit of time to get to know
them.

About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #9

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
Until, of course, something1 etc start containing characters which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.
Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.


No, but it's still harder to remember than not having to remember
anything special at all, which is what you get with IndexOf.

In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.

Now, where's the *advantage* of using regular expressions in this case?
Well, I don't have to learn (or more importantly, remember) any extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.


No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough?


No - but I'm suggesting that when one language works perfectly well for
the task at hand, and it's the same language that the rest of your code
is written in, it's easier to stick within that language.
I believe that Regular Expressions are a powerful technology well
worth learning - and it's probably good advice to stay clear of them
for anything but the simplest applications if you're not willing to
put in a bit of time to get to know them.
Regular expressions are absolutely worth learning for where they
provide extra value. In cases like this, where they're only really
providing extra things to remember (what you need to escape, or to call
Regex's own escaping mechanism) I don't think there's any value.
About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.


Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.
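
If the performance question ever does matter, it is easy enough to measure rather than guess. The following is a rough sketch only: the Timing class and Measure method are invented for this example, and a simple loop timed with Stopwatch is a crude stand-in for a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs `action` the given number of times and
    // returns the total elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        // One untimed call first, so JIT compilation and any one-off
        // setup (such as regex parsing) are not counted.
        action();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Both approaches can then be timed on the same input, e.g. `Timing.Measure(() => text.IndexOf("something1", StringComparison.Ordinal), 1000000)` against the equivalent `Regex.IsMatch` call, using strings representative of the real data.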

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10

P: n/a
Jon Skeet [C# MVP] wrote:
In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.
In a hurry, all kinds of things can happen when making changes to source
code.
Now, where's the advantage of using regular expressions in this case?


I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.
About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.


Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.


As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer. I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #11

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.
In a hurry, all kinds of things can happen when making changes to source
code.


Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" *shouldn't* be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.
Now, where's the advantage of using regular expressions in this case?


I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.


But there's *always* the added complexity of "do I have to escape this
or not". There are certainly times when the string operations become
more complicated than the corresponding regular expressions (otherwise
they really would be pointless - something I've never suggested), but I
don't believe that's the case here.
About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.


Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.


As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer.


Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)
I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.


Whereas three calls to IndexOf is *definitely* more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #12

P: n/a
Jon Skeet [C# MVP] wrote:
In a hurry, all kinds of things can happen when making changes to source
code.


Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" shouldn't be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.


But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.
As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer.


Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)
I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.


Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.


In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #13

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" shouldn't be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.
But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.


No - you just have to be careful when you're using regular expressions.
I prefer code which means I don't have to take as much care, because
being human, sooner or later I'll be careless. The fewer possibilities
I have for carelessness actually causing an error, the better.

I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you *and* every member of
your team?
Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.


In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions.


Even though it's more than one call to a simple string function?
I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.


They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.

I'm not saying they're hideously unreadable - just *less* readable.
That's enough for me.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #14

P: n/a
Jon Skeet [C# MVP] wrote:
I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you and every member of
your team?
I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.
Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.


In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions.


Even though it's more than one call to a simple string function?


Probably... the number of calls is not really what counts, is it?
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.

I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.
They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.
Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation. Use the
simple operations as long as it makes sense, but don't hesitate to look at
other solutions because you think someone else on the team might make a
mistake changing a string literal later on.
I'm not saying they're hideously unreadable - just less readable.
That's enough for me.


Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #15

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you and every member of
your team?
I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.


Absolutely - especially when your tests may well not catch the problem.
For instance, if you have a search for "jon.skeet", are you going to
write a test to make sure that "jonxskeet" doesn't match? Unless you
actually know what to avoid (in which case you're likely to have
written it correctly in the first place) the test may well not pick up
on a missed character which needs escaping.
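
To make the "jon.skeet" example concrete, here is a small sketch comparing the three ways of writing the search. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotPitfall
{
    // Unescaped pattern: '.' matches any single character except a newline,
    // so this also accepts inputs like "jonxskeet".
    public static bool Unescaped(string input)
    {
        return Regex.IsMatch(input, "jon.skeet");
    }

    // Escaped by hand in a verbatim literal: '\.' matches a literal dot only.
    public static bool Escaped(string input)
    {
        return Regex.IsMatch(input, @"jon\.skeet");
    }

    // Plain substring search: no pattern syntax involved at all.
    public static bool Plain(string input)
    {
        return input.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0;
    }
}
```

All three accept "jon.skeet", but only the unescaped pattern also accepts "jonxskeet", which is the kind of case a test suite would need to include to catch the missing escape.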
Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.

In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions.


Even though it's more than one call to a simple string function?


Probably... the number of calls is not really what counts, is it?


I was only going by what you'd said previously:

<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.
Absolutely.
I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.
Goodo :)
They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.


Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation.


I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.
Use the simple operations as long as it makes sense, but don't
hesitate to look at other solutions because you think someone else on
the team might make a mistake changing a string literal later on.


If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.
I'm not saying they're hideously unreadable - just less readable.
That's enough for me.


Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.


Me either - but where there *is* a practical alternative which is more
readable, I'll go for that. If you only have one solution, you *can't*
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #16

P: n/a
Jon Skeet [C# MVP] wrote:
Even though it's more than one call to a simple string function?


Probably... the number of calls is not really what counts, is it?


I was only going by what you'd said previously:

<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>


I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".
Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation.


I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.


Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are not equal alternatives because I wouldn't seriously
consider one of them.
Use the simple operations as long as it makes sense, but don't
hesitate to look at other solutions because you think someone else on
the team might make a mistake changing a string literal later on.


If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.


You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?
Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.


Me either - but where there is a practical alternative which is more
readable, I'll go for that. If you only have one solution, you can't
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)


Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)
Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #17

P: n/a
Oliver Sturm <ol****@sturmnet.org> wrote:
<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>


I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".


Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.
Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation.


I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.


Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are no equal alternatives because I wouldn't seriously
consider one of them.


Right - but unfortunately (IMO) other people do.
If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.


You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?


Absolutely...
Me either - but where there is a practical alternative which is more
readable, I'll go for that. If you only have one solution, you can't
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)


Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)


Oh certainly. I'm really *not* trying to suggest that regular
expressions should never be used - just that they shouldn't be the
first port of call as soon as you need to do anything with a string :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #18

P: n/a
Jon Skeet [C# MVP] wrote:
<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>


I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".


Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.


And in this case I would be prepared to see things differently - I said
already that I don't believe in call counting. But the sentence you quoted
was meant more in the context of the problem I was describing, where
simple string functions are used as a part of a, possibly hugely
complicated, larger algorithm.

As soon as there are loops involved, which may or may not result in a
single line with such a call being executed multiple times, things start
getting complex very quickly in my experience. How often have you been
sitting there with the debugger running, counting characters in a string
to find that one-off problem somebody introduced? I'll take an enormously
unreadable regular expression over that task any day :-)

Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
Nov 17 '05 #19

P: n/a

"Oliver Sturm" <ol****@sturmnet.org> wrote in message
news:xn****************@msnews.microsoft.com...
Jon Skeet [C# MVP] wrote:
In a hurry, all kinds of things can happen when making changes to source
code.
Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" shouldn't be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.


But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.
As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer.


Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)
I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.


Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.


In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.


I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.

As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.

You can also make some pretty unreadable C# code as well. Readability is a
function of the programmer not the language (in most cases). As was also
mentioned you also need to know the language. For someone not used to
objects, abstract objects and interfaces are also hard to read.

I like seeing different options and make a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.

You don't use it - you lose it.

Tom


Nov 17 '05 #20

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
Escaping?

You've mentioned that as being a problem a couple of times.

What do you mean by this?

Are you talking about stopping if you find the first one matching?


No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.

Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.
"jon\\.skeet"
or
@"jon\.skeet"
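
To make the difference concrete, here is a minimal sketch. The class and method names are made up for illustration; `String.IndexOf`, `Regex.IsMatch` and `Regex.Escape` are the standard framework calls. It only shows that an unescaped dot matches any character, while IndexOf and an escaped pattern treat it literally.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapingDemo
{
    // Plain substring search: every character, including ".", is literal.
    public static bool ContainsLiteral(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal) >= 0;
    }

    // Regex search with the pattern used exactly as given.
    public static bool MatchesPattern(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }

    // Regex search after escaping, so metacharacters such as "." become literal.
    public static bool MatchesEscaped(string text, string value)
    {
        return Regex.IsMatch(text, Regex.Escape(value));
    }
}
```

With these helpers, `MatchesPattern("jonXskeet", "jon.skeet")` succeeds because the dot matches the "X", whereas `ContainsLiteral` and `MatchesEscaped` only succeed when the text really contains "jon.skeet".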

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #21

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
tshad <ts**********@ftsolutions.com> wrote:
Escaping?

You've mentioned that as being a problem a couple of times.

What do you mean by this?

Are you talking about stopping if you find the first one matching?
No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.

Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.
"jon\\.skeet"
or
@"jon\.skeet"


Got ya.

I thought you were talking about escaping the function/call as you might in
a loop when you find what you are looking for.

Thanks,

Tom

Nov 17 '05 #22

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.
Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.
As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.
No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet". Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?
You can also make some pretty unreadable C# code as well.
Sure, but that's no reason to use regular expressions just to make
things worse.
Readability is a function of the programmer not the language (in most
cases).
Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.
As was also mentioned you also need to know the language. For someone
not used to objects, abstract objects and interfaces are also hard to
read.
Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?
I like seeing different options and make a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.
And that's the point - I don't think this problem *does* warrant it.
You don't use it - you lose it.


So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.

It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #23

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
tshad <ts**********@ftsolutions.com> wrote:
I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.
Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.


Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).

But the discussion was valid in that you use the best tool for the situation.
As far as readability, it has nothing to do with Regular Expressions
whether
it is readable or not, as Oliver mentions, but how you write it.
No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet".


That's maybe true. But it would be clear to someone used to using both C#
and Regex.

Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.
Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?
You can also make some pretty unreadable C# code as well.
Sure, but that's no reason to use regular expressions just to make
things worse.


I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was
really cryptic. But it didn't have to be; that was just how people coded
back then. Mainly, it was important to make the most efficient code
possible because of the limited computing power and efficient rarely equates
to readable. And I am not even talking about compiling and linking and all
the options and cryptic command lines.
Readability is a function of the programmer not the language (in most
cases).
Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.


I agree.

Just because you can - doesn't mean you should.
As was also mentioned you also need to know the language. For someone
not used to objects, abstract objects and interfaces are also hard to
read.
Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?


But if you know both and as I (and you) mentioned regex is part of .net as
is C# - so it is already in the mix. But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer. In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).

I personally feel that both solutions are equally usable and readable (in
this situation).

I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.

I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).
I like seeing different options and make a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the
problem
warrants it.
And that's the point - I don't think this problem *does* warrant it.


I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.
You don't use it - you lose it.
So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.


Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.

It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)
Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.

But Regex is valid and is an appropriate method for handling strings and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying use it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.

As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.

Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?

Tom

Nov 17 '05 #24

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.
Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).


It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.
But the discussion was valid in that you use the best tool for the situation.
Indeed.
As far as readability, it has nothing to do with Regular Expressions
whether
it is readable or not, as Oliver mentions, but how you write it.


No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet".


That's maybe true. But it would be clear to someone used to using both C#
and Regex.


But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?
Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.
You have to know the C# escaping, but not the regular expression
escaping.
You can also make some pretty unreadable C# code as well.


Sure, but that's no reason to use regular expressions just to make
things worse.


I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was
really cryptic. But it didn't have to be; that was just how people coded
back then. Mainly, it was important to make the most efficient code
possible because of the limited computing power and efficient rarely equates
to readable. And I am not even talking about compiling and linking and all
the options and cryptic command lines.


To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.
Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.


I agree.

Just because you can - doesn't mean you should.


Exactly.
Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?


But if you know both and as I (and you) mentioned regex is part of .net as
is C# - so it is already in the mix.


No, it's not. It's not already used in every single C# program, any
more than SQL is.
But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer.
In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.
In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).
It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.
I personally feel that both solutions are equally usable and readable (in
this situation).
I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?
I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.
Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.
I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).
And of course there is:
SalaryMax.Text =
String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...
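
For anyone who wants to check that the two approaches really do the same job, here is a small sketch in C#. The class and method names are invented for illustration, and it only covers the stripping step, not `CalculateYearly` or the formatting.

```csharp
using System;
using System.Text.RegularExpressions;

static class CurrencyCleanup
{
    // Regex version: "$" is a metacharacter and has to be escaped.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", "");
    }

    // Plain string version: both characters are taken literally.
    public static string StripWithReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }
}
```

Both methods turn "$1,234.50" into "1234.50"; the only difference is how much the reader needs to know about regex escaping.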
And that's the point - I don't think this problem *does* warrant it.


I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.


But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.
So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.


Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.


Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.
It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)


Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.


As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.
But Regex is valid and is an appropriate method for handling strings and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying use it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.
Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.
As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.
I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.
Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?


Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?
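
A compilable version of that comparison, as a sketch only: the `Person` type and the helper names are hypothetical, while `GetType`, `GetProperty` and `GetValue` are the real reflection calls used in the snippet above.

```csharp
using System;
using System.Reflection;

// Hypothetical type standing in for "something".
class Person
{
    public string Name { get; set; }
}

static class PropertyAccessDemo
{
    // Looks the property up by name at runtime.
    public static string NameViaReflection(object something)
    {
        PropertyInfo property = something.GetType().GetProperty("Name");
        return (string) property.GetValue(something, null);
    }

    // Ordinary property access, checked by the compiler.
    public static string NameDirect(Person something)
    {
        return something.Name;
    }
}
```

Both return the same string for the same object. The reflection version can still fail at runtime if the property is renamed, which the direct version cannot.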

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #25

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
tshad <ts**********@ftsolutions.com> wrote:
> Regular expressions have nothing to do with ASP.NET - they're a part of
> "normal" .NET.
Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or
VB.Net.
So using Regex is not really like using another language (as C# is
different
from VB.Net).


It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.


I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language). It really is
a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
by various languages.

You don't build pages with it. It isn't procedural. It is a tool used by
the other languages. You don't use VB.Net in C# or Vice versa but both use
Regular expressions (as they both use Substring, Replace etc).
But the discussion was valid in that you use the best tool for the situation.
Indeed.
>> As far as readability, it has nothing to do with Regular Expressions
>> whether
>> it is readable or not, as Oliver mentions, but how you write it.
>
> No - I believe that searching for "jon.skeet" with IndexOf is clearer
> than searching for "jon\\.skeet" or @"jon\.skeet".


That's maybe true. But it would be clear to someone used to using both
C#
and Regex.


But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?


Depends on the C# code as well as the Regex code.

Again, are we talking about the best tool for the job or the most
readability. As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just as
valid as C# (or VB.Net) string handlers.

I do use them when convenient.

For example, I was creating a simple text search engine and wanted to modify
what the user put in and found it simpler to do the following than in VB or
C#:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straight forward and easy to follow.
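
For readers following along in C#, a rough translation of the VB snippet might look like the sketch below. The class and method names are assumptions for illustration; the replacement steps mirror the original one for one. Only the first pattern uses any regex features; the rest are plain literals.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeywordNormalizer
{
    public static string Normalize(string keywords)
    {
        // Collapse runs of two or more whitespace characters and trim the ends.
        keywords = Regex.Replace(keywords, @"\s{2,}", " ").Trim();

        // Put " or " between every pair of words...
        keywords = Regex.Replace(keywords, " ", " or ");

        // ...then undo it around operators the user typed explicitly.
        keywords = Regex.Replace(keywords, " or or ", " ");
        keywords = Regex.Replace(keywords, "or and or", "and");
        keywords = Regex.Replace(keywords, "or near or", "near");
        keywords = Regex.Replace(keywords, "and not or", "and not");
        return keywords;
    }
}
```

So `Normalize("cat   dog")` gives "cat or dog", while an explicit "cat and not dog" comes back unchanged.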
Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.
You have to know the C# escaping, but not the regular expression
escaping.


But you do NEED to know the C# escaping (readability not high - unless you
understand it).
>> You can also make some pretty unreadable C# code as well.
>
> Sure, but that's no reason to use regular expressions just to make
> things worse.
I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same
reason
you point out. The code was not as clear as COBOL or Basic and that was
the
complaint back then. I happened to be a Fortran programmer at that time
and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was
really cryptic. But it didn't have to be; that was just how people coded
back then. Mainly, it was important to make the most efficient code
possible because of the limited computing power and efficient rarely
equates
to readable. And I am not even talking about compiling and linking and
all
the options and cryptic command lines.


To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.


But writing objects and the objects themselves are not easily readable. But
you wouldn't advocate not writing them, would you?
> Yes, but it's the programmer's decision how to approach things -
> whether you do things the simple way or the complex way. You *could*
> implement the string search by manually iterating over all the
> characters in the string, perhaps even writing your own state machine
> to do it. The code could be pretty readable considering what it's doing
> - but it's *bound* to be more complex than using IndexOf.
I agree.

Just because you can - doesn't mean you should.


Exactly.
> Sure - but why introduce unnecessary complexity? You're already
> writing C#, so you'd better know C# - but why add regular expressions
> into the mix when they're unnecessary?


But if you know both and as I (and you) mentioned regex is part of .net
as
is C# - so it is already in the mix.


No, it's not. It's not already used in every single C# program, any
more than SQL is.


Nor are all the objects you use.

But if you are using .Net, it is part of the mix.
But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up
to
the programmer.
In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.


Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it :)

I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.

My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.

I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate. If I happen to know a
good way in Regex to solve a problem, I am not going use *extra brainpower*
to try to solve the problem in C#.
In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close
to being unreadable (if you are even a stone's throw from understanding
Regular Expressions).
It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.


But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable)

If you are talking about

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}

vs

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.

I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal.

So I am at a loss as to how this regular expression is more unreadable than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.
I personally feel that both solutions are equally usable and readable (in
this situation).
I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?


As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?

Are you going to leave out some objects that programmers may not be familiar
with?
I have also seen times when I just couldn't find an easy solution in C#
or VB and it was fairly easy in Regex.
Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.


And I don't in this case, as I think I've shown. Less typing, easy to read,
straightforward - in this case.
I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in
VB.Net so I use this (not saying there isn't one).
And of course there is:
SalaryMax.Text =
String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...


I can read either (although, I didn't know you could string multiple
"Replace"s together).
> And that's the point - I don't think this problem *does* warrant it.
I agree that it isn't necessary here, but I don't think it is warranted
or unwarranted here. I think it's just as readable either way.


But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.


First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc. These are
very well documented, and when there are a myriad of ways a user can input
these types of data, I prefer to use Regular expressions, which are all over
the place (easy to find), rather than try to come up with some complex set of
loops and temporary variables, which make it far easier to make a mistake and
are much more unreadable than the Regex equivalent.
> So do you add a database when you just need to do a hashtable lookup,
> just in case you forget SQL? Do you use reflection to get at the value
> of a property, just in case you forget how to use that? I hope not.
Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.


Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.


Escaping seems to be your main complaint with it.

I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.
> It's very important to use appropriate technology, rather than using it
> for the sake of it. (It's one thing to experiment with technology for
> the sake of it as a learning tool, but I wouldn't do it in production
> code.)
Right. But Regex is not inappropriate technology. As you said, trying
to loop through each character when there is an easier way is a bit much.


As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.


I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).
But Regex is valid and is an appropriate method for handling strings, and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying using it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.
Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.


So you would prefer to code to the lowest common denominator.

I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.

I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).
As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.
I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.


"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.
Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?


Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?


Lost me on that one.

Tom
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #26

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.
I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language).


In plenty of places. It has a language with a defined syntax etc.
It really is
a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
by various languages.

You don't build pages with it. It isn't procedural.
Neither of those are required for it to be a language.
It is a tool used by the other languages.
Sure - so is XPath, but that's a language too.
(See http://www.w3.org/TR/xpath)
You don't use VB.Net in C# or vice versa but both use
Regular expressions (as they both use Substring, Replace etc).
None of those state that regular expressions aren't a language.
But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?


Depends on the C# code as well as the Regex code.


The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
someVariable.IndexOf ("secondliteral") != -1 ||
someVariable.IndexOf ("thirdliteral") != -1)

If I did it regularly, I'd write a short method which took a params
string array.
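
A minimal sketch of what such a short method might look like, assuming a
hypothetical helper named ContainsAny (the name and the ordinal comparison are
assumptions, not something given in this thread):

```csharp
using System;

static class StringSearch
{
    // Hypothetical helper: returns true if 'text' contains at least one
    // of the supplied literals. No regex escaping rules apply here.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison is an assumption; it performs a plain
            // character-by-character search with no culture rules.
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site would then read something like
StringSearch.ContainsAny(someVariable, "firstliteral", "secondliteral", "thirdliteral"),
and adding or changing a literal needs no knowledge of special characters.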
Again, are we talking about the best tool for the job or the most
readability.
Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.
As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just as
valid as C# (or VB.Net) string handlers.
But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.
I do use them when convenient.

For example, I was creating a simple text search engine and wanted to modify
what the user put in and found it simpler to do the following than in VB or
C:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straightforward and easy to follow.
Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?
In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.
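
For what it's worth, parentheses do form a capturing group in .NET regular
expressions, so the pattern "( )" matches a single space rather than the three
characters "( )". A small sketch of the difference (the class and method names
are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // As a regex, "( )" is a capturing group containing one space,
    // so every single space in the input is replaced.
    public static string ViaRegex(string keywords)
    {
        return Regex.Replace(keywords, "( )", " or ");
    }

    // String.Replace treats "( )" as three literal characters:
    // open bracket, space, close bracket.
    public static string ViaString(string keywords)
    {
        return keywords.Replace("( )", " or ");
    }
}
```

With the input "cats dogs", ViaRegex gives "cats or dogs" while ViaString
leaves the text unchanged, which is exactly the sort of thing a reader has to
stop and check.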
Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.


You have to know the C# escaping, but not the regular expression
escaping.


But you do NEED to know the C# escaping (readability not high - unless you
understand it).


Yes, but I *already* need to know that in order to write C#. Choosing
to use String.IndexOf doesn't add to what I need to remember - choosing
regular expressions does. In addition, there aren't many things which
need escaping compared with those which need escaping in regular
expressions. In addition to *that*, whenever you need to escape in
regular expressions, you also need to escape in C# (or remember to use
verbatim string literals) - yet another piece of headache.
To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.


But writing objects, and the objects themselves, are not easily readable. But
you wouldn't advocate not writing them, would you?


No, but I don't see how that's relevant.
But if you know both and as I (and you) mentioned regex is part of .net
as is C# - so it is already in the mix.


No, it's not. It's not already used in every single C# program, any
more than SQL is.


Nor are all the objects you use.

But if you are using .Net, it is part of the mix.


It's not necessarily part of the mix I have to use. I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.
In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.


Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it :)


If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.

If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.
I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.
Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?
My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.
Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?
I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate.
I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.

Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.
If I happen to know a good way in Regex to solve a problem, I am not
going use *extra brainpower* to try to solve the problem in C#.
In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower? If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.
It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.


But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable)

If you are talking about

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}

vs

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.


Well, for a start the 0s aren't necessary, and I wouldn't include them.
I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal.
Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?
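
To make that concrete, here is a rough sketch of what happens with those two
literals. The class and method names are invented for the example;
Regex.Escape and String.IndexOf are the standard framework calls:

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Uses the literal directly as a pattern, as an unwary edit would.
    public static bool UnescapedMatches(string text, string literal)
    {
        return Regex.IsMatch(text, literal);
    }

    // Escapes regex metacharacters first, so the literal is matched as typed.
    public static bool EscapedMatches(string text, string literal)
    {
        return Regex.IsMatch(text, Regex.Escape(literal));
    }

    // The plain string search: nothing to escape at all.
    public static bool IndexOfMatches(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal) != -1;
    }
}
```

Unescaped, "something.1" also matches "somethingX1" because the dot matches
any character, and "something[1]" fails to match the text "something[1]"
because [1] is read as a character class. The escaped and IndexOf versions
behave as the literal reads.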
So I am at a loss as to how this regular expression is more unreadable than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.
You could start by making the C# more readable, as I've shown...

However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.
2) It's got all the strings concatenated, so it's harder to spot each
of them separately.

And that's before you need to actually *maintain* the code.

Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?
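
A sketch of what the safe version would have to look like, assuming a
hypothetical PatternBuilder helper (not something from this thread): every
piece has to go through Regex.Escape before being joined with "|".

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternBuilder
{
    // Builds "lit1|lit2|..." with each literal escaped, so characters such
    // as '|', '.' or '[' inside a variable cannot change the pattern's meaning.
    public static string BuildAlternation(params string[] literals)
    {
        string[] escaped = new string[literals.Length];
        for (int i = 0; i < literals.Length; i++)
        {
            escaped[i] = Regex.Escape(literals[i]);
        }
        return string.Join("|", escaped);
    }

    // Convenience wrapper: true if 'text' contains any of the literals.
    public static bool MatchesAny(string text, params string[] literals)
    {
        return Regex.IsMatch(text, BuildAlternation(literals));
    }

    // The naive concatenation from the post, kept only for comparison.
    public static bool NaiveMatches(string text, string x)
    {
        return Regex.IsMatch(text, x + "|something2|something3");
    }
}
```

If x happens to hold "a|b", the naive pattern matches the text "b" on its own,
whereas the escaped pattern only matches the literal "a|b". That is extra
machinery just to get back to what IndexOf does by default.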
I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?


As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?


I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.
Are you going to leave out some objects that programmers may not be familiar
with?
Absolutely, where there are simpler and more familiar ways of solving
the same problem.
Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.


And I don't in this case, as I think I've shown. Less typing, easy to read,
straightforward - in this case.


You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions make the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).
SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in
VB.Net so I use this (not saying there isn't one).


And of course there is:
SalaryMax.Text =
String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...


I can read either (although, I didn't know you could string multiple
"Replace"s together).


Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, understanding
exactly what each was there for. Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?
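
As a rough illustration, here are both versions as valid C#, reduced to just
the stripping step. CalculateYearly and the text boxes are left out because
their definitions aren't shown in this thread; the class and method names
below are invented:

```csharp
using System;
using System.Text.RegularExpressions;

static class CurrencyCleanup
{
    // Regex version: the verbatim literal (@"...") is needed in C# so the
    // backslashes reach the regex engine instead of being read as C# escapes.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"\$|\,", "");
    }

    // String version: no escaping at either the C# or the regex level.
    public static string StripWithReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }
}
```

Both turn "$1,234.50" into "1234.50"; only the first needed a change to make
it compile as C#.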
But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.


First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc.


And in all of those cases, regular expressions are really useful.
These are very well documented, and when there are a myriad of ways a
user can input these types of data, I prefer to use Regular
expressions, which are all over the place (easy to find), rather than try
to come up with some complex set of loops and temporary variables, which
make it far easier to make a mistake and are much more unreadable than the
Regex equivalent.
Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?
Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.


Escaping seems to be your main complaint with it.


It's the main potential source of problems, yes. It's a potential
source of problems which simply doesn't exist when you use
String.IndexOf.
I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.
You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.
As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.


I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).


Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?
Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.


So you would prefer to code to the lowest common denominator.


When there's no good reason not to, absolutely.
I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.
Learning to solve problems as simply as possible *is* learning to code
to a higher level.
I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).


If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.
I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.


"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.


I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?
Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?


Lost me on that one.


Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #27

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
tshad <ts**********@ftsolutions.com> wrote:
> It is - the regular expression *language* is a different language to
> C#, in the same way that XPath is. That's why under "regular
> expressions" in MSDN, there's a "language elements" section.
I think calling it a language is a stretch, although I know it is called
a language in places (it's all in what you define as a language).


In plenty of places. It has a language with a defined syntax etc.


Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).
It really is
a text/string processor, as is: IndexOf, Substring, Right, Replace etc
used
by various languages.

You don't build pages with it. It isn't procedural.
Neither of those are required for it to be a language.
It is a tool used by the other languages.


Sure - so is XPath, but that's a language too.
(See http://www.w3.org/TR/xpath)
You don't use VB.Net in C# or vice versa but both use
Regular expressions (as they both use Substring, Replace etc).


None of those state that regular expressions aren't a language.
> But not as instantly clear, I believe. Can you really say that you find
> the regex version doesn't take you *any* longer to understand than the
> non-regex version?


Depends on the C# code as well as the Regex code.


The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
someVariable.IndexOf ("secondliteral") != -1 ||
someVariable.IndexOf ("thirdliteral") != -1)


And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
If I did it regularly, I'd write a short method which took a params
string array.
Again, are we talking about the best tool for the job or the most
readability.
Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.


Again, why do I need a compelling reason? If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."
As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just
as valid as C# (or VB.Net) string handlers.
But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.


No.

No pushing. No more than you're pushing not using it.
I do use them when convenient.

For example, I was creating a simple text search engine and wanted to
modify
what the user put in and found it simpler to do the following than in VB
or
C:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straightforward and easy to follow.
Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?


Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().
In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.
Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.
>> Also, you have the same problem when dealing with web pages or getting
>> a
>> file from the disk. You still use the escape character there (and as
>> you
>> say, is a little confusing) - but you still do it.
>
> You have to know the C# escaping, but not the regular expression
> escaping.
But you do NEED to know the C# escaping (readability not high - unless
you
understand it).


Yes, but I *already* need to know that in order to write C#. Choosing
to use String.IndexOf doesn't add to what I need to remember - choosing
regular expressions does. In addition, there aren't many things which
need escaping compared with those which need escaping in regular
expressions. In addition to *that*, whenever you need to escape in
regular expressions, you also need to escape in C# (or remember to use
verbatim string literals) - yet another piece of headache.
> To me, a lot of readability comes from decent naming and commenting,
> which fortunately are available in pretty much any language. I'd
> certainly agree that object orientation (and exceptions, automatic
> memory management etc) makes it a lot easier to write readable code
> though.


But writing objects, and the objects themselves, are not easily readable.
But you wouldn't advocate not writing them, would you?


No, but I don't see how that's relevant.


Just that you don't want to use Regex as it is not easily readable. Neither
are objects.

But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would it?
>> But if you know both and as I (and you) mentioned regex is part of .net
>> as is C# - so it is already in the mix.
>
> No, it's not. It's not already used in every single C# program, any
> more than SQL is.
Nor are all the objects you use.

But if you are using .Net, it is part of the mix.


It's not necessarily part of the mix I have to use.


You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex. I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. It has been around for quite a long time
and is just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.
I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.
I agree with part of that and think that regular expressions are just as
important to know. As we have been saying, it is here and many people use
it, so to not understand it is to limit yourself. You don't have to use it,
but you should at least understand the basics of how it works. What are you
going to do when someone uses a RegularExpressionValidator and you don't
understand what the expression is? The fact that it is not C# (neither is a
textbox, datagrid, etc), doesn't mean you shouldn't understand them. Whether
you use them is up to you.

As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?

I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are setup (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.
> In what way is it 6 of one or half a dozen of the other when one
> solution requires knowing more than the other? I would expect *any* C#
> programmer to know what String.IndexOf does. I wouldn't expect all C#
> programmers to know by heart which regex language elements require
> escaping - and if you don't know that off the top of your head, then
> changing the code to search for a different string involves an extra
> bit of brainpower.
Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it :)


If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.


I agree there.

Which is easier to write is obviously your perception. I found my example,
as easy as yours to write and just as readable.
If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.

Keep regular expressions out of my code?????

So now you are saying there is no use for it?
I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.


Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?


Sure.

If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.
My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.
Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?

I am not. I don't memorize. But I still use it.
I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate.


I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.


No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them. You can't have it both ways.

If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.

As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.

Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.
If I happen to know a good way in Regex to solve a problem, I am not
going to use *extra brainpower* to try to solve the problem in C#.
In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower?


I wasn't referring to this particular issue when I said this.
If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.
I never said I was not familiar with IndexOf.

As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".

****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if (someString.IndexOf("something1",0) >= 0 ||
    someString.IndexOf("something2",0) >= 0 ||
    someString.IndexOf("something3",0) >= 0)
{
    // Do something
}
****************************************************************
IndexOf doesn't do it. This was the original question. You have to make
multiple calls, as was said in the original question. Nicholas was correct in
his assessment. One Regex call would work.
> It was *less* readable though - and would have been *significantly*
> less readable if the string being searched for had included dots,
> brackets etc.
But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable)

If you are talking about

if (someString.IndexOf("something1",0) >= 0 ||
    someString.IndexOf("something2",0) >= 0 ||
    someString.IndexOf("something3",0) >= 0)
{
    // Do something
}

vs

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.


Well, for a start the 0s aren't necessary, and I wouldn't include them.


You're right.
I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a match (you can't tell that from the word
"IndexOf") and that it probably has something to do with the words
"something1", "something2" and "something3". Now if you know C then I would
assume you would pick up that "|" is "or" (not so clear to a VB programmer).
And that would be for someone not familiar with regular expressions doing a
quick perusal.
Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?


That wasn't the question.

What if you wanted to change "something1" to "something\". Same problem.
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.
So I am at a loss as to how this regular expression is more unreadable
than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.
You could start by making the C# more readable, as I've shown...


As you can with Regular Expressions.

However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.
| = or (same as C)
2) It's got all the strings concatenated, so it's harder to spot each
of them separately.
You are kidding, right?

And that's before you need to actually *maintain* the code.

Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?
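To make the concern about concatenating a variable into a pattern concrete, here is a minimal sketch. It assumes a hypothetical variable x that happens to contain regex metacharacters; the class name and the sample strings are made up for illustration and do not come from anyone's real code. Regex.Escape is the standard .NET call for turning arbitrary text into a literal pattern.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeSketch
{
    static void Main()
    {
        // Hypothetical search term held in a variable; '[' starts a character class in a regex.
        string x = "something[1]";
        string input = "this has something[1] in it";

        // Naive concatenation: "[1]" is parsed as a character class, so the
        // first alternative means "something1", not the literal "something[1]".
        string naive = x + "|something2|something3";
        Console.WriteLine(Regex.IsMatch(input, naive));   // False

        // Regex.Escape turns the variable into a literal before it is combined.
        string escaped = Regex.Escape(x) + "|something2|something3";
        Console.WriteLine(Regex.IsMatch(input, escaped)); // True

        // The IndexOf version has no pattern syntax, so there is nothing to escape.
        Console.WriteLine(input.IndexOf(x) != -1);        // True
    }
}
```

The naive pattern fails silently rather than throwing, which is what makes this kind of maintenance slip easy to miss.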
You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.
> I suspect not all programmers would though. Don't forget that the
> person who writes the code is very often not the one to maintain it.
> Can you guarantee that *everyone* who touches the code will find
> regexes as readable as String.IndexOf?


As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?


I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.


Why use them at all? It isn't readable.

And if your programmers can't maintain the simple Regexs, they definitely
won't be able to handle the more complicated ones.
Are you going to leave out some objects that programmers may not be familiar
with?
Absolutely, where there are simpler and more familiar ways of solving
the same problem.
> Which is why I've said repeatedly that I'm not trying to suggest that
> regexes are bad, or should never be used. I'm just saying that in this
> case it's using a sledgehammer to crack a nut.


And I don't in this case, as I think I've shown. Less typing, easy to read,
straightforward - in this case.


You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions make the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).


Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.

But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding, we would talk about it (that would be the case with his C#, VB
or Regex code).
>> SalaryMax.Text =
>> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>>
>> At the time, I couldn't seem to find as simple a solution as this in VB.Net
>> so I use this (not saying there isn't one).
>
> And of course there is:
> SalaryMax.Text =
> String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
> .Replace(",", "")));
>
> I know which version I'd rather read...
I can read either (although, I didn't know you could string multiple
"Replace"s together).


Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, working out
exactly what each one was there for.


If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers had better be able to
read it), there should not be a problem with the 2 special characters, which
are the same as in C#. There is nothing obscure in this example - that I can
see.
Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?
Actually, it was VB.Net.
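For reference, a rough sketch of what the two versions look like once they really are C#. The sample input is a hypothetical stand-in for WagesMax.Text, and the class name is made up. Note that the pattern "\$|\," copied unchanged into an ordinary C# string literal does not compile, because \$ and \, are not recognised C# escape sequences.

```csharp
using System;
using System.Text.RegularExpressions;

class LiteralSketch
{
    static void Main()
    {
        string wages = "$45,000"; // hypothetical input standing in for WagesMax.Text

        // Regular string literal: the backslash meant for the regex must be doubled.
        string a = Regex.Replace(wages, "\\$|,", "");

        // Verbatim literal: the backslash passes through to the regex unchanged.
        string b = Regex.Replace(wages, @"\$|,", "");

        // Plain string methods: nothing to escape at either level.
        string c = wages.Replace("$", "").Replace(",", "");

        Console.WriteLine(a == b && b == c); // True
        Console.WriteLine(c);                // 45000
    }
}
```

All three produce the same result here; the difference is only in how many layers of escaping the reader has to keep in mind.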
> But I suspect you're more used to regular expressions than many other
> programmers - and making the code less readable for other programmers
> for no benefit is what makes it unwarranted here, even in the simple
> case where there's nothing to escape.
First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc.


And in all of those cases, regular expressions are really useful.


But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it. Definitely not if they would have a problem with
our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.
These are very well documented, and when there are a myriad of ways a
user can input these types of data, I prefer to use regular
expressions, which are all over the place (easy to find), rather than try
to come up with some complex set of loops and temporary variables, which
makes it far easier to make a mistake and is much more unreadable than the
Regex equivalent.
Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?


I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.
> Because it's more complicated! You can't deny that there's more to
> consider due to the escaping. There's more to know, more to consider,
> and it doesn't get the job done any more cleanly.


Escaping seems to be your main complaint with it.


It's the main potential source of problems, yes. It's a potential
source of problems which simply doesn't exist when you use
String.IndexOf.
I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.


You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.

It is still an issue. Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.
> As is using the power of regular expressions when there is an easier
> way - using IndexOf, which is *precisely* there to find one string
> within another.


I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).


Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?


Again - then don't allow them at all.
> Do you really think it would take you that long to refamiliarise
> yourself with it? I don't see why it's a good idea to make some poor
> maintenance engineer who hasn't used regular expressions before try to
> figure out that *actually* you were just trying to find strings within
> each other just so you can keep your skill set current.
So you would prefer to code to the lowest common denominator.


When there's no good reason not to, absolutely.


I guess that is where we disagree.
I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.
Learning to solve problems as simply as possible *is* learning to code
to a higher level.


No argument there.
I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).
If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.

I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).
> I've never had a problem with reading the documentation when I've
> needed to use regular expressions, without putting it in projects in
> places where I *don't* need it.


"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.


I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?


No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.
> Because it makes things more complicated for no benefit. The reflection
> example was a good one - that allows you to get a property value, so do
> you think it's a good idea to write:
>
> string x = (string) something.GetType()
> .GetProperty("Name")
> .GetValue(something, null);
> or
>
> string x = something.Name;
>
> ?
>
> Maybe I should use the former. After all, I wouldn't want to forget how
> to use reflection, would I?


Lost me on that one.


Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?


Since I am not sure why you would use the first, I would do the 2nd.

But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.

Tom
Nov 17 '05 #28

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
In plenty of places. It has a language with a defined syntax etc.
Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).


So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

Of course, you didn't even specify "programming language" before.

Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.
The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
    someVariable.IndexOf ("secondliteral") != -1 ||
    someVariable.IndexOf ("thirdliteral") != -1)


And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))


Right. Immediately the IndexOf version is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that
Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.


Again, why do I need a compelling reason. If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."


Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.
But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.


No.

No pushing. No more than your pushing not using it.


But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?
' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straight forward and easy to follow.


Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?


Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().


So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
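A small sketch of the difference between the two patterns, using made-up keyword strings (the class name and inputs are hypothetical, chosen only to show the single-tab case):

```csharp
using System;
using System.Text.RegularExpressions;

class WhitespaceSketch
{
    static void Main()
    {
        // Hypothetical keyword strings, separated by one tab and by two tabs.
        string oneTab = "cats\tdogs";
        string twoTabs = "cats\t\tdogs";

        // \s{2,} only fires on runs of two or more whitespace characters,
        // so a lone tab survives untouched.
        Console.WriteLine(Regex.Replace(oneTab, @"\s{2,}", " ") == "cats\tdogs");  // True
        Console.WriteLine(Regex.Replace(twoTabs, @"\s{2,}", " ") == "cats dogs"); // True

        // \s+ normalises every run of whitespace, including a single tab.
        Console.WriteLine(Regex.Replace(oneTab, @"\s+", " ") == "cats dogs");     // True
    }
}
```

With \s+ a single ordinary space is also "replaced" by a space, which is harmless but does mean every whitespace run is touched.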
In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.


Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.


I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.
But writing objects and the objects themselves are not easily readable. But
you would advocate not writing them, would you?


No, but I don't see how that's relevant.


Just that you don't want to Regex as it is not easily readable. Neither are
Regex.


Eh?
But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would you?
When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.
But if you are using .Net, it is part of the mix.


It's not necessarily part of the mix I have to use.


You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex.


No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.
I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. It has been around for quite a long time
and is just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.
Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.
I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.


I agree with part of that and think that regular expressions are just as
important to know.


Why? I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once. I
suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.
As we have been saying, it is here and many people use it, so to not
understand it is to limit yourself.
It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.
You don't have to use it, but you should at least understand the
basics of how it works. What are you going to do when someone uses a
RegularExpressionValidator and you don't understand what the
expression is?
At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?
The fact that it is not C# (neither is a textbox, datagrid, etc),
doesn't mean you shouldn't understand them. Whether you use them is up
to you.

As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?
If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.
I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are setup (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.
If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.
If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.


I agree there.

Which is easier to write is obviously your perception. I found my example,
as easy as yours to write and just as readable.


And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.
If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.


Keep regular expressions out of my code?????

So now you are saying there is no use for it?


Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.
I don't know all of the possible combinations of calls to every Object,
but that doesn't preclude me from using them.


Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?


Sure.

If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.


But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.
Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?


I am not. I don't memorize. But I still use it.


Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.
I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.


No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them. You can't have it both ways.


I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.
If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.
Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?
As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.
Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.
Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.
If I happen to know a good way in Regex to solve a problem, I am not
going use *extra brainpower* to try to solve the problem in C#.


In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower?


I wasn't referring to this particular issue when I said this.


It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?
If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.


I never said I was not familiar with IndexOf.

As a matter of fact, the original question was given whether you could "do a
search for more than one string in another string".


And of course the answer is "yes, by calling IndexOf multiple times".
****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if ((someString.IndexOf("something1",0) >= 0) ||
(someString.IndexOf("something2",0) >= 0) ||
(someString.IndexOf("something3",0) >= 0))
{
Do something
}
****************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.
Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.
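For illustration, the "single call to a method which called IndexOf on the string multiple times" could be sketched roughly as below. The names StringSearch and ContainsAny are made up for this example and do not come from the thread or from the framework; the ordinal comparison is also an assumption about what is wanted.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: true if 'text' contains at least one of 'candidates'.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison treats every character literally,
            // so nothing in the candidate ever needs escaping.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then reads as one test, e.g. `if (StringSearch.ContainsAny(someString, "something1", "something2", "something3"))`, while each search string stays a separate literal.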
Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?


That wasn't the question.


Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?
What if you wanted to change "something1" to "something\". Same problem.
Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
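To make the two levels of escaping concrete, here is a small sketch that searches for the text something\ (ending in one backslash) each way. The class and method names are invented for the example; Regex.Escape is a real framework method, shown here only as one way to avoid hand-escaping.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    public static bool FoundWithIndexOf(string text)
    {
        // One level of escaping (C# only); @"something\" is the same string.
        return text.IndexOf("something\\", StringComparison.Ordinal) >= 0;
    }

    public static bool FoundWithRegex(string text)
    {
        // Two levels: the regex needs \\ for one literal backslash, and a
        // regular C# literal doubles that again; @"something\\" is the same pattern.
        return Regex.IsMatch(text, "something\\\\");
    }

    public static bool FoundWithEscapedRegex(string text)
    {
        // Regex.Escape adds the regex-level escaping to the plain string.
        return Regex.IsMatch(text, Regex.Escape("something\\"));
    }
}
```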
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.
Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.
So I am at a loss as to how this regular expression is more unreadable
than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.


You could start by making the C# more readable, as I've shown...


As you can with Regular Expressions.


Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.

Neither is as readable as the String.IndexOf version, however.
However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.


| = or (same as C)


Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.
2) It's got all the strings concatenated, so it's harder to spot each
of them separately.


You are kidding, right?


Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.
Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?


You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.


While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed to search for
the contents of a variable some day. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.
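A sketch of the trap being described, assuming the variable x holds plain text supplied from elsewhere. The class and method names are hypothetical; the pattern x+"|something2|something3" is the one quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariableSearch
{
    // Naive: x is pasted into the pattern, so any regex metacharacters
    // it happens to contain are interpreted rather than matched literally.
    public static bool NaiveRegex(string text, string x)
    {
        return Regex.IsMatch(text, x + "|something2|something3");
    }

    // Guarded: Regex.Escape makes x match only itself.
    public static bool EscapedRegex(string text, string x)
    {
        return Regex.IsMatch(text, Regex.Escape(x) + "|something2|something3");
    }

    // IndexOf never interprets x, so there is nothing to guard against.
    public static bool WithIndexOf(string text, string x)
    {
        return text.IndexOf(x, StringComparison.Ordinal) >= 0
            || text.IndexOf("something2", StringComparison.Ordinal) >= 0
            || text.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }
}
```

With x set to "a.c" and the text "abc", the naive version reports a match (the dot matches any character) while the other two do not.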
I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.


Why use them at all? It isn't readable.


They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.

Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.
And if your programmers can't maintain the simple Regexs, they definitely
won't be able to handle the more complicated ones.
You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?
You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions make the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).


Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.


But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.
But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way or not. That is up to the programmer. If
there is a problem with their code and feel the programmer is way off base
in his coding we would talk about (that would be the case with his C#, VB or
Regex code).
Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.
Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, working out
exactly what each one was there for.


If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters which
is the same as C#. There is nothing obscure in this example - that I can
see.


Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.
Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?


Actually, it was VB.Net.


Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.
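The replacement code under discussion is not quoted in this part of the thread, so the following is only a guess at its shape: a regex along the lines of \$|, on one side and two plain Replace calls on the other. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Cleaner
{
    // Two plain steps: dollar to space, then comma to space. No escaping at all.
    public static string WithStringReplace(string input)
    {
        return input.Replace("$", " ").Replace(",", " ");
    }

    // Assumed regex equivalent: the $ must be escaped because it is an anchor,
    // while the | must NOT be escaped because it is meant as alternation.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", " ");
    }
}
```

Both methods give the same result; the difference is only in how much the reader has to decode.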
And in all of those cases, regular expressions are really useful.


But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it.


<sigh> If you actually believe that, you haven't been reading what I've
been writing.
Definitely if they would have a problem with our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.
How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?
Which are very well documented and when there are a myriad of ways a
user can input these types of data, I prefer to use Regular
expressions which are all over the place (easy to find) than try to
come up with some complex set of loops and temporary variables which
make it far easier to make a mistake and much more unreadable than the
Regex equivalent.


Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?


I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.


Yes - the complicated cases where I've already said that regular
expressions are useful!
You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.


It is still an issue.


Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.
Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.
You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.
Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?


Again - then don't allow them at all.


No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.
When there's no good reason not to, absolutely.


I guess that is where we disagree.


It certainly sounds like it.
I am not going to code to the level of a junior programmer. I prefer
that
he learn to code to a higher level.


Learning to solve problems as simply as possible *is* learning to code
to a higher level.


No argument there.


But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.
If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.


I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).


You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.
I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?


No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.


Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.
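The "hello[there]" change can be tried out with a sketch like the one below (class and method names are made up). Dropped into a pattern unescaped, the brackets form a character class, so the regex looks for "hello" followed by one of t, h, e or r rather than for the literal text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketSearch
{
    // The search string is pasted in unchanged and simply works.
    public static bool WithIndexOf(string text)
    {
        return text.IndexOf("hello[there]", StringComparison.Ordinal) >= 0;
    }

    // The same text pasted into a pattern: [there] is now a character class.
    public static bool WithUnescapedRegex(string text)
    {
        return Regex.IsMatch(text, "hello[there]");
    }

    // Correct regex version: the opening bracket is escaped by Regex.Escape.
    public static bool WithEscapedRegex(string text)
    {
        return Regex.IsMatch(text, Regex.Escape("hello[there]"));
    }
}
```

The unescaped version compiles and runs without complaint, which is what makes the mistake easy to miss: it fails to find "hello[there]" and instead matches input such as "hellot".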
Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?


Since I am not sure why you would use the first, I would do the 2nd.


You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.
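The two snippets being compared are not quoted in this part of the thread. A hypothetical pair in the same spirit, reading a property through reflection versus reading it directly, might look like this (the class and method names, and the choice of string.Length, are assumptions for illustration only):

```csharp
using System;
using System.Reflection;

public static class PropertyDemo
{
    // The roundabout way: look the property up by name at run time.
    public static int LengthViaReflection(string s)
    {
        PropertyInfo property = typeof(string).GetProperty("Length");
        return (int)property.GetValue(s, null);
    }

    // The direct way: the compiler checks the name and the type.
    public static int LengthDirect(string s)
    {
        return s.Length;
    }
}
```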
But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.


I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #29

P: n/a
I'm back.

Was a little busy and didn't have time to respond.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
tshad <ts**********@ftsolutions.com> wrote:
> In plenty of places. It has a language with a defined syntax etc.
Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a
Procedural
Language (C, Fortran, VB, Pascal, etc.).


So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.
Of course, you didn't even specify "programming language" before.
True.

But I did specify, that it depends on how you define it.

Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.

OK
> The C# code in question would be:
>
> if (someVariable.IndexOf ("firstliteral") != -1 ||
> someVariable.IndexOf ("secondliteral") != -1 ||
> someVariable.IndexOf ("thirdliteral") != -1)
>


And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))


Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that

I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

But actually they are both Oliver's.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understood C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much a giveaway). Much more so than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing. IsMatch is a much more understandable term than IndexOf.
> Unless there's another compelling argument in favour of one tool or
> another, readability is a very important part of choosing the best
> tool.
Again, why do I need a compelling reason? If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to
myself -
"Is there perhaps a more readable way to write this? I wonder if Jim
will
be able to read this or not."


Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

I never said that.

I never said readability is not an issue, but I am not going to write "Cat
in the Hat" instead of a novel so that the programmers with the simplest of
experience can read it. But I am not going to write cryptic code either so
they can't read it.

I assume there are company standards to program by and I would follow that.
Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.
I am not writing simple code, I am writing code to handle a problem. I
prefer to write good code not simple code. Sometimes they are synonymous,
sometimes they aren't.

But in our case, I still see them as equally readable.
Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.
Don't agree there.
> But you're effectively pushing them in the situation described by the
> OP when you say that the solution using regular expressions is as
> readable as the solution without.
No.

No pushing. No more than you're pushing not using it.


But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?

In your opinion (as you say).

And you obviously are not listening. I am not pushing either side. I have
been saying over and over that in this situation, they are the same (IMO).
I am not pushing Regex nor am I ruling them out. You, however, can't make up
your mind. One minute you say that something as simple as the example we
are using is too complex for a programmer, and the next you say that you
would use Regex in other situations (which would have to be more
complicated). That makes no sense.
>> ' The following replaces all multiple blanks with " ". It then takes
>> ' out the anomalies, such as "and not and" and replaces them with
>> "and"
>>
>> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
>> keywords = Regex.Replace(keywords, "( )", " or ")
>> keywords = Regex.Replace(keywords," or or "," ")
>> keywords = Regex.Replace(keywords,"or and or","and")
>> keywords = Regex.Replace(keywords,"or near or","near")
>> keywords = Regex.Replace(keywords,"and not or","and not")
>>
>> Fairly straight forward and easy to follow.
>
> Reasonably, although apart from the first regex, I'd suggest doing the
> rest with straight calls to String.Replace. As an example of why I
> think that would be more readable, what exactly do the second line do?


Actually, nothing. It is grouping a " ", which isn't necessary. I think
I
used to have something else there and took it out and didn't realize I
didn't need the ().


So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

I am not saying there may not be other ways to write the code. As I said, I
often rewrite my own code later as I see a way I like better that I may not
have thought of at the time I wrote it. Many times it isn't better code,
just different.
Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
Probably true. I am not a Regex expert. That was what I came up with at
the time.
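The difference between the two patterns shows up with a single tab. A minimal sketch (class and method names invented for the example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Original pattern: only runs of two or more whitespace characters are replaced.
    public static string CollapseTwoOrMore(string input)
    {
        return Regex.Replace(input, @"\s{2,}", " ");
    }

    // Suggested pattern: every run of whitespace, including a lone tab, becomes one space.
    public static string CollapseOneOrMore(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }
}
```

For the input "a\tb" (one tab), the first method returns the string unchanged and the second returns "a b"; for two tabs or two spaces both return "a b".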
> In some flavours of regular expressions, brackets form capturing
> groups. Do they in .NET? I'd have to look it up. If it's really just
> trying to replace the string "( )" with " or ", a call to
> String.Replace would mean I didn't need to look anything up.
Obviously, you didn't need to look this one up either - as you were
correct.
It is just grouping a blank.


I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.

Even in C, which I have used for years, I have to look up parameters to make
sure I have the right parameters and have them in the right order.

As I said, the parens were probably a mistake; I may have made some changes
to the line and left the parens in. I agree yours is the correct one.
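For reference, the two forms of that second line side by side, as a sketch with made-up class and method names. In .NET the parentheses form a capturing group around the single space; since the group is never referenced, both methods produce the same output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordJoin
{
    // Regex form from the earlier post: "( )" captures one space, then replaces it.
    public static string WithRegex(string keywords)
    {
        return Regex.Replace(keywords, "( )", " or ");
    }

    // Plain form: replace each space with " or "; nothing to look up.
    public static string WithStringReplace(string keywords)
    {
        return keywords.Replace(" ", " or ");
    }
}
```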
>> But writing objects and the objects themselves are not easily
>> readable.
>> But
>> you would advocate not writing them, would you?
>
> No, but I don't see how that's relevant.


Just that you don't want to Regex as it is not easily readable. Neither
are
Regex.


Eh?

Must have had a little brain fade there. Not sure what I was saying.
But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would you?


When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.


That isn't the point.

We are talking readability here. So don't write any objects. You can use
the ones you need to, but if you write objects and someone has to maintain
it, it could be a problem if he doesn't understand objects.

You can write the same code in straight C to do what objects do. We got
along fine before there were objects. So I think, based on your statements,
you should write the easier code that some very junior programmer might have
to read.
>> But if you are using .Net, it is part of the mix.
>
> It's not necessarily part of the mix I have to use.
You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and
are
part of the mix as is Regex.


No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.


But I am not writing in C# only. I am writing in .Net.
I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. They have been around for quite a long
time
and is just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part
of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.
Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.

Obviously, you micro manage more than I.

If you would have a problem with our examples, I don't think I would like to
work in your team.

In my area, if your code is reasonable and well written and it follows our
standards, it's fine.
>I suspect *very*
> few programs don't do any string manipulation - knowing the string
> methods well is *far* more fundamental to .NET programming than knowing
> regular expressions.


I agree with part of that and think that regular expressions are just as
important to know.


Why?


Because they are perfectly valid and as you said before there are some that
are useful (therefore, you should know them as someone might use them and
you may have to maintain it).
I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once.
That's your style and position, but may not be someone else's.
I suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.
As we have been saying, it is here and many people use it, so to not
understand it is to limit yourself.
It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.
You don't have to use it, but you should at least understand the
basics of how it works. What are you going to do when someone uses a
RegularExpressionValidator and you don't understand what the
expression is?


At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?


According to your position, you should ban them altogether for ANY use,
since you can do anything in C# you can do in Regex.
The fact that it is not C# (neither is a textbox, datagrid, etc),
doesn't mean you shouldn't understand them. Whether you use them is up
to you.

As you point out, you are not the only programmer and many programmers
like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?
If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.


Appropriate as defined by you. Why allow them at all?
I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are setup (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.
If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.

They serve a purpose. They do the same as your string routines, so there is
a purpose. Both are string handling routines.
> If you truly think that given two solutions which are otherwise equal,
> the solution which is easiest to write, read and maintain doesn't win
> hands down, we'll definitely never agree.


I agree there.

Which is easier to write is obviously your perception. I found my
example,
as easy as yours to write and just as readable.


And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.


So you should never EVER use Regex. Someone else might read your code.

This is going in circles.

As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.
> If you want to keep your hand in with respect to regular expressions,
> do it in a test project, or with a regular expressions workbench. Keep
> it out of code which needs to be read and maintained, probably by other
> people who don't want to waste time because you wanted to keep your
> skill set up to date.
Keep regular expressions out of my code?????

So now you are saying there is no use for it?


Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.


There either is a use or not. You can't say there is a use for it and then
browbeat a programmer because he happens to like to use it. Does a
programmer have to come to you each time he wants to use it to get your
permission?

I can see it if he writes some obscure cryptic Regular Expression - but come
on.
>> I don't know all of the possible combinations of calls to every
>> Object,
>> but that doesn't preclude me from using them.
>
> Exactly - and you wouldn't go out of your way to use methods you don't
> need, just to get into the habit of using them, would you?
Sure.

If it is valid. As I said there are many ways to skin ..., depending on
the
situation I may do it one way and the next time another way. Gives me
many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.


But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.


Sure.

If they are both perfectly valid, I might. Depends on my mood (you should
really have a problem with that). :)
> Absolutely - so why are you so keen on making people either memorise or
> look up the characters which need escaping for regular expressions
> every time they read or modify your code?
I am not. I don't memorize. But I still use it.


Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.


No.

I can maintain my car, but I might still have to look up specs on it.
> I seem to be having difficulty making myself clear on this point: I
> have never stated and will never state that you shouldn't use regular
> expressions where they're appropriate. But they are *not* appropriate
> in this case, as they are a more complex and less readable way of
> solving the problem.
No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER
want
them to use them. You can't have it both ways.


I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.


If they are not readable, you shouldn't use them at all. I personally think
they are both readable, in this case.
If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they
are
never going to get some of the other standard Regex solutions I
mentioned before.
Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?


Again, then you feel there is no place for Regex as you can do anything with
C# that you can do with Regex. As you say, it will always be harder to
read.
As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.
Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.

I didn't say that.
> Show me a problem where the regex way of solving it is simpler than
> using simple string operations (and there are plenty of problems like
> that) and I'll plump for the regex in a heartbeat.
>
>> If I happen to know a good way in Regex to solve a problem, I am not
>> going use *extra brainpower* to try to solve the problem in C#.
>
> In what way is using the method which is designed for *precisely* the
> task in hand (finding something in a string) using extra brainpower?


I wasn't referring to this particular issue when I said this.


It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?

In this case, no. In other cases, could be. Would have to look at it. I
never said that Regex is the best thing out there. I was just saying that
it is valid and can be readable - can also be cryptic (as can C#).
>If
> you're not familiar with String.IndexOf, you've got *much* bigger
> things to worry about than whether or not your regular expression
> skills are getting rusty.


I never said I was not familiar with IndexOf.

As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".


And of course the answer is "yes, by calling IndexOf multiple times".


That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.
****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if ((someString.IndexOf("something1",0) >= 0) ||
(someString.IndexOf("something2",0) >= 0) ||
(someString.IndexOf("something3",0) >= 0))
{
// Do something
}
****************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.
Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.
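
To illustrate "a single call to a method which called IndexOf on the string multiple times", here is a minimal sketch. ContainsAny is a hypothetical helper name (nothing in the framework is called that), and the ordinal comparison is an assumption about what the original poster wanted:

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper, not a framework method: returns true when
    // 'source' contains at least one of 'candidates' as a plain substring.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison is assumed here; no characters in the
            // candidate have any special meaning.
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With something like that in place, the call site is one statement: if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }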


No, he was correct in his answer to the question. The question was never
"Which is better", but "can you do it". And you can write a method which calls
IndexOf multiple times. But then it isn't one line, is it?
> Okay - now suppose I need to change it from searching for "something1"
> to "something.1" or "something[1]". How long does it take to change in
> each version? How easy is it to read afterwards?


That wasn't the question.


Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?


I don't really remember what the context was originally. But I know they
didn't have dots and brackets in it.
What if you wanted to change "something1" to "something\"? Same problem.
Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
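
As a sketch of the two levels of escaping described above (the class and method names are made up for illustration; only String.IndexOf and Regex.IsMatch are real framework calls):

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    // The text being searched for: "something" followed by ONE backslash.
    // In a verbatim literal the backslash needs no escaping at all.
    public const string Needle = @"something\";

    public static bool FoundWithIndexOf(string source)
    {
        // Only the C# literal rules apply; the needle is taken as-is.
        return source.IndexOf(Needle, StringComparison.Ordinal) >= 0;
    }

    public static bool FoundWithRegex(string source)
    {
        // The regex engine also treats backslash as an escape character,
        // so the pattern needs two backslashes to mean one literal one.
        return Regex.IsMatch(source, @"something\\");
    }
}
```

Both methods answer the same question; the regex version simply has one more layer for the reader to undo.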

Splitting hairs, now. Both are the same, as far as I can see (here).
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.


Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.

That's true, but then you would only know C#. And if that is your aim.
That's fine.
>> So I am at a loss as to how this regular expression is more unreadable
>> than
>> the C# counterpart. That is not to say that you couldn't make it more
>> unreadable - but you could do the same with C# if you wanted to.
>
> You could start by making the C# more readable, as I've shown...


As you can with Regular Expressions.


Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.

Neither is as readable as the String.IndexOf version, however.
> However, the regex is already less readable:
> 1) It's got "|" as a "magic character" in there.


| = or (same as C)


Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.

No room for it, huh?
> 2) It's got all the strings concatenated, so it's harder to spot each
> of them separately.


You are kidding, right?


Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.
> Furthermore, suppose you didn't just want to search for literals -
> suppose one of the strings you wanted to search for was contained in a
> variable. How sure are you that *no-one* on your team would use:
>
> x+"|something2|something3"
>
> as the regular expression?


You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.


While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed at some point to
search for a variable. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.
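
A small sketch of the trap being described, assuming the variable is called x as in the quoted line. The class and method names are hypothetical; Regex.Escape is the framework call that makes the variable safe to splice into a pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariablePatternDemo
{
    // Splices x straight into the pattern: any regex metacharacters
    // inside x are interpreted as pattern syntax, not as text.
    public static bool NaiveMatch(string source, string x)
    {
        return Regex.IsMatch(source, x + "|something2|something3");
    }

    // Escapes x first, so it is matched literally.
    public static bool EscapedMatch(string source, string x)
    {
        return Regex.IsMatch(source, Regex.Escape(x) + "|something2|something3");
    }

    // The IndexOf equivalent never needs the escaping step.
    public static bool IndexOfMatch(string source, string x)
    {
        return source.IndexOf(x, StringComparison.Ordinal) >= 0
            || source.IndexOf("something2", StringComparison.Ordinal) >= 0
            || source.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }
}
```

For example, with x set to "1.5" and a source of "125", the naive version reports a match (the dot matches any character) while the escaped and IndexOf versions do not.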

Right. No one makes mistakes with IndexOf.
> I would tell programmers on my team not to use regular expressions
> where the alternative is simpler and more readable, yes.


Why use them at all? It isn't readable.


They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.


But your problem was that it would be hard for other programmers to read.
If they can read your more complicated version, this one should be easy.
Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.
And if your programmers can't maintain the simple Regexs, they definitely
won't be able to handle the more complicated ones.
You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?

No.

I just find it as simple, in this case and you don't.
> You've shown nothing of the kind - whereas I think I've given plenty of
> examples of how using regular expressions make the code less easily
> maintainable, even if you consider it equally readable to start with
> (which I don't).


Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.


But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.


No. Not pushing. But think they are equivalent in this case. As you said
earlier, I am sure others would disagree. But I don't think that the
difference is significant enough, in this case, even if I were to agree on
which is easier, to preclude it.
But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding, we would talk about it (that would be the case with his C#, VB
or Regex code).
Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.
> Yes, I can read either too. The point is that in reading my version, I
> didn't need to wade through various special characters, understanding
> exactly what each was there for.


If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters which
is the same as C#. There is nothing obscure in this example - that I can
see.


Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.
>Of course, your version wasn't even valid
> C#, as it didn't escape the backslashes and you didn't specify a
> verbatim literal. I assume it was originally VB.NET. I wonder which
> version would be easier to convert to valid C#? Mine, perhaps?


Actually, it was VB.Net.


Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.
> And in all of those cases, regular expressions are really useful.


But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it.


<sigh> If you actually believe that, you haven't been reading what I've
been writing.
Definitely if they would have a problem with our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.


How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?

Who?

The person who can understand Regex if complicated, but would be trashed
trying to figure out our little example.

Bit of a stretch there.
>> Which are very well documented, and when there are a myriad of ways a
>> user can input these types of data, I prefer to use Regular
>> expressions which are all over the place (easy to find) than try to
>> come up with some complex set of loops and temporary variables which
>> make it far easier to make a mistake and much more unreadable than the
>> Regex equivalent.
>
> Where exactly are the complex loops and temporary variables in this
> specific case? After all, you have been arguing for using regular
> expressions in *this specific case*, haven't you?


I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.


Yes - the complicated cases where I've already said that regular
expressions are useful!


Just make sure the programmer that can't handle the easy Regex doesn't see
that one.
> You already need to know that when writing C# though - my use of
> String.IndexOf doesn't add to the volume of knowledge required.

Can't have that !!!!
It is still an issue.


Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.
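
To make the point about "some.thing" and "some+thing" concrete, here is a minimal sketch. The class and method names are invented for the example; the behaviour shown is just what the dot and plus metacharacters mean in .NET regular expressions:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharacterDemo
{
    // Literal search: the needle means exactly what it says.
    public static bool LiteralContains(string source, string needle)
    {
        return source.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // Unescaped regex search: '.' matches any character and '+' means
    // "one or more of the preceding character".
    public static bool UnescapedRegexContains(string source, string pattern)
    {
        return Regex.IsMatch(source, pattern);
    }
}
```

Used as a pattern, "some.thing" also matches "some-thing", and "some+thing" matches "something" but not the text "some+thing" itself. The IndexOf version has neither surprise.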
Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.


You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.
> Just because they're as readable *to you* doesn't mean they're as
> readable to everyone. How sure are you that the next engineer to read
> this code will be familiar with regular expressions? How sure are you
> that when you need to change it to look for a different string, you'll
> check whether any of the characters need to be escaped? Why would you
> even want to force that check on yourself?


Again - then don't allow them at all.


No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.


Already dealt with.
> When there's no good reason not to, absolutely.


I guess that is where we disagree.


It certainly sounds like it.
>> I am not going to code to the level of a junior programmer. I prefer
>> that
>> he learn to code to a higher level.
>
> Learning to solve problems as simply as possible *is* learning to code
> to a higher level.


No argument there.


But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.

Right, and vanilla C# is less complicated than writing objects, but we still
do them.
> If it's not the simplest code for the situation, it's not well written
> IMO. If it introduces risk for no reward (the risk of maintenance
> failing to notice that they might need to escape something, versus no
> reward) then it's not well written.


I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).


You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.
> I bet if I showed my code to a random sample of a hundred C# developers
> and asked them to change it to search for "hello[there]", virtually all
> of them would get it right. I also bet that if I showed your code to
> them and asked them for the same change, some would fail to escape it
> appropriately. Do you disagree?


No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.


Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.
> Both are ways of finding the value of a property. The first is harder
> to maintain and harder to read, just like your use of regular
> expressions in this instance. Now, which of the above snippets of code
> would you use, and why?


Since I am not sure why you would use the first, I would do the 2nd.


You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.
But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.


I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.


Agreed.

Tom
Nov 17 '05 #30

P: n/a
tshad <ts**********@ftsolutions.com> wrote:
When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).
So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?


I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.


Indeed.
Of course, you didn't even specify "programming language" before.


True.

But I did specify, that it depends on how you define it.


True.
Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that


I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".


No, I mean:

Yes, where the string itself is separated onto three lines.
But actually they are both Oliver's.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understood C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much a giveaway). Much more than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing.
Yes it does, it's finding the index of one string within another.
IsMatch is a much more understandable term than IndexOf.
The name is as understandable, but the exact semantics are *much* more
obscure. The name doesn't suggest that you can't just put a dot in
there and expect it to only match a dot, for instance, does it?
Again, why do I need a compelling reason? If I have the solution and it
happens to be Regex, I would use it. I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."


Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.


I never said that.


You said that when you have a solution, you won't consider whether there's a
more readable way of writing it. To me, that demonstrates that you
don't care very much about readability.
I never said readability is not an issue, but I am not going to write "Cat
in the Hat" instead of a novel so that the programmers with the simplest of
experience can read it. But I am not going to write cryptic code either so
they can't read it.
If the "Cat in the Hat" does the job as well as the novel and is easier
to read, why on earth would you want to write the novel?
I assume there are company standards to program by and I would follow that.
There aren't usually company standards down to the level of when to use
regular expressions.
Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.


I am not writing simple code, I am writing code to handle a problem. I
prefer to write good code not simple code. Sometimes they are synonymous,
sometimes they aren't.


I disagree - simple code that works (as well as the more complicated
code) is always good. Note that this is in terms of implementation, not
design - there's sometimes a very simple but inelegant design which
ends up costing a lot more work in the long run. That's a different
matter.
But in our case, I still see them as equally readable.
You still haven't said whether you see them as equally readable *and
maintainable* to others though.
Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.


Don't agree there.


With which bit? If you're going to disagree with the first sentence
quoted, we really don't have much basis for discussion. I thought it
was pretty much universally accepted these days that code almost always
spends more time in maintenance than in original coding. That's why I'm
always happy to spend a bit more time refactoring working code to make
it easier to maintain.
No pushing. No more than your pushing not using it.


But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?

In your opinion (as you say).

And you obviously are not listening. I am not pushing either side. I have
been saying over and over that in this situation, they are the same (IMO).


But that *is* pushing regular expressions from my point of view, where
they shouldn't be an option.

Consider an exaggerated equivalent situation. Suppose we were
discussing how to implement addition. Suppose I thought that just using
the expression x+y was the easiest way of doing things, and you thought
it was just as easy to write a remote web service which took two
integers. By *not* ruling out the more complex solution, you're
*effectively* pushing it - at least pushing it as an equally valid
option.
I am not pushing Regex nor am I ruling them out. You, however, can't make up
your mind. One minute you say that something as simple as the example we
are using is too complex for a programmer and then proceed to say that you
would use Regex in other situations (which would have to be more
complicated). That makes no sense.
<sigh> I don't know whether you're intentionally missing the point or
whether I'm genuinely not getting through.

There is always risk associated with changing code. When writing code,
you should try to reduce the risk that future changes will incur. That
means making the code as simple as possible, and easy to change.

In some cases a regular expression will be a lot simpler to read and
change than the equivalent "primitive string manipulation" code. Those
cases would usually be where the string manipulation involves several
steps, often nested loops etc. There, the complexity of regular
expressions (which is still there) is less than the complexity of the
primitive solution.

In this case, however, the primitive solution is very simple and
understandable. Changing it to search for a different string or an
extra string (or even a string passed in as a parameter) is trivial.
Changing the regular expression is not.
So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

I am not saying there may not be other ways to write the code. As I said, I
often rewrite my own code later as I see a way I like better that I may not
have thought of at the time I wrote it. Many times it isn't better code,
just different.


In this case though, it *would* be better - it would be simpler to
understand, and simpler to write in the first place.

For instance, I wouldn't have had to consider whether the brackets were
doing something clever or not. I had to look up .NET regular
expressions just to check the meaning in this case. Do you really
believe that a solution which *doesn't* involve that extra thought
isn't better?
Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
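
The original replacement line isn't quoted in this post, so the first pattern below is only an assumed reconstruction of something that "replaces two tabs but leaves one tab alone"; the second is the suggested "\s+". Class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Assumed shape of the criticised pattern: only runs of two or more
    // whitespace characters are replaced, so a lone tab survives.
    public static string CollapseRunsOfTwoOrMore(string input)
    {
        return Regex.Replace(input, @"\s\s+", " ");
    }

    // The suggested pattern: every run of whitespace, including a
    // single tab, becomes exactly one space.
    public static string CollapseAllWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }
}
```

Given "a", a single tab, then "b", the first method returns the input unchanged while the second returns "a b".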


Probably true. I am not a Regex expert. That was what I came up with at
the time.


And that's part of the risk - that someone doesn't put enough effort
into the regex to get the *actually* desired behaviour. Where the
alternative is a complex solution, it makes a lot of sense to put
significant effort into getting the regex right. When you could do the
same thing with a few string operations, it's just not worth it.

(For this first line, a regex is probably the best way to go - but you
need to think about it more closely.)
I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.

Even in C, which I have used for years, I have to look up parameters to make
sure I have the right parameters and have them in the right order.


Usually intellisense can help you with that though - it *doesn't* start
explaining the details of regular expressions though.
As I said, the Parens were probably a mistake; I may have made some changes
to the line and left the parens in. I agree yours is the correct one.
And if you weren't taking "use regular expressions" as your default
position, you wouldn't have made the mistake in the first place. The
first thing you should try to think of is the simplest one. You want to
manipulate a string, so ask yourself if there's anything in the string
class which does what you want.
But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would it?


When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.


That isn't the point.


It may not be your point, but it's part of my point.
We are talking readability here. So don't write any objects. You can use
the ones you need to, but if you write objects and someone has to maintain
it, it could be a problem if he doesn't understand objects.
I'm assuming that "the solution uses .NET" is a given - in other words,
any maintenance engineer should know C# and the basics of .NET. To me
"the basics" don't include regular expressions and memorising all the
details of them. *Some* familiarity can be hoped for, but not knowing
all the constructs - so anything which requires that people *do* know
the regex constructs in order to change things is at a disadvantage.
You can write the same code in straight C to do what objects do. We got
along fine before there were objects. So I think, based on your statements,
you should write the easier code that some very junior programmer might have
to read.
No, we didn't "get along fine" before there were objects. C code is
typically far harder to read than OO code - and where it's not, that's
often because it's effectively written in a semi-OO way, just using
naming to indicate which type of object is being used (just without
polymorphism etc).
No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.


But I am not writing in C# only. I am writing in .Net.


So you would assume that everyone who is reading and maintaining your
code knows every class in the .NET framework? I don't.
Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.

Obviously, you micro manage more than I.


Well, I code review, just as my peers code review. We almost always
find things which can be done better (which works even better when pair
programming). That doesn't indicate that we're not good developers -
just that an extra point of view is always helpful. It also stops us
from getting lazy and implementing something which is just "okay"
rather than as good as it should be.
If you would have a problem with our examples, I don't think I would like to
work in your team.
Likewise if you don't consider that finding the simplest way of
implementing a solution is worth doing, I wouldn't like to work on your
code.
In my area, if your code is reasonable and well written and it follows our
standards, it's fine.
Being more complex than it needs to be means that code *isn't*
reasonable and well-written, IMO.
>I suspect *very*
> few programs don't do any string manipulation - knowing the string
> methods well is *far* more fundamental to .NET programming than knowing
> regular expressions.

I agree with part of that and think that regular expressions are just as
important to know.


Why?


Because they are perfectly valid and as you said before there are some that
are useful (therefore, you should know them as someone might use them and
you may have to maintain it).


Occasionally they're useful. I haven't used a single one in the project
I've been working on for the last six months. On the other hand, I've
used string manipulation all over the place.

I would expect that the number of straight string manipulations in most
code should be *much* higher than the number of regular expressions
used - hence it's more important to thoroughly understand the string
methods than regexes.
I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once.


That's your style and position, but may not be someone else's.


Everyone else in the team certainly feels the same way.
At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?


According to your position, you should ban them altogether for ANY use,
since you can do anything in C# you can do in Regex.


No, because - as I *keep* saying - there are things you can't do as
*simply* using straight string manipulation. Where it's simpler to use
regexes, I'd use them. Those situations come up occasionally, but not
with the frequency you seem to use regular expressions.
If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.


Appropriate as defined by you. Why allow them at all?


See the various places I've explained that both in this post and many
others.
If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.

They serve a purpose. They do the same as your string routines, so there is
a purpose. Both are string handling routines.


No, using regular expressions *instead* of the string handling routines
serves no purpose, just as using a web service to perform addition
would serve no purpose.

There's no advantage in using the regular expression here, and there
*is* a disadvantage.
And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.


So you should never EVER use Regex. Someone else might read your code.

This is going in circles.


Yes, because you seem unable to understand the position I've presented
several times.
As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.
But would you have a problem with the same person if they forgot or
didn't check whether, say, '[' needed escaping? I'd find that a fairly
understandable mistake (although I'd hope that unit tests would show
the problem up).
Keep regular expressions out of my code?????

So now you are saying there is no use for it?


Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.


There either is a use or not. You can't say there is a use for it and then
brow beat a programmer because he happens to like to use it.


I certainly can when the programmer uses it where there's no good
reason. There's a time and place to use reflection, but I would
certainly brow-beat a programmer who decided to use it to get the value
of a property which could be done in a safer way (using normal property
access syntax).
Has a programmer got to come to you each time he wants to use it to get your
permission?
In our team a programmer (including myself) has to get "permission"
every time they want to check anything in. It's called code review, and
it vastly improves the quality of the code.
I can see it if he writes some obscure cryptic Regular Expression - but come
on.
Cryptic such as "( )" where a straight " " would have been more
readable? Code review should have picked that up.
But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.


Sure.

If they are both perfectly valid, I might. Depends on my mood (you should
really have a problem with that). :)


I certainly do. "Valid" to me involves the code being as simple as
possible.
Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.


No.

I can maintain my car, but I might still have to look up specs on it.


But wouldn't it be easier to maintain something which *didn't* require
you to look up anything?
I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.


If they are not readable, you shouldn't use them at all. I personally think
they are both readable, in this case.


Readability is not a black and white issue. Something is "more
readable" than something else - in this case, using string manipulation
is more readable (and maintainable, importantly) than using regular
expressions. In other cases, it isn't.
Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?


Again, then you feel there is no place for Regex as you can do anything with
C# that you can do with Regex. As you say, it will always be harder to
read.


Where did I say it will *always* be harder to read? Please don't put
words in my mouth, especially when I've expressly stated otherwise
elsewhere.

At times, regular expressions will be easier to understand than the
equivalent string manipulation solution. In this case, they're not.
As you said, the two solutions are equal. Your solution is that you MUST
go with IndexOf. Mine is you can use either.


Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.


I didn't say that.


Didn't say what?
I wasn't referring to this particular issue when I said this.


It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?


In this case, no.


So you don't think that it would be harder to change the regex code to
look for "hello[there" than it would be to change the IndexOf code in
the same way?
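
To make the "hello[there" point concrete, here is a minimal sketch. The class and method names are hypothetical and not from the thread; Regex.Escape and String.IndexOf are the standard .NET calls. With IndexOf the new literal can be pasted in as-is, while the regex version either needs escaping or fails when the pattern is parsed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class LiteralSearch
{
    // Plain substring search: the literal needs no special treatment.
    public static bool ViaIndexOf(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal) >= 0;
    }

    // Regex search for the same literal: metacharacters such as '['
    // must be escaped first, here with Regex.Escape.
    public static bool ViaRegex(string text, string literal)
    {
        return Regex.IsMatch(text, Regex.Escape(literal));
    }

    // Returns true if using 'pattern' unescaped makes the regex parser
    // reject it (an unterminated '[' is reported as an ArgumentException).
    public static bool UnescapedPatternThrows(string text, string pattern)
    {
        try
        {
            Regex.IsMatch(text, pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

For example, LiteralSearch.ViaIndexOf("say hello[there now", "hello[there") and the ViaRegex equivalent both find the text, but passing "hello[there" straight to Regex.IsMatch throws.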
In other cases, could be. Would have to look at it. I
never said that Regex is the best thing out there. I was just saying that
it is valid and can be readable - can also be cryptic (as can C#).
And I've never argued with that. I've argued against it being *as*
readable and maintainable in *this* case.
And of course the answer is "yes, by calling IndexOf multiple times".


That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.


Okay. But the follow-on answer is "the best way to do it is to use
IndexOf repeatedly" possibly with "and you can always write your own
method to do this if you want".
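
A minimal sketch of such a method, assuming a hypothetical helper called ContainsAny (the name, class and choice of ordinal comparison are illustrative, not something from the thread). It simply calls IndexOf once per candidate and stops at the first hit.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class StringSearch
{
    // Returns true if 'text' contains at least one of the candidates.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison: match the characters exactly,
            // with no culture-specific rules.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site then becomes a single statement, for example: if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }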
Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.


No, he was correct in his answer to the question. The question was never
"Which is better", but can you do it.


His answer talked about the "best bet" - although the question didn't
ask about the best way, his answer did. I disagree with that answer.
And you can do a method which called
IndexOf multiple times. But then it isn't one line, is it?


You could put it in one line if you wanted to. It wouldn't be as easy
to read, but you could do it.
Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?


I don't really remember what the context was originally. But I know they
didn't have dots and brackets in it.


And wouldn't ever?
What if you wanted to change "something1" to "something\". Same problem.


Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
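
The four literals above can be checked side by side. This is only a sketch: the class and method names are hypothetical, and every method looks for the same thing, the text "something" followed by a single backslash.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
static class BackslashDemo
{
    // IndexOf with a regular literal: only the C# escape applies.
    public static int IndexOfRegular(string text)
    {
        return text.IndexOf("something\\", StringComparison.Ordinal);
    }

    // IndexOf with a verbatim literal: no escaping at all.
    public static int IndexOfVerbatim(string text)
    {
        return text.IndexOf(@"something\", StringComparison.Ordinal);
    }

    // Regex with a regular literal: escaped once for the regex engine
    // and once more for C#, hence four backslashes.
    public static bool RegexRegular(string text)
    {
        return Regex.IsMatch(text, "something\\\\");
    }

    // Regex with a verbatim literal: only the regex escape remains.
    public static bool RegexVerbatim(string text)
    {
        return Regex.IsMatch(text, @"something\\");
    }
}
```

Given the sample input @"C:\something\else", both IndexOf methods return 3 and both Regex methods return true.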

Splitting hairs, now. Both are the same, as far as I can see (here).


You don't think that having to count 4 backslashes is even slightly
harder than only counting 2? I can spot a double-backslash without
doing any double-checking. I'd always be careful when I needed four.
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.


Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.


That's true, but then you would only know C#. And if that is your aim,
that's fine.


My aim is to only *need* to know as little as possible. The rest is
available where necessary.
Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.


No room for it, huh?


Not when there's a simpler solution, no.
While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed at some point to
search for a variable. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.


Right. No one makes mistakes with IndexOf.


More rarely than with regular expressions.
They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.


But your problem was that it would be hard for other programmers to read.
If they can read your more complicated version, this one should be easy.


<sigh> It's a matter of degree, as I keep saying. It's a case of how
much effort needs to be put in to understand something.
You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?


No.

I just find it as simple, in this case and you don't.


I would be willing to wager large amounts of money on others
(particularly junior programmers) finding it less simple though. I'm
absolutely certain that if thousands of programmers had to maintain the
IndexOf version and change it to look for "foo.bar", fewer would make a
mistake than thousands of equivalent programmers maintaining the
regular expression version.

Are you absolutely certain that the regular expression *wouldn't* prove
more bug-prone?
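
As an illustration of the kind of mistake meant here, the sketch below uses hypothetical names and the "foo.bar" value from the paragraph above. Typing the new value straight into the pattern compiles and runs, but the unescaped dot matches any character, so the regex quietly accepts text that the IndexOf version rejects.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
static class DotDemo
{
    // IndexOf treats "foo.bar" as seven literal characters.
    public static bool IndexOfFinds(string text)
    {
        return text.IndexOf("foo.bar", StringComparison.Ordinal) >= 0;
    }

    // The value pasted into a pattern unchanged: '.' matches any character.
    public static bool NaiveRegexFinds(string text)
    {
        return Regex.IsMatch(text, "foo.bar");
    }

    // The pattern a maintainer has to remember to write instead.
    public static bool EscapedRegexFinds(string text)
    {
        return Regex.IsMatch(text, @"foo\.bar");
    }
}
```

For the input "fooXbar", IndexOfFinds and EscapedRegexFinds return false while NaiveRegexFinds returns true, and nothing fails at compile time or run time to flag the difference.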
But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.


No. Not pushing. But think they are equivalent in this case. As you said
earlier, I am sure others would disagree. But I don't think that the
difference is significant enough, in this case, even if I were to agree on
which is easier, to preclude it.


To me, it's definitely significant. Using regular expressions here
introduces risk for no benefit.
Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this
one.


How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?

Who?

The person who can understand Regex if complicated, but would be trashed
trying to figure out our little example.

Bit of a stretch there.


Again, you're being black and white. I'm not saying that people
*couldn't* understand the regular expression - although they're more
likely to make a simple mistake without thinking about it. I'm saying
that they'll need to put more effort into understanding it than a
straight IndexOf.
Yes - the complicated cases where I've already said that regular
expressions are useful!


Just make sure the programmer that can't handle the easy Regex doesn't see
that one.


I would hope that anyone maintaining a complex regular expression will
double-check what's going on. It's easy to conceive of someone
maintaining a simple one failing to do so.
No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.


Already dealt with.


Where?
But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.


Writing vanilla C# is less complicated than writing objects, but we still
do them.


No, it's not less complicated. If you avoided using objects, the code
would be *much* harder to read and maintain.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #31

P: n/a
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
(I'm hoping Nick's going to be at the MVP summit and I can ask him for
a bit of clarification on this point - I'll let you know if I get to
chat with him.)


<snip>

Update: I've now met Nick, and we've talked about many things. We
managed to stay on this topic for about a minute before moving onto
something else - it was one of those conversations. I wouldn't like to
trust my memory of the very brief mention of it to say whether or not
he agreed with me on the maintenance point.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #32

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
(I'm hoping Nick's going to be at the MVP summit and I can ask him for
a bit of clarification on this point - I'll let you know if I get to
chat with him.)
<snip>

Update: I've now met Nick, and we've talked about many things. We
managed to stay on this topic for about a minute before moving onto
something else - it was one of those conversations. I wouldn't like to
trust my memory of the very brief mention of it to say whether or not
he agreed with me on the maintenance point.


He probably did.

I haven't had time to finish up our discussion, but will try to get to it
this weekend.

Tom

Nov 17 '05 #33
