function to match parenthesis '(' and ')'

How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[$])|(?<closing>[$])

Using the Regex Class, you can count the number of each group separately.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.
..
"Steve Kirsch" <te**@test.com> wrote in message
news:%2****************@TK2MSFTNGP15.phx.gbl...

I need a simple function that can match the number of beginning and ending
parenthesis in an expression. Here's a sample expression:

( ( "john" ) and ( "jane" ) and ( "joe" ) )

Does .NET have something built-in that can accomplish this, or do I have
to write my own parser? I don't want to reinvent the wheel if possible.

Nov 17 '05 #6

Bill Butler

"ALI RAZA" <al*************@yahoo.com> wrote in message
news:%2****************@TK2MSFTNGP15.phx.gbl...

Salam

The simple solution to this problem is to implement a stack, when u encounter '(' push it on the
stack, and when u encounter ')', pop it from the stack.

In the End, check if the stach is empty, if it is empty then the parenthesis are match else they
are not.

--

All of this assumes that ALL parens should be pushed on the stack.
For instance:
Console.WriteLine("(");
would not work.

In a situation like this you might need to make sure the paren is not part of a string.

All of this depends on the use to which the function is being applied.

Bill

Nov 17 '05 #7

Kevin Spencer <ke***@DIESPAMMERSDIEtakempis.com> wrote:

How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[$])|(?<closing>[$])

Using the Regex Class, you can count the number of each group separately.

That will count them, but not make sure that they match - it would say
that:

)(

is valid, for instance.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #8

True, Jon, but that was not the question asked. A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...

Kevin Spencer <ke***@DIESPAMMERSDIEtakempis.com> wrote:
How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[$])|(?<closing>[$])

Using the Regex Class, you can count the number of each group separately.

That will count them, but not make sure that they match - it would say
that:

)(

is valid, for instance.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #9

Kevin Spencer <ke***@DIESPAMMERSDIEtakempis.com> wrote:

True, Jon, but that was not the question asked.
Well, it asked about "matching" parentheses. I think it's reasonable to
assume that the OP doesn't actually want ")(" to match - especially
given the sample.
A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

Hmm... I seem to remember something in CS classes saying that this is
something that regular expressions actually can't do. You can develop
them that will match down to a certain level of nesting (i.e. for any
given number of levels of nesting, you can build a regular expression
which will match it), but nothing to match *any* level of nesting. I
could easily be wrong though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #10

He did say "match," although what he meant by it was somewhat ambiguous. If
I have the time, I'll give it a shot.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...

Kevin Spencer <ke***@DIESPAMMERSDIEtakempis.com> wrote:
True, Jon, but that was not the question asked.

Well, it asked about "matching" parentheses. I think it's reasonable to
assume that the OP doesn't actually want ")(" to match - especially
given the sample.
A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

Hmm... I seem to remember something in CS classes saying that this is
something that regular expressions actually can't do. You can develop
them that will match down to a certain level of nesting (i.e. for any
given number of levels of nesting, you can build a regular expression
which will match it), but nothing to match *any* level of nesting. I
could easily be wrong though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #11

Steve Kirsch

Hi folks,
I am the person who started this thread, and appreciate all the assistance
and code samples. To clarify, I was looking for a simple way to validate an
expression by counting the number of opening and closing parenthesis. If the
expression is determined to be invalid based on that criteria, I show a
message box to the user saying please check your input.

Marcus Andren's code was simple and easy enough to do the job, so I'm using
that for now. I'm not knowledgable about regular expressions at the moment,
and don't have the time to learn them right now (this project is due
tomorrow!). So, I'll stick with the simple validation accomplished by
Marcus's code (above), and hopefully I'll learn some regular expressions in
the future to come enhance this in a future version. If one of you would
like to suggest an alternative to Marcus' code using regular expressions,
please post the code. Thanks again, and best regards to all.

=Steve

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP12.phx.gbl...

He did say "match," although what he meant by it was somewhat ambiguous.
If I have the time, I'll give it a shot.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
Kevin Spencer <ke***@DIESPAMMERSDIEtakempis.com> wrote:
True, Jon, but that was not the question asked.

Well, it asked about "matching" parentheses. I think it's reasonable to
assume that the OP doesn't actually want ")(" to match - especially
given the sample.
A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

Hmm... I seem to remember something in CS classes saying that this is
something that regular expressions actually can't do. You can develop
them that will match down to a certain level of nesting (i.e. for any
given number of levels of nesting, you can build a regular expression
which will match it), but nothing to match *any* level of nesting. I
could easily be wrong though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #12

Steve Kirsch <te**@test.com> wrote:

Marcus Andren's code was simple and easy enough to do the job, so I'm using
that for now. I'm not knowledgable about regular expressions at the moment,
and don't have the time to learn them right now (this project is due
tomorrow!). So, I'll stick with the simple validation accomplished by
Marcus's code (above), and hopefully I'll learn some regular expressions in
the future to come enhance this in a future version. If one of you would
like to suggest an alternative to Marcus' code using regular expressions,
please post the code. Thanks again, and best regards to all.

It's worth learning regular expressions just because they're very
useful in certain situations. However, any regular expression solution
to this problem is likely to be significantly more complicated than the
code Marcus provided, and probably slower too.

It's very easy to see by inspection that Marcus's code works.
(Admittedly I'd decrement balance in one statement and check it in
another, but that's a minor nit-pick.) Checking that a matching regular
expression - especially one that copes with arbitrary levels of nesting
- is correct would be very, very hard.

It's always worth thinking about the right tool for the job - in this
case, regular expressions aren't the right tool, IMO.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #13

Steve, you can use the following regex :
^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");
if (!regex.IsMatch(theStringIWantToCheck)) MessageBox.Show("Parenthesis
do not match !");

Another example to know all the subexpressions (not the inner subexpression
but you can find them with recursivity) :
Regex regex=new
Regex(@"^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");
Match match=regex.Match(theStringIWantToCheck);
if (match.Success) {
foreach(Capture c in match.Groups[1].Captures)
MessageBox.Show("SubExpression="+c.Value);
//You can have the number of subexpressions with
match.Groups[1].Captures.Count
} else {
MessageBox.Show("Parenthesis do not match !");
}

If you want to understand how it works, there is a good article from Ryan
Byington that helped me to make this regexe :
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx

You can also use RegexOptions.Compiled in Regex, and also use Static Regex
class if you want a faster Regex.

Marcus Code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.
You can do what you want without modifying your source code. Everything is
stored in Capures field in one shot, so you can navigate very easily. If you
wanted to do the same with Marcus Code, I guess it would not be 4 times
quicker but maybe not more than 2 times quicker.)

Hope it helps,

Ludovic Soeur.

Nov 17 '05 #14

Michael S

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...

Checking that a matching regular
expression - especially one that copes with arbitrary levels of nesting
- is correct would be very, very hard.
"Ludovic SOEUR" <Lu***********@hotmail.com> wrote in message
news:ev**************@TK2MSFTNGP15.phx.gbl... Steve, you can use the following regex :
^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Hehe. Guess Ludovic just proved Jon's point. =)

I think if you are evil enough to write such a regexp, you must never ever
supply a comment telling what the regexp does. Heck, the maintainance guys
need to think for themselves! =)

Happy Coding
- Michael S

Nov 17 '05 #15

Ludovic SOEUR wrote:

Steve, you can use the following regex :
^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");
That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

<snip>
Marcus Code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.

Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon

Nov 17 '05 #16

I did not think that my post would start a debate. So as it has started, I
have to reply.

I put this sample to prove that it's possible with regex and I would like to
do a time comparison too. I still stay on my point of view when I say I
prefer RegEx version if you do not need to do this check 100 000 times.

First of all, I would like to answer to Michael's reply. No, I'm not evil
enough to write such a RegEx. I used the Regex from the Ryan Byington's one
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx
that is very well explained. The only improvment was to remove all 'names'
and to capture ONLY valid strings (the ones Steve is looking for). I ALWAYS
explain my regular expressions and this one is a very usefull one that is
known by most of who works with regex.
Maintenance for this case is useless. It's like providing a function for the
System namespace. You dont have to do maintenance for "substring" method of
string class. When a regular expression is known to be correct and used by
many developpers, you don't have to change anything. Only put a link to the
original explanation and add comment in your code if you want to be more
precise. On the other hand, Marcus code must be commented, and i would say,
more than the regex one...it should be surprising for most people who are
not used to balancing in regex.

Second, to answer to Jon, it took me 2 minutes to understand the original
regex from Ryan Byington. Firstly because it's well commented and also
because I am used to writting balanced regex. To verify it was correct was
very quick. Encapsulate Marcus' code in a "ParenthesisMatcher" is logical
and I agree with you. Doing that and then store all subparts to have a
direct access takes time. I said it was 4 times quicker (in fact it's 3.5)
and I guess with en encapulated version, Marcus's one can reach 2 times. So
why taking time to write an entire class doing the check, giving a direct
access to the subelements, comment all this methods of course, write many
lines of code ? It is alredy done with this regex which is known. It does
all the works. Why allways rewriting what is done and working ?

I totally disagree when you say that regular expression is not the job for
this work. With one line that you do not mind if it's understable, you can
do everything you need with parenthesis balancing. It's the right tool if
you not need to do a loop over thousands of checks, you don't have to write
your own classes or method, everything is already done. It's like using a
dll, libray, in four words, reusing what was done.

Ludovic SOEUR.

"Jon Skeet [C# MVP]" <sk***@pobox.com> a écrit dans le message de
news:11*********************@g49g2000cwa.googlegro ups.com...

Ludovic SOEUR wrote:
Steve, you can use the following regex :
^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)))\)[^() ]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!) ))\)[^()]*)*$");
That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

<snip>
Marcus Code is obviously quicker than this one with Regex (on my computer, it's 4 times quicker), but I would prefer Regex. It's more "object"

like.
Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon

Nov 17 '05 #17

Hi Jon,

That's not too long for a Regular Expression. They are quite cryptic,
though, if you're not used to the syntax. Unlike C, there is no ability to
separate sections of instruction, even across lines. Like C, though, the
more you practice, the better you get at it. It also helps to have a good
IDE (or 2) for developing and testing Regular Expressions in.

Regular Expressions are quite powerful in terms of how they do their
parsing. It is essentially procedural, and character-based.

Still, in this case, I can see how Marcus' code would be a more elegant
solution. Just wanted to pique your interest in the power of Regular
Expressions, which are invaluable in certain situations, such as parsing
complex patterns in large strings and/or documents, replacing patterns
rather than identical blocks of text, etc.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11*********************@g49g2000cwa.googlegro ups.com...

Ludovic SOEUR wrote:
Steve, you can use the following regex :
^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:$([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>$)[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");

That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

<snip>
Marcus Code is obviously quicker than this one with Regex (on my
computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.

Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon

Nov 17 '05 #18

Kevin Spencer wrote:

That's not too long for a Regular Expression.
Maybe not for a regular expression, but compared with the simple C#
code, it's much too long and cryptic.
They are quite cryptic, though, if you're not used to the syntax.
I would argue that they're *relatively* cryptic even if you are used to
the syntax. I'm not exactly a total newbie to regular expressions, but
it would have taken me a fair amount of time to work out what that one
was doing and verify it.
Unlike C, there is no ability to
separate sections of instruction, even across lines. Like C, though, the
more you practice, the better you get at it. It also helps to have a good
IDE (or 2) for developing and testing Regular Expressions in.
Does that mean you agree that it's pretty difficult to understand them
by just inspection?
Regular Expressions are quite powerful in terms of how they do their
parsing. It is essentially procedural, and character-based.
Oh, they're incredibly powerful and very useful - *in the right
situations*.
Still, in this case, I can see how Marcus' code would be a more elegant
solution. Just wanted to pique your interest in the power of Regular
Expressions, which are invaluable in certain situations, such as parsing
complex patterns in large strings and/or documents, replacing patterns
rather than identical blocks of text, etc.

I certainly appreciate that regexes are useful. I just don't think this
is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem
to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Jon

Nov 17 '05 #19

Ludovic SOEUR wrote:

Maintenance for this case is useless. It's like providing a function for the
System namespace. You dont have to do maintenance for "substring" method of
string class. When a regular expression is known to be correct and used by
many developpers, you don't have to change anything.
So you're assuming that it will never, ever have to change it's
function. It won't have to, say, change to balance < and > as well as
or instead of ( and ). I don't like making that kind of assumption.

I don't like having code which is hard to read - in particular, harder
to read that an alternative, which is the case here. In many cases,
regular expressions prevent you from having to write very complicated
parsers. In those cases, the REs are the simpler code. Here, they're
just not.
Second, to answer to Jon, it took me 2 minutes to understand the original
regex from Ryan Byington. Firstly because it's well commented and also
because I am used to writting balanced regex. To verify it was correct was
very quick.
Do you think every maintenance engineer would only take two minutes?
How long did it take you to understand the code Marcus produced? It
certainly took me less than two minutes...
Encapsulate Marcus' code in a "ParenthesisMatcher" is logical
and I agree with you. Doing that and then store all subparts to have a
direct access takes time. I said it was 4 times quicker (in fact it's 3.5)
and I guess with en encapulated version, Marcus's one can reach 2 times. So
why taking time to write an entire class doing the check, giving a direct
access to the subelements, comment all this methods of course, write many
lines of code ? It is alredy done with this regex which is known. It does
all the works. Why allways rewriting what is done and working ?
Either you encapsulate this in another method (or specific type) or you
don't. If you don't, you have to have the bare regular expression
itself available, which is pretty unpleasant IMO. If you do, it doesn't
matter (from a usage point of view) whether you use a regular
expression or the code from Marcus.
I totally disagree when you say that regular expression is not the job for
this work. With one line that you do not mind if it's understable, you can
do everything you need with parenthesis balancing.

You may not mind whether or not all your code is as readable as it can
be. I certainly do.

You might be interested to know that I ran the alternatives of "use a
regular expression" or "use some simple logic" past a couple of
colleagues - they were pretty much dumbfounded that anyone would use a
regular expression in this situation, where the logic is so
straightforward.

Jon

Nov 17 '05 #20

> I certainly appreciate that regexes are useful. I just don't think this

is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem
to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Yes, I agree with you. that is a good summary of Regex usage today.
I will however add, for this topic (validating and parsing expressions with
parenthesis), that a known regex does all the job requested.
Of course you can write code, write your own libraries that should be more
elegant (it depends on what you think is elegant). I prefer reuse what is
already done and only optimise it if necessary.

Nov 17 '05 #21

Ludovic SOEUR wrote:

First of all, I would like to answer to Michael's reply. No, I'm not evil
enough to write such a RegEx. I used the Regex from the Ryan Byington's one
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx
that is very well explained.

<snip>

Sorry to reply twice to the same post, but I just looked at the blog
above.

There are two points which are interesting to note:

1) This is .NET-specific - as I stated (with some trepidation!) before,
there is no way to do this in "general" regular expressions. Therefore
anyone reading it and trying to understand it not only needs to know
regular expressions, but the .NET variety of them.

2) One of the comments is "My brain just exploded." That comment was
added by someone who is very familiar with regular expressions, given
the blog he links to.

I would be willing to bet that if you gave both versions to 1000
"average" C# coders, far, far more would be able to understand the code
from Marcus than the regular expression. Why include effectively "black
box" code when you can so easily avoid it?

Jon

Nov 17 '05 #22

Thanks for all your remarks, Jon.

I would however add some arguments :
Few years ago, I was not used to regular expressions. I would have written
my own classe to do balancing, and I would probably have done a code close
to Marcus' one. Then I would have written all the code necessary to store
subelements, and so on.
Years after years, I got used to regex. I also had to write an expression
validator. First, I thought it was impossible with regex and it was not the
right tool. I was right in your point of view. So I wrote everything. It did
not took me a lot, the source is clear, commented but all the parsing class
is important. I however wanted to know if it was possible with regex. I
tried to understand balancing in regex. Many days later, I found that a
regex was doing everything I wanted in one line.
Then I had to write another expression parser and I used this regex because
I did not need efficiency. With one line I had everything.
I also could have copied my old file where I had already developped a
similar parser and use it nearly the same.
I would have had a "black box" too. I don't need to look in my source
because I know how I did it and that it works, the same for the regex.
In my mind, the regex line only means "ParenthesisBalanceClasss".
If I want efficiency, I can replace this regex line by a call to my class I
wrote few years ago. It would be exactly the same, I don't see the
difference....except that with the class version is quicker but the code is
longer.
What is the best ? I think it depends on what you really want to do.
What is the more readable ? It depends on if you have ever seen this regex.
If yes, both are the same. If not, of course Marcus' code is the one to use.
So you will say that my code is not maintenable ? I disagree. Firstly, there
is no need to modify it (it's like modifying indexOf of string class). A
black box ? It's the same when you use a DLL or use any of the function in
C#.

I think I can't change your mind. I also understand your point of view,
which is logical.
I would end saying that Regex is not the better for this balancing, but
someone did an interesting thing for it. I prefer using it than copying a
long class doing exactly the same.
It's my choice and a choice of many developers that do not want to rewrite
what is already simply done.

"Jon Skeet [C# MVP]" <sk***@pobox.com> a écrit dans le message de
news:11**********************@f14g2000cwb.googlegr oups.com...

Ludovic SOEUR wrote:
First of all, I would like to answer to Michael's reply. No, I'm not evil enough to write such a RegEx. I used the Regex from the Ryan Byington's one http://blogs.msdn.com/bclteam/archiv...15/396452.aspx
that is very well explained.

<snip>

Sorry to reply twice to the same post, but I just looked at the blog
above.

There are two points which are interesting to note:

1) This is .NET-specific - as I stated (with some trepidation!) before,
there is no way to do this in "general" regular expressions. Therefore
anyone reading it and trying to understand it not only needs to know
regular expressions, but the .NET variety of them.

2) One of the comments is "My brain just exploded." That comment was
added by someone who is very familiar with regular expressions, given
the blog he links to.

I would be willing to bet that if you gave both versions to 1000
"average" C# coders, far, far more would be able to understand the code
from Marcus than the regular expression. Why include effectively "black
box" code when you can so easily avoid it?

Jon

Nov 17 '05 #23

Greg Bacon

In article <MP************************@msnews.microsoft.com >,
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:

: Hmm... I seem to remember something in CS classes saying that this is
: something that regular expressions actually can't do. You can develop
: them that will match down to a certain level of nesting (i.e. for any
: given number of levels of nesting, you can build a regular expression
: which will match it), but nothing to match *any* level of nesting. I
: could easily be wrong though...

Yes and no.

If you mean regular expressions in the same sense as theoretical
computer scientists use it, you're correct. Keep in mind, however,
that the toolbox is pretty bare: literals, concatenation, grouping,
alternation, and Kleene star.

The .NET framework's "regular expressions" are more powerful in that
they don't limit reconizers to regular languages (i.e., languages
that finite-state automata can recognize).

For example, by application of the pumping lemma, we can show that
the language of strings where each character is either an 'a' or a 'b'
and where each string consists of two copies of a substring appended
together is not regular. (Notation makes this clearer: L = { ww :
w \in {a, b}* }.)

Backreferences make the construction trivial:

Regex r = new Regex(@"^([ab]*)\1$");

In *Mastering Regular Expressions*, Friedl gives an VB.NET example for
matching balanced parentheses, which I've paraphrased and translated to
C# below:

string pattern =
@"^ $
(?>
\( (?<DEPTH>) |
$ (?<-DEPTH>) )*
(?(DEPTH)(?!))
\) $";

Regex balanced = new Regex(pattern,
RegexOptions.IgnorePatternWhitespace);

See page 430 of http://www.oreilly.com/catalog/regex2/chapter/ch09.pdf

Hope this helps,
Greg
--
- That scruffy beard... Those suspenders... That smug expression...
You're one of those condescending Unix computer users!
- Here's a nickel, kid. Get yourself a better computer.
-- Dilbert

Nov 17 '05 #24

Greg Bacon

In article <11*********************@g49g2000cwa.googlegroups. com>,
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:

: [...]
: With the explicit code, it's easy to modify, easy to verify by
: inspection, easy to understand, and fast. The regular expression is
: none of these. It's just the wrong tool for the job, IMO.

I mean, really. What's wrong with MOV and SCASB?

Greg
--
Today is already the tomorrow which the bad economist yesterday
urged us to ignore.
-- Henry Hazlitt, Economics in One Lesson

Nov 17 '05 #25

Bill Butler

"Ludovic SOEUR" <Lu***********@hotmail.com> wrote in message
news:ec**************@tk2msftngp13.phx.gbl...
<snip>

So you will say that my code is not maintenable ? I disagree. Firstly, there
is no need to modify it (it's like modifying indexOf of string class). A
black box ? It's the same when you use a DLL or use any of the function in
C#.

It is true that THIS Regex will never need to be changed because this regex solves THIS problem.
But,
What if the problem changes???
Jon already brought up <,>.
What about {,}and [,]?
Or...Let's go crazy, what about /*,*/

These are all common Balanced pairs.
What if the problem domain demands that we handle them too? (or some arbitrary subset)

Granted, you may find another Off-the-shelf Regex to handle the problem, and you may not.
More importantly, The guy who inherited your code will need to find it.
So the original Regex may not need maintenance, but it may diverge form the problem set requiring a
new solution.

So, how likely is it that the Problem domain will change over time?
That depend on the problem.
If the problem will not likely change over time Perhaps the Regex will be fine.
If obvious changes to the problem can be handled easily by Regex as well then Perhaps the Regex will
be fine.
If it is likely that the problem will morph beyond a "simple" regex then perhaps it is NOT the way
to go.

I brought up the question of embedded parens in quotes as follows:
Console.WriteLine("(");
Should this match or not:? Depends on the problem domain
What about this:
(("Hello")("<G>")(":^)"))

That smiley in the third set of parens COULD ruin your day.
So the question is:
Can your solution easily handle small (obvious) changes to the problem set?
If so, go for it.
If not, then you are doing no favors to the guy who needs to fix the code.

I am not passing judgment of the use of the Regex.
I find it difficult to understand, but, if it works...what the heck.
When I solve a problem I always play devil's advocate with my solution.
Have I built a solution that is robust enough to handle obvious (and not so obvious) changes?

Bill

Nov 17 '05 #26

Agreed on all points. Jon!

--
Kevin Spencer
Microsoft MVP
..Net Developer
A watched clock never boils.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11**********************@g14g2000cwa.googlegr oups.com...

Kevin Spencer wrote:
That's not too long for a Regular Expression.

Maybe not for a regular expression, but compared with the simple C#
code, it's much too long and cryptic.
They are quite cryptic, though, if you're not used to the syntax.

I would argue that they're *relatively* cryptic even if you are used to
the syntax. I'm not exactly a total newbie to regular expressions, but
it would have taken me a fair amount of time to work out what that one
was doing and verify it.
Unlike C, there is no ability to
separate sections of instruction, even across lines. Like C, though, the
more you practice, the better you get at it. It also helps to have a good
IDE (or 2) for developing and testing Regular Expressions in.

Does that mean you agree that it's pretty difficult to understand them
by just inspection?
Regular Expressions are quite powerful in terms of how they do their
parsing. It is essentially procedural, and character-based.

Oh, they're incredibly powerful and very useful - *in the right
situations*.
Still, in this case, I can see how Marcus' code would be a more elegant
solution. Just wanted to pique your interest in the power of Regular
Expressions, which are invaluable in certain situations, such as parsing
complex patterns in large strings and/or documents, replacing patterns
rather than identical blocks of text, etc.

I certainly appreciate that regexes are useful. I just don't think this
is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem
to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Jon

Nov 17 '05 #27

Michael S

"Ludovic SOEUR" <Lu***********@hotmail.com> wrote in message
news:%2****************@TK2MSFTNGP15.phx.gbl...

I did not think that my post would start a debate. So as it has started, I
have to reply.
Well, you can't expect a nice troll like me not to participate in one. =)
While I'm a joker and probably will drive you nuts below and in future
posts;
I do think you have created a nice debate and have some serious points.
I put this sample to prove that it's possible with regex and I would like
to
do a time comparison too. I still stay on my point of view when I say I
prefer RegEx version if you do not need to do this check 100 000 times.
What works and what we find desirable are seldom the same.
We all know you CAN do it, but do we really WANT it. What do we find USEFUL?
First of all, I would like to answer to Michael's reply. No, I'm not evil
enough to write such a RegEx. I used the Regex from the Ryan Byington's
one
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx
that is very well explained. The only improvment was to remove all 'names'
and to capture ONLY valid strings (the ones Steve is looking for). I
ALWAYS
explain my regular expressions and this one is a very usefull one that is
known by most of who works with regex.
And I was being funny/ironic.
If you do code such a complex regexp, I would expect a full description of
what it does.
The thing is that the commentary would probably take up a lot of space and
be forced to use complex language.

I would bet that for the comment to be useful; it would take up more space
than if you coded it 'by hand' as in the example in the previous post.
Hence, not only do you got a cryptic regexp, you also have a cryptic comment
that actually is poetry, for what code could better describe, taking up same
amount of .cs-space.
Maintenance for this case is useless. It's like providing a function for
the
System namespace. You dont have to do maintenance for "substring" method
of
string class. When a regular expression is known to be correct and used by
many developpers, you don't have to change anything. Only put a link to
the
original explanation and add comment in your code if you want to be more
precise. On the other hand, Marcus code must be commented, and i would
say,
more than the regex one...it should be surprising for most people who are
not used to balancing in regex.
Amagad. I don't know where to begin...

First things first:
This is a typical case for maintanance. Wanna bet that the OP (his/hers
customer really) comes back and says: - I need to to skip '(' and ')' that
are inside strings a la " and also support braces and curly braces.

Code tend to change...

Secondly:
This is also a typical example for when "developers" don't care for
"maintainers".
As an architect and technical project manager, a part of my job is to
'approve code'.
Often I read 'smart code' like in your example and simply won't accept it.
Too complex.

While being a joking lurking troll, with some (dubious) knowledge, in this
newsgroup; I can be a pretty serious dude at work. When I review code, there
is just one criteria: - Is this maintainable? If so, quality and
performance; are aspects I'd consider. But the first question takes care of
several sub-questions. Is it understandable? Is it readable? Does it need
commenting? Is it fault tolerant? Are all bounds, constraints and exits
handled? Is that visible? Do I like what I read?

Your regexp would never make it to the test-server if I eyeballed it. =)
Second, to answer to Jon, it took me 2 minutes to understand the original
regex from Ryan Byington. Firstly because it's well commented and also
because I am used to writting balanced regex.
<-snip->
Well, YOU are perhaps. The poor dude or gal that needs to modify your
regexp, 3 years after you wrote, it might not be as clever as you.... Or
even find your code as 'fun' as you thought it was when you wrote it. I've
seen it before and I'll probably see it again in the future: People with a
lot of free brainpower doing smart stuff to please themselves and not the
project, system and customer they work for.

Are you even sure that regexp will be still in use for the next 3 years? 5?
10? 20?
I'm not so sure. But what I do know is that keywords like while, for, if,
else and switch are here to stay.
I also believe that humans can and like to follow a sequence, and dislikes
trying to figure out how someone elses regexp works. Call me silly...
I totally disagree when you say that regular expression is not the job for
this work. With one line that you do not mind if it's understable, you can
do everything you need with parenthesis balancing. It's the right tool if
you not need to do a loop over thousands of checks, you don't have to
write
your own classes or method, everything is already done. It's like using a
dll, libray, in four words, reusing what was done.
I totally agree that this is a wonderful example of when not to use regexp.

Just to prove my point:
I for sure didn't even try to understand your regexp. Too complex...
Why don't you write a GOOD comment to your regexp and we'll see if thh rest
of us can understand that.

I dare you.. =)
Ludovic SOEUR.

High Regards
- Michael S

Nov 17 '05 #28

Michael S

Thanks Bill for living on the same planet as I.

Good post!

Happy Maintaince
- Michael S

Nov 17 '05 #29

Ludovic SOEUR <Lu***********@hotmail.com> wrote:

I certainly appreciate that regexes are useful. I just don't think this
is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem
to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Yes, I agree with you. that is a good summary of Regex usage today.
I will however add, for this topic (validating and parsing expressions with
parenthesis), that a known regex does all the job requested.
Of course you can write code, write your own libraries that should be more
elegant (it depends on what you think is elegant). I prefer reuse what is
already done and only optimise it if necessary.

You had to go to a web page to find that code. I'm sure you *could*
have found something like Marcus's code on a web page. Alternatively,
the code that Marcus wrote was easy to write in the first place - no
need to hunt it down.

Given the option of "look for something that someone else has done,
which I (and anyone reading the code) will need to put significant
effort into understanding" and "write something from scratch in about
10 lines of easily understandable code" I'll pick the second one every
time.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #30

Bill Butler

"Michael S" <a@b.c> wrote in message news:Ou**************@TK2MSFTNGP12.phx.gbl...

Thanks Bill for living on the same planet as I.

Good post!

That's funny.
I just read your post with the same thought. {of course Bill was replaced with Michael in my thought
:^) }

Bill

Nov 17 '05 #31

Ludovic SOEUR <Lu***********@hotmail.com> wrote:

<snip>

What is the best ? I think it depends on what you really want to do.
What is the more readable ? It depends on if you have ever seen this regex.
If yes, both are the same. If not, of course Marcus' code is the one to use.
So you will say that my code is not maintenable ? I disagree. Firstly, there
is no need to modify it (it's like modifying indexOf of string class). A
black box ? It's the same when you use a DLL or use any of the function in
C#.

Let's take a step back for a second.

I think we can both agree that it's worth encapsulating the logic
(however it's implemented) in a method - or possibly even in its own
class, if it needs to be shared between different projects.

Let's look at the different "stakeholders" in the code, and which
version they'd prefer:

1) The "client" of the method - whatever calls it: they're unlikely to
care much, so long as it works. The regex will be a bit slower, but I
think it's unlikely to be a problem in most situations.

2) The original coder: you looked up a regular expression, and given
that you found it on a blog, I suspect that took you a little while of
searching (even if only a couple of minutes). You then had to modify it
to match parentheses rather than square brackets. I believe many coders
could easily have written the code Marcus presented (or very similar
code) in about the same amount of time - and been more confident of it
working. (There are plenty of regular expressions out there which
*don't* work, for instance for email address validation.)

3) The code reviewer: I would always want to understand any code I
approved for check-in. That would mean I'd have to go to that blog,
read through the whole entry, then look back and check that the change
from square brackets to parentheses had been done correctly. Or, I
could look over about 10 lines of very simple C#. For me, the regex
loses heavily here.

4) The maintenance engineer trying to debug a problem: The first rule
of debugging is not to assume that things are actually working - so the
maintenance engineer working through the code would probably have to
understand the regular expression too. Another trip to the blog...
Also, if a typo *had* been made somewhere when entering the regular
expression, it's a lot easier to step through the simple code than it
is to step through the regular expression. Yes, you could pull up a
regular expression workbench, but then it's got to accept the same
kinds of regular expressions as .NET, etc.

5) The maintenance engineer trying to change the behaviour: Again,
you'd have to understand the code first. Again, I believe that would
take longer with the regular expression version, even assuming a
reasonable understanding of regular expressions (and with someone who
is pretty rusty on regular expressions, it could take a *lot* longer).
Then you'd have to decide how to go about making the change - fiddle
with the existing regular expression (which involves making sure you
really, really understand it very thoroughly) or trying to find a new
one on the web? Alternatively, modify the simple C#... again, I believe
the regex loses heavily.

I can't see any stakeholder who benefits from using the regular
expression version, unless any of them are *really, really* familiar
with regular expressions (uses them day-in/day-out) and *unfamiliar*
with C#. I think that's an unlikely scenario.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #32

Michael S

Hehe.

We should set up a virtual company and hire Ludovic with a nice salary and
have him maintain code for a while. With that experience he'll come running
back late, in like a week, crawling with bugs Then we can fire him without
even pay him for doing nothing. =)

Happy Maintaining
- Michael S

ps.
Ludovic: I hope you get that I'm just being funny on your behalf. I hope
you'll forgive me.
"Bill Butler" <qw****@asdf.com> wrote in message
news:CA87f.6596$%A1.5723@trndny01...

"Michael S" <a@b.c> wrote in message
news:Ou**************@TK2MSFTNGP12.phx.gbl...
Thanks Bill for living on the same planet as I.

Good post!

That's funny.
I just read your post with the same thought. {of course Bill was replaced
with Michael in my thought :^) }

Bill

Nov 17 '05 #33

Thanks all for all your comments about this debate. That 's very interesting
to see all these points of view.

I agree on all everybody have said since the beginning except for 2 things :

1) You say that in ten lines, you are able to write something that can say
if the balance matching is correct. You did not say how many lines it takes
to have a collection with all matching subelements. Again, in one line, that
is done. Do you know why there is XPath for xml ? To simplify treatment for
selecting a node. In one line, you can reach the nodes you wanted without
rewritting the same similar code may times. I consider it's the same with
Regex.
2) There some case about little changes where the regex modification is
quicker than a code like marcus' one. If instead of evaluating expression
with only parenthesis, you want to find all arguments of a function in a
dynamical compilation of a source code. You only have to add the name of the
function between the first parenthesis to match. With a code like marcus's
one, you will have to write a string parser wich is already implemented in
regex.

Seeing all you replies, I modify what I said at the beginning of this
debate. Maybe only for matching parenthesis, it's better not to use regex.
When you have to add parsing with the parenthesis matching, I'm sure that
regex is the proper tool.

P.S. Michael : I could not stop laughting seeing your reply about me. Why
not try to work in the same company ? It should be very interesting.... ;-)

Ludovic.
"Jon Skeet [C# MVP]" <sk***@pobox.com> a écrit dans le message de
news:MP************************@msnews.microsoft.c om...

Ludovic SOEUR <Lu***********@hotmail.com> wrote:
I certainly appreciate that regexes are useful. I just don't think this is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Yes, I agree with you. that is a good summary of Regex usage today.
I will however add, for this topic (validating and parsing expressions with parenthesis), that a known regex does all the job requested.
Of course you can write code, write your own libraries that should be more elegant (it depends on what you think is elegant). I prefer reuse what is already done and only optimise it if necessary.

You had to go to a web page to find that code. I'm sure you *could*
have found something like Marcus's code on a web page. Alternatively,
the code that Marcus wrote was easy to write in the first place - no
need to hunt it down.

Given the option of "look for something that someone else has done,
which I (and anyone reading the code) will need to put significant
effort into understanding" and "write something from scratch in about
10 lines of easily understandable code" I'll pick the second one every
time.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #34

Michael S

"Ludovic SOEUR" <Lu***********@hotmail.com> wrote in message
news:uG**************@TK2MSFTNGP15.phx.gbl...

P.S. Michael : I could not stop laughting seeing your reply about me. Why
not try to work in the same company ? It should be very interesting....
;-)
Nice that I made you laugh.
Me and you at the same work-place?
I think I would need an army of ninja-wizards.. and bears too...

http://unix.rulez.org/~calver/pictures/mordor.gif
Ludovic.

Regards
- Michael S

Nov 17 '05 #35

Ludovic SOEUR <Lu***********@hotmail.com> wrote:

Thanks all for all your comments about this debate. That 's very interesting
to see all these points of view.

I agree on all everybody have said since the beginning except for 2 things :

1) You say that in ten lines, you are able to write something that can say
if the balance matching is correct. You did not say how many lines it takes
to have a collection with all matching subelements. Again, in one line, that
is done. Do you know why there is XPath for xml ? To simplify treatment for
selecting a node. In one line, you can reach the nodes you wanted without
rewritting the same similar code may times. I consider it's the same with
Regex.
Indeed, it would be significantly more complicated to return the actual
elements. Of course, that's not what was asked for. If it had been,
it's quite possible that the regular expression would have been the
right solution. I'm not sure though - I'd have to look at both a
regular expression solution and a "hand-crafted" ones.
2) There some case about little changes where the regex modification is
quicker than a code like marcus' one. If instead of evaluating expression
with only parenthesis, you want to find all arguments of a function in a
dynamical compilation of a source code. You only have to add the name of the
function between the first parenthesis to match. With a code like marcus's
one, you will have to write a string parser wich is already implemented in
regex.
However, in any of those changes, you have to really understand the
original regular expression to start with - the same regular expression
that provoked a comment of "My brain just exploded" from someone who is
already familiar with regexes. You then end up with code which is that
hard to read again after making the change.
Seeing all you replies, I modify what I said at the beginning of this
debate. Maybe only for matching parenthesis, it's better not to use regex.
When you have to add parsing with the parenthesis matching, I'm sure that
regex is the proper tool.

I wouldn't say that I was *sure* that regular expressions would be the
way to go, but they'd certainly be much more likely to be the right
solution in that case than they are when just matching.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #36

Kevin

Couldn't you use the split function.

string a = "a( ( 'john' ) and ( 'jane' ) and ( 'joe' ) )";
string[] b;

b = a.Split( '(' );
MessageBox.Show((b.Length-1).ToString());

b = a.Split( ')' );
MessageBox.Show((b.Length-1).ToString());

Nov 17 '05 #37