On Jun 12, 6:06*am, David C. Ullrich <dullr...@sprynet.comwrote:
There's no regex that detects balanced parentheses,
or is there?
That is, search('and so ((x+y)+z) = (x+(y+z))')
should return '((x+y)+z)'.
Not just a theoretical question, I'm cleaning up
a large body of TeX code and a regex that did that
would be very convenient. (Yes, I know it's not
hard to detect balanced parentheses by hand...)
I don't know that stuff, but I seen to recall reading
that there's a theoretical notion of "regular expression"
in CS, and a regular expression in that sense cannot
do this. But in the same place I read that the actual
regexes in programming languages include things which
are not regular expressions in that theoretical sense.
David C. Ullrich
Pyparsing includes several helper methods for building common
expression patterns, such as delimitedList, oneOf, operatorPrecedence,
countedArray - and a fairly recent addition, nestedExpr. nestedExpr
creates an expression for matching nested text within opening and
closing delimiters, such as ()'s, []'s, {}'s, etc. The default
delimiters are ()'s. You can also specify a content expression, so
that pyparsing will look for and construct meaningful results. The
default is to return any text nested within the delimiters, broken up
by whitespace.
Here is your sample string parsed using the default nestedExpr:
>>from pyparsing import nestedExpr
for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
[['x+y'], '+z']
['x+', ['y+z']]
Pyparsing found 2 matches in your input string. Note that the parens
are gone from the results - nestedExpr returns a nested list
structure, with nesting corresponding to the ()'s found in the
original string.
Pyparsing supports parse-time callbacks, called 'parse actions', and
it comes with several commonly used methods, such as removeQuotes,
upcaseTokens, and keepOriginalText. The purpose of keepOriginalText
is to revert any structuring or parsing an expression or other parse
actions might do, and just return the originally matched text.
Here is how keepOriginalText gives you back just the nested
parenthetical expressions, without any additional processing or
grouping:
>>from pyparsing import keepOriginalText
matchedParens = nestedExpr().setParseAction(keepOriginalText)
for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
((x+y)+z)
(x+(y+z))
-- Paul