By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,719 Members | 1,811 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,719 IT Pros & Developers. It's quick & easy.

Recursive regexps?

P: n/a
Hi!

I've been looking at ways of dealing with nested structures in regexps
(becuase I figured that would be faster than the Python parsing code
I've currently got) and came across a few interesting things...

First there is this, mentioning the (?<DEPTH>) and (?<-DEPTH>)
constructs of the .NET regexp matcher:

http://www.rassoc.com/gregr/weblog/a....aspx?post=590

Then there was some documentation on PCRE, mentioning (among other
things) The (?R) construct, which (as far as I can tell) matches the
pattern recursively):

http://www.gsp.com/cgi-bin/man.cgi?s...ic=pcrepattern

It seems that this functionality was introduced in Perl 5.6.

Is this (or something similar) something that would be realistic to
get in the re module? I know I would find it very useful...

--
Magnus Lie Hetland Fallen flower I see / Returning to its branch
http://hetland.org Ah! a butterfly. [Arakida Moritake]
Jul 18 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
It seems there are a *couple* of (experimental) PCRE features that
allow recursive matching:

[T]here is some experimental support for recursive patterns using
the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE
"callout" feature allows an external function to be called during
pattern matching.

[http://www.slabihoud.de/spampal/pcrecompat.html]

All of these seem pretty useful/neat :-}

--
Magnus Lie Hetland Fallen flower I see / Returning to its branch
http://hetland.org Ah! a butterfly. [Arakida Moritake]
Jul 18 '05 #2

P: n/a
"Magnus Lie Hetland" <ml*@furu.idi.ntnu.no> wrote in message
news:slrncpsqp4.o00.ml*@furu.idi.ntnu.no...
It seems there are a *couple* of (experimental) PCRE features that
allow recursive matching:

[T]here is some experimental support for recursive patterns using
the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE
"callout" feature allows an external function to be called during
pattern matching.

[http://www.slabihoud.de/spampal/pcrecompat.html]

All of these seem pretty useful/neat :-}

--
Magnus Lie Hetland Fallen flower I see / Returning to its branch
http://hetland.org Ah! a butterfly. [Arakida Moritake]


For those who find regexp strings to be fairly opaque, pyparsing supports
both of these features.
- Recursive grammars are defined using the Forward() parsing class. See
this snippet from the fourFn.py infix notation parser:

addop = plus | minus
multop = mult | div
pi = CaselessLiteral( "PI" )

expr = Forward()
factor = ( pi | e | fnumber ).setParseAction( pushFirst ) | ( "(" +
expr.suppress() + ")" )
term = factor + ZeroOrMore( ( multop + factor ).setParseAction(
pushFirst ) )
expr << term + ZeroOrMore( ( addop + term ).setParseAction(
pushFirst ) )

expr is defined to be a Forward() instance, and then the recursive contents
are inserted using the "<<" operator.

- Note also the calls to setParseAction(). These represent callouts to
external functions to be invoked during the parsing process. These external
functions can modify the matched text, update global data structures, or
even supersede the matching rules and raise a ParseException, thereby
cancelling the match. In this case, the expression elements are being
pushed onto an expression stack, to be recursively evaluated after parsing
is complete.

-- Paul
Download pyparsing at http://pyparsing.sourceforge.net.
Jul 18 '05 #3

P: n/a

This may or may not be topical to the original post, but I think this would be
cool in python:
string = "abc 123 456 abc"
import re
regex = re.compile(r'(abc (\d+ )+abc)')
s = regex.match(string)
s.groups() ('abc 123 123 abc', ('123 ', '456 '))

(or similar), instead of
s.groups()

('abc 123 123 abc', '123 ')
I dont have a CS degree, but I'm pretty sure that computers can do this sort
of thing. It must not be obvious. I'm thinking of patenting the idea (regex
engine returns nested matches) and copyrighting the name for it.

James

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
611 Charles E. Young Dr. S.
MBI 205, UCLA 951570
Los Angeles CA 90095-1570
http://www.jamesstroud.com/
Jul 18 '05 #4

P: n/a
On Fri, 19 Nov 2004 15:39:02 -0800, James Stroud wrote:
I dont have a CS degree, but I'm pretty sure that computers can do this sort
of thing. It must not be obvious. I'm thinking of patenting the idea (regex
engine returns nested matches) and copyrighting the name for it.


The only possible name is Recursive Descent Regular Expressions.

(CS joke. There are three basic language domains, based on parser power
necessary to recognize it. "Regular Expressions" is the lowest, but the
actual 'regular expression' the math refers to is much, much weaker the re
library, which nowadays is technically on the top level I think since you
can shell out to functions in the middle of an RE operation. A "Recursive
descent parser" is a parser for a subset of the middle class of languages,
the "context-free languages". "Unrestricted grammars" are the top, parsed
by Turing Machines.

Of course, it would be even more accurate to call it "Recursive Descent
Regular Expressions with a side of Turing Completeness" to mix all three
levels, but that's not so catchy for a brand name. Ought to make the
patent office swoon, though.)

Jul 18 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.