On Tue, 1 Jun 2004,
gs*@wetnet.com wrote:
Hi,
I'm fairly new to regular expressions, and this may be a rather dumb
question, but so far I haven't found the answer in any tutorial or
reference yet...
For instance, I have the string "The {{{{power of {{{{regular
expressions}}}} comes from}}}} the ability to include alternatives
and repetitions in the pattern." and I want to extract chunks
starting with "{{{{" and ending with "}}}}".
When I use "/\{{4}((?s).*?)\}{4}/" as the pattern, it matches
"{{{{power of {{{{regular expressions}}}}".
What I want is for it to match "{{{{regular expressions}}}}", so
somehow I should be able to avoid having "{{{{" contained in the
subpattern. Exclusion, afaik, only works with character classes and
not with sequences of characters. I could make the
subpattern exclude the "{" character as in
"/\{{4}((?s)[^\{]*?)\}{4}/", but this does not satisfy my needs
because what I want to exclude is a run of exactly 4 braces, no more,
no less...
so, how would I have to write the pattern to exclude a sequence of
specific characters, rather than a character class?
Any help or pointers to tutorials that tackle this subject would be
great.
The problem you're trying to tackle is hard with regular expressions --
matching balanced parenthesized strings really requires a stack, unless
you can limit the depth of nesting.
One approach would be to break the string into pieces separated by
either {{{{ or }}}} and then you could look at the pieces. If you make
functions that match the pieces, PHP supplies the stack you need.
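Here is a rough sketch of that split-and-stack idea. It is written in Python rather than PHP and keeps an explicit list as the stack instead of relying on recursive function calls; the function name extract_chunks and its parameters are made up for illustration.

```python
import re

def extract_chunks(text, open_tok="{{{{", close_tok="}}}}"):
    """Return every balanced chunk (delimiters included), innermost first."""
    # Splitting on a capturing group keeps the delimiters in the result list.
    splitter = "(" + re.escape(open_tok) + "|" + re.escape(close_tok) + ")"
    pieces = re.split(splitter, text)

    stack = []   # one text buffer per chunk that is currently open
    chunks = []
    for piece in pieces:
        if piece == open_tok:
            stack.append(open_tok)            # start a new (possibly nested) chunk
        elif piece == close_tok and stack:
            finished = stack.pop() + close_tok
            chunks.append(finished)
            if stack:                         # a nested chunk is part of its parent
                stack[-1] += finished
        elif stack:
            stack[-1] += piece                # plain text inside an open chunk
        # text outside any chunk and unmatched closers are ignored
    return chunks
```

On the example string this returns the inner chunk "{{{{regular expressions}}}}" first, followed by the outer chunk that contains it.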
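The "greedy match" problem (matching the longest possible string) is
only part of what's getting you -- your ".*?" already asks for the
shortest match, but the engine still starts at the first "{{{{" it
finds. If you could say "match the shortest string between {{{{ and
}}}}" no matter where it starts, you'd be home free, right?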
PHP includes both ereg and preg functions -- the latter use the regular
expressions from Perl -- so tutorials on Perl regular expressions would
be useful. There's also an O'Reilly book, Jeffrey Friedl's
"Mastering Regular Expressions", that covers Perl regular expressions
in depth. One thing that's great about Perl's regular expressions is
that they have extensions, such as the lazy quantifiers "*?" and "+?",
that let you say you want the shortest match.
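Another of those Perl extensions, negative lookahead "(?!...)", lets you exclude a sequence of characters rather than a character class. A minimal sketch in Python, whose re module uses the same Perl-style syntax for this construct as the preg functions; the name INNERMOST is just for illustration:

```python
import re

# \{{4}                opening delimiter: four "{"
# ((?:(?!\{{4}).)*?)   any characters, each one checked so that no "{{{{"
#                      starts at that position
# \}{4}                closing delimiter: four "}"
INNERMOST = re.compile(r"\{{4}((?:(?!\{{4}).)*?)\}{4}", re.DOTALL)

text = ("The {{{{power of {{{{regular expressions}}}} comes from}}}} "
        "the ability to include alternatives and repetitions in the pattern.")

match = INNERMOST.search(text)
print(match.group(0))   # {{{{regular expressions}}}}
print(match.group(1))   # regular expressions
```

The attempt that starts at the first "{{{{" fails as soon as it reaches the second one, so the engine moves on and the match begins at the inner "{{{{". This only finds innermost chunks; it does not handle the nesting itself.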
To get around the "groups of 4" problem, see if you can find characters
that don't occur in the strings.
For instance, if you knew '<' and '>' did not occur, you could replace
'{{{{' with '<' and '}}}}' with '>' and then you can use character
classes -- [^>] would be anything except the '}}}}' for instance. At
the end, you reverse the transformation. If there aren't any such
characters, you might first replace all occurrences of '<' with '&lt;'
or some other string that you have determined isn't there, etc.
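A small sketch of the placeholder trick, again in Python and assuming '<' and '>' really never occur in the input (the function refuses to run otherwise). The name innermost_chunks is hypothetical, and the class used is [^<>] rather than [^>] so that a nested opener is excluded as well.

```python
import re

def innermost_chunks(text):
    """Find chunks delimited by {{{{ and }}}} that contain no nested delimiter."""
    # Assumption: "<" and ">" are free to use as stand-ins.
    if "<" in text or ">" in text:
        raise ValueError("placeholder characters already occur in the input")

    # Step 1: collapse each four-character delimiter into a single character.
    swapped = text.replace("{{{{", "<").replace("}}}}", ">")

    # Step 2: a plain negated character class can now exclude the delimiters.
    found = re.findall(r"<[^<>]*>", swapped)

    # Step 3: reverse the transformation on each result.
    return [chunk.replace("<", "{{{{").replace(">", "}}}}") for chunk in found]
```

For the example string this yields a single chunk, "{{{{regular expressions}}}}".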
<http://www.regular-expressions.info/tutorial.html> might be useful, but
it may be too basic.
joe