regexp: not matching a sequence of characters

Hi,

I'm fairly new to regular expressions, and this may be a rather dumb
question, but so far I haven't found the answer in any tutorial or reference
yet...

Suppose I have the string "The {{{{power of {{{{regular expressions}}}}
comes from}}}} the ability to include alternatives and repetitions in the
pattern." and I want to extract the chunks that start with "{{{{" and
end with "}}}}".

When I use "/\{{4}((?s).*?)\}{4}/" as the pattern, it matches "{{{{power of
{{{{regular expressions}}}}".

What I want is for it to match "{{{{regular expressions}}}}", so somehow I
should be able to avoid having "{{{{" contained in the subpattern.
Exclusion, afaik, only works with character classes and not with
sequences of characters.
I could make the subpattern exclude the "{" character, as in
"/\{{4}((?s)[^\{]*?)\}{4}/", but this does not satisfy my needs because I
only want to rule out a run of exactly 4 braces, no more, no less...

so, how would I have to write the pattern to exclude a sequence of specific
characters, rather than a character class?

Any help or pointers to tutorials that tackle this subject would be great.

thanks in advance

dominique
Jul 17 '05 #1
>> (snip, since I restate the question anyway)

Okay, as I see it, you want to extract:

Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.
                ^^^^^^^^^^^^^                      ^^^^
without picking up the {s or }s. In PCRE (preg functions):
/{{{{([^{}]+)}}}}/
should work.

This means: match "{{{{", followed by one or more characters that are
neither { nor }, which we capture (the parentheses), followed by "}}}}".

Then just look in your "matches" array under position 1.
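
A quick way to try that pattern out is sketched below in Python, whose re module treats these constructs the same way PCRE does. The helper name inner_text is made up for illustration, and the braces are escaped only for readability. The second call shows the limitation that comes up later in the thread: any brace inside the chunk prevents a match.

```python
import re

# The pattern from the post above, with the braces escaped for clarity.
FOUR_BRACES = re.compile(r"\{{4}([^{}]+)\}{4}")

def inner_text(text):
    """Return capture group 1 of every match (hypothetical helper)."""
    return [m.group(1) for m in FOUR_BRACES.finditer(text)]

print(inner_text("Anything in {{{{four brackets}}}}. Things like {{{{this}}}}."))
# ['four brackets', 'this']

print(inner_text("{{{{four {{{more}}} brackets}}}}"))
# [] because the content contains braces, which [^{}] rejects
```

In PHP the same capture would be read from index 1 of the matches array filled by preg_match or preg_match_all.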

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #2
> Okay, as I see it, you want to extract:

> Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.

I do want to extract anything between four brackets, except that the
content should be able to contain any number of nested brackets,
EXCEPT for 4 brackets in a row:

{{{{four brackets}}}} should match
{{{{four {{{more}}} brackets}}}} should match
{{{{fo{ur {{m{{{o}}}re brack}}}ets}}}} should match

{{{{four {{{{more}}}} brackets}}}} should NOT match, and just return
{{{{match}}}}

I would need ungreediness in the reverse direction or something like that
:-)

Thanks for replying though!
Jul 17 '05 #3
Regarding this well-known quote, often attributed to Dominique Callewaert's
famous "Thu, 3 Jun 2004 09:52:55 +0200" speech:
>> Okay, as I see it, you want to extract:
>>
>> Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.
>
> I want to extract anything between four brackets, indeed
> except that it should be able to contain any number of nested brackets,
> EXCEPT for the occurrence of 4 brackets in a row
>
> {{{{four brackets}}}} should match
> {{{{four {{{more}}} brackets}}}} should match
> {{{{fo{ur {{m{{{o}}}re brack}}}ets}}}} should match
>
> {{{{four {{{{more}}}} brackets}}}} should NOT match, and just return
> {{{{match}}}}
>
> I would need ungreediness in the reverse direction or something like that
> :-)
>
> Thanks for replying though!


Well, I'm not sure how to use them, so I'll leave that up to you, but it
looks like you want a "negative lookahead".

I suppose you could also loop and try the regexp again in the match, but
that's frankly a kludge.
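
For anyone who wants to try the negative lookahead idea, here is a minimal sketch, written in Python because its lookahead syntax is the same as in PCRE. It is one possible reading of the requirement, not a tested PHP solution: the content is consumed one character at a time, and each step is allowed only if the next four characters are not another "{{{{". The function name innermost is made up for illustration.

```python
import re

# \{{4}              opening "{{{{"
# (?:(?!\{{4}).)*?   any characters, as long as no "{{{{" starts at them
# \}{4}              the first closing "}}}}" that follows
INNERMOST = re.compile(r"\{{4}((?:(?!\{{4}).)*?)\}{4}", re.DOTALL)

def innermost(text):
    """Return the contents of chunks that contain no nested '{{{{'."""
    return INNERMOST.findall(text)

print(innermost("The {{{{power of {{{{regular expressions}}}} comes from}}}} ..."))
# ['regular expressions']
print(innermost("{{{{four {{{more}}} brackets}}}}"))
# ['four {{{more}}} brackets']
print(innermost("{{{{four {{{{more}}}} brackets}}}}"))
# ['more']
```

Runs of one, two or three braces pass through untouched, since the lookahead only rejects four opening braces in a row.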

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #4
On Tue, 1 Jun 2004, gs*@wetnet.com wrote:
> Hi,
>
> I'm fairly new to regular expressions, and this may be a rather dumb
> question, but so far I haven't found the answer in any tutorial or
> reference yet...
>
> If I have f.i. the string "The {{{{power of {{{{regular
> expressions}}}} comes from}}}} the ability to include alternatives
> and repetitions in the pattern." from which I want to extract chunks
> starting with "{{{{" and ending with "}}}}".
>
> When I use "/\{{4}((?s).*?)\}{4}/" as the pattern, it matches
> "{{{{power of {{{{regular expressions}}}}".
>
> What I want is for it to match "{{{{regular expressions}}}}", so
> somehow I should be able to avoid having "{{{{" contained in the
> subpattern. Exclusion, afaik, only works with character classes and
> not with sequences of characters. I could make the
> subpattern exclude the "{" character as in
> "/\{{4}((?s)[^\{]*?)\}{4}/", but this does not satisfy my needs
> because I want to avoid the occurrence of having 4 accolades, no more
> no less...
>
> so, how would I have to write the pattern to exclude a sequence of
> specific characters, rather than a character class?
>
> Any help or pointers to tutorials that tackle this subject would be
> great.


The problem you're trying to tackle is hard with regular expressions --
matching balanced parenthesized strings really requires a stack, unless
you can limit the depth of nesting.

One approach would be to break the string into pieces separated by
either {{{{ or }}}} and then you could look at the pieces. If you make
functions that match the pieces, PHP supplies the stack you need.
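
The split-into-pieces idea could look roughly like the sketch below. It is in Python and uses an explicit list as the stack instead of recursive functions, which is a substitution on my part; the name extract_chunks and the choice to silently skip unmatched closers are assumptions made for illustration.

```python
import re

def extract_chunks(text):
    """Return (depth, content) for every balanced {{{{...}}}} chunk,
    innermost chunks first."""
    # The capturing group makes re.split keep the delimiters as tokens.
    tokens = re.split(r"(\{{4}|\}{4})", text)
    stack = []    # one list of text pieces per currently open chunk
    chunks = []
    for tok in tokens:
        if tok == "{{{{":
            stack.append([])
        elif tok == "}}}}":
            if not stack:
                continue  # unmatched closer: ignored in this sketch
            content = "".join(stack.pop())
            chunks.append((len(stack) + 1, content))
            if stack:
                # Put the finished chunk back into its parent's text.
                stack[-1].append("{{{{" + content + "}}}}")
        elif stack:
            stack[-1].append(tok)
    return chunks

sample = "The {{{{power of {{{{regular expressions}}}} comes from}}}} the ability"
for depth, content in extract_chunks(sample):
    print(depth, repr(content))
# 2 'regular expressions'
# 1 'power of {{{{regular expressions}}}} comes from'
```

Because every chunk is reported with its depth, the caller can keep only the innermost ones or only the outermost ones as needed.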

The "greedy match" problem (matching the longest possible string) is
getting you -- if you could say "match the shortest string between {{{{
and }}}}" you'd be home free, right?

PHP includes both ereg and preg functions -- the latter use the regular
expressions from Perl -- so tutorials on Perl regular expressions would
be useful. That being the case, there's an O'Reilly book:
Jeffrey Friedl's "Mastering Regular Expressions" that might be useful --
it's about Perl regular expressions. One thing that's great about
Perl's regular expressions is that they have some extensions that allow
you to say you want the shortest match.
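
One caveat is worth checking before relying on the shortest-match extension: a non-greedy quantifier stops at the first possible end, but the match still begins at the leftmost possible start. The small Python sketch below (same quantifier syntax as Perl and PCRE; the helper name first_match is made up) shows that this reproduces the result reported in the original question.

```python
import re

def first_match(pattern, text):
    """Return the full text of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text, re.DOTALL)
    return m.group(0) if m else None

sample = "The {{{{power of {{{{regular expressions}}}} comes from}}}} the ability"

# Greedy: runs to the LAST "}}}}".
print(first_match(r"\{{4}(.*)\}{4}", sample))
# {{{{power of {{{{regular expressions}}}} comes from}}}}

# Non-greedy: stops at the FIRST "}}}}", but still starts at the first "{{{{".
print(first_match(r"\{{4}(.*?)\}{4}", sample))
# {{{{power of {{{{regular expressions}}}}
```

So the non-greedy form shortens the match only at its right end; something extra is needed to keep a second "{{{{" out of the content.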

To get around the "groups of 4" problem, see if you can find a character
that doesn't occur in the strings --
For instance, if you knew '<' and '>' did not occur, you could replace
'{{{{' with '<' and '}}}}' with '>' and then you can use character
classes -- [^>] would be anything except the '}}}}' for instance. At
the end, you reverse the transformation. If there aren't any such
characters, you might replace all occurrences of '<' with '&lt;' or some
other string that you have determined isn't there, etc.

<http://www.regular-expressions.info/tutorial.html> might be useful, but
it may be too basic.
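
A minimal sketch of the replacement trick, in Python and assuming that '<' and '>' really never occur in the input (the function refuses to run otherwise). The name innermost_chunks is made up for illustration.

```python
import re

def innermost_chunks(text):
    """Find chunks with no nested '{{{{' by swapping the four-brace
    delimiters for single placeholder characters."""
    if "<" in text or ">" in text:
        raise ValueError("placeholders '<' and '>' already occur in the text")
    swapped = text.replace("{{{{", "<").replace("}}}}", ">")
    # With one-character delimiters a plain character class is enough.
    return re.findall(r"<([^<>]*)>", swapped)

print(innermost_chunks("The {{{{power of {{{{regular expressions}}}} comes from}}}} ..."))
# ['regular expressions']
print(innermost_chunks("{{{{four {{{more}}} brackets}}}}"))
# ['four {{{more}}} brackets']
```

The captured text can never contain a placeholder, so here there is nothing to transform back; the reverse step only matters if you keep working with the whole swapped string.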

joe

Jul 17 '05 #5
