regexp: not matching a sequence of characters

Hi,

I'm fairly new to regular expressions, and this may be a rather dumb
question, but so far I haven't found the answer in any tutorial or reference
yet...

Suppose I have, for instance, the string "The {{{{power of {{{{regular expressions}}}}
comes from}}}} the ability to include alternatives and repetitions in the
pattern." and I want to extract the chunks that start with "{{{{" and
end with "}}}}".

When I use "/\{{4}((?s).*?)\}{4}/" as the pattern, it matches "{{{{power of
{{{{regular expressions}}}}".
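
For readers who want to reproduce the behaviour described above, here is a minimal sketch. It uses Python's re module rather than PHP's preg functions (an assumption made purely for illustration), and re.DOTALL stands in for the inline (?s) of the posted pattern. The names TEXT, LAZY and first are made up for the example.

```python
import re

TEXT = ("The {{{{power of {{{{regular expressions}}}} comes from}}}} "
        "the ability to include alternatives and repetitions in the pattern.")

# Same pattern as in the post; re.DOTALL replaces the inline (?s).
LAZY = re.compile(r"\{{4}(.*?)\}{4}", re.DOTALL)

# The engine starts at the leftmost "{{{{" and the lazy .*? stops at the
# first "}}}}" it can reach, so the second "{{{{" ends up inside the match.
first = LAZY.search(TEXT)
print(first.group(0))  # {{{{power of {{{{regular expressions}}}}
```

The lazy quantifier only shortens the right-hand end of the match; it does nothing about where the match starts, which is why the outer "{{{{" is picked up.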

What I want is for it to match "{{{{regular expressions}}}}", so somehow I
should be able to keep "{{{{" out of the subpattern.
Exclusion, afaik, only works with character classes and not with
sequences of characters.
I could make the subpattern exclude the "{" character, as in
"/\{{4}((?s)[^\{]*?)\}{4}/", but this does not satisfy my needs because I
want to exclude only the occurrence of 4 accolades (braces) in a row, no more, no less...

So, how would I have to write the pattern to exclude a specific sequence of
characters, rather than a character class?

Any help or pointers to tutorials that tackle this subject would be great.

thanks in advance

dominique
Jul 17 '05 #1
>> (snip, since I restate the question anyway)

Okay, as I see it, you want to extract:

Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.
                ^^^^^^^^^^^^^                      ^^^^
without picking up the {s or }s. In PCRE (preg functions):
/{{{{([^{}]+)}}}}/
should work.

This means: match "{{{{", followed by one or more characters that are neither
{ nor }, which we capture (the parentheses), followed by "}}}}".

Just look in your "matches" array under position 1.
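
A rough sketch of that pattern in use, written with Python's re module instead of the preg functions (an assumption for illustration; the pattern itself is unchanged apart from escaping the braces). NO_BRACES and inner_chunks are hypothetical names.

```python
import re

# Four "{", one or more non-brace characters (captured), four "}".
NO_BRACES = re.compile(r"\{{4}([^{}]+)\}{4}")

def inner_chunks(text):
    """Return the captured text (group 1) of every match."""
    return NO_BRACES.findall(text)

print(inner_chunks("Anything in {{{{four brackets}}}}. Things like {{{{this}}}}."))
# ['four brackets', 'this']
```

Because the character class rejects every single brace, a chunk with any brace inside it, such as "{{{{four {{{more}}} brackets}}}}", produces no match at all with this pattern.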

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #2
> Okay, as I see it, you want to extract:
>
> Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.

I want to extract anything between four brackets, indeed
except that it should be able to contain any number of nested brackets,
EXCEPT for the occurrence of 4 brackets in a row

{{{{four brackets}}}} should match
{{{{four {{{more}}} brackets}}}} should match
{{{{fo{ur {{m{{{o}}}re brack}}}ets}}}} should match

{{{{four {{{{more}}}} brackets}}}} should NOT match, and just return
{{{{match}}}}

I would need ungreedyness in the reverse direction or something like that
:-)

thanx for replying though!
Jul 17 '05 #3
Regarding this well-known quote, often attributed to Dominique Callewaert's
famous "Thu, 3 Jun 2004 09:52:55 +0200" speech:
>> Okay, as I see it, you want to extract:
>>
>> Anything in {{{{four brackets}}}}. Things like {{{{this}}}}.
>
> I want to extract anything between four brackets, indeed
> except that it should be able to contain any number of nested brackets,
> EXCEPT for the occurrence of 4 brackets in a row
>
> {{{{four brackets}}}} should match
> {{{{four {{{more}}} brackets}}}} should match
> {{{{fo{ur {{m{{{o}}}re brack}}}ets}}}} should match
>
> {{{{four {{{{more}}}} brackets}}}} should NOT match, and just return
> {{{{match}}}}
>
> I would need ungreedyness in the reverse direction or something like that
> :-)
>
> thanx for replying though!


Well, I'm not sure how to use them, so I'll leave that up to you, but it
looks like you want a "negative lookahead".

I suppose you could also loop and try the regexp again in the match, but
that's frankly a kludge.
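
One way the negative lookahead idea could look, as a sketch only: every character inside the chunk is first checked to make sure it does not begin a run of four braces. The example uses Python's re module (an assumption for illustration); the same pattern body should also be accepted by PCRE with the s modifier. CHUNK and innermost are hypothetical names.

```python
import re

CHUNK = re.compile(
    r"\{{4}"                    # opening "{{{{"
    r"((?:(?!\{{4}|\}{4}).)*)"  # chars, none of which starts "{{{{" or "}}}}"
    r"\}{4}",                   # closing "}}}}"
    re.DOTALL,                  # let "." cross line breaks, like (?s)
)

def innermost(text):
    """Return every chunk that contains no four-brace run of its own."""
    return [m.group(0) for m in CHUNK.finditer(text)]

print(innermost("{{{{four {{{more}}} brackets}}}}"))
# ['{{{{four {{{more}}} brackets}}}}']
print(innermost("{{{{four {{{{more}}}} brackets}}}}"))
# ['{{{{more}}}}']
```

Runs of one, two or three braces pass through untouched. Only a fourth consecutive brace makes the lookahead fail, which seems to be the "no more, no less" behaviour asked for.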

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #4
On Tue, 1 Jun 2004, gs*@wetnet.com wrote:
> Hi,
>
> I'm fairly new to regular expressions, and this may be a rather dumb
> question, but so far I haven't found the answer in any tutorial or
> reference yet...
>
> If I have f.i. the string "The {{{{power of {{{{regular
> expressions}}}} comes from}}}} the ability to include alternatives
> and repetitions in the pattern." from which I want to extract chunks
> starting with "{{{{" and ending with "}}}}".
>
> When I use "/\{{4}((?s).*?)\}{4}/" as the pattern, it matches
> "{{{{power of {{{{regular expressions}}}}".
>
> What I want is for it to match "{{{{regular expressions}}}}", so
> somehow I should be able to avoid having "{{{{" contained in the
> subpattern. Exclusion, afaik, only works with character classes and
> not with sequences of characters. I could make the
> subpattern exclude the "{" character as in
> "/\{{4}((?s)[^\{]*?)\}{4}/", but this does not satisfy my needs
> because I want to avoid the occurrence of having 4 accolades, no more
> no less...
>
> so, how would I have to write the pattern to exclude a sequence of
> specific characters, rather than a character class?
>
> Any help or pointers to tutorials that tackle this subject would be
> great.


The problem you're trying to tackle is hard with regular expressions --
matching balanced parenthesized strings really requires a stack, unless
you can limit the depth of nesting.

One approach would be to break the string into pieces separated by
either {{{{ or }}}} and then you could look at the pieces. If you make
functions that match the pieces, PHP supplies the stack you need.
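
A possible shape for the split-into-pieces approach, sketched in Python rather than PHP (an assumption for illustration). It keeps an explicit stack instead of relying on recursive function calls, and it assumes delimiters are always exactly four braces, so a run of five or more braces is tokenised from the left. DELIM and balanced_chunks are hypothetical names.

```python
import re

# The capturing group makes split() keep the delimiters in its result.
DELIM = re.compile(r"(\{{4}|\}{4})")

def balanced_chunks(text):
    """Return (depth, inner_text) for each balanced {{{{ ... }}}} pair."""
    stack = []   # one list of collected pieces per currently open "{{{{"
    found = []
    for piece in DELIM.split(text):
        if piece == "{{{{":
            stack.append([])
        elif piece == "}}}}":
            if stack:  # ignore a closer that has no opener
                inner = "".join(stack.pop())
                found.append((len(stack) + 1, inner))
                if stack:  # hand the whole chunk back to its parent
                    stack[-1].append("{{{{" + inner + "}}}}")
        elif stack:
            stack[-1].append(piece)
    return found

print(balanced_chunks("a {{{{b {{{{c}}}} d}}}} e"))
# [(2, 'c'), (1, 'b {{{{c}}}} d')]
```

Filtering the result for chunks whose inner text contains no "{{{{" gives the innermost chunks only.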

The "greedy match" problem (matching the longest possible string) is
getting you -- if you could say "match the shortest string between {{{{
and }}}}" you'd be home free, right?

PHP includes both ereg and preg functions -- the latter use the regular
expressions from Perl -- so tutorials on Perl regular expressions would
be useful. That being the case, there's an O'Reilly book:
Jeffrey Friedl's "Mastering Regular Expressions" that might be useful --
it's about Perl regular expressions. One thing that's great about
Perl's regular expressions is that they have some extensions that allow
you to say you want the shortest match.

To get around the "groups of 4" problem, see if you can find a character
that doesn't occur in the strings.
For instance, if you knew '<' and '>' did not occur, you could replace
'{{{{' with '<' and '}}}}' with '>' and then you can use character
classes -- [^>] would be anything except the '}}}}', for instance. At
the end, you reverse the transformation. If there aren't any such
characters, you might replace all occurrences of '<' with '&lt;' or some
other string that you have determined isn't there, etc.
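
The substitution trick might be sketched like this, in Python for illustration (an assumption; nothing here is PHP-specific). It takes the simple route of refusing input that already contains '<' or '>' instead of escaping them, and chunks_via_placeholders is a hypothetical name.

```python
import re

def chunks_via_placeholders(text):
    """Find innermost {{{{ ... }}}} chunks by swapping in '<' and '>'."""
    if "<" in text or ">" in text:
        raise ValueError("placeholder characters already occur in the text")
    # Each four-brace delimiter becomes a single character ...
    swapped = text.replace("{{{{", "<").replace("}}}}", ">")
    # ... so an ordinary negated character class can exclude it.
    inner = re.findall(r"<([^<>]*)>", swapped)
    # Reverse the transformation by restoring the real delimiters.
    return ["{{{{" + chunk + "}}}}" for chunk in inner]

print(chunks_via_placeholders("{{{{four {{{{more}}}} brackets}}}}"))
# ['{{{{more}}}}']
```

Shorter brace runs such as "{{{more}}}" are left alone by the replacement, so they stay inside the chunk unchanged.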

<http://www.regular-expressions.info/tutorial.html> might be useful, but
it may be too basic.

joe

Jul 17 '05 #5
