trying to find repeated substrings with regular expression

Hello all,

I'm trying to find substrings that look like 'FOO blah blah blah'
in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
FOO blah3a blah3b blah3b' I want to get three substrings,
'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.

I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
and everything I've tried either matches too much or too little.

I've decided it's easier for me just to search for FOO, and then
break up the string based on the locations of FOO.

But I'd like to better understand regular expressions.
Can someone suggest a regular expression which will return
groups corresponding to the FOO substrings above?

Thanks for any insights, I appreciate it a lot.

Robert Dodier

Mar 13 '06 #1


Robert Dodier wrote:
> Hello all,
>
> I'm trying to find substrings that look like 'FOO blah blah blah'
> in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
> FOO blah3a blah3b blah3b' I want to get three substrings,
> 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.
> [...]
> Can someone suggest a regular expression which will return
> groups corresponding to the FOO substrings above?


FOO.*?(?=(?:FOO|$))
--
Giovanni Bajo
Mar 13 '06 #2
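
A minimal sketch of how the pattern above might be applied with re.findall. The names PATTERN, matches, trimmed and wrapped are illustrative and not from the thread. Note that the matches keep the space before the next FOO, and that '.' does not cross a newline unless re.DOTALL is passed, which matters if the input is wrapped over several lines:

```python
import re

# The pattern as posted: lazy text after FOO, stopping before the next FOO
# or at the end of the string. The lookahead consumes nothing.
PATTERN = r"FOO.*?(?=(?:FOO|$))"

s = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"

matches = re.findall(PATTERN, s)
print(matches)
# ['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']

# Strip the trailing whitespace to get exactly the substrings asked for.
trimmed = [m.rstrip() for m in matches]
print(trimmed)

# Same text with a line break in it: without DOTALL the middle piece is
# lost, because '.' cannot cross the newline and '$' is not at that spot.
wrapped = "blah FOO blah1a blah1b FOO blah2\nFOO blah3a blah3b blah3b"
print(re.findall(PATTERN, wrapped))
print(re.findall(PATTERN, wrapped, re.DOTALL))
```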

Robert Dodier wrote:
> Hello all,
>
> I'm trying to find substrings that look like 'FOO blah blah blah'
> in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
> FOO blah3a blah3b blah3b' I want to get three substrings,
> 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.
>
> I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
> and everything I've tried either matches too much or too little.

FOO(.*?)(?=FOO|$)

> I've decided it's easier for me just to search for FOO, and then
> break up the string based on the locations of FOO.

Use re.split() for this.

Kent
Mar 13 '06 #3
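
One way the re.split() suggestion could look, as a sketch rather than what the poster necessarily had in mind. The names pieces, substrings and by_lookahead are made up for illustration. Splitting on the marker drops it, so it is put back by hand; the lookahead variant keeps the marker but needs Python 3.7 or later, where re.split() accepts patterns that match the empty string:

```python
import re

s = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"

# Split on the literal marker. Element 0 is whatever came before the
# first FOO, so it is skipped; the marker is re-attached to each tail.
pieces = re.split(r"FOO", s)
substrings = ["FOO" + tail.rstrip() for tail in pieces[1:]]
print(substrings)
# ['FOO blah1a blah1b', 'FOO blah2', 'FOO blah3a blah3b blah3b']

# Python 3.7+ only: split at the zero-width position just before each FOO,
# which leaves the marker attached to the text that follows it.
by_lookahead = [p.rstrip() for p in re.split(r"(?=FOO)", s)[1:]]
print(by_lookahead)
```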

Robert Dodier wrote:
> I've decided it's easier for me just to search for FOO, and then
> break up the string based on the locations of FOO.
>
> But I'd like to better understand regular expressions.


Those who cannot learn regular expressions are doomed to repeat string
searches. Which is not such a bad thing.

txt = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"

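def fa(s, pat):
    """Return the substrings of s that start at each occurrence of pat."""
    retlist = []
    try:
        while True:
            i = s.rindex(pat)          # rightmost remaining occurrence
            retlist.insert(0, s[i:])   # prepend so results stay in order
            s = s[:i]                  # keep scanning the text before it
    except ValueError:                 # rindex raises when pat is not found
        return retlist

print(fa(txt, "FOO"))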

Mar 13 '06 #4
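
The same "find the locations, then slice" idea can also be sketched scanning forward with str.find, which avoids using an exception for loop control. The function name split_at_marker is hypothetical, and a non-empty marker is assumed:

```python
def split_at_marker(s, marker):
    """Return the substrings of s that start at each occurrence of marker."""
    if not marker:
        # An empty marker would be found at every position, forever.
        raise ValueError("marker must be a non-empty string")

    # Collect the start index of every non-overlapping occurrence.
    starts = []
    i = s.find(marker)
    while i != -1:
        starts.append(i)
        i = s.find(marker, i + len(marker))

    # Each piece runs from one marker to the next; the last runs to the end.
    ends = starts[1:] + [len(s)]
    return [s[a:b] for a, b in zip(starts, ends)]


txt = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"
print(split_at_marker(txt, "FOO"))
# ['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
```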

[Robert Dodier]
> I'm trying to find substrings that look like 'FOO blah blah blah'
> in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
> FOO blah3a blah3b blah3b' I want to get three substrings,
> 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.

No need for regular expressions on this one:

>>> s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
>>> ['FOO' + tail for tail in s.split('FOO')[1:]]
['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']


> I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
> and everything I've tried either matches too much or too little.

The regular expression way is to find the target phrase followed by any
text, up to the next target phrase or the end of the string. The first two
are in a group; the terminator is only a lookahead, so it is not included
in the result group. The any-text section is non-greedy:

>>> import re
>>> re.findall('(FOO.*?)(?=FOO|$)', s)
['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
Raymond

Mar 14 '06 #5
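
For anyone wondering why the original attempt "matches too little", here is a short sketch under the assumption that it was run as a whole-string match. The names attempt and tempered are illustrative. The inner '((?!FOO).)*' idea is sound; the trouble comes from the surrounding '.*' and the repeated capturing group:

```python
import re

s = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"

# The original attempt. The leading greedy .* grabs the whole string and
# backs off only as far as the last FOO, so group 1 holds just the final
# substring. (Even when a group does repeat, it keeps only its last
# iteration, so one match could never return all three.)
attempt = re.match(r".*(FOO((?!FOO).)*)+.*", s)
print(attempt.group(1))
# FOO blah3a blah3b blah3b

# Dropping the wrappers and scanning with findall: each match starts at
# FOO and consumes one character at a time while FOO does not start there.
tempered = re.findall(r"FOO(?:(?!FOO).)*", s)
print(tempered)
# ['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
```

This gives the same result as the lazy '.*?' with a lookahead; which of the two reads better is mostly a matter of taste.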
