RE Help

Not specific to Python, but it will be implemented in it... how do I
compile a RE to catch everything between two known values? Here's what
I've tried (but failed) to accomplish... the knowns here are START and
END:

data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
x = re.compile('START.END', re.DOTALL)

x.findall(data)

Sep 21 '07 #1
> data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> x = re.compile('START.END', re.DOTALL)

This should work:

x = re.compile('START(.*)END', re.DOTALL)
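
For anyone wondering why the original pattern finds nothing: a bare `.` matches exactly one character, so 'START.END' only fits when a single character sits between the markers. A small sketch using the string from the question (the variable names are just for illustration):

```python
import re

data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"

# '.' stands for exactly one character, so this needs START?END
one_char = re.compile('START.END', re.DOTALL)
# '.*' stands for any run of characters; the parentheses capture it
any_run = re.compile('START(.*)END', re.DOTALL)

no_match = one_char.findall(data)   # twelve characters sit between the markers
captured = any_run.findall(data)    # the text between START and END

print(no_match)   # []
print(captured)   # ['pruyerfghdfj']
```

Because the pattern has one capture group, findall returns the captured text rather than the whole match.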
Sep 21 '07 #2
On Sep 21, 2:44 pm, David <wizza...@gmail.com> wrote:
> > data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> > x = re.compile('START.END', re.DOTALL)
>
> This should work:
>
> x = re.compile('START(.*)END', re.DOTALL)

You'll want to use a non-greedy match:

x = re.compile(r"START(.*?)END", re.DOTALL)

Otherwise the . will match END as well.

Sep 21 '07 #3
ch************@gmail.com wrote:
> On Sep 21, 2:44 pm, David <wizza...@gmail.com> wrote:
> > > data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> > > x = re.compile('START.END', re.DOTALL)
> > This should work:
> >
> > x = re.compile('START(.*)END', re.DOTALL)
>
> You'll want to use a non-greedy match:
>
> x = re.compile(r"START(.*?)END", re.DOTALL)
>
> Otherwise the . will match END as well.

Only if there's a later END in the string, in which case the user's
requirements will determine whether greedy matching is appropriate.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline

Sep 21 '07 #4
> > You'll want to use a non-greedy match:
> > x = re.compile(r"START(.*?)END", re.DOTALL)
> > Otherwise the . will match END as well.

On Sep 21, 3:23 pm, Steve Holden <st...@holdenweb.com> wrote:
> Only if there's a later END in the string, in which case the user's
> requirements will determine whether greedy matching is appropriate.
>
> regards
> Steve

There will be lots of START END combinations in the data. This is more
accurate:

sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*df gsdh*START*sdfhfdhj*END*fdghfdj

The RE should extract the data between each pair of START and END.

Thanks!

Sep 21 '07 #5
On Fri, Sep 21, 2007 at 12:05:51PM -0700, ch************@gmail.com wrote regarding Re: RE Help:
> > x = re.compile('START(.*)END', re.DOTALL)
>
> You'll want to use a non-greedy match:
>
> x = re.compile(r"START(.*?)END", re.DOTALL)
>
> Otherwise the . will match END as well.

The . will only consume END if there is another END to match later on in the string. And then it's a question of desired functionality. If the given string is "abcdSTARTefgENDxyzENDhijk", do you want to match "STARTefgEND" (in which case you need a non-greedy match, r".*?") or "STARTefgENDxyzEND" (in which case you need a greedy match, r".*")?

Cheers,
Cliff
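
The two outcomes described above can be checked directly. This is only a sketch on the example string from this post; the variable names are made up for illustration:

```python
import re

text = "abcdSTARTefgENDxyzENDhijk"

# Greedy: '.*' grabs as much as it can, so the match runs to the last END
greedy = re.search(r"START.*END", text, re.DOTALL).group(0)
# Non-greedy: '.*?' grabs as little as it can, so the match stops at the first END
lazy = re.search(r"START.*?END", text, re.DOTALL).group(0)

print(greedy)  # STARTefgENDxyzEND
print(lazy)    # STARTefgEND
```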
Sep 21 '07 #6
On Sep 21, 3:32 pm, byte8b...@gmail.com wrote:
> > > You'll want to use a non-greedy match:
> > > x = re.compile(r"START(.*?)END", re.DOTALL)
> > > Otherwise the . will match END as well.
>
> On Sep 21, 3:23 pm, Steve Holden <st...@holdenweb.com> wrote:
> > Only if there's a later END in the string, in which case the user's
> > requirements will determine whether greedy matching is appropriate.
> > regards
> > Steve
>
> There will be lots of START END combinations in the data. This is more
> accurate:
>
> sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*df gsdh*START*sdfhfdhj*END*fdghfdj
>
> The RE should extract the data between each pair of START and END.
>
> Thanks!

You'll want to use my version then. Glad to help!
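
Assuming "my version" means the non-greedy pattern quoted above, here is a rough sketch of how it behaves with several START/END pairs. The sample string is a hypothetical one modelled on the data in the question, with the asterisks and the stray space left out:

```python
import re

# Hypothetical sample modelled on the string in the post, with the
# asterisks and the stray space left out for readability.
data = ("sfgdfgSTARTdfhdghENDdfdghSTARTdfhfdghEND"
        "dfgsdhSTARTsdfhfdhjENDfdghfdj")

pattern = re.compile(r"START(.*?)END", re.DOTALL)

# findall returns the captured group of every non-overlapping match
chunks = pattern.findall(data)
print(chunks)  # ['dfhdgh', 'dfhfdgh', 'sdfhfdhj']

# finditer also exposes where each captured chunk starts
positions = [(m.start(1), m.group(1)) for m in pattern.finditer(data)]

# For comparison, the greedy pattern swallows everything up to the last END
greedy_chunks = re.findall(r"START(.*)END", data, re.DOTALL)
print(len(greedy_chunks))  # 1
```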

Sep 21 '07 #7
On Friday 21 September 2007, by*******@gmail.com wrote:
> Not specific to Python, but it will be implemented in it... how do I
> compile a RE to catch everything between two known values? Here's what
> I've tried (but failed) to accomplish... the knowns here are START and
> END:
>
> data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> x = re.compile('START.END', re.DOTALL)
>
> x.findall(data)

I'm not sure finding a variable number of occurrences can be done with re. How
about

# data = the string
strings = []
# Every piece after the first begins right after a START marker
for s in data.split('START')[1:]:
    # Keep only the text before the next END
    strings.append(s.split('END')[0])
Sep 21 '07 #8
On Sep 21, 4:09 pm, Thomas Jollans <tho...@jollans.com> wrote:
> On Friday 21 September 2007, byte8b...@gmail.com wrote:
> > Not specific to Python, but it will be implemented in it... how do I
> > compile a RE to catch everything between two known values? Here's what
> > I've tried (but failed) to accomplish... the knowns here are START and
> > END:
> > data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> > x = re.compile('START.END', re.DOTALL)
> > x.findall(data)
>
> I'm not sure finding a variable number of occurrences can be done with re. How
> about
>
> # data = the string
> strings = []
> for s in data.split('START')[1:]:
>     strings.append(s.split('END')[0])

use re.findall :-)

Sep 21 '07 #9
Thomas Jollans wrote:
> On Friday 21 September 2007, by*******@gmail.com wrote:
> > Not specific to Python, but it will be implemented in it... how do I
> > compile a RE to catch everything between two known values? Here's what
> > I've tried (but failed) to accomplish... the knowns here are START and
> > END:
> >
> > data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
> > x = re.compile('START.END', re.DOTALL)
> >
> > x.findall(data)
>
> I'm not sure finding a variable number of occurrences can be done with re. How
> about
>
> # data = the string
> strings = []
> for s in data.split('START')[1:]:
>     strings.append(s.split('END')[0])

Nice. I've noticed that since I switched from Perl to Python, I hardly
ever use regular expressions anymore. In Perl, they're so easy to fire
up that they become the first tool out of the toolbox, but when you make
the barrier to access just a tiny bit higher (import re/re.compile) you
start noticing how easy it is to accomplish most of those feats without
regexes, and much more readably, too.

Of course, it should be noted that the different implementations
suggested behave differently, which could also affect the choice of
method. If you have "abcSTARTdefSTARTghiEND", your version will spit
out strings = ['def', 'ghi'], but a regex, depending on whether it is
greedy or non-greedy, will either spit out ['STARTdefSTARTghiEND'] or
['STARTghiEND'].

Correction, it will spit out the first one, whether greedy or not. The
difference comes with two END tags in a row.
Cheers,
Cliff
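
To make the comparison concrete, here is a rough sketch that runs the split approach and both regexes on the two tricky inputs. The second string and the helper name between_split are made up for illustration. Note that the patterns here use a capture group, so findall returns only the text inside the markers rather than the whole match quoted above:

```python
import re

def between_split(data):
    # The split approach from the quoted post, written as a helper
    return [s.split('END')[0] for s in data.split('START')[1:]]

lazy = re.compile(r"START(.*?)END", re.DOTALL)
greedy = re.compile(r"START(.*)END", re.DOTALL)

nested_start = "abcSTARTdefSTARTghiEND"   # two STARTs, one END
double_end = "abcSTARTdefENDghiEND"       # one START, two ENDs

print(between_split(nested_start))   # ['def', 'ghi']
print(lazy.findall(nested_start))    # ['defSTARTghi']
print(greedy.findall(nested_start))  # ['defSTARTghi']

print(between_split(double_end))     # ['def']
print(lazy.findall(double_end))      # ['def']
print(greedy.findall(double_end))    # ['defENDghi']
```

So with two STARTs the regexes agree with each other but not with split, and with two ENDs only the greedy regex differs.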
Sep 21 '07 #10
