By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
448,497 Members | 1,295 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 448,497 IT Pros & Developers. It's quick & easy.

Why do look-ahead and look-behind have to be fixed-width patterns?

P: n/a
Hi i'm a newbie at this and probably always will be, so don't be surprised
if I don't know what i'm talking about.

but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

but i just don't see how.

say i have the expression (?<=.*?:.*?:).*
all the engine has to do is search for .*?:.*?:.*, and then in each result,
find .*?:.*?: and return the string starting at the point just after the
length of the match.
no exponential time there, and even that is probably more inefficient than
it has to be.

Jul 18 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a

inhahe wrote:
Hi i'm a newbie at this and probably always will be, so don't be surprised if I don't know what i'm talking about.

but I don't understand why regex look-behinds (and look-aheads) have to be fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

but i just don't see how.

say i have the expression (?<=.*?:.*?:).*
all the engine has to do is search for .*?:.*?:.*, and then in each result, find .*?:.*?: and return the string starting at the point just after the length of the match.
no exponential time there, and even that is probably more inefficient than it has to be.


But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:
import re
q = re.compile(r'.*?:.*?:(.*)').search
def grab(s): .... m = q(s)
.... if m:
.... print m.group(1)
.... else:
.... print 'not found!'
.... grab('') not found! grab('::::') :: grab('a:b:yadda') yadda>> grab('a:b:c:d') c:d grab('a:b:')


Jul 18 '05 #2

P: n/a
John Machin wrote:
To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:
import re
q = re.compile(r'.*?:.*?:(.*)').search
def grab(s):
... m = q(s)
... if m:
... print m.group(1)
... else:
... print 'not found!'
...
grab('') not found!
grab('::::') ::
grab('a:b:yadda') yadda
>grab('a:b:c:d') c:d
grab('a:b:')



Or without any regular expressions:

py> def grab(s):
.... try:
.... first, second, rest = s.split(':', 2)
.... print rest
.... except ValueError:
.... print 'not found!'
....
py> grab('')
not found!
py> grab('a:b:yadda')
yadda
py> grab('a:b:c:d')
c:d
py> grab('a:b:')

py>

To the OP: what is it you're trying to do? Often there is a much
cleaner way to do it without regular expressions...

Steve
Jul 18 '05 #3

P: n/a
> but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise


That's because of the underlying theory of regular expressions. They are
modelled using so called finite state automata (FSM). These are very much
limited in the complexity of things they can do, and so are regular
expressions. Explaining that further would require to dig deep into the
details of FSM, grammars and languages - deeper than I'm currently willing
to do :) But I wanted to point out that there is a "real" technical reason
for that, not just a lack of feature or willingness to implement one.

--

Regards,

Diez B. Roggisch
Jul 18 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.