Why do look-ahead and look-behind have to be fixed-width patterns?

inhahe

Hi, I'm a newbie at this and probably always will be, so don't be surprised
if I don't know what I'm talking about.

But I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

I'm getting the impression that it's supposed to make searching
exponentially slower otherwise,

but I just don't see how.

Say I have the expression (?<=.*?:.*?:).*
All the engine has to do is search for .*?:.*?:.*, and then in each result,
find .*?:.*?: and return the string starting at the point just after the
end of that match.
No exponential time there, and even that is probably less efficient than
it has to be.
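
The two-step idea described above can be sketched roughly as follows. This is only an illustration of the proposal for this one pattern, not how any regex engine implements look-behind; the names WHOLE, PREFIX and emulate_lookbehind are made up for the example.

```python
import re

# Hypothetical names, introduced only for this sketch.
WHOLE = re.compile(r'.*?:.*?:.*')   # look-behind part followed by the main part
PREFIX = re.compile(r'.*?:.*?:')    # the look-behind part on its own

def emulate_lookbehind(s):
    """Return the text after the second colon for every match found in s."""
    results = []
    for whole in WHOLE.finditer(s):
        text = whole.group(0)
        # Step 2: re-match the prefix inside the result and cut it off.
        prefix = PREFIX.match(text)
        results.append(text[prefix.end():])
    return results
```

With this sketch, emulate_lookbehind('a:b:c:d') gives ['c:d'] and emulate_lookbehind('') gives an empty list.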

Jul 18 '05 #1

John Machin

inhahe wrote:

> Hi, I'm a newbie at this and probably always will be, so don't be surprised
> if I don't know what I'm talking about.
>
> But I don't understand why regex look-behinds (and look-aheads) have to be
> fixed-width patterns.
>
> I'm getting the impression that it's supposed to make searching
> exponentially slower otherwise,
>
> but I just don't see how.
>
> Say I have the expression (?<=.*?:.*?:).*
> All the engine has to do is search for .*?:.*?:.*, and then in each result,
> find .*?:.*?: and return the string starting at the point just after the
> end of that match.
> No exponential time there, and even that is probably less efficient than
> it has to be.

But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:
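
A rough sketch of what that description implies is shown below. It first checks how Python's re module reacts to the pattern from the question (it rejects the variable-width look-behind, while a variable-width look-ahead compiles), and then emulates the "check every position, then look backwards" behaviour by brute force. The helper names are invented for this example, and the brute-force loop is an illustration of the semantics only, not a claim about how re works internally.

```python
import re

def compiles(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

lookbehind_compiles = compiles(r'(?<=.*?:.*?:).*')  # variable-width look-behind
lookahead_compiles = compiles(r'(?=.*?:.*?:).*')    # variable-width look-ahead

BEHIND = re.compile(r'.*?:.*?:')  # the look-behind part
REST = re.compile(r'.*')          # the part after the look-behind

def first_match(s):
    """Emulate (?<=.*?:.*?:).* by trying every position in turn."""
    for pos in range(len(s) + 1):
        # Look backwards: does some substring ending exactly at pos match?
        if any(BEHIND.fullmatch(s, start, pos) for start in range(pos + 1)):
            return REST.match(s, pos).group(0)
    return None
```

Note that the inner check may try every possible start for every position, which is where the extra work for a variable-width look-behind comes from in this naive version.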

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:

>>> import re
>>> q = re.compile(r'.*?:.*?:(.*)').search
>>> def grab(s):
...     m = q(s)
...     if m:
...         print(m.group(1))
...     else:
...         print('not found!')
...
>>> grab('')
not found!
>>> grab('::::')
::
>>> grab('a:b:yadda')
yadda
>>> grab('a:b:c:d')
c:d
>>> grab('a:b:')

>>>

Jul 18 '05 #2

Steven Bethard

John Machin wrote:

> To grab the text after the 2nd colon (if indeed there are two or more),
> it's much simpler to do this:
>
> >>> import re
> >>> q = re.compile(r'.*?:.*?:(.*)').search
> >>> def grab(s):
> ...     m = q(s)
> ...     if m:
> ...         print(m.group(1))
> ...     else:
> ...         print('not found!')
> ...
> >>> grab('')
> not found!
> >>> grab('::::')
> ::
> >>> grab('a:b:yadda')
> yadda
> >>> grab('a:b:c:d')
> c:d
> >>> grab('a:b:')
>

Or without any regular expressions:

py> def grab(s):
...     try:
...         # maxsplit=2 yields exactly three parts when there are two or more colons
...         first, second, rest = s.split(':', 2)
...         print(rest)
...     except ValueError:
...         print('not found!')
...
py> grab('')
not found!
py> grab('a:b:yadda')
yadda
py> grab('a:b:c:d')
c:d
py> grab('a:b:')

py>

To the OP: what is it you're trying to do? Often there is a much
cleaner way to do it without regular expressions...

Steve

Jul 18 '05 #3

Diez B. Roggisch

> But I don't understand why regex look-behinds (and look-aheads) have to be
> fixed-width patterns.
>
> I'm getting the impression that it's supposed to make searching
> exponentially slower otherwise

That's because of the underlying theory of regular expressions. They are
modelled using so-called finite state machines (FSMs). These are very
limited in the complexity of things they can do, and so are regular
expressions. Explaining that further would require digging deep into the
details of FSMs, grammars and languages - deeper than I'm currently willing
to go :) But I wanted to point out that there is a "real" technical reason
for that, not just a missing feature or an unwillingness to implement one.
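
For readers who have not met the term, here is a tiny hand-written finite state machine, given only to illustrate what such a machine looks like. It accepts the same single-line strings that .*?:.*?:.* matches (anything containing at least two colons). The names are invented for this sketch, and it makes no claim about how Python's re module is implemented.

```python
# States count the colons seen so far, capped at 2; state 2 is accepting.
ACCEPTING = 2

def step(state, ch):
    """Transition function: move one state forward on each of the first two colons."""
    if ch == ':' and state < ACCEPTING:
        return state + 1
    return state

def accepts(s):
    """Run the machine over s and report whether it ends in the accepting state."""
    state = 0
    for ch in s:
        state = step(state, ch)
    return state == ACCEPTING
```

The machine reads each character once and remembers nothing except its current state, which is the limitation the paragraph above refers to.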

--

Regards,

Diez B. Roggisch

Jul 18 '05 #4
