Regex Nested Backreferences

For my web-based php regex find/replace do-hickey, I need to match
individual back references and wrap a tag around them so they'll be unique
to the rest of the match for individual color markup. Initially this
would seem easy enough, however not all of a potential regex match is
going to be within a back reference. So it's necessary to replace the
back reference, and only the back reference, while preserving the context
of the match. For example, if I were to search the text

fish this fish fish

looking for
.*?(?<=this )(fish).*

I'd match everything, capturing the second instance of fish into the back
reference. I can't simply take the match and run a replace for fish in
order to apply the highlighting, because then I'd end up with 3
highlighted "fish", 2 of which weren't supposed to be. I also couldn't
simply return the back reference with the markup, as that wouldn't return
the non-back referenced stuff.
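
The difference between the two approaches can be seen in a few lines. This is only a sketch of the situation described above, assuming PHP 7.1 or later; the variable names are made up for illustration.

```php
<?php
$text    = 'fish this fish fish';
$pattern = '/.*?(?<=this )(fish).*/';

// PREG_OFFSET_CAPTURE turns every entry into [matched text, byte offset].
preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE);
[$whole, $wholeOffset] = $m[0]; // the entire match
[$group, $groupOffset] = $m[1]; // back reference 1

// A blind replace cannot tell the captured "fish" from the other two.
$naive = str_replace('fish', '[fish]', $whole, $count);

echo "group '$group' starts at byte $groupOffset\n";
echo "str_replace touched $count occurrences: $naive\n";
```

The offset (10 here) is what identifies the captured "fish"; str_replace reports 3 replacements because it only sees the text.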

My initial solution was to run the original find text over the match to
get the back references, using an extra flag to have it return the offset
of each back reference. So now I have the location of the text within the
string, and can get the length of it from that point from the string
itself. Going backwards so as not to mess with the numeric location within
the string, it captures back references without losing context or
data. Perfect.
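
A minimal sketch of that back-to-front replacement, assuming PHP 7.1 or later and groups that do not nest. The function name markGroups is hypothetical; the back$n:: and ::bk markers are the pseudo-markup used in the function further down.

```php
<?php
// Wrap each capture group of the first match in pseudo-markup.
// Returns null when the pattern does not match.
function markGroups(string $pattern, string $text): ?string
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    [$match, $base] = $m[0]; // whole match and where it starts in $text

    // Last group first: text inserted near the end of the string
    // does not shift the offsets of groups that start earlier.
    for ($n = count($m) - 1; $n > 0; $n--) {
        [$group, $offset] = $m[$n];
        if ($offset < 0) {
            continue; // this group took no part in the match
        }
        $match = substr_replace(
            $match,
            "back{$n}::{$group}::bk",
            $offset - $base, // offsets are relative to $text, not $match
            strlen($group)
        );
    }
    return $match;
}

echo markGroups('/(fish) this (fish)/', 'fish this fish fish'), "\n";
```

This only holds while a higher-numbered group always starts after a lower-numbered one ends, which is the case for groups that sit side by side.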

. . . until back references are nested.

In this example:
(.*?(?<=this )(fish).*)

back reference 1 would be fish this fish fish, back reference 2 would be
fish -- here's where the problem surfaces.

If I wrap back reference 2 in the markup, when I apply back reference 1's
markup it's going to apply the end tag in the wrong place since the string
has increased and the original length calculated no longer applies. If I
replace back reference 1 first, same problem. I'm sure there's some
obvious, simple solution I'm overlooking having exhausted a bunch of
complex attempts to compensate for it. Any fresh perspectives on the best
way to markup nested groups while preserving the integrity of the return?
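
One possible way around the shifting lengths, sketched here as an assumption and not as the approach the linked tool ended up using: never edit the string in place. Record where each group opens and closes in the original match, then rebuild the string in a single pass, emitting markers as each position is reached. The function name markNestedGroups is hypothetical, PHP 7.1 or later is assumed, and groups are assumed to be either nested or disjoint.

```php
<?php
function markNestedGroups(string $pattern, string $text): ?string
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    [$match, $base] = $m[0];
    $len = strlen($match);
    $opens = [];  // position in $match => opening markers, outer group first
    $closes = []; // position in $match => closing markers, inner group first

    for ($n = 1; $n < count($m); $n++) {
        [$group, $offset] = $m[$n];
        $start = $offset - $base;
        $end = $start + strlen($group);
        // Skip unmatched or empty groups, and groups captured outside
        // the match itself (for example inside a lookbehind).
        if ($offset < 0 || $group === '' || $start < 0 || $end > $len) {
            continue;
        }
        $opens[$start][] = "back{$n}::";
        // Prepend, so a later (inner) group closes before an outer one.
        $closes[$end] = array_merge(['::bk'], $closes[$end] ?? []);
    }

    $out = '';
    for ($p = 0; $p <= $len; $p++) {
        // Close groups ending here before opening groups starting here.
        $out .= implode('', $closes[$p] ?? []);
        $out .= implode('', $opens[$p] ?? []);
        if ($p < $len) {
            $out .= $match[$p];
        }
    }
    return $out;
}

echo markNestedGroups('/(.*?(?<=this )(fish).*)/', 'fish this fish fish'), "\n";
```

Because every offset is read against the untouched match, the order in which groups are processed no longer matters. The loop copies bytes, which matches the byte offsets preg_match reports.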

Below is the function the matches are being passed through. You'll see I'm
using preg_match_all to get the capture groups as well as the match
location, and then using substr_replace to insert the pseudo-markup.

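function hltr($text, $find) {
    // Capture every group along with its byte offset in $text.
    preg_match_all($find, $text, $hlight, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
    if ( isset($_POST['debug']) || isset($_GET['debug']) ) {
        echo "<pre>";
        print_r($hlight);
        echo "</pre>";
    }
    // With PREG_SET_ORDER, $hlight[0] is the first match: index 0 is the
    // whole match and indexes 1..n are the capture groups.
    $n = count($hlight[0]) - 1;
    $text = $hlight[0][0][0];
    // Work from the last group back to the first so earlier offsets stay valid.
    while ( $n > 0 ) {
        $text = substr_replace($text, "back$n::".$hlight[0][$n][0]."::bk", $hlight[0][$n][1], strlen($hlight[0][$n][0]));
        $n--;
    }
    return('<strong class="result">'.$text.'</strong>');
}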

To see it highlight backreferences correctly:
http://tinyurl.com/aongu
And failing on nested groups:
http://tinyurl.com/7jp8c

Thanks . . .

Allen
Feb 6 '06 #1
4 Replies


On Mon, 06 Feb 2006 20:20:58 -0500, bobzimuta <ej******@gmail.com> wrote:
http://roblocher.com/technotes/regexp.aspx


I don't believe you read my message, Bob -- I'm not asking for help with
regex, I know regex. My problem is that I'm trying to take regex and
highlight various aspects of the syntax, in this case the different sub
groups. Had you read the post, you'd have seen that the links to what I'm
working on can do everything and more than what you linked to. Thanks
anyway.

Allen
Feb 7 '06 #3

I skimmed. I saw you wanted to do some highlighting of regex matches.
This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
possibly get something useful out of it (i.e. analyze his algorithm).
You're welcome anyway.

Feb 8 '06 #4

On Tue, 07 Feb 2006 19:50:12 -0500, bobzimuta <ej******@gmail.com> wrote:
I skimmed. I saw you wanted to do some highlighting of regex matches.
This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
possibly get something useful out of it (i.e. analyze his algorithm).
You're welcome anyway.


I'd have appreciated that explanation -- at any rate, I'm sorry for my
curt response, I'd spent too many hours with code to be any good with
people. I did put together a solution. The working model is linked
below. I might have to check out his source to see if there's anything I
can glean from it anyway. Thanks.

A.

--
http://ReReplace.com
A Web based regular expressions powered find/replace utility
Feb 8 '06 #5
