Regex Nested Backreferences

For my web-based php regex find/replace do-hickey, I need to match
individual back references and wrap a tag around them so they'll be unique
to the rest of the match for individual color markup. Initially this
would seem easy enough, however not all of a potential regex match is
going to be within a back reference. So it's necessary to replace the
back reference, and only the back reference, while preserving the context
of the match. For example, if I were to search the text

fish this fish fish

looking for
.*?(?<=this )(fish).*

I'd match everything, capturing the second instance of fish into the back
reference. I can't simply take the match and run a replace for fish in
order to apply the highlighting, because then I'd end up with 3
highlighted "fish", 2 of which weren't supposed to be. I also couldn't
simply return the back reference with the markup, as that wouldn't return
the non-back referenced stuff.
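To make the over-highlighting concrete, here is a minimal sketch (not part of the original tool) of the naive approach described above. The `back1::` / `::bk` markers are borrowed from the function further down; the variable names are only for illustration.

```php
<?php
$text = 'fish this fish fish';

// Only the "fish" that follows "this " is captured into group 1.
preg_match('/.*?(?<=this )(fish).*/', $text, $m);

// Naive markup: replace every occurrence of the captured text in the match.
$naive = str_replace($m[1], "back1::{$m[1]}::bk", $m[0]);

echo $naive, "\n";
// back1::fish::bk this back1::fish::bk back1::fish::bk
```

All three "fish" get marked, because str_replace works on the text of the capture and knows nothing about where in the match it was captured.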

My initial solution was to run the original find text over the match to
get the back references, using an extra flag to have it return the offset
of each back reference. So now I have the location of the text within the
string, and can get the length of it from that point from the string
itself. Going backwards so as not to disturb the numeric locations within
the string, it marks up back references without losing context or
data. Perfect.

. . . until back references are nested.

In this example:
(.*?(?<=this )(fish).*)

back reference 1 would be fish this fish fish, back reference 2 would be
fish -- here's where the problem surfaces.
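As an illustration only, the offsets and lengths that PREG_OFFSET_CAPTURE reports for this nested pattern can be listed with a small helper. `groupSpans()` is a hypothetical name introduced here, not something from the original tool.

```php
<?php
// Illustration only; groupSpans() is a hypothetical helper name.
function groupSpans(string $pattern, string $text): array
{
    $spans = [];
    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m as $n => [$str, $pos]) {
            $spans[$n] = [$pos, strlen($str)];   // [byte offset, length]
        }
    }
    return $spans;
}

foreach (groupSpans('/(.*?(?<=this )(fish).*)/', 'fish this fish fish') as $n => [$pos, $len]) {
    echo "group $n: offset $pos, length $len\n";
}
// group 0: offset 0, length 19
// group 1: offset 0, length 19
// group 2: offset 10, length 4
```

Group 2 sits entirely inside group 1. Wrapping group 2 first lengthens the string by 11 characters (7 for `back2::`, 4 for `::bk`), so group 1's recorded length of 19 no longer reaches the end of the text.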

If I wrap back reference 2 in the markup, when I apply back reference 1's
markup it's going to apply the end tag in the wrong place since the string
has increased and the original length calculated no longer applies. If I
replace back reference 1 first, same problem. I'm sure there's some
obvious, simple solution I'm overlooking having exhausted a bunch of
complex attempts to compensate for it. Any fresh perspectives on the best
way to markup nested groups while preserving the integrity of the return?
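One possible way around the shifting offsets, sketched here under stated assumptions and not claimed to be the solution eventually used in the tool: never edit the string in place. Instead, turn each group into an "open" and a "close" event at its original offsets, sort the events, and build the output left to right from the untouched match. `markNestedGroups()` is a hypothetical helper, and it assumes capture groups are either nested or disjoint (no partial overlaps, which lookaheads can produce).

```php
<?php
// Sketch only: markNestedGroups() is a hypothetical helper name.
// Assumes capture groups are nested or disjoint, never partially overlapping.
function markNestedGroups(string $pattern, string $text): ?string
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;                 // no match (or a pattern error)
    }

    $match  = $m[0][0];
    $base   = $m[0][1];              // offset of the whole match within $text
    $events = [];                    // each: [position, kind (0 = close, 1 = open), group]

    for ($n = 1; $n < count($m); $n++) {
        [$str, $pos] = $m[$n];
        if ($pos < 0 || $str === '') {
            continue;                // group did not participate, or captured nothing
        }
        $start    = $pos - $base;    // make the offset relative to the match
        $events[] = [$start, 1, $n];
        $events[] = [$start + strlen($str), 0, $n];
    }

    // Left to right; at one position closes come before opens,
    // inner (higher-numbered) groups close first, outer groups open first.
    usort($events, function (array $a, array $b): int {
        if ($a[0] !== $b[0]) {
            return $a[0] <=> $b[0];
        }
        if ($a[1] !== $b[1]) {
            return $a[1] <=> $b[1];
        }
        return ($a[1] === 0) ? ($b[2] <=> $a[2]) : ($a[2] <=> $b[2]);
    });

    $out    = '';
    $cursor = 0;
    foreach ($events as [$pos, $kind, $n]) {
        $out   .= substr($match, $cursor, $pos - $cursor);  // untouched text
        $out   .= ($kind === 1) ? "back$n::" : '::bk';      // pseudo-markup
        $cursor = $pos;
    }
    return $out . substr($match, $cursor);
}

echo markNestedGroups('/(.*?(?<=this )(fish).*)/', 'fish this fish fish'), "\n";
// back1::fish this back2::fish::bk fish::bk
```

Because every position refers to the original match and the output is assembled by concatenation, inserting one tag never invalidates the offset of another.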

Below is the function the matches are being passed through. You'll see I'm
using preg_match_all to get the capture groups as well as the match
location, and then using substr_replace to insert the pseudo-markup.

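function hltr($text, $find) {
    // Capture every group together with its byte offset in $text.
    preg_match_all($find, $text, $hlight, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
    if ( isset($_POST['debug']) || isset($_GET['debug']) ) {
        echo "<pre>";
        print_r($hlight);
        echo "</pre>";
    }
    // $hlight[0] is the first match: index 0 is the whole match, 1..n the groups.
    $n = count($hlight[0]) - 1;
    $text = $hlight[0][0][0];
    // Work from the last group back to the first; the intent is that earlier
    // offsets stay valid. Nested groups break this.
    while ( $n > 0 ) {
        $text = substr_replace($text, "back$n::".$hlight[0][$n][0]."::bk", $hlight[0][$n][1], strlen($hlight[0][$n][0]));
        $n--;
    }
    return('<strong class="result">'.$text.'</strong>');
}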

To see it highlight backreferences correctly:
http://tinyurl.com/aongu
And failing on nested groups
http://tinyurl.com/7jp8c

Thanks . . .

Allen
Feb 6 '06 #1
On Mon, 06 Feb 2006 20:20:58 -0500, bobzimuta <ej******@gmail.com> wrote:
http://roblocher.com/technotes/regexp.aspx


I don't believe you read my message, Bob -- I'm not asking for help with
regex, I know regex. My problem is that I'm trying to take regex and
highlight various aspects of the syntax, in this case the different sub
groups. Had you read the post, you'd have seen that the links to what I'm
working on can do everything and more than what you linked to. Thanks
anyway.

Allen
Feb 7 '06 #3
I skimmed. I saw you wanted to do some highlighting of regex matches.
This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
possibly get something useful out of it (i.e. analyze his algorithm).
You're welcome anyway.

Feb 8 '06 #4
On Tue, 07 Feb 2006 19:50:12 -0500, bobzimuta <ej******@gmail.com> wrote:
I skimmed. I saw you wanted to do some highlighting of regex matches.
This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
possibly get something useful out of it (i.e. analyze his algorithm).
You're welcome anyway.


I'd have appreciated that explanation -- at any rate, I'm sorry for my
curt response, I'd spent too many hours with code to be any good with
people. I did put together a solution; the working model is linked
below. I might have to check out his source to see if there's anything I
can glean from it anyway. Thanks.

A.

--
http://ReReplace.com
A Web based regular expressions powered find/replace utility
Feb 8 '06 #5
