Any regex pro's out there?

nel
Guest
 
Posts: n/a
#1: Jun 5 '07
I have two tags:
<!--// Remove Begin //--> and <!--// Remove End //-->

I want to use regi_replace() to remove everything between these tags.

The thing is, these tags can be repeated throughout the code.

<!--// Remove Begin //-->(.+)<!--// Remove End //--> works, but only
if the tags exist once. Otherwise, it strips out everything between
the first <!--// Remove Begin //--> and the last <!--// Remove End //-->.

How could I modify this so that it will...

convert: aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->ccc<!--// Remove Begin //-->ddd<!--// Remove End //-->
into: aaaccc


??


shimmyshack
Guest
 
Posts: n/a
#2: Jun 5 '07

re: Any regex pro's out there?


You need to add a rule: "remove .+, but not if .+ contains the end
marker <!--// Remove End //-->".
I am assuming you are doing this inside a web page, where < or
possibly <!-- will be present WITHIN the sections to be removed, so
bbb could be
<!--comment--><b>hello</b>bb
If HTML will not be present, you could simply use a negated character
class that matches anything except <:
[^<]+
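To make the two options above concrete, here is a minimal PHP sketch using the PCRE functions. It assumes the markers appear exactly as written in the question; the variable names are made up for illustration. Variant A is the negated character class, and variant B is one common way to express "not if it contains the end marker", using a negative lookahead before each character.

```php
<?php
// The two markers from the question, escaped for use inside a pattern.
$begin = preg_quote('<!--// Remove Begin //-->', '~');
$end   = preg_quote('<!--// Remove End //-->', '~');

// Variant A: negated character class. Only safe when the removed
// sections never contain "<".
$simple = '~' . $begin . '[^<]*' . $end . '~';

// Variant B: consume one character at a time, but only while the end
// marker does not start at the current position. Copes with HTML inside.
$tempered = '~' . $begin . '(?:(?!' . $end . ').)*' . $end . '~s';

$plain = 'aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->ccc'
       . '<!--// Remove Begin //-->ddd<!--// Remove End //-->';
$html  = 'aaa<!--// Remove Begin //--><!--comment--><b>hello</b>bb'
       . '<!--// Remove End //-->ccc';

$plainA = preg_replace($simple, '', $plain);   // aaaccc
$htmlA  = preg_replace($simple, '', $html);    // unchanged: "<" blocks the match
$htmlB  = preg_replace($tempered, '', $html);  // aaaccc

echo $plainA, "\n", $htmlA, "\n", $htmlB, "\n";
```

Variant A fails silently on the HTML input, which is why it only fits the case where no markup sits between the markers.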

shimmyshack
Guest
 
Posts: n/a
#3: Jun 5 '07

re: Any regex pro's out there?


I should have added: google for the ungreedy U switch. Your matching is
too greedy, and slurps up one giant match rather than many "least"
matches.
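A small sketch of the difference, assuming PHP's preg_replace() and the sample string from the question (the variable names are only for illustration). Note that U inverts greediness for the whole pattern, so a quantifier followed by ? becomes greedy again under U.

```php
<?php
$input = 'aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->ccc'
       . '<!--// Remove Begin //-->ddd<!--// Remove End //-->';

// "#" as delimiter so the slashes in the markers need no escaping.
$greedy   = '#<!--// Remove Begin //-->.+<!--// Remove End //-->#';
$ungreedy = '#<!--// Remove Begin //-->.+<!--// Remove End //-->#U';
// Under U, ".+?" flips back to greedy.
$flipped  = '#<!--// Remove Begin //-->.+?<!--// Remove End //-->#U';

$outGreedy   = preg_replace($greedy, '', $input);   // aaa
$outUngreedy = preg_replace($ungreedy, '', $input); // aaaccc
$outFlipped  = preg_replace($flipped, '', $input);  // aaa

echo $outGreedy, "\n", $outUngreedy, "\n", $outFlipped, "\n";
```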

Rik
Guest
 
Posts: n/a
#4: Jun 5 '07

re: Any regex pro's out there?


Or just make the quantifier lazy by adding ? after it:
preg_replace('|<!--// Remove Begin //-->.*?<!--// Remove End //-->|si','',$string);

--
Rik Wasmus
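For anyone wondering what the s flag in that pattern is for, here is a short sketch assuming a removed section that spans a line break (the sample string is made up). Without s, the dot stops at the newline and nothing is removed.

```php
<?php
$string = "aaa<!--// Remove Begin //-->bbb\nbbb<!--// Remove End //-->ccc";

// s: dot also matches newlines; i: case-insensitive; .*? is lazy.
$withS = preg_replace(
    '|<!--// Remove Begin //-->.*?<!--// Remove End //-->|si', '', $string
);

// Same pattern without s: "." cannot cross the "\n", so no match.
$withoutS = preg_replace(
    '|<!--// Remove Begin //-->.*?<!--// Remove End //-->|i', '', $string
);

echo $withS, "\n";                                      // aaaccc
echo $withoutS === $string ? "unchanged" : "changed", "\n"; // unchanged
```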
Mike P2
Guest
 
Posts: n/a
#5: Jun 6 '07

re: Any regex pro's out there?


Just a side note to nel: if you are going to use shimmyshack's U
modifier, you have to use the PCRE functions (preg_*) instead, as Rik
is doing. And be sure not to copy Rik's exact pattern unless you
switch, because right now you are using PHP's built-in POSIX regex
functions.

At least, I think you are using PHP's built-in regex stuff, assuming
that by regi_replace() you mean eregi_replace().

-Mike PII
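A rough sketch of what the switch could look like, assuming the original call really was eregi_replace(). The POSIX functions have no lazy quantifiers and no U modifier, and they were removed in PHP 7.0, so the PCRE version is the one that still runs today. The function name below is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper; shows the preg_* counterpart of a
// case-insensitive eregi_replace() call.
function remove_marked_sections(string $html): string
{
    // "~" delimiters avoid escaping the slashes in the markers.
    // i = case-insensitive (what the "i" in eregi_replace did),
    // s = dot matches newlines, .*? = lazy quantifier.
    $pattern = '~<!--// Remove Begin //-->.*?<!--// Remove End //-->~is';

    $result = preg_replace($pattern, '', $html);

    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $html;
}

echo remove_marked_sections(
    'aaa<!--// REMOVE BEGIN //-->bbb<!--// remove end //-->ccc'
), "\n"; // aaaccc
```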

nel
Guest
 
Posts: n/a
#6: Jun 6 '07

re: Any regex pro's out there?


Yep thanks, I realized that when I googled "U modifier".

This is what I'm using in case anyone wants to know:

// First replace any line breaks with a token identifier, since "." in
// a preg pattern doesn't match across multiple lines by default
$cleaned_content = str_ireplace("\n", "<!--// New Line //-->", $content);
// This creates our regular expression... perl-stylezzz
$reg = '/<!--\/\/ Remove Begin \/\/-->(.+)<!--\/\/ Remove End \/\/-->/U';
$cleaned_content = preg_replace($reg, "", $cleaned_content);
// Now just put our line breaks back into place
$cleaned_content = str_ireplace("<!--// New Line //-->", "\n", $cleaned_content);

The above code takes my string (which I pulled from an HTML file) and
returns it with all the <!--// Remove Begin //--> and
<!--// Remove End //--> tags, and anything in between them, removed!
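As a possible simplification, the newline token round trip can be dropped by adding the s modifier, which lets the dot match line breaks. This is only a sketch with a made-up $content value; it also avoids the assumption that the token text never occurs in the real content.

```php
<?php
// Made-up sample input with a removed section spanning several lines.
$content = "keep 1\n<!--// Remove Begin //-->\ndrop me\n"
         . "<!--// Remove End //-->\nkeep 2\n";

// U = ungreedy, s = dot matches newlines, so one pass is enough.
$reg = '/<!--\/\/ Remove Begin \/\/-->(.+)<!--\/\/ Remove End \/\/-->/Us';
$cleaned_content = preg_replace($reg, '', $content);

// var_export() output shows the remaining line breaks literally.
var_export($cleaned_content);
```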

The U modifier solved my problem: the greedy pattern was turning
"abxxab" into "" when it is supposed to remove each a...b pair
separately, turning "abxxab" into "xx".
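The "abxxab" case can be checked directly. One detail worth hedging: the sketch below uses .* rather than .+, because .+ needs at least one character between a and b, and in "abxxab" the a and b are adjacent. With .+ the ungreedy match still has to stretch across the whole string.

```php
<?php
$greedy      = preg_replace('/a.*b/', '', 'abxxab');  // '' (one giant match)
$ungreedy    = preg_replace('/a.*b/U', '', 'abxxab'); // 'xx' (two short matches)
$ungreedyOne = preg_replace('/a.+b/U', '', 'abxxab'); // '' (needs 1+ chars inside)

var_export([$greedy, $ungreedy, $ungreedyOne]);
```

With the real markers this difference rarely matters, since there is normally at least one character between the begin and end tags.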

Thanks again!
-nel


