"Razvan" <de********@gmail.comwrote in message
news:11*********************@n76g2000hsh.googlegroups.com...
On Mar 24, 1:45 pm, "Alan" <a...@spamless.net> wrote:
>"Razvan" <defconh...@gmail.com> wrote in message
news:11**********************@o5g2000hsb.googlegroups.com...
Hello there,
I have the following problem:
I have a big HTML file and I want to remove everything between certain
tags and keep the rest, preferably using regex, but any solution
will be great.
The number and type of tags may vary. Here is an example:
<body>
text text text text text text text
text text text
text text text text
<remove1>
text text text text text text
text text
text
text text text
</remove1>
text text text
text text
<remove1>
text text text text
</remove1>
text text
text text
text text text
<remove2>
text text text text text
text text text
text text
</remove2>
text text text text text
text text text text
</body>
Any suggestions will be appreciated!
Thanks.
A regex search and replace with <(/?[^\>]+)> and "" leaves just your text
text
text etc.
Possibly some flavours may need escaping: \<(/?[^\>]+)\>
hth
Alan
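For illustration, a minimal Python sketch (not from the original post; `strip_tags` and `sample` are made-up names) of what this search and replace does. It deletes the tags themselves and leaves all the text between them in place:

```python
import re

# The pattern from the post: "<", an optional "/", one or more
# characters that are not ">", then the closing ">".
TAG = re.compile(r"<(/?[^\>]+)>")

def strip_tags(html):
    """Replace every tag with the empty string; text between tags is kept."""
    return TAG.sub("", html)

# Hypothetical miniature of the sample file.
sample = "<body>\nkeep\n<remove1>\ninside\n</remove1>\nkeep too\n</body>"
stripped = strip_tags(sample)
print(stripped)
```

Note that the word "inside" survives: only the markup is gone, not the content of the <remove1> block.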
I don't understand what you are trying to say. I want to remove
everything between <removeX> and </removeX>, including the tags.
Sorry, I didn't read your post carefully enough. As there has been no other
response, perhaps this may help:
Similar to your original:
<body>
text text text text text text text
text text text
text text text text
<remove1>
text text text text text text
text text
text
text text text
</remove1>
text text text
text text
<anotherremove1>
text text text text
</anotherremove1>
text text
text text
text text text
<remove2>
text text text text text
text text text
text text
</remove2>
text text text text text
text text text text
</body>
Processing this with basically:
(?<=<[ra])(.+\s)+|<[ra]
e.g. in PHP, processing the file with
$RegStr = '/(?<=<[ra])(.+\s)+|<[ra]/mi';
$OutStr = preg_replace($RegStr,"",$TstStr);
with $TstStr containing the file contents,
will do what (I think!) you want.
Outputs
<body>
text text text text text text text
text text text
text text text text
text text text
text text
text text
text text
text text text
text text text text text
text text text text
</body>
You will need to define the contents of the [ ] precisely enough to identify the
tags and contents you want to remove. I don't know whether this is the best
(simplest?) way to achieve what you want.
If you process the file with a regex search and replace, the regex engine will
need to support positive lookbehind assertions.
hth
Alan
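An alternative that avoids lookbehind, sketched in Python rather than PHP and not taken from the posts above: match each opening tag, everything up to the closing tag of the same name, and the closing tag itself, using a backreference. The tag-name pattern `(?:another)?remove\d+` and the names `remove_blocks` and `sample` are assumptions based on the sample file, and the sketch assumes blocks of the same name are not nested. As far as I can tell, `(.+\s)+` in the lookbehind version only stops at a blank line or at the end of the input, so that version depends on how the file is spaced; the version below does not.

```python
import re

def remove_blocks(html, tag_pattern=r"(?:another)?remove\d+"):
    """Delete every <tag>...</tag> block whose name matches tag_pattern.

    tag_pattern is a hypothetical default taken from the sample file.
    Assumes blocks of the same name are not nested.
    """
    block = re.compile(
        r"<(" + tag_pattern + r")>"  # opening tag, name captured as group 1
        r".*?"                        # shortest possible run of any text
        r"</\1>"                      # closing tag with the same name
        r"[ \t]*\n?",                 # trailing blanks and one line break
        re.DOTALL | re.IGNORECASE,    # "." spans lines; names match any case
    )
    return block.sub("", html)

# Hypothetical miniature of the sample file.
sample = (
    "<body>\nkeep 1\n<remove1>\ndrop\nme\n</remove1>\nkeep 2\n"
    "<anotherremove1>\ndrop\n</anotherremove1>\nkeep 3\n"
    "<remove2>\ndrop\n</remove2>\nkeep 4\n</body>"
)
cleaned = remove_blocks(sample)
print(cleaned)
```

Pass a different `tag_pattern` to match the real tag names in your file. For anything beyond simple, non-nested blocks, a proper HTML parser is likely a safer choice than a regex.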