Regex problem

Hello there,

I have the following problem:
I have a large HTML file and I want to remove everything between certain
tags and keep the rest. I would prefer a regex, but any solution
will be great.
The number and type of tags may vary. Here is an example:

<body>
text text text text text text text
text text text
text text text text

<remove1>
text text text text text text
text text
text
text text text
</remove1>

text text text
text text

<remove1>
text text text text
</remove1>

text text
text text
text text text

<remove2>
text text text text text
text text text
text text
</remove2>

text text text text text
text text text text
</body>

Any suggestions will be appreciated!
Thanks.

Mar 24 '07 #1

"Razvan" <de********@gmail.com> wrote in message
news:11**********************@o5g2000hsb.googlegroups.com...
> I have a large HTML file and I want to remove everything between certain
> tags and keep the rest. [...]
> The number and type of tags may vary.

A regex search and replace with <(/?[^\>]+)> and "" leaves just your text text
text etc.

Possibly some flavours may need escaping: \<(/?[^\>]+)\>
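
A minimal Python sketch of what this pattern does, assuming the closing ">" is part of the pattern. Python and the sample string are chosen only for illustration; neither comes from the thread. It shows that the tags are stripped but the text between them is kept.

```python
import re

# The tag-matching pattern with the closing ">" restored; it matches one tag at a time.
TAG = re.compile(r"<(/?[^>]+)>")

# Hypothetical sample: one removable block between two lines that should stay.
sample = "keep\n<remove1>\ndrop\n</remove1>\nkeep too\n"

# Replace every tag with the empty string.
stripped = TAG.sub("", sample)

# Only the markup disappears; "drop" is still in the result.
print(repr(stripped))
```

So this answers "strip all tags", which is a different task from deleting whole blocks together with their contents.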
hth

Alan
Mar 24 '07 #2
On Mar 24, 1:45 pm, "Alan" <a...@spamless.net> wrote:
> regex search and replace with <(/?[^\>]+)> and "" leaves just your text text
> text etc
>
> Possibly some flavours may need escaping: \<(/?[^\>]+)\>

I don't understand what you are trying to say. I want to remove
everything between <removeX> and </removeX>, including the tags.
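
One way to express this requirement, sketched in Python under two assumptions that are not stated in the thread: the removable tags are named "remove" followed by digits, and they are never nested inside each other. The function name strip_blocks and the sample string are hypothetical.

```python
import re

# Assumption: removable tags look like <remove1>, <remove2>, ... and do not nest.
# \1 forces the closing tag to carry the same name as the opening tag.
# DOTALL lets ".*?" cross line breaks; the trailing \s* also eats the
# whitespace that follows the closing tag.
BLOCK = re.compile(r"<(remove\d+)>.*?</\1>\s*", re.DOTALL | re.IGNORECASE)

def strip_blocks(html: str) -> str:
    """Delete each <removeN>...</removeN> block, including both tags."""
    return BLOCK.sub("", html)

sample = (
    "<body>\nkeep 1\n\n"
    "<remove1>\ndrop\n</remove1>\n\n"
    "keep 2\n"
    "<remove2>drop</remove2>\n"
    "keep 3\n</body>"
)
result = strip_blocks(sample)
print(result)
```

The lazy ".*?" stops at the first matching closing tag, so two separate <remove1> blocks are removed one by one and the text between them survives.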

Mar 25 '07 #3

"Razvan" <de********@gmail.com> wrote in message
news:11*********************@n76g2000hsh.googlegroups.com...
> I don't understand what you are trying to say. I want to remove
> everything between <removeX> and </removeX>, including the tags.

Sorry, I didn't read your post carefully enough. As there has been no other
response, perhaps this may help:

Similar to your original:

<body>
text text text text text text text
text text text
text text text text

<remove1>
text text text text text text
text text
text
text text text
</remove1>

text text text
text text

<anotherremove1>
text text text text
</anotherremove1>

text text
text text
text text text

<remove2>
text text text text text
text text text
text text
</remove2>

text text text text text
text text text text
</body>

Processing this with basically:

(?<=<[ra])(.+\s)+|<[ra]

e.g. PHP, processing the file with
$RegStr = '/(?<=<[ra])(.+\s)+|<[ra]/mi';
$OutStr = preg_replace($RegStr,"",$TstStr);
with $TstStr containing the file contents.

will do what (I think!) you want.
It outputs:

<body>
text text text text text text text
text text text
text text text text
text text text
text text
text text
text text
text text text
text text text text text
text text text text
</body>
You will need to define the contents of the [ ] precisely enough to identify
the tags and contents you want to remove. I don't know whether this is the best
(simplest?) way to achieve what you want.
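
For anyone trying the pattern outside PHP, here is a rough Python port with the same case-insensitive and multiline flags. The two sample strings are made up. The sketch suggests that the match runs from the opening tag up to the next blank line, so the result depends on a blank line following each closing tag.

```python
import re

# Port of the PHP pattern above, keeping the /m and /i flags.
PATTERN = re.compile(r"(?<=<[ra])(.+\s)+|<[ra]", re.MULTILINE | re.IGNORECASE)

# Hypothetical inputs: identical except for the blank lines around the block.
with_blank = "keep\n\n<remove1>\ndrop\n</remove1>\n\nkeep too\n"
without_blank = "keep\n<remove1>\ndrop\n</remove1>\nkeep too\n"

# "(.+\s)+" consumes consecutive non-empty lines and stops at a blank line.
spaced = PATTERN.sub("", with_blank)
packed = PATTERN.sub("", without_blank)

print(repr(spaced))  # the block is gone, "keep too" survives
print(repr(packed))  # "keep too" is swallowed together with the block
```

Note also that [ra] matches any tag starting with "r" or "a", so an ordinary <a href=...> link would trigger the same removal.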

If you process the file with a regex search and replace, the regex engine
will need to support positive lookbehind assertions.
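
If lookbehind is not available, a possible alternative is to match each block as a whole and list the tag names explicitly. The Python sketch below is only an illustration under stated assumptions: the helper build_remover and the tag list are hypothetical, blocks are assumed not to be nested, and the engine must support backreferences and a "dot matches newline" mode.

```python
import re

def build_remover(tag_names):
    """Compile a pattern that deletes whole blocks for the listed tag names."""
    # re.escape guards against names containing regex metacharacters.
    names = "|".join(re.escape(name) for name in tag_names)
    # (?:\s[^>]*)? tolerates attributes on the opening tag;
    # \1 pairs the closing tag with the name that opened the block.
    return re.compile(
        rf"<({names})(?:\s[^>]*)?>.*?</\1\s*>",
        re.DOTALL | re.IGNORECASE,
    )

# Hypothetical list of tags to remove, taken from the example in this post.
remover = build_remover(["remove1", "anotherremove1", "remove2"])

sample = (
    "a\n<remove1 id='x'>\ndrop\n</remove1>\n"
    "b\n<anotherremove1>drop</anotherremove1>\n"
    "c\n<keep>stay</keep>\n"
)
cleaned = remover.sub("", sample)
print(cleaned)
```

Tags that are not in the list, such as <keep> here, are left untouched, which makes the list play the role of the [ ] contents in the pattern above.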

hth
Alan
Mar 28 '07 #4
