regex: how to loop through individual matches

I have some VB.NET code that runs a regex, matches groups, and replaces
them. I'm trying to come up with a simple script that will strip all
attributes from all HTML tags.

This is what I have:

=============================================================

Function stripAllAttributes(ByVal textToParse As String, ByVal tagToFind As String) As String
    Dim s As String
    Dim r2 As New Regex( _
        "(?<theTag>(<" & tagToFind & "))" & _
        "(?<everythingUpToEndTag>(([^/>].|\n)*))", _
        RegexOptions.IgnoreCase)
    Dim m2 As Match = r2.Match(textToParse)
    Dim strTheTag As String = m2.Groups("theTag").Value
    s = r2.Replace(textToParse, strTheTag)
    Return s
End Function

=============================================================

This works, but as you can see, I have to pass each tag I want to strip
attributes from separately. The reason is that if I just use a regex like
this to grab the opening part of any tag:

(<)([^/>\s\n])*

it WILL grab the opening part of the first tag it sees, but it then uses
that first matched text to replace ALL the matches it finds in the rest of
the text it is parsing. I imagine this is due more to my VB code than to the regex.

For example, if my markup is this:

<table width="100">
<tr width="100">
<td width="100">

And if I run the function (using the generic 'find all tags' regex) against
that, I get this returned:

<table>
<table>
<table>

When I want this:

<table>
<tr>
<td>
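A minimal sketch of why the output repeats `<table>`, assuming a generic opening-tag pattern `<(\w+)[^>]*>` (an illustrative assumption, not the exact regex from the post) and hypothetical function names. `StripLikeOriginal` mirrors the original approach by building one fixed replacement string from the first match; `StripWithSubstitution` passes `"<$1>"` instead, which `Regex.Replace` expands separately for every match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Sketch only: the pattern and function names are illustrative assumptions.
Public Module OnePassStrip
    ' Group 1 captures the tag name; [^>]* swallows any attributes up to the closing ">".
    ' Closing tags such as </td> are not matched because "/" is not a word character.
    Private ReadOnly OpeningTag As New Regex("<(\w+)[^>]*>", RegexOptions.IgnoreCase)

    ' Reproduces the problem: the replacement is a fixed string built from the
    ' FIRST match, so every later match is overwritten with that same text.
    Public Function StripLikeOriginal(ByVal html As String) As String
        Dim first As Match = OpeningTag.Match(html)
        If Not first.Success Then Return html
        Dim fixedText As String = "<" & first.Groups(1).Value & ">"
        Return OpeningTag.Replace(html, fixedText)
    End Function

    ' Fix: "$1" is a substitution that Replace evaluates for each match,
    ' so every tag keeps its own name.
    Public Function StripWithSubstitution(ByVal html As String) As String
        Return OpeningTag.Replace(html, "<$1>")
    End Function
End Module
```

Run against the three-line sample above, the first function returns `<table>` three times and the second returns `<table>`, `<tr>`, `<td>`. A pattern like this is not an HTML parser; for example, a quoted attribute value containing `>` will break it.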

Off the top of my head, I can only think of doing it this way:

Function: find the first HTML tag to strip (i.e., find the first tag that has
at least one attribute)
    if there's a match
        pass it on to my current function (shown above) to replace all
        instances of that tag,
        then recursively call this same function so that it finds the next tag
    else
        assume it has stripped all attributes from all tags
    end if
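The recursive plan above could be translated roughly as follows. This is a sketch under stated assumptions: a tag "with at least one attribute" is taken to be `<name` followed by whitespace, the hypothetical `StripTag` stands in for the single-tag function from the post (rewritten with a simpler `[^>]*` pattern), and all names are illustrative. Each call strips every instance of one tag name, so the number of tags that still have attributes shrinks on every pass and the recursion ends.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical translation of the recursive plan; names and patterns are assumptions.
Public Module RecursiveStrip
    ' "A tag with at least one attribute": a name, whitespace, then anything up to ">".
    Private ReadOnly TagWithAttributes As New Regex("<(\w+)\s+[^>]*>")

    ' Stand-in for the single-tag function: strips attributes from every
    ' instance of one tag name. \b stops "<tr" from also matching "<track".
    Public Function StripTag(ByVal html As String, ByVal tagName As String) As String
        Dim r As New Regex("<" & Regex.Escape(tagName) & "\b[^>]*>", RegexOptions.IgnoreCase)
        Return r.Replace(html, "<" & tagName & ">")
    End Function

    Public Function StripAllRecursive(ByVal html As String) As String
        Dim m As Match = TagWithAttributes.Match(html)
        If Not m.Success Then
            ' No tag with attributes is left, so we are done.
            Return html
        End If
        ' Strip every instance of the tag we found, then look for the next one.
        Return StripAllRecursive(StripTag(html, m.Groups(1).Value))
    End Function
End Module
```

Each pass rescans the whole string, so this does one full replace per distinct tag name; that cost is what the single-pass approaches avoid.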

Or is there a way in my original script to do the same without the recursive
part?
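On the closing question: one way to avoid the recursion is to give `Regex.Replace` a `MatchEvaluator` delegate, which .NET calls once for each individual match and whose return value replaces that match. The sketch assumes the same illustrative `<(\w+)[^>]*>` pattern; `KeepTagName`, `StripAllAttributes` and `ListTagNames` are hypothetical names. The `For Each` over `Matches` covers the case where you want to loop through the matches yourself rather than replace them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Sketch assuming a generic opening-tag pattern; names are illustrative.
Public Module EvaluatorStrip
    Private ReadOnly OpeningTag As New Regex("<(\w+)[^>]*>", RegexOptions.IgnoreCase)

    ' Called by Regex.Replace once per match; the returned string replaces that match.
    Private Function KeepTagName(ByVal m As Match) As String
        Return "<" & m.Groups(1).Value & ">"
    End Function

    ' Single pass over the input, no recursion.
    Public Function StripAllAttributes(ByVal html As String) As String
        Return OpeningTag.Replace(html, New MatchEvaluator(AddressOf KeepTagName))
    End Function

    ' Looping through the individual matches explicitly, e.g. to inspect them.
    Public Function ListTagNames(ByVal html As String) As List(Of String)
        Dim names As New List(Of String)()
        For Each m As Match In OpeningTag.Matches(html)
            names.Add(m.Groups(1).Value)
        Next
        Return names
    End Function
End Module
```

For a case this simple, a substitution string such as `"<$1>"` does the same job; an evaluator earns its keep once the replacement needs real logic, such as deciding per tag what to keep.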

-Darrel
Nov 19 '05 #1
I'd try something like the following:
Function stripAllAttributes(ByVal textToParse As String, ByVal tagToFind As String) As String
    Dim s As String
    ' [^>]* stops at the first ">", which the trailing ">" then consumes
    Dim r2 As New Regex( _
        "(?<theTag><" & tagToFind & ")" & _
        "(?<everythingUpToEndTag>[^>]*)>", _
        RegexOptions.IgnoreCase)
    ' "$1" is re-evaluated for every match, so each tag keeps its own name
    s = r2.Replace(textToParse, "$1>")
    Return s
End Function

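That uses a reference to the first capture group ($1) in the replacement
string, so each match is replaced with its own captured tag rather than with
the text of the first match. For more info on backreferences, check out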
http://www.devarticles.com/c/a/VB.Ne...ons-in-.NET/1/
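One detail worth knowing about `$1`: in .NET, unnamed groups are numbered first (left to right) and named groups afterwards. In the pattern from the original function, `theTag` is therefore group 4, and `$1` refers to the unnamed `(<td)` group nested inside it, which happens to hold the same text. The sketch checks that numbering with `Regex.GroupNumberFromName` (using `td` as an example value for `tagToFind`) and shows a generic variant that substitutes the named group with `${theTag}`, so it no longer depends on group numbers. The simplified pattern and the function names are assumptions for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Sketch: substitute a named group instead of relying on "$1".
' Function names and the simplified pattern are illustrative assumptions.
Public Module NamedGroupStrip
    ' The pattern from the original function with tagToFind = "td",
    ' used only to inspect how .NET numbers its groups.
    Public Function GroupNumberOfTheTag() As Integer
        Dim r As New Regex("(?<theTag>(<td))(?<everythingUpToEndTag>(([^/>].|\n)*))")
        ' Unnamed groups get 1, 2, 3; the named groups follow as 4 and 5.
        Return r.GroupNumberFromName("theTag")
    End Function

    ' Generic version: ${theTag} always means the named group, whatever its number.
    Public Function StripAllAttributes(ByVal html As String) As String
        Dim r As New Regex("(?<theTag><\w+)[^>]*>", RegexOptions.IgnoreCase)
        Return r.Replace(html, "${theTag}>")
    End Function
End Module
```

Alternatively, `RegexOptions.ExplicitCapture` stops unnamed parentheses from capturing at all, which keeps the numbering limited to the named groups.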

Blair
Nov 19 '05 #2
