regular expression help needed

Hi

I have a regex question. I want to find all content of a <td
class="someclass"> tag. This means the expression should include all other
tags included between <td class="someclass"> and </td>.

Please help

Regards

Henrik
Dec 7 '05 #1
Hi Henrik

I guess something like /^<[^>]+>(.*)<[^>]+>$/ should do the trick,
although I'm not really sure about that greedy matching... (i.e. will
$1 include *everything* after the first tag, or only up to the closing
tag?)
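
To make the greedy-matching question concrete, here is a minimal C# sketch. The class name, method name, and sample inputs are made up for illustration. Because the pattern is anchored with ^ and $, the greedy (.*) runs up to the last tag in the string, so $1 holds everything between the first and the last tag, inner tags included.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Returns group 1 of the anchored pattern, or null when nothing matches.
    public static string Inner(string input)
    {
        Match m = Regex.Match(input, @"^<[^>]+>(.*)<[^>]+>$");
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Hypothetical input: a single cell that contains an inner tag.
        Console.WriteLine(Inner("<td class=\"someclass\">a<b>c</b></td>"));
        // Hypothetical input: two cells in a row; the capture spans both.
        Console.WriteLine(Inner("<td>a</td><td>b</td>"));
    }
}
```

So the pattern is only safe when the input string is exactly one element.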
hth,
Markus

Dec 7 '05 #2
It depends on what you are really looking for.

If your document looks like this: <tr><td class="myClass">Some
text</td></tr>
then regex=new Regex(@"<td\s+class=""myClass""\s*>(.*?)</td>");
gives you "Some text".

If your document looks like this: <tr><td class="myClass"> <table><tr><td
class="anotherClass">Other text</td></tr></table> </td></tr>
then the same regex=new Regex(@"<td\s+class=""myClass""\s*>(.*?)</td>");
gives you " <table><tr><td class="anotherClass">Other text",
which is not what you are looking for, because the lazy (.*?) stops at the
first </td>.

If you remove the question mark from the regex, it works for the two
previous examples but not for the following one.
If your document looks like this: <tr><td class="myClass">Some
text</td></tr><tr><td class="anotherClass">Other text</td></tr>
the greedy version gives you "Some text</td></tr><tr><td class="anotherClass">Other
text", because (.*) runs on to the last </td>.
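
The three cases above can be checked with a short C# sketch. The class and member names are made up for illustration, and the sketch assumes \s*> before the closing bracket so that a tag written without a trailing space still matches.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyVsGreedy
{
    // Assumption: \s*> rather than \s+>, so <td class="myClass"> matches without a space before '>'.
    public const string Lazy = @"<td\s+class=""myClass""\s*>(.*?)</td>";
    public const string Greedy = @"<td\s+class=""myClass""\s*>(.*)</td>";

    // Returns the first capture group, or an empty string when there is no match.
    public static string Capture(string pattern, string html)
    {
        return Regex.Match(html, pattern).Groups[1].Value;
    }

    static void Main()
    {
        string simple = @"<tr><td class=""myClass"">Some text</td></tr>";
        string nested = @"<tr><td class=""myClass""> <table><tr><td class=""anotherClass"">Other text</td></tr></table> </td></tr>";
        string siblings = @"<tr><td class=""myClass"">Some text</td></tr><tr><td class=""anotherClass"">Other text</td></tr>";

        foreach (string html in new[] { simple, nested, siblings })
        {
            Console.WriteLine("lazy:   [" + Capture(Lazy, html) + "]");
            Console.WriteLine("greedy: [" + Capture(Greedy, html) + "]");
        }
    }
}
```

Neither quantifier handles both the nested and the sibling document, which is why nesting needs a different approach.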

My question is: do you have nested <td> tags? In that case, you have to
use balanced matching (balancing groups in .NET regular expressions).
Could you give us the most complicated example of a file you have?

Ludovic SOEUR.


Dec 7 '05 #3
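I worked out the case where you can have nested tags, using balanced matching:

// Requires: using System.Text.RegularExpressions; using System.Windows.Forms;
public void showTDContent(string content) {
    // <Open> pushes every nested <td ...>, <Close-Open> pops it at </td>,
    // and (?(Open)(?!)) rejects the match if a nested <td> is left unclosed.
    // The pattern must stay on one line; add RegexOptions.Singleline if a
    // cell can contain line breaks.
    Regex regex=new Regex(@"<td class=\""someclass\"">(?<tdcontent>.*?((?=<td)|(?=</td))(((?<Open><td.*?>).*?((?=<td)|(?=</td)))+((?<Close-Open></td>).*?((?=<td)|(?=</td)))+)*(?(Open)(?!)))</td>");
    MatchCollection matches=regex.Matches(content);
    foreach(Match match in matches) {
        string sMatch=match.Groups["tdcontent"].Value;
        MessageBox.Show(sMatch);
        showTDContent(sMatch); // Recurse to find matching cells nested inside this one
    }
}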

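Here is an example (the input is split into concatenated pieces only to keep
the lines short; it contains no line breaks):
showTDContent(
    @"<table><tr><td class=""someclass"">" +
    @"<table><tr><td>someText</td><td class=""someclass"">otherText</td></tr></table>" +
    @"</td></tr><tr><td class=""someclass"">thirdText</td></tr></table>");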

It shows three strings:
1) <table><tr><td>someText</td><td
class="someclass">otherText</td></tr></table>
2) otherText
3) thirdText

If you don't have nested tags as in the example above, you can keep the SAME
regular expression, but you don't need the recursion:
public void showTDContent(string content) {
    // Same pattern as in the recursive version, kept on a single line.
    Regex regex=new Regex(@"<td class=\""someclass\"">(?<tdcontent>.*?((?=<td)|(?=</td))(((?<Open><td.*?>).*?((?=<td)|(?=</td)))+((?<Close-Open></td>).*?((?=<td)|(?=</td)))+)*(?(Open)(?!)))</td>");
    MatchCollection matches=regex.Matches(content);
    foreach(Match match in matches) {
        MessageBox.Show(match.Groups["tdcontent"].Value);
    }
}

For an explanation of the regular expression, have a look at
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx. It explains
how balanced matching works:
<
[^<>]*
(
  (
    (?<Open><)
    [^<>]*
  )+
  (
    (?<Close-Open>>)
    [^<>]*
  )+
)*
(?(Open)(?!))

My regex is nearly the same:
<td\s+class=\"someclass\">(?<tdcontent>
  .*?((?=<td)|(?=</td))
  (
    (
      (?<Open><td.*?>)
      .*?((?=<td)|(?=</td))
    )+
    (
      (?<Close-Open></td>)
      .*?((?=<td)|(?=</td))
    )+
  )*
  (?(Open)(?!))
)</td>

In fact, [^<>]* is replaced by .*?((?=<td)|(?=</td)), which lazily skips
ahead to the next opening or closing TD tag, and < and > are replaced by
<td.*?> and </td>. Everything else is exactly the same.
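
As a rough illustration of the balancing-group mechanics, here is a small C# sketch that checks whether angle brackets are balanced. The class and method names are made up, and the ^ and $ anchors are my assumption so that the whole string is tested rather than a substring.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedDemo
{
    // (?<Open><) pushes a capture for every '<'.
    // (?<Close-Open>>) pops one Open capture for every '>' and fails if the stack is empty.
    // (?(Open)(?!)) forces a failure if any '<' is still unmatched at the end.
    static readonly Regex Balanced = new Regex(
        @"^[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))$");

    public static bool IsBalanced(string text)
    {
        return Balanced.IsMatch(text);
    }

    static void Main()
    {
        Console.WriteLine(IsBalanced("<a<b>c>")); // every '<' has its '>'
        Console.WriteLine(IsBalanced("<a<b>c"));  // one '<' is never closed
        Console.WriteLine(IsBalanced("a>b"));     // '>' without a preceding '<'
    }
}
```

The TD version works the same way, with whole <td ...> and </td> tags taking the place of the single bracket characters.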
Hope this helps,

Ludovic SOEUR.
Dec 7 '05 #4
Hi you Guys

Thank you for your help!

I solved it with <td[\ \s]class="myClass">([\s\S]*?)</td>. I do not have
nested tables, so this did the trick. Very close to some of your
suggestions.
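
A minimal C# sketch of why [\s\S]*? is handy here, assuming the cell content can span several lines (the sample input and the names are made up): [\s\S] matches any character including a newline, while . only does so when RegexOptions.Singleline is set.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineCell
{
    public const string AnyChar = @"<td[\ \s]class=""myClass"">([\s\S]*?)</td>";
    public const string Dot = @"<td[\ \s]class=""myClass"">(.*?)</td>";

    // Returns the captured cell content, or null when the pattern does not match.
    public static string Cell(string html, string pattern, RegexOptions options)
    {
        Match m = Regex.Match(html, pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Hypothetical cell whose content contains a line break.
        string html = "<td class=\"myClass\">line one\nline two</td>";

        Console.WriteLine(Cell(html, AnyChar, RegexOptions.None) != null);   // [\s\S] crosses the newline
        Console.WriteLine(Cell(html, Dot, RegexOptions.None) != null);       // '.' stops at the newline
        Console.WriteLine(Cell(html, Dot, RegexOptions.Singleline) != null); // Singleline lets '.' cross it
    }
}
```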

:o)

Regards,

Henrik
Dec 8 '05 #5
