Bytes IT Community
Find HTML tags using RegEx.

I have the following data in a web page:

<tr height="25">
<td nowrap class="odd" align="center"><img
src="/forums/images/icon_topic_new.gif" width=14 height=14 alt='New Topic'
border=0></td>

<td nowrap class="odd" align="center">&nbsp;</td>

<td nowrap class="odd" align="center">&nbsp;</td>
<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>
<td width="15%" class="even" valign="middle" align="left"><font
class="new-row"><a href="profile.asp?action=view&id=USMC93"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">USMC93</a></font></td>
<td nowrap class="odd" valign="middle" align="center"><font
class="new-row">6</font></td>
<td nowrap class="odd" valign="middle" align="left">
<font class="new-row">7/24/2005<br>
<font class="sub-row">8:42:37 PM</font></font></td>
</tr>

It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files"
* "in .NET"
* ".NET Newbies"
* "Started 7/24/2005"
* "pages" and "1"
* "last posted by"
* "jmcilhinney"
* "USMC93"
* "7/24/2005"
* "8:42:37 PM"

If someone can show me the Regex to capture, say, the first two items, I'll
try to figure out the rest.

Thanks.

--
|
+-- Thief_
|

VB.Net
Nov 21 '05 #1
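
For the first two items in the list above (the topic URL and the topic title), a minimal regex sketch could look like the following. It is written in Python so it is easy to run; the pattern uses only numbered groups, `\s`, `\d` and a lazy quantifier, all of which .NET's `System.Text.RegularExpressions.Regex` also supports, so it should carry over to VB.Net once the double quotes are doubled inside the string literal. The names `TOPIC_LINK` and `find_topics` are made up for this example, and the pattern assumes every row is laid out like the sample in the question.

```python
import re

# Fragment of the row from the question, line breaks preserved.
SAMPLE = """<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>"""

# Group 1: the href value. Group 2: the link text, surrounding whitespace trimmed.
# Anchoring on <font class="new-row"> skips the page-number link, which has the
# same href but sits inside the "sub-row" font tag.
TOPIC_LINK = re.compile(
    r'<font\s+class="new-row">\s*<a\s+href="(topic\.asp\?tid=\d+)"\s*>\s*([^<]*?)\s*</a>',
    re.IGNORECASE,
)


def find_topics(html):
    """Return a list of (url, title) pairs, one per topic title link."""
    return [(m.group(1), m.group(2)) for m in TOPIC_LINK.finditer(html)]


print(find_topics(SAMPLE))
```

The `\s+` and `\s*` pieces matter here because the page breaks lines inside the tags (`<a` and `href=` are on different lines). The remaining fields could be picked up with further patterns of the same kind, one per cell.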
Hi,

Maybe this will help.

http://www.regexlib.com/REDetails.aspx?regexp_id=984

Ken

Nov 21 '05 #2

Thief,

Although the class for capturing HTML tags is MSHTML (which is a terrible
class to use), you could maybe also use the method I showed you in the next
question you posted.

Cor
Nov 21 '05 #3



Thief_ wrote:
I have the following data in a web page:
[snip]
It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files"
[etc]
If someone can show me the Regex to capture, say, the first two items, I'll
try to figure out the rest.


Rather than a RegEx (which you'll agree will be pretty hideous), might
I recommend HtmlAgilityPack for all your HTML parsing needs?

<http://smourier.blogspot.com/2005/05/net-html-agility-pack-how-to-use.html>

It makes dealing with HTML a breeze :)

--
Larry Lard
Replies to group please

Nov 21 '05 #4
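
To illustrate the parser-based approach suggested above: HtmlAgilityPack is a .NET library, and its API is not shown here. The sketch below swaps in Python's standard `html.parser` module as a stand-in, only to show the general idea of walking the tags instead of pattern-matching the raw text. The class name `TopicLinkParser` and the abridged `ROW` sample (the `onmouseover`/`onmouseout` attributes are left out) are assumptions made for this example.

```python
from html.parser import HTMLParser


class TopicLinkParser(HTMLParser):
    """Collect (href, text) for every <a> whose href starts with topic.asp."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the link currently open, if any
        self._text = []      # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("topic.asp"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a matching link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Abridged fragment of the row from the question.
ROW = """<font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney">jmcilhinney</a></font>"""

parser = TopicLinkParser()
parser.feed(ROW)
parser.close()
print(parser.links)
```

Unlike a regex anchored on the surrounding `<font>` tag, this version picks up both `topic.asp` links (the title and the page number "1"), so the caller would still have to tell them apart, for example by position within the row.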
