Find HTML tags using RegEx.

I have the following data in a web page:

<tr height="25">
<td nowrap class="odd" align="center"><img
src="/forums/images/icon_topic_new.gif" width=14 height=14 alt='New Topic'
border=0></td>

<td nowrap class="odd" align="center">&nbsp;</td>

<td nowrap class="odd" align="center">&nbsp;</td>
<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>
<td width="15%" class="even" valign="middle" align="left"><font
class="new-row"><a href="profile.asp?action=view&id=USMC93"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">USMC93</a></font></td>
<td nowrap class="odd" valign="middle" align="center"><font
class="new-row">6</font></td>
<td nowrap class="odd" valign="middle" align="left">
<font class="new-row">7/24/2005<br>
<font class="sub-row">8:42:37 PM</font></font></td>
</tr>

It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files"
* "in .NET"
* ".NET Newbies"
* "Started 7/24/2005"
* "pages" and "1"
* "last posted by"
* "jmcilhinney"
* "USMC93"
* "7/24/2005"
* "8:42:37 PM"

If someone can show me the Regex to capture say the first two items, I'll
try to figger out the rest.

Thanks.

--
|
+-- Thief_
|

VB.Net
Nov 21 '05 #1
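
For the first two items asked about above (the topic URL and the topic title), here is a minimal sketch of one possible pattern. It is written in Python so it can be run standalone; the same pattern text should carry over to the .NET Regex class, where RegexOptions.IgnoreCase and RegexOptions.Singleline correspond to the two flags used here. The names SAMPLE, TOPIC_LINK and find_topics are made up for this illustration, and the pattern assumes every topic title link sits directly inside a <font class="new-row"> element, as it does in the posted row.

```python
import re

# Topic cell copied from the row in the question.
SAMPLE = """<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>"""

# Assumption: the title link is the first thing inside <font class="new-row">.
TOPIC_LINK = re.compile(
    r'<font\s+class="new-row">\s*'             # wrapper around the title link
    r'<a\s+href="(topic\.asp\?tid=\d+)"\s*>'   # group 1: topic URL
    r'\s*(.*?)\s*</a>',                        # group 2: link text, trimmed
    re.IGNORECASE | re.DOTALL,                 # DOTALL lets the text span lines
)


def find_topics(html):
    """Return a list of (url, title) pairs, one per matching topic link."""
    return [(m.group(1), m.group(2)) for m in TOPIC_LINK.finditer(html)]


print(find_topics(SAMPLE))
# [('topic.asp?tid=106898', 'Working with Sequential Text Files')]
```

The page-number link ("1") points at the same URL but is not preceded by the new-row font tag, so it is skipped. If the site changes its markup even slightly the pattern will stop matching, which is the usual weakness of this approach.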
Hi,

Maybe this will help.

http://www.regexlib.com/REDetails.aspx?regexp_id=984

Ken
Nov 21 '05 #2
Thief,

The proper tool for capturing HTML tags is MSHTML, although it is a terrible
class to use. You could also try the method I showed you in the next
question you posted.

Cor
Nov 21 '05 #3


Thief_ wrote:
I have the following data in a web page:
[snip] It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files" [etc]
If someone can show me the Regex to capture say the first two items, I'll
try to figger out the rest.


Rather than a RegEx (which you'll agree will be pretty hideous), might
I recommend HtmlAgilityPack for all your HTML parsing needs?

<http://smourier.blogspot.com/2005/05/net-html-agility-pack-how-to-use.html>

It makes dealing with HTML a breeze :)

--
Larry Lard
Replies to group please

Nov 21 '05 #4
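
To illustrate the parser-based route suggested above, here is a rough sketch. It does not use HtmlAgilityPack (a .NET library); Python's standard html.parser module stands in for it, purely to show the idea of walking the elements instead of pattern-matching the raw text. The class name TopicRowParser and the SAMPLE constant are hypothetical names for this example.

```python
from html.parser import HTMLParser

# Topic cell copied from the row in the question.
SAMPLE = """<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>"""


class TopicRowParser(HTMLParser):
    """Collects an (href, text) pair for every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = TopicRowParser()
parser.feed(SAMPLE)
for href, text in parser.links:
    print(href, "|", text)
```

Each link comes back with its text already separated from the markup, so the topic URL and title are simply the first pair, and the author name is the text of the profile.asp link. Line breaks and attribute order in the source no longer matter, which is the main advantage over a regex.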
