Regular Expressions to parse HTML

I need to parse an HTML document of the following format.

I want to extract all the HTML from and including the first <div class="data">
up to and including "Data updated dd/mm/yyyy" (where the dd/mm/yyyy date
will change). What kind of regular expression can I use? Note that I want
everything in that range, including all the tags within the div tags.
<html>
<head>
<!-- Not interested in parsing data in the header-->
</head>
<body>
<div class="head">not interested in this</div>
<div class="data">Interested in data from this first data div</div>
<div class="data">There can be <b>other tags</b> within these divs too!</div>
<a name="data3"></a>(There can be some other stuff in between the div tags)
Data updated dd/mm/yyyy
<img src="notInterested.jpg">
some other rubbish
<div class="footer">not interested</div>
May 2 '06 #1
Patrick wrote:
I need to parse an HTML document of the following format.

I want to extract all the HTML from and including the first <div class="data">
up to and including "Data updated dd/mm/yyyy" (where the dd/mm/yyyy date
will change). What kind of regular expression can I use? Note that I want
everything in that range, including all the tags within the div tags.


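Treating the input HTML as one string (C# code, needs using System.Text.RegularExpressions):

// Singleline lets '.' match newlines, so the match can span several lines.
// The greedy .* runs up to the last <img in the input; (?=<img) keeps the tag itself out of the match.
Regex regex = new Regex(@"(<div class=""data"">.*(?=<img))",
    RegexOptions.Singleline);

Sample input: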
<html>
<head>
<!-- Not interested in parsing data in the header-->
</head>
<body>
<div class="head">not interested in this</div>
<div class="data">Interested in data from this first data div</div>
<div class="data">There can be <b>other tags</b> within these divs too!</div>
<a name="data3"></a>(There can be some other stuff in between the div tags)
Data updated dd/mm/yyyy
<img src="notInterested.jpg">
some other rubbish
<div class="footer">not interested</div>

Sample output:
1 =»<div class="data">Interested in data from this first data div</div>
<div class="data">There can be <b>other tags</b> within these divs too!</div>
<a name="data3"></a>(There can be some other stuff in between the div tags)
Data updated dd/mm/yyyy
«=
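
The pattern above relies on an <img> tag following the date line. If that is not guaranteed, a possible variation is to end the match on the "Data updated" line itself. The following is only a sketch, not part of the original reply: the class name DataBlockExtractor and the method Extract are hypothetical, and it assumes the real date is always written with two-digit day, two-digit month and four-digit year.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the thread.
static class DataBlockExtractor
{
    // Starts at the first <div class="data">, then lazily (.*?) consumes
    // text until the first "Data updated dd/mm/yyyy" line.
    // Singleline lets '.' match newlines so the match can span lines.
    // Assumption: the date is always digits in the form dd/mm/yyyy.
    static readonly Regex DataBlock = new Regex(
        @"<div class=""data"">.*?Data updated \d{2}/\d{2}/\d{4}",
        RegexOptions.Singleline);

    // Returns the matched block, or null when the input has no such block.
    public static string Extract(string html)
    {
        Match match = DataBlock.Match(html);
        return match.Success ? match.Value : null;
    }
}
```

Calling DataBlockExtractor.Extract(html) on the page text returns the block including the date line and nothing after it. Because the quantifier is lazy, the match stops at the first date line after the first data div, even if a later "Data updated" line appears further down the page.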

--
Take care,
Ken
(to reply directly, remove the cool car. <sigh>)
May 3 '06 #2
