I need to parse an HTML document of the following format.
I am interested in obtaining all the HTML from and including the first <div
class="data"> up to and including "Data updated dd/mm/yyyy" (where dd/mm/yyyy
will change). What kind of regular expression can I use? Note that I want
everything in the body of the HTML, including all the tags within the div tags.
<html>
<head>
<!-- Not interested in parsing data in the header-->
</head>
<body>
<div class="head">not interested in this</div>
<div class="data">Interested in data from this first data div</div>
<div class="data">There can be <b>other tags</b> within these divs too!</div>
<a name="data3"></a>(There can be some other stuff in between the div tags)
Data updated dd/mm/yyyy
<img src="notInterested.jpg">
some other rubbish
<div class="footer"> not interested</div>
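
To make the desired result concrete, here is a minimal sketch in Python of the kind of match I have in mind. It assumes the opening tag is written exactly as `<div class="data">` and that the date always has two-digit day and month and a four-digit year. The names `PATTERN` and `extract_data_block` and the sample date are only for illustration, and a real HTML parser may well be more robust than a regular expression here.

```python
import re

# Hypothetical sample mirroring the document above, with a made-up date.
html = """<html>
<head>
<!-- Not interested in parsing data in the header-->
</head>
<body>
<div class="head">not interested in this</div>
<div class="data">Interested in data from this first data div</div>
<div class="data">There can be <b>other tags</b> within these divs too!</div>
<a name="data3"></a>(There can be some other stuff in between the div tags)
Data updated 01/02/2003
<img src="notInterested.jpg">
some other rubbish
<div class="footer"> not interested</div>
"""

# Start at the first <div class="data">, then match as little as possible
# (".*?") up to the first "Data updated dd/mm/yyyy" that follows.
# re.DOTALL lets "." match newlines so the match can span several lines.
PATTERN = re.compile(
    r'<div class="data">.*?Data updated \d{2}/\d{2}/\d{4}',
    re.DOTALL,
)


def extract_data_block(text):
    """Return the matched block, or None if the pattern is not found."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


block = extract_data_block(html)
print(block)
```

With the sample above, the printed block should start at the first data div and end with the "Data updated" line, leaving out the header, the image and the footer.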