In our last episode, <de***************************@news.chello.nl>, the
lovely and talented Sjoerd broadcast on comp.lang.php:
Lars Eighner wrote:
>However, regular expressions simply are not up to the task of parsing HTML.
>They are not the right tool for the job.
What are?
Nothing in PHP that I know of. Perl has HTML parsing modules. You probably
could roll your own with one of the off-the-shelf parsers, but this would
almost certainly be more work than manually editing some number of pages one
time.
To see what is wrong with regular expressions in cases like this, back up to
the original post. The OP wants to match up to a certain </div>. But
regular expression quantifiers are greedy by default, so he will match up to
the last </div> in the document. If you make the regular expression less
greedy, you may stop at a </div> that is nested in the div you want to match.
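
A small illustration of both failures, sketched in Python only because it is
compact; the same thing happens with PCRE in PHP or Perl. The sample markup
and the id "a" are made up for the example and are not the OP's document.

```python
import re

# Hypothetical markup: the div we want (id="a") contains a nested div
# and is followed by an unrelated sibling div.
html = '<div id="a">one<div>two</div>three</div><div>four</div>'

# Greedy quantifier: .* runs to the LAST </div> in the document.
greedy = re.search(r'<div id="a">(.*)</div>', html).group(1)

# Non-greedy quantifier: .*? stops at the FIRST </div>, which here
# closes the nested div, not the one we opened.
lazy = re.search(r'<div id="a">(.*?)</div>', html).group(1)

print(greedy)  # one<div>two</div>three</div><div>four
print(lazy)    # one<div>two
```

What was wanted is one<div>two</div>three, and neither pattern delivers it.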
Sure, I use perl one liners to alter html documents sometimes, and use
regular expressions in my editor to make html changes in single documents.
If you know and control the source documents, so you know they are valid to
begin with and you know what is in them and sometimes even how they are
formatted (such as how many tabs precede the </div> you want), you can use
regular expressions to cut corners sometimes, but there is no regular
expression answer to the question as put by the OP.
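
For contrast, here is roughly what the roll-your-own parser route looks like.
This is only a sketch, written in Python with its standard html.parser module
rather than in PHP or Perl. The class name DivExtractor, the helper div_text,
and the sample markup are all invented for the illustration. The point is the
depth counter, which is exactly the thing a regular expression cannot keep.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the text inside the div with a given id, nested divs included."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A div nested inside the target: go one level deeper.
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            # The opening tag of the target div itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def div_text(html, target_id):
    """Hypothetical helper: return the text content of the div with target_id."""
    parser = DivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


html = '<div id="a">one<div>two</div>three</div><div>four</div>'
print(div_text(html, "a"))  # onetwothree
```

It keeps only the text, not the inner markup, and it trusts the document to
be well formed, so it is still a long way from a general tool.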
--
Lars Eighner <http://larseighner.com/>
us****@larseighner.com
College: The fountains of knowledge, where everyone goes to drink.