Hi,
In short: how do I modify selected tags/sections of an HTML file, using
PHP as the "modifier"/filter? I would have thought this was a very
common use of PHP...
I have a set of existing .html files that are plain and ugly. I'd like
to create a showdoc.php filter that adds consistent menus, CSS, and look
and feel, so that http://me/showdoc.php?d=story shows a nicely
formatted version of http://me/story.html
It:
* puts in a nice standard header
* opens story.html
* extracts all <link> and <script> tags from the story <head>
and adds them to the output <head>
* extracts everything between <body> and </body>
* rewrites all non-absolute hrefs e.g.
<a href="other.html"> to <a href="showdoc.php?d=other">
* closes story.html
* puts in a nice standard footer
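One way to do the steps above without running regexps over the markup is PHP's DOM extension: DOMDocument::loadHTML() parses the story with libxml's HTML parser, DOMXPath picks out the <link>/<script> and <a href> nodes, and DOMDocument::saveHTML($node) writes them back out. What follows is only a minimal sketch of showdoc.php under assumptions that are not in the post: the .html files sit next to the script, document names contain only letters, digits, "_" and "-", "non-absolute" means no scheme, no leading "/" and no leading "#", and the stylesheet, menu and footer are hypothetical placeholders. A small regexp is still used, but only on an extracted href value, never on the HTML itself.

```php
<?php
// showdoc.php: minimal sketch of the filter described in the post.
// Assumptions (not from the post): the story .html files live next to this
// script, names use only [A-Za-z0-9_-], and a "non-absolute" href is one
// with no scheme (http:, mailto:, ...), no leading "/" and no leading "#".

/** Map a relative "foo.html[#frag]" link to "showdoc.php?d=foo[#frag]". */
function rewrite_href(string $href): string
{
    // Absolute URLs, root-relative paths and same-page fragments stay as-is.
    if ($href === '' || preg_match('~^([a-z][a-z0-9+.-]*:|/|#)~i', $href)) {
        return $href;
    }
    // Only plain local .html pages are rewritten; anything else is untouched.
    if (preg_match('/^([A-Za-z0-9_-]+)\.html(#.*)?$/D', $href, $m)) {
        return 'showdoc.php?d=' . $m[1] . ($m[2] ?? '');
    }
    return $href;
}

/** Parse a story page; return [<link>/<script> markup, inner <body> markup]. */
function extract_story(string $html): array
{
    if (trim($html) === '') {
        return ['', '']; // nothing to parse; loadHTML() rejects an empty string
    }
    libxml_use_internal_errors(true); // collect tag-soup warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Every <link> and <script> that is a direct child of <head>.
    $head = '';
    foreach ($xpath->query('/html/head/link | /html/head/script') as $node) {
        $head .= $doc->saveHTML($node) . "\n";
    }

    // Rewrite hrefs in the parsed tree, not in the raw text.
    foreach ($xpath->query('//body//a[@href]') as $a) {
        $a->setAttribute('href', rewrite_href($a->getAttribute('href')));
    }

    // Serialize the children of <body>, i.e. everything between the tags.
    $body = '';
    $bodyEl = $doc->getElementsByTagName('body')->item(0);
    if ($bodyEl !== null) {
        foreach ($bodyEl->childNodes as $child) {
            $body .= $doc->saveHTML($child);
        }
    }
    return [$head, $body];
}

// Page output; skipped when the file is run from the command line.
if (PHP_SAPI !== 'cli') {
    $name = $_GET['d'] ?? '';
    $file = __DIR__ . '/' . $name . '.html';
    if (!preg_match('/^[A-Za-z0-9_-]+$/D', $name) || !is_file($file)) {
        http_response_code(404);
        exit('No such document');
    }
    [$head, $body] = extract_story(file_get_contents($file));
    echo "<!DOCTYPE html>\n<html><head>\n";
    echo "<link rel=\"stylesheet\" href=\"site.css\">\n"; // hypothetical site CSS
    echo $head, "</head><body>\n";
    echo "<div class=\"menu\"><!-- standard header/menu --></div>\n";
    echo $body;
    echo "<div class=\"footer\"><!-- standard footer --></div>\n";
    echo "</body></html>\n";
}
```

One caveat worth checking: if a story has no charset declaration, loadHTML() treats it as ISO-8859-1, so UTF-8 pages without a <meta> charset may come out garbled unless a declaration is added before parsing.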
I realize I could do this by editing all the .html files instead, but
can't I just use PHP as a filter? Am I the first person to want to do
this?
How?
* I _really_ want to avoid using regexps to match e.g. <body> and hrefs,
because there are so many caveats involved: multi-line tags and
attributes, for starters. Or how about <nasty attr="</body>"></nasty>
(not sure that is really legal, though...)
* xml_parse() parses XML, and HTML is not XML (e.g. valid HTML may omit
end tags such as </p>), so xml_parse() is out. Or is it?
* Since I want to preserve all of the <body> except the rewritten hrefs,
if a parser is involved, I'd like its output to be easy to flatten
back into HTML when generating the page.
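On the three points above, PHP's DOM extension may fit: DOMDocument::loadHTML() uses libxml's HTML parser rather than an XML parser, so it copes with omitted end tags, tags split across lines and markup-like text inside quoted attributes, and DOMDocument::saveHTML($node) flattens any node back into markup. A small self-contained demonstration; the sample document and the inner_html() helper are made up for illustration:

```php
<?php
// Sketch: DOMDocument::loadHTML() (libxml's HTML parser) handles input that
// would trip up regexps or xml_parse(), and saveHTML($node) flattens the
// tree back to markup. inner_html() is a hypothetical helper, not a PHP API.

/** Serialize the children of $node, similar to a browser's innerHTML. */
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$html = <<<'HTML'
<html><head><title>Story</title></head>
<body>
<P CLASS=intro>First paragraph, no end tag
<p>Second paragraph, a <a
   href="other.html">multi-line tag</a>
<nasty attr="</body>"></nasty>
</body></html>
HTML;

libxml_use_internal_errors(true); // libxml reports unknown tags like <nasty>; keep quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$paragraphs = $xpath->query('//p')->length;            // both <p> elements are found
$nasty = $xpath->query('//nasty')->item(0);
$attr = $nasty ? $nasty->getAttribute('attr') : null; // "</body>" stays an attribute value

// Flatten everything between <body> and </body> back into HTML.
$flat = inner_html($doc->getElementsByTagName('body')->item(0));
echo "paragraphs: $paragraphs\nattr: $attr\n$flat\n";
```

The flattened output is equivalent markup rather than a byte-for-byte copy: tag and attribute names come back lower-cased, unquoted values gain quotes, and whitespace inside tags is normalized. (A quoted attribute value may contain "<" and ">" in HTML, so the quoting in the <nasty> example is fine; <nasty> is simply not a known element.)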
There are examples out there using cURL, but they are often so simple
that they print nothing of their own, only the output of curl_exec().
In any useful application, wouldn't everyone have to extract selected
info from the retrieved web page? What do cURL users do? Regexps only?
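As for cURL: curl_exec() only fetches the page, and the extraction step can go through the same DOM parser instead of regexps. A sketch, assuming (hypothetically) that the URL comes from the command line and that only link targets and link texts are wanted; fetch() and extract_links() are illustrative names, not library functions:

```php
<?php
// Sketch: cURL fetches the page; DOMDocument/DOMXPath do the extraction.
// fetch() and extract_links() are hypothetical helpers for illustration.

/** Fetch a URL and return the body as a string ('' on failure). */
function fetch(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return '';
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    return is_string($html) ? $html : '';
}

/** Return a list of [href, link text] pairs for every <a href> in $html. */
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse; loadHTML() rejects an empty string
    }
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $links[] = [$a->getAttribute('href'), trim($a->textContent)];
    }
    return $links;
}

// Hypothetical usage: php links.php http://me/story.html
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (extract_links(fetch($argv[1])) as [$href, $text]) {
        echo $href, "\t", $text, "\n";
    }
}
```

Keeping fetching and parsing in separate functions means the parsing half can be exercised on a local string without any network access.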
Peter