
How to use PHP as a filter for HTML files? cURL?

#1
July 17th, 2005, 02:08 AM
Peter Valdemar Mørch
Hi,

In short: how do I modify selected tags/sections of an HTML file, using
PHP as the "modifier"/filter? I would have thought this was a very
common use of PHP...

I have a set of existing .html files that are plain and ugly. I'd like
to create a showdoc.php filter that adds consistent menus, css, look
and feel, so that http://me/showdoc.php?d=story shows a nicely
formatted version of http://me/story.html
It:
* puts in a nice standard header
* opens story.html
* extracts all <link> and <script> tags from the story <head>
and adds them to the output <head>
* extracts everything between <body> and </body>
* rewrites all non-absolute hrefs, e.g.
<a href="other.html"> to <a href="showdoc.php?d=other">
* closes story.html
* puts in a nice standard footer
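
To make the steps above concrete, here is a rough sketch of what I imagine the core of showdoc.php could look like if PHP's DOM extension (DOMDocument::loadHTML) can cope with the files. The function name filter_story() and its parameters are made up for illustration, and the href rewriting only handles the simple "name.html" case; the small pattern is applied to the attribute value, not to the HTML itself.

```php
<?php
// Sketch only: filter_story() and its parameters are hypothetical names.
function filter_story(string $html, string $script = 'showdoc.php'): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for sloppy HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect <link> and <script> elements that are direct children of <head>.
    $headParts = [];
    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head !== null) {
        foreach ($head->childNodes as $node) {
            if ($node instanceof DOMElement
                && in_array(strtolower($node->nodeName), ['link', 'script'], true)) {
                $headParts[] = $doc->saveHTML($node);
            }
        }
    }

    // Rewrite simple relative links such as other.html to showdoc.php?d=other.
    // Absolute URLs and anything with a path or fragment are left untouched.
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('~^([\w-]+)\.html$~', $href, $m)) {
            $a->setAttribute('href', $script . '?d=' . rawurlencode($m[1]));
        }
    }

    // Serialize the children of <body>, without the <body> tags themselves.
    $bodyHtml = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $node) {
            $bodyHtml .= $doc->saveHTML($node);
        }
    }

    return ['head' => implode("\n", $headParts), 'body' => $bodyHtml];
}
```

showdoc.php would then print the standard header, the 'head' and 'body' strings, and the standard footer. The d parameter would of course have to be sanitized (e.g. with basename()) before being turned into a file name.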

I realize I can do this by editing all the .html files instead, but
can't I just use PHP as a filter? Am I the first person to want to do
this?

How?

* I _really_ want to avoid using regexps to match e.g. body and hrefs,
because there are so many caveats involved. Multiline tags,
attributes, for starters. Or how about <nasty attr="</body>"></nasty>
(not sure that really is legal, though...)
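
A small experiment showing the kind of caveat I mean, assuming PHP's DOM extension is available. The test markup and variable names are made up. The naive pattern stops at the "</body>" inside the attribute value, while a real HTML parser should treat the quoted attribute as one unit:

```php
<?php
$html = '<html><body><p>before</p>'
      . '<nasty attr="</body>"></nasty>'
      . '<p>after</p></body></html>';

// Naive regexp: the lazy match ends at the first "</body>",
// which here sits inside the attribute value.
preg_match('~<body>(.*?)</body>~s', $html, $m);
$regexBody = $m[1];

// Parser-based approach: the unknown <nasty> tag produces libxml
// warnings, so they are collected internally instead of printed.
$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$body = $doc->getElementsByTagName('body')->item(0);
$domText = $body->textContent;
$attr = $doc->getElementsByTagName('nasty')->item(0)->getAttribute('attr');

var_dump($regexBody); // truncated body from the regexp
var_dump($domText);   // text content of the parsed body
var_dump($attr);      // attribute value as seen by the parser
```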

* xml_parse() parses XML, and HTML is not XML (e.g. valid HTML can omit
end tags), so xml_parse() is out. Or is it?

* Since I want to preserve all of the <body> except the rewritten hrefs,
if there is a parser involved, I'd like it to produce a structure
that is easy to re-flatten into HTML when generating output.

There are examples out there using cURL, but they are often so simple
that they print nothing of their own, only the output of curl_exec().
In any useful application, wouldn't everyone have to extract selected
info from the retrieved web page? What do cURL users do? Regexps only?
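
For what it's worth, here is a sketch of the combination I have in mind: setting CURLOPT_RETURNTRANSFER makes curl_exec() return the page as a string instead of printing it, and that string could then be handed to a parser rather than to regexps. The function names fetch_page() and extract_links() are hypothetical, and the sketch assumes the curl and DOM extensions are both installed.

```php
<?php
// Hypothetical helper: fetch a page as a string instead of echoing it.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body, don't print it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: pull every href out of the <a> elements of a page.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // Only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}
```

Usage would be something like extract_links(fetch_page('http://me/story.html')), with the same idea extending to any other selected info on the page.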

Peter