By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,101 Members | 1,340 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,101 IT Pros & Developers. It's quick & easy.

Howto use php as filter for HTML files? Curl?

P: n/a
Hi,

In short, how to modify selected tags/sections of a HTML file, using
PHP as the "modifier"/filter? I would have thought this was a very
common usage for PHP...

I have a set of existing .html files that are plain and ugly. I'd like
to create a showdoc.php filter that adds consistent menus, css, look
and feel, so that http://me/showdoc.php?d=story shows a nicely
formatted http://me/story.html
It:
* puts in a nice standard header
* opens story.html
* extacts all <link> and <script> tags from the story <head>
and adds them to the output <head>
* extracts everything between <body> and </body>
* rewrites all non-absolute hrefs e.g.
<a href="other.html"> to <a href="showdoc.php?d=other">
* closes story.html
* puts in a nice standard footer

I realize I can do this by editing all the .html files instead, but
can't I just use php as a filter? Am I the first person to want to do
this?

How?

* I _really_ want to avoid using regexps to match e.g. body and hrefs,
because there are so many caveats involved. Multiline tags,
attributes, for starters. Or how about <nasty attr="</body>"></nasty>
(not sure that really is legal, though...)

* xml_parse() parses XML and HTML is not XML (e.g. valid HTML missing
</end> tags) so xml_parse is out. Or what?

* Since I want to preserve all the <body> except the rewritten hrefs,
if there is a parser involved, I'd like for any parser to produce
output that is easy to re-flatten when generating output.

There are examples out there using CURL, but they often are so simple
that they don't print out *anything* on their own and only the output
of curl_exec(). In any useful application, wouldn't everyone have to
extract selected info from the retrieved web page? What do CURL users
do? regexps only?

Peter
Jul 17 '05 #1
Share this question for a faster answer!
Share on Google+

This discussion thread is closed

Replies have been disabled for this discussion.