How to use PHP as a filter for HTML files? cURL?

Hi,

In short: how can I modify selected tags/sections of an HTML file,
using PHP as the "modifier"/filter? I would have thought this was a
very common use of PHP...

I have a set of existing .html files that are plain and ugly. I'd like
to create a showdoc.php filter that adds consistent menus, css, look
and feel, so that http://me/showdoc.php?d=story shows a nicely
formatted http://me/story.html
The filter:
* puts in a nice standard header
* opens story.html
* extracts all <link> and <script> tags from the story <head>
and adds them to the output <head>
* extracts everything between <body> and </body>
* rewrites all non-absolute hrefs e.g.
<a href="other.html"> to <a href="showdoc.php?d=other">
* closes story.html
* puts in a nice standard footer
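
The steps above can be sketched with PHP's DOM extension, whose DOMDocument::loadHTML() accepts non-XML HTML. This is only a rough outline under assumptions: the function names rewrite_href() and filter_document() and the returned array shape are made up for illustration, and the standard header and footer are left out.

```php
<?php
// Sketch only: function names and the returned array shape are illustrative.

// Map a relative link such as other.html to showdoc.php?d=other.
function rewrite_href(string $href): string
{
    $parts = parse_url($href);
    // Leave absolute URLs, root-relative paths, bare fragments and
    // anything carrying a query string untouched.
    if ($parts === false || isset($parts['scheme']) || isset($parts['host'])
        || isset($parts['query']) || !isset($parts['path'])
        || $parts['path'] === '' || $parts['path'][0] === '/') {
        return $href;
    }
    if (strtolower(pathinfo($parts['path'], PATHINFO_EXTENSION)) !== 'html') {
        return $href;
    }
    $name = substr($parts['path'], 0, -strlen('.html'));
    $target = 'showdoc.php?d=' . rawurlencode($name);
    return isset($parts['fragment']) ? $target . '#' . $parts['fragment'] : $target;
}

// Parse one story page; return its <link>/<script> tags and its body content.
function filter_document(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep sloppy-markup warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Rewrite hrefs on the parsed tree, not on the raw text.
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', rewrite_href($a->getAttribute('href')));
        }
    }

    $result = ['head' => '', 'body' => ''];

    // Keep only <link> and <script> children of <head>.
    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head !== null) {
        foreach ($head->childNodes as $node) {
            if (in_array(strtolower($node->nodeName), ['link', 'script'], true)) {
                $result['head'] .= $doc->saveHTML($node) . "\n";
            }
        }
    }

    // Serialize everything between <body> and </body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $node) {
            $result['body'] .= $doc->saveHTML($node);
        }
    }
    return $result;
}
```

A showdoc.php built on this would print its header, echo $result['head'] inside the output <head>, echo $result['body'], then print the footer. It would also need to validate the d parameter before turning it into a file name.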

I realize I can do this by editing all the .html files instead, but
can't I just use php as a filter? Am I the first person to want to do
this?

How?

* I _really_ want to avoid using regexps to match e.g. body and hrefs,
because there are so many caveats involved. Multiline tags,
attributes, for starters. Or how about <nasty attr="</body>"></nasty>
(not sure that really is legal, though...)

* xml_parse() parses XML, and HTML is not XML (e.g. valid HTML may omit
closing tags), so xml_parse() is out. Or is it?

* Since I want to preserve everything in the <body> except the
rewritten hrefs, any parser involved should produce a result that is
easy to flatten back into HTML when generating output.
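
The first two caveats in the list above can be reproduced in a few lines. This is a small demonstration, not a recommendation: the regexp is just one typical "grab the body" pattern, and parses_as_xml() is a hypothetical helper around PHP's XML parser extension.

```php
<?php
// 1) A regexp cannot tell a real </body> from one inside an attribute value.
$html = '<body><nasty attr="</body>"></nasty><p>real content</p></body>';

// Non-greedy, case-insensitive, dot also matches newlines.
preg_match('~<body[^>]*>(.*?)</body>~is', $html, $m);
$naive_body = $m[1];
echo $naive_body, "\n"; // stops at the </body> inside the attribute

// 2) The XML parser rejects HTML that leaves out closing tags.
// Hypothetical helper: true if $markup is accepted as well-formed XML.
function parses_as_xml(string $markup): bool
{
    $parser = xml_parser_create();
    return xml_parse($parser, $markup, true) === 1;
}

var_dump(parses_as_xml('<p>one<br>two</p>'));  // HTML-style <br>, never closed
var_dump(parses_as_xml('<p>one<br/>two</p>')); // XML-style self-closing <br/>
```

The regexp captures only `<nasty attr="` and loses the real content, and the HTML-style fragment fails the XML well-formedness check while the self-closing variant passes.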

There are examples out there using cURL, but they are often so simple
that they produce no output of their own and just pass through whatever
curl_exec() prints. In any useful application, wouldn't everyone have
to extract selected info from the retrieved web page? What do cURL
users do? Regexps only?
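
For what it is worth, curl_exec() only prints the page by default. With CURLOPT_RETURNTRANSFER set it returns the page as a string, which can then go to a parser. A minimal sketch, assuming the cURL and DOM extensions are available; fetch_page() and extract_hrefs() are hypothetical names chosen for illustration.

```php
<?php
// Fetch a URL and return the response body as a string instead of printing it.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body, don't echo it
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html;
}

// Collect every href in the page using an HTML parser rather than a regexp.
function extract_hrefs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep sloppy-markup warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Example (the URL is the placeholder host from the post):
// print_r(extract_hrefs(fetch_page('http://me/story.html')));
```

For files on the same server, file_get_contents() on the local path would do instead of cURL; the parsing step stays the same.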

Peter
Jul 17 '05 #1