How to use PHP as a filter for HTML files? cURL?

Peter Valdemar Mørch

Hi,

In short: how can I modify selected tags/sections of an HTML file, using
PHP as the "modifier"/filter? I would have thought this was a very
common use of PHP...

I have a set of existing .html files that are plain and ugly. I'd like
to create a showdoc.php filter that adds consistent menus, css, look
and feel, so that http://me/showdoc.php?d=story shows a nicely
formatted http://me/story.html
The filter:
* puts in a nice standard header
* opens story.html
* extracts all <link> and <script> tags from the story <head>
and adds them to the output <head>
* extracts everything between <body> and </body>
* rewrites all non-absolute hrefs e.g.
<a href="other.html"> to <a href="showdoc.php?d=other">
* closes story.html
* puts in a nice standard footer
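
The steps above can be sketched with PHP's DOM extension, whose DOMDocument::loadHTML() is an HTML parser rather than an XML one. This is only a minimal sketch, not a finished showdoc.php: the function names rewrite_href() and filter_document() are made up for illustration, only hrefs on <a> elements that end in .html are rewritten, and any query string on a rewritten link is dropped.

```php
<?php
// Hypothetical helper: map a relative "name.html" link to "showdoc.php?d=name".
// Absolute URLs, root-relative paths, fragment-only links and non-.html
// targets are returned unchanged.
function rewrite_href(string $href): string
{
    $parts = parse_url($href);
    if ($parts === false || isset($parts['scheme']) || isset($parts['host'])) {
        return $href;
    }
    $path = $parts['path'] ?? '';
    if ($path === '' || $path[0] === '/' || strtolower(substr($path, -5)) !== '.html') {
        return $href;
    }
    $new = 'showdoc.php?d=' . rawurlencode(substr($path, 0, -5));
    if (isset($parts['fragment'])) {
        $new .= '#' . $parts['fragment'];
    }
    return $new;
}

// Hypothetical helper: parse one HTML document and return the pieces the
// wrapper page needs: the <link>/<script> tags from <head>, and the
// contents of <body> with its links rewritten.
function filter_document(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Copy <link> and <script> elements that are direct children of <head>.
    $headHtml = '';
    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head !== null) {
        foreach ($head->childNodes as $node) {
            if ($node instanceof DOMElement
                && in_array($node->tagName, ['link', 'script'], true)) {
                $headHtml .= $dom->saveHTML($node) . "\n";
            }
        }
    }

    // Rewrite the href of every <a> element in place.
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', rewrite_href($a->getAttribute('href')));
        }
    }

    // Serialize the children of <body>, i.e. everything between the tags.
    $bodyHtml = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $node) {
            $bodyHtml .= $dom->saveHTML($node);
        }
    }

    return ['head' => $headHtml, 'body' => $bodyHtml];
}
```

A showdoc.php built on this would print the standard header, echo the 'head' and 'body' parts in the right places, then print the footer. The d parameter should be validated (for example, by accepting only plain file names) before it is turned into a path. Note that loadHTML() assumes ISO-8859-1 unless the document declares another charset.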

I realize I can do this by editing all the .html files instead, but
can't I just use php as a filter? Am I the first person to want to do
this?

How?

* I _really_ want to avoid using regexps to match e.g. body and hrefs,
because there are so many caveats involved. Multiline tags,
attributes, for starters. Or how about <nasty attr="</body>"></nasty>
(not sure that really is legal, though...)

* xml_parse() parses XML, and HTML is not XML (e.g. valid HTML may omit
end tags), so xml_parse() is out. Or is it?
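
The difference between the two parsers can be checked directly. The sketch below feeds the same tag soup (unclosed <p> and <br>, plus the <nasty> element from above) to the XML parser and to DOMDocument::loadHTML(). The helper names and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: true if the strict XML parser accepts the markup.
function is_well_formed_xml(string $markup): bool
{
    $parser = xml_parser_create();
    return xml_parse($parser, $markup, true) === 1;
}

// Hypothetical helper: parse with the lenient HTML parser and return the DOM.
function load_html_soup(string $html): DOMDocument
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // unknown tags etc. only raise warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom;
}

// Sample input: valid-looking HTML that is not well-formed XML.
$soup = '<html><body><p>first<p>second<br>'
      . '<nasty attr="</body>">x</nasty></body></html>';

$xmlOk   = is_well_formed_xml($soup);
$dom     = load_html_soup($soup);
$pCount  = $dom->getElementsByTagName('p')->length;
$attrVal = $dom->getElementsByTagName('nasty')->item(0)->getAttribute('attr');
```

Here $xmlOk comes back false, while the HTML parser still builds a tree in which both paragraphs exist and the quoted "</body>" stays an attribute value instead of ending the body.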

* Since I want to preserve all the <body> except the rewritten hrefs,
if there is a parser involved, I'd like for any parser to produce
output that is easy to re-flatten when generating output.

There are examples out there using cURL, but they are often so simple
that they print nothing of their own, only the output of curl_exec().
In any useful application, wouldn't everyone have to extract selected
info from the retrieved web page? What do cURL users do? Regexps only?
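
For what it's worth, curl_exec() only prints the page when CURLOPT_RETURNTRANSFER is not set. With that option enabled it returns the body as a string, which can then go to a parser instead of a regexp. A minimal sketch, where fetch_page() and extract_hrefs() are hypothetical names:

```php
<?php
// Hypothetical helper: fetch a URL and return its body as a string
// instead of letting curl_exec() print it.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return, don't echo
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: list the href of every <a> element, using the
// DOM extension's HTML parser rather than a regexp.
function extract_hrefs(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Usage (not executed here, since it needs a reachable server):
// $hrefs = extract_hrefs(fetch_page('http://me/story.html'));
```

For files on the same server, file_get_contents() on the local path would avoid the HTTP round trip altogether.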

Peter

Jul 17 '05 #1
