Importing pages

AJ
Hi all

I've written a content management system that I'm now selling to my
customers. It's very nice when we have a blank canvas of a site, but a pain
in the arse when there is already a site in place.

What I'm in the process of *trying* to put together is a script that would
do the following:

A simple form where you put the address of the site with the static pages

The script then spiders through the site, takes everything between <body>
and </body> and chucks the rest away

It would then take out all class attributes and all embedded styling (font
tags etc.) but leave tables, <p>, <H?> etc.

This would leave a very plain page of HTML that would be inserted into a
database. CSS would control the fonts etc. I'm aware that there would need
to be some tidying up if there was any javascript or anything and also some
basic formatting.
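
The clean-up step described above can be sketched roughly as follows. This is only a minimal sketch in Python using the standard library's html.parser; the names PlainHtmlExtractor and strip_presentation, and the exact lists of tags and attributes to drop, are assumptions made for illustration and not part of any existing tool. It also assumes the page has an explicit <body> tag, which is not guaranteed.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists for illustration; adjust to taste.
DROP_TAGS = {"font", "center"}          # drop the tag itself, keep its text
SKIP_CONTENT = {"script", "style"}      # drop the tag and everything inside it
DROP_ATTRS = {"class", "style", "align", "bgcolor", "color", "face", "size"}
VOID_TAGS = {"br", "hr", "img"}         # elements that never get an end tag


class PlainHtmlExtractor(HTMLParser):
    """Collects the markup found inside <body>, minus presentational bits."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_body = False
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if not self.in_body or self.skip_depth or tag in DROP_TAGS:
            return
        # Keep structural attributes such as href or colspan.
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "body":
            self.in_body = False
            return
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.in_body and not self.skip_depth and tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_presentation(html_text):
    """Return the plain HTML found between <body> and </body>."""
    parser = PlainHtmlExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


page = (
    '<html><head><title>T</title><style>p{color:red}</style></head>'
    '<body bgcolor="#fff"><h1 class="big">Hi</h1>'
    '<p style="color:red"><font face="Arial">Hello</font> '
    '<a href="x.html">link</a></p><script>alert(1)</script></body></html>'
)
print(strip_presentation(page))
# prints: <h1>Hi</h1><p>Hello <a href="x.html">link</a></p>
```

The string returned by strip_presentation is what would be stored in the database; fetching the pages and following links is left out of this sketch.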

What I want to know is

1. Has it been done and, if so, where might I find something like this?
2. Might it have any commercial value to other developers?

Regarding 2, I'm thinking how much time something like this might save me if
I have to convert anything more than a few pages of static HTML into
something that I can put in a database.

Your thoughts would be appreciated.

Andy
Jul 17 '05 #1
"AJ" <no****@redcatmedia.net> wrote in message
news:<ci**********@hercules.btinternet.com>...

> The script then spiders through the site, takes everything between
> <body> and </body> and chucks the rest away

Bad idea. As of HTML 4.0, <head> and <body> tags are optional...
Also, why spider the site, if you can (theoretically, at least)
crawl the local file system?
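
Both points can be illustrated with a short sketch. This is a hedged example in Python, not a finished tool: the function names body_or_whole and iter_pages are hypothetical, the regular expressions are a crude stand-in for a real HTML parser, and the files are assumed to be UTF-8. When the <body> tag is missing, the whole document is kept, minus any explicit <head> section.

```python
import re
from pathlib import Path

BODY_RE = re.compile(
    r"<body\b[^>]*>(.*?)(?:</body\s*>|\Z)", re.IGNORECASE | re.DOTALL
)
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.IGNORECASE | re.DOTALL)
WRAPPER_RE = re.compile(r"<!DOCTYPE[^>]*>|</?html\b[^>]*>", re.IGNORECASE)


def body_or_whole(html_text):
    """Return the body content, falling back to the whole page when the
    optional <body> tag has been omitted."""
    match = BODY_RE.search(html_text)
    if match:
        return match.group(1).strip()
    # No <body> tag: drop an explicit <head> section, the doctype and
    # the <html> wrapper. Elements of an implied head (for example a
    # bare <title>) are NOT detected by this sketch.
    text = HEAD_RE.sub("", html_text)
    text = WRAPPER_RE.sub("", text)
    return text.strip()


def iter_pages(root):
    """Walk a local directory tree instead of spidering over HTTP and
    yield (path, body) for every .html or .htm file found."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            yield path, body_or_whole(text)
```

Calling iter_pages on the directory holding the static site yields one (path, body) pair per page, ready for whatever processing comes next.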

> 1. Has it been done and, if so, where might I find something like this?

The spidering part along with storing in databases is what search
engines do. What you need to add is the processing in-between.

> 2. Might it have any commercial value to other developers?

Developers, I doubt it. Content managers, possibly...

Cheers,
NC
Jul 17 '05 #2
