Importing pages

AJ
Hi all

I've written a content management system that I'm now selling to my
customers. It's very nice when we have a blank canvas of a site, but a pain
in the arse when there is already a site in place.

What I'm in the process of *trying* to put together is a script that would
do the following:

A simple form where you put the address of the site with the static pages

The script then spiders through the site, takes everything between <body>
and </body> and chucks the rest away

It would then take out all class attributes and all embedded styling such
as <font> tags etc, but leave tables, <p>, <H?> etc

This would leave a very plain page of HTML that would be inserted into a
database. CSS would control the fonts etc. I'm aware that some tidying up
would be needed wherever there is JavaScript or similar, along with some
basic formatting.
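
The steps described above can be sketched roughly as follows in Python, using only the standard library's html.parser module. This is a minimal sketch, not a finished importer: the names BodyCleaner, clean_body, DROP_TAGS, SKIP_CONTENT and DROP_ATTRS are made up for illustration, the lists of tags and attributes to strip are assumptions about what counts as "embedded style", and it assumes every page has an explicit <body> tag.

```python
import html
from html.parser import HTMLParser

# Assumed policy (not from the original post): adjust these sets to taste.
DROP_TAGS = {"font", "center"}          # markup removed, inner text kept
SKIP_CONTENT = {"script", "style"}      # markup and content both removed
DROP_ATTRS = {"class", "style", "face", "color", "size", "bgcolor", "align"}


class BodyCleaner(HTMLParser):
    """Collects a plain version of whatever sits between <body> and </body>."""

    def __init__(self):
        super().__init__()  # character references arrive already decoded
        self.out = []
        self.in_body = False
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if not self.in_body or self.skip_depth or tag in DROP_TAGS:
            return
        # Keep the tag, but drop presentational attributes.
        kept = "".join(
            f" {name}" if value is None else f' {name}="{html.escape(value)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> so no stray </br> is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.in_body and not self.skip_depth and tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            # Re-escape the text because the parser decoded it.
            self.out.append(html.escape(data, quote=False))


def clean_body(markup):
    parser = BodyCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out).strip()
```

For example, clean_body('<body><h1 class="big">Hi</h1><p><font face="Arial">Hello</font></p></body>') gives '<h1>Hi</h1><p>Hello</p>'. Comments and anything outside <body> are simply dropped, which may or may not be what a real import needs.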

What I want to know is

1. Has it been done and, if so, where might I find something like this?
2. Might it have any commercial value to other developers?

Regarding 2, I'm thinking of how much time something like this might save
me whenever I have to convert more than a few pages of static HTML into
something that I can put in a database.

Your thoughts would be appreciated.

Andy
Jul 17 '05 #1


"AJ" <no****@redcatmedia.net> wrote in message
news:<ci**********@hercules.btinternet.com>...

> The script then spiders through the site, takes everything between
> <body> and </body> and chucks the rest away
Bad idea. As of HTML 4.0, <head> and <body> tags are optional...
Also, why spider the site, if you can (theoretically, at least)
crawl the local file system?
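
Both points can be illustrated with a short Python sketch: walk a local copy of the site instead of spidering it, and cope with pages that omit the <body> tags. The names body_fragment and load_pages are hypothetical, as are the assumptions that static pages end in .html or .htm and are readable as UTF-8. The regular expression is a rough stand-in for a real HTML parser, and the fallback for pages without <body> simply keeps the whole document, head elements included.

```python
import re
from pathlib import Path

# Assumption: static pages are recognisable by these extensions.
HTML_SUFFIXES = {".html", ".htm"}

# Text after an opening <body ...> tag, up to </body> or the end of the file.
BODY_RE = re.compile(
    r"<body\b[^>]*>(.*?)(?:</body\s*>|\Z)", re.IGNORECASE | re.DOTALL
)


def body_fragment(markup):
    """Return the body content, or the whole page if <body> was omitted."""
    match = BODY_RE.search(markup)
    if match:
        return match.group(1).strip()
    return markup.strip()


def load_pages(root):
    """Yield (relative path, body fragment) for each static page under root."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in HTML_SUFFIXES:
            # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield path.relative_to(root).as_posix(), body_fragment(text)
```

Something like dict(load_pages("/path/to/site")) would then map each page's relative path to its content, ready for further clean-up.
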
> 1. Has it been done and, if so, where might I find something like this?
The spidering part along with storing in databases is what search
engines do. What you need to add is the processing in-between.
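
As a rough illustration of that split, here is a minimal Python sketch of the storage end with a pluggable processing step in between. It assumes SQLite via the standard sqlite3 module and a made-up single-table schema; import_pages, the pages table and its path and body columns are hypothetical names, not part of any existing CMS.

```python
import sqlite3


def import_pages(conn, pages, process):
    """Run process() over each raw page and store the result by path.

    pages is an iterable of (path, raw_html) pairs; process is whatever
    clean-up function sits between fetching and storing.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "path TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    rows = ((path, process(raw)) for path, raw in pages)
    # Re-importing a path overwrites the earlier copy.
    conn.executemany(
        "INSERT OR REPLACE INTO pages (path, body) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```

Usage would look something like import_pages(sqlite3.connect("site.db"), pages, my_cleaner), where my_cleaner stands for the in-between processing and is not defined here.
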
> 2. Might it have any commercial value to other developers?


Developers, I doubt it. Content managers, possibly...

Cheers,
NC
Jul 17 '05 #2
