
Importing pages

AJ
Hi all

I've written a content management system that I'm now selling to my
customers. It works nicely when we start with a blank canvas, but it's a pain
in the arse when there is already a site in place.

What I'm in the process of *trying* to put together is a script that would
do the following:

A simple form where you enter the address of the site with the static pages.

The script then spiders through the site, takes everything between <body>
and </body>, and chucks the rest away.
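
A minimal sketch of the spidering step in Python, standard library only. The names `same_site_links`, `crawl` and the `fetch` callback are hypothetical: `fetch(url)` is assumed to return a page's HTML as a string, or None on failure (in practice it might wrap `urllib.request.urlopen`). Keeping the network call outside the crawl loop makes the logic easy to test. Links are resolved with `urljoin`, stripped of `#fragments`, and followed only while they stay on the starting host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Absolute, fragment-free links in html that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.append(absolute)
    return links


def crawl(start_url, fetch, limit=500):
    """Breadth-first crawl of one site; returns {url: html}.

    fetch is a caller-supplied (hypothetical) hook: url -> html string or None.
    limit is an arbitrary safety cap on the number of pages kept.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # broken link or non-HTML resource
            continue
        pages[url] = html
        for link in same_site_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a quick offline try, a dict's `get` method can stand in for `fetch`, e.g. `crawl("http://example.com/", site.get)`.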

It would then strip out all class attributes and embedded styling, such as
<font> tags, but leave tables, <p>, <h1> to <h6>, etc.
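
A whitelist is usually safer than trying to list everything that should be removed. The sketch below, assuming Python's `html.parser`, keeps an example set of structural tags, drops every other tag while keeping its text, and discards all attributes except `href` on links and `colspan`/`rowspan` on table cells. That removes `class`, `style`, `bgcolor` and inline event handlers such as `onclick` in one pass, and `<script>`/`<style>` elements go together with their contents. The name `clean_html` and the exact whitelist are assumptions to adjust per site.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: an example whitelist; adjust it to the sites being imported.
ALLOWED_TAGS = {
    "p", "br", "h1", "h2", "h3", "h4", "h5", "h6",
    "table", "thead", "tbody", "tr", "th", "td",
    "ul", "ol", "li", "a", "strong", "em",
}
# Only these attributes survive; class, style, bgcolor, onclick, etc. do not.
ALLOWED_ATTRS = {"a": {"href"}, "td": {"colspan", "rowspan"}, "th": {"colspan", "rowspan"}}
# Elements removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.drop_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.drop_depth += 1
            return
        if self.drop_depth or tag not in ALLOWED_TAGS:
            return  # e.g. <font>, <span>, <div>: the tag goes, its text stays
        keep = ALLOWED_ATTRS.get(tag, set())
        rendered = "".join(
            f' {name}="{escape(value or "")}"' for name, value in attrs if name in keep
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing forms like <br/> have no content, so never open a drop region.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif not self.drop_depth and tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            # Text arrives with entities decoded, so re-escape it for output.
            self.parts.append(escape(data, quote=False))


def clean_html(fragment):
    cleaner = Cleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.parts)
```

Because only whitelisted attributes are ever written out, new kinds of presentational markup do not need to be anticipated one by one; the trade-off is that anything useful missing from the whitelist is silently dropped.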

This would leave a very plain page of HTML that would be inserted into a
database, with CSS controlling the fonts etc. I'm aware that some tidying up
would be needed if there were any JavaScript or similar, and also some basic
formatting.
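
For the storage step, a sketch using Python's built-in `sqlite3` module. The CMS's real database and schema aren't described here, so the `pages` table, its columns, and the helper names `open_db`/`save_page` are hypothetical. Keying on the source URL means re-running an import replaces the earlier copy of a page instead of duplicating it.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database; ':memory:' is only for trying it out."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " body_html TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, body_html):
    # "with conn" commits on success and rolls back if the insert fails.
    # INSERT OR REPLACE overwrites a page imported earlier from the same URL.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body_html) VALUES (?, ?)",
            (url, body_html),
        )
```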

What I want to know is:

1. Has it been done and, if so, where might I find something like this?
2. Might it have any commercial value to other developers?

Regarding 2, I'm thinking of how much time something like this might save me
if I have to convert more than a few pages of static HTML into something that
I can put in a database.

Your thoughts would be appreciated.

Andy
Jul 17 '05 #1
"AJ" <no****@redcatmedia.net> wrote in message
news:<ci**********@hercules.btinternet.com>...

> The script then spiders through the site, takes everything between
> <body> and </body> and chucks the rest away

Bad idea. As of HTML 4.0, the start and end tags of <head> and <body> are
optional, so a perfectly valid page may contain no literal <body> tag at all.
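
Rather than searching for literal `<body>` and `</body>` strings, an importer can parse the page and discard whatever is head material. A minimal sketch in Python, assuming `html.parser`: the `<html>`, `<head>` and `<body>` tags themselves are dropped (their end tags may be missing, so nothing depends on them), `<title>`, `<script>`, `<style>` and `<noscript>` are dropped with their contents, and the void `<meta>`, `<link>` and `<base>` tags are dropped. The name `extract_body` and these tag lists are assumptions, not a standard API.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these never hold page content; tags and contents are discarded.
SKIP_CONTENT = {"title", "script", "style", "noscript"}
# Void head elements: no content, so only the tag is dropped.
HEAD_VOID = {"meta", "link", "base"}
# Document wrappers: drop the tags but keep their children. <head> is safe to
# treat this way because its real children are all covered by the sets above.
WRAPPERS = {"html", "head", "body"}


class BodyExtractor(HTMLParser):
    """Collects page content without relying on literal <body> tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in HEAD_VOID or tag in WRAPPERS:
            return
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Treat XHTML-style <br/> like <br>; never open a skip region here.
        if tag not in SKIP_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in HEAD_VOID and tag not in WRAPPERS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def extract_body(html):
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Attributes are left untouched here; stripping classes and styles is a separate cleanup pass.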
Also, why spider the site if you can (theoretically, at least)
crawl the local file system?
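
If the old site's files are reachable on disk, collecting the pages needs no HTTP at all. A sketch with `pathlib`; the function name `find_static_pages` is hypothetical, and decoding as UTF-8 with `errors="replace"` is an assumption, since an older static site may well use a legacy encoding such as ISO-8859-1.

```python
from pathlib import Path


def find_static_pages(root):
    """Yield (relative_path, html_text) for every .htm/.html file under root."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Match the extension case-insensitively (INDEX.HTM, page.Html, ...).
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            # Assumption: UTF-8; undecodable bytes become U+FFFD instead of aborting.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield path.relative_to(root).as_posix(), text
```

Walking the disk also picks up orphaned pages that no link points to, which a spider would miss.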
> 1. Has it been done and, if so, where might I find something like this

The spidering part, along with storing the pages in a database, is what
search engines do. What you need to add is the processing in between.
> 2. Might it have any commercial value to other developers?

Developers, I doubt it. Content managers, possibly...

Cheers,
NC
Jul 17 '05 #2
