ASP: Building a "page grabber" to save out text AND images?

My well-heeled clients would like a tool that I could build into a web
application that would suck down the entire contents of a remote webpage and
store it on a local filesystem.

I know how to scrape textual content from places on the web using MSXML and
assorted scraping widgets. That doesn't help me with photos, nor does it
answer how to encapsulate the photos in a form in which I can view them.

This would probably only find use in intranet-type situations, so an IE-only
.mht-type encapsulation of the page might be OK.
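
For reference, an .mht file is a MIME multipart/related message in which each resource carries a Content-Location header holding its original URL. The following is only a rough sketch of that packaging step, written in Python rather than ASP; the function name build_mht and the shape of the images argument are assumptions made for illustration, and whether IE opens the result as expected has not been verified here.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, images):
    """Pack a page and its images into one multipart/related archive.

    images maps each absolute image URL to a (raw_bytes, subtype) pair,
    e.g. {"http://example.com/a.png": (data, "png")}. This layout is a
    hypothetical choice for the sketch, not a fixed API.
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = page_url

    # The HTML document is the root part of the archive.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # Each image becomes its own part, keyed by its original URL so that
    # references inside the HTML can be resolved against the archive.
    for url, (data, subtype) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

The returned bytes could then be written to a file with an .mht extension.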

I have OK skills working with the FSO, so if there's a widget or method by
which the retrieval can be done, I can probably figure things out.

Has anyone done anything like this? Can someone suggest widgets and/or a
design approach?

Thanks.

-KF

Jul 19 '05 #1
This isn't an answer to your question, but why? If they want to store Web
sites locally, shouldn't they be looking at proxy server software instead of
an ASP application? I don't really know what it is that your clients want,
but perhaps you can talk them into buying ISA Server or something a bit cheaper
with good proxy services.

Ray at home

Jul 19 '05 #2
I'm not sure if this is a complete answer to your self-described non-answer
:), but...

1) ...the pages that the "client" (my employer :) wants don't live forever
on the web. Eventually they're restricted in some way, and we'd like to capture and
catalogue them in a local cache before that happens.

2) We want to interactively pick out a very limited selection of a given
site, not the entire site's content. We're only interested in certain pages,
and it requires human judgement to ascertain which pages are best.

Does this help?

-KF

Jul 19 '05 #3
Look for "offline browsers" using Google or www.tucows.com. There are plenty
of apps that do that for you right now.

--
Manohar Kamath
Editor, .netWire
www.dotnetwire.com

Jul 19 '05 #4
Thanks, Manohar. But that's not what I need. I need a method or widget that
I can access programmatically, and that I can build into the works of other
web applications that I'm creating. I don't want a fixed tool, no matter how
good the implementation.

Does anyone have other ideas?

-KF

Jul 19 '05 #5
One way to do this is to call these tools programmatically. Doing it on your
own might be "reinventing the wheel" and time-consuming.

If that's what you want, here are some articles that might help:

http://www.yart.com.au/articles/Yider.asp
http://www.asp101.com/articles/chris/spider/default.asp
--
Manohar Kamath
Editor, .netWire
www.dotnetwire.com

Jul 19 '05 #6
Thanks again. Programmatically calling existing tools is a great idea, and
I'll look carefully at what's out there. One limit of the two tools you
linked to is that neither of them deals with retrieving page images, which is
at the core of what I need to do.

I was hoping someone might already know of such a tool.

I don't really need spidering capability. What I need is a method to suck
down the text and images of a single remote page, and ideally a way to
rework the links so that everything doesn't look broken. The IE "Save page
as single file"/.mht is a good model for what I'd ideally like to do.
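
To make the requirement concrete, here is a minimal sketch of the single-page case in Python (not ASP): fetch the HTML, find the img tags, save each image next to the page, and rewrite each src so the saved copy doesn't look broken. The names rewrite_images, local_name and grab_page, the numbered file-naming scheme, and the regular expression are all assumptions for illustration; a real page would also need stylesheets, background images and character-set handling, which this ignores.

```python
import os
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Matches the src attribute of an <img> tag, quoted with " or '.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def local_name(url, index):
    """Flat local file name for an image URL (hypothetical naming scheme)."""
    base = os.path.basename(urlparse(url).path) or "image"
    return f"{index:03d}_{base}"


def rewrite_images(html, page_url):
    """Return (new_html, mapping) where mapping is {absolute_url: local_name}.

    Every img src is resolved against page_url and replaced with the
    local file name the image will be saved under.
    """
    mapping = {}

    def swap(match):
        absolute = urljoin(page_url, match.group(3))
        if absolute not in mapping:
            mapping[absolute] = local_name(absolute, len(mapping))
        quote = match.group(2)
        return f"{match.group(1)}{quote}{mapping[absolute]}{quote}"

    return IMG_SRC.sub(swap, html), mapping


def grab_page(page_url, out_dir):
    """Save one page and its images into out_dir; return the HTML path."""
    os.makedirs(out_dir, exist_ok=True)
    with urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    new_html, mapping = rewrite_images(html, page_url)

    # Images are binary, so they are written in "wb" mode untouched.
    for url, name in mapping.items():
        with urlopen(url) as resp, open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(resp.read())

    path = os.path.join(out_dir, "index.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(new_html)
    return path
```

Keeping the link rewriting in its own function means it can be checked without touching the network; only grab_page actually downloads anything.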

-KF

Jul 19 '05 #7
I have learned about a UNIX utility called wget that does precisely what
I've been describing.

Unless there's something comparable in the realm of ASP that doesn't require
launching an external process (as wget does), that's probably my solution.
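
As an illustration of what launching wget as an external process might look like, here is a short sketch in Python (not ASP). The wrapper names wget_command and grab_with_wget are hypothetical; the wget options used are --page-requisites (also fetch images and other files the page needs), --convert-links (rewrite links to point at the local copies), --no-directories (save everything flat) and --directory-prefix (target directory). It assumes wget is installed and on the PATH.

```python
import shutil
import subprocess


def wget_command(page_url, out_dir):
    """Build the argument list for grabbing one page plus its images."""
    return [
        "wget",
        "--page-requisites",   # also download images, CSS, etc.
        "--convert-links",     # make links in the saved page work locally
        "--no-directories",    # do not recreate the remote directory tree
        f"--directory-prefix={out_dir}",
        page_url,
    ]


def grab_with_wget(page_url, out_dir):
    """Run wget as an external process and return its exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on the PATH")
    # Passing a list (no shell) avoids quoting problems with odd URLs.
    result = subprocess.run(wget_command(page_url, out_dir), check=False)
    return result.returncode
```

An exit code of 0 from wget indicates success; anything else would need to be logged or surfaced to the calling application.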

Thanks for the pointers...

-KF

Jul 19 '05 #8
