Design guidance needed: traversing links in ASP

I would like some guidance regarding a "content scanner" I'm trying to
build. This ASP widget will automatically scan remote web sites for certain
kinds of content using a screen scraping component and simple pattern
matching. The widget will generate reports about what it found and where.

Ideally, I would like the widget to follow all of the http:// links on the
remote page for one level, and scan the child pages for certain kinds of
content. I'm trying to figure out the best way to do this.

Here's what I'm thinking:

1) Scan known URL, make string of page content, embed that in a variable
strPageContent
2) Examine strPageContent for search term, generate report
3) Use a function to strip out everything from strPageContent except a list
of valid URLs
4) Use another function to remove all duplicate URLs from modified
strPageContent
5) Move strPageContent to an array
6) Loop through all items in the array, screen scraping each URL, and
testing it for the search term. Repeat as necessary.
7) Repeat as necessary for other search terms

I think this will probably work. However, I can't escape the nagging feeling
that either a) someone's already done this far more elegantly, or b) the
functionality may be baked in to ASP or ASP.NET, or available as an add-on.

Any pointers or good ideas out there?

Thanks.
Jul 19 '05 #1
The lack of responses to this post is probably down to the fact that you
have written a functional specification, almost pseudocode, rather than
posted details of an ASP problem.

What is preventing you from actually getting started with this?

I have inserted some keywords to look up in MSDN, Google or ASPFAQ in your
original comments. They relate to ASP as this is not a .NET forum.

HTH

"Ken Fine" <ke*****@u.washington.edu> wrote in message
news:bh***********@nntp6.u.washington.edu...
> I would like some guidance regarding a "content scanner" I'm trying to
> build. This ASP widget will automatically scan remote web sites for certain
> kinds of content using a screen scraping component and simple pattern
> matching. The widget will generate reports about what it found and where.
>
> Ideally, I would like the widget to follow all of the http:// links on the
> remote page for one level, and scan the child pages for certain kinds of
> content. I'm trying to figure out the best way to do this.
>
> Here's what I'm thinking:
>
> 1) Scan known URL, make string of page content, embed that in a variable
> strPageContent
MSXML2.ServerXMLHTTP
> 2) Examine strPageContent for search term, generate report
ResponseText, InStr
> 3) Use a function to strip out everything from strPageContent except a list
> of valid URLs
Regular Expressions
> 4) Use another function to remove all duplicate URLs from modified
> strPageContent
(Swap 5 with 4)
Deduplicate array
> 5) Move strPageContent to an array
Split(strPageContent, "http://")
> 6) Loop through all items in the array, screen scraping each URL, and
> testing it for the search term. Repeat as necessary.
MSXML2.ServerXMLHTTP
InStr
> 7) Repeat as necessary for other search terms
>
> I think this will probably work. However, I can't escape the nagging feeling
> that either a) someone's already done this far more elegantly, or b) the
> functionality may be baked in to ASP or ASP.NET, or available as an add-on.
>
> Any pointers or good ideas out there?
>
> Thanks.
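
To show how those keywords could fit together, here is a rough sketch in VBScript for classic ASP. It is only one way to do it, not a tested implementation. The function names (FetchPage, ContainsTerm, ExtractLinks, ScanOneLevel) and the example URL are made up for illustration. It also swaps in a Scripting.Dictionary for the Split-then-deduplicate approach, so steps 4 and 5 collapse into one pass. The regular expression only picks up absolute http:// links inside quoted href attributes; relative links are ignored.

```vbscript
' Step 1: fetch a page and return its body as a string ("" on any failure).
Function FetchPage(strUrl)
    Dim objHttp
    FetchPage = ""
    On Error Resume Next
    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    objHttp.open "GET", strUrl, False      ' False = wait for the response
    objHttp.send
    If Err.Number = 0 Then
        If objHttp.status = 200 Then FetchPage = objHttp.responseText
    End If
    On Error GoTo 0
    Set objHttp = Nothing
End Function

' Step 2: case-insensitive substring test with InStr.
Function ContainsTerm(strText, strTerm)
    ContainsTerm = (InStr(1, strText, strTerm, vbTextCompare) > 0)
End Function

' Steps 3-5: pull http:// URLs out of href attributes. Dictionary keys
' are unique, so duplicates are dropped as the matches are collected.
Function ExtractLinks(strHtml)
    Dim objRe, objMatch, dictUrls, strUrl
    Set dictUrls = CreateObject("Scripting.Dictionary")
    Set objRe = New RegExp
    objRe.Pattern = "href\s*=\s*[""'](http://[^""'#\s>]+)"
    objRe.IgnoreCase = True
    objRe.Global = True
    For Each objMatch In objRe.Execute(strHtml)
        strUrl = objMatch.SubMatches(0)
        If Not dictUrls.Exists(strUrl) Then dictUrls.Add strUrl, True
    Next
    Set ExtractLinks = dictUrls
End Function

' Step 6: scan the start page plus each linked page, one level deep.
' Returns a Dictionary mapping URL -> True/False (was the term found?).
Function ScanOneLevel(strStartUrl, strTerm)
    Dim dictReport, dictLinks, strPage, strUrl
    Set dictReport = CreateObject("Scripting.Dictionary")
    strPage = FetchPage(strStartUrl)
    dictReport.Add strStartUrl, ContainsTerm(strPage, strTerm)
    Set dictLinks = ExtractLinks(strPage)
    For Each strUrl In dictLinks.Keys
        If Not dictReport.Exists(strUrl) Then
            dictReport.Add strUrl, ContainsTerm(FetchPage(strUrl), strTerm)
        End If
    Next
    Set ScanOneLevel = dictReport
End Function

' Example use inside an ASP page (Response and Server are ASP built-ins):
'   Dim dictResult, strKey
'   Set dictResult = ScanOneLevel("http://www.example.com/", "widget")
'   For Each strKey In dictResult.Keys
'       Response.Write Server.HTMLEncode(strKey) & ": " & dictResult(strKey) & "<br>"
'   Next
```

For step 7, call ScanOneLevel once per search term. If that means fetching the same pages repeatedly, it would be cheaper to keep each page's text in a Dictionary and run InStr over the cached copies.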

Jul 19 '05 #2


David,

Thanks much for your helpful reply. I posted because I didn't really
know if my specification was actually functional, or if it duplicated
functionality that was already baked into ASP/ASP.NET.

And although I might seem to have some idea of what I'm talking about,
I've never actually done many of these things. I've never built an array
or looped through it, for instance, even though I think I understand why
people make them and what they're useful for.

I'll look around at your links and get going with this; sounds like you
think it's doable.
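
For reference, building an array and looping through it in VBScript can look like the minimal sketch below. The URLs and variable names are placeholders. The output is collected in a string so the snippet does not depend on ASP; in an ASP page you would typically finish with Response.Write strOut.

```vbscript
Dim arrUrls, i, strUrl, strOut

' Build an array directly...
arrUrls = Array("http://www.example.com/a", "http://www.example.com/b")
' ...or by splitting a delimited string into its pieces
arrUrls = Split("http://www.example.com/a|http://www.example.com/b", "|")

' Loop by index: LBound and UBound give the first and last valid positions
strOut = ""
For i = LBound(arrUrls) To UBound(arrUrls)
    strOut = strOut & i & ": " & arrUrls(i) & vbCrLf
Next

' Or visit each element in turn without tracking an index
For Each strUrl In arrUrls
    strOut = strOut & strUrl & vbCrLf
Next
```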

Jul 19 '05 #3
