Design guidance needed: traversing links in ASP

I would like some guidance regarding a "content scanner" I'm trying to
build. This ASP widget will automatically scan remote web sites for certain
kinds of content using a screen scraping component and simple pattern
matching. The widget will generate reports about what it found and where.

Ideally, I would like the widget to follow all of the http:// links on the
remote page for one level, and scan the child pages for certain kinds of
content. I'm trying to figure out the best way to do this.

Here's what I'm thinking:

1) Scan known URL, make string of page content, embed that in a variable
strPageContent
2) Examine strPageContent for search term, generate report
3) Use a function to strip out everything from strPageContent except a list
of valid URLs
4) Use another function to remove all duplicate URLs from modified
strPageContent
5) Move strPageContent to an array
6) Loop through all items in the array, screen scraping each URL, and
testing it for the search term. Repeat as necessary.
7) Repeat as necessary for other search terms
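
The seven steps above can be sketched roughly as follows. This is Python rather than ASP, chosen only so the sketch is easy to run; the names `fetch_page`, `extract_urls` and `scan_one_level` are made up for illustration, and the URL pattern is an assumption that only matches absolute http:// links.

```python
import re
from urllib.request import urlopen

# Assumption: only absolute http:// links count, as in the plan above.
# https:// and relative links are ignored by this pattern.
URL_PATTERN = re.compile(r'http://[^\s"\'<>]+')


def fetch_page(url):
    """Step 1: download a page and return its content as a string."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_urls(page_content):
    """Steps 3-5: pull out the URLs, drop duplicates, return them as a list."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(page_content):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def scan_one_level(start_url, search_terms, fetch=fetch_page):
    """Steps 1-7: scan the start page and every page it links to, one level deep.

    Returns a list of (url, term) pairs, one pair per hit.
    """
    start_content = fetch(start_url)
    pages = [(start_url, start_content)]
    for child_url in extract_urls(start_content):
        if child_url == start_url:
            continue  # do not scan the start page twice
        try:
            pages.append((child_url, fetch(child_url)))
        except OSError:
            continue  # network error on a child page: skip it

    report = []
    for url, content in pages:
        lowered = content.lower()
        for term in search_terms:
            # Case-insensitive substring test, one pass per search term (step 7).
            if term.lower() in lowered:
                report.append((url, term))
    return report
```

Passing `fetch` in as a parameter is a design choice for the sketch: it lets the loop be tried against canned pages before pointing it at a real site.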

I think this will probably work. However, I can't escape the nagging feeling
that either a) someone's already done this far more elegantly, or b) the
functionality may be baked in to ASP or ASP.NET, or available as an add-on.

Any pointers or good ideas out there?

Thanks.
Jul 19 '05 #1
The lack of responses to this post is probably down to the fact that you
have written a functional specification, almost pseudocode, rather than
posting details of an ASP problem.

What is preventing you from actually getting started with this?

I have inserted some keywords to look up in MSDN, Google or ASPFAQ in your
original comments. They relate to ASP as this is not a .NET forum.

HTH

"Ken Fine" <ke*****@u.washington.edu> wrote in message
news:bh***********@nntp6.u.washington.edu...
I would like some guidance regarding a "content scanner" I'm trying to
build. This ASP widget will automatically scan remote web sites for certain
kinds of content using a screen scraping component and simple pattern
matching. The widget will generate reports about what it found and where.

Ideally, I would like the widget to follow all of the http:// links on the
remote page for one level, and scan the child pages for certain kinds of
content. I'm trying to figure out the best way to do this.

Here's what I'm thinking:

1) Scan known URL, make string of page content, embed that in a variable
strPageContent
MSXML2.ServerXMLHTTP
2) Examine strPageContent for search term, generate report
ResponseText InStr
3) Use a function to strip out everything from strPageContent except a list of valid URLs
Regular Expressions
4) Use another function to remove all duplicate URLs from modified
strPageContent
(Swap 5 with 4)
Deduplicate array
5) Move strPageContent to an array
Split(strPageContent, "http://")
6) Loop through all items in the array, screen scraping each URL, and
testing it for the search term. Repeat as necessary.
MSXML2.ServerXMLHTTP
InStr
7) Repeat as necessary for other search terms
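
To make the keywords above a little more concrete, here is a rough sketch in Python (not ASP, used only so it can be run as-is). `split_urls`, `dedupe` and `contains_term` are hypothetical helper names; they stand in for `Split`, an array deduplication pass and `InStr` respectively. Cutting each piece at the first quote, angle bracket or whitespace is an assumption about where a URL ends.

```python
# Characters assumed to mark the end of a URL inside HTML source.
URL_STOP_CHARS = ('"', "'", "<", ">", " ", "\t", "\r", "\n")


def split_urls(page_content):
    """Rough equivalent of Split(strPageContent, "http://") plus a cleanup pass."""
    # Everything before the first "http://" is not a URL, so drop piece 0.
    pieces = page_content.split("http://")[1:]
    urls = []
    for piece in pieces:
        end = len(piece)
        for stop in URL_STOP_CHARS:
            pos = piece.find(stop)
            if pos != -1:
                end = min(end, pos)
        if end > 0:
            urls.append("http://" + piece[:end])
    return urls


def dedupe(items):
    """Deduplicate the array after splitting (hence swapping steps 4 and 5)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def contains_term(page_content, term):
    """Rough equivalent of InStr(1, strPageContent, term, vbTextCompare) > 0."""
    return term.lower() in page_content.lower()
```

Splitting first and deduplicating second is simpler because duplicates are easy to spot once each URL is its own array element; finding them inside one long string is much harder.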

I think this will probably work. However, I can't escape the nagging feeling
that either a) someone's already done this far more elegantly, or b) the
functionality may be baked in to ASP or ASP.NET, or available as an add-on.

Any pointers or good ideas out there?

Thanks.

Jul 19 '05 #2


David,

Thanks much for your helpful reply. The reason I posted is that I didn't
really know whether my specification was actually functional, or whether
it duplicated functionality already baked into ASP/ASP.NET.

And although I might seem to have some idea of what I'm talking about,
I've never actually done many of these things. I've never built an array
or looped through it, for instance, even though I think I understand why
people make them and what they're useful for.

I'll look around at your links and get going with this; sounds like you
think it's doable.

Jul 19 '05 #3
