Bytes | Software Development & Data Engineering Community

trying to collect links on my site, how to crawl it?

I am trying to crawl my site to get a list of links. I am using regular
expressions to extract the href attributes from the pages, and the
xmlhttp module to fetch the linked pages.
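A minimal sketch of the link-extraction step, written in Python rather than ASP for illustration. It swaps the poster's regular expressions for the standard library's `html.parser`, which handles attribute order, quoting, and letter case that simple regexes often miss, and uses `urljoin` to turn relative hrefs into absolute URLs. The names `LinkCollector` and `extract_links` are hypothetical, and keeping only links on the same host as the page is an assumption about what "links on my site" means.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute, fragment-free http(s) links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other sites (assumed out of scope).
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links
```

For example, `extract_links("http://www.example.com/index.asp", '<a href="about.asp">About</a>')` returns `['http://www.example.com/about.asp']`.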

Is there an efficient way to loop through the links? If you start with the
home page, how do you do it? How do you keep track of the pages you have
already visited and which one to visit next?

I am not sure how to go about this.

I tried writing a loop, but it took too long to crawl my site.
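One way to answer the "keep track of the pages" question is a breadth-first crawl: a queue holds the pages still to visit, and a set records every URL already queued so no page is fetched twice. A loop without such a set can request the same pages over and over, which is one common reason a crawl runs far longer than the size of the site would suggest. The sketch below is Python; `crawl`, its `get_links` callback (assumed to fetch a page and return its absolute same-site links), and the `max_pages` cap are all hypothetical.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl starting at start_url (hypothetical helper).

    get_links(url) is an assumed callback that fetches the page and returns
    the absolute same-site links found on it.
    Returns the URLs in the order they were visited.
    """
    seen = {start_url}          # every URL ever queued, so nothing is queued twice
    queue = deque([start_url])  # frontier: pages still to visit, oldest first
    visited = []
    while queue and len(visited) < max_pages:  # max_pages is a safety cap (assumption)
        url = queue.popleft()   # the next page to visit is the oldest one queued
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

With a small hand-made link map, `site = {"/": ["/a", "/b"], "/a": ["/", "/c"], "/b": ["/a"], "/c": []}`, the call `crawl("/", lambda u: site.get(u, []))` visits `["/", "/a", "/b", "/c"]`, each page once even though `/` and `/a` are linked more than once. Marking a URL as seen when it is queued, rather than when it is visited, keeps duplicates out of the queue itself.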

Thanks again
Jul 19 '05 #1
For all the .asp and .html files (and any other page types), you could
build the list of files to request using FileSystemObject, store the list
in a database or text file, and step through it.
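This approach avoids following links altogether: list the page files on disk and request each one. FileSystemObject is the ASP-side object for enumerating files; as an illustration only, here is the same idea in Python using `os.walk`. It assumes the folder `site_root` is served at `base_url`, and the extension list and the helper names `list_site_pages` and `save_list` are hypothetical.

```python
import os
from urllib.parse import quote

# Page types to include; adjust to match the site (assumption).
PAGE_EXTENSIONS = (".asp", ".html", ".htm")


def list_site_pages(site_root, base_url):
    """Walk site_root and return a URL for every page file found.

    Assumes the directory site_root is what the web server serves at base_url.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames.sort()  # visit subfolders in alphabetical order for a stable list
        for name in sorted(filenames):
            if name.lower().endswith(PAGE_EXTENSIONS):
                rel = os.path.relpath(os.path.join(dirpath, name), site_root)
                # Use URL slashes instead of OS separators and escape spaces etc.
                urls.append(base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/")))
    return urls


def save_list(urls, path):
    """Store one URL per line so the list can be stepped through later."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

The saved file can later be read back with `open(path, encoding="utf-8").read().splitlines()` and each URL requested in turn. This finds only pages that exist as files; URLs that are generated dynamically and appear only as links will not be listed.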

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/
"Danny" <da********@hotmail.com> wrote in message
news:Wl**********************@news4.srv.hcvlny.cv.net...

Jul 19 '05 #2


Similar topics

1
by: valerian dinca | last post by:
Hi, This is my latest article for your consideration. If you think that this article could interest your subscribers it is free to reprint in your ezine and/or newsletter as long as you: 1....
1
by: Dave | last post by:
Is it possible to crawl a site using ASP & XML HTTP? I know you can hit one link, but how can you go through each link in a page and validate that it returns 200?
3
by: Richard Fritzler | last post by:
I was given the task of designing a complete web based document prep system. In simplest terms (using a msword explanation) create a database of merge fields, and a library of templates. Allow the...
21
by: John | last post by:
Hi, I updated a site and changed the file extensions from .html to .php. Now i noticed that the google does find the old .html pages but since they're not there anymore... they can't be found....
3
by: Mikegtr | last post by:
I need to collect and send data from a rs232 device- it is a simple temperature controller. I need to be able to collect actual temperature ,store the result in database (mysql) and show it in...
48
by: Ward Bekker | last post by:
Hi, I'm wondering if the GC.Collect method really collects all objects possible objects? Or is this still a "smart" process sometimes keeping objects alive even if they can be garbage collected?...
11
by: ymic8 | last post by:
Hi everyone, this is my first thread coz I just joined. Does anyone know how to crawl a particular URL using Python? I tried to build a breadth-first sort of crawler but have little success. ...
5
by: doublestack | last post by:
Hi everyone, I have a xml file that Simile Timeline uses to call events...Some of the events have links in them and I need to have those links open a new window. As it stands now the links replace...
25
by: pereges | last post by:
Hello, I'm trying to build a database driven website for a library management system. The database is stored on a remote server which all of my team mates can access. I've installed MySQL, PHP and...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
3
by: NeoPa | last post by:
Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
0
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
