Trying to collect links on my site: how do I crawl it?

I am trying to crawl my own site to build a list of its links. I use regular
expressions to pull the href values out of each page, and I fetch the pages
with the XMLHTTP component.

Is there an efficient way to loop through the links? If you start from the
home page, how do you proceed? How do you keep track of the pages you have
already visited and which ones to fetch next?

I am not sure how to go about this.

I tried a loop, but it took too long to crawl my site.

Thanks again
Jul 19 '05 #1
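
The bookkeeping asked about above (which pages have been visited, which come next) is commonly handled with a queue of pages still to fetch plus a set of every URL already seen, so each page is fetched at most once. What follows is only a minimal sketch in Python, not the ASP/XMLHTTP setup from the post. The names crawl, fetch_page, and LinkParser are hypothetical, and it swaps the regular expressions for the standard library's HTML parser.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_page(url):
    """Hypothetical default fetcher: page text, or None if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, fetch=fetch_page):
    """Breadth-first crawl of one site; returns the set of URLs found."""
    site = urlparse(start_url).netloc
    seen = {start_url}              # every URL that has ever been queued
    to_visit = deque([start_url])   # URLs still waiting to be fetched
    while to_visit:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:
            continue                # unreachable page: keep the URL, move on
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                to_visit.append(link)
    return seen
```

Adding a URL to seen at the moment it is queued, rather than when it is fetched, is what stops the same page from being queued twice. Repeated fetches are one plausible reason a crawl loop takes too long, though the post does not say what the original loop did.
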
For .asp, .html, and similar files, you could build the list of files to hit
using FileSystemObject, store the list in a database or a text file, and
then loop through it.
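
As a rough illustration of that idea, here is a sketch in Python rather than FileSystemObject. It walks the site's folder on disk, keeps the page files, turns each path into a URL, and writes the list to a text file. The function names, the extension set, and the assumption that the folder layout maps directly onto URL paths are all illustrative and not part of the original suggestion.

```python
from pathlib import Path

# Assumed set of page extensions; the reply names only .asp and .html.
PAGE_EXTENSIONS = {".asp", ".html", ".htm"}


def list_site_pages(web_root, base_url):
    """Return the URL of every page file under web_root, sorted."""
    root = Path(web_root)
    urls = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in PAGE_EXTENSIONS:
            # Assumes the folder layout mirrors the URL layout.
            relative = path.relative_to(root).as_posix()
            urls.append(base_url.rstrip("/") + "/" + relative)
    return sorted(urls)


def save_page_list(urls, list_file):
    """Store one URL per line so a later pass can read them back in order."""
    Path(list_file).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

A list built this way also includes pages that nothing links to, which a link-following crawl would miss. It cannot see URLs that exist only through query strings or server-side routing.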

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 19 '05 #2
