Trying to collect links on my site: how do I crawl it?

I am trying to crawl my site to get a list of its links. I am using regular
expressions to extract the href attributes from each page and fetching the
pages with the xmlhttp module.

Is there an efficient way to loop through the links? If you start with the
home page, how do you do it? How do you keep track of the pages you have
been to and which to visit next?

I am not sure how to go about this.

I tried a loop, but it seemed to take too long to crawl my site.

Thanks again
Jul 19 '05 #1
For .asp, .html, and similar files, you could build the list of files to hit
using FileSystemObject, store the list in a database or text file, and
loop through it.
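
A minimal sketch of that idea, written in Python rather than ASP to keep it self-contained. Here os.walk stands in for FileSystemObject, and the function names (list_site_files, save_list, load_list) and the set of extensions are assumptions made for illustration, not something from the reply itself.

```python
import os

# Extensions treated as pages. The reply names .asp and .html "etc.";
# .htm is added here as an assumption.
PAGE_EXTENSIONS = {".asp", ".html", ".htm"}


def list_site_files(site_root, extensions=PAGE_EXTENSIONS):
    """Return the site-relative path of every page file under site_root."""
    pages = []
    for folder, _subfolders, filenames in os.walk(site_root):
        for name in filenames:
            # Compare extensions case-insensitively (Default.ASP, index.HTML).
            if os.path.splitext(name)[1].lower() in extensions:
                full_path = os.path.join(folder, name)
                relative = os.path.relpath(full_path, site_root)
                # Use forward slashes so the entry can be appended to a base URL.
                pages.append(relative.replace(os.sep, "/"))
    return sorted(pages)


def save_list(pages, list_path):
    """Store the list in a text file, one page per line."""
    with open(list_path, "w", encoding="utf-8") as handle:
        for page in pages:
            handle.write(page + "\n")


def load_list(list_path):
    """Read the stored list back so a later run can loop through it."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, save_list(list_site_files("C:/inetpub/wwwroot"), "pages.txt") would write the list once (the path is hypothetical), and a later run could loop over load_list("pages.txt") and request each page in turn. Because each file is listed exactly once, there is no need to track visited pages, though pages that exist only as dynamically generated URLs would not show up in the list.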

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 19 '05 #2
