Bytes | Software Development & Data Engineering Community

Creating a web bot/crawler/spider for multiple websites

Hello

I need to create a web bot/crawler/spider that visits different websites, collects data for us, and stores it in a database. The crawler needs to read the options on a website (from drop-downs, radio buttons, or check boxes) and either generate its own input from them or use generic predefined words that we provide.
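To make the "read the options" requirement concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The class name `FormOptionParser`, the sample form, and its field names are hypothetical, chosen just for illustration; a real site may use JavaScript-built forms that this approach would not see.

```python
from html.parser import HTMLParser


class FormOptionParser(HTMLParser):
    """Collects the choices offered by <select>, radio, and checkbox controls."""

    def __init__(self):
        super().__init__()
        self.options = {}      # field name -> list of offered values
        self.text_fields = []  # names of free-text inputs
        self._select = None    # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "select":
            self._select = a.get("name")
            if self._select:
                self.options.setdefault(self._select, [])
        elif tag == "option" and self._select and a.get("value"):
            # Options with an empty or missing value (placeholders) are skipped.
            self.options[self._select].append(a["value"])
        elif tag == "input" and a.get("name"):
            kind = (a.get("type") or "text").lower()
            if kind in ("radio", "checkbox"):
                self.options.setdefault(a["name"], []).append(a.get("value") or "on")
            elif kind == "text":
                self.text_fields.append(a["name"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


# Hypothetical page resembling the court-case search described in this post.
SAMPLE = """
<form action="/search" method="get">
  <input type="text" name="case_number">
  <select name="case_type">
    <option value="">Select...</option>
    <option value="permits">Industry Permits</option>
    <option value="civil">Civil Cases</option>
    <option value="criminal">Criminal Cases</option>
  </select>
  <input type="radio" name="status" value="open">
  <input type="radio" name="status" value="closed">
</form>
"""

parser = FormOptionParser()
parser.feed(SAMPLE)
print(parser.text_fields)
print(parser.options)
```

The parser only records what the page offers; deciding which values to submit is a separate step.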

For example, a web page might be structured with a text field and some drop-downs. Typically, if the user enters the case number of a court case, the website displays its status. Different legal documents might also be retrievable through drop-down options such as 'Industry Permits', 'Civil Cases', 'Criminal Cases', etc. So the crawler should be able to read these options, generate a list of suitable ones itself, and use them to get the data. We want a bot/crawler/spider that automatically enters information about multiple cases, i.e. case numbers (text field) and case types (from drop-downs), and retrieves the data about the relevant cases available on the website.
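One way to picture "automatically enter the information about multiple cases" is to combine the supplied case numbers with every drop-down value and build one query per combination. The sketch below assumes the site accepts a plain GET request; the function name `build_queries`, the URL `https://example.org/search`, and the field names are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def build_queries(base_url, text_values, options):
    """Return one GET URL per combination of text inputs and offered options.

    text_values: field name -> list of values we supply (e.g. case numbers)
    options:     field name -> list of values read from the page
    """
    fields = {**text_values, **options}
    names = list(fields)
    pools = [fields[name] for name in names]
    return [
        base_url + "?" + urlencode(dict(zip(names, combo)))
        for combo in product(*pools)
    ]


urls = build_queries(
    "https://example.org/search",                 # hypothetical endpoint
    {"case_number": ["2008-001", "2008-002"]},    # values we provide
    {"case_type": ["civil", "criminal"]},         # values read from the drop-down
)
for url in urls:
    print(url)
```

The number of combinations grows multiplicatively with each field, so a real crawler would probably need a cap or a per-site rule about which combinations are meaningful.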

What is the best approach to achieve this? We could write individual bots for each website, but we are trying to come up with a more intelligent bot or crawler that can be used to crawl multiple websites. Please advise on how we can achieve this.
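A possible middle ground between one bot per site and a fully generic crawler is a single engine driven by a small per-site description. The sketch below is only an illustration under that assumption: `SiteConfig`, `crawl`, the table layout, and the stub `fake_fetch` are hypothetical, and the fetch function is passed in so that a real HTTP client can replace the stub.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SiteConfig:
    """Everything the generic engine needs to know about one website."""
    name: str
    queries: List[str]  # fully built query URLs for this site


def crawl(sites: Iterable[SiteConfig],
          fetch: Callable[[str], str],
          db: sqlite3.Connection) -> int:
    """Fetch every query of every site and store the raw response body."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS results (site TEXT, url TEXT, body TEXT)"
    )
    stored = 0
    for site in sites:
        for url in site.queries:
            body = fetch(url)
            db.execute(
                "INSERT INTO results VALUES (?, ?, ?)", (site.name, url, body)
            )
            stored += 1
    db.commit()
    return stored


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request so the sketch runs offline.
    return "<html>stub for %s</html>" % url


db = sqlite3.connect(":memory:")
sites = [
    SiteConfig("court-a", ["https://a.example/search?case_number=1"]),
    SiteConfig("court-b", ["https://b.example/q?id=1", "https://b.example/q?id=2"]),
]
count = crawl(sites, fake_fetch, db)
print(count)
```

With this split, supporting a new website means adding a configuration entry rather than writing a new bot, although sites with unusual forms may still need custom handling.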

We are not doing anything illegal; everything is perfectly legal.

Regards
Kishore
Oct 21 '08 #1