473,386 Members | 1,828 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Not necessarily related to python Web Crawlers

Hi
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be python based. I am
interested in learning how crawling works. I think python based
crawlers will ensure a high degree of flexibility but at the same time
I am also torn between looking for open source crawlers in python vs C
++ because the latter is much more efficient(or so I heard. I will be
crawling on very cheap hardware.)

I am definitely open to suggestions.

Thx
Jul 5 '08 #1
2 1148
just crawling is supereasy. its how to index and search that is hard.
just start at yahoo.com, scrape out all the links and then for every
site visit every link.
i wrote a crawler in 15 lines of code. but then it all it did was
visit the sites, not indexing them or anything.

you could write a faster one in C++ probably but if you are new to it
doing it in python will let you experiment and learn faster.

some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinric...eval-book.html

http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/s...pleCrawler.asp
Jul 5 '08 #2
On Jul 5, 2:31*pm, disappeare...@gmail.com wrote:
Hi
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be python based. I am
interested in learning how crawling works. I think python based
crawlers will ensure a high degree of flexibility but at the same time
I am also torn between looking for open source crawlers in python vs C
++ because the latter is much more efficient(or so I heard. I will be
crawling on very cheap hardware.)

I am definitely open to suggestions.

Thx
You can check my python blog. There are some tips and codes on
crawlers.
http://love-python.blogspot.com/

regards,
Subeen
http://love-python.blogspot.com/
Jul 6 '08 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

27
by: Florian Lindner | last post by:
Hello, I've been using Python a lot for scripting (mainly scripts for server administration / DB access). All these scripts were shell based. Now I'm considering using Python (with mod_python on...
5
by: Ian Lane .enizin.net> | last post by:
Hello, Does anyone happen to have a Browsercaps update that properly sets the Crawler attribute? I am seeing that google and others are being recognized as browsers and not crawlers. Thank...
0
by: TomislaW | last post by:
I try to trace users on my web page In global.asax.cs on Application_BeginRequest I check if user has my cookie, if not I give him new cookie (integer identity number from database). When...
18
by: Tempo | last post by:
I was wondering if python is a good language to build a web crawler with? For example, to construct a program that will routinely search x amount of sites to check the availability of a product. Or...
13
by: abhinav | last post by:
Hi guys.I have to implement a topical crawler as a part of my project.What language should i implement C or Python?Python though has fast development cycle but my concern is speed also.I want to...
0
by: Stefano | last post by:
Hi all, I'm trying to create a browser definition file (.browser) that matches crawlers user agents. I don't want modify browser files in the Config system folder. I'd like to use App_Browsers...
4
by: bryan rasmussen | last post by:
Hi I'm wondering if there is a toolkit in python anywhere for doing at a high level web crawling, dumping links to a data set that could be imported into R relatively easy or used in python...
19
by: John Salerno | last post by:
Hey all. Just thought I'd ask a general question for my own interest. Every time I think of something I might do in Python, it usually involves creating a GUI interface, so I was wondering what kind...
12
by: disappearedng | last post by:
Hi all, I am currently planning to write my own web crawler. I know Python but not Perl, and I am interested in knowing which of these two are a better choice given the following scenario: 1)...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.