Bytes IT Community

Not necessarily related to python Web Crawlers

Hi
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be Python-based. I am
interested in learning how crawling works. I think a Python-based
crawler would give me a high degree of flexibility, but I am torn
between looking for open source crawlers in Python and in C++, because
the latter is much more efficient (or so I heard) and I will be
crawling on very cheap hardware.

I am definitely open to suggestions.

Thx
Jul 5 '08 #1
2 Replies


Just crawling is super easy. It's how to index and search that is hard.
Just start at yahoo.com, scrape out all the links, and then for every
site visit every link.
I wrote a crawler in 15 lines of code, but then all it did was
visit the sites, not index them or anything.
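
A minimal sketch of the approach described above (start at one page, scrape out the links, visit each one). This is not the 15-line crawler mentioned in the post, which isn't shown; the names `LinkParser`, `extract_links`, `fetch_page`, and `crawl` are made up for illustration, and only the Python standard library is assumed. Like the crawler described, it only visits pages and does no indexing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        # Resolve relative links against the page URL, then drop the fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def fetch_page(url):
    """Download a page as text; return "" on common network or HTTP errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, max_pages=20, fetch=fetch_page):
    """Breadth-first crawl from start_url; return URLs in visit order."""
    seen = {start_url}          # every URL ever queued, to avoid revisits
    queue = deque([start_url])  # URLs waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling something like `crawl("http://yahoo.com", max_pages=10)` would hit the live network. The `fetch` parameter is there so the traversal can be tried against canned pages first. A crawler meant for real use would also want to honor robots.txt and pause between requests, which this sketch leaves out.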

You could probably write a faster one in C++, but if you are new to it,
doing it in Python will let you experiment and learn faster.

some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinric...eval-book.html

http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/s...pleCrawler.asp
Jul 5 '08 #2

On Jul 5, 2:31 pm, disappeare...@gmail.com wrote:
> Hi
> Does anyone here have a good recommendation for an open source crawler
> that I could get my hands on? It doesn't have to be python based. I am
> interested in learning how crawling works. I think python based
> crawlers will ensure a high degree of flexibility but at the same time
> I am also torn between looking for open source crawlers in python vs
> C++ because the latter is much more efficient (or so I heard. I will be
> crawling on very cheap hardware.)
>
> I am definitely open to suggestions.
>
> Thx

You can check my Python blog. There are some tips and code on
crawlers.
http://love-python.blogspot.com/

regards,
Subeen
http://love-python.blogspot.com/
Jul 6 '08 #3
