Not necessarily related to python Web Crawlers

Question posted by: disappearedng@gmail.com (Guest) on July 5th, 2008 08:35 AM
Hi,
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be Python based. I am
interested in learning how crawling works. I think Python-based
crawlers will ensure a high degree of flexibility, but at the same time
I am torn between looking for open source crawlers in Python vs. C++,
because the latter is much more efficient (or so I have heard) and I
will be crawling on very cheap hardware.

I am definitely open to suggestions.

Thx
Reply #2 posted by: defn noob (Guest) on July 5th, 2008 09:15 AM

Re: Not necessarily related to python Web Crawlers
Just crawling is super easy. It's how to index and search that is hard.
Just start at yahoo.com, scrape out all the links, and then visit every
link on every site you reach.
I wrote a crawler in 15 lines of code, but all it did was
visit the sites, not index them or anything.
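
The loop described above (start at one page, scrape out the links, visit each of them) can be sketched roughly as follows. This is not the 15-line crawler mentioned in the post, just a minimal illustration in Python 3 using only the standard library. The names `LinkParser`, `fetch`, `crawl`, and the `max_pages` limit are made up for this example, and it only visits pages; it does no indexing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (illustrative helper, no retries)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from start_url; returns URLs in visit order."""
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so nothing is visited twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://yahoo.com/", max_pages=20)` would fetch at most 20 pages. The `fetch_page` parameter is there so the loop can be tried against canned HTML without touching the network. A crawler pointed at real sites would also want to honor robots.txt and pause between requests, which this sketch does not do.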

You could probably write a faster one in C++, but if you are new to it,
doing it in Python will let you experiment and learn faster.

Some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinri...ieval-book.html
http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/...mpleCrawler.asp

Reply #3 posted by: subeen (Guest) on July 6th, 2008 10:35 AM

Re: Not necessarily related to python Web Crawlers
On Jul 5, 2:31 pm, disappeare...@gmail.com wrote:
> Hi
> Does anyone here have a good recommendation for an open source crawler
> that I could get my hands on? It doesn't have to be python based. I am
> interested in learning how crawling works. I think python based
> crawlers will ensure a high degree of flexibility but at the same time
> I am also torn between looking for open source crawlers in python vs
> C++ because the latter is much more efficient (or so I heard. I will be
> crawling on very cheap hardware.)
>
> I am definitely open to suggestions.
>
> Thx


You can check my Python blog. There are some tips and code samples on
crawlers.
http://love-python.blogspot.com/

regards,
Subeen
http://love-python.blogspot.com/

 