Spidering Hacks for Python, not Perl

O'Reilly's Spidering Hacks book is terrific. One problem. All the code
samples are in Perl. Nothing Pythonic. Is there a book out there for
Python which covers spidering / crawling in depth?

Mar 21 '06 #1
I've been looking for similar stuff recently. I haven't found much, but
this is the list of links I've come across so far:

Harvest Man - http://harvestman.freezope.org/
Mechanize - http://wwwsearch.sourceforge.net/mechanize/
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/
(Neither Beautiful Soup nor Mechanize is a complete crawler, but both
probably have a lot of the nuts and bolts.)

If anyone is aware of a book or other documentation like the OP would
like, I would be pleased to see it as well.
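
To make the "nuts and bolts" remark concrete: a crawler mostly needs a fetcher, a link extractor, and a queue of URLs still to visit. Here is a minimal sketch of that idea using only the Python 3 standard library, not any of the packages listed above. The names LinkExtractor, extract_links, fetch and crawl are made up for illustration, and the sketch assumes you only want to follow links on the starting host. It ignores robots.txt, rate limiting and content types, which a real spider should handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page and decode it leniently (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch=fetch):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The fetch parameter is there so the download step can be swapped out, for example with a function that reads from a local dict of pages while testing, or with something built on Mechanize or Beautiful Soup if you want form handling or more forgiving parsing.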

Mar 21 '06 #2
Enigma Curry wrote:
I've been looking for similar stuff recently. I haven't found much, but
this is the list of links I've come across so far:

Harvest Man - http://harvestman.freezope.org/
Mechanize - http://wwwsearch.sourceforge.net/mechanize/
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/
(Neither Beautiful Soup nor Mechanize is a complete crawler, but both
probably have a lot of the nuts and bolts.)

If anyone is aware of a book or other documentation like the OP would
like, I would be pleased to see it as well.


Don't forget webchecker and websucker in the Python distribution Tools
folder.
Mar 21 '06 #3

gene tani wrote:
Duncan Booth wrote:
Enigma Curry wrote:


a couple more

http://cheeseshop.python.org/pypi/Orchid/1.0
http://cheeseshop.python.org/pypi/webstemmer/0.5.0

Mar 21 '06 #5
da*****@yahoo.com writes:
O'Reilly's Spidering Hacks book is terrific. One problem. All the code
samples are in Perl. Nothing Pythonic. Is there a book out there for
Python which covers spidering / crawling in depth?


A fair number of the examples in that book use WWW::Mechanize. I
ported that to Python (it has evolved a bit since then):

http://wwwsearch.sf.net/mechanize
The distribution includes an example from the book ported to Python,
in fact -- porting other examples should be fairly straightforward.

A stable release of mechanize is (finally!) coming soon(-ish).
John

Mar 23 '06 #6
