Spidering Hacks for Python, not Perl

O'Reilly's Spidering Hacks book is terrific. One problem: all the code
samples are in Perl. Nothing Pythonic. Is there a book out there for
Python which covers spidering / crawling in depth?

Mar 21 '06 #1
I've been looking for similar stuff recently. I haven't found much, but
this is the list of links I've come across so far:

Harvest Man - http://harvestman.freezope.org/
Mechanize - http://wwwsearch.sourceforge.net/mechanize/
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/
(Neither Beautiful Soup nor Mechanize is a complete crawler, but both
probably have a lot of the nuts and bolts.)

If anyone is aware of a book or other documentation like the OP would
like, I would be pleased to see it as well.
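
To make the "nuts and bolts" point concrete, here is a rough sketch of the parts a crawler adds on top of an HTML parser: link extraction, a queue of pages to visit, and a set of pages already seen. It uses only the Python 3 standard library rather than any of the packages listed above, and every name in it (LinkParser, extract_links, fetch, crawl, max_pages) is made up for illustration. Treat it as a starting point under those assumptions, not as how any of those tools work internally.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute URLs, with #fragments removed, for the links in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def fetch(url):
    """Download a page as text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=20):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])  # URLs waiting to be fetched
    visited = []                # URLs fetched successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/") would return the list of pages fetched, in visit order. The sketch does not check robots.txt or pause between requests, so a polite crawler would need both (the standard library's urllib.robotparser covers the first).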

Mar 21 '06 #2
Enigma Curry wrote:
I've been looking for similar stuff recently. I haven't found much, but
this is the list of links I've come across so far:

Harvest Man - http://harvestman.freezope.org/
Mechanize - http://wwwsearch.sourceforge.net/mechanize/
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/
(Neither Beautiful Soup nor Mechanize is a complete crawler, but both
probably have a lot of the nuts and bolts.)

If anyone is aware of a book or other documentation like the OP would
like, I would be pleased to see it as well.


Don't forget webchecker and websucker in the Python distribution Tools
folder.
Mar 21 '06 #3

gene tani wrote:
Duncan Booth wrote:
Enigma Curry wrote:


A couple more:

http://cheeseshop.python.org/pypi/Orchid/1.0
http://cheeseshop.python.org/pypi/webstemmer/0.5.0

Mar 21 '06 #5
da*****@yahoo.com writes:
O'Reilly's Spidering Hacks book is terrific. One problem: all the code
samples are in Perl. Nothing Pythonic. Is there a book out there for
Python which covers spidering / crawling in depth?


A fair number of the examples in that book use WWW::Mechanize. I
ported that to Python (it has evolved a bit since then):

http://wwwsearch.sf.net/mechanize
The distribution includes an example from the book ported to Python,
in fact -- porting other examples should be fairly straightforward.

A stable release of mechanize is (finally!) coming soon(-ish).
John

Mar 23 '06 #6
