
Web Scraping/Site Scraping

P: n/a
Hi, I'm interested in learning about web scraping/site scraping using
Python. Does anybody know of some online resources or have any modules that
are available to help out? O'Reilly published an interesting book,
"Spidering Hacks", which covered some great scraping hacks, but it is all
written in Perl. I don't know Perl and don't want to. I'm new to
programming and have been advised to start with Python. So far so good ...
but I need some help with web programming. Thanks for any help you may
provide. Dave.
Jul 18 '05 #1
4 Replies


P: n/a
> Hi, I'm interested in learning about web scraping/site scraping using
> Python.


I found this document interesting:

http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html
HTH

--
Anakim Border
ab*****@users.sourceforge.net
Jul 18 '05 #2

P: n/a
"David Jones" <dj****@outrider.net> writes:
> Hi, I'm interested in learning about web scraping/site scraping using
> Python. Does anybody know of some online resources or have any modules that
> are available to help out? O'Reilly published an interesting book
> "Spidering Hacks" which covered some great scraping hacks but it is all
> written in Perl. I don't know Perl and don't want to. I'm new to
> programming and have been advised to start with Python. So far so good ...
> but need some help with web programming. Thanks for any help you may
> provide. Dave.


http://wwwsearch.sourceforge.net/
http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html

http://lists.sourceforge.net/lists/l...search-general (rather quiet ATM)
I ported one of the examples from "Spidering Hacks" to my Python port
of mechanize. It's in the tarball here:

http://wwwsearch.sourceforge.net/mechanize/
John
Jul 18 '05 #3
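
For anyone following along: before reaching for mechanize, it may help to see the bare "fetch a page" step on its own. The sketch below uses only the standard library's urllib.request, not mechanize. The function name fetch_html and the User-Agent string are made up for illustration, and the code assumes Python 3.

```python
import urllib.request


def fetch_html(url, user_agent="example-scraper/0.1", timeout=10):
    """Download a page and return its body decoded to text.

    The default user_agent is a hypothetical identifier; some sites
    reject requests that do not send a User-Agent header at all.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        # errors="replace" keeps going if a page has a few bad bytes.
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://example.com/") (a placeholder URL) should hand back the page source as a string, ready to pass to whatever HTML parser you choose. Cookies, forms, and following links are the parts this sketch leaves out and that a browser-like library is meant to cover.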

P: n/a
> "David Jones" <dj****@outrider.net> writes:
> Hi, I'm interested in learning about web scraping/site scraping using
> Python. Does anybody know of some online resources or have any modules that
> are available to help out? O'Reilly published an interesting book
> "Spidering Hacks" which covered some great scraping hacks but it is all
> written in Perl. I don't know Perl and don't want to. I'm new to
> programming and have been advised to start with Python. So far so good ...
> but need some help with web programming. Thanks for any help you may
> provide. Dave.


Dave, there's a chapter of "Dive Into Python" that deals specifically
with processing HTML:

http://diveintopython.org/html_processing/index.html

If you're new to Python and programming, IMO you should start by going
through one or more of the available introductory tutorials:

http://python.org/doc/Intros.html

Good luck!

pb

--
paul bissex, e-scribe.com -- database-driven web development
413.585.8095
69.55.225.29
01061-0847
72°39'71"W 42°19'42"N
Jul 18 '05 #4
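
To give a feel for the kind of HTML processing mentioned above, here is a minimal sketch that pulls the links out of a page. It uses Python 3's standard html.parser module, which is a substitution on my part and not necessarily the module the chapter itself works with. The names LinkCollector and extract_links are hypothetical, chosen just for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, base_url):
    """Return a list of (url, text) tuples for every link in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

You feed it the page source as a string plus the URL the page came from, and it returns the links with relative paths already resolved. Anchors without an href are skipped. Real-world pages can be messy enough that a more forgiving parser is worth a look once this basic idea is clear.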

P: n/a
On Sun, Jul 11, 2004 at 01:42:47PM +0000, David Jones wrote:
> Hi, I'm interested in learning about web scraping/site scraping using
> Python. Does anybody know of some online resources or have any modules that
> are available to help out? O'Reilly published an interesting book
> "Spidering Hacks" which covered some great scraping hacks but it is all
> written in Perl. I don't know Perl and don't want to. I'm new to
> programming and have been advised to start with Python. So far so good ...
> but need some help with web programming. Thanks for any help you may
> provide. Dave.


For the HTML parsing part of the task, I've heard that Beautiful Soup works
well:
http://www.crummy.com/software/BeautifulSoup/

-Andrew.

Jul 18 '05 #5
