stealth screen scraping with python?

Folks:

I am screen scraping a large volume of data from Yahoo Finance each
evening, and parsing with Beautiful Soup.

I was wondering if anyone could give me some pointers on how to make
it less obvious to Yahoo that this is what I am doing, as I fear that
they probably monitor for this type of activity, and will soon ban my
IP.

-DE

May 11 '07 #1


On 11 May 2007 12:32:55 -0700, di**************@gmail.com
<di**************@gmail.com> wrote:
> Folks:
>
> I am screen scraping a large volume of data from Yahoo Finance each
> evening, and parsing with Beautiful Soup.
>
> I was wondering if anyone could give me some pointers on how to make
> it less obvious to Yahoo that this is what I am doing, as I fear that
> they probably monitor for this type of activity, and will soon ban my
> IP.
>
> -DE

So long as you are sending a regular HTTP request, as a browser would,
they will have no way of knowing. Just limit your queries to no more
than one every 3-5 seconds and you should be fine. Rotate your IP,
too, if you can.
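The 3-5 second spacing suggested here is easy to enforce with a small throttle object. This is a minimal sketch, assuming a single-threaded fetch loop; `Throttle` and its 4-second default are illustrative choices, not part of any library. The `clock` and `sleep` parameters exist only so the class can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between successive requests.

    Hypothetical helper: the 4 s default sits inside the 3-5 s range
    suggested in this thread.
    """

    def __init__(self, min_interval=4.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a fetch loop, create one `Throttle` and call `throttle.wait()` immediately before each `urlopen(...)` call; the first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.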

Dotan Cohen

http://lyricslist.com/lyrics/artist_...rmen_eric.html
http://what-is-what.com/what_is/eula.html
May 11 '07 #2

On May 11, 2:32 pm, different.eng...@gmail.com wrote:
> Folks:
>
> I am screen scraping a large volume of data from Yahoo Finance each
> evening, and parsing with Beautiful Soup.
>
> I was wondering if anyone could give me some pointers on how to make
> it less obvious to Yahoo that this is what I am doing, as I fear that
> they probably monitor for this type of activity, and will soon ban my
> IP.
>
> -DE

Depends on what you're doing exactly. I've done something like this
and it only hits the page once:

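from urllib.request import urlopen  # urllib2.urlopen in the Python 2 of 2007

URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'
TICKS = ('AMZN', 'AMD', 'EBAY', 'GOOG', 'MSFT', 'YHOO')
u = urlopen(URL % ','.join(TICKS))  # a single request covers every ticker
for data in u:
    # each CSV line arrives as bytes in Python 3: decode it, drop the newline
    tick, price, chg, per = data.decode().strip().split(',')
    # do something with data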

If you're grabbing all the data in one fell swoop (which is what you
should aim for), then it's harder for Yahoo! to know exactly what
you're doing. And I can't see why they'd care, as that is all a
browser does anyway. It's hitting the site many times in a short
period that sets off the alarms.
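For the "large volume" in the original question, one way to follow Mike's one-request advice is to pack many tickers into each URL. This sketch reuses the URL template from Mike's snippet; `batched_urls`, `parse_quotes`, and `batch_size=100` are hypothetical (nothing in this thread states a per-request limit). It also assumes text fields may come back wrapped in double quotes: `csv.reader` strips those, whereas a plain `split(',')` would leave them in place. No network call is made here.

```python
import csv
import io

# URL template taken from the snippet in this reply.
URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'


def batched_urls(ticks, batch_size=100):
    """Build one URL per batch of tickers instead of one URL per ticker.

    batch_size is a hypothetical cap, not a documented limit.
    """
    ticks = list(ticks)
    return [URL % ','.join(ticks[i:i + batch_size])
            for i in range(0, len(ticks), batch_size)]


def parse_quotes(text):
    """Parse CSV quote lines into a dict: ticker -> (price, chg, per).

    csv.reader removes surrounding double quotes from fields, if present.
    """
    quotes = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        tick, price, chg, per = row
        quotes[tick] = (price, chg, per)
    return quotes
```

Each URL from `batched_urls` can then be fetched with `urlopen` as in Mike's snippet, decoded with `.decode()`, and passed whole to `parse_quotes`.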

Mike

May 11 '07 #3

On Fri, 11 May 2007 12:32:55 -0700, different.engine wrote:
> Folks:
>
> I am screen scraping a large volume of data from Yahoo Finance each
> evening, and parsing with Beautiful Soup.
>
> I was wondering if anyone could give me some pointers on how to make
> it less obvious to Yahoo that this is what I am doing, as I fear that
> they probably monitor for this type of activity, and will soon ban my
> IP.

Write a virus to hijack tens of thousands of Windows PCs around the world,
and use your army of zombie-PCs to do the screen scraping for you. Each
one only needs to scrape a small amount of data, and Yahoo will have no
way of telling that it is you.

*wink*

--
Steven.

May 12 '07 #4

di**************@gmail.com wrote:
> I am screen scraping a large volume of data from Yahoo Finance each
> evening, and parsing with Beautiful Soup.
>
> I was wondering if anyone could give me some pointers on how to make
> it less obvious to Yahoo that this is what I am doing, as I fear that
> they probably monitor for this type of activity, and will soon ban my
> IP.

Use anonymizing proxies:
http://www.google.com/search?q=proxi...OR+anonymizing

--
Thomas Wittek
http://gedankenkonstrukt.de/
Jabber: st*********@jabber.i-pobox.net
May 12 '07 #5
