On Mar 20, 1:56 am, Tina I <tina...@bestemselv.comwrote:
cjl wrote:
Hi.
I am trying to screen scrape some stock data from yahoo, so I am
trying to use urllib2 to retrieve the html and beautiful soup for the
parsing.
Maybe (most likely) I am doing something wrong, but when I use
urllib2.urlopen to fetch a page, and when I view 'page source' of the
exact same URL in firefox, I am seeing slight differences in the raw
html.
Do I need to set a browser agent so yahoo thinks urllib2 is firefox?
Is yahoo detecting that urllib2 doesn't process javascript, and
passing different data?
-cjl
Unless the data you you need depends on the site detecting a specific
browser you will probably receive a 'cleaner' code that's more easily
parsed if you don't set a user agent. Usually the browser optimization
they do is just eye candy, bells and whistles anyway in order to give
you a more 'pleasing experience'. I doubt that your program will care
about that ;)
Tina
You can do this fairly easily. I found a similar program in the book
Core Python Programming. It actually sticks the stocks into an Excel
spreadsheet. The code is below. You can easily modify it to send the
output elsewhere.
# Core Python Chp 23, pg 994
# estock.pyw
from Tkinter import Tk
from time import sleep, ctime
from tkMessageBox import showwarning
from urllib import urlopen
import win32com.client as win32
warn = lambda app: showwarning(app, 'Exit?')
RANGE = range(3, 8)
TICKS = ('AMZN', 'AMD', 'EBAY', 'GOOG', 'MSFT', 'YHOO')
COLS = ('TICKER', 'PRICE', 'CHG', '%AGE')
URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'
def excel():
app = 'Excel'
xl = win32.gencache.EnsureDispatch('%s.Application' % app)
ss = xl.Workbooks.Add()
sh = ss.ActiveSheet
xl.Visible = True
sleep(1)
sh.Cells(1, 1).Value = 'Python-to-%s Stock Quote Demo' % app
sleep(1)
sh.Cells(3, 1).Value = 'Prices quoted as of: %s' % ctime()
sleep(1)
for i in range(4):
sh.Cells(5, i+1).Value = COLS[i]
sleep(1)
sh.Range(sh.Cells(5, 1), sh.Cells(5, 4)).Font.Bold = True
sleep(1)
row = 6
u = urlopen(URL % ','.join(TICKS))
for data in u:
tick, price, chg, per = data.split(',')
sh.Cells(row, 1).Value = eval(tick)
sh.Cells(row, 2).Value = ('%.2f' % round(float(price), 2))
sh.Cells(row, 3).Value = chg
sh.Cells(row, 4).Value = eval(per.rstrip())
row += 1
sleep(1)
u.close()
warn(app)
ss.Close(False)
xl.Application.Quit()
if __name__ == '__main__':
Tk().withdraw()
excel()
# Have fun - Mike