
Website data-mining.

Hi--
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?


nieu
Aug 4 '07 #1
5 Replies

On Aug 3, 7:50 pm, Coogan <pcb2...@columbia.edu> wrote:
> Hi--
>
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?
>
> nieu

How about this? It will fetch the HTML source of the page.

# Python 2 code. In Python 3 these names live in urllib.request and urllib.parse.
import sys
from urllib import urlencode
from urllib2 import build_opener, HTTPCookieProcessor, Request

# One shared opener, so cookies persist between requests.
opener = build_opener(HTTPCookieProcessor)

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Open a URL; if form data is given, it is sent as a POST body."""
    # Dicts and sequences of pairs are encoded; strings pass through as-is.
    if hasattr(data, "__iter__"):
        data = urlencode(data)
    headers = {'User-Agent': user_agent}
    return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
    # No data argument here, so this is a plain GET request.
    # (Passing an empty tuple would turn it into a POST with an empty body.)
    page = urlopen2("http://www.yourURL.com")
    pageRead = page.read()
    print pageRead

if __name__ == '__main__':
    publishedNotes()
    sys.exit()
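
For anyone reading this on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a minimal sketch and not the poster's code: the function name fetch_html, the default User-Agent string, the timeout, and the UTF-8 fallback are all assumptions made for illustration.

```python
from urllib.request import build_opener, HTTPCookieProcessor, Request

# One shared opener, so cookies persist between requests.
opener = build_opener(HTTPCookieProcessor())


def fetch_html(url, user_agent="fetch_html-sketch", timeout=10):
    """Return the body of `url` as text (hypothetical helper name)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=timeout) as response:
        # Use the charset the server declares; assume UTF-8 if it declares none.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call, using the placeholder URL from the post:
# html = fetch_html("http://www.yourURL.com")
```

The function returns a str, so the result can be handed straight to an HTML parser. Undecodable bytes are replaced instead of raising, which is a convenience choice for scraping and may not suit every use.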

Aug 4 '07 #2

Hello,
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?

Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.
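
As a rough illustration of the "handling it" step, here is a sketch that pulls <meta> tags and visible body text out of an HTML string. It deliberately uses the standard library's html.parser rather than BeautifulSoup, so it runs without installing anything; BeautifulSoup does the same job more conveniently and is more forgiving of badly broken markup. The names PageSummary and summarize are hypothetical, and skipping only <script> and <style> is a simplifying assumption.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects <meta> name/content pairs and the visible text inside <body>."""

    SKIPPED = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}
        self.text_parts = []
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Accept both name="..." and property="..." style metadata.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is inside <body> and outside skipped tags.
        if self._in_body and not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def summarize(html):
    """Return (metadata dict, body text) for an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.text_parts)
```

The html argument is whatever page source you have retrieved; the returned dictionary and text string are then ready for the analysis step.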

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com

Aug 4 '07 #3

Miki wrote:
> Hello,
>> I'm using Python for the first time to make a plug-in for Firefox.
>> The goal of this plug-in is to take the source code from a website
>> and use the metadata and body text for different kinds of analysis.
>> My question is: How can I retrieve data from a website? I'm not even
>> sure if this is possible through Python. Any help?
> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

> for getting the data and at http://www.crummy.com/software/BeautifulSoup/
> for handling it.
>
> HTH.
>
> --
> Miki Tebeka <mi*********@gmail.com>
> http://pythonwise.blogspot.com

Aug 4 '07 #4

Jay Loden wrote:
> Miki wrote:
>> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

A case of the Freudian clipboard, perhaps? ;-)

Paul

Aug 4 '07 #5

Hello,
>>> I'm using Python for the first time to make a plug-in for Firefox.
>>> The goal of this plug-in is to take the source code from a website
>>> and use the metadata and body text for different kinds of analysis.
>>> My question is: How can I retrieve data from a website? I'm not even
>>> sure if this is possible through Python. Any help?
>> Have a look at http://www.myinterestingfiles.com/2007/03/playboy-germany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

Ouch, let there be a lesson to me to *read* my posts before sending
them :)

Should have been http://wwwsearch.sourceforge.net/mechanize/.

--
Miki (who can't paste) Tebeka
mi*********@gmail.com
http://pythonwise.blogspot.com

Aug 4 '07 #6
