Website data-mining.

Hi--
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?


nieu
Aug 4 '07 #1
On Aug 3, 7:50 pm, Coogan <pcb2...@columbia.edu> wrote:
> Hi--
>
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?
>
> nieu

How about this? It will fetch the HTML source of the page.

# Python 2 code (urllib2 became urllib.request in Python 3).
import sys
from urllib import urlencode
from urllib2 import build_opener, HTTPCookieProcessor, Request

# Shared opener that keeps cookies between requests.
opener = build_opener(HTTPCookieProcessor)

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Open a URL; a dict or sequence of pairs is sent as form data."""
    if hasattr(data, "__iter__"):
        data = urlencode(data)
    headers = {'User-Agent': user_agent}
    return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
    # No form data is passed, so this is a plain GET request.
    page = urlopen2("http://www.yourURL.com")
    pageRead = page.read()
    print pageRead

if __name__ == '__main__':
    publishedNotes()
    sys.exit()
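
For anyone reading this on Python 3, where urllib2 no longer exists, here is a rough equivalent of the snippet above. It is only a sketch: build_request and fetch are made-up helper names, the 10-second timeout is an arbitrary choice, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# One shared opener so cookies persist across requests.
opener = build_opener(HTTPCookieProcessor(CookieJar()))


def build_request(url, data=None, user_agent="urlopen2"):
    """Build a Request; non-empty form data (dict or pair list) becomes a POST body."""
    body = None
    if data:
        body = urlencode(data).encode("ascii")
    return Request(url, data=body, headers={"User-Agent": user_agent})


def fetch(url, data=None, user_agent="urlopen2", timeout=10):
    """Return the response body as text (hypothetical helper, not from the thread)."""
    request = build_request(url, data, user_agent)
    with opener.open(request, timeout=timeout) as response:
        # Assumption: fall back to UTF-8 if the server names no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch("http://www.yourURL.com") should then return the page source as a string, provided the host is reachable.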

Aug 4 '07 #2
Hello,
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?

Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com
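
To give an idea of the "handling it" step, here is a small sketch that pulls the metadata and body text out of a page. Note that it does not use BeautifulSoup, which the post above recommends; it uses the standard library's html.parser instead so that it runs without installing anything. PageExtractor and extract are hypothetical names, and only <meta name=... content=...> tags are treated as metadata, which is an assumption about what the original poster wants.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs and the text inside <body>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []
        self._in_body = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is in the body and not script/style.
        if self._in_body and not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (metadata dict, body text) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.text_parts)
```

extract() takes the HTML source as a string, so it can be fed whatever the fetching code returns. For messy real-world pages BeautifulSoup is likely to be more forgiving than this.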

Aug 4 '07 #3
Miki wrote:
> Hello,
>> I'm using Python for the first time to make a plug-in for Firefox.
>> The goal of this plug-in is to take the source code from a website
>> and use the metadata and body text for different kinds of analysis.
>> My question is: How can I retrieve data from a website? I'm not even
>> sure if this is possible through Python. Any help?
> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

> for getting the data and at http://www.crummy.com/software/BeautifulSoup/
> for handling it.
>
> HTH.
>
> --
> Miki Tebeka <mi*********@gmail.com>
> http://pythonwise.blogspot.com

Aug 4 '07 #4
Jay Loden wrote:
> Miki wrote:
>> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

A case of the Freudian clipboard, perhaps? ;-)

Paul

Aug 4 '07 #5
>> Hello,
>>> I'm using Python for the first time to make a plug-in for Firefox.
>>> The goal of this plug-in is to take the source code from a website
>>> and use the metadata and body text for different kinds of analysis.
>>> My question is: How can I retrieve data from a website? I'm not even
>>> sure if this is possible through Python. Any help?
>> Have a look at http://www.myinterestingfiles.com/2007/03/playboy-germany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

Ouch, let there be a lesson to me to *read* my posts before sending
them :)

Should have been http://wwwsearch.sourceforge.net/mechanize/.

--
Miki (who can't paste) Tebeka
mi*********@gmail.com
http://pythonwise.blogspot.com

Aug 4 '07 #6
