473,386 Members | 1,702 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Website data-mining.

Hi--
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?


nieu
Aug 4 '07 #1
5 3042
On Aug 3, 7:50 pm, Coogan <pcb2...@columbia.eduwrote:
Hi--

I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?

nieu
How about this? it will fetch the HTML source of the page.

import datetime, time, re, os, sys, traceback, smtplib, string,\
urllib2, urllib, inspect
from urllib2 import build_opener, HTTPCookieProcessor, Request
opener = build_opener(HTTPCookieProcessor)
from urllib import urlencode

def urlopen2(url, data=None, user_agent='urlopen2'):
"""Opens Our URLS """
if hasattr(data, "__iter__"):
data = urlencode(data)
headers = {'User-Agent' : user_agent}
return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
page = urlopen2("http://www.yourURL.com", ())
pageRead = page.read()
print pageRead

if __name__ == '__main__':
publishedNotes()

sys.exit()

Aug 4 '07 #2
Hello,
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?
Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com

Aug 4 '07 #3
Miki wrote:
Hello,
>I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?
Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com
Aug 4 '07 #4
Jay Loden wrote:
Miki wrote:
Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...
A case of the Freudian clipboard, perhaps? ;-)

Paul

Aug 4 '07 #5
Hello,
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?
Have a look athttp://www.myinterestingfiles.com/2007/03/playboy-germany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...
Ouch, let there be a lesson to me to *read* my posts before sending
them :)

Should have been http://wwwsearch.sourceforge.net/mechanize/.

--
Miki (who can't paste) Tebeka
mi*********@gmail.com
http://pythonwise.blogspot.com

Aug 4 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Remi | last post by:
Hello Everybody! I am going to develop a multi-language website which will include Traditional Chinese, Simplified Chinese, Thai, Japanese, etc. (and English as well). I would like to take an...
10
by: Harry Slaughter | last post by:
I've got a client who wants to see some immediate results on a brand new website. within a week, they'd like to see the following: 1) basic user authentication (using php sessions/cookies to...
3
by: Maellic | last post by:
Hi, The website I am working on is built with ASP.NET and connects to a SQL Server 2000 database. The web server and database are on the same machine. I have recently tried to modify the timeout...
8
by: Maximilian Hofer | last post by:
Hallo NG, zum erstellen einer Anfrag an eine Website benutze ich folgenden Code: Dim encoding = New System.Text.UTF8Encoding 'Daten zum Posten zusammenbauen Dim postData As String
5
by: Tyler | last post by:
I am developing an application which will allow me to automatically sign into an external website. I can currently do a screen scrape using HTTPWEBREQUEST. However I want to just redirect to the...
2
by: crferguson | last post by:
I'm having a really odd issue. Recently my company has upgraded our data server. For a couple of months I'm having to host two versions of the same website on our webserver until the new data...
19
by: cpnet | last post by:
I'm using VS2005, C#, ASP.NET 2.0. I'm trying to create a report using SQL Reporting Services (to be used in local mode so I don't have to deal with SQL Server). When I create a new report in my...
7
by: Atul | last post by:
Hi Theres a website that books hotels . user enters the information and according to that results are displayed to the user.Let it be website A. Now I want to create a new project with...
6
by: dboyerco | last post by:
I'm working with a company that is tracking my vihicle and they have an API that will allow me to log into their database and retrieve the location of my vihicle, which is returned to their website...
2
by: dogged | last post by:
Website works locally but not when deployed ================================== Can someone please help with what I hope is a common problem (Iím new to .net). I have a simple website generated...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.