Website data-mining.

Hi--
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?


nieu
Aug 4 '07 #1
On Aug 3, 7:50 pm, Coogan <pcb2...@columbia.edu> wrote:
> Hi--
>
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?
>
> nieu

How about this? It will fetch the HTML source of the page.

import sys
from urllib import urlencode
from urllib2 import build_opener, HTTPCookieProcessor, Request

# Shared opener that stores cookies and resends them on later requests.
opener = build_opener(HTTPCookieProcessor)

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Open a URL, form-encoding data first if it is a dict or sequence."""
    if hasattr(data, "__iter__"):
        data = urlencode(data)
    headers = {'User-Agent': user_agent}
    return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
    # Fetch the page and print its raw HTML source.
    page = urlopen2("http://www.yourURL.com", ())
    pageRead = page.read()
    print pageRead

if __name__ == '__main__':
    publishedNotes()
    sys.exit()

Aug 4 '07 #2
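
The snippet in post #2 is Python 2 code: `urllib2` no longer exists in Python 3, where its contents live in `urllib.request` and `urllib.parse`. A minimal sketch of the same idea for Python 3 follows. The names `build_request` and `fetch`, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original post.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# One opener shared by all calls, so cookies persist between requests.
opener = build_opener(HTTPCookieProcessor(CookieJar()))


def build_request(url, data=None, user_agent="urlopen2"):
    """Build a Request; a dict or list of pairs is form-encoded and sent as POST."""
    if data is not None and not isinstance(data, (bytes, str)):
        data = urlencode(data)
    if isinstance(data, str):
        # urllib.request requires the request body to be bytes.
        data = data.encode("ascii")
    return Request(url, data=data, headers={"User-Agent": user_agent})


def fetch(url, data=None, user_agent="urlopen2", timeout=10):
    """Return the page body as text, using the charset the server declares."""
    request = build_request(url, data, user_agent)
    with opener.open(request, timeout=timeout) as response:
        # Fall back to UTF-8 (an assumption) when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch("http://www.yourURL.com")` would return the HTML source as a string. Leaving `data` as `None` sends a GET request, while passing a dict sends a form-encoded POST.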
Hello,

> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?

Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com

Aug 4 '07 #3
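
The original question asks for a page's metadata and body text once the HTML has been fetched. BeautifulSoup, linked in post #3, is a common choice for that step. As a dependency-free illustration, the sketch below uses the standard library's `html.parser` instead, so it is a stand-in for BeautifulSoup and not a demonstration of it. The class name `PageExtractor`, the helper `extract`, and the choice to skip `<script>` and `<style>` content are assumptions.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect <meta> name/content pairs and the visible text inside <body>."""

    SKIP = {"script", "style"}  # elements whose text is not visible content

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Accept both name="..." and property="..." style meta tags.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside <body> and not skipped.
        if self._in_body and not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (metadata dict, body text) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.text_parts)
```

Feeding the HTML source of a fetched page to `extract` gives a dict of meta tags plus one whitespace-joined string of body text, which can then go to whatever analysis is planned. Real-world pages are often malformed, which is one reason a more forgiving parser such as BeautifulSoup tends to be preferred in practice.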
Miki wrote:
> Hello,
>> I'm using Python for the first time to make a plug-in for Firefox.
>> The goal of this plug-in is to take the source code from a website
>> and use the metadata and body text for different kinds of analysis.
>> My question is: How can I retrieve data from a website? I'm not even
>> sure if this is possible through Python. Any help?
> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

> for getting the data and at http://www.crummy.com/software/BeautifulSoup/
> for handling it.
>
> HTH.
>
> --
> Miki Tebeka <mi*********@gmail.com>
> http://pythonwise.blogspot.com

Aug 4 '07 #4
Jay Loden wrote:
> Miki wrote:
>> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

A case of the Freudian clipboard, perhaps? ;-)

Paul

Aug 4 '07 #5
Hello,

>>> I'm using Python for the first time to make a plug-in for Firefox.
>>> The goal of this plug-in is to take the source code from a website
>>> and use the metadata and body text for different kinds of analysis.
>>> My question is: How can I retrieve data from a website? I'm not even
>>> sure if this is possible through Python. Any help?
>> Have a look at http://www.myinterestingfiles.com/2007/03/playboy-germany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

Ouch, let there be a lesson to me to *read* my posts before sending
them :)

Should have been http://wwwsearch.sourceforge.net/mechanize/.

--
Miki (who can't paste) Tebeka
mi*********@gmail.com
http://pythonwise.blogspot.com

Aug 4 '07 #6
