Website data-mining.

Coogan

Hi--
I'm using Python for the first time to make a plug-in for Firefox.
The goal of this plug-in is to take the source code from a website
and use the metadata and body text for different kinds of analysis.
My question is: How can I retrieve data from a website? I'm not even
sure if this is possible through Python. Any help?

nieu

Aug 4 '07 #1

SMERSH009

On Aug 3, 7:50 pm, Coogan <pcb2...@columbia.edu> wrote:

> Hi--
>
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?
>
> nieu

How about this? It will fetch the HTML source of the page (Python 2).

from urllib import urlencode
from urllib2 import build_opener, HTTPCookieProcessor, Request

# One opener shared by all requests, so cookies set by the server are kept
opener = build_opener(HTTPCookieProcessor)

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Opens our URLs with a custom User-Agent header.

    A dict or list of (key, value) pairs in data is URL-encoded and the
    request becomes a POST; with data=None a plain GET is sent.
    """
    if hasattr(data, "__iter__"):
        data = urlencode(data)
    headers = {'User-Agent': user_agent}
    return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
    # No data argument, so this is a GET (passing () would send an empty POST)
    page = urlopen2("http://www.yourURL.com")
    pageRead = page.read()
    print pageRead

if __name__ == '__main__':
    publishedNotes()

Aug 4 '07 #2
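The code above targets Python 2. In Python 3, urllib2 and the request parts of urllib were merged into urllib.request, and urlencode moved to urllib.parse. The following is a minimal sketch of the same helper under Python 3. Splitting it into build_request and fetch_page (both hypothetical names), and falling back to UTF-8 when the server declares no charset, are assumptions of this sketch, not part of the original post.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# One opener shared by all requests, so cookies set by a server are
# sent back on later requests (as HTTPCookieProcessor does above)
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar()))


def build_request(url, data=None, user_agent="urlopen2"):
    """Build a Request with a custom User-Agent.

    A dict or sequence of (key, value) pairs in `data` is URL-encoded and
    sent as a POST body; data=None gives a plain GET.
    """
    if data is not None and not isinstance(data, (bytes, str)):
        data = urllib.parse.urlencode(data)
    if isinstance(data, str):
        data = data.encode("utf-8")  # request bodies must be bytes in Python 3
    return urllib.request.Request(
        url, data=data, headers={"User-Agent": user_agent})


def urlopen2(url, data=None, user_agent="urlopen2"):
    """Open `url` through the shared opener and return the response."""
    return opener.open(build_request(url, data, user_agent))


def fetch_page(url):
    """Hypothetical helper: download `url` and return its HTML as str."""
    with urlopen2(url) as page:
        # Use the charset from the Content-Type header; falling back to
        # UTF-8 when none is declared is an assumption of this sketch
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")
```

For example, print(fetch_page("http://www.yourURL.com")) prints the page source once the placeholder URL is replaced by a real one. Passing a dict, as in urlopen2(url, {"q": "python"}), sends it as a URL-encoded POST body.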

Miki

Hello,

> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?

Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
for getting the data and at http://www.crummy.com/software/BeautifulSoup/
for handling it.

HTH.

--
Miki Tebeka <mi*********@gmail.com>
http://pythonwise.blogspot.com

Aug 4 '07 #3
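BeautifulSoup, suggested above, is a third-party package. To keep this sketch free of dependencies it swaps in the standard library's html.parser.HTMLParser, which is lower-level than BeautifulSoup. It pulls out the two things the original question asks about: the page's metadata (the <title> and the <meta name/content> pairs) and its visible body text. The class name PageInfoParser is hypothetical, and treating text inside <script> and <style> as invisible is a simplification.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Hypothetical parser: collect <title>, <meta> pairs and body text."""

    SKIP = {"script", "style"}  # text inside these tags is not shown to readers

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: "&amp;" -> "&"
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "meta":
            # <meta name="..."> or Open Graph style <meta property="...">
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body and not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

    @property
    def body_text(self):
        """Visible body text, with whitespace runs collapsed to one space."""
        return " ".join(self.text_parts)
```

In practice one would pass it the HTML string returned by a fetch: call parser.feed(html) and parser.close(), then read parser.title, parser.meta and parser.body_text.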

Jay Loden

Miki wrote:

> Hello,
>
>> I'm using Python for the first time to make a plug-in for Firefox.
>> The goal of this plug-in is to take the source code from a website
>> and use the metadata and body text for different kinds of analysis.
>> My question is: How can I retrieve data from a website? I'm not even
>> sure if this is possible through Python. Any help?
> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html

Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

> for getting the data and at http://www.crummy.com/software/BeautifulSoup/
> for handling it.
>
> HTH.
>
> --
> Miki Tebeka <mi*********@gmail.com>
> http://pythonwise.blogspot.com

Aug 4 '07 #4

Paul Boddie

Jay Loden wrote:

> Miki wrote:
>> Have a look at http://www.myinterestingfiles.com/20...rmany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

A case of the Freudian clipboard, perhaps? ;-)

Paul

Aug 4 '07 #5

Miki

Hello,

>>> I'm using Python for the first time to make a plug-in for Firefox.
>>> The goal of this plug-in is to take the source code from a website
>>> and use the metadata and body text for different kinds of analysis.
>>> My question is: How can I retrieve data from a website? I'm not even
>>> sure if this is possible through Python. Any help?
>> Have a look at http://www.myinterestingfiles.com/2007/03/playboy-germany-ads.html
>
> Well, it's certainly interesting, but I'm not sure how it might help the OP get data from a website...

Ouch, let this be a lesson to me to *read* my posts before sending
them :)

Should have been http://wwwsearch.sourceforge.net/mechanize/.

--
Miki (who can't paste) Tebeka
mi*********@gmail.com
http://pythonwise.blogspot.com

Aug 4 '07 #6
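mechanize, linked above, is a third-party library that provides a browser-like object which follows links, fills in forms and keeps cookies. The sketch below neither uses nor imitates mechanize's API; it shows, with the standard library only, one step such a tool automates: collecting a page's <a href> links and resolving relative URLs against the page address (and any <base href>). LinkCollector and extract_links are hypothetical names, and keeping only http/https links is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical collector of absolute http(s) links, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base":
            # <base href> changes how later relative links resolve
            self.base_url = urljoin(self.base_url, href)
        elif tag == "a":
            # Resolve relative links, then drop any "#fragment" part
            url, _fragment = urldefrag(urljoin(self.base_url, href))
            # Skip mailto:, javascript: etc. and links already seen
            if urlparse(url).scheme in ("http", "https") and url not in self.links:
                self.links.append(url)


def extract_links(html, base_url):
    """Return the unique absolute links found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Combined with a fetch helper, each returned URL could be downloaded in turn, which is roughly what "following links" means for a crawler.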

by: Remi

Hello Everybody! I am going to develop a multi-language website which will include Traditional Chinese, Simplified Chinese, Thai, Japanese, etc. (and English as well). I would like to take an...

.NET Framework

how to jump start a brand new website (under the gun)

by: Harry Slaughter

I've got a client who wants to see some immediate results on a brand new website. within a week, they'd like to see the following: 1) basic user authentication (using php sessions/cookies to...

PHP

website connection to database time out problem : login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'

by: Maellic

Hi, The website I am working on is built with ASP.NET and connects to a SQL Server 2000 database. The web server and database are on the same machine. I have recently tried to modify the timeout...

ASP.NET

Reading data from a website

by: Maximilian Hofer

Hello NG, to build a request to a website I use the following code: Dim encoding = New System.Text.UTF8Encoding 'Assemble the data for posting Dim postData As String

Visual Basic .NET

http post to another website

by: Tyler

I am developing an application which will allow me to automatically sign into an external website. I can currently do a screen scrape using HTTPWEBREQUEST. However I want to just redirect to the...

.NET Framework

Old Website Opens in New Website's Main Frame

by: crferguson

I'm having a really odd issue. Recently my company has upgraded our data server. For a couple of months I'm having to host two versions of the same website on our webserver until the new data...

HTML / CSS

Website Datasources Bug?

by: cpnet

I'm using VS2005, C#, ASP.NET 2.0. I'm trying to create a report using SQL Reporting Services (to be used in local mode so I don't have to deal with SQL Server). When I create a new report in my...

ASP.NET

Getting data from another website dynamically

by: Atul

Hi Theres a website that books hotels . user enters the information and according to that results are displayed to the user.Let it be website A. Now I want to create a new project with...

C# / C Sharp

Need to capture data returned to another Website

by: dboyerco

I'm working with a company that is tracking my vehicle and they have an API that will allow me to log into their database and retrieve the location of my vehicle, which is returned to their website...

Javascript

VS2005 Website works locally but not when deployed

by: dogged

Website works locally but not when deployed ================================== Can someone please help with what I hope is a common problem (I’m new to .net). I have a simple website generated...

.NET Framework

Merging data from multiple Excel files

by: ryjfgjl

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc

Hello everyone, I want to develop software for my Android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Similar topics