need start point for getting html info from web

nephish

hey there,

I have a small app that will need to get information from a few tables
on different websites. I have looked at urllib and httplib. The sites I
need mostly present this data in HTML tables, which I think should make
it easier. Can anyone suggest a good starting point for learning how to
do this, or point me to a good how-to?
thanks,
sk

Oct 31 '05 #1

Mike Meyer

ne*****@xit.net writes:

> i have a small app that i am going to need to get information from a
> few tables on different websites. i have looked at urllib and httplib.
> the sites i need to get data from mostly have this data in tables. So
> that, i think would make it easier. Anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?

I don't have a link to a how-to, but you're halfway there. urllib (and
urllib2) will get the HTML text from the websites. How you pull data out
of it depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite XML library. If it's well-structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup; I've used it to pull data
from tables. The problem with any of these is that your code really
depends on the structure (or lack thereof) of the HTML you're
scraping. If the site changes it, your code breaks.
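
A minimal sketch of the fetch-then-parse approach described above, written
for current Python 3 rather than the 2005-era modules: urllib.request
stands in for urllib/urllib2, and html.parser stands in for htmllib. The
helper names (fetch_html, TableExtractor, extract_tables,
rows_with_header), the sample table, and the example.org hosts are all
assumptions made up for illustration. It does not handle nested tables,
and for really messy markup BeautifulSoup is likely the more forgiving
choice.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the finished cell with its whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # A new tag implicitly closes an unclosed cell or row, which
        # tolerates pages that leave out </td> and </tr>.
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at end of input


def extract_tables(html):
    """Return every table found in the HTML text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def rows_with_header(table, expected_header):
    """Return the data rows, failing loudly if the header row has changed."""
    if not table or table[0] != expected_header:
        raise ValueError("table layout changed: header is %r" % (table[:1],))
    return table[1:]


# Made-up sample page; the last row deliberately omits its closing tags.
SAMPLE = """
<table>
  <tr><th>Server</th><th>Location</th></tr>
  <tr><td>time-a.example.org</td><td>Site A</td></tr>
  <tr><td>time-b.example.org<td>Site B
</table>
"""

tables = extract_tables(SAMPLE)
for server, location in rows_with_header(tables[0], ["Server", "Location"]):
    print(server, location)
```

For a live page, pass the result of fetch_html(url) to extract_tables
instead of SAMPLE. The header check in rows_with_header is one cheap way
to notice that a site has changed its layout before bad data gets through.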

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Oct 31 '05 #2

nephish

Yeah, I know I am going to have to write a bunch of code, because the
values I want come from several different sites. Ah well, I just
wanted to know the easiest way to get started. I will check into
BeautifulSoup; I think I have heard it mentioned before.
thanks
shawn

Oct 31 '05 #3

Paul McGuire

<ne*****@xit.net> wrote in message
news:11*********************@g47g2000cwa.googlegroups.com...

> hey there,
>
> i have a small app that i am going to need to get information from a
> few tables on different websites. i have looked at urllib and httplib.
> the sites i need to get data from mostly have this data in tables. So
> that, i think would make it easier. Anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?
> thanks,
> sk

pyparsing comes with a simple HTML scraper example for extracting the NIST
NTP servers from an HTML table. pyparsing is also fairly tolerant of
"unclean" HTML. Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Oct 31 '05 #4

alex_f_il

You can easily do it with SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm).
The program creates an automation API for any Web application that
uses HTML and DHTML and works with Microsoft Internet Explorer. The Web
application becomes programmatically accessible from any .NET language.
The tool includes a Visual Table Data Extractor, which lets you define
the table structure visually. The table then becomes accessible from
code as a DataTable class. You can develop the extraction script in
hours with the tool.

Oct 31 '05 #5
