any suggestions for URL cataloging project?

I've just come up with an idea to make a small-time record of web
pages linking to other web pages. I don't want to download every page
on the internet (I'll leave Google to do that). I just want to know if
anyone has any suggestions on how to acquire just the links from a web
page using Python. This is for cataloging purposes. Is there some
library or script out there that I haven't heard of?
Jul 18 '05 #1
Matthew K Jensen <ma**********@gmail.com> wrote:
> I've just come up with an idea to make a small-time record of web
> pages linking to other web pages. I don't want to download every page
> on the internet (I'll leave google to do that). I just want to know if
> anyone has any suggestions on how to acquire just the links from a web
> page using python. This is for a cataloging purpose. Is there some
> library or script out there that I haven't heard of?


Check out Tools/webchecker/ -- the Tools directory is part of Python's
source distribution and should also come with most prepackaged Python
distributions, I believe.
Alex
Jul 18 '05 #2
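
For anyone who only needs the links and not a full checker, here is a minimal sketch that stays inside the standard library, using html.parser and urllib.parse. It is not how webchecker itself works; the names LinkCollector and extract_links and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return a list of (url, text) pairs found in the HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, used only to show the call.
sample = '<p><a href="/about">About us</a> and <a href="http://example.org/x">X</a></p>'
for url, text in extract_links(sample, "http://example.com/index.html"):
    print(url, "->", text)
```

To run it against a live page, the HTML string could come from urllib.request.urlopen(url).read(), decoded with the page's character set. Anchors without an href are skipped.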
"Matthew K Jensen" <ma**********@gmail.com> wrote in message
news:a8**************************@posting.google.com...
> I've just come up with an idea to make a small-time record of web
> pages linking to other web pages. I don't want to download every page
> on the internet (I'll leave google to do that). I just want to know if
> anyone has any suggestions on how to acquire just the links from a web
> page using python. This is for a cataloging purpose. Is there some
> library or script out there that I haven't heard of?


One of the examples that comes with pyparsing is urlextractor.py. Point it
at a web page and it lists out the URLs and linked text.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
Jul 18 '05 #3
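
Once the links are extracted by either tool, the "record of web pages linking to other web pages" from the original question still has to live somewhere. One possible sketch, assuming a single SQLite table of (source, target) pairs is enough: the table name, column names, helper functions, and the a/b/c.example URLs below are all hypothetical and not something proposed in the thread.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        " source TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " PRIMARY KEY (source, target))"
    )
    return conn


def record_links(conn, source, targets):
    """Store one row per (source, target) pair, ignoring repeats."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO links (source, target) VALUES (?, ?)",
            [(source, target) for target in targets],
        )


def pages_linking_to(conn, target):
    """Return the catalogued pages that link to the given URL, sorted."""
    rows = conn.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (target,)
    )
    return [row[0] for row in rows]


# Made-up URLs, only to show the calls.
conn = open_catalog()
record_links(conn, "http://a.example/", ["http://b.example/", "http://c.example/"])
record_links(conn, "http://c.example/", ["http://b.example/"])
print(pages_linking_to(conn, "http://b.example/"))
```

Passing a filename instead of ":memory:" would keep the catalog between runs. The composite primary key means re-scanning a page does not add duplicate rows.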
