List all links from a webpage

Hi,

I'd like to list all links (URLs; I'm not interested in email addresses) from a
web page (htm, html, asp, php ...).
I'm not sure whether this can be done using PHP, so I'd appreciate some advice or maybe
an example.

Thanks in advance,
Robertico
Sep 3 '05 #1
Robertico wrote:
> I'd like to list all links (URLs, not interested in email addresses) from a
> web page (htm, html, asp, php ...).


<?php
$url = 'http://tobyinkster.co.uk';
$n = 12 + strlen(`lynx -dump -number_links -nolist '$url'`);
$list = substr(`lynx -dump -number_links '$url'`, $n);
print "<pre>$list</pre>";
?>

Magic.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Sep 3 '05 #2
Thanks,
> <?php
> $url = 'http://tobyinkster.co.uk';
> $n = 12 + strlen(`lynx -dump -number_links -nolist '$url'`);
> $list = substr(`lynx -dump -number_links '$url'`, $n);
> print "<pre>$list</pre>";
> ?>


Great solution using Lynx. My first impression is that it works great!

Robertico
Sep 3 '05 #3
> <?php
> $url = 'http://tobyinkster.co.uk';
> $n = 12 + strlen(`lynx -dump -number_links -nolist '$url'`);
> $list = substr(`lynx -dump -number_links '$url'`, $n);
> print "<pre>$list</pre>";
> ?>


I tried to remove the numbers (by removing -number_links), but then it doesn't work.
I'd like to store the results in a database.
Can you explain why you add 12 to the string length?
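
Storing the results mostly comes down to turning the printed list into an array of URLs first. A minimal sketch, assuming each line of the list has the shape "   N. URL"; the sample list is made up, and real lynx output may differ:

```php
<?php
// Hypothetical reference list in the "  N. URL" shape (not real lynx output).
$list = "   1. http://example.com/\n"
      . "   2. mailto:someone@example.com\n"
      . "   3. http://example.org/page.php\n";

// Capture the URL that follows the number on each line.
preg_match_all('/^\s*\d+\.\s+(\S+)/m', $list, $matches);

// Keep everything except email links.
$urls = array();
foreach ($matches[1] as $url) {
    if (stripos($url, 'mailto:') !== 0) {
        $urls[] = $url;
    }
}

print_r($urls);
```

Each element of $urls can then be written to a table with whatever database layer is in use.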

Robertico
Sep 5 '05 #4
Robertico wrote:
> I tried to remove the numbers (removed -number_links) but it doesn't work.

Use a regular expression:

$list = preg_replace('/\[[0-9]+\]/', '', $list);
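
A minimal sketch of what that call does, using a made-up sample string rather than real lynx output. Note that preg_replace takes three arguments (pattern, replacement, subject); an empty replacement deletes the matches:

```php
<?php
// Hypothetical text with bracketed link markers, as -number_links adds them.
$sample = "[1]Home [2]About us [13]Contact";

// Delete every "[digits]" marker by replacing it with an empty string.
$clean = preg_replace('/\[[0-9]+\]/', '', $sample);

echo $clean, "\n";
```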
> Can you explain why adding 12 to the string length?


Try it without the +12 and see.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Sep 8 '05 #5
> Use a regular expression:
> $list = preg_replace('/\[[0-9]+\]/', '', $list);

OK, that's correct, but is there a reason why it doesn't work
without -number_links?
I always want to know why. I'd like to understand what I'm doing. :-))

> Try it without the +12 and see.

I already did, but why exactly 12?

I hope I'm not bothering you too much.
Robertico
Sep 8 '05 #6
Robertico wrote:
> OK, that's correct, but is there a reason why it doesn't work
> without -number_links?
> I always want to know why. I'd like to understand what I'm doing. :-))

ISTR that I couldn't get the two lynx dumps to "line up" without using
"-number_links" for both.

>> Try it without the +12 and see.
>
> I already did, but why exactly 12?


if (strlen("References\n\n") == 12)
{
    echo "That's why!\n";
    /* Again, IIRC */
}
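
The arithmetic can be checked without lynx. A minimal sketch, assuming (as recalled above) that the full dump is the -nolist dump followed by "References\n\n" and then the link list; all sample strings are made up:

```php
<?php
// Stand-ins for the two lynx dumps; real output will differ.
$body    = "Welcome to [1]Example and [2]Another site\n\n"; // like the -nolist dump
$heading = "References\n\n";                                 // 10 letters + 2 newlines
$links   = "   1. http://example.com/\n   2. http://example.org/\n";
$full    = $body . $heading . $links;                        // like the full dump

// Skip the page text plus the heading to land on the first list entry.
$n    = 12 + strlen($body);
$list = substr($full, $n);

echo strlen($heading), "\n";
echo $list;
```

Without the +12, $list would start at the "References" heading instead of at the first link.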

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Sep 8 '05 #7
Thanx !!

Robertico
Sep 9 '05 #8
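
For setups where lynx or shell access is not available, the same task can be sketched with PHP's bundled DOM extension. This is a different technique from the lynx approach in this thread, not a version of it. The markup below is a made-up sample; for a live page, the string would presumably come from something like file_get_contents($url), which is an assumption about the environment:

```php
<?php
// Sample markup standing in for a fetched page (hypothetical content).
$html = '<html><body>
<a href="http://example.com/">Example</a>
<a href="mailto:someone@example.com">Mail</a>
<a href="/about.php">About</a>
<a name="top">No href</a>
</body></html>';

$doc = new DOMDocument();
// Suppress the warnings that sloppy real-world HTML tends to trigger.
@$doc->loadHTML($html);

$urls = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = trim($a->getAttribute('href'));
    // Skip anchors without an href, and skip email links.
    if ($href === '' || stripos($href, 'mailto:') === 0) {
        continue;
    }
    $urls[] = $href;
}

print_r($urls);
```

Relative links such as /about.php are returned as written; resolving them against the page URL is left out of this sketch.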
