parsing out links in HTML docs

Hello to you all,

A question from a PHP newbie who is disorientated by the overwhelming
amount of existing example scripts.
-- What is the best/simplest way to parse out the links in an HTML
document and put them in an array? --

Some hints, functions, or snippets would be highly appreciated.
Thanks.

Marco
Jul 17 '05 #1
marco wrote:
Hello to you all,

A question from a PHP newbie who is disorientated by the overwhelming
amount of existing example scripts.
-- What is the best/simplest way to parse out the links in an HTML
document and put them in an array? --

Some hints, functions, or snippets would be highly appreciated.


Here's a way:

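<?php

// Fetch the page; file_get_contents() returns false on failure.
$file = file_get_contents("http://www.php.net/");
if ($file === false) {
    die("Could not fetch the page.");
}

// Group 1 captures the optional opening quote, group 2 the href value.
preg_match_all("/<a[^>]+href\s*=\s*(\"|')?([^\"'\s>]+)/i", $file, $links);

// $links[2] holds the URLs; escape them, since they are printed into HTML.
print "<pre>";
print htmlspecialchars(print_r($links[2], true));
print "</pre>";

?>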
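
For comparison, here is a minimal sketch of the same task done with PHP's bundled DOM extension instead of a regular expression. It is not from the thread: the function name extract_links() and the sample markup are made up for illustration, and it assumes PHP 7 or later with the DOM extension enabled. A parser-based approach only picks up real <a> elements, so tags such as <abbr> or anchors without an href are skipped.

```php
<?php

// Hypothetical helper (not from the thread): collect the href values of
// all <a> elements in an HTML string using DOMDocument.
function extract_links(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip named anchors and other <a> tags that carry no href.
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Made-up sample markup: two real links, one named anchor, one <abbr>.
$html = '<p><a href="http://www.php.net/">PHP</a> <a name="top">no link</a> '
      . "<A HREF='/docs.php'>Docs</A> <abbr title=\"x\">x</abbr></p>";

print_r(extract_links($html));
```

To use it on a live page, pass the string returned by file_get_contents() to extract_links() after checking that the fetch did not return false.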
JW

Jul 17 '05 #2
Oh yeah, the one magic line...

<< preg_match_all("/<a[^>]+href\s*=\s*(\"|')?([^\"'\s>]+)/i", $file, $links); >>

Sweet.

Thanks a lot.

marco
Jul 17 '05 #3
