I am using cURL and DOMDocument to extract the links from my website.
This is my script:
<?php
require("my_functions.php");

$target_url = "http://www.support-focus.com/customer-service-software.html";
$userAgent  = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
echo "<br>Starting<br>Target_url: $target_url";

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = curl_exec($ch);
if (!$page) {
    echo "<br />cURL error number: " . curl_errno($ch);
    echo "<br />cURL error: " . curl_error($ch);
    exit;
}
curl_close($ch);

// parse the HTML into a DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($page);
//echo $doc->saveHTML();

$params = $doc->getElementsByTagName('a'); // find the <a> elements
foreach ($params as $param) { // visit each link in turn
    echo "Section Attribute :-> " . $param->getAttribute('href') . "<br>";
}
?>
and the output is:
Starting
Target_url: http://www.support-focus.com/custome...-software.html
Section Attribute :-> index.php
Section Attribute :-> works.php
Section Attribute :-> pricing.php
Section Attribute :-> special.php
Section Attribute :-> contact.php
Section Attribute :-> login.php
Section Attribute :-> Customer-Service-Software.php
Section Attribute :-> articles.php
Section Attribute :-> Why-Get-An-Internet-Security-Seal.php
Section Attribute :-> The-Fantastic-Return-on-Investment-from-Trust-Seals.php
Section Attribute :-> Turn-Browsers-Into-Buyers-Increase-Your-Sales-Conversion.php
Section Attribute :-> Selecting-The-Best-Trust-Seal-To-Boost-Your-Sales-Conversions.php
Section Attribute :-> Give-Great-Customer-Service-And-Get-A-Trust-Seal-to-Prove-It.php
Section Attribute :-> Customer-Service-Software-Solutions-For-Online-Business.php
Section Attribute :-> 73-Per-Cent-Of-Buyers-Abort-Their-Purchases-How-To-Change-It.php
Section Attribute :-> Why-Are-Your-Visitors-Not-Buying-Your-Products.php
Section Attribute :-> http://www.support-focus.com/index.php
Section Attribute :-> http://www.support-focus.com/special.php
Section Attribute :-> terms.php
Section Attribute :-> privacy.php
Section Attribute :-> earnings_disclaimer.php
Section Attribute :-> articles.php
This works quite well, but some of the links are relative and some are full URLs.
Given the code I am already using, what is the best way to show
all of these links as complete URLs?
Is there a DOMDocument method to do this?
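As far as I can tell there is no DOMDocument method that rewrites relative hrefs into absolute ones, so one option I have been looking at is a small helper built on parse_url(). This is only a sketch covering the simple cases that appear in my output (already-absolute links, root-relative paths, and plain relative file names); make_absolute() is my own name, not a built-in:

```php
<?php
// Sketch: resolve an href against the page's base URL.
// Handles only the simple cases seen in my output.
function make_absolute($href, $base_url)
{
    // Already a full URL (has a scheme)? Leave it alone.
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href;
    }
    $parts = parse_url($base_url);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (substr($href, 0, 1) === '/') {
        return $root . $href;  // root-relative path
    }
    // Relative to the directory of the base page.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = rtrim(dirname($path), '/');
    return $root . $dir . '/' . $href;
}

echo make_absolute('index.php',
    'http://www.support-focus.com/customer-service-software.html');
// http://www.support-focus.com/index.php
?>
```

Inside the foreach loop I could then echo make_absolute($param->getAttribute('href'), $target_url) instead of the raw href.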
I also want to extract and store the website address,
i.e. just the "www.support-focus.com" part.
I realize that I could do it with a preg_match.
It could also be done with strpos and substr, but that would be a bit messy.
I just thought that if there is something in the DOM class that can do the job, it may be quicker and more efficient.
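For the host part specifically, parse_url() with the PHP_URL_HOST flag looks cleaner than preg_match or strpos/substr, if I am reading the manual right:

```php
<?php
// Sketch: pull just the host out of the target URL with parse_url(),
// avoiding preg_match or strpos/substr string juggling.
$target_url = "http://www.support-focus.com/customer-service-software.html";
$host = parse_url($target_url, PHP_URL_HOST);
echo $host; // www.support-focus.com
?>
```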
What do you professionals think is the best way to get the data that I want?
And how should I construct it?