What is the best way to turn a local link into a complete URL?

I am using curl and DOMDocument
to extract the links from my website.

This is my script:

<?php
require("my_functions.php");

$target_url = "http://www.support-focus.com/customer-service-software.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

echo "<br>Starting<br>Target_url: $target_url";

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = curl_exec($ch);
if (!$page) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($page);

//echo $doc->saveHTML();

$params = $doc->getElementsByTagName('a'); // find the <a> elements
foreach ($params as $param) { // visit each link in turn
    // $param is the current DOMElement, so no separate counter is needed
    echo "Section Attribute :-> " . $param->getAttribute('href') . "<br>";
}
?>

As you can see the target page is this one:

customer service software

and the output is:

Starting
Target_url: http://www.support-focus.com/custome...-software.html

Section Attribute :-> index.php
Section Attribute :-> works.php
Section Attribute :-> pricing.php
Section Attribute :-> special.php
Section Attribute :-> contact.php
Section Attribute :-> login.php
Section Attribute :-> Customer-Service-Software.php
Section Attribute :-> articles.php
Section Attribute :-> Why-Get-An-Internet-Security-Seal.php
Section Attribute :-> The-Fantastic-Return-on-Investment-from-Trust-Seals.php
Section Attribute :-> Turn-Browsers-Into-Buyers-Increase-Your-Sales-Conversion.php
Section Attribute :-> Selecting-The-Best-Trust-Seal-To-Boost-Your-Sales-Conversions.php
Section Attribute :-> Give-Great-Customer-Service-And-Get-A-Trust-Seal-to-Prove-It.php
Section Attribute :-> Customer-Service-Software-Solutions-For-Online-Business.php
Section Attribute :-> 73-Per-Cent-Of-Buyers-Abort-Their-Purchases-How-To-Change-It.php
Section Attribute :-> Why-Are-Your-Visitors-Not-Buying-Your-Products.php
Section Attribute :-> http://www.support-focus.com/index.php
Section Attribute :-> http://www.support-focus.com/special.php
Section Attribute :-> terms.php
Section Attribute :-> privacy.php
Section Attribute :-> earnings_disclaimer.php
Section Attribute :-> articles.php

Works quite well, but some of the links are local and some are full urls.

Given the code I am already using, what is the best way to get
all these links shown as complete URLs?

Is there a DOMDocument method to do this?

Also I want to get out and store the website address
i.e. just the "www.support-focus.com" part.

I realize that I could do it with a preg_match.

It could also be done with strpos and substr - but it would be a bit messy.

But I just thought that if there is something in the DOM class that can do the job then it may be quicker and more efficient.
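
For the host part, PHP's built-in parse_url() may be simpler than preg_match or strpos/substr. A minimal sketch, reusing the $target_url from the script above; the $host and $scheme variable names are only for illustration:

```php
<?php
// Sketch: split the target URL into its components with parse_url().
$target_url = "http://www.support-focus.com/customer-service-software.html";

// Ask for a single component by passing one of the PHP_URL_* constants.
$host   = parse_url($target_url, PHP_URL_HOST);   // the "www.support-focus.com" part
$scheme = parse_url($target_url, PHP_URL_SCHEME); // the "http" part

echo "Host: $host<br>";
echo "Site root: $scheme://$host<br>";
```

This relies on $target_url being a complete URL with a scheme; parse_url() can return null for a component that is missing.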

What do you professionals think is the best way to get the data that I want?
And how should I construct it?
Oct 24 '09 #1
TheServant
I would simply define your domain name and then, with an if statement, add it to those links where it's not detected.
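
A rough sketch of that idea follows. The add_domain() helper and the $base variable are hypothetical names, not part of the original script, and the type declarations assume PHP 7 or later. It only works when every local link lives in the site root:

```php
<?php
// Hypothetical helper: prepend the site root to links that lack a scheme.
function add_domain(string $href, string $base): string
{
    // Already a full URL (starts with something like http:// or https://).
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }
    // Local link: join it to the base with exactly one slash in between.
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$base = 'http://www.support-focus.com';
foreach (['index.php', 'http://www.support-focus.com/special.php'] as $href) {
    echo "Section Attribute :-> " . add_domain($href, $base) . "<br>";
}
```

In the original loop, the call would wrap the value returned by getAttribute('href'). Links such as mailto: addresses or bare #anchors are not treated specially here and would need their own check.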

If there are different folders for each link, then it's more complicated and I am yet to think of an efficient way.
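
One possible approach for the folder case is to resolve each link against the URL of the page it was found on, rather than against the bare domain. The resolve_url() function below is a hypothetical, simplified sketch (PHP 7 or later): it assumes the page URL is complete, ignores ports and login details, and does not treat query strings or fragment-only links specially.

```php
<?php
// Hypothetical helper: resolve $href against the URL of the page it came from.
function resolve_url(string $href, string $page_url): string
{
    // Anything that already has a scheme (http:, https:, mailto:) is left alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    $parts = parse_url($page_url);
    $root  = $parts['scheme'] . '://' . $parts['host'];

    // Protocol-relative link such as //cdn.example.com/file.js
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href;
    }
    if ($href !== '' && $href[0] === '/') {
        $path = $href; // root-relative: starts from the site root
    } else {
        // Relative to the folder that contains the current page.
        $page_path = $parts['path'] ?? '/';
        $dir  = substr($page_path, 0, strrpos($page_path, '/') + 1);
        $path = $dir . $href;
    }
    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;
        }
    }
    // Keep a trailing slash if the combined path had one.
    $trailing = ($out && substr($path, -1) === '/') ? '/' : '';
    return $root . '/' . implode('/', $out) . $trailing;
}

// The second page URL is a made-up example with nested folders.
echo resolve_url('index.php', 'http://www.support-focus.com/customer-service-software.html'), "<br>";
echo resolve_url('../img/logo.png', 'http://www.example.com/a/b/page.html'), "<br>";
```

A fully standards-compliant resolver would follow the reference resolution rules in RFC 3986; this sketch covers only the common cases.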
Oct 25 '09 #2
