Bytes | Software Development & Data Engineering Community

How do I get the final url from a redirection?

jeddiki
290 100+
Hi,

I want to capture the final url that a website redirects to.

Here is an example of what I mean:

www.example.com/sites.php?pd=45

When you click on that link, the site will redirect you to

www.Joe-Blogs.com/green/prod1.html?a=527

As you can see they are two different sites.

What I would like to do is pick the
www.Joe-Blogs.com/green/prod1.html part of the final url
and put it in a variable called $final_url.

So if I have :


    $first_url = "www.example.com/sites.php?pd=45";

What would be the best way to get to that $final_url?

Should I be using cURL, or would
file() or file_get_contents() be able to get the URL?

Any ideas on how I can get to my $final_url ?
Feb 15 '10 #1
xNephilimx
213 Expert 100+
Are you trying to make some kind of web proxy? If so, there are quite a few around, like PHProxy http://www.phproxy.org/ (source code: http://sourceforge.net/projects/poxy/).
There's no need to reinvent the wheel.

Best regards
Feb 15 '10 #2
jeddiki
290 100+
Thanks,

but no, I am not trying to build a proxy.

I want to get the final URL so that I can use it in another
web site that does analysis based on the URL.
Feb 15 '10 #3
jeddiki
290 100+
If I use cURL, the code below should get me to the final web page, right?

Is the final destination in the HEADER info?

    $target_url = "www.example.com/sites.php?pd=45";
    $userAgent = "Mozilla/5.0 (compatible)";  // placeholder; the real value was not shown in the post
    $cef = "curl_err.txt";
    $ceh = fopen($cef, 'w');

    $ch = curl_init();  // the handle must exist before any curl_setopt() call
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_STDERR, $ceh);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $output = curl_exec($ch);
    $info = curl_getinfo($ch);

How would I extract the final URL?

Any ideas?
Feb 15 '10 #4
jeddiki
290 100+
Please HELP!

I am really stuck on this one. Surely someone knows how
to do this?

Maybe I should be using fsockopen()?

Any ideas?

Thanks
Feb 18 '10 #5
kovik
1,044 Expert 1GB
Firstly, why are you doing this? Chances are that you are going about this incorrectly, and we can't help you if you don't give a clear idea of your end-goal.

Secondly, redirection is not a standardized process. It can be performed via headers, meta-tags, or JavaScript. Do you plan to account for all of these?
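
To illustrate why the redirect type matters: cURL's CURLOPT_FOLLOWLOCATION only follows HTTP Location headers, so a meta-tag redirect has to be dug out of the returned HTML separately. Below is a rough sketch of that, not a complete solution. The function name find_meta_refresh_url is hypothetical, the pattern only handles the common attribute order (http-equiv before content, unquoted URL inside the content value), and JavaScript redirects are not covered at all.

```php
<?php
// Hypothetical helper: returns the target URL of a
// <meta http-equiv="refresh" content="N; url=..."> tag, or null if none is found.
// Assumption: http-equiv appears before content, which is the usual order.
function find_meta_refresh_url(string $html): ?string
{
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*([^"\'>]+)/i';

    if (preg_match($pattern, $html, $m)) {
        return trim($m[1]);
    }
    return null;
}
```

You would run this on the body returned by curl_exec() and, if it finds a URL, request that URL in turn.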
Feb 19 '10 #6
jeddiki
290 100+
Hi,

If it helps, I will give you a specific real example that
many of us have heard of....

Take the "hop" link that affiliates of clickbank use.

It has the format: xxxx.PROD-ID.hop.cklickbank.net

When you click on a "hop-link" it does not go to clickbank.net
but goes to the product sales page: www.hip-new-product.com
so it redirects via some method ( I don't know what ) to that
sales page.

So what I want to do is capture that end URL and then use it
in another place - for example it could be in Alexa.com

So to get site info from Alexa about a website I need to type in Alexa.com?url=www.hip-new-product.com.

Instead of that I can do Alexa.com?url=$finalurl

where $finalurl comes from getting the redirect from cklickbank.net

Hope that helps explain the process.

It is true I don't know which type of redirect is being used; all I want is the final
landing page URL.

Any ideas ?
Feb 19 '10 #7
kovik
1,044 Expert 1GB
I would suggest using cURL, as you have opted to, and making sure that you set CURLOPT_FOLLOWLOCATION to true. This may not work for JavaScript or meta-tag redirects, but it is designed to follow server-side (HTTP header) redirects. After curl_exec(), call curl_getinfo() with the option CURLINFO_EFFECTIVE_URL; it returns the final URL as a string. If you call curl_getinfo() without an option instead, the same value is in the 'url' key of the returned array.
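
A minimal sketch of this approach, assuming the cURL extension is loaded. The function names get_final_url and strip_scheme_and_query are made up for illustration, and the timeout and redirect limit are arbitrary choices. The second helper covers the original request of keeping only the host and path of the landing URL.

```php
<?php
// Follows server-side redirects and returns the last URL cURL requested,
// or null if the transfer failed. Name and defaults are illustrative.
function get_final_url(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);         // give up on redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the page body
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $ok = curl_exec($ch);

    // Called with an option constant, curl_getinfo() returns that single value.
    return ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// Drops the scheme and the query string, keeping "host/path" only.
function strip_scheme_and_query(string $url): string
{
    $parts = parse_url($url);
    return ($parts['host'] ?? '') . ($parts['path'] ?? '');
}
```

For example, $final_url = strip_scheme_and_query(get_final_url($first_url) ?? ''); would leave www.Joe-Blogs.com/green/prod1.html for the landing page in the first post.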
Feb 19 '10 #8
jeddiki
290 100+
Thanks,

I have managed to get it working.

The only problem is, it takes nearly four hours to process
all 11,000 websites.

This equates to one every 1.26 seconds.

Do you think there is a quicker method ?

Maybe I should put some of my code into a function, although
I don't know how much of it:

This is my code:

    $sql_url = "SELECT id FROM my_temp ORDER BY id";
    $result_url = mysql_query($sql_url)
        or write_error("Could not SELECT id FROM my_temp ".mysql_error()." \r\n");

    $ctr = 1;

    while ($row_url = mysql_fetch_assoc($result_url)) {

        $my_code = $row_url['id'];
        $target_url = "http://zzzzz.$my_code.example.com/";

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $target_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_STDERR, $ceh);
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);

        $output = curl_exec($ch);
        $info = curl_getinfo($ch);

        if ($output === FALSE) {
            write_log("No cURL data returned for $target_url [". $info['http_code']. "]\r\n ");

            if (curl_error($ch)) {
                // Function calls are not expanded inside double quotes, so concatenate them.
                write_log("CURL error number: ".curl_errno($ch)."\r\n CURL error: ".curl_error($ch)."\r\n");
            }
        }
        else {
            $new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            $new_dom = GetDomain($new_url);

            // Escape both values before they go into the UPDATE statement.
            $new_url = mysql_real_escape_string($new_url);
            $new_dom = mysql_real_escape_string($new_dom);

            write_log("LOOK UP URL: $ctr) $target_url: $new_url, $new_dom\r\n");

            $sql_temp_url = "UPDATE my_temp SET
                url = '$new_url',
                dom = '$new_dom'
                WHERE id = '$my_code' ";

            $result_temp_url = mysql_query($sql_temp_url)
                or write_error("Could not UPDATE my_temp url # $ctr.".mysql_error()." \r\n");
            write_log("WRITTEN URL: $ctr) $my_code: $new_url, $new_dom\r\n");
        }

        curl_close($ch);
        $ctr++;
    }

If the curl should go into a function to make the while loop faster,
I am not sure how much should go in the function.

Would it be just this:

    // $userAgent and $ceh are passed in because variables defined
    // outside a function are not visible inside it.
    function do_curl($target_url, $userAgent, $ceh) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $target_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_STDERR, $ceh);
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);

        $output = curl_exec($ch);
        return $output;
    }

Then call do_curl($target_url, $userAgent, $ceh); ?

Or should the checking if's go up in the function as well?

What do you recommend to make it most efficient?

If doing this won't make any difference to the speed of execution, then
for readability, I will leave it as it is. :)

Would appreciate any input.



Thanks.
Feb 22 '10 #9
dlite922
1,584 Expert 1GB
Get a faster internet connection. :)





Dan
Feb 22 '10 #10
Look at curl_multi_init. I just finished writing a checker that you pass an array of URLs to, and it checks them all at once. I've tested it with up to 1000 URLs at a time, and the execution time is roughly the same as that of the slowest-responding URL.

Mine is based off this: http://www.somacon.com/p537.php
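
For readers who want a starting point, here is a minimal sketch of the curl_multi idea. It is not the checker described above and is not taken from the linked article; the function name get_final_urls, the timeout, and the redirect limit are assumptions. It also sets CURLOPT_NOBODY so that only headers are fetched, which usually saves time but assumes the servers answer HEAD requests the same way they answer GET.

```php
<?php
// Resolves many URLs concurrently.
// Returns [original URL => final URL, or null if that transfer failed].
function get_final_urls(array $urls, int $timeout = 20): array
{
    if ($urls === []) {
        return [];
    }

    $mh = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only (HEAD request)
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);  // select failed; back off briefly instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Per-transfer error codes are only available through curl_multi_info_read().
    $failed = [];
    while ($info = curl_multi_info_read($mh)) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = in_array($ch, $failed, true)
            ? null
            : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

For a list of 11,000 URLs it is probably safer to feed this in batches, for example with array_chunk($all_urls, 200), than to open every connection at once; the batch size is a guess and would need tuning.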
Oct 6 '10 #11
