How do I get the final url from a redirection?

290 Contributor
Hi,

I want to capture the final url that a website redirects to.

Here is an example of what I mean:

www.example.com/sites.php?pd=45

When you click on that link, the site will redirect you to

www.Joe-Blogs.com/green/prod1.html?a=527

As you can see they are two different sites.

What I would like to do is pick the
www.Joe-Blogs.com/green/prod1.html part of the final url
and put it in a variable called $final_url.

So if I have :


    $first_url = "www.example.com/sites.php?pd=45";

What would be the best way to get that $final_url?

Should I be using cURL, or would
file() or file_get_contents() be able to get the URL?

Any ideas on how I can get to my $final_url?
Feb 15 '10 #1
xNephilimx
213 Recognized Expert New Member
Are you trying to make some kind of web proxy? If so, there are quite a few around, like PHProxy http://www.phproxy.org/ (source code: http://sourceforge.net/projects/poxy/).
There's no need to reinvent the wheel.

Best regards
Feb 15 '10 #2
jeddiki
290 Contributor
Thanks,

but no I am not trying to build a proxy,

I want to get the final url so that I can use it in another
web site that does analysis based on the url.
Feb 15 '10 #3
jeddiki
290 Contributor
If I use cURL, the code below should get me to the final webpage, right?

Is the final destination in the HEADER info ?

    $target_url = "http://www.example.com/sites.php?pd=45";
    $userAgent = "Mozilla/5.0 (compatible; redirect-check)";  // placeholder value; was undefined
    $cef = "curl_err.txt";
    $ceh = fopen($cef, 'w');                         // log file for cURL's verbose output

    $ch = curl_init();                               // was missing: create the handle first
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP codes >= 400 as failures
    curl_setopt($ch, CURLOPT_STDERR, $ceh);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);             // include response headers in $output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location: redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $output = curl_exec($ch);
    $info = curl_getinfo($ch);

How would I extract the final url ?

Any ideas ?



Feb 15 '10 #4
jeddiki
290 Contributor
Please HELP !!!

I am really stuck on this one - surely someone knows how
to do this ??


Maybe I should be using fsockopen()?

any ideas ?

Thanks



Feb 18 '10 #5
kovik
1,044 Recognized Expert Top Contributor
Firstly, why are you doing this? Chances are that you are going about this incorrectly, and we can't help you if you don't give a clear idea of your end-goal.

Secondly, redirection is not a standardized process. It can be performed via headers, meta-tags, or JavaScript. Do you plan to account for all of these?
Feb 19 '10 #6
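
To illustrate the point about redirect types: cURL's CURLOPT_FOLLOWLOCATION only follows Location: headers, so a meta-tag redirect has to be found in the page body. Here is a rough sketch of that check. The function name find_meta_refresh_url() is made up for this example, and the pattern assumes http-equiv appears before content in the tag; it does not attempt JavaScript redirects at all.

```php
<?php
// Hypothetical helper: returns the target of a <meta http-equiv="refresh"> tag,
// or null when the page has none. Assumes http-equiv comes before content.
function find_meta_refresh_url($html)
{
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*["\']?([^"\'>\s]+)/i';

    if (preg_match($pattern, $html, $m)) {
        // URLs inside HTML attributes may contain entities such as &amp;
        return html_entity_decode($m[1]);
    }
    return null;
}

$page = '<meta http-equiv="refresh" content="0; url=http://www.example.com/landing.html">';
$next = find_meta_refresh_url($page);
```

If $next is not null, it could be fetched again with the same cURL code to continue the chain.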
jeddiki
290 Contributor
Hi,

If it helps, I will give you a specific real example that
many of us have heard of...

Take the "hop" link that affiliates of clickbank use.

It has the format: xxxx.PROD-ID.hop.clickbank.net

When you click on a "hop-link" it does not go to clickbank.net
but goes to the product sales page: www.hip-new-product.com
so it redirects via some method ( I don't know what ) to that
sales page.

So what I want to do is capture that end URL and then use it
in another place - for example, it could be in Alexa.com

So to get site info from Alexa about a website I need to type in Alexa.com?url=www.hip-new-product.com.

Instead of that I can do Alexa.com?url=$finalurl

where $finalurl comes from getting the redirect from clickbank.net

Hope that helps explain the process.

It is true I don't know which type of redirect is being used; all I want is the final
landing page URL.

Any ideas ?
Feb 19 '10 #7
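
As a small sketch of the last step described above: once the landing URL is known, its host can be pulled out with parse_url() and dropped into the lookup address. The helper name get_host() is invented for this example, and the "Alexa.com?url=" format is taken from the post as written, not from any documented Alexa API.

```php
<?php
// Hypothetical helper: returns the lower-cased host of a URL, or null if none is found.
function get_host($url)
{
    // parse_url() only recognises the host when a scheme is present
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return $host ? strtolower($host) : null;
}

// Example landing page (made-up path and query string)
$finalurl = get_host('http://www.hip-new-product.com/index.html?hop=abc');

// Lookup address in the format given in the post
$lookup = 'Alexa.com?url=' . urlencode($finalurl);
```

Here $lookup ends up as Alexa.com?url=www.hip-new-product.com.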
kovik
1,044 Recognized Expert Top Contributor
I would suggest using cURL, as you have opted, and make sure that you set CURLOPT_FOLLOWLOCATION to true. This may not work for JavaScript redirects, but it is designed to work for server-side redirects. Then call curl_getinfo() with the option CURLINFO_EFFECTIVE_URL, which returns the final URL as a string. (If you call curl_getinfo() without an option, the same value is under the 'url' key of the returned array.)
Feb 19 '10 #8
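
A minimal sketch of that suggestion, assuming the redirect is done with HTTP headers. The function names get_final_url() and strip_query() are made up for this example; strip_query() is there because the original question wanted the URL without the "?a=527" part.

```php
<?php
// Hypothetical helper: follows server-side redirects and returns the last URL,
// or null if the request fails.
function get_final_url($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);         // give up on redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print the page body
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    if (curl_exec($ch) === false) {
        return null;
    }
    // With an option given, curl_getinfo() returns the value directly.
    // The handle is released when $ch goes out of scope.
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// Hypothetical helper: drops the query string and fragment from a URL.
function strip_query($url)
{
    return preg_replace('/[?#].*$/', '', $url);
}

// Usage (needs network access, so it is left commented out):
// $final = get_final_url('http://www.example.com/sites.php?pd=45');
// $final_url = ($final === null) ? null : strip_query($final);
```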
jeddiki
290 Contributor
Thanks,

I have managed to get it working.

The only problem is, it takes nearly four hours to process
all 11,000 websites.

This equates to one every 1.26 seconds.

Do you think there is a quicker method ?

Maybe I should put some of my code into a function - although
I don't know how much of it:

This is my code:

    $sql_url = "SELECT id FROM my_temp ORDER BY id";
    $result_url = mysql_query($sql_url)
        or write_error("Could not SELECT id FROM my_temp " . mysql_error() . " \r\n");

    $ctr = 1;

    while ($row_url = mysql_fetch_assoc($result_url)) {

        $my_code = $row_url['id'];
        $target_url = "http://zzzzz.$my_code.example.com/";

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $target_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_STDERR, $ceh);
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);

        $output = curl_exec($ch);
        $info = curl_getinfo($ch);

        if ($output === FALSE) {
            write_log("No cURL data returned for $target_url [" . $info['http_code'] . "]\r\n ");

            if (curl_error($ch)) {
                // Function calls are not expanded inside a string, so concatenate them
                write_log("CURL error number: " . curl_errno($ch) . "\r\n CURL error: " . curl_error($ch) . "\r\n");
            }
        }
        else {
            $new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            $new_url = mysql_real_escape_string($new_url);   // was $cb_url, which is never set

            $new_dom = GetDomain($new_url);
            $new_dom = mysql_real_escape_string($new_dom);

            write_log("LOOK UP URL: $ctr) $target_url: $new_url, $new_dom\r\n");

            // was WHERE id = '$newcode', which is never set; the row id is in $my_code
            $sql_temp_url = "UPDATE my_temp SET
                url = '$new_url',
                dom = '$new_dom'
                WHERE id = '$my_code' ";

            $result_temp_url = mysql_query($sql_temp_url)
                or write_error("Could not UPDATE my_temp url # $ctr." . mysql_error() . " \r\n");
            write_log("WRITTEN URL: $ctr) $my_code: $new_url, $new_dom\r\n");
        }

        curl_close($ch);
        $ctr++;
    }

If the curl should go into a function to make the while loop faster,
I am not sure how much should go in the function.

Would it be just this:

    function do_curl($target_url) {
        // Both are defined outside the function, so bring them into its scope
        global $userAgent, $ceh;

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $target_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_STDERR, $ceh);
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);

        $output = curl_exec($ch);
        curl_close($ch);             // release the handle before returning
        return $output;
    }
Then call do_curl($target_url); ?

Or should the if checks go into the function as well?

What do you recommend to make it most efficient?

If doing this won't make any difference to the speed of execution, then
for readability, I will leave it as it is. :)

Would appreciate any input.



Thanks.
Feb 22 '10 #9
dlite922
1,584 Recognized Expert Top Contributor
get a faster internet connection. :)





Dan
Feb 22 '10 #10
Look at curl_multi_init. I just finished writing a checker that you pass an array of URLs to, and it checks them all at once. I've tested with up to 1000 URLs at a time, and the execution time is roughly the same as that of the slowest-responding URL.

Mine is based off this: http://www.somacon.com/p537.php
Oct 6 '10 #11
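
For readers who want a starting point, here is a rough sketch of the curl_multi approach. It is not the checker mentioned above; resolve_final_urls() is a name made up for this example, and the timeout and redirect limit are arbitrary choices.

```php
<?php
// Hypothetical helper: resolves many URLs in parallel.
// Returns an array of original URL => final URL (or null on failure).
function resolve_final_urls(array $urls, $timeout = 20)
{
    $urls = array_unique($urls);
    if (!$urls) {
        return array();                       // nothing to do
    }

    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);      // wait for activity instead of spinning
        }
    } while ($running && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    // Collect per-transfer results; failed transfers stay null
    $results = array_fill_keys($urls, null);
    while ($info = curl_multi_info_read($mh)) {
        $original = array_search($info['handle'], $handles, true);
        if ($original !== false && $info['result'] === CURLE_OK) {
            $results[$original] = curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL);
        }
    }

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage (needs network access; $all_urls is a hypothetical list of the 11,000 addresses):
// foreach (array_chunk($all_urls, 100) as $batch) {
//     $finals = resolve_final_urls($batch);
// }
```

Working in batches keeps the number of simultaneous connections bounded; the batch size of 100 is only a guess and would need tuning.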
