What regex do I need to use to make this into a hyperlink ?

290 100+
Hi

In my text output file that I will display on my webpage
I have this included:

"Located At: Http://pggg-online.org/blog.htm "

and sometimes :

"found here www.example.org/james.htm "

What is the best way to change these into real
working links eg:

<a href='http://www.example.org/james.htm' target='_blank'>www.example.org/james.htm</a>
I guess that the best way is with a regular expression ?

Of course I have to make sure that I don't double up the
http:// if it is already given

Thanks for any advice
Dec 14 '09 #1
dlite922
1,584 Expert 1GB
Here's what I have so far:

(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)
but if you check Google, there are plenty of URL regexps you can modify for your own use.


Dan
Dec 14 '09 #2
kovik
1,044 Expert 1GB
Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O
Dec 14 '09 #3
dlite922
1,584 Expert 1GB
@kovik
not sure what you mean, that regex will match example.com

and these too:

something.example.com
http://what.how.com/wow.html?#$%^&*something%20%30else

Not tested and it does need some work. It was given to you as a start, not as a final solution.




Dan
Dec 15 '09 #4
jeddiki
290 100+
Hi
Thanks for suggesting that I Google it !!

( doh ! - (that's to myself !!) really, thanks .... it helped )

I found this, which I think is what I need:

//PHP Example: Automatically link URL's inside text.

$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
Can anyone spot any problems with using this ?


PS - I saw this comment:
Great regex, thanks ! Small thing : the '-' (dash) is missing - URL like this fails (http://web5.uottawa.ca/admingov/regl...-methodes.html)
Where should I add the "-" ?

Thanks
Dec 15 '09 #5
Atli
5,058 Expert 4TB
I would guess the comment means it should be:
$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
This fixes the problem with the URL posted in the comment, at least.
Dec 15 '09 #6
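To see what the added dash changes, here is a minimal sketch comparing the two patterns from posts #5 and #6. The URL is a hypothetical one chosen only because its path contains a dash; it is not the truncated URL from the quoted comment.

```php
<?php
// Hypothetical URL, used only to show a dash in the path.
$text = 'See http://example.org/docs/a-b.html for details';

// Pattern from post #5: the path character class has no dash.
$without = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@';
// Pattern from post #6: "\-" added to the path character class.
$with = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@';

preg_match($without, $text, $m1);
preg_match($with, $text, $m2);

echo $m1[0], "\n"; // http://example.org/docs/a
echo $m2[0], "\n"; // http://example.org/docs/a-b.html
```

Without the dash the match simply stops at the first "-" in the path, so the link would be cut short rather than skipped.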
jeddiki
290 100+
Hi again,

I tried that regex in my script but it is not doing anything.

This is my script:

It is taking the product details out of a database and displaying them.
The description often contains a url

while($row = mysql_fetch_assoc($result)){
  extract($row);
  $descript = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
  $extr = $totearn-$earn;
  $disp1 = "<b><span style = \"color:maroon;\">$Rctr) $cat</span><br><span style = \"color:darkblue;\">$title</span> ID: $id</b><br>";
  $disp2 = "$descript<br>";
  $disp3 = "<b>$/sale: $earn, +/mth: $rebill, Pop: $pop, Gravity: $grav, Comm%: $comm, Refers: $refer, Total Earn: $totearn</b>";
  echo "<div style=\"width: 800px; text-align: left;\">
  <span>$disp1</span>
  <span>$disp2</span>
  <span>$disp3</span>
  <br><br>
  </div>";
The result of this can be seen live here:
script-test

When you see the form just click on the "Analyze Clickbank Now" button
and you will see a list of results.

The results have urls - but none of them are converted :(

Have I done something wrong ?

PS

Just a thought - does that regex only work on https urls?
If so, how do I make it work on all ?
Dec 15 '09 #7
kovik
1,044 Expert 1GB
@dlite922:

What I was saying is that your regex also matches this:
http://what.how.comwow.html?#$%^&*something%20%30else

You made the slash after the domain optional.
Dec 15 '09 #8
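A quick way to check this point is to run the regex from post #2 against a string with no slash after the domain. This is only a sketch: the "@" delimiters and the "i" modifier are assumptions (the pattern lists A-Z only), and the query string is shortened from the one in the post.

```php
<?php
// Regex from post #2, wrapped in "@" delimiters with an assumed "i" modifier.
$pattern = '@(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)@i';

// No slash between "com" and "wow.html", followed by the whitespace the lookahead needs.
$text = 'http://what.how.comwow.html?x=1 ';

$matched = preg_match($pattern, $text, $m);

echo $matched, "\n"; // 1
echo $m[2], "\n";    // what.how.comwow  (treated as the domain)
```

Because "/?" is optional, the engine is free to read "comwow" as a six-letter TLD and carry on with the rest of the string.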
kovik
1,044 Expert 1GB
@jeddiki:

Firstly, there's no need to surround the entire regex in parentheses. The full regex match exists in $0.

Secondly, your current regex requires a http/https protocol. To make it optional, surround the protocol in parentheses and add a question mark after it.

Thirdly, the domain portion of your regex allows for 1 character TLDs. It also allows for TLDs with dashes. These do not exist. Change "([-\w\.]+)+" to "([-\w]\.)+(\w{2,})". This gives you multiple strings followed by periods, and then a 2 or more character string (without dashes).

The rest looks fine from here.
Dec 15 '09 #9
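The first two points can be sketched with a deliberately simplified pattern. This is not the thread's full regex: the pattern below is a hypothetical one with an optional protocol followed by dot-separated labels, used only to show "$0" and the optional group.

```php
<?php
// Simplified, hypothetical pattern: optional protocol, then dot-separated labels.
// No outer parentheses are needed because the whole match is available as $0.
$pattern = '@(https?://)?[-\w]+(\.[-\w]+)+@';

$text = 'found here www.example.org and http://example.com';
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $linked, "\n";
// found here <a href="www.example.org">www.example.org</a> and <a href="http://example.com">http://example.com</a>
```

Both URLs are matched, with and without the protocol, but the href for the first one still lacks "http://", which is the problem taken up later in the thread.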
jeddiki
290 100+
Hi,

Thanks for your help,

I tried to follow what you said but I get this error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing terminating ] for character class at offset 68 in /home/guru54gt5/public_html/sys/cb_search.php on line 208
this is the regex:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.\]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Can you see where I have gone wrong ?
Dec 15 '09 #10
Dormilich
8,651 Expert Mod 8TB
The character class opened at offset 44 should be closed at offset 52, but its closing bracket is escaped ("\]"), which holds the character class open.
Dec 15 '09 #11
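The effect of the escaped bracket can be reproduced in isolation. A minimal sketch, using only the character class from the failing regex; the sample subject string is an assumption.

```php
<?php
// "\]" is a literal "]", so this class never closes and the pattern fails to compile.
$broken = '@[\w/_\.\]*@';
// Without the stray backslash the class closes as intended.
$fixed = '@[\w/_\.]*@';

// The "@" in front of the call only silences the compile warning for this demo.
$r1 = @preg_match($broken, 'a/b.c');
$r2 = preg_match($fixed, 'a/b.c');

var_dump($r1); // bool(false)
var_dump($r2); // int(1)
```

preg_match() returns false (not 0) when the pattern itself is invalid, which is worth checking for when a regex "does nothing".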
jeddiki
290 100+
Again, thanks for the input,

I have changed the expression to:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Now it doesn't error - but it does not convert the urls to
hyperlinks either :(

The result can be seen here: clickbank tool

When you click on the big button, you get a list

You see on the list in position 2 there is a url:
Http://www.conversational-hypnosis.com/affiliate.php.
and then in position 4 another one:
At MaverickMoneyMakers.com/Bonus.
and position 6:
www.affilorama.com/affiliates
None of them got converted.
Dec 15 '09 #12
kovik
1,044 Expert 1GB
Firstly, get rid of the whitespace in the regex. You need to explicitly tell it to ignore whitespace (the "x" modifier) if you don't want each space in the pattern to be matched as a literal space in the text.

Secondly, I made a mistake in my regex correction. Change "([-\w]\.)" to "([-\w]+\.)". Also, I just noticed that your regex doesn't allow dashes anywhere but in the domain. Why is that?
Dec 15 '09 #13
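The whitespace point is easy to verify on a tiny pattern. A minimal sketch; the pattern and subject here are made up for the demonstration.

```php
<?php
$text = 'www.example.org';

// Spaces in the pattern are literal, so the text would need "www . example".
$spaced = preg_match('@www \. example@', $text);

// With the "x" modifier, unescaped whitespace in the pattern is ignored.
$extended = preg_match('@www \. example@x', $text);

echo $spaced, "\n";   // 0
echo $extended, "\n"; // 1
```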
jeddiki
290 100+
OK,

I got rid of the spaces.

and added that "+"

I also added a couple of "-" s

so now I have :
$descript = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
But alas, no improvement
Dec 15 '09 #14
kovik
1,044 Expert 1GB
... Take the dash out of the TLD. The way you wrote it requires that the first character is a dash, and TLDs don't even have dashes at all.

Also, are you aware that you use $descript and $descrip?
Dec 15 '09 #15
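The dash problem can be shown on the domain part alone. A minimal sketch, assuming anchors (^ and $) are added so that only the host name is tested.

```php
<?php
$host = 'www.example.org';

// "(-\w{2,})" requires a literal dash right after the last dot.
$withDash = preg_match('@^([-\w]+\.)+(-\w{2,})$@', $host);

// "(\w{2,})" accepts a plain TLD of two or more word characters.
$noDash = preg_match('@^([-\w]+\.)+(\w{2,})$@', $host);

echo $withDash, "\n"; // 0
echo $noDash, "\n";   // 1
```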
jeddiki
290 100+
Is the TLD this bit ?

(/([-\w/_\.]*

Sorry but I find it difficult to read this stuff !

BTW does this regex cope with .co.uk type of urls?

Yes, I did know about the $descript / $descrip difference, thanks
Dec 15 '09 #16
kovik
1,044 Expert 1GB
It should handle URLs with multi-part suffixes such as .co.uk.

And it'd be a good idea to take a regex crash course online. And download The Regex Coach. It makes regex easier to understand, as you get to see it applied, live.
Dec 15 '09 #17
jeddiki
290 100+
Thanks for your help.

I have gone through a couple of regex crash courses, but the
examples were all pretty basic compared to this.

I have loaded The Regex Coach

and have copied in the expression I am working on, but
it does not pick up any matches.



I tried stepping through but got nothing.


Maybe if I break it down, you can help me where I do not
have the right understanding?

Target: http://www.fred.blogspot.co.uk/confirm.php?a=56
@((https?://)?
The @ is the opener

(https?://)?
The first ? means that the s is optional,
The last ? means that the whole thing is optional
The parentheses mean the element is collected up as $1

Question - if it is optional and doesn't exist is $1 = null ?

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56
([-\w]+\.)+
the [] are grouping the - and \w (any alphanumeric)
the + sign means one or more of the grouped items
then a period and must be one or more of these

The parentheses mean the element is collected up as $2

Question - if there is more than one, will it automatically
know to use $3 and $4 etc. ?
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $2
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $3
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $4
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $5
(-\w{2,})+
This looks wrong - surely there should be a grouping here ?
like this:
([-\w]{2,})+
Now it will be saying that there should be at least 2 characters
And there must be one or more of these

The parentheses mean the element is collected up as $6

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
(:\d+)?
So this is the port number - optional , if present ------ $7

(/([-\w/_\.]*(\?\S+)?)?)?)
This is for directories and passed variables.
The widest parentheses and last ? say the whole lot is optional
The / is optional
Then zero or more of any alphanumeric or -/_\. - for a sub-directory optional
(second to last ? )

I do not see why the period is escaped: as it is inside the [] group, there is no need.

The escaped ? must be for passed variables which are optional
(third to last ? )

Question: what is the capital S for ? small s is for any whitespace

Question: I can not tell if these are being collected in a variable
it looks like they might not be as all the parentheses seem to be grouping.

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
So do I need to do this ?
$descrip = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1$2$3$4$5$6$7">$1$2$3$4$5$6$7</a>', $descrip);
Would really appreciate some more help

Thanks
Dec 18 '09 #18
kovik
1,044 Expert 1GB
@jeddiki
The "@" is an arbitrary delimiting character. This is only necessary in languages that are using regex (e.g. PHP), not in The Regex Coach. Delimiters are needed so that you can have modifiers after the pattern (e.g. "i" makes your pattern case-insensitive). The modifiers are in The Regex Coach on the right as checkboxes.

@jeddiki
It will be an empty string.

@jeddiki
No. They will overwrite each other as $2, so only the last one will be in $2. However, the entire thing is still included in $0.

@jeddiki
This is the TLD. I told you to get rid of the dash.

@jeddiki
You are correct. Take that up with the place you got it from. ;)

@jeddiki
Uppercase "\S" is equivalent to "[^\s]", i.e. any non-whitespace character.

@jeddiki
The parentheses indicate the variable that they will exist under, not the number of matches. So it will always be static.



@jeddiki
Only use "$0".
Dec 18 '09 #19
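Several of these answers can be checked directly in PHP. The sketch below uses the domain part of the regex (with the dash removed from the TLD group) on the host from the target URL; the other sample strings are assumptions made for the demonstration.

```php
<?php
// Domain part of the regex, with the dash removed from the TLD group.
$pattern = '@(https?://)?([-\w]+\.)+(\w{2,})@';
preg_match($pattern, 'www.fred.blogspot.co.uk', $m);

echo $m[0], "\n";       // www.fred.blogspot.co.uk  (the whole match)
var_dump($m[1] === ''); // bool(true): the optional protocol group is an empty string
echo $m[2], "\n";       // co.  (a repeated group keeps only its last iteration)
echo $m[3], "\n";       // uk

// The "i" modifier after the closing delimiter makes the match case-insensitive.
$plain = preg_match('@https?://@', 'Http://example.org');
$caseless = preg_match('@https?://@i', 'Http://example.org');
echo $plain, ' ', $caseless, "\n"; // 0 1

// \S+ consumes everything up to the next whitespace character.
preg_match('@\?\S+@', 'confirm.php?a=56 next', $q);
echo $q[0], "\n"; // ?a=56
```

The group numbers are fixed by the order of the opening parentheses in the pattern, which is why $2 is overwritten on each repetition instead of spilling into $3 and $4.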
jeddiki
290 100+
Thanks for your reply.

I will go through this and hopefully get it working :)
Dec 19 '09 #20
jeddiki
290 100+
OK I have my regex now
( and I understand it :) )

but I need to insert a conditional statement and I am not sure of the best way to do this.

I am using preg_replace on the text

so I have:

while($row = mysql_fetch_assoc($result)){
extract($row);
$descrip = preg_replace('@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
echo "<span>$descrip</span>";
Now the problem is, I want to check to see if the http has been included and if not, then I have to add it.

(actually I need to check for https:// or http://
or none of them. )

In my testing page I used this:

$subject = $regex;
$pattern = "@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@";
$result = preg_match($pattern, $subject, $matches);

if ($result == "1") {
   $new_url = $matches[0];
   // Leading 4, 7 and 8 characters, used to detect "http", "http://" and "https://".
   $tinyurl = substr($new_url, 0, 4);
   $shorturl = substr($new_url, 0, 7);
   $shorturl_s = substr($new_url, 0, 8);

   if ($tinyurl == "http") {
      if ($shorturl == "http://") {
         $longurl = substr($new_url, 7);
         echo "<br><br>URL:  <a href=\"http://$longurl\">$longurl</a>";
      }
      if ($shorturl_s == "https://") {
         $longurl = substr($new_url, 8);
         echo "<br><br>URL:  <a href=\"https://$longurl\">$longurl</a>";
      }
   }
   else {
      // No protocol given, so add http://
      echo "<br><br>URL:  <a href=\"http://$new_url\">$new_url</a>";
   }
}
else {
   echo "<br><br>No Matches";
}
But how can I put this condition into my preg_replace ?

If I can not do it - what do you think is the best way to
accomplish this ?

Any suggestions ?
Dec 21 '09 #21
kovik
1,044 Expert 1GB
Do a preg_replace on the final data, finding all links without a protocol and add it.

$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
The "(?!" starts a negative lookahead, meaning "is not followed by." It applies to everything in its parentheses.
Dec 21 '09 #22
jeddiki
290 100+
Thanks for that idea.

What does the \\0 in '\\0http://' mean ?
Dec 21 '09 #23
kovik
1,044 Expert 1GB
"\\0" in a PHP string is passed to preg_replace() as "\0". In the replacement string, \0 is the same as $0: the whole match.
Dec 21 '09 #24
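A small sketch of that equivalence, on a made-up subject string:

```php
<?php
// '\\0' in a single-quoted PHP string is the two characters \0.
$a = preg_replace('~b~', '[\\0]', 'abc');
// '$0' refers to the same thing: the whole match.
$b = preg_replace('~b~', '[$0]', 'abc');

echo $a, "\n"; // a[b]c
echo $b, "\n"; // a[b]c
```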
jeddiki
290 100+
Thanks,

I get an error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 21 in /home/guru54gt5/public_html/sys/cb_search.php on line 209

So I guess this needs a closing parenthesis added:
$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
Dec 22 '09 #25
kovik
1,044 Expert 1GB
Then add it? lol.
Right before the end delimiter.
Dec 22 '09 #26
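With the closing parenthesis placed right before the end delimiter, the replacement behaves as sketched below. The sample HTML is an assumption made for the demonstration.

```php
<?php
// Sample output of the first preg_replace: one href without a protocol, one with.
$data = '<a href="www.example.org">www.example.org</a> <a href="https://example.com">x</a>';

// Negative lookahead, now closed: match '<a href="' only when no protocol follows.
$data = preg_replace('~<a href="(?!https?://)~', '\\0http://', $data);

echo $data, "\n";
// <a href="http://www.example.org">www.example.org</a> <a href="https://example.com">x</a>
```

Note that this pattern is case-sensitive as written, so an href starting with "Http://" would also get a prefix unless the "i" modifier is added after the closing "~".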
jeddiki
290 100+
Are you sure I should add it ?

I mean really, really sure ?


;-)


( I thought you might say YES )

So I did , and it works !!

Thanks.

Just one thing, I want to make this link open in a new window,
so I need to add a target= "_blank".

Do I need to do another preg_replace or can I squeeze it into one of the two that I now have ?
Dec 22 '09 #27
kovik
1,044 Expert 1GB
Add it to the second parameter of the first preg_replace() when you make the initial element.
Dec 22 '09 #28
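Putting the pieces of the thread together, the two-step approach might look like the sketch below. This is not the poster's final code, which is not shown in the thread: the function name linkify() is hypothetical, the pattern is based on the one in post #21 with the outer parentheses dropped, and the "i" modifier is an added assumption so that "Http://" is recognised.

```php
<?php
// Hypothetical helper combining the two preg_replace() calls discussed above.
function linkify(string $text): string
{
    // Step 1: wrap each URL-like match in an anchor that opens in a new window.
    $pattern = '@(https?://)?([-\w]+\.)+(\w{2,})(/([-\w/_.]*(\?\S+)?)?)?@i';
    $text = preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $text);

    // Step 2: add "http://" to any href that does not already start with a protocol.
    return preg_replace('~<a href="(?!https?://)~i', '\\0http://', $text);
}

echo linkify('found here www.example.org/james.htm '), "\n";
// found here <a href="http://www.example.org/james.htm" target="_blank">www.example.org/james.htm</a> 
echo linkify('Located At: Http://pggg-online.org/blog.htm '), "\n";
// Located At: <a href="Http://pggg-online.org/blog.htm" target="_blank">Http://pggg-online.org/blog.htm</a> 
```

The pattern is still loose: any dotted word (a file name, for instance) will be linked, and a sentence-ending period directly after a path is swallowed into the link because "." is allowed in the path class.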
jeddiki
290 100+
Done that.

Thanks again for your help. :)
Dec 22 '09 #29
