What regex do I need to use to make this into a hyperlink ?

290 100+
Hi

In my text output file that I will display on my webpage
I have this included:

"Located At: Http://pggg-online.org/blog.htm "

and sometimes :

"found here www.example.org/james.htm "

What is the best way to change these into real
working links eg:

<a href='http://www.example.org/james.htm' target='_blank'>www.example.org/james.htm</a>
I guess that the best way is with a regular expression ?

Of course I have to make sure that I don't double up the
http:// if it is already given

Thanks for any advice
Dec 14 '09 #1
dlite922
1,584 Expert 1GB
Here's what I have so far:

(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)
but if you check Google, there are plenty of URL regexps you can modify for your own use.


Dan
Dec 14 '09 #2
kovik
1,044 Expert 1GB
Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O
Dec 14 '09 #3
dlite922
1,584 Expert 1GB
@kovik
not sure what you mean, that regex will match example.com

and these too:

something.example.com
http://what.how.com/wow.html?#$%^&*something%20%30else

Not tested and it does need some work. It was given to you as a start, not as a final solution.




Dan
Dec 15 '09 #4
jeddiki
290 100+
Hi
Thanks for suggesting that I Google it!

(Doh! That's to myself. Really, thanks, it helped.)

I found this, which I think is what I need:

//PHP Example: Automatically link URLs inside text.

$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
Can anyone spot any problems with using this ?


PS - I saw this comment:
Great regex, thanks ! Small thing : the '-' (dash) is missing - URL like this fails (http://web5.uottawa.ca/admingov/regl...-methodes.html)
Where should I add the "-" ?

Thanks
Dec 15 '09 #5
Atli
5,058 Expert 4TB
I would guess the comment means it should be:
$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
This fixes the problem with the URL posted in the comment, at least.
Dec 15 '09 #6
jeddiki
290 100+
Hi again,

I tried that regex in my script but it is not doing anything.

This is my script:

It is taking the product details out of a database and displaying them.
The description often contains a url

while($row = mysql_fetch_assoc($result)){
  extract($row);
  $descript = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
  $extr = $totearn-$earn;
  $disp1 = "<b><span style = \"color:maroon;\">$Rctr) $cat</span><br><span style = \"color:darkblue;\">$title</span> ID: $id</b><br>";
  $disp2 = "$descript<br>";
  $disp3 = "<b>$/sale: $earn, +/mth: $rebill, Pop: $pop, Gravity: $grav, Comm%: $comm, Refers: $refer, Total Earn: $totearn</b>";
  echo "<div style=\"width: 800px; text-align: left;\">
  <span>$disp1</span>
  <span>$disp2</span>
  <span>$disp3</span>
  <br><br>
  </div>";
}
The result of this can be seen live here:
script-test

When you see the form just click on the "Analyze Clickbank Now" button
and you will see a list of results.

The results have urls - but none of them are converted :(

Have I done something wrong ?

PS

Just a thought - does that regex only work on https urls?
If so, how do I make it work on all ?
Dec 15 '09 #7
kovik
1,044 Expert 1GB
@dlite922:

What I was saying is that your regex also matches this:
http://what.how.comwow.html?#$%^&*something%20%30else

You made the slash after the domain optional.
Dec 15 '09 #8
kovik
1,044 Expert 1GB
@jeddiki:

Firstly, there's no need to surround the entire regex in parentheses. The full regex match exists in $0.

Secondly, your current regex requires a http/https protocol. To make it optional, surround the protocol in parentheses and add a question mark after it.

Thirdly, the domain portion of your regex allows for 1 character TLDs. It also allows for TLDs with dashes. These do not exist. Change "([-\w\.]+)+" to "([-\w]\.)+(\w{2,})". This gives you multiple strings followed by periods, and then a 2 or more character string (without dashes).

The rest looks fine from here.
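
A minimal sketch of the first two points, using a deliberately tiny stand-in pattern rather than the full expression from this thread (the sample text and the example.org host are assumptions for illustration). It shows that `$0` already holds the whole match without an outer pair of parentheses, and that `(https?://)?` makes the protocol optional:

```php
<?php
// Hypothetical sample text, not taken from the database in this thread.
$text = 'Try www.example.org or http://www.example.org';

// (https?://)? is an optional group: the match succeeds with or without it.
// $0 in the replacement is the entire match, so no wrapping group is needed.
$pattern = '@(https?://)?www\.example\.org@';
$linked  = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $linked, "\n";
// Try <a href="www.example.org">www.example.org</a> or
// <a href="http://www.example.org">http://www.example.org</a>   (printed on one line)
```

Note that the first link still has no protocol in its href, so a browser would treat it as a relative path.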
Dec 15 '09 #9
jeddiki
290 100+
Hi,

Thanks for your help,

I tried to follow what you said but I get this error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing terminating ] for character class at offset 68 in /home/guru54gt5/public_html/sys/cb_search.php on line 208
this is the regex:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.\]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Can you see where I have gone wrong ?
Dec 15 '09 #10
Dormilich
8,658 Expert Mod 8TB
The character class opened at offset 44 should be closed at offset 52, but its closing bracket is escaped, holding the character class open.
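
As a small illustration of that point (a sketch only, using just the character-class fragment of the pattern and a made-up subject string), an escaped `\]` is a literal bracket, so the class never ends and the pattern fails to compile:

```php
<?php
// "\]" is an escaped literal "]", so the class that starts at "[" never closes.
// The @ operator only silences the compile warning for this demonstration.
$broken = @preg_match('@[\w/_\.\]*@', 'a/b.c');

// Dropping the stray backslash closes the class normally.
$fixed = preg_match('@[\w/_\.]*@', 'a/b.c');

var_dump($broken); // bool(false): the pattern did not compile
var_dump($fixed);  // int(1)
```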
Dec 15 '09 #11
jeddiki
290 100+
Again, thanks for the input,

I have changed the expression to:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Now it doesn't error - but it does not convert the urls to
hyperlinks either :(

The result can be seen here: clickbank tool

When you click on the big button, you get a list

You see on the list in position 2 there is a url:
Http://www.conversational-hypnosis.com/affiliate.php.
and then in position 4 another one:
At MaverickMoneyMakers.com/Bonus.
and position 6:
www.affilorama.com/affiliates
None of them got converted.
Dec 15 '09 #12
kovik
1,044 Expert 1GB
Firstly, get rid of the whitespace in the regex. Whitespace inside a pattern is matched literally unless you explicitly tell the engine to ignore it (the "x" modifier in PCRE).

Secondly, I made a mistake in my regex correction. Change "([-\w]\.)" to "([-\w]+\.)". Also, I just noticed that your regex doesn't allow dashes anywhere but in the domain. Why is that?
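
A quick sketch of the whitespace point, assuming a shortened version of the pattern and a made-up subject string. Without the `x` modifier the spaces in the pattern must appear in the subject; with it they are ignored:

```php
<?php
// Shortened, spaced-out pattern (an illustration, not the full thread regex).
$spaced  = '@ (https?://)? ([-\w]+\.)+ \w{2,} @';
$subject = 'http://example.org';

// Spaces are literal here, and the subject contains none, so nothing matches.
$withoutX = preg_match($spaced, $subject);

// The x modifier (PCRE extended mode) ignores unescaped whitespace in the pattern.
$withX = preg_match($spaced . 'x', $subject);

var_dump($withoutX); // int(0)
var_dump($withX);    // int(1)
```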
Dec 15 '09 #13
jeddiki
290 100+
OK,

I got rid of the spaces.

and added that "+"

I also added a couple of "-" s

so now I have :
$descript = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
But alas, no improvement
Dec 15 '09 #14
kovik
1,044 Expert 1GB
... Take the dash out of the TLD. The way you wrote it requires that the first character is a dash, and TLDs don't even have dashes at all.

Also, are you aware that you use $descript and $descrip?
Dec 15 '09 #15
jeddiki
290 100+
Is the TLD this bit ?

(/([-\w/_\.]*

Sorry but I find it difficult to read this stuff !

BTW does this regex cope with .co.uk type of urls?

Yes, I did know about the $descript / $descrip difference, thanks
Dec 15 '09 #16
kovik
1,044 Expert 1GB
It should handle URLs with multiple TLDs.

And it'd be a good idea to take a regex crash course online. And download The Regex Coach. It makes regex easier to understand, as you get to see it applied, live.
Dec 15 '09 #17
jeddiki
290 100+
Thanks for your help.

I have gone through a couple of regex crash courses, but the
examples were all pretty basic compared to this.

I have loaded The Regex Coach

and have copied in the expression I am working on, but
it does not pick up any matches.



I tried stepping through but got nothing.


Maybe if I break it down, you can help me where I do not
have the right understanding?

Target: http://www.fred.blogspot.co.uk/confirm.php?a=56
@((https?://)?
The @ is the opener

(https?://)?
The first ? means that the s is optional,
The last ? means that the whole thing is optional
The parentheses mean the element is collected up as $1

Question - if it is optional and doesn't exist is $1 = null ?

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56
([-\w]+\.)+
the [] are grouping the - and \w (any alphanumeric)
the + sign means one or more of the grouped items
then a period, and there must be one or more of these

The parentheses mean the element is collected up as $2

Question - if there is more than one, will it automatically
know to use $3 and $4 etc. ?
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $2
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $3
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $4
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $5
(-\w{2,})+
This looks wrong - surely there should be a grouping here ?
like this:
([-\w]{2,})+
Now it will be saying that there should be at least 2 characters
And there must be one or more of these

The parentheses mean the element is collected up as $6

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
(:\d+)?
So this is the port number - optional , if present ------ $7

(/([-\w/_\.]*(\?\S+)?)?)?)
This is for directories and passed variables.
The widest parentheses and last ? show the whole lot is optional
The / is optional
Then zero or more of any alphanumeric or -/_\. - for a sub-directory, optional
(second to last ? )

I do not see why the period is escaped; as it is inside the [] group there is no need.

The escaped ? must be for passed variables which are optional
(third to last ? )

Question: what is the capital S for ? small s is for any whitespace

Question: I can not tell if these are being collected in a variable
it looks like they might not be as all the parentheses seem to be grouping.

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
So do I need to do this ?
$descrip = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1$2$3$4$5$6$7">$1$2$3$4$5$6$7</a>', $descrip);
Would really appreciate some more help

Thanks
Dec 18 '09 #18
kovik
1,044 Expert 1GB
@jeddiki
The "@" is an arbitrary delimiting character. Delimiters are only necessary in languages that use regex (e.g. PHP), not in The Regex Coach. They are needed so that you can put modifiers after the pattern (e.g. "i" makes your pattern case-insensitive). In The Regex Coach, the modifiers are checkboxes on the right.

@jeddiki
It will be an empty string.

@jeddiki
No. They will overwrite each other as $2, so only the last one will be in $2. However, the entire thing is still included in $0.

@jeddiki
This is the TLD. I told you to get rid of the dash.

@jeddiki
You are correct. Take that up with the place you got it from. ;)

@jeddiki
Uppercase "\S" is equivalent to "[^\s]": any character that is not whitespace.

@jeddiki
The parentheses indicate the variable that they will exist under, not the number of matches. So it will always be static.



@jeddiki
Only use "$0".
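
A short sketch that makes these answers visible, using a cut-down version of the pattern (no port or path part) and the host from the target URL above. The group numbers are the ones PHP actually assigns, counting opening parentheses from the left:

```php
<?php
$pattern = '@((https?://)?([-\w]+\.)+(\w{2,}))@';

preg_match($pattern, 'www.fred.blogspot.co.uk', $m);
// $m[0] and $m[1]: the whole match (the outer parentheses duplicate $0)
// $m[2]: the protocol group did not take part, so it is an empty string
// $m[3]: a repeated group keeps only its LAST repetition
// $m[4]: the TLD
print_r($m);

// \S+ consumes everything up to the next whitespace character.
preg_match('@\?\S+@', 'confirm.php?a=56 next', $q);
echo $q[0], "\n"; // ?a=56
```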
Dec 18 '09 #19
jeddiki
290 100+
Thanks for your reply.

I will go through this and hopefully get it working :)
Dec 19 '09 #20
jeddiki
290 100+
OK I have my regex now
( and I understand it :) )

but I need to insert a conditional statement and I am not sure of the best way to do this.

I am using preg_replace on the text

so I have:

while($row = mysql_fetch_assoc($result)){
  extract($row);
  $descrip = preg_replace('@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
  echo "<span>$descrip</span>";
}
Now the problem is, I want to check to see if the http has been included and if not, then I have to add it.

(actually I need to check for https:// or http://
or none of them. )

In my testing page I used this:

$subject = $regex;
$pattern = "@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@";
$result = preg_match($pattern, $subject, $matches);

if ($result === 1) {
  $new_url = $matches[0];
  $tinyurl = substr($new_url, 0, 4);
  $shorturl = substr($new_url, 0, 7);
  $shorturl_s = substr($new_url, 0, 8);

  if ($tinyurl == "http") {
    if ($shorturl == "http://") {
      $longurl = substr($new_url, 7);
      echo "<br><br>URL:  <a href=\"http://$longurl\">$longurl</a>";
    }
    if ($shorturl_s == "https://") {
      $longurl = substr($new_url, 8);
      echo "<br><br>URL:  <a href=\"https://$longurl\">$longurl</a>";
    }
  }
  else {
    // No protocol in the match, so add http:// to the href.
    echo "<br><br>URL:  <a href=\"http://$new_url\">$new_url</a>";
  }
}
else {
  echo "<br><br>No Matches";
}
But how can I put this condition into my preg_replace ?

If I can not do it - what do you think is the best way to
accomplish this ?

Any suggestions ?
Dec 21 '09 #21
kovik
1,044 Expert 1GB
Do a preg_replace on the final data, finding all links without a protocol and add it.

$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
The "(?!" construct is a negative lookahead: it means "not followed by", and it applies to everything in its parentheses.
Dec 21 '09 #22
jeddiki
290 100+
Thanks for that idea.

What does the \\0 in '\\0http://' mean ?
Dec 21 '09 #23
kovik
1,044 Expert 1GB
"\\0" in a PHP string is "\0" to the regex engine. In the replacement string, \0 is the same as $0: the whole match.
Dec 21 '09 #24
jeddiki
290 100+
Thanks,

I get an error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 21 in /home/guru54gt5/public_html/sys/cb_search.php on line 209

So I guess this needs a closing parenthesis added:
$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
Dec 22 '09 #25
kovik
1,044 Expert 1GB
Then add it? lol.
Right before the end delimiter.
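
With the parenthesis in place, the call can be sketched like this (the sample `$data` string is made up for illustration):

```php
<?php
// Hypothetical input: one href has no protocol, the other already has one.
$data = '<a href="www.example.org/james.htm">x</a> <a href="https://example.org">y</a>';

// (?!https?://) is a negative lookahead: the match succeeds only when the text
// right after href=" does NOT start with http:// or https://.
// \0 in the replacement re-inserts the matched '<a href="' before the protocol.
$data = preg_replace('~<a href="(?!https?://)~', '\\0http://', $data);

echo $data, "\n";
// <a href="http://www.example.org/james.htm">x</a> <a href="https://example.org">y</a>
```

The lookahead is case-sensitive as written, so an href starting with "Http://" would also get a prefix; adding the "i" modifier after the closing delimiter avoids that.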
Dec 22 '09 #26
jeddiki
290 100+
Are you sure I should add it ?

I mean really, really sure ?


;-)


( I thought you might say YES )

So I did , and it works !!

Thanks.

Just one thing, I want to make this link open in a new window,
so I need to add a target= "_blank".

Do I need to do another preg_replace or can I squeeze it into one of the two that I now have ?
Dec 22 '09 #27
kovik
1,044 Expert 1GB
Add it to the second parameter of the first preg_replace() when you make the initial element.
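
Put together, the two passes might look roughly like this. It is a sketch under assumptions: the sample `$descrip` text is made up, and the first pattern is a trimmed variant of the one in this thread that uses `$0` instead of an outer group:

```php
<?php
// Hypothetical description text standing in for a database row.
$descrip = 'At MaverickMoneyMakers.com/Bonus or https://example.org/a';

// Pass 1: wrap every URL-like string in a link that opens in a new window.
$pattern = '@(https?://)?([-\w]+\.)+\w{2,}(/[-\w/.]*(\?\S+)?)?@';
$descrip = preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $descrip);

// Pass 2: add http:// to any href that does not already start with a protocol.
$descrip = preg_replace('~<a href="(?!https?://)~', '\\0http://', $descrip);

echo $descrip, "\n";
```

Only the href gains the protocol; the visible link text stays as it was written in the description.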
Dec 22 '09 #28
jeddiki
290 100+
Done that.

Thanks again for your help. :)
Dec 22 '09 #29
