What regex do I need to use to make this into a hyperlink ?

290 100+

In my text output file that I will display on my webpage
I have this included:

"Located At: Http://pggg-online.org/blog.htm "

and sometimes :

"found here www.example.org/james.htm "

What is the best way to change these into real
working links eg:

<a href='http://www.example.org/james.htm' target='_blank'>www.example.org/james.htm</a>
I guess that the best way is with a regular expression ?

Of course I have to make sure that I don't double up the
http:// if it is already given

Thanks for any advice
Dec 14 '09 #1
1,584 Expert 1GB
Here's what I have so far:

    (http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)

but if you check Google, there are plenty of URL regexes you can modify for your own use.

Dec 14 '09 #2
1,044 Expert 1GB
Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O
Dec 14 '09 #3
1,584 Expert 1GB
not sure what you mean, that regex will match example.com

and these too:


Not tested and it does need some work. It was given to you as a start, not as a final solution.

Dec 15 '09 #4
290 100+
Thanks for suggesting that I Google it !!

( doh ! - (that's to myself !!) really, thanks .... it helped )

I found this, which I think is what I need:

    // PHP example: automatically link URLs inside text.
    $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);

Can anyone spot any problems with using this ?

PS - I saw this comment:
Great regex, thanks ! Small thing : the '-' (dash) is missing - URL like this fails (http://web5.uottawa.ca/admingov/regl...-methodes.html)
Where should I add the "-" ?

Dec 15 '09 #5
5,058 Expert 4TB
I would guess the comment means it should be:
    $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);

This fixes the problem with the URL posted in the comment, at least.
Dec 15 '09 #6
290 100+
Hi again,

I tried that regex in my script but it is not doing anything.

This is my script:

It is taking the product details out of a database and displaying them.
The description often contains a url

    while ($row = mysql_fetch_assoc($result)) {
        extract($row);
        // Try to turn URLs in the description into links
        $descript = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
        $extr = $totearn - $earn;
        $disp1 = "<b><span style = \"color:maroon;\">$Rctr) $cat</span><br><span style = \"color:darkblue;\">$title</span> ID: $id</b><br>";
        $disp2 = "$descript<br>";
        $disp3 = "<b>$/sale: $earn, +/mth: $rebill, Pop: $pop, Gravity: $grav, Comm%: $comm, Refers: $refer, Total Earn: $totearn</b>";
        echo "<div style=\"width: 800px; text-align: left;\">
        <span>$disp1</span>
        <span>$disp2</span>
        <span>$disp3</span>
        <br><br>
        </div>";
    }

The result of this can be seen live here:

When you see the form just click on the "Analyze Clickbank Now" button
and you will see a list of results.

The results have urls - but none of them are converted :(

Have I done something wrong ?


Just a thought - does that regex only work on https urls?
If so, how do I make it work on all ?
Dec 15 '09 #7
1,044 Expert 1GB

What I was saying is that your regex also matches this:

You made the slash after the domain optional.
Dec 15 '09 #8
1,044 Expert 1GB

Firstly, there's no need to surround the entire regex in parentheses. The full regex match exists in $0.

Secondly, your current regex requires a http/https protocol. To make it optional, surround the protocol in parentheses and add a question mark after it.

Thirdly, the domain portion of your regex allows for 1 character TLDs. It also allows for TLDs with dashes. These do not exist. Change "([-\w\.]+)+" to "([-\w]\.)+(\w{2,})". This gives you multiple strings followed by periods, and then a 2 or more character string (without dashes).

The rest looks fine from here.
Dec 15 '09 #9
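
For anyone following along, here is a minimal sketch of the first two points: the replacement can use $0 for the whole match, and the protocol group is made optional with a trailing "?". The pattern is an assumption pieced together from the advice above plus the "+" correction given a few posts later, and the sample strings are made up for illustration.

```php
<?php
// Sketch only: pattern assembled from the advice in this thread, not a vetted URL matcher.
$pattern = '@(https?://)?([-\w]+\.)+\w{2,}(:\d+)?(/[-\w/_.]*(\?\S+)?)?@';

// Made-up sample text, one URL without a protocol and one with.
$noProtocol   = 'found here www.example.org/james.htm today';
$withProtocol = 'see http://pggg-online.org/blog.htm today';

// $0 in the replacement is the full match, so no outer parentheses are needed.
$linkedA = preg_replace($pattern, '<a href="$0">$0</a>', $noProtocol);
$linkedB = preg_replace($pattern, '<a href="$0">$0</a>', $withProtocol);

echo $linkedA, "\n";
echo $linkedB, "\n";
```

Note that the first link still has no http:// in its href at this point; that is dealt with later in the thread.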
290 100+

Thanks for your help,

I tried to follow what you said but I get this error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing terminating ] for character class at offset 68 in /home/guru54gt5/public_html/sys/cb_search.php on line 208
this is the regex:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.\]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Can you see where I have gone wrong ?
Dec 15 '09 #10
8,658 Expert Mod 8TB
The character class opened at offset 44 should be closed at offset 52, but its closing bracket is escaped, holding the character class open.
Dec 15 '09 #11
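
The escaped-bracket problem can be reproduced in isolation. This is a small sketch with two cut-down patterns (not the full regex from the post) that differ only in the backslash before "]"; the test string is made up.

```php
<?php
// Sketch: "\]" inside a character class is a literal "]", so the class never closes.
$broken = '@[\w/_\.\]*@';
// With the bracket unescaped, the class closes where intended.
$fixed  = '@[\w/_\.]*@';

// The @ operator only hides the expected "Compilation failed" warning here.
$brokenResult = @preg_match($broken, 'dir/file.htm');
$fixedResult  = preg_match($fixed, 'dir/file.htm', $m);

var_dump($brokenResult); // false, because the pattern could not be compiled
var_dump($fixedResult);  // 1, one match found
var_dump($m[0]);         // the matched text, "dir/file.htm"
```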
290 100+
Again, thanks for the input,

I have changed the expression to:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Now it doesn't error - but it does not convert the urls to
hyperlinks either :(

The result can be seen here: clickbank tool

When you click on the big button, you get a list

You see on the list in position 2 there is a url:
    Http://www.conversational-hypnosis.com/affiliate.php.
and then in position 4 another one:
    At MaverickMoneyMakers.com/Bonus.
and position 6:
    www.affilorama.com/affiliates
None of them got converted.
Dec 15 '09 #12
1,044 Expert 1GB
Firstly, get rid of the whitespace in the regex. A space in the pattern is matched as a literal space; if you want the engine to ignore whitespace in the pattern, you need to tell it so explicitly (the "x" modifier).

Secondly, I made a mistake in my regex correction. Change "([-\w]\.)" to "([-\w]+\.)". Also, I just noticed that your regex doesn't allow dashes anywhere but in the domain. Why is that?
Dec 15 '09 #13
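
To see why the spaces matter, here is a short sketch comparing a spaced-out pattern with and without the "x" (extended) modifier. The patterns are shortened versions of the one under discussion and the sample text is taken from the list above; both are only for illustration.

```php
<?php
$text = 'see www.affilorama.com/affiliates for details';

// Without "x", every space in the pattern must match a real space in the text.
$spaced = '@ (https?://)? ([-\w]+\.)+ \w{2,}@';
// With "x", unescaped whitespace in the pattern is ignored.
$extended = '@ (https?://)? ([-\w]+\.)+ \w{2,} @x';

$a = preg_match($spaced, $text);        // 0: the literal spaces never line up
$b = preg_match($extended, $text, $m);  // 1: same pattern, whitespace ignored

var_dump($a, $b, $m[0]);
```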
290 100+

I got rid of the spaces.

and added that "+"

I also added a couple of "-" s

so now I have :
$descript = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
But alas, no improvement
Dec 15 '09 #14
1,044 Expert 1GB
... Take the dash out of the TLD. The way you wrote it requires that the first character is a dash, and TLDs don't even have dashes at all.

Also, are you aware that you use $descript and $descrip?
Dec 15 '09 #15
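
A quick way to check the point about the dash, as a sketch with shortened patterns: "(-\w{2,})" demands a literal dash right after the last dot, so ordinary domains stop matching. The second test string is deliberately artificial.

```php
<?php
$url = 'www.affilorama.com/affiliates';

// "(-\w{2,})+" means: a literal dash, then 2 or more word characters, repeated.
$withDash    = '@([-\w]+\.)+(-\w{2,})+@';
// Without the dash, the last label only needs 2 or more word characters.
$withoutDash = '@([-\w]+\.)+(\w{2,})@';

$r1 = preg_match($withDash, $url);                        // 0: no dash follows any dot
$r2 = preg_match($withoutDash, $url, $m);                 // 1: group 2 holds "com"
$r3 = preg_match($withDash, 'www.affilorama.-com', $m2);  // 1: only this odd form matches

var_dump($r1, $r2, $m[2], $r3);
```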
290 100+
Is the TLD this bit ?


Sorry but I find it difficult to read this stuff !

BTW does this regex cope with .co.uk type of urls?

Yes, I did know about the $descript / $descrip difference, thanks
Dec 15 '09 #16
1,044 Expert 1GB
It should handle URLs with multiple TLDs.

And it'd be a good idea to take a regex crash course online. And download The Regex Coach. It makes regex easier to understand, as you get to see it applied, live.
Dec 15 '09 #17
290 100+
Thanks for your help.

I have gone through a couple of crash course regex, but the
examples were all pretty basic compared to this.

I have loaded The Regex Coach

and have copied in the expression I am working on, but
it does not pick up any matches.

I tried stepping through but got nothing.

Maybe if I break it down, you can help me where I do not
have the right understanding?

Target: http://www.fred.blogspot.co.uk/confirm.php?a=56
The @ is the opener

The first ? means that the s is optional,
The last ? means that the whole thing is optional
The parentheses mean the element is collected up as $1

Question - if it is optional and doesn't exist is $1 = null ?

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56
the [] are grouping the - and \w (any alphanumeric)
the + sign means one or more of the grouped items
then a period and must be one or more of these

The parentheses mean the element is collected up as $2

Question - if there is more than one, will it automatically
know to use $3 and $4 etc. ?
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $2
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $3
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $4
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $5
This looks wrong - surely there should be a grouping here ?
like this:
Now it will be saying that there should be at least 2 characters
And there must be one or more of these

The parentheses mean the element is collected up as $6

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
So this is the port number - optional , if present ------ $7

This is for directories and passed variables.
The widest parentheses and last ? show the whole lot is optional
The / is optional
Then zero or more of any alphanumeric or -/_\. - for a sub-directory optional
(second to last ? )

I do not see why the period is escaped as it is in the group there is no need.

The escaped ? must be for passed variables which are optional
(third to last ? )

Question: what is the capital S for ? small s is for any whitespace

Question: I can not tell if these are being collected in a variable
it looks like they might not be as all the parentheses seem to be grouping.

In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6
So do I need to do this ?
    $descrip = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1$2$3$4$5$6$7">$1$2$3$4$5$6$7</a>', $descrip);

Would really appreciate some more help

Dec 18 '09 #18
1,044 Expert 1GB
The "@" is an arbitrary delimiting character. This is only necessary in languages that are using regex (e.g. PHP), not in The Regex Coach. Delimiters are needed so that you can have modifiers after the pattern (e.g. "i" makes your pattern case-insensitive). The modifiers are in The Regex Coach on the right as checkboxes.

It will be an empty string.

No. They will overwrite each other as $2, so only the last one will be in $2. However, the entire thing is still included in $0.

This is the TLD. I told you to get rid of the dash.

You are correct. Take that up with the place you got it from. ;)

Uppercase "\S" is equivalent to "[^\s]", i.e. any character that is not whitespace.

The parentheses indicate the variable that they will exist under, not the number of matches. So it will always be static.

Only use "$0".
Dec 18 '09 #19
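
Several of these answers can be checked with preg_match() and a $matches array. The sketch below uses a shortened version of the pattern (no outer parentheses, so the group numbers differ from the ones in the question) and the target URL from the post above.

```php
<?php
// Group 1 = protocol, group 2 = repeated "label." group, group 3 = last label.
$pattern = '@(https?://)?([-\w]+\.)+(\w{2,})@';

preg_match($pattern, 'www.fred.blogspot.co.uk/confirm.php?a=56', $m);

var_dump($m[0]); // the full match, "www.fred.blogspot.co.uk"
var_dump($m[1]); // "" because the optional protocol group did not take part
var_dump($m[2]); // "co." because a repeated group keeps only its last iteration
var_dump($m[3]); // "uk"

// \S and [^\s] match the same characters: anything that is not whitespace.
preg_match('@\?\S+@', 'confirm.php?a=56 more text', $s1);
preg_match('@\?[^\s]+@', 'confirm.php?a=56 more text', $s2);
var_dump($s1[0] === $s2[0]); // true, both are "?a=56"
```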
290 100+
Thanks for your reply.

I will go through this and hopefully get it working :)
Dec 19 '09 #20
290 100+
OK I have my regex now
( and I understand it :) )

but I need to insert a conditional statement and I am not sure of the best way to do this.

I am using preg_replace on the text

so I have:

    while ($row = mysql_fetch_assoc($result)) {
        extract($row);
        $descrip = preg_replace('@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
        echo "<span>$descrip</span>";
    }

Now the problem is, I want to check to see if the http has been included and if not, then I have to add it.

(actually I need to check for https:// or http://
or none of them. )

In my testing page I used this:

    $subject = $regex;
    $pattern = "@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@";
    $result = preg_match($pattern, $subject, $matches);

    if ($result === 1) {
        $new_url    = $matches[0];
        // Leading slices used to detect which protocol, if any, is present
        $tinyurl    = substr($new_url, 0, 4);
        $shorturl   = substr($new_url, 0, 7);
        $shorturl_s = substr($new_url, 0, 8);

        if ($tinyurl == "http") {
            if ($shorturl == "http://") {
                $longurl = substr($new_url, 7);
                echo "<br><br>URL:  <a href=\"http://$longurl\">$longurl</a>";
            }
            if ($shorturl_s == "https://") {
                $longurl = substr($new_url, 8);
                echo "<br><br>URL:  <a href=\"https://$longurl\">$longurl</a>";
            }
        } else {
            // No protocol given, so add http://
            echo "<br><br>URL:  <a href=\"http://$new_url\">$new_url</a>";
        }
    } else {
        echo "<br><br>No Matches";
    }

But how can I put this condition into my preg_replace ?

If I can not do it - what do you think is the best way to
accomplish this ?

Any suggestions ?
Dec 21 '09 #21
1,044 Expert 1GB
Do a preg_replace on the final data, finding all links without a protocol and add it.

    $data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);

The "(?!" starts a negative lookahead, which means "not followed by." It applies to everything in its parentheses.
Dec 21 '09 #22
290 100+
Thanks for that idea.

What does the \\0 in '\\0http://' mean ?
Dec 21 '09 #23
1,044 Expert 1GB
"\\0" in PHP is equal to "\0" to the regex engine. In regex, \0 is the same as $0.
Dec 21 '09 #24
290 100+

I get an error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 21 in /home/guru54gt5/public_html/sys/cb_search.php on line 209

So I guess this needs a closing parenthesis added:
$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
Dec 22 '09 #25
1,044 Expert 1GB
Then add it? lol.
Right before the end delimiter.
Dec 22 '09 #26
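
For reference, a sketch of the corrected call with the closing parenthesis placed right before the end delimiter, run on made-up sample HTML. Only the link without a protocol is changed; in a single-quoted PHP string '\\0' reaches the regex engine as \0, the whole match.

```php
<?php
// Made-up input: one href without a protocol, one that already has it.
$data = '<a href="www.example.org/james.htm">x</a> '
      . '<a href="http://pggg-online.org/blog.htm">y</a>';

// (?!https?://) is a negative lookahead: match only if the href does NOT start with a protocol.
$fixed = preg_replace('~<a href="(?!https?://)~', '\\0http://', $data);

echo $fixed, "\n";
```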
290 100+
Are you sure I should add it ?

I mean really, really sure ?


( I thought you might say YES )

So I did , and it works !!


Just one thing, I want to make this link open in a new window,
so I need to add a target= "_blank".

Do I need to do another preg_replace or can I squeeze it into one of the two that I now have ?
Dec 22 '09 #27
1,044 Expert 1GB
Add it to the second parameter of the first preg_replace() when you make the initial element.
Dec 22 '09 #28
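
Putting the pieces of this thread together, the two passes might look roughly like the sketch below. The function name linkify() is hypothetical, the pattern is a lightly simplified form of the one worked out above with the "i" modifier added so that "Http://" is recognised, and the sample strings come from the first post.

```php
<?php
// Hypothetical helper; a sketch of the two-pass approach, not a complete URL matcher.
function linkify(string $text): string
{
    $pattern = '@(https?://)?([-\w]+\.)+\w{2,}(:\d+)?(/[-\w/_.]*(\?\S+)?)?@i';

    // Pass 1: wrap each URL-like match in a link; target="_blank" is added here.
    $html = preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $text);

    // Pass 2: prefix http:// only where the href has no protocol yet.
    return preg_replace('~<a href="(?!https?://)~i', '\\0http://', $html);
}

$first  = linkify('Located At: Http://pggg-online.org/blog.htm');
$second = linkify('found here www.example.org/james.htm');

echo $first, "\n";
echo $second, "\n";
```

One caveat with this pattern: the path part allows dots, so a trailing full stop (as in the affiliate.php. sample earlier in the thread) ends up inside the link.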
290 100+
Done that.

Thanks again for your help. :)
Dec 22 '09 #29
