Hi
In my text output file that I will display on my webpage
I have this included:
"Located At: Http://pggg-online.org/blog.htm "
and sometimes :
"found here www.example.org/james.htm "
What is the best way to change these into real
working links eg:
<a href='http://www.example.org/james.htm' target='_blank'>www.example.org/james.htm</a>
I guess that the best way is with a regular expression ?
Of course I have to make sure that I don't double up the http:// if it is already given
Thanks for any advice
Here's what I have so far:
(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)
but if you check Google, there are plenty of URL regexps you can modify for your own use.
Dan
Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O
@kovik
Not sure what you mean. That regex will match example.com
and these too:
something.example.com
http://what.how.com/wow.html?#$%^&*something%20%30else
Not tested and it does need some work. It was given to you as a start, not as a final solution.
Dan
Hi
Thanks for suggesting that I Google it !!
( doh ! - (that's to myself !!) really, thanks .... it helped )
I found this, which I think is what I need:
// PHP example: automatically link URLs inside text.
$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
Can anyone spot any problems with using this ?
PS - I saw this comment:
Great regex, thanks ! Small thing : the '-' (dash) is missing - URL like this fails (http://web5.uottawa.ca/admingov/regl...-methodes.html)
Where should I add the "-" ?
Thanks
Atli
I would guess the comment means it should be:
$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
This fixes the problem with the URL posted in the comment, at least.
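A quick way to check the effect of that extra dash is to run both patterns against the same input. The sketch below is only an illustration: the URL in the comment is truncated, so `http://example.org/some-page.html` is a made-up stand-in, and the variable names are assumptions.

```php
<?php
// Hypothetical test URL; the one quoted in the comment is truncated.
$url = 'http://example.org/some-page.html';

// Pattern as originally found, and the version with "\-" added to the path class.
$original = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@';
$withDash = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@';

preg_match($original, $url, $before);
preg_match($withDash, $url, $after);

echo $before[0], "\n"; // the match stops at the first dash in the path
echo $after[0], "\n";  // the match covers the whole URL
```

With the original pattern the link would end in the middle of the path, which is presumably the failure the commenter saw.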
Hi again,
I tried that regex in my script but it is not doing anything.
This is my script. It takes the product details out of a database and displays them.
The description often contains a URL:
while ($row = mysql_fetch_assoc($result)) {
    extract($row);
    $descript = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
    $extr = $totearn-$earn;
    $disp1 = "<b><span style = \"color:maroon;\">$Rctr) $cat</span><br><span style = \"color:darkblue;\">$title</span> ID: $id</b><br>";
    $disp2 = "$descript<br>";
    $disp3 = "<b>$/sale: $earn, +/mth: $rebill, Pop: $pop, Gravity: $grav, Comm%: $comm, Refers: $refer, Total Earn: $totearn</b>";
    echo "<div style=\"width: 800px; text-align: left;\">
    <span>$disp1</span>
    <span>$disp2</span>
    <span>$disp3</span>
    <br><br>
    </div>";
}
The result of this can be seen live here: script-test
When you see the form just click on the "Analyze Clickbank Now" button
and you will see a list of results.
The results have urls - but none of them are converted :(
Have I done something wrong ?
PS
Just a thought - does that regex only work on https URLs?
If so, how do I make it work on all ?
@dlite922:
What I was saying is that your regex also matches this:
http://what.how.comwow.html?#$%^&*something%20%30else
You made the slash after the domain optional.
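To see that match happen, the pattern can be run against the string above. This is only a sketch: the post gives the bare pattern, so the `~` delimiters, the `i` modifier and the surrounding sentence are assumptions added to make it runnable.

```php
<?php
// Dan's pattern, wrapped in "~" delimiters with the "i" modifier
// (both are assumptions; the post shows the bare pattern only).
$pattern = '~(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)~i';

// The lookahead at the end of the pattern needs whitespace after the URL.
$text = 'see http://what.how.comwow.html?#$%^&*something%20%30else now';

preg_match($pattern, $text, $m);
echo $m[0], "\n"; // the whole match
echo $m[2], "\n"; // the part the pattern treated as the domain
```

Group 2 comes back as `what.how.comwow`: because nothing forces a slash after the TLD, `comwow` is accepted as a TLD and the rest is swallowed by `([^\s])+`.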
@jeddiki:
Firstly, there's no need to surround the entire regex in parentheses. The full regex match exists in $0.
Secondly, your current regex requires a http/https protocol. To make it optional, surround the protocol in parentheses and add a question mark after it.
Thirdly, the domain portion of your regex allows for 1 character TLDs. It also allows for TLDs with dashes. These do not exist. Change "([-\w\.]+)+" to "([-\w]\.)+(\w{2,})". This gives you multiple strings followed by periods, and then a 2 or more character string (without dashes).
The rest looks fine from here.
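As a small illustration of the first point, here is the same replacement written without the outer parentheses and with `$0` in the replacement string. The sample text is made up for the demo, and the protocol is still required in this version.

```php
<?php
// Sample input, assumed for illustration.
$text = 'Located At: http://pggg-online.org/blog.htm today';

// No outer parentheses: "$0" already refers to the full match.
$linked = preg_replace(
    '@https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?@',
    '<a href="$0">$0</a>',
    $text
);

echo $linked, "\n";
```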
Hi,
Thanks for your help,
I tried to follow what you said but I get this error:
Warning: preg_replace() [function.preg-replace]: Compilation failed: missing terminating ] for character class at offset 68 in /home/guru54gt5/public_html/sys/cb_search.php on line 208
this is the regex:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.\]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Can you see where I have gone wrong ?
The character class opened at offset 44 should be closed at offset 52, but its closing bracket is escaped (\]), which holds the character class open.
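The effect can be reproduced with a much shorter pattern. This is a minimal sketch with made-up input; the `@` in front of the first call only silences the warning so the return value is easier to see.

```php
<?php
// "\]" is a literal bracket, so this class never closes and the
// pattern does not compile. preg_match() returns false in that case.
$broken = @preg_match('@/[\w/_\.\]*@', '/a.b');

// Without the stray backslash the class closes where intended.
$fixed = preg_match('@/[\w/_\.]*@', '/a.b', $m);

var_dump($broken); // false signals the compile failure
var_dump($fixed);  // 1 means the pattern compiled and matched
```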
Again, thanks for the input,
I have changed the expression to:
$descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
Now it doesn't error - but it does not convert the urls to
hyperlinks either :(
The result can be seen here: clickbank tool
When you click on the big button, you get a list
You see on the list in position 2 there is a URL: Http://www.conversational-hypnosis.com/affiliate.php.
and then in position 4 another one: At MaverickMoneyMakers.com/Bonus.
and position 6: www.affilorama.com/affiliates
None of them got converted.
Firstly, get rid of the whitespace in the regex. Unless you explicitly tell the engine to ignore whitespace, it treats every space in the pattern as a literal character that must appear in the text.
Secondly, I made a mistake in my regex correction. Change "([-\w]\.)" to "([-\w]+\.)". Also, I just noticed that your regex doesn't allow dashes anywhere but in the domain. Why is that?
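Both points can be checked in isolation. The sketch below uses shortened patterns and a made-up subject string; the `x` modifier is the usual PCRE way to ignore pattern whitespace, though the post does not name it, so treat that part as an assumption about what was meant.

```php
<?php
$subject = 'www.example.org';

// Spaces in the pattern are literal unless the "x" modifier is set.
$spaced = '@ (https?://)? ([-\w]+\.)+ (\w{2,}) @';
$plain    = preg_match($spaced, $subject);       // the subject has no spaces to match
$extended = preg_match($spaced . 'x', $subject); // "x" makes the engine skip them

// Without the "+", each label before a dot may be only one character long.
preg_match('@([-\w]\.)+(\w{2,})@', $subject, $short);
preg_match('@([-\w]+\.)+(\w{2,})@', $subject, $full);

var_dump($plain, $extended);
echo $short[0], "\n"; // only a fragment of the host name
echo $full[0], "\n";  // the whole host name
```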
OK,
I got rid of the spaces.
and added that "+"
I also added a couple of "-" s
so now I have :
$descript = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
But alas, no improvement
... Take the dash out of the TLD. The way you wrote it requires that the first character is a dash, and TLDs don't even have dashes at all.
Also, are you aware that you use $descript and $descrip?
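The dash problem is easy to confirm with a cut-down version of the pattern. This is just a sketch: only the domain part is kept, and the test string is one of the URLs from the results list.

```php
<?php
$url = 'www.affilorama.com/affiliates';

// "(-\w{2,})+" demands a literal dash right after the last dot.
$withDash    = '@([-\w]+\.)+(-\w{2,})+@';
// "(\w{2,})" accepts a plain TLD such as "com".
$withoutDash = '@([-\w]+\.)+(\w{2,})@';

$dashResult  = preg_match($withDash, $url);
$plainResult = preg_match($withoutDash, $url, $m);

var_dump($dashResult);  // no match: nothing in the URL looks like ".-xx"
var_dump($plainResult);
echo $m[0], "\n";       // the host name
```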
Is the TLD this bit ? (/([-\w/_\.]*
Sorry but I find it difficult to read this stuff !
BTW does this regex cope with .co.uk type of urls?
Yes, I did know about the $descript / $descrip difference, thanks
It should handle URLs with multiple TLDs.
And it'd be a good idea to take a regex crash course online. And download The Regex Coach. It makes regex easier to understand, as you get to see it applied, live.
Thanks for your help.
I have gone through a couple of crash course regex, but the
examples were all pretty basic compared to this.
I have loaded The Regex Coach
and have copied in the expression I am working on, but
it does not pick up any matches.
I tried stepping through but got nothing.
Maybe if I break it down, you can help me where I do not
have the right understanding?

Target: http://www.fred.blogspot.co.uk/confirm.php?a=56

@((https?://)?
The @ is the opener.

(https?://)?
The first ? means that the s is optional.
The last ? means that the whole thing is optional.
The parentheses mean the element is collected up as $1.
Question - if it is optional and doesn't exist, is $1 = null ?
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56

([-\w]+\.)+
The [] are grouping the - and \w (any alphanumeric).
The + sign means one or more of the grouped items,
then a period, and there must be one or more of these.
The parentheses mean the element is collected up as $2.
Question - if there is more than one, will it automatically
know to use $3 and $4 etc. ?
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $2
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $3
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $4
and: : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $5

(-\w{2,})+
This looks wrong - surely there should be a grouping here ?
like this: ([-\w]{2,})+
Now it will be saying that there should be at least 2 characters
and there must be one or more of these.
The parentheses mean the element is collected up as $6.
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6

(:\d+)?
So this is the port number - optional, if present ------ $7

(/([-\w/_\.]*(\?\S+)?)?)?)
This is for directories and passed variables.
The widest parentheses and last ? show the whole lot is optional.
The / is optional.
Then zero or more of any alphanumeric or -/_\. - for a sub-directory, optional
(second to last ? )
I do not see why the period is escaped; as it is in the group there is no need.
The escaped ? must be for passed variables, which are optional
(third to last ? )
Question: what is the capital S for ? small s is for any whitespace
Question: I can not tell if these are being collected in a variable;
it looks like they might not be, as all the parentheses seem to be grouping.
In target : http://www.fred.blogspot.co.uk/confirm.php?a=56 ------ $6

So do I need to do this ?
$descrip = preg_replace('@((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1$2$3$4$5$6$7">$1$2$3$4$5$6$7</a>', $descrip);
Would really appreciate some more help
Thanks
@jeddiki
On the "@" opener: the "@" is an arbitrary delimiting character. Delimiters are only necessary when the regex is written inside a programming language such as PHP, not in The Regex Coach. They are needed so that you can put modifiers after the pattern (e.g. "i" makes your pattern case-insensitive). In The Regex Coach the modifiers are checkboxes on the right.
On whether $1 is null when the optional group is absent: it will be an empty string.
On whether repeated groups become $3, $4, etc.: no. They will overwrite each other as $2, so only the last one will be in $2. However, the entire thing is still included in $0.
On "(-\w{2,})+": this is the TLD. I told you to get rid of the dash.
On the escaped period inside the character class: you are correct. Take that up with the place you got it from. ;)
On the capital S: uppercase "\S" is equivalent to "[^\s]", i.e. any non-whitespace character.
On whether these are collected in variables: the parentheses indicate the variable that they will exist under, not the number of matches. So it will always be static.
On the "$1$2$3$4$5$6$7" replacement: only use "$0".
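These answers can be checked by dumping the match array for the target URL. The pattern below is a simplified variant of the one under discussion (no outer parentheses, no dash in the TLD), so the group numbers are those of this sketch, not of the original.

```php
<?php
$target  = 'http://www.fred.blogspot.co.uk/confirm.php?a=56';
$pattern = '@(https?://)?([-\w]+\.)+(\w{2,})(:\d+)?(/[-\w/.]*(\?\S+)?)?@';

preg_match($pattern, $target, $m);

echo $m[0], "\n";   // the full match, what "$0" refers to
echo $m[1], "\n";   // the protocol
echo $m[2], "\n";   // repeated group: only its last repetition is kept
echo $m[3], "\n";   // the final label
var_dump($m[4]);    // optional port that did not take part in the match
echo $m[5], "\n";   // path plus query string
```

Group 2 ends up holding `co.` only, which is why stitching `$1$2$3...` back together loses `www.fred.blogspot.` and why `$0` is the safer choice.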
Thanks for your reply.
I will go through this and hopefully get it working :)
OK I have my regex now
( and I understand it :) )
but I need to insert a conditional statement and I am not sure of the best way to do this.
I am using preg_replace on the text,
so I have:
while ($row = mysql_fetch_assoc($result)) {
    extract($row);
    $descrip = preg_replace('@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
    echo "<span>$descrip</span>";
}
Now the problem is, I want to check to see if the http has been included and if not, then I have to add it.
(actually I need to check for https:// or http://
or none of them. )
In my testing page I used this:
$subject = $regex;
$pattern = "@((https?://)?([-\w]+\.)+(\w{2,})+?(/([-\w/_.]*(\?\S+)?)?)?)@";
$result = preg_match($pattern, $subject, $matches);

if ($result === 1) {
    $new_url = $matches[0];
    // Leading 4, 7 and 8 characters, to compare against "http", "http://" and "https://".
    $tinyurl = substr($new_url, 0, 4);
    $shorturl = substr($new_url, 0, 7);
    $shorturl_s = substr($new_url, 0, 8);

    if ($tinyurl == "http") {
        if ($shorturl == "http://") {
            $longurl = substr($new_url, 7);
            echo "<br><br>URL: <a href=\"http://$longurl\">$longurl</a>";
        }
        if ($shorturl_s == "https://") {
            $longurl = substr($new_url, 8);
            echo "<br><br>URL: <a href=\"https://$longurl\">$longurl</a>";
        }
    }
    else {
        // No protocol in the text, so add one.
        echo "<br><br>URL: <a href=\"http://$new_url\">$new_url</a>";
    }
}
else {
    echo "<br><br>No Matches";
}
But how can I put this condition into my preg_replace ?
If I can not do it - what do you think is the best way to
accomplish this ?
Any suggestions ?
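One possible way to keep the condition inside a single pass is PHP's preg_replace_callback(), which hands each match to a function that builds the replacement. This is a sketch of that alternative, not something proposed in the thread: the function name `linkify` is hypothetical, the pattern is a simplified form of the one above, and the `i` modifier is added so that "Http://" is recognised.

```php
<?php
// Hypothetical helper: turns URLs in plain text into links,
// adding "http://" only when the text has no protocol.
function linkify(string $text): string
{
    // Group 1 is the optional protocol; it is an empty string when absent.
    $pattern = '@(https?://)?([-\w]+\.)+\w{2,}(/[-\w/.]*(\?\S+)?)?@i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $url  = $m[0];
        $href = ($m[1] !== '') ? $url : 'http://' . $url;
        // Escape so a quote in the query string cannot break out of the attribute.
        return '<a href="' . htmlspecialchars($href) . '" target="_blank">'
             . htmlspecialchars($url) . '</a>';
    }, $text);

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkify('found here www.example.org/james.htm '), "\n";
echo linkify('Located At: Http://pggg-online.org/blog.htm '), "\n";
```

One caveat of this pattern: the path class contains a period, so a full stop typed directly after a URL at the end of a sentence would become part of the link.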
Do a preg_replace on the final data, finding all links without a protocol and adding it:
$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
The "(?!...)" group is a negative lookahead: it means "not followed by" and applies to everything inside its parentheses.
Thanks for that idea.
What does the \\0 in '\\0http://' mean ?
"\\0" in a PHP string literal reaches the regex engine as "\0". In the replacement string, \0 is the same as $0: the whole match.
Thanks,
I get an error:
Warning: preg_replace() [function.preg-replace]: Compilation failed: missing ) at offset 21 in /home/guru54gt5/public_html/sys/cb_search.php on line 209
So I guess this needs a closing parenthesis added:
$data = preg_replace('~<a href="(?!https?://~', '\\0http://', $data);
Then add it? lol.
Right before the end delimiter.
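With the parenthesis in place, the call looks like the sketch below. The sample markup in `$data` is made up for the demo; only the pattern and replacement come from the post.

```php
<?php
// Sample markup, assumed for illustration: one link without a protocol, one with.
$data = '<a href="www.example.org/james.htm">a</a> <a href="https://example.org/">b</a>';

// The lookahead is now closed right before the end delimiter.
// "\\0" re-inserts the matched '<a href="' and "http://" is appended to it.
$data = preg_replace('~<a href="(?!https?://)~', '\\0http://', $data);

echo $data, "\n";
```

Only the first link is changed; the second already starts with a protocol, so the lookahead rejects it.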
Are you sure I should add it ?
I mean really, really sure ? ;-)
( I thought you might say YES )
So I did , and it works !!
Thanks.
Just one thing, I want to make this link open in a new window,
so I need to add a target="_blank".
Do I need to do another preg_replace or can I squeeze it into one of the two that I now have ?
Add it to the second parameter of the first preg_replace() when you make the initial element.
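In other words, the attribute goes into the replacement string that creates the anchor. A minimal sketch, using a simplified form of the pattern and a made-up description string:

```php
<?php
// Sample description text, assumed for illustration.
$descrip = 'found here http://www.example.org/james.htm today';

$descrip = preg_replace(
    '@(https?://)?([-\w]+\.)+\w{2,}(/[-\w/.]*(\?\S+)?)?@',
    '<a href="$0" target="_blank">$0</a>', // target added where the element is built
    $descrip
);

echo $descrip, "\n";
```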
Done that.
Thanks again for your help. :)