
need help with a regs expression

Yannick Benoit
#1: Sep 14 '05
Hi!
I use this to find all the links in a page:

[:space:]*(href)[:space:]*=[:space:]*([^ >]+)

but I recently found out that it only works if links are set up this way:
<a href = "...

Can someone help me find the proper expression, one that would work with
<a href = "..., <a href= "..., <a href="... and <a href ="...?
It's only a matter of spaces, but I can't figure out how to make it work.

Thank you



romain.jouin@gmail.com
#2: Sep 14 '05

re: need help with a regs expression


EXPRESSION : $exp_reg ="[ ]*href[ ]*=[ ]*([^>]+)";


------------------------------------------------------
EXAMPLES:
------------------------------------------------------
CODE :
$exp_reg ="[ ]*href[ ]*=[ ]*([^>]+)";
unset ($reg);
ereg($exp_reg, '<a href = "./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href= "./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href ="./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href="./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
------------------------------------------------------
RESULT :
Array
(
[0] => href = "./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href= "./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href ="./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href="./emp/coucou.html"
[1] => "./emp/coucou.html"
)




------------------------------------------------------
CODE :

$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of /~romain/compack</TITLE>
</HEAD>
<BODY>
<H1>Index of /~romain/compack</H1>
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF = "EMP2/">EMP2/</A> 12-Sep-2005 11:41 -

<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF ="EMP2/cargar_datos.php">cargar_datos.php</A> 13-Sep-2005 11:15 3k
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF= "EMP1/concepto.php">concepto.php</A> 13-Sep-2005 10:46 12k
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF = "conceptos.php">conceptos.php</A> 08-Sep-2005 10:44 3k';



preg_match_all("([ ]*href[ ]*=[ ]*([^>]+))", strtolower($html), $regs,
PREG_SET_ORDER);




print_r($regs);

------------------------------------------------------
RESULT :

Array
(
[0] => Array
(
[0] => href = "emp2/"
[1] => "emp2/"
)

[1] => Array
(
[0] => href ="emp2/cargar_datos.php"
[1] => "emp2/cargar_datos.php"
)

[2] => Array
(
[0] => href= "emp1/concepto.php"
[1] => "emp1/concepto.php"
)

[3] => Array
(
[0] => href = "conceptos.php"
[1] => "conceptos.php"
)

)
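
A side note on the strtolower() call above: it lowercases the whole page, so the captured URLs lose their original case (EMP2/ comes back as emp2/). Here is a minimal sketch of one alternative, assuming the PCRE i (case-insensitive) modifier is acceptable here. The two-link $html string is a shortened, made-up sample, not the full listing, and the / delimiters are my choice.

```php
<?php
// Shortened, hypothetical sample (not the full directory listing above).
$html = '<A HREF = "EMP2/">EMP2/</A> <A HREF="EMP1/concepto.php">concepto.php</A>';

// Same pattern as above, but with the "i" modifier instead of strtolower(),
// so HREF matches while the captured value keeps its original case.
preg_match_all('/[ ]*href[ ]*=[ ]*([^>]+)/i', $html, $regs, PREG_SET_ORDER);

// Collect capture group 1 (the attribute value, quotes included).
$links = array();
foreach ($regs as $match) {
    $links[] = $match[1];
}
print_r($links);
```

With this sample, $links should hold the two quoted values in their original upper/lower case.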

*******************************************************
Hope it can help.
JR.
*******************************************************








Yannick Benoit wrote:

> Hi!
> I use this to find all the links in a page:
>
> [:space:]*(href)[:space:]*=[:space:]*([^ >]+)
>
> but I recently found out that it only works if links are set up this way:
> <a href = "...
>
> Can someone help me find the proper expression, one that would work with
> <a href = "..., <a href= "..., <a href="... and <a href ="...?
> It's only a matter of spaces, but I can't figure out how to make it work.
>
> Thank you

John Dunlop
#3: Sep 18 '05

re: need help with a regs expression


romain.jouin@gmail.com wrote:
> Yannick Benoit wrote:
>
>> [:space:]*(href)[:space:]*=[:space:]*([^ >]+)

A named character class such as [:space:] is only recognised inside a
character class, so it needs a second pair of square brackets around
it. Here you would have /[[:space:]]/. The Manual does not document
this for PCREs.

What you have at the moment, in a POSIX regular expression, are
ordinary character classes: [:space:] matches any one of the
characters ':', 's', 'p', 'a', 'c' or 'e'. (You'd get an error if you
tried that in a PCRE.) Since they don't match US-ASCII spaces, the
only way the pattern could match is if the subject string doesn't
have spaces at those points.
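
As a rough sketch of that fix, here is the original pattern with each [:space:] given its enclosing brackets, run through preg_match(). The page.html sample tags, the variable names and the / delimiters are assumptions for illustration only.

```php
<?php
// Original pattern, with every [:space:] wrapped in an outer character class.
$pattern = '/[[:space:]]*(href)[[:space:]]*=[[:space:]]*([^ >]+)/';

// The four spacing variants from the question (hypothetical URLs).
$samples = array(
    '<a href = "page.html">',
    '<a href= "page.html">',
    '<a href="page.html">',
    '<a href ="page.html">',
);

$values = array();
foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        // Group 2 is the attribute value, quotes included.
        $values[] = $m[2];
    }
}
print_r($values);
```

All four variants should match and yield the same quoted value.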

> [ ]*href[ ]*=[ ]*([^>]+)

A character class consisting of a single character means the same if
you remove its square brackets. The classes are superfluous.

/ */ and /[[:space:]]*/ don't necessarily have the same meaning.
The latter likely matches the usual whitespace characters, the former
only US-ASCII spaces.
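
To illustrate the difference, here is a small sketch using a made-up tag that has a tab before the "=" and a newline after it. The tag and the variable names are hypothetical.

```php
<?php
// Hypothetical tag: tab between "href" and "=", newline between "=" and the value.
$tag = "<a href\t=\n\"page.html\">";

// Literal spaces only: the tab is not a space, so this does not match.
$spaces_only = preg_match('/href *= *([^>]+)/', $tag, $a);

// [[:space:]] also covers tab and newline, so this one matches.
$any_space = preg_match('/href[[:space:]]*=[[:space:]]*([^>]+)/', $tag, $b);

var_dump($spaces_only); // int(0)
var_dump($any_space);   // int(1)
```

If the pages being scanned only ever use plain spaces, both forms behave the same; they diverge as soon as an attribute is broken across lines or indented with tabs.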

--
Jock