Hello,
I am trying to craft a regular expression to extract a URL from an <a
href=""></a> tag, and the one I have doesn't seem right.
I use the regular expression from this snippet of code:
foreach my $message (@messages)
{
    my @match = ($message->decoded =~ /\bhref="(http.*)">.*/gi);
    foreach my $match (@match)
    {
        print $match, "\n";
    }
}
but it doesn't give exactly the results I need. An excerpt of the output
looks like:
http://2%30%33.197.%3204.1%355/mout/
http://www.superrxsalesman.info/aff1/?mulish
http://www.superrxsalesman.info/aff1/?acme
http://www.superrxsalesman.info/aff1/?blister
http://www.superrxsalesman.info/aff1/?samba
http://www.superrxsalesman.info/aff1/?depot"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?procter"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?use"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?butane"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?fiche"><font color="#0033CC
The first 5 lines are exactly what I want, but I don't understand why the
following lines include the closing " and the characters after it. Basically,
I want to keep only what is between the quotes of the href="" attribute.
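In case it helps, here is a minimal sketch that seems to reproduce the behaviour outside of my mail-processing loop. The two input lines are hypothetical examples that I made up (they are not copied from a real message, and example.com stands in for the real host); the pattern is the same one as in my loop, applied to plain strings instead of $message->decoded.

```perl
use strict;
use warnings;

# Hypothetical sample input: two anchors, the second one followed by a
# <font> tag on the same line, as in the messages I am processing.
my @lines = (
    '<a href="http://www.example.com/aff1/?mulish">link</a>',
    '<a href="http://www.example.com/aff1/?depot"><font color="#0033CC">link</font></a>',
);

# Same pattern as in my loop; collect every captured group.
my @found;
foreach my $line (@lines) {
    push @found, ($line =~ /\bhref="(http.*)">.*/gi);
}

# Print one captured string per line.
print "$_\n" foreach @found;
```

With this input the first printed line is the bare URL, while the second one runs on past the closing quote (it ends with ?depot"><font color="#0033CC), just like the last lines of my real output.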
Could anybody tell me what is wrong with my regular expression?
Thanks!
Charles
--
Charles-E. Nadeau Ph.D
http://radio.weblogs.com/0111823/