Regexp to match a URL in an HTML <a href=""></a> tag

Hello,

I am trying to craft a regular expression to extract a URL from an <a
href=""></a> tag, and the one I have doesn't seem right.
I use the regular expression from this snippet of code:

foreach my $message (@messages)
{
    my @match = ($message->decoded =~ /\bhref="(http.*)">.*/gi);

    foreach my $match (@match)
    {
        print $match, "\n";
    }
}

but it doesn't give exactly the results I need. An excerpt of the output
looks like:

http://2%30%33.197.%3204.1%355/mout/
http://www.superrxsalesman.info/aff1/?mulish
http://www.superrxsalesman.info/aff1/?acme
http://www.superrxsalesman.info/aff1/?blister
http://www.superrxsalesman.info/aff1/?samba
http://www.superrxsalesman.info/aff1/?depot"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?procter"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?use"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?butane"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?fiche"><font color="#0033CC

The first 5 lines are exactly what I want, but I don't understand why the
following lines include the closing " and the characters after it. Basically,
I want to keep only what is between the quotes of the href="" attribute.
Could anybody tell me what is wrong with my regular expression?
Thanks!

Charles

--
Charles-E. Nadeau Ph.D
http://radio.weblogs.com/0111823/
Jul 19 '05 #1
Charles Nadeau wrote:
I am trying to craft a regular expression to extract a URL from an
<a href=""></a> tag, and the one I have doesn't seem right. I use
the regular expression from this snippet of code:

foreach my $message (@messages)
{
    my @match = ($message->decoded =~ /\bhref="(http.*)">.*/gi);

    foreach my $match (@match)
    {
        print $match, "\n";
    }
}

but it doesn't lead to results that are exactly what I need.


http://theoryx5.uwinnipeg.ca/CPAN/pe...ract_URLs.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #2

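By default .* is greedy: it grabs as much of the line as it can and only
backs off as far as the last "> it finds. Use a ? to make the match
non-greedy, i.e.:

my @match=($message->decoded=~/\bhref="(http.*?)">.*/gi);

That should work, though I've not tested it.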
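
For anyone who wants to see the difference, here is a minimal sketch that runs the original pattern and the non-greedy one against a single line. The sample line and the example.com URL are made up for illustration, modelled on the output excerpt in the question. The third pattern, a negated character class, is an alternative not mentioned in the thread: [^"]* cannot run past the closing quote, so it does not depend on what follows the attribute.

```perl
use strict;
use warnings;

# Hypothetical sample line, modelled on the output excerpt in the question.
my $html = '<a href="http://www.example.com/aff1/?depot"><font color="#0033CC">buy</font></a>';

# Greedy (original pattern): .* runs to the end of the line,
# then backs up only as far as the LAST "> on that line.
my @greedy = $html =~ /\bhref="(http.*)">.*/gi;

# Non-greedy (suggested fix): .*? stops at the FIRST "> after the URL.
my @lazy = $html =~ /\bhref="(http.*?)">.*/gi;

# Alternative (an assumption, not from the thread): a negated character
# class can never match a double quote, so it ends at the closing quote.
my @class = $html =~ /\bhref="(http[^"]*)"/gi;

print "greedy: $_\n" for @greedy;
print "lazy:   $_\n" for @lazy;
print "class:  $_\n" for @class;
```

On this sample the greedy pattern captures everything up to color="#0033CC, which is the symptom described in the question, while the other two capture only the URL. Note that the trailing .* in the first two patterns swallows the rest of the line, so with /g a second link on the same line would be skipped; the third pattern does not have that limitation.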

Andy R
Jul 19 '05 #3
