Bytes | Software Development & Data Engineering Community

Regexp to match a URL in an HTML <a href=""></a> tag

Hello,

I am trying to craft a regular expression to extract a URL from an <a
href=""></a> tag, and the one I have doesn't seem right.
I use the regular expression in this snippet of code:

foreach my $message (@messages)
{
    my @match = ($message->decoded =~ /\bhref="(http.*)">.*/gi);

    foreach my $match (@match)
    {
        print $match, "\n";
    }
}

but it doesn't give exactly the results I need. Here is an excerpt of
the output:

http://2%30%33.197.%3204.1%355/mout/
http://www.superrxsalesman.info/aff1/?mulish
http://www.superrxsalesman.info/aff1/?acme
http://www.superrxsalesman.info/aff1/?blister
http://www.superrxsalesman.info/aff1/?samba
http://www.superrxsalesman.info/aff1/?depot"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?procter"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?use"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?butane"><font color="#0033CC
http://www.superrxsalesman.info/aff1/?fiche"><font color="#0033CC

The first five lines are exactly what I want, but I don't understand why the
following lines include the closing " and everything after it. Basically, I
want to keep only what is between the quotes of the href="" attribute.
Could anybody tell me what is wrong with my regular expression?
Thanks!

Charles

--
Charles-E. Nadeau Ph.D
http://radio.weblogs.com/0111823/
Jul 19 '05 #1
Charles Nadeau wrote:
> but it doesn't lead to results that are exactly what I need.


http://theoryx5.uwinnipeg.ca/CPAN/pe...ract_URLs.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #2
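For input like the one in the question, a regex-only sketch that sidesteps the greedy/non-greedy issue is to capture a negated character class, [^"]*, which cannot run past the closing quote of the attribute. Dropping the trailing .* from the original pattern also lets the /g modifier return every href on a line instead of only the first. This assumes every href value is enclosed in double quotes: single-quoted or unquoted attributes will not match, and for arbitrary HTML a real parser (for example the HTML::Parser distribution on CPAN) is more robust. The sample input below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: two links on one line, the second followed by a <font> tag.
my $html = '<a href="http://www.example.com/a">A</a> '
         . '<a href="http://www.example.com/b"><font color="#0033CC">B</font></a>';

# [^"]* stops at the first double quote, so the capture ends where the
# attribute value ends. With no trailing .*, /g keeps scanning the same
# line and returns one capture per href found.
my @urls = $html =~ /\bhref="(http[^"]*)"/gi;

print "$_\n" for @urls;
```

The capture never contains a double quote, so the "><font color=... tail from the question cannot leak into the result.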

"Charles Nadeau" <ch***********@hotmail.com> wrote in message
news:bp***********@nwall1.odn.ne.jp...
> Could anybody tell me what is wrong with my regular expression?


Add a ? after the * to make the match non-greedy, i.e.:

my @match = ($message->decoded =~ /\bhref="(http.*?)">/gi);

Should work, though I've not tested it.

Andy R
Jul 19 '05 #3
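To see the difference between the two quantifiers, here is a small self-contained Perl sketch. The input line is made up (example.com stands in for the real domain) but has the same shape as the problem lines in the output: the "> that closes the <a> tag is followed on the same line by a <font color="..."> tag, so the line contains two "> sequences. The greedy (http.*) first runs to the end of the line and backtracks only as far as the last "> it finds; the non-greedy (http.*?) stops at the first one. This presumably also explains why the first five URLs came out clean: their lines had no second "> after the URL.

```perl
use strict;
use warnings;

# Made-up line shaped like the problem lines in the question's output.
my $line = '<a href="http://www.example.com/aff1/?depot"><font color="#0033CC">depot</font></a>';

# Greedy: .* consumes the rest of the line, then gives characters back
# until '">' can match, which happens at the LAST '">' on the line.
my ($greedy) = $line =~ /\bhref="(http.*)">/i;

# Non-greedy: .*? takes as few characters as possible, so the match
# ends at the FIRST '">' after the URL.
my ($lazy) = $line =~ /\bhref="(http.*?)">/i;

print "greedy: $greedy\n";  # greedy: http://www.example.com/aff1/?depot"><font color="#0033CC
print "lazy:   $lazy\n";    # lazy:   http://www.example.com/aff1/?depot
```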
