preg_match_all

I am trying to take a web page and get all of the links. It almost
works, but I am missing a few links.
Here is what I am using.
preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
$s,$matches,PREG_SET_ORDER);
It will not pick up links like this:

<a class="highlight" href="browse.php?region=West
+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
<b>Next &gt;</b>
</a>
How do I get it to pick up hrefs like the one above?
Jun 2 '08 #1
On Sat, 31 May 2008 16:49:30 +0200, Anthony Smith <mr******@hotmail.com>
wrote:
> I am trying to take a web page and get all of the links. It almost
> works, but I am missing a few links.
> Here is what I am using.
> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
> $s,$matches,PREG_SET_ORDER);
> It will not pick up links like this:
>
> <a class="highlight" href="browse.php?region=West
> +Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
> <b>Next &gt;</b>
> </a>
> How do I get it to pick up hrefs like the one above?
Add the /s modifier, so that . also matches the newlines between the opening tag and </a>.
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #2
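
A minimal sketch of why the /s modifier matters here, assuming the link text really does sit on separate lines as in the question. The sample string `$s` is a shortened, hypothetical version of the markup above; the two patterns are the one from the question, without and with `s`.

```php
<?php
// Sample markup modelled on the question: the link text sits on its own
// lines, so there are newlines between the tag's ">" and "</a>".
// (The URL is shortened for illustration.)
$s = "<a class=\"highlight\" href=\"browse.php?p=2\">\n<b>Next &gt;</b>\n</a>";

// The pattern from the question, without and with the /s modifier.
$without = '/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i';
$with    = '/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/is';

$countWithout = preg_match_all($without, $s, $m1, PREG_SET_ORDER);
$countWith    = preg_match_all($with, $s, $m2, PREG_SET_ORDER);

echo "without /s: $countWithout match(es)\n"; // 0, "." stops at the newline
echo "with /s: $countWith match(es)\n";       // 1
echo "url: " . $m2[0][1] . "\n";              // browse.php?p=2
```

Without `s`, `(.*?)` cannot cross the newline after `>`, so the whole match fails for that link even though the href part itself would have matched.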
Greetings, Rik Wasmus.
In reply to Your message dated Saturday, May 31, 2008, 19:08:16,
>> I am trying to take a web page and get all of the links. It almost
>> works, but I am missing a few links.
>> Here is what I am using.
>> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
>> $s,$matches,PREG_SET_ORDER);
>> It will not pick up links like this:
>>
>> <a class="highlight" href="browse.php?region=West
>> +Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
>> <b>Next &gt;</b>
>> </a>
>> How do I get it to pick up hrefs like the one above?
> Add the /s modifier
On reflection, that would work.
But I would like to offer a slightly different approach:

preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);

It has one downside: the URL ends up in group 2 or group 3, depending on
whether it is quoted.
So you have to pull the result out with a construction like

$url_link = empty($matches[N][3]) ? $matches[N][2] : $matches[N][3];
$url_text = $matches[N][4];
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 27 '08 #3
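
The `N` in the snippet above stands for the index of each match set. A small sketch of how that could look in a loop, using the pattern exactly as posted in #3; the helper name `extract_links()` and the sample `$html` are hypothetical, and the check uses `!== ''` rather than `empty()` so that a URL of `0` is not treated as missing.

```php
<?php
// Hypothetical helper: returns url/text pairs for every link the
// pattern from post #3 finds in $html.
function extract_links(string $html): array
{
    $pattern = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // Group 3 is filled for unquoted URLs, group 2 for quoted ones.
        $links[] = [
            'url'  => $m[3] !== '' ? $m[3] : $m[2],
            'text' => $m[4],
        ];
    }
    return $links;
}

// Sample input: one quoted and one unquoted href.
$html = '<a href="a.php?x=1" class="k">First</a> <a href=b.html>Second</a>';
foreach (extract_links($html) as $link) {
    echo $link['url'] . ' => ' . $link['text'] . "\n";
}
```

For this input the loop prints `a.php?x=1 => First` and `b.html => Second`.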
Greetings, AnrDaemon.
In reply to Your message dated Wednesday, June 4, 2008, 23:00:34,
> preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);
The regexp should be written as
'#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is'
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 27 '08 #4
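
One case where the two spellings seem to differ is an unquoted href followed by another attribute. The sketch below compares them on a hypothetical sample string; the patterns are copied from posts #3 and #4, everything else is made up for illustration.

```php
<?php
// Hypothetical sample: an unquoted href followed by another attribute.
$s = '<a href=page.html class=nav>Home</a>';

// Pattern from post #3 and the corrected pattern from post #4.
$first     = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';
$corrected = '#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is';

preg_match_all($first, $s, $a, PREG_SET_ORDER);
preg_match_all($corrected, $s, $b, PREG_SET_ORDER);

// Group 3 holds the unquoted URL in both patterns.
echo "first:     " . $a[0][3] . "\n"; // page.html class=nav
echo "corrected: " . $b[0][3] . "\n"; // page.html
```

In the first version the unquoted branch runs all the way to `>` and swallows the trailing attribute; the corrected version stops the URL at the first whitespace and lets `[^>]*` consume the rest of the tag. Note that both versions use `\S` inside quotes, so a quoted URL containing whitespace would not match the quoted branch.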
