Greetings, Rik Wasmus.
In reply to Your message dated Saturday, May 31, 2008, 19:08:16,
>> I am trying to take a web page and get all of the links. It almost
>> works, but I am missing a few links.
>> Here is what I am using.
>> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
>> $s,$matches,PREG_SET_ORDER);
>> It will not pick up links like this:
>> <a class="highlight" href="browse.php?region=West+Tennessee&zips=38115&mgrp=13&p=2">
>> <b>Next ></b>
>> </a>
>> How do I get it to pick up hrefs like the one above?
> Add the /s modifier
On closer thought, that would work.
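Here is a quick way to see why /s is needed. Without it, "." does not match a newline, so (.*?) cannot reach across the line breaks between <a ...> and </a> in the sample link. Below is a minimal PHP check using the sample markup from the question. It assumes the break inside the href was only mail wrapping, so the URL is joined onto one line. The variable names $without, $with, $m1 and $m2 are just for illustration.

```php
<?php
// Sample markup from the question: the link text spans several lines.
// (Assumes the href itself was on one line and only wrapped in the post.)
$s = "<a class=\"highlight\" href=\"browse.php?region=West+Tennessee&zips=38115&mgrp=13&p=2\">\n<b>Next ></b>\n</a>";

// The original pattern, once without and once with the /s modifier.
$without = preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i', $s, $m1, PREG_SET_ORDER);
$with    = preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/is', $s, $m2, PREG_SET_ORDER);

echo "without /s: $without match(es)\n"; // 0: "." stops at the first newline
echo "with /s:    $with match(es)\n";    // 1: "." now also matches newlines
echo "URL: {$m2[0][1]}\n";
```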
But I'd like to offer a slightly different approach:
preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);
It has one downside: the URL ends up in group 2 or group 3, depending on
whether it is quoted.
So you have to pull it out with a construct like this, where $i is the
index of a match:
$url_link = empty($matches[$i][3]) ? $matches[$i][2] : $matches[$i][3];
$url_text = $matches[$i][4];
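For completeness, here is a minimal sketch that wraps the pattern above in a loop. extract_links() is a hypothetical helper name and is not part of PHP. It returns [url, text] pairs and tests group 3 with === '' rather than empty(), so that a URL of "0" is not misread. It assumes PHP 7.1+ for the [$url, $text] destructuring in foreach.

```php
<?php
// Hypothetical helper (not from the original post): collects [url, text]
// pairs using the pattern above and picks group 2 or group 3 for the URL.
function extract_links(string $s): array
{
    $pattern = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';
    $links = [];
    if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Group 2 holds a quoted URL, group 3 an unquoted one; with
            // PREG_SET_ORDER the group that did not take part is ''.
            $url  = ($m[3] === '') ? $m[2] : $m[3];
            $text = trim($m[4]); // link text, inner tags left as-is
            $links[] = [$url, $text];
        }
    }
    return $links;
}

// Single-quoted, unquoted and double-quoted (multi-line) links.
$html = "<a href='one.php'>One</a> <a href=two.php>Two</a>\n"
      . "<a class=\"highlight\" href=\"browse.php?region=West+Tennessee&zips=38115&mgrp=13&p=2\">\n<b>Next ></b>\n</a>";

foreach (extract_links($html) as [$url, $text]) {
    echo "$url => $text\n";
}
```

The link text is only trimmed. Inner tags such as <b> are kept, so pass it through strip_tags() if you need plain text.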
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>