Bytes | Software Development & Data Engineering Community

preg_match_all

I am trying to take a web page and get all of the links. It almost
works, but I am missing a few links.
Here is what I am using.
preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
$s,$matches,PREG_SET_ORDER);
It will not pick up links like this:

<a class="highlight" href="browse.php?region=West
+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
<b>Next &gt;</b>
</a>
How do I get it to pick up hrefs like the one above?
Jun 2 '08 #1
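Below is a minimal reproduction. It assumes the line breaks in the sample link really are in the fetched page and were not added by line wrapping when the message was posted. The href is probably not the culprit, because a negated class such as [^\"\'>] matches a newline. The "." in (.*?) does not match a newline unless the /s modifier is set, so the pattern cannot reach </a> across the line breaks in the link text. The variable names are only illustrative.

```php
<?php
// The sample link from the question, with its line breaks kept.
$s = "<a class=\"highlight\" href=\"browse.php?region=West\n"
   . "+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2\">\n"
   . "<b>Next &gt;</b>\n"
   . "</a>";

// A negated character class matches "\n"; "." does not without /s.
$classMatchesNewline = preg_match('/^[^"]$/', "\n"); // 1
$dotMatchesNewline   = preg_match('/^.$/', "\n");    // 0

// The pattern from the question, unchanged (no /s modifier).
$found = preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
    $s, $matches, PREG_SET_ORDER);

echo "links found: $found\n"; // links found: 0
```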
On Sat, 31 May 2008 16:49:30 +0200, Anthony Smith <mr******@hotmail.com>
wrote:
Add the /s modifier
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #2
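The following sketch checks Rik's fix against the sample link from the question, again assuming that sample's line breaks are genuine. The only change to the question's pattern is the added s modifier. The captured href is returned exactly as it appears in the HTML: the &amp; entities are still present, and so is the line break.

```php
<?php
// The sample link from the question, with its line breaks kept.
$s = "<a class=\"highlight\" href=\"browse.php?region=West\n"
   . "+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2\">\n"
   . "<b>Next &gt;</b>\n"
   . "</a>";

// Identical to the question's pattern except for the trailing "s":
// with /s, "." also matches "\n", so (.*?) can span the multi-line link text.
$found = preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/is',
    $s, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1]: the href value as written in the HTML (entities not decoded).
    // $m[2]: the raw link text, including the <b> markup and line breaks.
    echo "URL:  ", $m[1], "\n";
    echo "Text: ", trim(strip_tags($m[2])), "\n";
}
```

The strip_tags() and trim() calls only make the printed link text readable. They are not part of the fix.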
Greetings, Rik Wasmus.
In reply to your message dated Saturday, May 31, 2008, 19:08:16,
> Add the /s modifier
That would work, after some deeper thought about it...
But I would like to offer a slightly different approach:

preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);

It has one downside: the URL ends up in group 2 or group 3, depending on
whether it is quoted.
So you have to pull the result out with a construction like:

foreach ($matches as $m) {
    $url_link = empty($m[3]) ? $m[2] : $m[3]; // quoted URL in [2], unquoted in [3]
    $url_text = $m[4];
}
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 27 '08 #3
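This is a sketch of AnrDaemon's first pattern, with the group 2/3 extraction done in a loop. It runs on a hypothetical sample that contains a double-quoted href, a single-quoted href with a trailing attribute, and an unquoted href. The sketch tests $m[3] !== '' rather than calling empty(), because empty() would also treat a URL of "0" as missing. In PREG_SET_ORDER mode, a group that did not take part in a match comes back as an empty string when a later group did match.

```php
<?php
// Hypothetical sample: double-quoted, single-quoted (plus another
// attribute) and unquoted hrefs.
$s = '<a href="a.php">A</a> '
   . "<a href='b.php' class=\"x\">B</a> "
   . '<a href=c.php>C</a>';

// AnrDaemon's first pattern, unchanged.
preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is',
    $s, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    // Group 2 holds a quoted URL, group 3 an unquoted one.
    $url_link = $m[3] !== '' ? $m[3] : $m[2];
    $url_text = $m[4];
    $links[] = [$url_link, $url_text];
}

foreach ($links as [$url, $text]) {
    echo "$url => $text\n";
}
```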
Greetings, AnrDaemon.
In reply to your message dated Wednesday, June 4, 2008, 23:00:34,
> preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);
The regexp should be written as:
'#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is'
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 27 '08 #4
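The corrected pattern differs from the first version in two places. First, [^>]* now follows the whole alternation, so attributes after an unquoted URL are also skipped. Second, \s is excluded from the unquoted URL. The sketch below compares both versions on hypothetical unquoted hrefs that are each followed by another attribute. The helper hrefs() is hypothetical and was written only for this comparison.

```php
<?php
// Hypothetical sample: two unquoted hrefs, each followed by another attribute.
$s = '<a href=c.php class=x>C</a> <a href=d.php class="y">D</a>';

// AnrDaemon's first version and his corrected version, unchanged.
$first     = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';
$corrected = '#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is';

// Hypothetical helper: returns the URL of every match,
// taken from group 3 (unquoted) or else group 2 (quoted).
function hrefs(string $pattern, string $html): array
{
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    $urls = [];
    foreach ($matches as $m) {
        $urls[] = $m[3] !== '' ? $m[3] : $m[2];
    }
    return $urls;
}

print_r(hrefs($first, $s));     // only "c.php class=x"; the second link is missed
print_r(hrefs($corrected, $s)); // "c.php" and "d.php"

// A shortened version of the question's link, assuming its line break is
// really inside the href: \S*? cannot cross it, so nothing is found.
$op = "<a class=\"highlight\" href=\"browse.php?region=West\n+Tennessee\">Next</a>";
echo count(hrefs($corrected, $op)), "\n"; // 0
```

The last check matters only if the line break inside the question's href is really in the page and is not newsreader wrapping. In that case, neither of these patterns matches that particular link, while the question's own pattern with /s added does.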

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

by Han: Determining the pattern below has got my stumped. I have a page of HTML and need to find all occurrences of the following pattern: score=9999999999&amp; The number shown can be 5-10 characters...

by Han: I'm wondering if someone can explain why the following works with preg_match_all, but not preg_match: $html = "product=3456789&amp;" preg_match_all ("|product=(\d{5,10})&amp;|i", $html, $out); $out...

by Han: I know this is possible (because preg can do almost anything!), but can't get a handle on the syntax. I have an HTML string: <font size="3"><a...

by Han: Using preg_match_all, I need to capture a list of first and last names plus an optional country code proceeding them. For example: <tr><td>AU</td><td>Jane Smith</td></tr>...

by Han: The following pattern (which is one subpattern in a string of several) looks for the following $xxx,xxx.xx (with the dollar sign) or xxx,xxx.xx (space in replace of missing dollar sign) ...

by petrovitch: While using the following loop to extract images from the google search engine I discovered that preg_match_all works much faster parsing small strings in a loop than extracting all of the urls at...

by greatprovider: i'm starting with a string such as "Na**3C**6H**5O**7*2H**20" im attempting to match all **\d+ ...once i can match all the double asterix \d i intend to wrap the \d in "<sub>" tags for display...

by ngmr80: Hi, I'm experiencing a problem when trying to capture substrings with preg_match_all() from strings like "set('Hello','World')" using the following Regular Expression (PERL syntax): ...

by PaulB: Hello, as a newbie I'm requesting some help in understanding the regular expression below preg_match_all("|<tr(.*)</tr>|U",$table,$rows); Would anybody please just run through...

by loriann: hi, I have a problem with preg_match function returning empty arrays for my wonderful regexes. However, I can't see what I am doing wrong - maybe one of you could help? I'm loading the source...

by Hystou: Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

by jinu1996: In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

by tracyyun: Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

by isladogs: The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

by TSSRALBI: Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...

by adsilva: A Windows Forms form does not have the event Unload, like VB6. What one acts like?

by 6302768590: Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

by muto222: How can i add a mobile payment intergratation into php mysql website.

by bsmnconsultancy: In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...
