How to use 2 patterns with the preg_match_all function in PHP

Basically, I am trying to understand and learn how to make a PHP spider (yes, I know it's not efficient, but it's only for a single website at a time).

The problem I am having is with the preg_match_all() function. The specific use is:

    preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si", $html2, $links2 );
This code finds all URLs on a page that look like "http://www.google.com", but it does not find URLs that look like "/main/index.html".

To get around this, I worked out a pattern for the second type of URL:

    preg_match_all( "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1 );
What I am trying to achieve is to join these two patterns into a single call, or something that gets similar results, for example:

    preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si" || "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1);
Note: I have tried this method of using an OR (||) operator to join the patterns, and it DID NOT WORK.
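
For readers wondering why the `||` attempt fails: in PHP, `||` is the boolean OR operator, so the two pattern strings are reduced to a single boolean before preg_match_all() ever sees them. A minimal sketch of that behaviour (the variable names are illustrative and not from the original post):

```php
<?php
// The two patterns from the question, stored in hypothetical variables.
$absolute = "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si";
$relative = "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si";

// || compares truthiness; two non-empty strings collapse to true,
// so neither pattern survives as a regular expression.
$combined = $absolute || $relative;

var_dump($combined); // bool(true)
```

The call therefore receives `true` rather than a pattern. Alternation has to be written inside the regular expression itself, using a single `|`.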

So any help in figuring out how to get the two patterns into a single function call would be great! Note that I am pretty bad at understanding how the patterns actually work.

Any help is greatly appreciated in advance. Thanks :D
Dec 4 '10 #1
chazzy69
It was painful, but I finally found a good tutorial on the use of regular expressions. It can be found at

-> 'http://www.tipsntutorials.com/tutorials/PHP/50'

Anyway, here is the solution I managed to work out:

    preg_match_all( "#href=\"(((https?://)|(/))[&=a-zA-Z0-9-_./]+)\"#si", $html, $links );
It was just a matter of using the OR (|) operator correctly. Thanks for the help anyway :D
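
The fix above can be sketched end to end as follows. The sample HTML is invented for the demonstration. The non-capturing group `(?:...)` and the hyphen moved to the end of the character class are small variations on the posted pattern, not part of it; they are used here only to keep the result array small and the class unambiguous.

```php
<?php
// Made-up sample markup: one absolute link, one root-relative link,
// and one link that neither alternative should match.
$html = '<a href="http://www.google.com">Google</a> '
      . '<a href="/main/index.html">Home</a> '
      . '<a href="mailto:someone@example.com">Mail</a>';

// (?:https?://|/) tries "http://" or "https://" first, then a leading "/".
// Because the inner group is non-capturing, group 1 is the whole URL.
$pattern = '#href="((?:https?://|/)[&=a-zA-Z0-9_./-]+)"#si';

preg_match_all($pattern, $html, $links);

// $links[0] holds the full matches (including href="..."),
// $links[1] holds only the captured URLs:
// http://www.google.com and /main/index.html
print_r($links[1]);
```

With the original nested capturing groups, the same URLs would still appear in `$links[1]`, with the matched prefixes repeated in `$links[2]` and beyond.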
Dec 4 '10 #2
