Could do with a little help with this preg_match()

Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#1: 2 Weeks Ago
Hi,
I want to do a simple extraction of the
three key header elements from a web page, namely these:


Quote:
<title>This the Title</title>
<meta name="keywords" content="PHP, javascript, other keywords" />
<meta name="description" content="This is the description." />
Is the preg_match() function the best way to find them and put them into variables?

If they are not found on the web page, I would like to fill the relevant variable with "Not found".

I have written this code, but I am not sure if it is the best approach or if the logic is correct.

$title = preg_match("/<title>(.*?)</title>/",$text,$matches);
if ($title === false) {
    $title = "None found";
}

$descrip = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
if ($descrip === false) {
    $descrip = "None found";
}

$keys = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
if ($keys === false) {
    $keys = "None found";
}

Any suggestions, corrections most welcome. :)

Dormilich
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,649
#2: 2 Weeks Ago

Question number one: does it work?
Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#3: 2 Weeks Ago

That's a great question.

I couldn't test it because of server problems.

Now back online :)

From the code below, I get this result:

Warning: preg_match() [function.preg-match]: Unknown modifier 't' in /home/guru54gt5/public_html/sys/get_google.php on line 130

Title: None found
Descrip: 0
Keys: 0

$title = preg_match("/<title>(.*?)</title>/",$text,$matches);
if ($title === false) {
    $title = "None found";
}

$descrip = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
if ($descrip === false) {
    $descrip = "None found";
}

$keys = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
if ($keys === false) {
    $keys = "None found";
}

echo "<br>Title: $title<br>Descrip: $descrip<br>Keys: $keys";

Must be something in the first preg_match() I guess.
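
A side note on the output above: preg_match() returns 1 when the pattern matches, 0 when it does not, and false only when an error occurs (such as the bad delimiter in the first call). That would explain why the title shows "None found" while the other two show 0: the `=== false` test never fires for a plain non-match. A minimal sketch, with a made-up sample string standing in for `$text`:

```php
<?php
// Sample input, assumed for illustration only (the thread never shows $text).
$text = '<p>no header tags here</p>';

// A valid pattern that simply does not match the sample input.
$result = preg_match('#<title>(.*?)</title>#', $text, $matches);

var_dump($result === false); // bool(false): no error occurred
var_dump($result === 0);     // bool(true): the pattern just did not match
```

So a "not found" check would normally test for `!== 1` (or `=== 0`) rather than `=== false`.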
Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#4: 2 Weeks Ago

I changed it and got rid of the errors,
but I am still not picking up content.

This is what I have:

$title = "None found";
$descrip = "None found";
$keys = "None found";

$flag = preg_match("/<title>(.*?)<\/title>/",$text,$matches);
if ($flag == 1) {
    $title = $matches[0];
}

$flag = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
if ($flag == 1) {
    $descrip = $matches[0];
}

$flag = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
if ($flag == 1) {
    $keys = $matches[0];
}

echo "<br>Title: $title<br>Descrip: $descrip<br>Keys: $keys";

Of course my output is:
Quote:
Title: None found
Descrip: None found
Keys: None found
Any ideas?
Markus
Moderator
 
Join Date: Jun 2007
Location: York, England, with wolves.
Posts: 4,940
#5: 2 Weeks Ago

Quote:

Originally Posted by jeddiki

Warning: preg_match() [function.preg-match]: Unknown modifier 't' in /home/guru54gt5/public_html/sys/get_google.php on line 130

$title = preg_match("/<title>(.*?)</title>/",$text,$matches);

Must be something in the first preg_match() I guess.

Ah, the problem is that the forward slash in </title> closes the regular expression pattern, because the opening delimiter is also a forward slash. PHP then treats the following 't' as a modifier (a character that follows the closing delimiter and changes how the pattern behaves), and 't' is not a valid one.

So, to solve this you could either
  • change the delimiters to some other character (I generally use #)
  • or escape the forward slash with a backslash; </title> would become <\/title>
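
Both options can be sketched side by side. The sample string below is an assumption for illustration; the patterns are single-quoted here so the backslash needs no extra escaping in the PHP string.

```php
<?php
// Sample input, assumed for illustration only.
$text = '<html><head><title>This the Title</title></head></html>';

// Option 1: use '#' as the delimiter, so '/' needs no escaping.
preg_match('#<title>(.*?)</title>#', $text, $m1);

// Option 2: keep '/' as the delimiter and escape the slash inside the pattern.
preg_match('/<title>(.*?)<\/title>/', $text, $m2);

// Index 1 holds the first capture group, i.e. the text between the tags.
echo $m1[1], "\n";
echo $m2[1], "\n";
```

Either way the pattern compiles without the "Unknown modifier" warning; which one to pick is mostly a matter of readability.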

Mark.
Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#6: 2 Weeks Ago

Thanks Markus,
But I beat you to it ;-)

I fixed that and posted the updated version.

Any idea why I don't pick up the data ?
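
The thread never shows what `$text` contains, so the cause can't be confirmed from here. Some things that commonly make patterns like these miss: upper-case tags (`<TITLE>`), line breaks inside the element (`.` does not match a newline unless the `s` modifier is set), attributes in a different order or with different quoting, or `$text` not actually holding the page's HTML. Note also that `$matches[0]` is the whole match including the tags, while `$matches[1]` is the captured text. A rough sketch under those assumptions, written for PHP 7 or later; `extract_first()` is a hypothetical helper, not something from the thread:

```php
<?php
// Hypothetical helper: returns the first capture group of $pattern
// found in $text, or a fallback string when nothing matches.
function extract_first(string $pattern, string $text, string $fallback = 'None found'): string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $text, $matches) === 1) {
        return trim($matches[1]); // group 1 is the content, without the tags
    }
    return $fallback;
}

// Sample page, assumed for illustration (upper-case tags, multi-line title,
// and no description tag at all).
$text = <<<HTML
<HTML><HEAD>
<TITLE>
This the Title
</TITLE>
<meta name="keywords" content="PHP, javascript, other keywords" />
</HEAD></HTML>
HTML;

// 'i' makes the match case-insensitive; 's' lets '.' match newlines.
$title   = extract_first('#<title[^>]*>(.*?)</title>#is', $text);
$descrip = extract_first('#<meta\s+name="description"\s+content="(.*?)"#is', $text);
$keys    = extract_first('#<meta\s+name="keywords"\s+content="(.*?)"#is', $text);

echo "Title: $title\nDescrip: $descrip\nKeys: $keys\n";
```

A quick `var_dump($text)` before the matching is worth doing first, to check that the page source really arrived. For arbitrary real-world pages an HTML parser such as PHP's DOMDocument is usually more forgiving than regular expressions.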