
[regex] Why doesn't preg_replace work?

Hello

I went through some examples, tried a bunch of things... but still
can't figure out why I can't extract the TITLE section of a web page
using preg_replace():

-----------
<?php

$url = "http://www.cnn.com";

$response = file_get_contents($url);

$output=preg_replace("|<title>(.+?)</title>|smiU",
"TITLE=$1",
$response);

$fp = fopen ("output.html", "w");
fputs ($fp,$output);
fclose($fp);
-----------

Any idea?

Thanks!
May 7 '07 #1


Hi Gilles,

I'm not a regex guru, but I can spot a couple of problem areas in
your expression:

1. The core syntax could probably be simplified using something like
this:

|<title>([^<]+)</title>|i

I hope I got that right -- I usually have to test my expression a few
times before I get all the nuances right. :)

2. smiU - That's modifier overkill. The U here and the ? in your
expression are reacting to each other: U inverts the greediness of
every quantifier, so with U set, your .+? actually becomes greedy.
If you don't know about this page, it can help:

http://www.php.net/manual/en/referen....modifiers.php
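Tom's first suggestion, a negated character class, can be sketched like this. It is a minimal example under a few assumptions: the page source is already in a string, and the sample HTML below is made up for illustration. The ^ and $ anchors are left out, because without the m modifier they only match at the very start and end of the whole string, so an anchored pattern would not find a <title> in the middle of a page.

```php
<?php
// Made-up sample page; in the thread this string came from file_get_contents().
$html = "<html>\n<head>\n<title>Breaking News</title>\n</head>\n<body>...</body>\n</html>";

// [^<]+ matches one or more characters that are not '<', so the match
// cannot run past the closing </title> tag; no lazy quantifier is needed.
// The i modifier also lets it match <TITLE> or <Title>.
if (preg_match('|<title>([^<]+)</title>|i', $html, $matches)) {
    echo $matches[1], "\n"; // Breaking News
}
```

Because the character class already stops at the next '<', this pattern behaves the same with or without the U modifier.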

I have a prefab function I've used for this very thing, but
unfortunately I don't have access to it at the moment. Hopefully,
someone will be along shortly with the proper syntax. In the
meantime, I hope this helps in a more general sense.

Regards,
Tom

On May 7, 4:30 pm, Gilles Ganault <nos...@nospam.com> wrote:

May 7 '07 #2

On 7 May 2007 16:46:27 -0700, klenwell <kl******@gmail.com> wrote:
> 2. smiU - That's modifier overkill. The U here and the ? in your
> expression are probably reacting to each other in unexpected ways.
Ah... Indeed, it seems you should either use the U modifier to make
PCRE quantifiers non-greedy, or use the ? suffix (e.g. ".+?"), but
not both. Thanks for pointing it out.
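A small demonstration of that either/or rule, using a made-up string with two <b> elements (the string and variable names are illustrative only). With U set, quantifiers are lazy by default and a trailing ? makes them greedy again, so U combined with .+? gives the opposite of what was intended. On a page with only one </title>, the greedy match still ends at the right tag, which is one reason this mistake is easy to miss.

```php
<?php
// Made-up input with two <b> elements on one line.
$s = "<b>one</b> and <b>two</b>";

// No U: .+? is lazy and stops at the first </b>.
preg_match('|<b>(.+?)</b>|', $s, $m1);

// U plus ?: U inverts greediness, so .+? is now greedy
// and runs to the last </b> in the string.
preg_match('|<b>(.+?)</b>|U', $s, $m2);

// U without ?: .+ is lazy, equivalent to the first pattern.
preg_match('|<b>(.+)</b>|U', $s, $m3);

echo $m1[1], "\n"; // one
echo $m2[1], "\n"; // one</b> and <b>two
echo $m3[1], "\n"; // one
```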

Found it: to extract part of a string, I should use preg_match()
rather than preg_replace(), which returns the whole string with the
match replaced:

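--------------
<?php
$url = "http://www.cnn.com";
$response = file_get_contents($url);

// preg_match() returns 1 on a match and stores the first capture
// group (the text between the title tags) in $matches[1]
if (preg_match("|<title>(.+?)</title>|smi", $response, $matches)) {
    $response = $matches[1];
} else {
    $response = ""; // no <title> found: don't read an undefined index
}

$fp = fopen("output.html", "w");
fputs($fp, $response);
fclose($fp);
--------------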
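To see why the first version seemed not to work, here is a minimal sketch contrasting the two functions on made-up HTML. The helper name extract_title() is hypothetical, and the nullable return type assumes PHP 7.1 or later. preg_replace() hands back the entire page with only the title tag rewritten, while preg_match() puts just the capture group in the matches array.

```php
<?php
// Made-up sample page.
$html = "<html><head><title>Breaking News</title></head><body>...</body></html>";

// preg_replace() returns the WHOLE subject with the match substituted,
// so the rest of the page is still there.
$replaced = preg_replace('|<title>(.+?)</title>|si', 'TITLE=$1', $html);
echo $replaced, "\n";
// <html><head>TITLE=Breaking News</head><body>...</body></html>

// Hypothetical helper: returns only the captured title, or null if none.
function extract_title(string $html): ?string
{
    if (preg_match('|<title>(.+?)</title>|si', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

echo extract_title($html), "\n"; // Breaking News
```

Returning null on a miss lets the caller decide what to do when a page has no title, instead of silently writing an empty file.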

Thank you.
May 8 '07 #3
