
question about preg_*'s s modifier

Say I have the following script:

<?php
$contents = file_get_contents('preg_test.txt');
echo preg_match("#(.*?)[\r\n]+ddddddddddddd#s", $contents) ? 'is equal' : 'is not equal';
?>
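The ternary above prints 'is not equal' both when preg_match() returns 0 (no match) and when it returns false (the engine gave up with an error). A minimal sketch that tells the two apart, assuming PHP 5.2.0 or later for preg_last_error(); describeMatch is a hypothetical helper name:

```php
<?php
// Sketch: separate "no match" (0) from "engine error" (false).
// preg_last_error() exists from PHP 5.2.0 onward.
function describeMatch($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // The engine gave up; the error code says why
        // (for example PREG_BACKTRACK_LIMIT_ERROR).
        return 'error ' . preg_last_error();
    }
    return $result === 1 ? 'is equal' : 'is not equal';
}

// Only run against the real file if it is actually there.
if (is_readable('preg_test.txt')) {
    $contents = file_get_contents('preg_test.txt');
    echo describeMatch("#(.*?)[\r\n]+ddddddddddddd#s", $contents), "\n";
}
```

If this prints an error code rather than 'is not equal', the pattern itself is not the problem.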

Here's preg_test.txt:

http://www.geocities.com/terra1024/preg_test.txt

(it's a malformed part of a PostScript file, in case you're curious)

My question is... when I remove the s modifier, preg_match finds a
match. When the s modifier is there, it doesn't. I'm not really sure
why this is. The s modifier means that . also matches newlines and
carriage returns. In either case, it seems like it should match.

Any ideas as to why it doesn't?
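A minimal illustration of what s changes, on a made-up two-line string. For a pattern like the one in the question, which has no ^ or $ anchors, adding s only widens what . can match, so by itself it should never turn a match into a non-match. That points at the engine giving up rather than at the pattern's logic.

```php
<?php
// Sketch: the s modifier only changes whether "." may match "\n".
$subject = "first line\nsecond line";

// Without s, ".*" cannot cross the line break, so there is no match.
$withoutS = preg_match('#first.*second#', $subject);

// With s, "." also matches "\n", so ".*" can reach "second".
$withS = preg_match('#first.*second#s', $subject);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```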

Feb 5 '07 #1
7 Replies


Rik
yawnmoth <te*******@yahoo.com> wrote:
<snip>
Hmmmz, they both match here in PHP 5.1.4, PCRE Library Version 6.6
06-Feb-2006, with or without the modifier. What version are you using?
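A small sketch for answering the version question in one go: it prints the PHP version, the PCRE version where PHP exposes it, and the pcre.backtrack_limit setting. The helper name pcreEnvironment is hypothetical, and PCRE_VERSION is guarded with defined() because older releases such as the ones in this thread may not define it.

```php
<?php
// Sketch: collect the version details that matter when comparing
// preg_* results between machines.
function pcreEnvironment()
{
    // PCRE_VERSION is not defined on every PHP release, so check first.
    $pcre = defined('PCRE_VERSION') ? PCRE_VERSION : 'unknown';

    // ini_get() returns false when a setting does not exist at all.
    $limit = ini_get('pcre.backtrack_limit');

    return 'PHP ' . PHP_VERSION . ', PCRE ' . $pcre
        . ', pcre.backtrack_limit=' . ($limit === false ? 'n/a' : $limit);
}

echo pcreEnvironment(), "\n";
```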

--
Rik Wasmus
Feb 5 '07 #2

On Feb 5, 2:13 pm, Rik <luiheidsgoe...@hotmail.com> wrote:
> yawnmoth <terra1...@yahoo.com> wrote:
<snip>

> Hmmmz, they both match here in PHP 5.1.4, PCRE Library Version 6.6
> 06-Feb-2006, with or without the modifier. What version are you using?
PHP 5.2.0 (cli).

Feb 5 '07 #3

Rik
yawnmoth <te*******@yahoo.com> wrote:
> On Feb 5, 2:13 pm, Rik <luiheidsgoe...@hotmail.com> wrote:
>> yawnmoth <terra1...@yahoo.com> wrote:
<snip>

>> Hmmmz, they both match here in PHP 5.1.4, PCRE Library Version 6.6
>> 06-Feb-2006, with or without the modifier. What version are you using?
>
> PHP 5.2.0 (cli).
Hmmmz, is the file you gave the actual file you tested it with, or just a
part of it? It might be that the regex just maxes out. I really don't have
any other ideas :(
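The "maxes out" guess can be tested directly. One plausible explanation, not confirmed in this thread: PHP 5.2.0 introduced the pcre.backtrack_limit setting, and when a match exceeds it preg_match() returns false instead of 1, which the original ternary prints as 'is not equal'. The sketch below assumes PHP 5.2 or later and uses synthetic data; the filler lines and the tiny limit of 1000 are made-up demo values so the effect shows up at small sizes.

```php
<?php
// Sketch: reproduce a "maxed out" preg_match() by lowering
// pcre.backtrack_limit. Filler text and limit are arbitrary demo values.
function checkPattern($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        return preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit hit'
            : 'other PCRE error';
    }
    return $result === 1 ? 'match' : 'no match';
}

$oldLimit = ini_get('pcre.backtrack_limit');
$oldJit   = ini_get('pcre.jit');

// Use the interpreter rather than the JIT (on versions that have one),
// since the JIT counts towards the limit differently.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// 10,000 short lines, then the 13 d's at the very end of the subject.
$subject = str_repeat("abc\n", 10000) . 'ddddddddddddd';

// With s, the lazy .*? starting at offset 0 may walk across every line
// break, so a single match attempt steps through the whole subject.
$withS = checkPattern('#(.*?)[\r\n]+d{13}#s', $subject);

// Without s, .*? stops at each "\n", so every attempt is short, and the
// limit counter starts again at each new starting offset.
$withoutS = checkPattern('#(.*?)[\r\n]+d{13}#', $subject);

ini_set('pcre.backtrack_limit', $oldLimit);
if ($oldJit !== false) {
    ini_set('pcre.jit', $oldJit);
}

echo "with s:    $withS\n";
echo "without s: $withoutS\n";
```

Where these ini settings take effect, the /s run should report the backtrack limit while the run without s still matches, mirroring the behaviour described in the question.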
--
Rik Wasmus
Feb 5 '07 #4

On Mon, 05 Feb 2007 11:24:30 -0800, yawnmoth <te*******@yahoo.com> wrote:
<snip>
This is a very inefficient regex for a large amount of data. Since you are
using the lazy asterisk with the dot, the regex engine has to backtrack
into .*? after almost every character, extending it by one character and
retrying the rest of the pattern, all the way through the search. It would
also be easier to specify the number of d's with the {} quantifier (d{13})
instead of hardcoding them.
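The {} suggestion in isolation: the same pattern with d{13} in place of the 13 literal d's. This is a readability change rather than a speed fix; the sketch only checks, on a made-up sample string, that both forms capture the same thing.

```php
<?php
// Thirteen literal d's versus the same requirement written with {13}.
$literal    = '#(.*?)[\r\n]+ddddddddddddd#s';
$quantified = '#(.*?)[\r\n]+d{13}#s';

$subject = "some header text\r\nddddddddddddd";

preg_match($literal, $subject, $m1);
preg_match($quantified, $subject, $m2);

var_dump($m1 === $m2); // bool(true)
var_dump($m2[1]);      // string(16) "some header text"
```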

Is there a reason you capture all the content before the line-break and d's
portion of the pattern? It looks like you're merely testing whether some
line-break characters followed by 13 d's exist. If that's the case, you
could just use the strstr() function. If you want everything except the
line breaks and d's, then use substr().
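A sketch of the string-function route for the case where you want everything before the line break(s) and the d's. It uses strpos() to find the offset (only the position is needed here, so strstr() isn't required), then substr() and rtrim(). It assumes the first run of 13 d's is the one that matters; textBeforeMarker is a hypothetical name.

```php
<?php
// Sketch: return everything before the first run of 13 d's, provided at
// least one CR/LF comes right before it; otherwise return null.
function textBeforeMarker($contents)
{
    $marker = str_repeat('d', 13);
    $pos = strpos($contents, $marker);
    if ($pos === false) {
        return null; // marker not present at all
    }
    $before  = substr($contents, 0, $pos);
    $trimmed = rtrim($before, "\r\n"); // drop the line break(s)
    if ($trimmed === $before) {
        return null; // no CR/LF directly before the marker
    }
    return $trimmed;
}

var_dump(textBeforeMarker("header\r\nddddddddddddd")); // string(6) "header"
```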

--
Curtis, http://dyersweb.com
Feb 7 '07 #5

On Feb 5, 4:03 pm, Rik <luiheidsgoe...@hotmail.com> wrote:
> yawnmoth <terra1...@yahoo.com> wrote:
>> On Feb 5, 2:13 pm, Rik <luiheidsgoe...@hotmail.com> wrote:
<snip>
> Hmmmz, the file you gave was the actual file you tested it with, or just a
> part of it? Might be it just maxes out. I really don't have any other
> ideas :(
Yup - that's the actual file. In testing similar files, I noted that
removing a few characters from the middle made it work just fine.
This made me think that it was maxing out, but then I tried to write a
PHP script to sorta auto-generate a more simplified file that'd
demonstrate the problem but was unable to do so.
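A possible way to auto-generate a simplified file that shows the problem, assuming the backtrack-limit explanation holds: double the amount of filler until the /s pattern fails, then write that subject out. The "abc" filler, the file name generated_preg_test.txt, and the tiny limit are hypothetical demo choices; with PHP's real default limit the failing size would be much larger.

```php
<?php
// Sketch: grow a synthetic subject until the /s pattern stops matching.
function findFailingLineCount($pattern, $maxLines)
{
    for ($lines = 1; $lines <= $maxLines; $lines *= 2) {
        $subject = str_repeat("abc\n", $lines) . 'ddddddddddddd';
        if (preg_match($pattern, $subject) === false) {
            return $lines; // smallest tested size that errors out
        }
    }
    return null; // no failure up to $maxLines
}

// Scratch-script settings; they are left in place for the rest of the run.
ini_set('pcre.jit', '0');                // interpreter, not JIT (if present)
ini_set('pcre.backtrack_limit', '1000'); // deliberately tiny demo limit

$pattern = '#(.*?)[\r\n]+d{13}#s';
$lines = findFailingLineCount($pattern, 65536);

if ($lines === null) {
    echo "No failure up to 65536 lines\n";
} else {
    // Hypothetical output file in the system temp directory.
    $file = sys_get_temp_dir() . '/generated_preg_test.txt';
    file_put_contents($file, str_repeat("abc\n", $lines) . 'ddddddddddddd');
    echo "Wrote $file ($lines lines)\n";
}
```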

Feb 7 '07 #6

On Feb 7, 2:50 am, Curtis <dyers...@verizon.net> wrote:
> On Mon, 05 Feb 2007 11:24:30 -0800, yawnmoth <terra1...@yahoo.com> wrote:
<snip>

> This is a very inefficient regex for a large amount of data. Since you are
> using the lazy asterisk with the dot, the regex engine immediately starts
> backtracking throughout the search. It would be easier to specify the
> amount of d's through the {} quantifier, not hardcoding.
>
> Is there a reason you capture all the content before the CRLF and d
> portion of the pattern? It looks like you're merely testing if any
> whitespace and 13 d's exist. If that's the case, you could just use the
> strstr() function. If you want everything except the whitespace and d's,
> then use substr().
I'm trying to extract fonts from *.ps files. Because the fonts can
have any name of any length (afaik), substr() isn't sufficient. That
said, I assume [^\r\n]+ would be more efficient than .*? ?

Feb 7 '07 #7

yawnmoth wrote:
> On Feb 7, 2:50 am, Curtis <dyers...@verizon.net> wrote:
>> On Mon, 05 Feb 2007 11:24:30 -0800, yawnmoth <terra1...@yahoo.com> wrote:
<snip>
> I'm trying to extract fonts from *.ps files. Because the fonts can
> have any name of any length (afaik), substr() isn't sufficient. That
> said, I assume [^\r\n]+ would be more efficient than .*? ?
Yes, [^\r\n]+ is definitely more efficient: it can never run past a line
break, so the engine has far less to backtrack over.
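A sketch of the [^\r\n]+ approach for pulling out the line just before the d's, such as a font name. It assumes the wanted name sits alone on the line immediately before the line break(s) and the 13 d's; "/HypotheticalFont", the filler lines, and the deliberately tiny backtrack limit are made up for the demo. Unlike the original (.*?) with s, it captures only that last line, not everything from the start of the file.

```php
<?php
// Sketch: capture the line right before the line break(s) and 13 d's.
ini_set('pcre.jit', '0');                // interpreter, not JIT (if present)
ini_set('pcre.backtrack_limit', '1000'); // deliberately tiny demo limit

$subject = str_repeat("abc\n", 10000)
    . "/HypotheticalFont\n"
    . str_repeat('d', 13);

// [^\r\n]+ can never run past a line break, so each attempt stays short.
$result = preg_match('#([^\r\n]+)[\r\n]+d{13}#', $subject, $m);

var_dump($result); // int(1)
var_dump($m[1]);   // string(17) "/HypotheticalFont"
```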

--
Curtis, http://dyersweb.com
Feb 7 '07 #8
