Bytes IT Community

preg_match doesn't work properly!?

I might have found a problem with how preg_match works, though I'm not
sure.
Let's say you have a regular expression that should match a string of
digits. You might write the code like this:
preg_match( '/^[0-9]+$/', $TestString );

OK, everything seems fine. However, did you know that if you pass
"12345\n" to preg_match, it reports that a match occurred?!? This
happens even though the newline is not a valid character in our
regular expression.

Here is the test program, *please run the program as written below*:

<?php
$TestString = "12345\n";
print preg_match( '/^[0-9]+$/', $TestString );
?>

You will find it prints 1 even though the newline character isn't a
valid part of our regular expression. I wonder what other characters
can appear in the string and still have it match!? Any ideas on this?
Why is this undocumented behavior present in PHP?!?
For regular expressions not to work as expected or documented seems
like a pretty serious bug in PHP. I don't think there is a problem
with the regular expression.

Thoughts?
Jun 2 '08 #1
13 Replies


I found this link about the topic:
http://blog.php-security.org/archive...h-filters.html

Apparently '$' doesn't match only at the end of the string unless you
add the 'D' modifier after the closing delimiter, as in:
print preg_match( '/^[0-9]+$/D', $TestString );

The page says 'even documented in the PHP manual is that $...';
however, I looked at the preg_match page on php.net and found no
mention of this, or of the /D switch either. Any ideas what the author
was referring to?

I am new to PHP but I would certainly consider this a 'gotcha'
especially since it is relatively undocumented.
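For reference, here is a minimal PHP sketch comparing the pattern with and without the D modifier. The subject strings are illustrative; the comments show what preg_match is expected to return under PCRE's documented rules for $ and D.

```php
<?php
// Compare $ with and without the D (PCRE_DOLLAR_ENDONLY) modifier.
$subject = "12345\n"; // digits followed by one trailing newline

// Default: $ may also match just before a final "\n", so this matches.
var_dump(preg_match('/^[0-9]+$/', $subject));  // int(1)

// With D: $ matches only at the very end of the subject, so this fails.
var_dump(preg_match('/^[0-9]+$/D', $subject)); // int(0)

// D does not reject plain digit strings that have no trailing newline.
var_dump(preg_match('/^[0-9]+$/D', "12345"));  // int(1)
```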
Jun 2 '08 #2

ch*****************@yahoo.com wrote:
> preg_match( '/^[0-9]+$/', $TestString );
>
> OK everything seems fine. However, did you know if you pass the
> following to preg_match: "12345\n" it will return that a match
> occurred?!?
'/^[0-9]+$/D'

http://nl2.php.net/manual/en/referen....modifiers.php
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches
only at the end of the subject string. Without this modifier, a dollar
also matches immediately before the final character if it is a newline
(but not before any other newlines). This modifier is ignored if m
modifier is set. There is no equivalent to this modifier in Perl.
Yes, I also think this is weird. If I want to match for newlines, I'll
match for newlines :).
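The two caveats in the manual text quoted above ("but not before any other newlines" and "ignored if m modifier is set") can be checked with a subject that ends in two newlines. A small sketch; the subject string is only an illustration:

```php
<?php
// Subject ends in TWO newlines, so "12345" is not followed by the final "\n".
$subject = "12345\n\n";

// Default: $ may match before the final newline only, not before an earlier one.
var_dump(preg_match('/^[0-9]+$/', $subject));   // int(0)

// m (multiline): $ matches before any newline, so the first line matches.
var_dump(preg_match('/^[0-9]+$/m', $subject));  // int(1)

// m plus D: D is ignored when m is set, so the result equals m alone.
var_dump(preg_match('/^[0-9]+$/mD', $subject)); // int(1)
```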
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #3

ch*****************@yahoo.com wrote:
> I might have found a problem with how preg_match works though I'm not
> sure.
> Lets say you have a regular expression that you want to match a string
> of numbers. You might write the code like this:
> preg_match( '/^[0-9]+$/', $TestString );
>
> OK everything seems fine. However, did you know if you pass the
> following to preg_match: "12345\n" it will return that a match
> occurred?!? Even though the newline is not a valid character in our
> regular expression.

Yes, I did, but only because that's what it says in the manual:

D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only
at the end of the subject string. Without this modifier, a dollar also
matches immediately before the final character if it is a newline (but not
before any other newlines). This modifier is ignored if m modifier is set.
There is no equivalent to this modifier in Perl.

> You will find it prints 1 even though the newline character isn't a
> valid part of our regular expression. What other characters I wonder
> can be put in a regular expression and have the string match!? Any
> ideas on this? Why is this undocumented behavior present in PHP?!?

It isn't, since it is documented.

> For regular expressions to not work as expected or documented seems
> like a pretty serious bug in PHP.

If that were the case, I would agree. However, the cause is not that it
is missing from the documentation, but simply that you did not read it
in the documentation.

> I don't think there is a problem
> with the regular expression.

Neither do I.
Jun 2 '08 #4

In our last episode,
<15**********************************@k30g2000hse.googlegroups.com>, the
lovely and talented ch*****************@yahoo.com broadcast on
comp.lang.php:
> I might have found a problem with how preg_match works though I'm not
> sure. Lets say you have a regular expression that you want to match a
> string of numbers. You might write the code like this: preg_match(
> '/^[0-9]+$/', $TestString );
> OK everything seems fine. However, did you know if you pass the
> following to preg_match: "12345\n" it will return that a match
> occurred?!?

Right, because it did.

> Even though the newline is not a valid character in our regular
> expression.

Doesn't matter. The whole expression matches before the newline.

> You will find it prints 1 even though the newline character isn't a
> valid part of our regular expression.

It returns 1 (a match exists) because all of the pattern is found
in $TestString. That is how perl regular expressions work.

preg_match('/dog/','catisnotadogbubba')

matches because all of 'dog' is in 'catisnotadogbubba'.
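Whether a pattern may match anywhere in the subject depends on its anchors. A short sketch contrasting the unanchored dog pattern with anchored ones; the subjects are illustrative:

```php
<?php
$subject = 'catisnotadogbubba';

// Unanchored: the pattern may match anywhere inside the subject.
var_dump(preg_match('/dog/', $subject));           // int(1)

// Anchored with ^ and $: the subject must be "dog" (optionally followed
// by one final newline), so this fails.
var_dump(preg_match('/^dog$/', $subject));         // int(0)

// The same applies to the digits pattern: surrounding letters make it fail.
var_dump(preg_match('/^[0-9]+$/', 'abc12345xyz')); // int(0)
```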
> What other characters I wonder can be put in a regular expression and have
> the string match!?

You can put just about anything in if the pattern matches some part of the
string.

> Any ideas on this? Why is this undocumented behavior present in PHP?!?

Of course it is not undocumented. The manual page makes it perfectly clear
what a match consists of.

> For regular expressions to not work as expected or documented seems
> like a pretty serious bug in PHP. I don't think there is a problem
> with the regular expression.

There isn't. There is a serious problem in your understanding of what a
match is --- or possibly what $ means in a perl regular expression. You
do know the p in preg_match means perl.

> Thoughts?

man perlre

--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.
Jun 2 '08 #5

Lars Eighner wrote:
> There isn't. There is a serious problem in your understanding of what a
> match is --- or possibly what $ means in a perl regular expression. You
> do know the p in preg_match means perl.

First, we're not talking about Perl, but about the PHP function
"preg_match", which uses PCRE syntax, not Perl syntax.

Second, PCRE (just like Perl, actually O_o) defines ^ and $ as the
start and end of string/line (cf. http://www.pcre.org/pcre.txt,
"PCRE_MULTILINE"). Perl defines them as start/end of string, and as
start/end of line when used with /m.
POSIX doesn't define them, but that's not the point here.

Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
between the last number and the end of string, basically "between the
plus and the dollar".
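Whatever one thinks the pattern should do, PHP can report what it actually matched: passing the PREG_OFFSET_CAPTURE flag makes preg_match return the matched text together with its offset. A minimal sketch, assuming default modifiers (no m, no D):

```php
<?php
$subject = "12345\n";
$text = null;
$offset = null;

if (preg_match('/^[0-9]+$/', $subject, $m, PREG_OFFSET_CAPTURE)) {
    // $m[0] holds [matched text, byte offset] for the whole pattern.
    [$text, $offset] = $m[0];
}

// The match covers only the five digits; the trailing "\n" is not part of it,
// because $ is allowed to match just before a final newline.
var_dump($text);   // string(5) "12345"
var_dump($offset); // int(0)
```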

Regards,
--
Guillaume
Jun 2 '08 #6

On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <us****@larseighner.com>
wrote:
> It returns 1 (a match exists) because all of the pattern is found
> in $TestString. That is how perl regular expressions work.
>
> preg_match('/dog/','catisnotadogbubba')
> <SNIPPED more>

With all due respect, you're talking nonsense. You apparently missed that
the match is anchored to the start & end of string. Nothing of your story
has any relevance to the OP's problem (which he had already googled & solved
himself just before I answered him :) ).
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #7

>You do know the p in preg_match means perl.

Well I come from a Perl background and that's where the original
misunderstanding came from. Assuming preg_match operated like a Perl
regular expression (how stupid could I be?) in a function named after
Perl...

I now submit that preg_match should really be named
klpbnratagybrtdcidreg_match which stands for:
"Kinda Like Perl But Not Really There Are Gotchas You Better Read The
Documentation In Detail regular expression" matching. Though maybe
others have ideas for a shorter name. :)

Chad. :)
Jun 2 '08 #8

Actually, I have to correct myself! Much to my surprise, after trying it
out, I found that this is how Perl works too, as documented here:
http://www.regular-expressions.info/anchors.html

So in Perl:

my $x = "12345\n";
if ( $x =~ /^[0-9]+$/ )
{
    print 1;
}
else
{
    print 0;
}

Prints 1 whereas:

$x = "12345\n";
if ( $x =~ /^[0-9]+\z/ )
{
    print 1;
}
else
{
    print 0;
}

Prints 0. So I guess preg_match is a good name... :)
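The same two tests can be written in PHP. A sketch relying on PCRE's \z (end of subject only) and \Z (end of subject or before a final newline) escapes, which have the same meaning as in Perl:

```php
<?php
$x = "12345\n";

// Like Perl's /^[0-9]+$/: $ may match before the final newline.
echo preg_match('/^[0-9]+$/', $x), "\n";   // 1

// Like Perl's /^[0-9]+\z/: \z matches only at the absolute end.
echo preg_match('/^[0-9]+\z/', $x), "\n";  // 0

// \Z behaves like the default $ here.
echo preg_match('/^[0-9]+\Z/', $x), "\n";  // 1
```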
Jun 2 '08 #9

In our last episode, <g1**********@biggoron.nerim.net>, the lovely and
talented Guillaume broadcast on comp.lang.php:
> Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
> between the last number and the end of string, basically "between the
> plus and the dollar".
This is absurd. $ matches the end of the line. You see that is why a
"newline" is called a newline. It is after the end of the line.
--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.
Jun 2 '08 #10
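Whether $ follows line ends or the end of the subject can be tested on a two-line string, with and without the m modifier. A minimal sketch; the subject is illustrative, and preg_match_all is used only to count matches:

```php
<?php
$subject = "123\n456\n";

// Default mode: ^ only matches at offset 0, and $ cannot match after "123"
// because that newline is not the final one. Result: no matches.
var_dump(preg_match_all('/^[0-9]+$/', $subject));            // int(0)

// m (multiline) mode: ^ and $ match at every line start and end.
var_dump(preg_match_all('/^[0-9]+$/m', $subject, $matches)); // int(2)

// The full-pattern matches are collected in $matches[0]: "123" and "456".
var_dump($matches[0]);
```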

In our last episode, <op***************@metallium.lan>, the lovely and
talented Rik Wasmus broadcast on comp.lang.php:
> With all due respect, you're talking nonsense. You apparently missed that
> the match is anchored to the start & end of string. Nothing of your story
> has any relevance to the op's problem (which he already googled & solved
> himself just before I answered him :) ).
$ matches the end of a line. When there is no newline, the end of a string
is presumed to be the end of a line. It was not ever anchored to "end of
string." Anyone who thinks of ^ and $ as relating to strings instead of
lines is asking for trouble.

--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.
Jun 2 '08 #11

On Tue, 27 May 2008 22:15:13 +0200, Lars Eighner <us****@larseighner.com>
wrote:
> $ matches the end of a line. When there is no newline, the end of a string
> is presumed to be the end of a line. It was not ever anchored to "end of
> string." Anyone who thinks of ^ and $ as relating to strings instead of
> lines is asking for trouble.
/m
Tricks a lot of people, for obvious reasons.
'nuff said
--
Rik Wasmus
....spamrun finished
Jun 2 '08 #12

Greetings, Lars Eighner.
In reply to your message dated Wednesday, May 28, 2008, 00:11:01,
> This is absurd. $ matches the end of the line. You see that is why a
> "newline" is called a newline. It is after the end of the line.

$ matches the end of a line when the multiline (m) modifier is set.
Otherwise it matches the end of the *string* (or right before a final
\n at the end of the string).
Feel the difference.
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 2 '08 #13

Lars Eighner wrote:
> Anyone who thinks of ^ and $ as relating to strings instead of
> lines is asking for trouble.
Or is reading documentation carefully :p

Regards,
--
Guillaume
Jun 2 '08 #14
