preg_match doesn't work properly!?

chadsspameateremail

I might have found a problem with how preg_match works, though I'm not
sure.
Let's say you have a regular expression that you want to match a string
of numbers. You might write the code like this:
preg_match( '/^[0-9]+$/', $TestString );

OK, everything seems fine. However, did you know that if you pass the
following to preg_match: "12345\n", it will return that a match
occurred?!? Even though the newline is not a valid character in our
regular expression.

Here is the test program, *please run the program as written below*:

<?php
$TestString = "12345\n";
print preg_match( '/^[0-9]+$/', $TestString );
?>

You will find it prints 1 even though the newline character isn't a
valid part of our regular expression. What other characters, I wonder,
can be put in a regular expression and have the string match!? Any
ideas on this? Why is this undocumented behavior present in PHP?!?
For regular expressions to not work as expected or documented seems
like a pretty serious bug in PHP. I don't think there is a problem
with the regular expression.

Thoughts?
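To see which trailing characters actually get through, here is a minimal sketch (assumes PHP 7.1+ run from the command line; the labels and sample subjects are made up for illustration). With this pattern, a single trailing "\n" is tolerated, but other trailing whitespace is not:

```php
<?php
// The OP's pattern: one or more digits, anchored with ^ and $.
$pattern = '/^[0-9]+$/';

// Hypothetical sample subjects probing what may follow the digits.
$inputs = [
    'plain digits'     => "12345",
    'trailing newline' => "12345\n",
    'trailing space'   => "12345 ",
    'trailing tab'     => "12345\t",
];

$results = [];
foreach ($inputs as $label => $subject) {
    // preg_match returns 1 on a match, 0 on no match (false on error).
    $results[$label] = preg_match($pattern, $subject);
    printf("%-17s => %d\n", $label, $results[$label]);
}
```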

Jun 2 '08 #1

chadsspameateremail

I found this link about the topic:
http://blog.php-security.org/archive...h-filters.html

Apparently '$' isn't the end of the string unless you add the 'D'
modifier after the closing delimiter, as in:
print preg_match( '/^[0-9]+$/D', $TestString );

The page says 'even documented in the PHP manual is that $...'; however,
I looked at the preg_match page on php.net and there is no mention of
this or of the /D modifier either. Any ideas what the author was
referring to?

I am new to PHP but I would certainly consider this a 'gotcha'
especially since it is relatively undocumented.
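A minimal sketch comparing the pattern with and without the D modifier (assumes PHP 7.1+; the variable names are illustrative):

```php
<?php
$subject = "12345\n";

// Default: $ may also match just before a final newline.
$withoutD = preg_match('/^[0-9]+$/', $subject);

// With the D (PCRE_DOLLAR_ENDONLY) modifier, $ matches only at the very end.
$withD = preg_match('/^[0-9]+$/D', $subject);

// A subject without the trailing newline still matches either way.
$cleanWithD = preg_match('/^[0-9]+$/D', "12345");

// Prints int(1), int(0), int(1) on separate lines.
var_dump($withoutD, $withD, $cleanWithD);
```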

Jun 2 '08 #2

Rik Wasmus

ch*****************@yahoo.com wrote:

> I might have found a problem with how preg_match works, though I'm not
> sure.
> Let's say you have a regular expression that you want to match a string
> of numbers. You might write the code like this:
> preg_match( '/^[0-9]+$/', $TestString );
>
> OK, everything seems fine. However, did you know that if you pass the
> following to preg_match: "12345\n", it will return that a match
> occurred?!? Even though the newline is not a valid character in our
> regular expression.
>
> Here is the test program, *please run the program as written below*:
>
> <?php
> $TestString = "12345\n";
> print preg_match( '/^[0-9]+$/', $TestString );
> ?>
>
> You will find it prints 1 even though the newline character isn't a
> valid part of our regular expression. What other characters, I wonder,
> can be put in a regular expression and have the string match!? Any
> ideas on this? Why is this undocumented behavior present in PHP?!?
> For regular expressions to not work as expected or documented seems
> like a pretty serious bug in PHP. I don't think there is a problem
> with the regular expression.
>
> Thoughts?

'/^[0-9]+$/D'

http://nl2.php.net/manual/en/referen....modifiers.php
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches
only at the end of the subject string. Without this modifier, a dollar
also matches immediately before the final character if it is a newline
(but not before any other newlines). This modifier is ignored if m
modifier is set. There is no equivalent to this modifier in Perl.
Yes, I also think this is weird. If I want to match for newlines, I'll
match for newlines :).
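The "but not before any other newlines" part of the manual text can be checked with a short sketch (the subjects are hypothetical examples; assumes PHP 7.1+ for the destructuring in foreach):

```php
<?php
$pattern = '/^[0-9]+$/'; // no D and no m modifier

// Each entry: [hypothetical subject, expected preg_match result].
$cases = [
    ["12345\n",   1], // one final newline: $ may match just before it
    ["12345\n\n", 0], // two trailing newlines: only the last one is skipped
    ["123\n45",   0], // inner newline: not an end-of-subject position
];

$results = [];
foreach ($cases as [$subject, $expected]) {
    $got = preg_match($pattern, $subject);
    $results[] = $got;
    // json_encode makes the newlines visible in the output.
    printf("%s => %d (expected %d)\n", json_encode($subject), $got, $expected);
}
```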
--
Rik Wasmus
....spamrun finished

Jun 2 '08 #3

Paul Lautman

ch*****************@yahoo.com wrote:

> I might have found a problem with how preg_match works, though I'm not
> sure.
> Let's say you have a regular expression that you want to match a string
> of numbers. You might write the code like this:
> preg_match( '/^[0-9]+$/', $TestString );
>
> OK, everything seems fine. However, did you know that if you pass the
> following to preg_match: "12345\n", it will return that a match
> occurred?!? Even though the newline is not a valid character in our
> regular expression.

Yes, I did, but only because that's what it says in the manual:
D (PCRE_DOLLAR_ENDONLY)

If this modifier is set, a dollar metacharacter in the pattern matches only
at the end of the subject string. Without this modifier, a dollar also
matches immediately before the final character if it is a newline (but not
before any other newlines). This modifier is ignored if m modifier is set.
There is no equivalent to this modifier in Perl.
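The sentence about m is easy to miss. A minimal sketch (illustrative variable names) showing that adding m brings the match back even when D is present:

```php
<?php
$subject = "12345\n";

// D alone: $ only at the very end, so the trailing newline blocks the match.
$dOnly = preg_match('/^[0-9]+$/D', $subject);

// m plus D: D is ignored once m is set, and in multiline mode $ also
// matches before any newline, so this matches again.
$mAndD = preg_match('/^[0-9]+$/mD', $subject);

// Prints int(0) and int(1) on separate lines.
var_dump($dOnly, $mAndD);
```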

> Here is the test program, *please run the program as written below*:
>
> <?php
> $TestString = "12345\n";
> print preg_match( '/^[0-9]+$/', $TestString );
> ?>
>
> You will find it prints 1 even though the newline character isn't a
> valid part of our regular expression. What other characters, I wonder,
> can be put in a regular expression and have the string match!? Any
> ideas on this? Why is this undocumented behavior present in PHP?!?

It isn't, since it is documented.

> For regular expressions to not work as expected or documented seems
> like a pretty serious bug in PHP.

If that were the case, I would agree. However, the cause is not that it
is missing from the documentation, but simply that you did not read it
in the documentation.....

> I don't think there is a problem
> with the regular expression.

Neither do I.

Jun 2 '08 #4

Lars Eighner

In our last episode,
<15**********************************@k30g2000hse.googlegroups.com>, the
lovely and talented ch*****************@yahoo.com broadcast on
comp.lang.php:

> I might have found a problem with how preg_match works, though I'm not
> sure. Let's say you have a regular expression that you want to match a
> string of numbers. You might write the code like this:
> preg_match( '/^[0-9]+$/', $TestString );

> OK, everything seems fine. However, did you know that if you pass the
> following to preg_match: "12345\n", it will return that a match
> occurred?!?

Right, because it did.

> Even though the newline is not a valid character in our regular
> expression.

Doesn't matter. The whole expression matches before the newline.

> Here is the test program, *please run the program as written below*:

> <?php
> $TestString = "12345\n";
> print preg_match( '/^[0-9]+$/', $TestString );
> ?>

> You will find it prints 1 even though the newline character isn't a
> valid part of our regular expression.

It returns 1 (a match exists) because all of the pattern is found
in $TestString. That is how perl regular expressions work.

preg_match('/dog/','catisnotadogbubba')

matches because all of 'dog' is in 'catisnotadogbubba'.
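Note that the dog example is unanchored. A minimal sketch contrasting it with an anchored version, which is the shape of the OP's pattern (variable names are illustrative):

```php
<?php
$subject = 'catisnotadogbubba';

// Unanchored: succeeds if 'dog' occurs anywhere in the subject.
$unanchored = preg_match('/dog/', $subject, $m);

// Anchored at both ends: the whole subject would have to be exactly 'dog'.
$anchored = preg_match('/^dog$/', $subject);

var_dump($unanchored, $m[0], $anchored);
```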

> What other characters, I wonder, can be put in a regular expression and have
> the string match!?

You can put just about anything in if the pattern matches some part of the
string.

> Any ideas on this? Why is this undocumented behavior present in PHP?!?

Of course it is not undocumented. The manual page makes it perfectly clear
what a match consists of.

> For regular expressions to not work as expected or documented seems
> like a pretty serious bug in PHP. I don't think there is a problem
> with the regular expression.

There isn't. There is a serious problem in your understanding of what a
match is --- or possibly what $ means in a perl regular expression. You
do know the p in preg_match means perl.

> Thoughts?

man perlre

--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.

Jun 2 '08 #5

Guillaume

Lars Eighner wrote:

> There isn't. There is a serious problem in your understanding of what a
> match is --- or possibly what $ means in a perl regular expression. You
> do know the p in preg_match means perl.

First, we're not talking about Perl, but about the PHP function
"preg_match", which uses PCRE syntax, not Perl syntax.

Second, PCRE (just like Perl, actually O_o) defines ^ and $ as the
start and end of the string, or of a line (cf.
http://www.pcre.org/pcre.txt "PCRE_MULTILINE"). (Perl defines them as
start/end of string, and as start/end of line when used with /m.)
POSIX doesn't define them, but that's not the point here.

Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
between the last digit and the end of the string, basically "between the
plus and the dollar".
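Since PCRE_MULTILINE comes up here, a minimal sketch of what the m modifier changes (the three-line subject is a made-up example; preg_match_all is used so the number of matches can be counted):

```php
<?php
$subject = "12\n34\nab\n"; // hypothetical three-line input

// Without m: ^ is the start of the subject and $ the end (or just before a
// final newline), so the whole subject would need to be digits. It is not.
$single = preg_match_all('/^[0-9]+$/', $subject, $m1);

// With m (PCRE_MULTILINE): ^ and $ also match at internal line boundaries,
// so each all-digit line is found separately.
$multi = preg_match_all('/^[0-9]+$/m', $subject, $m2);

echo $single, "\n", $multi, ' ', implode(',', $m2[0]), "\n";
```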

Regards,
--
Guillaume

Jun 2 '08 #6

Rik Wasmus

On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <us****@larseighner.com>
wrote:

> In our last episode,
> <15**********************************@k30g2000hse.googlegroups.com>, the
> lovely and talented ch*****************@yahoo.com broadcast on
> comp.lang.php:
>
>> I might have found a problem with how preg_match works, though I'm not
>> sure. Let's say you have a regular expression that you want to match a
>> string of numbers. You might write the code like this:
>> preg_match( '/^[0-9]+$/', $TestString );
>
>> OK, everything seems fine. However, did you know that if you pass the
>> following to preg_match: "12345\n", it will return that a match
>> occurred?!?
>
> Right, because it did.
>
>> Even though the newline is not a valid character in our regular
>> expression.
>
> Doesn't matter. The whole expression matches before the newline.
>
>> Here is the test program, *please run the program as written below*:
>
>> <?php
>> $TestString = "12345\n";
>> print preg_match( '/^[0-9]+$/', $TestString );
>> ?>
>
>> You will find it prints 1 even though the newline character isn't a
>> valid part of our regular expression.
>
> It returns 1 (a match exists) because all of the pattern is found
> in $TestString. That is how perl regular expressions work.
>
> preg_match('/dog/','catisnotadogbubba')

<SNIPPED more>

With all due respect, you're talking nonsense. You apparently missed that
the match is anchored to the start & end of string. Nothing of your story
has any relevance to the OP's problem (which he already googled & solved
himself just before I answered him :) ).
--
Rik Wasmus
....spamrun finished

Jun 2 '08 #7

chadsspameateremail

>You do know the p in preg_match means perl.

Well I come from a Perl background and that's where the original
misunderstanding came from. Assuming preg_match operated like a Perl
regular expression (how stupid could I be?) in a function named after
Perl...

I now submit that preg_match should really be named
klpbnratagybrtdcidreg_match which stands for:
"Kinda Like Perl But Not Really There Are Gotchas You Better Read The
Documentation In Detail regular expression" matching. Though maybe
others have ideas for a shorter name. :)

Chad. :)

Jun 2 '08 #8

chadsspameateremail

Actually, I have to correct myself! Much to my surprise, when I tried it
out, I found this is exactly how Perl works too, as documented here:
http://www.regular-expressions.info/anchors.html

So in Perl:

my $x = "12345\n";
if ( $x =~ /^[0-9]+$/ )
{
print 1;
}
else
{
print 0;
}

Prints 1 whereas:

$x = "12345\n";
if ( $x =~ /^[0-9]+\z/ )
{
print 1;
}
else
{
print 0;
}

Prints 0. So I guess preg_match is a good name... :)
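The same comparison in PHP, since PCRE accepts the \z escape as well; \Z and /D are included for contrast. A minimal sketch, with variable names chosen to mirror the Perl test:

```php
<?php
$x = "12345\n";

// \z: absolute end of subject, like \z in the Perl test above.
$lowerZ = preg_match('/^[0-9]+\z/', $x);

// \Z: end of subject or just before a final newline, which is what $
// does by default (no D, no m).
$upperZ = preg_match('/^[0-9]+\Z/', $x);

// $ with the D modifier behaves like \z.
$dollarD = preg_match('/^[0-9]+$/D', $x);

// Prints int(0), int(1), int(0) on separate lines.
var_dump($lowerZ, $upperZ, $dollarD);
```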

Jun 2 '08 #9

Lars Eighner

In our last episode, <g1**********@biggoron.nerim.net>, the lovely and
talented Guillaume broadcast on comp.lang.php:

> Lars Eighner wrote:
>
>> There isn't. There is a serious problem in your understanding of what a
>> match is --- or possibly what $ means in a perl regular expression. You
>> do know the p in preg_match means perl.
>
> First, we're not talking about Perl, but about the PHP function
> "preg_match", which uses PCRE syntax, not Perl syntax.
>
> Second, PCRE (just like Perl, actually O_o) defines ^ and $ as the
> start and end of the string, or of a line (cf.
> http://www.pcre.org/pcre.txt "PCRE_MULTILINE"). (Perl defines them as
> start/end of string, and as start/end of line when used with /m.)
> POSIX doesn't define them, but that's not the point here.
>
> Pattern ^[0-9]+$ should not match, because in "12345\n" there is a "\n"
> between the last digit and the end of the string, basically "between the
> plus and the dollar".

This is absurd. $ matches the end of the line. You see, that is why a
"newline" is called a newline. It is after the end of the line.
--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.

Jun 2 '08 #10

Lars Eighner

In our last episode,
<op***************@metallium.lan>,
the lovely and talented Rik Wasmus
broadcast on comp.lang.php:

> On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <us****@larseighner.com>
> wrote:
>> In our last episode,
>> <15**********************************@k30g2000hse.googlegroups.com>, the
>> lovely and talented ch*****************@yahoo.com broadcast on
>> comp.lang.php:
>>
>>> I might have found a problem with how preg_match works, though I'm not
>>> sure. Let's say you have a regular expression that you want to match a
>>> string of numbers. You might write the code like this:
>>> preg_match( '/^[0-9]+$/', $TestString );
>>
>>> OK, everything seems fine. However, did you know that if you pass the
>>> following to preg_match: "12345\n", it will return that a match
>>> occurred?!?
>>
>> Right, because it did.
>>
>>> Even though the newline is not a valid character in our regular
>>> expression.
>>
>> Doesn't matter. The whole expression matches before the newline.
>>
>>> Here is the test program, *please run the program as written below*:
>>
>>> <?php
>>> $TestString = "12345\n";
>>> print preg_match( '/^[0-9]+$/', $TestString );
>>> ?>
>>
>>> You will find it prints 1 even though the newline character isn't a
>>> valid part of our regular expression.
>>
>> It returns 1 (a match exists) because all of the pattern is found
>> in $TestString. That is how perl regular expressions work.
>>
>> preg_match('/dog/','catisnotadogbubba')
>
> <SNIPPED more>
>
> With all due respect, you're talking nonsense. You apparently missed that
> the match is anchored to the start & end of string. Nothing of your story
> has any relevance to the OP's problem (which he already googled & solved
> himself just before I answered him :) ).

$ matches the end of a line. When there is no newline, the end of a string
is presumed to be the end of a line. It was not ever anchored to "end of
string." Anyone who thinks of ^ and $ as relating to strings instead of
lines is asking for trouble.
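Whether $ means "end of line" here can be checked by dropping the ^ and letting $ alone decide where a match may end. A minimal sketch (the subject is a made-up two-line string with no trailing newline):

```php
<?php
$subject = "123\n456"; // hypothetical two-line subject

// Without m: $ cannot match before the internal newline, so the leftmost
// successful match is the digits of the last line.
preg_match('/[0-9]+$/', $subject, $noM);

// With m: $ also matches before the internal newline, so the first line wins.
preg_match('/[0-9]+$/m', $subject, $withM);

echo $noM[0], "\n", $withM[0], "\n";
```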

--
Lars Eighner <http://larseighner.com/> us****@larseighner.com
Countdown: 237 days to go.

Jun 2 '08 #11

Rik Wasmus

On Tue, 27 May 2008 22:15:13 +0200, Lars Eighner <us****@larseighner.com>
wrote:

> In our last episode,
> <op***************@metallium.lan>,
> the lovely and talented Rik Wasmus
> broadcast on comp.lang.php:
>
>> On Tue, 27 May 2008 18:47:07 +0200, Lars Eighner <us****@larseighner.com>
>> wrote:
>>> In our last episode,
>>> <15**********************************@k30g2000hse.googlegroups.com>, the
>>> lovely and talented ch*****************@yahoo.com broadcast on
>>> comp.lang.php:
>>>
>>>> I might have found a problem with how preg_match works, though I'm not
>>>> sure. Let's say you have a regular expression that you want to match a
>>>> string of numbers. You might write the code like this:
>>>> preg_match( '/^[0-9]+$/', $TestString );
>>>
>>>> OK, everything seems fine. However, did you know that if you pass the
>>>> following to preg_match: "12345\n", it will return that a match
>>>> occurred?!?
>>>
>>> Right, because it did.
>>>
>>>> Even though the newline is not a valid character in our regular
>>>> expression.
>>>
>>> Doesn't matter. The whole expression matches before the newline.
>>>
>>>> Here is the test program, *please run the program as written below*:
>>>
>>>> <?php
>>>> $TestString = "12345\n";
>>>> print preg_match( '/^[0-9]+$/', $TestString );
>>>> ?>
>>>
>>>> You will find it prints 1 even though the newline character isn't a
>>>> valid part of our regular expression.
>>>
>>> It returns 1 (a match exists) because all of the pattern is found
>>> in $TestString. That is how perl regular expressions work.
>>>
>>> preg_match('/dog/','catisnotadogbubba')
>>
>> <SNIPPED more>
>>
>> With all due respect, you're talking nonsense. You apparently missed that
>> the match is anchored to the start & end of string. Nothing of your story
>> has any relevance to the OP's problem (which he already googled & solved
>> himself just before I answered him :) ).
>
> $ matches the end of a line. When there is no newline, the end of a string
> is presumed to be the end of a line. It was not ever anchored to "end of
> string." Anyone who thinks of ^ and $ as relating to strings instead of
> lines is asking for trouble.

/m
Tricks a lot of people, for obvious reasons.
'nuff said
--
Rik Wasmus
....spamrun finished

Jun 2 '08 #12

AnrDaemon

Greetings, Lars Eighner.
In reply to your message dated Wednesday, May 28, 2008, 00:11:01,

> This is absurd. $ matches the end of the line. You see, that is why a
> "newline" is called a newline. It is after the end of the line.

$ matches the end of a line only when multiline mode (/m) is set. Otherwise
it matches the end of the *string* (or right before a \n that is the last
character of the string).
Feel the difference.
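The difference can be made visible with PREG_OFFSET_CAPTURE, which reports the offset at which a pattern consisting only of $ matches the empty string. A minimal sketch with made-up subjects:

```php
<?php
// A pattern consisting only of $ matches the empty string at the first
// offset where $ holds; PREG_OFFSET_CAPTURE reports that offset.

$subject = "ab\n"; // hypothetical: two characters plus a final newline
preg_match('/$/', $subject, $plain, PREG_OFFSET_CAPTURE);
preg_match('/$/D', $subject, $endOnly, PREG_OFFSET_CAPTURE);
echo $plain[0][1], "\n";   // 2: just before the final newline
echo $endOnly[0][1], "\n"; // 3: the very end of the subject

$lines = "a\nb"; // hypothetical: newline in the middle, not at the end
preg_match('/$/', $lines, $noM, PREG_OFFSET_CAPTURE);
preg_match('/$/m', $lines, $withM, PREG_OFFSET_CAPTURE);
echo $noM[0][1], "\n";     // 3: end of string; the inner newline is ignored
echo $withM[0][1], "\n";   // 1: end of the first line, only because of m
```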
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Jun 2 '08 #13

Guillaume

Lars Eighner wrote:

> Anyone who thinks of ^ and $ as relating to strings instead of
> lines is asking for trouble.

Or is reading documentation carefully :p

Regards,
--
Guillaume

Jun 2 '08 #14

Similar topics

by: Station Media

Hi, here my problem, i use stored procedure in ACCESS database(latest version), and i would like to use this procedure but it doesnt work, do you know why ? Procedure: PARAMETERS MyField Text...

Microsoft Access / VBA

AddHandler to Dropdownlist in ItemDataBound Doesnt work !!

by: Clouds

Hi ! How do I add the dynamic event handler for a dropdownlist present in the itemtemplate of a datalist !! I am doing it in the itemdatabound event of the datalist but it doesnt work... I am...

ASP.NET

Function doesnt work in Safari -

by: effendi

Hi I tested the following function in Safari and it doesnt work. This is tested fine in IE. function processOutcome(){ mainDatabase=document.forms.AssessDatabase.value var...

Javascript

Web application doesnt work in 2005

by: Juna

I have been working in vs2003, but now started to work in vs2005 but the problem, I have simple web application not website, which work i mean open in browser when we press F5 or run the...

ASP.NET

char[3][3] array doesnt work in Set of C++

by: Digital Don

I am writing a program for Peg solitaire... To check for no repetition of previous states I use a Set for storage of Board states.. The pronblem is when I declare the set as type char i.e. set...

C / C++

LAST_INSERT_ID() doesnt work properly

by: jx2

hi guys i would appriciate your coments on this code - when i ran it for the very first time it doesnt see @last = LAST_INSERT_ID() but when i ran it next time it read it properly i need to know it...

PHP

Update metod doesnt work !!

by: Dany13

hi all. i using some text box for input value and some localvarible for passing this data to dataset . give instance for correct row of dataset and data in data table . use one gird view for...

.NET Framework

Load xml in Javascript doesnt work in firefox

by: Hush

Hi, The following code works fine in IE7 but FF returns with an error: Access denied to achieve the property Element.firstChild. In this line: nodes = xmlDoc.documentElement.childNodes; My...

Javascript

Coercion of a String Into a Double Doesnt work (??!!)

by: AGP

I've been scratching my head for weeks to understand why some code doesnt work for me. here is what i have: dim sVal as string = "13.2401516" dim x as double x = sVal debug.writeline ( x)

Visual Basic .NET

