Bytes IT Community

Regex help

P: n/a
Hello everyone,

I am puzzled at PHP's handling of regex. Here's the code:

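<?php

$str = "aabcc";
$pattern = "/((a+)b?(c+))/";

// Collect every non-overlapping match of $pattern in $str.
preg_match_all($pattern, $str, $matches);
// Print the whole array: full matches plus one sub-array per capture group.
print_r($matches);

?>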

The behaviour I expect from the above would be to match:
a
aa
c
cc
abc
aabc
abcc
aabbcc

The output is ALWAYS the maximum strings :

Array
(
    [0] => Array
        (
            [0] => aabcc
        )

    [1] => Array
        (
            [0] => aabcc
        )

    [2] => Array
        (
            [0] => aa
        )

    [3] => Array
        (
            [0] => cc
        )

)
Any idea why the substrings are not picked up?

Thanks
Patrick
Dec 17 '07 #1
10 Replies


P: n/a
.oO(Patrick Drouin)

>I am puzzled at PHP's handling of regex. Here's the code:
>
><?php
>
>$str="aabcc";
>$pattern="/((a+)b?(c+))/";
>
>preg_match_all($pattern,$str,$matches);
>print_r($matches);
>
>?>
>
>The behaviour I expect from the above would be to match:
>a
>aa
>c
>cc
>abc
>aabc
>abcc
>aabbcc

Nope. What's returned is the entire matched string plus every
parenthesized substring (if there are any), not every intermediate
match the engine considers during execution.

>The output is ALWAYS the maximum strings :

Correct.

>Array
>(
>[0] => Array
>(
>[0] => aabcc
>)
>
>[1] => Array
>(
>[0] => aabcc
>)
>
>[2] => Array
>(
>[0] => aa
>)
>
>[3] => Array
>(
>[0] => cc
>)
>
>)
>Any idea why the substrings are not picked up?

The above is exactly what you told preg_match_all() to return:

0: the entire matched string
1: the first subpattern: ((a+)b?(c+)) => the entire string again
2: the second subpattern: (a+) => aa
3: the third subpattern: (c+) => cc
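
A minimal sketch of that numbering, reusing the string and pattern from the question. The variable names $count and $sets are only illustrative; PREG_SET_ORDER is a standard preg_match_all() flag that groups the same captures per match instead of per group.

```php
<?php
$str = "aabcc";
$pattern = "/((a+)b?(c+))/";

// preg_match_all() returns how many full matches it found.
$count = preg_match_all($pattern, $str, $matches);

// Default order (PREG_PATTERN_ORDER): $matches[n] holds what group n
// captured in each full match. There is only one full match here.
echo $count, "\n";          // 1
echo $matches[0][0], "\n";  // aabcc  (entire match)
echo $matches[1][0], "\n";  // aabcc  (outer parentheses)
echo $matches[2][0], "\n";  // aa     (a+)
echo $matches[3][0], "\n";  // cc     (c+)

// PREG_SET_ORDER: one sub-array per full match, indexed by group number.
preg_match_all($pattern, $str, $sets, PREG_SET_ORDER);
print_r($sets[0]);          // aabcc, aabcc, aa, cc
```

Either layout carries the same information: one match, and one captured value per pair of parentheses.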

Micha
Dec 17 '07 #2

P: n/a
Hello Michael,

>Nope. What's returned is the entire matched string and all parenthesized
>sub strings (if there are any), but not every single matching point from
>during the execution.
>
>The above is exactly what you told preg_match() to return:

Well OK, let me rephrase then: how can I tell PHP to match the
substrings? In my mind, (a+) means a, aa, aaa, ... and not only the
maximum string. I don't see how that behaviour is logical in any way.

Thanks,
Patrick
Dec 17 '07 #3

P: n/a
Patrick Drouin wrote:
>$pattern="/((a+)b?(c+))/";

$pattern="/(a+)b?(c+)/";

HTH

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 21:10.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/
Dec 18 '07 #4

P: n/a
On Mon, 17 Dec 2007 17:03:19 +0100, Patrick Drouin <no**@none.com> wrote:
>Hello everyone,
>
>I am puzzled at PHP's handling of regex. Here's the code:
>
><?php
>
>$str="aabcc";
>$pattern="/((a+)b?(c+))/";
>
>preg_match_all($pattern,$str,$matches);
>print_r($matches);
>
>?>
>
>The behaviour I expect from the above would be to match:
>a
>aa
>c
>cc
>abc
>aabc
>abcc
>aabbcc

Is this one string or a set of strings you're trying to match? Only the
5th, 6th and 7th lines will match this pattern...

>The output is ALWAYS the maximum strings :

What do you mean by 'maximum'? I see nothing weird here...
--
Rik Wasmus
Dec 18 '07 #5

P: n/a
Moi
Hello,

On 18 Dec, 05:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:
>Patrick Drouin wrote:
>>$pattern="/((a+)b?(c+))/";
>
>$pattern="/(a+)b?(c+)/";

If you try this, you will see that it spits out

[0] => aaabccc
[0] => aaa
[0] => ccc

That's not what I'm looking for...

Thanks,
P
Dec 18 '07 #6

P: n/a
Moi
Hello Rick,

On 18 Dec, 06:52, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>><?php
>>$str="aabcc";
>>$pattern="/((a+)b?(c+))/";
>>preg_match_all($pattern,$str,$matches);
>>print_r($matches);
>>?>
>>The behaviour I expect from the above would be to match:
>>a
>>aa
>>c
>>cc
>>abc
>>aabc
>>abcc
>>aabbcc
>>The output is ALWAYS the maximum strings :
>
>What do you mean by 'maximum'? I see nothing weird here...

OK, what I don't understand is why "a" is not captured, as (a+) means
"a" or repeated "a"...

Thanks,
P
Dec 18 '07 #7

P: n/a
On Tue, 18 Dec 2007 15:31:14 +0100, Moi <pa************@gmail.com> wrote:
>Hello Rick,
>
>On 18 Dec, 06:52, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>>><?php
>>>$str="aabcc";
>>>$pattern="/((a+)b?(c+))/";
>>>preg_match_all($pattern,$str,$matches);
>>>print_r($matches);
>>>?>
>>>The behaviour I expect from the above would be to match:
>>>a
>>>aa
>>>c
>>>cc
>>>abc
>>>aabc
>>>abcc
>>>aabbcc
>>>The output is ALWAYS the maximum strings :
>>
>>What do you mean by 'maximum'? I see nothing weird here...
>
>OK,what I don't understand is why "a" is not captured as (a+) means
>"a" or repeated "a"...

Because /(a+)b?(c+)/ means:
One or more a
Optionally followed by one b
Followed by one or more c

If 'a' is taken as a string on its own, it will not match because there
is no c after it. If it is part of the multi-line string you gave us
above, it still will not match, because neither 'a' nor 'aa' is followed
by 'c' (or by 'b' and then some 'c's); each is followed by a newline
character (\n).
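
A quick way to check this is to run the pattern against each line of the list separately. This is only a sketch; the names $candidates and $matched are made up for the example.

```php
<?php
$pattern = "/(a+)b?(c+)/";

// The eight strings from the original post, each tested on its own.
$candidates = ["a", "aa", "c", "cc", "abc", "aabc", "abcc", "aabbcc"];

$matched = [];
foreach ($candidates as $candidate) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    if (preg_match($pattern, $candidate) === 1) {
        $matched[] = $candidate;
    }
}

print_r($matched); // abc, aabc, abcc
```

"a" and "aa" fail because no c follows, "c" and "cc" fail because no a precedes, and "aabbcc" fails because b? allows at most one b between the a's and the c's.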
--
Rik Wasmus
Dec 18 '07 #8

P: n/a
Moi
Thanks Rik, I guess that makes sense.
P
Dec 18 '07 #9

P: n/a
.oO(Patrick Drouin)

>Hello Michael,
>
>>Nope. What's returned is the entire matched string and all parenthesized
>>sub strings (if there are any), but not every single matching point from
>>during the execution.
>>
>>The above is exactly what you told preg_match() to return:
>
>Well OK, let me rephrase then, how can I tell PHP to match the
>substrings. In my mind, (a+) means a, aa, aaa, ... and not only the
>maximum string. I don't see how that behaviour is logical in any way.

That's how regular expressions work in general. The only thing that you
can control in many regex engines is whether a quantifier should stop as
soon as it has found a minimal match (ungreedy) or continue to the
maximum possible length (greedy), which is usually the default.
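
A small sketch of the difference in PHP, using the string from the question. The `?` after a quantifier and the `U` modifier are both standard PCRE features. The helper all_matching_substrings() is hypothetical, not a PHP built-in; it is just one brute-force way to list every substring that an anchored pattern accepts, if that is what you are really after.

```php
<?php
$str = "aabcc";

// Greedy (default): a+ grabs as many "a" characters as it can.
preg_match_all("/a+/", $str, $greedy);      // $greedy[0] is ["aa"]

// Lazy: a+? stops after one "a", so each "a" becomes its own match.
preg_match_all("/a+?/", $str, $lazy);       // $lazy[0] is ["a", "a"]

// The U modifier flips the default greediness for the whole pattern.
preg_match_all("/a+/U", $str, $ungreedy);   // $ungreedy[0] is ["a", "a"]

// Hypothetical helper: test every substring against an anchored pattern.
function all_matching_substrings(string $pattern, string $str): array
{
    $found = [];
    $len = strlen($str);
    for ($start = 0; $start < $len; $start++) {
        for ($size = 1; $start + $size <= $len; $size++) {
            $piece = substr($str, $start, $size);
            if (preg_match($pattern, $piece) === 1) {
                $found[$piece] = true; // array keys drop duplicates
            }
        }
    }
    return array_keys($found);
}

// ^ and $ force the whole substring to match, not just a part of it.
$all = all_matching_substrings('/^a+b?c+$/', $str);
print_r($all); // aabc, aabcc, abc, abcc
```

Note that even the brute-force list contains neither "a" nor "cc": the pattern as a whole always needs at least one a and at least one c.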

Micha
Dec 18 '07 #10

P: n/a
Greetings, Moi.
In reply to Your message dated Tuesday, December 18, 2007, 17:29:22,

>Hello,
>On 18 Dec, 05:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
>wrote:
>>Patrick Drouin wrote:
>>>$pattern="/((a+)b?(c+))/";
>>
>>$pattern="/(a+)b?(c+)/";
>If you try this, you will see that it spit out
>[0] => aaabccc
>[0] => aaa
>[0] => ccc

TH, it is

[0] => aaabccc
[1] => aaa
[2] => ccc

which is HIGHLY different.

>That's not what I'm looking for...

RTFM FTW.
In [0] it always returns the whole matched [sub]string.
Just ignore the [0] entry if You do not want to deal with it.
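
A short sketch of both points, assuming the test string was "aaabccc" as the quoted output suggests. The names $m, $groups and $all are only illustrative.

```php
<?php
$str = "aaabccc";
$pattern = "/(a+)b?(c+)/";

// preg_match() fills a flat array: [0] whole match, [1] (a+), [2] (c+).
preg_match($pattern, $str, $m);
print_r($m);

// Skip the [0] entry to keep only the captured groups.
$groups = array_slice($m, 1);
print_r($groups); // aaa, ccc

// preg_match_all() nests one level deeper. The group number is the OUTER
// index, and the inner index counts matches, so with a single match every
// inner index is [0].
preg_match_all($pattern, $str, $all);
echo $all[0][0], "\n"; // aaabccc
echo $all[1][0], "\n"; // aaa
echo $all[2][0], "\n"; // ccc
```

That nesting may be why the quoted output shows [0] three times: those could be the inner indexes of the preg_match_all() result.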
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Dec 21 '07 #11
