Regex help

Hello everyone,

I am puzzled by PHP's handling of regex. Here's the code:

<?php

$str="aabcc";
$pattern="/((a+)b?(c+))/";

preg_match_all($pattern,$str,$matches);
print_r($matches[0]);

?>

The behaviour I expect from the above would be to match:
a
aa
c
cc
abc
aabc
abcc
aabbcc

The output is ALWAYS the maximum strings:

Array
(
    [0] => Array
        (
            [0] => aabcc
        )

    [1] => Array
        (
            [0] => aabcc
        )

    [2] => Array
        (
            [0] => aa
        )

    [3] => Array
        (
            [0] => cc
        )

)
Any idea why the substrings are not picked up?

Thanks
Patrick
Dec 17 '07 #1
..oO(Patrick Drouin)
>I am puzzled at PHP's handling of regex. Here's the code:
>
><?php
>
>$str="aabcc";
>$pattern="/((a+)b?(c+))/";
>
>preg_match_all($pattern,$str,$matches);
>print_r($matches[0]);
>
>?>
>
>The behaviour I expect from the above would be to match:
>a
>aa
>c
>cc
>abc
>aabc
>abcc
>aabbcc

Nope. What's returned is the entire matched string and all parenthesized
substrings (if there are any), but not every intermediate matching point
reached during execution.
>The output is ALWAYS the maximum strings :

Correct.

>Array
>(
>[0] => Array
>(
>[0] => aabcc
>)
>
>[1] => Array
>(
>[0] => aabcc
>)
>
>[2] => Array
>(
>[0] => aa
>)
>
>[3] => Array
>(
>[0] => cc
>)
>
>)
>Any idea why the substrings are not picked up?

The above is exactly what you told preg_match_all() to return:

0: the entire matched string
1: the first sub pattern: ((a+)b?(c+)) => the entire string again
2: the second sub pattern: (a+) => aa
3: the third sub pattern: (c+) => cc

Micha
Dec 17 '07 #2
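
To make the numbering in the reply above concrete, here is a minimal sketch that runs the original pattern and walks the whole $matches array. It assumes the default PREG_PATTERN_ORDER flag; the $count variable and the loop are added here for illustration and are not from the thread.

```php
<?php
$str = "aabcc";
$pattern = "/((a+)b?(c+))/";

// preg_match_all() returns the number of full-pattern matches it found.
$count = preg_match_all($pattern, $str, $matches);

// With the default PREG_PATTERN_ORDER, $matches[N] lists what group N
// captured in each match; $matches[0] holds the full matches.
foreach ($matches as $group => $captures) {
    echo $group, ": ", implode(", ", $captures), "\n";
}
// Prints:
// 0: aabcc
// 1: aabcc
// 2: aa
// 3: cc
```

Note that print_r($matches[0]), as written in the original snippet, would show only the first of these four entries; the output quoted in the thread corresponds to print_r($matches).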
Hello Michael,
>Nope. What's returned is the entire matched string and all parenthesized
>sub strings (if there are any), but not every single matching point from
>during the execution.
>
>The above is exactly what you told preg_match() to return:

Well OK, let me rephrase then: how can I tell PHP to match the
substrings? In my mind, (a+) means a, aa, aaa, ... and not only the
maximum string. I don't see how that behaviour is logical in any way.

Thanks,
Patrick
Dec 17 '07 #3
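
Nobody in the thread posts code for what is being asked here, so the following is only one possible sketch: a single preg_match_all() call will not list every substring the pattern could match, but you can brute-force all substrings and test each against an anchored pattern. The helper name allMatchingSubstrings and the combined pattern (a+, c+, or a+b?c+) are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the thread): try every substring of $str
// and keep those that the anchored $pattern matches in full.
function allMatchingSubstrings(string $pattern, string $str): array
{
    $found = [];
    $len = strlen($str);
    for ($start = 0; $start < $len; $start++) {
        for ($length = 1; $length <= $len - $start; $length++) {
            $candidate = substr($str, $start, $length);
            if (preg_match($pattern, $candidate) === 1) {
                $found[$candidate] = true; // array keys de-duplicate repeats
            }
        }
    }
    return array_keys($found);
}

// \A and \z anchor the pattern to the whole candidate substring.
$pattern = '/\A(?:a+|c+|a+b?c+)\z/';
print_r(allMatchingSubstrings($pattern, "aabcc"));
```

This is quadratic in the length of the string, which is fine for short inputs like "aabcc" but not something to run on large texts.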
Patrick Drouin wrote:
>$pattern="/((a+)b?(c+))/";

$pattern="/(a+)b?(c+)/";

HTH

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 21:10.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/
Dec 18 '07 #4
On Mon, 17 Dec 2007 17:03:19 +0100, Patrick Drouin <no**@none.com> wrote:
>Hello everyone,
>
>I am puzzled at PHP's handling of regex. Here's the code:
>
><?php
>
>$str="aabcc";
>$pattern="/((a+)b?(c+))/";
>
>preg_match_all($pattern,$str,$matches);
>print_r($matches[0]);
>
>?>
>
>The behaviour I expect from the above would be to match:
>a
>aa
>c
>cc
>abc
>aabc
>abcc
>aabbcc

Is this one string or a set of strings you're trying to match? Only the
5th, 6th and 7th lines will match this pattern...

>The output is ALWAYS the maximum strings :

What do you mean by 'maximum'? I see nothing weird here...
--
Rik Wasmus
Dec 18 '07 #5
Moi
Hello,

On 18 Dec, 05:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:
>Patrick Drouin wrote:
>>$pattern="/((a+)b?(c+))/";
>
>$pattern="/(a+)b?(c+)/";

If you try this, you will see that it spits out

[0] => aaabccc
[0] => aaa
[0] => ccc

That's not what I'm looking for...

Thanks,
P
Dec 18 '07 #6
Moi
Hello Rick,

On 18 Dec, 06:52, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>><?php
>>$str="aabcc";
>>$pattern="/((a+)b?(c+))/";
>>preg_match_all($pattern,$str,$matches);
>>print_r($matches[0]);
>>?>
>>The behaviour I expect from the above would be to match:
>>a
>>aa
>>c
>>cc
>>abc
>>aabc
>>abcc
>>aabbcc
>>The output is ALWAYS the maximum strings :
>
>What do you mean by 'maximum'? I see nothing weird here...

OK, what I don't understand is why "a" is not captured, as (a+) means
"a" or repeated "a"...

Thanks,
P
Dec 18 '07 #7
On Tue, 18 Dec 2007 15:31:14 +0100, Moi <pa************@gmail.com> wrote:
>Hello Rick,
>
>On 18 Dec, 06:52, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>>><?php
>>>$str="aabcc";
>>>$pattern="/((a+)b?(c+))/";
>>>preg_match_all($pattern,$str,$matches);
>>>print_r($matches[0]);
>>>?>
>>>The behaviour I expect from the above would be to match:
>>>a
>>>aa
>>>c
>>>cc
>>>abc
>>>aabc
>>>abcc
>>>aabbcc
>>>The output is ALWAYS the maximum strings :
>>
>>What do you mean by 'maximum'? I see nothing weird here...
>
>OK,what I don't understand is why "a" is not captured as (a+) means
>"a" or repeated "a"...

Because /(a+)b?(c+)/ means:
One or more 'a'
Optionally followed by one 'b'
Followed by one or more 'c'

If 'a' is taken as a string on its own, it will not match because there's no
'c' after it. If it is one line of the whole list you gave us above, it will
still not match, because neither 'a' nor 'aa' is followed by 'c' (or by 'b'
and some 'c's); they're followed by a newline character (\n).
--
Rik Wasmus
Dec 18 '07 #8
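
Both readings described in the reply above can be checked with a short sketch. It assumes the eight expected strings are either tested one at a time or joined with "\n" into a single subject; the variable names are chosen here for illustration.

```php
<?php
$pattern = "/(a+)b?(c+)/";
$lines = ["a", "aa", "c", "cc", "abc", "aabc", "abcc", "aabbcc"];

// Reading 1: every line is a separate subject string.
$matching = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line) === 1) {
        $matching[] = $line;
    }
}
echo implode(", ", $matching), "\n"; // abc, aabc, abcc

// Reading 2: the whole list is one string with newlines between the lines.
$count = preg_match_all($pattern, implode("\n", $lines), $m);
echo $count, ": ", implode(", ", $m[0]), "\n"; // 3: abc, aabc, abcc
```

Either way only the 5th, 6th and 7th lines match. "aabbcc" fails because b? allows at most one 'b' between the 'a's and the 'c's.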
Moi
Thanks Rik, I guess that makes sense.
P
Dec 18 '07 #9
..oO(Patrick Drouin)
>Hello Michael,
>
>>Nope. What's returned is the entire matched string and all parenthesized
>>sub strings (if there are any), but not every single matching point from
>>during the execution.
>>
>>The above is exactly what you told preg_match() to return:
>
>Well OK, let me rephrase then, how can I tell PHP to match the
>substrings. In my mind, (a+) means a, aa, aaa, ... and not only the
>maximum string. I don't see how that behaviour is logical in any way.

That's how regular expressions work in general. The only thing that you
can control in many regex engines is whether a quantifier should stop as
soon as it has found a minimum match (ungreedy) or whether it should
continue until the maximum length (greedy), which is usually the default.

Micha
Dec 18 '07 #10
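
A small sketch of the greedy/ungreedy switch described above, using PCRE's lazy quantifier (a trailing ?) and the U pattern modifier. The subject string is the one from the original post; the variable names are illustrative.

```php
<?php
$str = "aabcc";

preg_match('/a+/', $str, $greedy);    // default: take as many 'a' as possible
preg_match('/a+?/', $str, $lazy);     // '?' after the quantifier: take as few as possible
preg_match('/a+/U', $str, $ungreedy); // U modifier inverts the default greediness

// Lazy quantifiers inside the full pattern still have to let the rest match.
preg_match('/(a+?)b?(c+?)/', $str, $full);

echo $greedy[0], "\n";   // aa
echo $lazy[0], "\n";     // a
echo $ungreedy[0], "\n"; // a
echo $full[0], "\n";     // aabc
```

Even in ungreedy mode each call yields a single match. In the last case (a+?) still ends up capturing "aa", because a single 'a' is not followed by 'b' or 'c' at that position, so the engine has to extend it.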
Greetings, Moi.
In reply to Your message dated Tuesday, December 18, 2007, 17:29:22,

>Hello,
>On 18 Dec, 05:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
>wrote:
>>Patrick Drouin wrote:
>>>$pattern="/((a+)b?(c+))/";
>>
>>$pattern="/(a+)b?(c+)/";
>If you try this, you will see that it spit out
>[0] => aaabccc
>[0] => aaa
>[0] => ccc

TH, it is

[0] => aaabccc
[1] => aaa
[2] => ccc

which is HIGHLY different.

>That's not what I'm looking for...

RTFM FTW.
[0] always holds the whole matched [sub]string.
Just ignore the [0] entry if You do not want to deal with it.
--
Sincerely Yours, AnrDaemon <an*******@freemail.ru>

Dec 21 '07 #11
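
The two index layouts discussed in the last posts can both be reproduced. This sketch assumes the test string was "aaabccc" (inferred from the quoted output, not stated in the thread) and uses array_slice() as one way to drop the [0] entry.

```php
<?php
$str = "aaabccc"; // assumed subject, inferred from the output quoted above
$pattern = "/(a+)b?(c+)/";

// preg_match_all(): one nested array per group, so each inner index is [0].
preg_match_all($pattern, $str, $all);
print_r($all);

// preg_match(): a flat array, [0] is the whole match, [1] and [2] the groups.
preg_match($pattern, $str, $one);
print_r($one);

// Dropping the whole-match entry leaves only the captured groups.
$groupsOnly = array_slice($one, 1);
print_r($groupsOnly);
```

With preg_match_all() the three values sit at $all[0][0], $all[1][0] and $all[2][0], which is why every printed inner index is [0]; with preg_match() they are at [0], [1] and [2].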
