Bytes IT Community

Combining 2 preg matches.

Hi group,

I have a function which validates a string using preg_match.
Part of it looks like this:

if (!preg_match('/^([a-z0-9]+(([a-z0-9_-]*)?[a-z0-9])?)$/', $string)
    || preg_match('/(--|__)+/', $string)) {
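For reference, here is a minimal runnable sketch of the original two-step check, wrapped in a function. The name `is_valid_original` is hypothetical; the two patterns are copied unchanged from the post. It also shows that the second pattern only catches identical pairs (`--` or `__`), so a mixed pair such as `hello-_there` still passes.

```php
<?php
// Hypothetical wrapper around the two preg_match calls from the post.
function is_valid_original(string $string): bool
{
    // Check 1: starts and ends with [a-z0-9]; - and _ allowed in between.
    if (!preg_match('/^([a-z0-9]+(([a-z0-9_-]*)?[a-z0-9])?)$/', $string)) {
        return false;
    }
    // Check 2: reject any "--" or "__" (identical pairs only).
    if (preg_match('/(--|__)+/', $string)) {
        return false;
    }
    return true;
}

foreach (['hello-there', 'hello--there', 'hello-_there'] as $s) {
    echo $s, ' => ', is_valid_original($s) ? 'ok' : 'not ok', "\n";
}
// hello-there => ok
// hello--there => not ok
// hello-_there => ok
```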

I wonder how I could combine those two into one.
I tried a few different ways of putting the second match into the
first one, using things like [^__]+, but nothing worked for me.
It should reject two (or more) dashes or underscores in a row:
hello-there = ok
hello--there != ok

Any help would be great.

Frizzle.

Jul 15 '06 #1


frizzle wrote:
> it should prevent double (or more) dashes or underscores behind each
> other.
> hello-there = ok
> hello--there != ok

Is hello-_there ok?
Is hello_-there ok?
Is _hello-there ok?

If the answer to the above three questions is no, then the following
should do the trick. Note that this implies that the final character
could be a - or _:

if (preg_match('/^([a-z0-9][-_]?)+$/', $string)) { ... }
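A quick check of this single pattern, as a sketch (the helper name `is_valid_csaba` and the sample strings are illustrative, not from the thread). It rejects doubled or mixed separators and a leading separator, but, as noted above, it accepts a trailing one.

```php
<?php
// Hypothetical helper around the suggested pattern.
function is_valid_csaba(string $string): bool
{
    // Each repetition: one [a-z0-9], optionally followed by a single - or _.
    return preg_match('/^([a-z0-9][-_]?)+$/', $string) === 1;
}

$samples = ['hello-there', 'hello--there', 'hello-_there', 'hello_-there', '_hello-there', 'hello-'];
foreach ($samples as $s) {
    printf("%-14s %s\n", $s, is_valid_csaba($s) ? 'ok' : 'not ok');
}
```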

Csaba Gabor from New York

Jul 15 '06 #2

frizzle wrote:
> it should prevent double (or more) dashes or underscores behind each
> other.

What you need is a lookbehind and a lookahead assertion on the dash and
underscore, stating that they're acceptable only if there are letters in
front of and behind them:

/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/
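The lookaround pattern can be tried as posted with a sketch like this (the helper name and sample strings are illustrative). Unlike the previous suggestion, it also rejects a trailing separator, because the lookahead requires a letter after every `-` or `_`. Digits are rejected too, since the pattern as posted only uses `[a-z]`.

```php
<?php
// Hypothetical helper around the pattern as posted: a - or _ is accepted only
// when preceded (lookbehind) and followed (lookahead) by a character in [a-z].
function is_valid_lookaround(string $string): bool
{
    return preg_match('/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/', $string) === 1;
}

$samples = ['hello-there', 'really_a_made_up_string', 'hello--there', '_hello', 'hello-', 'hello2'];
foreach ($samples as $s) {
    printf("%-24s %s\n", $s, is_valid_lookaround($s) ? 'ok' : 'not ok');
}
```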

Jul 15 '06 #3

Chung Leong wrote:
> What you need is a lookahead and lookbehind assertion on the dash and
> underscore, stating that they're acceptable only if there're letters in
> front and behind them:
>
> /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/

Wow, could you explain this a little?
Like the ?: and ?<= parts.

(I assume 0-9 should still be included?)

Frizzle.

Jul 16 '06 #4

frizzle wrote:
> Chung Leong wrote:
>> /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/

Still curious about the explanation, but just letting you know it works
exactly as it should.

Frizzle.

Jul 16 '06 #5

Rik
frizzle wrote:
> Chung Leong wrote:
>> /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/
>
> wowowow, could you explain a little on this ?
> like the : and ?<= parts

(?: starts a non-capturing group (useful when you just want to match, and don't
need the exact matched portion captured):
http://www.regular-expressions.info/brackets.html

(?<= starts a positive lookbehind:
http://www.regular-expressions.info/lookaround.html
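A small sketch of both ideas, using illustrative strings: a capturing group adds an entry to the matches array, a non-capturing `(?:...)` group does not, and a lookbehind checks the preceding text without including it in the match.

```php
<?php
// Capturing group: the last repetition of (ab) is stored in $captured[1].
preg_match('/^(ab)+$/', 'abab', $captured);
var_dump(count($captured)); // int(2)

// Non-capturing group: only the full match is reported.
preg_match('/^(?:ab)+$/', 'abab', $grouped);
var_dump(count($grouped)); // int(1)

// Positive lookbehind: "b" must be preceded by "a", but "a" is not part of the match.
preg_match('/(?<=a)b/', 'ab', $behind);
var_dump($behind[0]); // string(1) "b"
```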

$regex ='/      #opening delimiter
    ^           #start of string
    (?:         #start of non-capturing group
    [a-z]       #any character between a and z
    |           #OR
    (?<=        #start of positive lookbehind (is preceded by...)
    [a-z]       #any character between a and z
    )           #end of positive lookbehind
    [-_]        #character - or _ (not incorrect, but [_\-], [_-] or [\-_] is probably clearer)
    (?=         #start of positive lookahead
    [a-z]       #any character between a and z
    )           #end of positive lookahead
    )           #end of non-capturing group
    +           #1 or more times, greedy
    $           #end of string
    /x';
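A sketch that checks the commented pattern against the compact one from earlier in the thread (the sample strings are illustrative). With the `x` modifier, whitespace is ignored and `#` starts a comment that runs to the end of the line, so each comment has to fit on one line: a comment wrapped onto the next line would be read as part of the pattern.

```php
<?php
// Commented version, one comment per line (x modifier).
$commented = '/   # opening delimiter
    ^             # start of string
    (?:           # start of non-capturing group
    [a-z]         # any character between a and z
    |             # OR
    (?<=          # start of positive lookbehind (is preceded by...)
    [a-z]         # any character between a and z
    )             # end of positive lookbehind
    [-_]          # character - or _
    (?=           # start of positive lookahead
    [a-z]         # any character between a and z
    )             # end of positive lookahead
    )             # end of non-capturing group
    +             # 1 or more times, greedy
    $             # end of string
    /x';

// The same pattern written on one line.
$compact = '/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/';

foreach (['hello-there', 'hello--there', 'hello_', 'abc'] as $s) {
    printf("%-13s commented=%d compact=%d\n", $s, preg_match($commented, $s), preg_match($compact, $s));
}
```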
Human translation:
The entire (1) string consists of 1 or more (2) characters in [a-z], plus
possibly single _ or - characters, each enclosed by characters in [a-z].

(1) by anchoring it with ^...$
(2) by +

> (i assume 0-9 should still be included??)

If you want that, yes, just change every [a-z] to [a-z0-9].

Use the /i modifier if you want a match to be case-insensitive.
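Both suggestions combined, as a sketch (the helper name `is_valid_ci` is hypothetical): every `[a-z]` widened to `[a-z0-9]`, plus the `i` modifier so uppercase letters are accepted too.

```php
<?php
// Hypothetical helper: digits allowed, case-insensitive via the i modifier.
function is_valid_ci(string $string): bool
{
    return preg_match('/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/i', $string) === 1;
}

var_dump(is_valid_ci('Hello-There_2006')); // bool(true)
var_dump(is_valid_ci('Hello--There'));     // bool(false)
```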

Grtz,
--
Rik Wasmus
Jul 16 '06 #6

Rik
Rik wrote:
> $regex ='/ #opening delimiter
> ^ #start of string
> [...]
> /x';

It just occurred to me that, although that is a wonderful example, this:

$regex ='/^(?:[a-z]|[a-z][_\-][a-z])+$/';

...will do just fine.

Equally so:
$regex ='/^(?:[a-z]+(?:[_\-][a-z]+))+$/';

Lookahead and lookbehind are unnecessary in this case, and this keeps it simple.
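A quick check of the two shorter patterns, as a sketch with illustrative strings. As later posts in the thread point out, the first rejects a string with a single character between separators, such as `really_a_made_up_string`. As written, the second has the same problem. Because its inner group has no quantifier, it also requires at least one separator, so plain `hello` fails.

```php
<?php
$alternation = '/^(?:[a-z]|[a-z][_\-][a-z])+$/';
$blocks      = '/^(?:[a-z]+(?:[_\-][a-z]+))+$/';

foreach (['hello', 'hello-there', 'hello--there', 'really_a_made_up_string'] as $s) {
    printf("%-24s alternation=%d blocks=%d\n", $s, preg_match($alternation, $s), preg_match($blocks, $s));
}
```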

Grtz,
--
Rik Wasmus
Jul 16 '06 #7

Rik wrote:
> Lookahead & -behind are unneccessary in this case, and this keep it simple.

Wow, thanks for the explanation!
Nice link there as well. Going right into my bookmarks.

Frizzle.

Jul 16 '06 #8

Rik wrote:
> It just occured to me that, allthough a wonderfull example:
>
> $regex ='/^(?:[a-z]|[a-z][_\-][a-z])+$/';
>
> ...will do just fine.
>
> equally so:
> $regex ='/^(?:[a-z]+(?:[_\-][a-z]+))+$/';
>
> Lookahead & -behind are unneccessary in this case, and this keep it simple.

Good point. It doesn't make sense to use assertions when you'll capture
the matches anyway.

Jul 16 '06 #9

Chung Leong wrote:
> Good point. It doesn't make sense to use assertions when you'll capture
> the matches anyway.

Somehow, I believe Rik's solution gave me problems...

'/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/' gave problems.
'/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.

An example string that gave problems is:
really_a_made_up_string

So I used Chung's option.

Frizzle.

Jul 17 '06 #10

Rik
frizzle wrote:
> Somehow, i believe Rik's solution, gave me problems ...
>
> '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/'; gave problems.
> '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.
>
> An example string that gave problems is:
> really_a_made_up_string

Ah, I forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
already consumed, so it can't also start the block for the second _ in _a_.

This one should still work, though:
$regex ='/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';
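A sketch comparing this pattern with the lookaround version (widened to `[a-z0-9]`) on a handful of illustrative strings. The `*` makes the "separator plus run" part optional and repeatable, so separator-free strings and single-character segments are both accepted, while a leading, trailing, or doubled separator is still rejected.

```php
<?php
// A run of [a-z0-9], then zero or more "separator + run".
$quantified = '/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';
// The lookaround pattern, with [a-z0-9], for comparison.
$lookaround = '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/';

$samples = ['hello', 'hello-there', 'really_a_made_up_string', 'hello--there', 'hello-_there', '_hello', 'hello-', ''];
foreach ($samples as $s) {
    printf("%-26s quantified=%d lookaround=%d\n", "'$s'", preg_match($quantified, $s), preg_match($lookaround, $s));
}
```

On these inputs the two patterns agree; the quantified one gets there without lookaround assertions.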

Grtz,
--
Rik Wasmus
Jul 17 '06 #11

Rik wrote:
> This one should still be working though:
> $regex ='/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';

ok, dankjewel / thanks a lot.

Frizzle.

Jul 17 '06 #12

Rik wrote:
> Ah, forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
> already matched, so it won't work as a start for the second _ in _a_....

You know, I thought that was the problem initially, but then remembered
that the regular expression engine does backtracking in order to
maximise any match. When it encounters the underscore after assigning
the letter to the first subpattern, it's supposed to abandon the
previous match, backtrack to the letter, and go down the second branch.

Jul 17 '06 #13

Rik
Chung Leong wrote:
> You know, I thought that was the problem initially, but then remembered
> that the regular expression engine does backtracking in order to
> maximise any match. When it encounters the underscore after assigning
> the letter to the first subpattern, it's supposed to abandon the
> previous match, backtrack to the letter, and go down the second branch.

Yes and no. It does exactly what you say, but the string simply can't be matched:

With the pattern:
'/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
one states that the entire string can be built from either [a-z0-9] (1) OR
[a-z0-9][_\-][a-z0-9] (2); think of them as blocks.

Let's examine it (this is not exactly how the engine works, but close
enough for this instance; a fixed-width font is handy now):
positions: 123456789012345678901234567890
string:    really_a_made_up_string
match1:    111111_error, let's try the other option.
match2:    111112--_error, no other matches possible.

Neither (1) nor (2) can match at the second _, and there is no other way
to split the beginning of the string.
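One way to see where the block-based pattern gets stuck, as an illustration only (not a fix): with the three-character alternative tried first and the `$` anchor dropped, `preg_match` stops at the furthest point the blocks can reach, which is just before the second `_`.

```php
<?php
$string = 'really_a_made_up_string';

// Anchored pattern from the thread: the whole string cannot be built from blocks.
var_dump(preg_match('/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/', $string)); // int(0)

// Illustration only: three-character block tried first, no end anchor.
preg_match('/^(?:[a-z0-9][_\-][a-z0-9]|[a-z0-9])+/', $string, $m);
var_dump($m[0]); // string(8) "really_a"
```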

Grtz,
--
Rik Wasmus
Jul 17 '06 #14

Rik wrote:
> There is no possibility for a match with either (1) or (2) at the second _,
> and no other options to match instead at the beginning of the string.

Ah! I missed the single letter case.

Jul 17 '06 #15
