Combining 2 preg matches.

Hi group,

I have a function that validates a string using preg_match.
Part of it looks like this:

if( !preg_match( '/^([a-z0-9]+(([a-z0-9_-]*)?[a-z0-9])?)$/', $string )
||
preg_match( '/(--|__)+/', $string ) ) {

I wonder how I could combine those two into one.
I tried a few different ways of putting the second match into the
first one, using things like [^__]+, but nothing worked for me.
It should prevent two (or more) dashes or underscores in a row.
hello-there = ok
hello--there != ok

Any help would be great.

Frizzle.

Jul 15 '06 #1
frizzle wrote:
> It should prevent two (or more) dashes or underscores in a row.
> hello-there = ok
> hello--there != ok

Is hello-_there ok?
Is hello_-there ok?
Is _hello-there ok?

If the answer to the above three questions is no, then the following
should do the trick. Note that this implies that the final character
could be a - or _:

if (preg_match('/^([a-z0-9][-_]?)+$/', $string)) { ... }
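
To see what this pattern accepts and rejects, here is a minimal sketch. The helper name is_valid_csaba is made up for illustration; only the pattern itself comes from the post, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper name; the pattern is the one suggested above.
function is_valid_csaba(string $s): bool
{
    // Each alphanumeric character may be followed by at most one - or _.
    return preg_match('/^([a-z0-9][-_]?)+$/', $s) === 1;
}

var_dump(is_valid_csaba('hello-there'));  // bool(true)
var_dump(is_valid_csaba('hello--there')); // bool(false)
var_dump(is_valid_csaba('hello-_there')); // bool(false)
var_dump(is_valid_csaba('_hello-there')); // bool(false)
var_dump(is_valid_csaba('hello-'));       // bool(true): a trailing separator is allowed
```

The last call shows the caveat mentioned above: the string may end in - or _.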

Csaba Gabor from New York

Jul 15 '06 #2

What you need are lookahead and lookbehind assertions on the dash and
underscore, stating that they are acceptable only if there are letters
immediately before and after them:

/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/
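
A minimal sketch of how this pattern behaves, assuming it is dropped into preg_match as-is. The helper name is_valid_lookaround is hypothetical; note that the pattern as posted allows letters only, so digits are rejected.

```php
<?php
// Hypothetical helper name; the pattern is copied from the post (letters only).
function is_valid_lookaround(string $s): bool
{
    return preg_match('/^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/', $s) === 1;
}

var_dump(is_valid_lookaround('hello-there'));  // bool(true)
var_dump(is_valid_lookaround('hello--there')); // bool(false): no letter between the dashes
var_dump(is_valid_lookaround('hello-'));       // bool(false): lookahead finds no letter
var_dump(is_valid_lookaround('-hello'));       // bool(false): lookbehind finds no letter
var_dump(is_valid_lookaround('hello2'));       // bool(false): digits are not in [a-z]
```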

Jul 15 '06 #3

Chung Leong wrote:
> What you need is a lookahead and lookbehind assertion on the dash and
> underscore, stating that they're acceptable only if there're letters in
> front and behind them:
>
> /^(?:[a-z]|(?<=[a-z])[-_](?=[a-z]))+$/

Wow, could you explain this a little?
Like the ?: and ?<= parts.

(I assume 0-9 should still be included?)

Frizzle.

Jul 16 '06 #4

I'm still curious about the explanation, but just letting you know it works
exactly as it should.

Frizzle.

Jul 16 '06 #5
Rik
frizzle wrote:
> Wow, could you explain this a little?
> Like the ?: and ?<= parts.

?: marks a non-capturing group (useful when you just want to match and don't
need the exact matched portion):
http://www.regular-expressions.info/brackets.html

?<= is a positive lookbehind:
http://www.regular-expressions.info/lookaround.html

$regex = '/              # opening delimiter
    ^                    # start of string
    (?:                  # start of non-capturing group
        [a-z]            # any character between a and z
        |                # OR
        (?<=             # start of positive lookbehind (is preceded by..)
            [a-z]        # any character between a and z
        )                # end of positive lookbehind
        [-_]             # character - or _ (not incorrect, but probably
                         # better as [_\-], [_-] or [\-_])
        (?=              # start of positive lookahead
            [a-z]        # any character between a and z
        )                # end of positive lookahead
    )                    # end of non-capturing group
    +                    # 1 or more times, greedy
    $                    # end of string
/x';
Human translation:
The entire (1) string consists of one or more (2) characters in [a-z], possibly
with single _ or - characters that are each enclosed by characters in the range
[a-z].

(1) by anchoring the pattern with ^.....$
(2) by +

> (I assume 0-9 should still be included?)

If you want that, yes, just change every [a-z] to [a-z0-9].

Use the /i modifier if you want a match to be case-insensitive.
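
Putting those two suggestions together gives something like the sketch below. The helper name is_valid_name is hypothetical; the pattern is the lookaround version with every [a-z] changed to [a-z0-9] and the /i modifier appended.

```php
<?php
// Hypothetical helper name. Digits are allowed via [a-z0-9], and /i makes
// the letter ranges match upper case as well.
function is_valid_name(string $s): bool
{
    return preg_match('/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/i', $s) === 1;
}

var_dump(is_valid_name('Hello-There_2')); // bool(true)
var_dump(is_valid_name('hello--there'));  // bool(false)
var_dump(is_valid_name(''));              // bool(false): + needs at least one character
```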

Grtz,
--
Rik Wasmus
Jul 16 '06 #6
Rik

It just occurred to me that, although it makes a wonderful example:

$regex ='/^(?:[a-z]|[a-z][_\-][a-z])+$/';

...will do just fine.

Equally so:
$regex ='/^(?:[a-z]+(?:[_\-][a-z]+)*)$/';

Lookahead and lookbehind are unnecessary in this case, and this keeps it simple.

Grtz,
--
Rik Wasmus
Jul 16 '06 #7

Wow, thanks for the explanation!
Nice link there as well. Going right into my bookmarks.

Frizzle.

Jul 16 '06 #8
Rik wrote:
> Lookahead and lookbehind are unnecessary in this case, and this keeps it simple.

Good point. It doesn't make sense to use assertions when you'll capture
the matches anyway.

Jul 16 '06 #9

Somehow, Rik's solution gave me problems:

'/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/' gave problems.
'/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.

An example string that gave problems is:
really_a_made_up_string

So I used Chung's option.

Frizzle.

Jul 17 '06 #10
Rik
frizzle wrote:
> '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/'; gave problems.
> '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/' didn't.
>
> An example string that gave problems is:
> really_a_made_up_string

Ah, I forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
already consumed, so it can't also serve as the left-hand character for the
second _ in _a_.

This one should still work, though:
$regex ='/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/';
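
As a rough comparison, the sketch below runs the original two-call check next to this single pattern on a few strings. The function names are hypothetical; both sets of patterns are taken from the thread. One difference to be aware of: the two-call version accepts a mixed pair such as -_, while the single pattern rejects it. Whether that should be allowed was asked earlier in the thread and never answered.

```php
<?php
// Hypothetical helper names; the patterns are the ones posted in the thread.

// The original check: shape test plus a separate doubled-separator test.
function is_valid_two_calls(string $s): bool
{
    return preg_match('/^([a-z0-9]+(([a-z0-9_-]*)?[a-z0-9])?)$/', $s) === 1
        && preg_match('/(--|__)+/', $s) === 0;
}

// The combined check: runs of alphanumerics joined by single separators.
function is_valid_one_call(string $s): bool
{
    return preg_match('/^(?:[a-z0-9]+(?:[_\-][a-z0-9]+)*)$/', $s) === 1;
}

$samples = ['hello-there', 'hello--there', 'really_a_made_up_string', 'hello-_there'];
foreach ($samples as $s) {
    printf("%-24s two calls: %-5s one call: %s\n", $s,
        var_export(is_valid_two_calls($s), true),
        var_export(is_valid_one_call($s), true));
}
```

The two checks agree on the first three samples and differ only on hello-_there.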

Grtz,
--
Rik Wasmus
Jul 17 '06 #11

ok, dankjewel / thanks a lot.

Frizzle.

Jul 17 '06 #12
Rik wrote:
> Ah, forgot that in [a-z0-9][_\-][a-z0-9] the character on the right is
> already matched, so it won't work as a start for the second _ in _a_....

You know, I thought that was the problem initially, but then remembered
that the regular expression engine does backtracking in order to
maximise any match. When it encounters the underscore after assigning
the letter to the first subpattern, it's supposed to abandon the
previous match, backtrack to the letter, and go down the second branch.

Jul 17 '06 #13
Rik
Chung Leong wrote:
> You know, I thought that was the problem initially, but then
> remembered that the regular expression engine does backtracking in
> order to maximise any match. When it encounters the underscore after
> assigning the letter to the first subpattern, it's supposed to abandon
> the previous match, backtrack to the letter, and go down the second
> branch.

Yes and no. The engine does exactly what you say, but there is simply no valid
match:

With the pattern:
'/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
one states that the entire string can be built from either [a-z0-9] (1) OR
[a-z0-9][_\-][a-z0-9] (2); think of them as blocks.

Let's examine it (not entirely how it works, but close enough in this
instance; a fixed-width font is handy now):

positions: 123456789012345678901234567890
string:    really_a_made_up_string
match1:    111111_error, let's try the other option.
match2:    111112--_error, no other matches possible.

There is no possibility for a match with either (1) or (2) at the second _,
and there are no other options to try from the beginning of the string.
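
The block behaviour described above can be checked with a short sketch. The variable names are made up; both patterns are copied from the thread. The block pattern only fails when a single character sits between two separators, because that character would have to belong to two blocks at once.

```php
<?php
// Both patterns are copied from the thread; the variable names are hypothetical.
$blocks     = '/^(?:[a-z0-9]|[a-z0-9][_\-][a-z0-9])+$/';
$lookaround = '/^(?:[a-z0-9]|(?<=[a-z0-9])[-_](?=[a-z0-9]))+$/';

foreach (['made_up_string', 'a_b_c', 'really_a_made_up_string'] as $s) {
    // preg_match returns 1 on a match and 0 on no match.
    printf("%-24s blocks: %d  lookaround: %d\n", $s,
        preg_match($blocks, $s), preg_match($lookaround, $s));
}
// made_up_string           blocks: 1  lookaround: 1
// a_b_c                    blocks: 0  lookaround: 1
// really_a_made_up_string  blocks: 0  lookaround: 1
```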

Grtz,
--
Rik Wasmus
Jul 17 '06 #14
Ah! I missed the single letter case.

Jul 17 '06 #15
