How can I improve this regex?

Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#1: 2 Weeks Ago
Hi,

I am not sure if this can be solved with regex;
possibly the string needs to be split into words
and then stepped through (but I'm not sure how).

Anyway, this is what I have, and it is very close to what
I want:
  preg_match_all("#((?:\b\w{1,20}\b\s+){2})#", $data, $matches);
Here is part of the output from print_r($matches):

[32] => technical support [33] => services attempt [34] => to help
[35] => the user [36] => solve specific [37] => problems with

As you can see, the data is just being divided into two-word chunks.

So I am missing half of the possible phrases, e.g. "support services"
is not reported.

This is not quite what I expected.

What I wanted was a list of all the two-word phrases,
so I should be getting:

[32] => technical support [33] => support services [34] => services attempt
[35] => attempt to [36] => to help [37] => help the [38] => the user

You see the overlap?
This ensures that I get all the phrases.

Any ideas on how I would need to change my
regex to achieve my desired output?

If that's not possible, how else can I achieve it?

Dormilich
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,652
#2: 2 Weeks Ago

There's nothing you can do with that pattern while using preg_match_all(), because its matches cannot overlap.

Quote:

Originally Posted by php.net

After the first match is found, the subsequent searches are continued on from end of the last match.
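
The quoted behavior can be reproduced on a short sample. This is only a minimal sketch: the sample sentence is assembled from the phrases shown in the first post (the real $data is not given in the thread), and $chunks is an illustrative name.

```php
<?php
// Sample text assembled from the output in the first post (an assumption;
// the original $data is not shown in the thread).
$data = "technical support services attempt to help the user solve";

// The pattern from the first post: two words, each followed by whitespace.
preg_match_all('#((?:\b\w{1,20}\b\s+){2})#', $data, $matches);

// Trim the trailing whitespace that the pattern captures with each chunk.
$chunks = array_map('trim', $matches[1]);

// Each search resumes where the previous match ended, so "support services",
// "attempt to" and the other overlapping pairs are never tried.
print_r($chunks);
```

The printed chunks are "technical support", "services attempt", "to help" and "the user": every second pair is skipped because the words it needs have already been consumed by the previous match.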

Familiar Sight
 
Join Date: Jan 2009
Posts: 165
#3: 2 Weeks Ago

So I think the "b" part of the question comes into play.

Any suggestions?
Dormilich
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,652
#4: 2 Weeks Ago

\b = word boundary

otherwise see above quote
Atli
Moderator
 
Join Date: Nov 2006
Location: Iceland
Posts: 3,751
#5: 2 Weeks Ago

Hey.

I don't see a way to do this using regexp alone. It just searches for patterns; it doesn't do logic.

You could just split the string into individual words and have PHP pair them together, two by two: a loop that goes through each word in the array, pairs it with the next word in the list, and adds the pair to a second array.

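  $words = explode(' ', $input);
  $pairs = array();
  // Start at the second word and pair each word with the one before it.
  // The bound is count($words) so the final pair is not dropped.
  for ($i = 1; $i < count($words); ++$i) {
    $pairs[] = $words[$i - 1] . " " . $words[$i];
  }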
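
For text with irregular spacing or line breaks, the same pairing idea can be wrapped in a function. This is a rough sketch, not something posted in the thread: wordPairs() is a hypothetical helper name, and it assumes that splitting on runs of whitespace with preg_split() is acceptable for the input.

```php
<?php
// Hypothetical helper (not from the thread): returns every adjacent
// two-word phrase in $text, in order.
function wordPairs(string $text): array
{
    // Split on runs of whitespace and drop empty pieces, so double
    // spaces and newlines do not produce empty "words".
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $pairs = [];
    // Pair each word with the one before it; N words give N - 1 pairs.
    for ($i = 1; $i < count($words); ++$i) {
        $pairs[] = $words[$i - 1] . ' ' . $words[$i];
    }
    return $pairs;
}

// Sample text based on the first post, with deliberately uneven whitespace.
$pairs = wordPairs("technical support  services\nattempt to help the user");
print_r($pairs);
```

With the eight sample words this yields seven pairs, starting with "technical support" and "support services" and ending with "the user". Input with fewer than two words yields an empty array.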