How can I improve this regex ?

jeddiki
Hi,

I am not sure if this can be solved with a regex;
possibly the string needs to be chopped into words
and then stepped through (but I'm not sure how).

Anyway, this is what I have, and it is very close to what
I want:
    preg_match_all("#((?:\b\w{1,20}\b\s+){2})#", $data, $matches);
Here is part of the output from print_r($matches):

[32] => technical support [33] => services attempt [34] => to help
[35] => the user [36] => solve specific [37] => problems with

As you can see, the data is just being divided into two word chunks.

And I am missing half of the possible phrases, e.g. "support services"
is not reported.

This is not quite what I expected.

What I wanted was a list of all the two word phrases,
so I should be getting:

[32] => technical support [33] => support services [34] => services attempt
[35] => attempt to [36] => to help [37] => help the [38] => the user

You see the overlap?
This ensures that I do get all the phrases.

Any ideas on how I would need to change my
regex to achieve my desired output?

If it's not possible, how else can I achieve it?
Nov 7 '09 #1
Dormilich
There's nothing you can do about that while using preg_match_all().

After the first match is found, the subsequent searches are continued on from end of the last match.
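A small script illustrating the behaviour described above, using a sample string made from the words in the question. It also shows one workaround that was not raised in this thread: putting the capture inside a zero-width lookahead `(?=...)`, so each match consumes nothing and every word can start a new pair. Treat it as a sketch that assumes words are runs of `\w` separated only by whitespace; punctuation between two words (e.g. "problems, with") will prevent that pair from matching.

```php
<?php
$data = "technical support services attempt to help the user";

// Consuming pattern: each match uses up two words, and the next search
// resumes after them, so the pairs never overlap.
preg_match_all('/\b\w+\s+\w+\b/', $data, $plain);

// Lookahead pattern: (?=...) matches an empty string at the start of each
// word and only captures the two words ahead. Nothing is consumed, so the
// search moves on one character at a time and finds overlapping pairs.
preg_match_all('/\b(?=(\w+\s+\w+))/', $data, $overlap);

print_r($plain[0]);   // contains: technical support, services attempt, to help, the user
print_r($overlap[1]); // contains all 7 pairs, from "technical support" to "the user"
```

Note that with the lookahead version the phrases are in `$overlap[1]` (the capture group), because the full matches in `$overlap[0]` are all empty strings.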
Nov 7 '09 #2
jeddiki
So I think the "b" part of the question comes into play.

Any suggestions ?
Nov 7 '09 #3
Dormilich
\b = word boundary

Otherwise, see the quote above.
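A quick illustration of what `\b` does; the sample string is made up for this example. `\b` is zero-width: it matches a position between a word character and a non-word character (or the edge of the string), which is why the `\b\w{1,20}\b` in the question lines matches up with whole words.

```php
<?php
// \b matches a position, not a character: the boundary between a word
// character (\w) and a non-word character or the start/end of the string.
$text = "cat concat cat.";

preg_match_all('/cat/', $text, $any);       // also finds the "cat" inside "concat"
preg_match_all('/\bcat\b/', $text, $whole); // whole words only

echo count($any[0]), "\n";   // 3
echo count($whole[0]), "\n"; // 2
```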
Nov 7 '09 #4
Atli
Hey.

I don't see a way to do this using a regexp alone. A regexp just searches for patterns; it doesn't do logic.

You could just split the string into individual words and have PHP pair them together, two by two:
a loop that goes through each word in the array, pairs it with the next word in the list, and adds the pair to a second array.

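    // Split on any run of whitespace so double spaces or newlines don't produce empty "words"
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    $pairs = array();
    for ($i = 1, $n = count($words); $i < $n; ++$i) {
        // Pair each word with the one before it; $i < $n keeps the last pair too
        $pairs[] = $words[$i - 1] . " " . $words[$i];
    }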
Nov 7 '09 #5
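If longer phrases (three words or more) are ever needed, the pairing loop generalizes to a sliding window of n words. This is a sketch only; `word_phrases()` is a hypothetical helper name, and it splits on whitespace just like the loop above, so punctuation stays attached to the words.

```php
<?php
/**
 * Return every run of $n consecutive words in $input, overlapping.
 * Hypothetical helper; words are whatever lies between runs of whitespace.
 */
function word_phrases(string $input, int $n = 2): array
{
    if ($n < 1) {
        return []; // a window of zero words makes no sense (and would never end)
    }
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    $phrases = [];
    // Slide the window forward one word at a time until it hits the end.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrases[] = implode(' ', array_slice($words, $i, $n));
    }
    return $phrases;
}

print_r(word_phrases("technical support services attempt to help"));    // 5 pairs
print_r(word_phrases("technical support services attempt to help", 3)); // 4 triples
```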