What is best way to find 3-word groups in text?

I am writing a little script that will help authors improve their writing by finding repeated phrases in the text.

The text of a chapter will average about 10,000 words; however, I could reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3- or 4-word group.

For example, if the author has repeated the phrase "then I went" six times in the text, it would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?
    $word_list = explode(" ", $text);
But I still don't know the best way to find these repeated 3- or 4-word phrases.
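One common way to do this (not something proposed in the thread itself) is a sliding window over the word array: join every run of n consecutive words into a single string and count how often each string occurs. The sketch below assumes PHP 7.4+ and plain ASCII English text; `find_repeated_phrases` is a hypothetical helper name, and treating anything other than letters, digits and apostrophes as a word separator is an assumption, so that "went," and "went" count as the same word.

```php
<?php
// Sketch: count every n-word phrase with a sliding window over the words.
// Assumption: a word is a run of a-z, 0-9 or apostrophes; punctuation is dropped.
// find_repeated_phrases() is a hypothetical helper, not a built-in.
function find_repeated_phrases(string $text, int $n = 3, int $minCount = 2): array
{
    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    $last = count($words) - $n;           // last index where an n-word window fits
    for ($i = 0; $i <= $last; $i++) {
        // Join n consecutive words into one key, e.g. "then i went".
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Keep only phrases seen at least $minCount times, most frequent first.
    $repeated = array_filter($counts, fn($c) => $c >= $minCount);
    arsort($repeated);
    return $repeated;
}

$text = "Then I went home. Later then I went out, and then I went back.";
print_r(find_repeated_phrases($text, 3));
```

Calling it with `$n = 4` finds 4-word phrases instead. Each word is visited n times, so a 10,000-word chapter is cheap to process and splitting the files should not be necessary.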

The other thing I want to provide is a list of all the words used (maybe excluding words like "and", "the", "a", etc.) and the number of times each one is used.
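Excluding common words can be done by removing them from the word array before counting. A minimal sketch, assuming a small example stop-word list (extend it as needed) and the same letters-only splitting assumption as above:

```php
<?php
// Sketch: per-word counts that leave out common "stop words".
// Assumption: this tiny stop-word list is only an example.
$stopWords = ['a', 'and', 'the'];
$text = "The cat and the dog saw a cat.";

// Lowercase and split on anything that is not a letter (so "cat." becomes "cat").
$words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

// Drop the stop words; array_values() re-indexes the remaining words.
$kept = array_values(array_diff($words, $stopWords));

// Count each remaining word and show the most frequent first.
$counts = array_count_values($kept);
arsort($counts);
print_r($counts);
```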

Any good ideas on how I should proceed?

Thanks
Nov 4 '09 #1
Dormilich
Maybe using regular expressions?
Something like this (just to show the general idea):
    // matches 3 or 4 word groups of up to 5 letters per word
    "#((?:\b\w{1,5}\b\s+){3,4})#"
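A pattern like this only matches word groups; it does not detect repeats, and `preg_match_all()` returns non-overlapping matches, so a phrase starting on the second word of a previous match is skipped. One variation (an illustration, not the pattern above) wraps the group in a zero-width lookahead so a match is attempted at every word start, then counts the captured groups. It drops the 5-letter limit and assumes PHP 7.4+ and ASCII text.

```php
<?php
// Sketch: collect every 3-word group, including overlapping ones, with a regex.
// The lookahead (?=...) matches at zero width, so the engine tries again at the
// next position and every word can start a new group.
$text = "Then I went home. Later then I went out, and then I went back.";

// \b before the first \w+ stops groups from starting mid-word;
// \s+ between words means punctuation breaks a group ("home. later" is not captured).
preg_match_all('/(?=\b(\w+\s+\w+\s+\w+)\b)/', strtolower($text), $m);

// $m[1] holds the captured groups; count them and keep only the repeats.
$counts = array_count_values($m[1]);
$repeated = array_filter($counts, fn($c) => $c > 1);
print_r($repeated);
```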
Nov 4 '09 #2
jeddiki
Yep,
I guessed it might require regex, but I left the question
open in case there is a less CPU-intensive method.

Thanks for your example; it will be useful, as I am still not all that
good with regex.

What would be the best approach to count up all the different words?
Nov 4 '09 #3
Dormilich
@jeddiki
Even if there is, what if the follow-up processes eat up the memory/workload/whatever you saved?

@jeddiki
get all single words into an array
(lowercase)
array_unique()
count()
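Those four steps might look like this in PHP. Note that they give the number of *different* words, not how often each one occurs. Splitting on anything that is not a letter, rather than on single spaces, is an assumption so that "cat." and "cat" are treated as the same word.

```php
<?php
// Sketch of the steps above: how many different words does a text use?
$text = "The cat saw the other cat.";

// Lowercase, then split into single words
// (assumption: any non-letter separates words, so "cat." becomes "cat").
$words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

// Remove duplicate words.
$unique = array_unique($words);

// Count what is left.
echo count($words) . " words, " . count($unique) . " different\n";
```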
Nov 4 '09 #4
jeddiki
Thanks for the pointers :)

I will follow them up and get some code down.
Nov 4 '09 #5
jeddiki
Hi,

I have been playing about with the resulting word list for a while, but I cannot work out how to get the number of times each word occurs in an array.

For example:

    $words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";
First I would do this:

    $words = strtolower($words);
    ...
    $list = explode(" ", $words);
From here, what would you recommend I do to get this:

mary 1
little 1
it 1
so 1
much 1
fit 1
killed 1
also 1
chops 1
you 1
see 1

a 2
had 2
and 2
loved 2

lamb 3
she 3

Any ideas ?
Nov 5 '09 #6
Dormilich
array_count_values() (did I mention that searching the manual is the first step?)
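Applied to the example above, `array_count_values()` might be used like this. One assumption differs from the question's code: splitting on non-letters instead of `explode(" ", ...)`, because otherwise "lamb." keeps its period and is counted separately from "lamb".

```php
<?php
// Sketch: per-word frequencies with array_count_values().
$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";

// Lowercase and split on anything that is not a letter, so "lamb." counts as "lamb".
$list = preg_split('/[^a-z]+/', strtolower($words), -1, PREG_SPLIT_NO_EMPTY);

// Map each distinct word to the number of times it occurs.
$counts = array_count_values($list);

// Least frequent first, like the listing in the question.
asort($counts);

foreach ($counts as $word => $n) {
    echo "$word $n\n";
}
```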
Nov 5 '09 #7
