What is best way to find 3-word groups in text?

jeddiki
290 Contributor
I am writing a little script that will improve authors' writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words; however, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3- or 4-word group.

So if the author has repeated the phrase "then I went" 6 times in the text, then this would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?
    $word_list = explode(" ", $text);
But I still don't know the best way to find these repeated 3- or 4-word phrases.

The other thing I want to provide is a list of all the words used (maybe I will exclude words like "and", "the", "a", etc.) and the number of times each is used.

Any good ideas on how I should proceed?

Thanks
Nov 4 '09 #1
Dormilich
8,658 Recognized Expert Moderator Expert
maybe using regular expressions?
like (to show the general idea)
    // matches 3 or 4 word groups up to 5 letters per word
    "#((?:\b\w{1,5}\b\s+){3,4})#"
Nov 4 '09 #2
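The pattern above picks out word groups but does not by itself count how often each group repeats. As a rough sketch of one alternative (not something proposed in the thread), a sliding window over the word list can be tallied with `array_count_values()`. The function name `count_phrases` and the sample text are made up for illustration, the word pattern only handles plain ASCII text, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: count every run of $n consecutive words in $text
// and return only the phrases that occur more than once.
function count_phrases(string $text, int $n = 3): array
{
    // Lowercase, then extract words (ASCII letters, digits, apostrophes),
    // which also drops punctuation such as commas and full stops.
    preg_match_all("/[a-z0-9']+/", strtolower($text), $matches);
    $words = $matches[0];

    // Slide a window of $n words across the text, one word at a time.
    $phrases = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrases[] = implode(' ', array_slice($words, $i, $n));
    }

    // Tally each phrase and keep those seen more than once, most frequent first.
    $repeated = array_filter(array_count_values($phrases), fn($c) => $c > 1);
    arsort($repeated);
    return $repeated;
}

// Made-up sample text: "then i went" appears three times.
$text = "Then I went home. Then I went out, and then I went to bed.";
print_r(count_phrases($text, 3));
```

Calling the same function with `4` as the second argument would cover the 4-word case. For a chapter of about 10,000 words this builds roughly 10,000 short strings, which should be manageable in a single pass.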
jeddiki
290 Contributor
Yep,
I guessed it might require regex, but I left the question
open in case there is a method that is less cpu intensive.

Thanks for your example, it will be useful as I am still not all that
good with regex.

What would be the best approach to count up all the different words?
Nov 4 '09 #3
Dormilich
8,658 Recognized Expert Moderator Expert
@jeddiki
even if there is, what if the follow-up processes eat up that saved memory/workload/whatever?

@jeddiki
get all single words into an array
(lowercase)
array_unique()
count()
Nov 4 '09 #4
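The four steps listed above could look roughly like this. It is only a sketch: the sample sentence and the `$stop_words` list are assumptions for illustration (the list follows the "and, the, a" idea from the first post), and `str_word_count()` is used here instead of `explode()` so that punctuation is not glued to the words.

```php
<?php
// Assumed stop-word list; extend it as needed.
$stop_words = ['and', 'the', 'a'];

// Made-up sample text.
$text = "The cat and the dog saw a cat";

// Get all single words into an array, lowercased.
$words = str_word_count(strtolower($text), 1);

// Drop the stop words, then keep one entry per distinct word.
$kept = array_diff($words, $stop_words);
$unique = array_values(array_unique($kept));

echo count($words), " words in total\n";
echo count($unique), " distinct words after filtering\n";
```

Note that `array_unique()` followed by `count()` gives the number of different words, not how often each one occurs.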
jeddiki
290 Contributor
Thanks for the pointers :)

I will follow them up and get some code down.
Nov 4 '09 #5
jeddiki
290 Contributor
Hi,

I have been playing about with the resulting word list for a while, but I cannot work out how to get the number of times each word occurs in an array.

For example

    $words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";
First I would do this:

    $words = strtolower($words);
    ...
    $list = explode(" ", $words);
From here what would you recommend I do to get this:

mary 1
little 1
it 1
so 1
much 1
fit 1
killed 1
also 1
chops 1
you 1
see 1

a 2
had 2
and 2
loved 2

lamb 3
she 3

Any ideas?
Nov 5 '09 #6
Dormilich
8,658 Recognized Expert Moderator Expert
array_count_values() (did I mention that searching the manual is the first step?)
Nov 5 '09 #7
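Applied to the sample sentence from post #6, `array_count_values()` might be used as sketched below. One assumption differs from the code in that post: the words are split with `str_word_count()` rather than `explode(" ", ...)`, because splitting on spaces alone would leave "lamb." (with the full stop) as a separate entry from "lamb".

```php
<?php
$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";

// Lowercase first so "She" and "she" count as the same word,
// then split into words without the surrounding punctuation.
$list = str_word_count(strtolower($words), 1);

// Map each word to the number of times it occurs.
$counts = array_count_values($list);

// Sort by count, lowest first, keeping the word as the array key.
asort($counts);

foreach ($counts as $word => $count) {
    echo $word, ' ', $count, "\n";
}
```

This gives 3 for "lamb" and "she", and 2 for "a", "had", "and" and "loved". It also lists "the" once, which the hand-written list in post #6 leaves out.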
