473,791 Members | 3,071 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

What is best way to find 3-word groups in text?

290 Contributor
I am writing a little script that will improve authors writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words, however, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3 or 4 word group.

So if the author has repeated the phrase "then I went" 6 times in the text, then this would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?
Expand|Select|Wrap|Line Numbers
  1. $word_list = explode(" ", $text);
But I still don't know how the best way to find these repeated 3 or 4 word phrases is.

The other thing I want to provide is a list of all the words used ( maybe I will exclude words like and, the, a, etc) and the number of times they are used.

Any good ideas on how I should proceed ?

Thanks
Nov 4 '09 #1
6 2283
Dormilich
8,658 Recognized Expert Moderator Expert
maybe using regular expressions?
like (to show the general idea)
Expand|Select|Wrap|Line Numbers
  1. // matches 3 or 4 word groups up to 5 letters per word
  2. "#((?:\b\w{1,5}\b\s+){3,4})#" 
Nov 4 '09 #2
jeddiki
290 Contributor
Yep,
I guessed it might require regex, but I left the question
open in case there is a method that is less cpu intensive.

Thanks for your example, it will be useful as I am still not all that
good with regex.

What would be the best approach to count up all the different words ?
Nov 4 '09 #3
Dormilich
8,658 Recognized Expert Moderator Expert
@jeddiki
even if there is, what if the follow-up processes eat up that saved memory/workload/whatever?

@jeddiki
get all single words into an array
(lowercase)
array_unique()
count()
Nov 4 '09 #4
jeddiki
290 Contributor
Thanks for the pointers :)

I will follow them up and get some code down.
Nov 4 '09 #5
jeddiki
290 Contributor
Hi,

I have been playing about with the resulting word list for a while but ı can not work out how to get the number times the words occur in an array.

For example

Expand|Select|Wrap|Line Numbers
  1. $words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";
First I would this:

Expand|Select|Wrap|Line Numbers
  1. $words = strtolower($words);
  2. ...
  3. $list = explode(" ", $words);
  4.  
From here what would you recommend I do to get this:

mary 1
little 1
it 1
so 1
much 1
fit 1
killed 1
also 1
chops 1
you 1
see 1

a 2
had 2
and 2
loved 2

lamb 3
she 3

Any ideas ?
Nov 5 '09 #6
Dormilich
8,658 Recognized Expert Moderator Expert
array_count_val ues() (did I mention that searching the manual is the first step?)
Nov 5 '09 #7

Sign in to post your reply or Sign up for a free account.

Similar topics

23
2819
by: darwinist | last post by:
What PHP Represents There is no shortage of complaints one could make about php as a language, and although the list does shrink with each release, some of them are inherent to the origins and development process of this, the most popular of the web-based, server-side, glue-languages. That said, most descriptions of what is good about php, fail to do it justice. Although they are generally enthusiastic and sometimes fanatical, no...
54
6579
by: Brandon J. Van Every | last post by:
I'm realizing I didn't frame my question well. What's ***TOTALLY COMPELLING*** about Ruby over Python? What makes you jump up in your chair and scream "Wow! Ruby has *that*? That is SO FRICKIN' COOL!!! ***MAN*** that would save me a buttload of work and make my life sooooo much easier!" As opposed to minor differences of this feature here, that feature there. Variations on style are of no interest to me. I'm coming at this from a...
226
12712
by: Stephen C. Waterbury | last post by:
This seems like it ought to work, according to the description of reduce(), but it doesn't. Is this a bug, or am I missing something? Python 2.3.2 (#1, Oct 20 2003, 01:04:35) on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d1 = {'a':1} >>> d2 = {'b':2} >>> d3 = {'c':3}
22
528
by: Alper AKCAYOZ | last post by:
Hello Esteemed Developers and Experts, I have been using Microsoft Visual C++ .NET for 1 year. During this time, I have searhed some topics over internets. Most of the topics about .NET is related to C# and Visual Basic .NET. There are less documents about Visual C++ .NET or Managed C++. I wonder the reasons of below questions: 1) Is C# more powerful than Managed C++ and Visual C++ .NET? 2) Is Microsoft intending to support C# and...
11
2573
by: modemer | last post by:
If I define the following codes: void f(const MyClass & in) {cout << "f(const)\n";} void f(MyClass in) {cout<<"f()\n";} MyClass myclass; f(myclass); Compiler complain that it can't find the best match. Anyone could give a detail explanation in theory? Which one is good?
6
2005
by: Mark Broadbent | last post by:
this might sound like an obvious question but I have found that usually these two evolve at the same time. One of the biggest reasons for creating the abstraction in the first place (in my opinion) is to create a reusable framework that can be applied to similar projects. However I have found that if an abstraction is created first during the development phase, when the implementation is occurring, the functionality or intended behaviour...
13
5060
by: Jason Huang | last post by:
Hi, Would someone explain the following coding more detail for me? What's the ( ) for? CurrentText = (TextBox)e.Item.Cells.Controls; Thanks. Jason
5
2886
by: Tor Erik | last post by:
I would be surprised if it is the naive: m = 0 s1 = "me" s2 = "locate me" s1len = len(s1) s2len = len(s2) found = False while m + s1len <= s2len:
98
4627
by: tjb | last post by:
I often see code like this: /// <summary> /// Removes a node. /// </summary> /// <param name="node">The node to remove.</param> public void RemoveNode(Node node) { <...> }
184
7124
by: jim | last post by:
In a thread about wrapping .Net applications using Thinstall and Xenocode, it was pointed out that there may be better programming languages/IDEs to use for the purpose of creating standalone, single executable apps. My goal is to create desktop applications for use on Windows XP+ OSs that are distributed as single executables that do not require traditional install packages to run. I would like to use a drag and drop UI development...
0
9669
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10427
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10207
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9029
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7537
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6776
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5431
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5559
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3718
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.