473,836 Members | 1,464 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

What is best way to find 3-word groups in text?

290 Contributor
I am writing a little script that will improve authors writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words, however, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3 or 4 word group.

So if the author has repeated the phrase "then I went" 6 times in the text, then this would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?
Expand|Select|Wrap|Line Numbers
  1. $word_list = explode(" ", $text);
But I still don't know how the best way to find these repeated 3 or 4 word phrases is.

The other thing I want to provide is a list of all the words used ( maybe I will exclude words like and, the, a, etc) and the number of times they are used.

Any good ideas on how I should proceed ?

Thanks
Nov 4 '09 #1
6 2285
Dormilich
8,658 Recognized Expert Moderator Expert
maybe using regular expressions?
like (to show the general idea)
Expand|Select|Wrap|Line Numbers
  1. // matches 3 or 4 word groups up to 5 letters per word
  2. "#((?:\b\w{1,5}\b\s+){3,4})#" 
Nov 4 '09 #2
jeddiki
290 Contributor
Yep,
I guessed it might require regex, but I left the question
open in case there is a method that is less cpu intensive.

Thanks for your example, it will be useful as I am still not all that
good with regex.

What would be the best approach to count up all the different words ?
Nov 4 '09 #3
Dormilich
8,658 Recognized Expert Moderator Expert
@jeddiki
even if there is, what if the follow-up processes eat up that saved memory/workload/whatever?

@jeddiki
get all single words into an array
(lowercase)
array_unique()
count()
Nov 4 '09 #4
jeddiki
290 Contributor
Thanks for the pointers :)

I will follow them up and get some code down.
Nov 4 '09 #5
jeddiki
290 Contributor
Hi,

I have been playing about with the resulting word list for a while but ı can not work out how to get the number times the words occur in an array.

For example

Expand|Select|Wrap|Line Numbers
  1. $words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";
First I would this:

Expand|Select|Wrap|Line Numbers
  1. $words = strtolower($words);
  2. ...
  3. $list = explode(" ", $words);
  4.  
From here what would you recommend I do to get this:

mary 1
little 1
it 1
so 1
much 1
fit 1
killed 1
also 1
chops 1
you 1
see 1

a 2
had 2
and 2
loved 2

lamb 3
she 3

Any ideas ?
Nov 5 '09 #6
Dormilich
8,658 Recognized Expert Moderator Expert
array_count_val ues() (did I mention that searching the manual is the first step?)
Nov 5 '09 #7

Sign in to post your reply or Sign up for a free account.

Similar topics

23
2824
by: darwinist | last post by:
What PHP Represents There is no shortage of complaints one could make about php as a language, and although the list does shrink with each release, some of them are inherent to the origins and development process of this, the most popular of the web-based, server-side, glue-languages. That said, most descriptions of what is good about php, fail to do it justice. Although they are generally enthusiastic and sometimes fanatical, no...
54
6592
by: Brandon J. Van Every | last post by:
I'm realizing I didn't frame my question well. What's ***TOTALLY COMPELLING*** about Ruby over Python? What makes you jump up in your chair and scream "Wow! Ruby has *that*? That is SO FRICKIN' COOL!!! ***MAN*** that would save me a buttload of work and make my life sooooo much easier!" As opposed to minor differences of this feature here, that feature there. Variations on style are of no interest to me. I'm coming at this from a...
226
12750
by: Stephen C. Waterbury | last post by:
This seems like it ought to work, according to the description of reduce(), but it doesn't. Is this a bug, or am I missing something? Python 2.3.2 (#1, Oct 20 2003, 01:04:35) on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d1 = {'a':1} >>> d2 = {'b':2} >>> d3 = {'c':3}
22
528
by: Alper AKCAYOZ | last post by:
Hello Esteemed Developers and Experts, I have been using Microsoft Visual C++ .NET for 1 year. During this time, I have searhed some topics over internets. Most of the topics about .NET is related to C# and Visual Basic .NET. There are less documents about Visual C++ .NET or Managed C++. I wonder the reasons of below questions: 1) Is C# more powerful than Managed C++ and Visual C++ .NET? 2) Is Microsoft intending to support C# and...
11
2586
by: modemer | last post by:
If I define the following codes: void f(const MyClass & in) {cout << "f(const)\n";} void f(MyClass in) {cout<<"f()\n";} MyClass myclass; f(myclass); Compiler complain that it can't find the best match. Anyone could give a detail explanation in theory? Which one is good?
6
2007
by: Mark Broadbent | last post by:
this might sound like an obvious question but I have found that usually these two evolve at the same time. One of the biggest reasons for creating the abstraction in the first place (in my opinion) is to create a reusable framework that can be applied to similar projects. However I have found that if an abstraction is created first during the development phase, when the implementation is occurring, the functionality or intended behaviour...
13
5067
by: Jason Huang | last post by:
Hi, Would someone explain the following coding more detail for me? What's the ( ) for? CurrentText = (TextBox)e.Item.Cells.Controls; Thanks. Jason
5
2891
by: Tor Erik | last post by:
I would be surprised if it is the naive: m = 0 s1 = "me" s2 = "locate me" s1len = len(s1) s2len = len(s2) found = False while m + s1len <= s2len:
98
4638
by: tjb | last post by:
I often see code like this: /// <summary> /// Removes a node. /// </summary> /// <param name="node">The node to remove.</param> public void RemoveNode(Node node) { <...> }
184
7145
by: jim | last post by:
In a thread about wrapping .Net applications using Thinstall and Xenocode, it was pointed out that there may be better programming languages/IDEs to use for the purpose of creating standalone, single executable apps. My goal is to create desktop applications for use on Windows XP+ OSs that are distributed as single executables that do not require traditional install packages to run. I would like to use a drag and drop UI development...
0
9820
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10548
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10591
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10254
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9374
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7792
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6979
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5826
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
3115
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.