Hi all. I've been given an assignment to construct an algorithm that can find all words that occur within a given word. For example:
Given furthermore, solutions could be
form,
the,
further,
more,
theme
...and so on.
I thought about using the STL next_permutation() function, but it only finds permutation with the same amount of letters. Searching the dictionary for correct words isn't hard, I just can't start to figure out how to come up with all of the words. Any suggestions?
6 3295
something like this -
foreach( word in dictionary)
-
{
-
foreach ( letter in [A..Z] )
-
{
-
if( letter_count(word, letter)>letter_count(original_word, letter) )
-
next word;
-
}
-
print word;
-
}
-
ie if each letter occurs the same or less times in dictionay word that original word, it can be constructed from the letters of original word.
Calculation of letter_count(original_word,letter) probably should be moved outside the loop because is has the same value for any word_in_dictionary.
I looked at that algorithm, but it requires looking at each word in the dictionary, and then seeing if it can be made from the given word, and is a very slow algorithm (takes about 15seconds on my core 2 duo 2ghz!) I know the faster way would be to find all combinations of the given word and matching them against the dictionary because then you wouldn't have to check every single dictionary entry. I just can't figure out how to do the combinatorial algebra to divine each combination.
Banfa 9,065
Recognized Expert Moderator Expert
Try one on paper, for instance try finding the all the combinations available from 5 letters (A, B, C, D, E).
Do it by the number of combinations for each result length so find the number of combinations of length 1, then length 2 and 3, 4 and 5.
Look at how you did it and then try to duplicate it on your computer as an algorithm.
Be aware that if you have 2 letters the same in the initial word you risk producing some duplicate results. That is if the word is "free" and you are picking 2 letter combinations then there are 2 ways of picking both "fe" and "re".
You could also try look up the maths for combinations which is binomial coefficients.
Once you have the various permutations, you'll need to check if they're actual words. For an efficient way of doing this, check out this thread.
donbock 2,426
Recognized Expert Top Contributor
I looked at that algorithm, but it requires looking at each word in the dictionary, and then seeing if it can be made from the given word, and is a very slow algorithm (takes about 15seconds on my core 2 duo 2ghz!) I know the faster way would be to find all combinations of the given word and matching them against the dictionary because then you wouldn't have to check every single dictionary entry. I just can't figure out how to do the combinatorial algebra to divine each combination.
You may be disappointed by that strategy. Take your example: "furthermore". Your strategy requires you to first compose a list of all the 1-letter sequences, 2-letter sequences, ... and finally the 11-letter sequences that can be formed from those 11 letters and then check each against your dictionary to see if it is a word.
Notice that I said "sequence" rather than your term "combination". That's because "combination" is a technical term in this context. You should look up the terms "permutation" and "combination" to see which is appropriate for your problem. If you're not already familiar with it, you should also look up the "factorial" operator.
For an 11-letter word, there are 108,303,111 different sequences to compare against your dictionary! Odds are that's a lot more terms than are in your dictionary. This number comes from trusting the combination/permutation function in Excel.
A lot of times the best way to start a coding problem is to NOT code ... but think about the strategies you might use and then find ways to compare them.
Cheers,
donbock
donbock 2,426
Recognized Expert Top Contributor
something like this -
foreach( word in dictionary)
-
{
-
foreach ( letter in [A..Z] )
-
{
-
if( letter_count(word, letter)>letter_count(original_word, letter) )
-
next word;
-
}
-
print word;
-
}
-
ie if each letter occurs the same or less times in dictionay word that original word, it can be constructed from the letters of original word.
Calculation of letter_count(original_word,letter) probably should be moved outside the loop because is has the same value for any word_in_dictionary.
Another approach would be to sort the letters in the original word and then, for each dictionary word, compare the sorted original word against the sorted dictionary word. This is similar to Knuth's anagram algorithm in "Sorting and Searching". Notice that unlike Knuth, you are not seeking equality in this comparison. Presorting the dictionary would be a very good tactic if several input words were to be checked.
Cheers,
donbock
Sign in to post your reply or Sign up for a free account.
Similar topics |
by: TXSherry |
last post by:
Hi,
I cannot seem to wrap my brain around preg_replace. Though I've read
the help file backwords and forwards. :/ Hoping someone can give me
a solution here.
Problem: Given string 'str'...
|
by: Brynn |
last post by:
I want to thank the person that sent the nasty messages to my
contact.asp test page ... you have inspired me !!!
I have now added a bad_words filter both to my JSValidation script
(going up on...
|
by: teoryn |
last post by:
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:
*--beginning of file--*...
|
by: none |
last post by:
Hello there,
I have a table with many text and varchar fields and I would like to get
all distinct words from these filelds.
For example:
table Pet:
id int(10) unsigned
legende text
|
by: Peter Bungart |
last post by:
I have a text field in Access (XP) that lists the occurrences of various
plants, which are delimited within the field by commas (e.g., cactus,
mesquite, yucca, cliffrose). Is there a way to...
| |
by: Sling |
last post by:
I code in Rexx on the mainframe which has 2 built-in functions: word(s,i) &
words(s).
word(s,i) returns the ith word in the s(tring), and words(s) returns the
number of words within the s(tring)....
|
by: MN |
last post by:
Hello all -
I'm really at wits end on this as I'm not having much success with locating
a function or other options I have. I'm working on a message board for a
website and I'm needing to check...
|
by: vonclausowitz |
last post by:
Repost from an VB group.
Hi All,
I'm looking for a way to search for multiple words in a database.
There is however one but. The words have to be within a certain range
of each other.
For...
|
by: sumedh..... |
last post by:
In a compiler there are 36bits for a word and to store a character
8bits are needed. In this to store a character two words appended.
Then for storing k characters string,how many words are needed?
|
by: Hvid Hat |
last post by:
Hello
How should I go about marking certain words in a text? I've got a list of
words:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="Mark.xsl"?>
<Words>...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The...
| |
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
|
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...
| |