
Q: Algorithm/Solution for finding words in non delimited string

brock@gunter-smith.com
#1: Aug 10 '06
I'd like to be able to take a string and find all the words contained
within it, preferring the longest words possible. The letters stay in
sequence; we're not re-ordering the letters in the string.
Obviously the brute-force approach (which may be the only solution) is
to iterate through a dictionary file, searching for occurrences of each
entry within the string.
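
For illustration, the brute-force scan described above might look like the following minimal sketch. The function name find_occurrences and the tiny WORDS list are assumptions made up for this example; a real run would load a dictionary file instead.

```python
# Hypothetical toy dictionary; a real run would read a word list from a file.
WORDS = ["the", "them", "theme", "me", "meet", "ether", "her", "he"]

def find_occurrences(text, dictionary):
    """Return sorted (start, word) pairs for every dictionary word found in text."""
    hits = []
    for word in dictionary:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            # Resume one character later so overlapping matches are not missed.
            start = text.find(word, start + 1)
    return sorted(hits)

for start, word in find_occurrences("themeether", WORDS):
    print(start, word)
```

This costs one pass over the string per dictionary entry, which is the iteration count the question is hoping to reduce.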

If anyone has done anything similar to this, were any other methods
used to reduce the number of iterations required? For example, using a
list of common words that are not generally elements of other words,
so that they can be quickly broken out from the string. Or are there
libraries that may be of use in efficiently processing this type of search?

e.g. given the string "themeether", possible solutions might be
{'the','meet','her'} or {'theme','ether'}
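
One way to produce splits like the ones in this example is to walk the string from left to right, try every dictionary word that starts at the current position, and recurse on the remainder. The sketch below is only an assumption about how that could be done, not a method from this thread; segmentations and WORDS are hypothetical names, and the dictionary is a toy one.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real run would load a word list from a file.
WORDS = {"the", "them", "theme", "me", "meet", "ether", "her", "he"}

def segmentations(text, dictionary):
    """Return every way to split text into dictionary words, fewest words first."""
    words = frozenset(dictionary)
    max_len = max(map(len, words), default=0)

    @lru_cache(maxsize=None)
    def split_from(i):
        # All splits of text[i:], each represented as a tuple of words.
        if i == len(text):
            return ((),)
        found = []
        # No candidate can be longer than the longest dictionary word.
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            piece = text[i:j]
            if piece in words:
                found.extend((piece,) + rest for rest in split_from(j))
        return tuple(found)

    return sorted(split_from(0), key=len)

for split in segmentations("themeether", WORDS):
    print(split)
```

With this toy dictionary the result includes both splits from the example, plus ('the', 'me', 'ether'). Each set lookup replaces a pass over the whole dictionary, and the cache means each position in the string is expanded only once.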


Carl Vondrick
#2: Aug 10 '06

re: Q: Algorithm/Solution for finding words in non delimited string


brock@gunter-smith.com wrote:
Quote:
I'd like to be able to take a string and search within it for all words
(of the longest length possible) that are possibly contained within it
(in sequence, we're not re-ordering the letters in the string).
Obviously the brute force approach (which may be the only solution) is
to iterate through a dictionary file searching for occurrences of each
entry within the string.
It sounds like you are after an LCS (Longest Common Subsequence)
implementation. Just google for "longest common subsequence" and you'll
get a thousand ways to do it. Wikipedia has one that seems to work
well: http://en.wikipedia.org/wiki/Longest...quence_problem

LCS is used in diff algorithms.
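
For reference, a textbook dynamic-programming LCS can be sketched as below. This is a generic version written for illustration (the function name lcs is an assumption), not the code from the Wikipedia page. Note that a subsequence does not have to be contiguous, so an LCS match alone does not show that a word appears as an unbroken run of letters in the string.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("themeether", "meet"))
print(lcs("themeether", "teeth"))
```

The second call illustrates the caveat: every letter of "teeth" occurs in order in "themeether", yet "teeth" is not a substring of it, so a contiguity check would still be needed.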
Chung Leong
#3: Aug 11 '06

re: Q: Algorithm/Solution for finding words in non delimited string


brock@gunter-smith.com wrote:
Quote:
I'd like to be able to take a string and search within it for all words
(of the longest length possible) that are possibly contained within it
(in sequence, we're not re-ordering the letters in the string).
Obviously the brute force approach (which may be the only solution) is
to iterate through a dictionary file searching for occurrences of each
entry within the string.

If anyone has done anything similar to this, were there any other
methods used to reduce the number of iterations required like using a
list of common words that are not generally elements of other words
that can be quickly broken out from the string? Or are there libraries
that may be of use in efficiently processing this type of search?

e.g. given the string "themeether", possible solutions might be
{'the','meet','her'} or {'theme','ether'}
That's very similar to the Thai word-breaking problem. Is that what
you're trying to do, in fact? Here's a link listing some of the
approaches:

http://www.fi.muni.cz/~xantos/poster/#x1-3000
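
As a rough illustration of dictionary-based word breaking, here is a sketch of greedy longest matching with backtracking, a technique commonly described for scripts written without spaces. Whether the linked page covers this exact variant is not confirmed here; longest_match_split and WORDS are hypothetical names, and the dictionary is a toy one.

```python
# Hypothetical toy dictionary; a real word breaker would use a full lexicon.
WORDS = {"the", "them", "theme", "me", "meet", "ether", "her", "he"}

def longest_match_split(text, dictionary):
    """Split text greedily, longest word first; return None if no split exists."""
    words = frozenset(dictionary)
    max_len = max(map(len, words), default=0)

    def split_from(i):
        if i == len(text):
            return []
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                rest = split_from(j)
                if rest is not None:
                    return [text[i:j]] + rest
        # Nothing starting at i leads to a complete split, so backtrack.
        return None

    return split_from(0)

print(longest_match_split("themeether", WORDS))
```

This returns a single split rather than all of them, and without caching the backtracking can become slow on long inputs that have many dead ends.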
