473,395 Members | 1,458 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Q: Algorithm/Solution for finding words in non delimited string

I'd like to be able to take a string and search within it for all words
(of the longest length possible) that are possibly contained within it
(in sequence, we're not re-ordering the letters in the string).
Obviously the brute force approach (which may be the only solution) is
to iterate through a dictionary file searching for occurances of each
entry within the string.

If anyone has done anything similar to this, were there any other
methods used to reduce the number of iterations required like using a
list of common words that are not generally elements of other words
that can be quickly broken out from the string? Or are there libraries
that may be of use in efficiently processing this type of search?

e.g. given the string "themeether", possible solutions might be
{'the','meet','her'} or {'theme','ether'}

Aug 10 '06 #1
2 1789
br***@gunter-smith.com wrote:
I'd like to be able to take a string and search within it for all words
(of the longest length possible) that are possibly contained within it
(in sequence, we're not re-ordering the letters in the string).
Obviously the brute force approach (which may be the only solution) is
to iterate through a dictionary file searching for occurances of each
entry within the string.
It sounds like you are after a LCS (Longest Common Subsequence)
implementation. Just google for "longest common subsequence" and you'll
get a thousand ways to do it. Wikipedia has one that seems to work
well: http://en.wikipedia.org/wiki/Longest...quence_problem

LCS is used in diff algorithms.
If anyone has done anything similar to this, were there any other
methods used to reduce the number of iterations required like using a
list of common words that are not generally elements of other words
that can be quickly broken out from the string? Or are there libraries
that may be of use in efficiently processing this type of search?

e.g. given the string "themeether", possible solutions might be
{'the','meet','her'} or {'theme','ether'}
Aug 10 '06 #2
br***@gunter-smith.com wrote:
I'd like to be able to take a string and search within it for all words
(of the longest length possible) that are possibly contained within it
(in sequence, we're not re-ordering the letters in the string).
Obviously the brute force approach (which may be the only solution) is
to iterate through a dictionary file searching for occurances of each
entry within the string.

If anyone has done anything similar to this, were there any other
methods used to reduce the number of iterations required like using a
list of common words that are not generally elements of other words
that can be quickly broken out from the string? Or are there libraries
that may be of use in efficiently processing this type of search?

e.g. given the string "themeether", possible solutions might be
{'the','meet','her'} or {'theme','ether'}
That's very similiar to the Thai word-breaking problem. Is that what
you're trying to do, in fact? There's a link listing some of the
approaches:

http://www.fi.muni.cz/~xantos/poster/#x1-3000

Aug 11 '06 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: John van Terheijden | last post by:
Hi. I'm trying to make a conversion algorithm that colors even and odd words in a HTML string with <div> tags. // input text with all sorts of HTML tags and whitespace $str = "<h1>Title...
6
by: Jack Smith | last post by:
Hello, any help appreciated with following problem. I figured out the algorithm (I think), just having trouble proving it is optimal. Suppose we are given n tasks each of which takes 1 unit...
4
by: m sergei | last post by:
I am not asking for code but wanted help with understanding the algorithm to permute all characters of a string. say string is "ABCD" I want to know the algorithm for finding all permutations of...
5
by: Paolino | last post by:
I have a self organizing net which aim is clustering words. Let's think the clustering is about their 2-grams set. Words then are instances of this class. class clusterable(str): def...
18
by: JohnR | last post by:
From reading the documentation, this should be a relatively easy thing. I have an arraylist of custom class instances which I want to search with an"indexof" where I'm passing an instance if the...
1
by: hn.ft.pris | last post by:
"Forward maximum match" mean that there is a dictionary contains thousands of items which are single words. For example, "kid", "nice", "best"... And here get a string "kidnicebs", the forward...
4
by: sklett | last post by:
I realize this could be a little off-topic, but there are some great minds on this NG and I hope you can let me slide this time ;0) I'm designing our system to manage what products can fit in...
4
prometheuzz
by: prometheuzz | last post by:
Hello (Java) enthusiasts, In this article I’d like to tell you a little bit about graphs and how you can search a graph using the BFS (breadth first search) algorithm. I’ll address, and...
6
by: pj | last post by:
Hi, I 'm currently writing a program that performs transliteration (i.e., converts greek text written using the english alphabet to "pure" greek text using the greek alphabet) as part of my...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.