Bytes IT Community

Comparing text strings

I have a monthly safety slogan competition which requires checking
each entry against a list of already submitted slogans. This takes forever
to do. I have 2 lists: this month's slogans and a master list of all
slogans.

Here is an example:
S1Master: An unsafe behavior can bring you down
S1Submitted: Unsafe behaviors can bring you down

I am thinking of removing the plural "s" and then counting
the number of words that are in both sentences, as a percentage of the
total number of words. A percentage greater than, say, 90% would be listed
as a match.

6 words are the same in both sentences, so 12 of the 13 total words match = 92.31%,
therefore this is a matched pair.
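
The arithmetic above can be checked with a short sketch. It is written in Python rather than Access, purely to verify the numbers. The names `normalize` and `overlap_score` are made up for illustration, and the rule "strip a trailing s from words longer than 3 letters" is an assumption standing in for the plural handling described above.

```python
def normalize(sentence):
    """Lowercase, split on whitespace, and strip a naive plural "s"."""
    words = []
    for word in sentence.lower().split():
        word = word.strip('.,!?;:"')
        # Assumption: only words longer than 3 letters lose a trailing "s",
        # and words ending in "ss" are left alone.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word:
            words.append(word)
    return words


def overlap_score(sentence_a, sentence_b):
    """Shared words counted once per sentence, over the total word count."""
    words_a = normalize(sentence_a)
    words_b = normalize(sentence_b)
    total = len(words_a) + len(words_b)
    if total == 0:
        return 0.0
    shared = set(words_a) & set(words_b)
    return 2 * len(shared) / total


score = overlap_score(
    "An unsafe behavior can bring you down",
    "Unsafe behaviors can bring you down",
)
print(f"{score:.2%}")  # 92.31%
```

With a 90% cutoff this pair would be flagged as a match. A word repeated inside one slogan is only counted once as shared here, so slogans with repeated words would need a decision on how to count them.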

I am trying to keep it simple but have enough accuracy; it's not
terrible if a few slip by. Will the above method work, and can I
achieve it without VBA? Please guide me.

Thanks
Steve
Nov 13 '05 #1
3 Replies


I don't know how you can do it without VBA. Also, do you want the
computer to figure this out for you or do you want to figure it out? To
keep it simple, some human interaction would be better.

For example, the 2 key words I saw in your example were "unsafe
behavior". You could run a query to find all master records that have
"unsafe behavior". Or you could create a query that finds all records
that contain "unsafe" and "behavior". See, here you are getting rid of
plurals. You control the keywords to search on.

Let's say the master table has a key value and the phrase. You have a
form called Form1 to input up to 6 keywords, in controls called
Key1..Key6. The keywords that you type in have a default value of
Null. You could create a query to select the phrase.

SELECT Phrase,
IIf(InStr([Phrase], IIf(Not IsNull(Forms!Form1!Key1), Forms!Form1!Key1, Chr(0))) > 0, 1, 0) AS Key1Cnt,
IIf(InStr([Phrase], IIf(Not IsNull(Forms!Form1!Key2), Forms!Form1!Key2, Chr(0))) > 0, 1, 0) AS Key2Cnt,
....etc.

What this does is it looks for the Key1 word in the phrase if there is a
Key1 value. Same for Key2. Does this for Key1 to Key6. If the key
word is found in the phrase, the value of the column is 1, if not found
or if the keyword to search is blank, the value is 0.

Save this as query1.

Now create another query based on query1. The first column is the phrase.
The second column is Key1Cnt + Key2Cnt + Key3Cnt...Key6Cnt. Turn Show off
for this column, set its criteria to >0, and sort it in descending order.
Save as query2. Run this query.

This will exclude all records from the master table where no keywords
matched and present those that did have matches, ordered by the number of
words that matched. This method does not account for misspelled words.
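
The logic of the two queries can be sketched outside Access to see what they produce. This is a rough Python illustration only, not the queries themselves. The names `keyword_hits` and `rank_matches` are hypothetical, and the case-insensitive matching is an assumption.

```python
def keyword_hits(phrase, keywords):
    """Count the non-blank keywords that occur somewhere in the phrase.

    Mirrors query1: each keyword contributes 1 if found, 0 if it is
    missing from the phrase or was left blank (None or "").
    """
    text = phrase.lower()  # assumption: matching ignores case
    return sum(1 for key in keywords if key and key.lower() in text)


def rank_matches(master_phrases, keywords):
    """Mirrors query2: drop phrases with no hits, sort the rest by hit count."""
    scored = [(keyword_hits(phrase, keywords), phrase) for phrase in master_phrases]
    matched = [pair for pair in scored if pair[0] > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return matched


master = [
    "Unsafe acts hurt",
    "Safety first",
    "An unsafe behavior can bring you down",
]
keys = ["unsafe", "behavior", None, None, None, None]  # Key1..Key6
for count, phrase in rank_matches(master, keys):
    print(count, phrase)
```

Because the search is a plain substring test, the keyword "behavior" also finds "behaviors", which is how this approach sidesteps the plural problem.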
Nov 13 '05 #2

Browsing these forums some more, I see that it might be important
to point out that I am using Access 2000. Also, it seems like the Split
function to break the string into an array is what I need, but I
don't know how to use it.
Thanks,
Nov 13 '05 #3

You actually might need to take a few steps back.... natural language
processing is simply NOT going to be this simple, no matter how simple it
appears to be nor how much you try to "dumb down" the approach.

As this is a pure programming task, the version does not matter quite as
much, but developing a language parser is a monumental task, the kind of
thing that might get you a doctorate from the Dept. of Brain and Cognitive
Sciences at MIT if you could get in there and could actually prove a theory
enough to have such a successful applied model. To do so in VBA is a
herculean task, worthy of only a masochist beyond the level of the
aforementioned doctoral candidate.

For more info on why you are moving into an area that will definitely
"stretch" your knowledge and probably your sanity, I would recommend "The
Language Instinct" by Steven Pinker. He will show you that once you think
you have plural forms figured out, that there are many exceptions to the "s"
suffix pluralization rules (ones not as easy to detect). And then he will
show you how other markers, such as those for case and tense, can be used to
confound your parsing efforts, as can reversals of the typical SVO order of
English that are commonly done with case markers, especially in slogans
which are meant to be catchy.
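
A quick way to see the plural problem is to run a naive "drop the trailing s" rule over a few word pairs. This is only an illustrative Python sketch. `strip_s` and `naive_match` are made-up names, and the word list is a handful of common English examples, not anything taken from the book.

```python
def strip_s(word):
    """Naive singularizer: drop one trailing "s" and nothing else."""
    return word[:-1] if word.endswith("s") else word


def naive_match(word_a, word_b):
    """Treat two words as the same if they agree after stripping "s"."""
    return strip_s(word_a.lower()) == strip_s(word_b.lower())


pairs = [
    ("behaviors", "behavior"),  # regular plural: the rule works
    ("men", "man"),             # irregular plural: missed
    ("feet", "foot"),           # irregular plural: missed
    ("children", "child"),      # irregular plural: missed
    ("boxes", "box"),           # "es" plural: leaves "boxe"
    ("policies", "policy"),     # "ies" plural: leaves "policie"
    ("as", "a"),                # different words wrongly merged
]
for plural, singular in pairs:
    print(plural, singular, naive_match(plural, singular))
```

Only the first pair is handled correctly. The irregular and "es"/"ies" plurals are missed, and the last pair shows two unrelated words being treated as the same.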

Summary -- this is way too big of a job for anything less than a team of
people, with an actual PhD on the team.
--
MichKa [MS]
(armchair linguist)
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies
Windows International Division

This posting is provided "AS IS" with
no warranties, and confers no rights.
Nov 13 '05 #4
