
dotlucene question: determine count of matched tokens?

I am trying to compare "titles" from two different lists and determine
the most likely match. When a similar entry appears in both lists, the
result works well, but I need to refine my search to filter out results
that are not in both.

I am currently getting a very high score for items that clearly do not
match. For instance, the item "Essence Of A Miracle : Elizabeth Neal" is
matching 100% with the item "power of a flower : wilkinson neal", which is
wrong: only three keywords match (I am not using stop words, so the
"of" and "a" are also matching).

What I would like to do is also compare the number of matching keywords with
the total number of keywords. In the case above, since three keywords match
and there are 6 keywords in total, only 50% of the keywords match. If I
could calculate that 50%, I could then adjust the score so that matches
that do not have a certain percentage of similar keywords (say 80%) would
be discarded. Is this possible? Can I get access to a keyword count (both
total and matching)?
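
To make the calculation concrete, here is a rough sketch of the ratio I have in mind, written in plain Python outside of the search library. It does not use any dotlucene API. The function names (tokenize, keyword_overlap, passes_threshold) and the simple regex tokenizer are my own assumptions for illustration, and a real analyzer may split titles differently.

```python
import re


def tokenize(title):
    """Lowercase the title and split it into alphanumeric keywords.

    Assumption: a simple regex tokenizer with no stop-word removal,
    so words like "of" and "a" are kept as keywords.
    """
    return re.findall(r"[a-z0-9]+", title.lower())


def keyword_overlap(query_title, candidate_title):
    """Return (matched, total, ratio) for the query's keywords.

    matched: number of query keywords that also occur in the candidate
    total:   number of keywords in the query title
    ratio:   matched / total, or 0.0 when the query has no keywords
    """
    query_tokens = tokenize(query_title)
    candidate_tokens = set(tokenize(candidate_title))
    matched = sum(1 for token in query_tokens if token in candidate_tokens)
    total = len(query_tokens)
    ratio = matched / total if total else 0.0
    return matched, total, ratio


def passes_threshold(query_title, candidate_title, min_ratio=0.8):
    """Keep a candidate only if enough of the query's keywords match."""
    _, _, ratio = keyword_overlap(query_title, candidate_title)
    return ratio >= min_ratio


if __name__ == "__main__":
    query = "Essence Of A Miracle : Elizabeth Neal"
    candidate = "power of a flower : wilkinson neal"
    print(keyword_overlap(query, candidate))   # (3, 6, 0.5)
    print(passes_threshold(query, candidate))  # False
```

With the two titles above this gives 3 matching keywords out of 6, so the pair would be discarded at an 80% cutoff. What I am missing is a way to get the same two counts from the search results themselves.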
Feb 1 '06 #1
