Say you are allowed up to 3 errors in a string match.
An error is defined as having to remove or add a single character.
Replacing a character counts as 2 errors (one removal plus one addition).
The search is also case-insensitive for now.
So abcd and abc have 1 error
bcd and abcde have 2 errors
abcd and bcde have 2 errors
ace and abde have 3 errors
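To make the counting rule concrete, here is a minimal sketch of one way to compute it, assuming the error count is the fewest single-character additions/removals needed to turn one string into the other. The function name `indel_distance` is just a placeholder, and this is a standard dynamic-programming approach rather than anything specific to this problem:

```python
def indel_distance(a: str, b: str) -> int:
    """Fewest single-character additions/removals needed to turn a into b."""
    a, b = a.lower(), b.lower()  # the search is case-insensitive
    # prev[j] = errors between the part of a handled so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # against an empty b, all i characters must be removed
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])  # characters agree: no new error
            else:
                # either remove ca or add cb; a replacement therefore costs 2
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(indel_distance("abcd", "abc"))   # 1
print(indel_distance("bcd", "abcde"))  # 2
print(indel_distance("abcd", "bcde"))  # 2
print(indel_distance("ace", "abde"))   # 3
```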
E.g. (contrived example), say the master list contains:
- Alpha
- Beta
- Gamma
- Delta
- Epsilon
- Zeta
- Eta
- Theta
- Iota
- Kappa
- Lambda
- Mu
- Nu
- Xi
- Omicron
- Pi
- Rho
- Sigma
- Tau
- Upsilon
- Phi
- Chi
- Psi
- Omega
eta with 1 error
delta with 1 error
beta with 2 errors
zeta with 2 errors
theta with 3 errors
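As a rough sketch of the filtering step against the master list: the query `elta` below is purely a hypothetical pick for illustration (the search term is not stated here), and `indel_distance` and `fuzzy_matches` are placeholder names. It assumes the error count is the fewest additions/removals between the query and the whole list entry:

```python
def indel_distance(a: str, b: str) -> int:
    """Fewest single-character additions/removals needed to turn a into b."""
    a, b = a.lower(), b.lower()  # case-insensitive comparison
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])
            else:
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


MASTER_LIST = [
    "Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Eta", "Theta",
    "Iota", "Kappa", "Lambda", "Mu", "Nu", "Xi", "Omicron", "Pi", "Rho",
    "Sigma", "Tau", "Upsilon", "Phi", "Chi", "Psi", "Omega",
]


def fuzzy_matches(query: str, candidates: list[str], max_errors: int = 3):
    """Return (errors, word) pairs within max_errors, fewest errors first."""
    scored = ((indel_distance(query, word), word) for word in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_errors)


# "elta" is a hypothetical query, not taken from the post
for errors, word in fuzzy_matches("elta", MASTER_LIST):
    print(f"{word.lower()} with {errors} error(s)")
```

With this particular query the sketch reports eta and delta at 1 error, beta and zeta at 2, and theta at 3. It also reports tau at 3 errors, since only whole words are compared here.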
The problem I have is: how do I determine when I should skip characters, add characters, etc.? This is particularly a problem when matching words that repeat the same character several times, e.g.:
banana - banna
construction - constution
construction - constitution (shouldn't match)
Is there an easy way to determine on a mismatch whether I should:
1. Ignore a character from source?
2. Ignore a character from search substring?
3. Backtrack in some other way?
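One possible way around making this choice locally is to fill in a table that tries both "ignore" options at every position and keeps the cheaper one, then walk back through the table to see which decisions were taken. The sketch below assumes whole-word comparison and the add/remove-only error rule above; `align` and its step labels are placeholder names, and it is one standard dynamic-programming approach, not the only one:

```python
def align(source: str, search: str):
    """Return (errors, steps) for one cheapest way to line up the two words."""
    s, q = source.lower(), search.lower()
    n, m = len(s), len(q)
    # cost[i][j] = fewest errors between s[:i] and q[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i  # ignore every source character so far
    for j in range(m + 1):
        cost[0][j] = j  # ignore every search character so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == q[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]
            else:
                cost[i][j] = 1 + min(cost[i - 1][j], cost[i][j - 1])
    # Walk back from the final cell to recover the decisions.
    steps = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == q[j - 1]:
            steps.append(f"match {s[i - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i - 1][j] <= cost[i][j - 1]):
            steps.append(f"ignore {s[i - 1]} from source")
            i -= 1
        else:
            steps.append(f"ignore {q[j - 1]} from search")
            j -= 1
    steps.reverse()
    return cost[n][m], steps


print(align("banana", "banna")[0])               # 1
print(align("construction", "constution")[0])    # 2
print(align("construction", "constitution")[0])  # 4, over the limit of 3
```

Because the table already holds the best result for every pair of prefixes, no explicit backtracking over alternatives is needed; the cost is roughly len(source) × len(search) steps per comparison.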
Or is there some way to 'hash' the strings so that the hashes have particular properties we can use to determine whether the strings are in fact similar?
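One candidate for such a 'hash' is a per-letter count of each word. This is only a sketch under the add/remove error rule above, with placeholder names `signature` and `min_errors`: since every added or removed character changes exactly one letter count by 1, the total difference in counts can never exceed the true error count. That makes it a cheap way to rule words out, not a way to confirm a match:

```python
from collections import Counter


def signature(word: str) -> Counter:
    """Letter counts of the lowercased word (ignores letter order)."""
    return Counter(word.lower())


def min_errors(sig_a: Counter, sig_b: Counter) -> int:
    """Lower bound on add/remove errors between the two original words."""
    letters = sig_a.keys() | sig_b.keys()
    return sum(abs(sig_a[c] - sig_b[c]) for c in letters)


MAX_ERRORS = 3
source = signature("construction")

# At least 4 errors, so this pair can be rejected without a full comparison.
print(min_errors(source, signature("constitution")))  # 4

# Only a lower bound: letter order is lost, so a low value proves nothing.
print(min_errors(signature("abc"), signature("cba")))  # 0
```

Signatures for the master list could be computed once up front, so that only words passing this check need the full character-by-character comparison.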