"Martin Schneider" <ma**************@illusion-factory.de> wrote in message
news:49************@individual.net...
Hi!
I have a database with approx. 2500 records. When entering a new record
I'd like to avoid duplicate entries that may differ slightly in spelling.
Currently I am calculating the Levenshtein Distance across all records and
offer the top ten matches. Unfortunately this is very slow on long
strings.
Is there a faster method to do this?
Best regards,
Martin
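The approach Martin describes can be sketched roughly as follows: compute the Levenshtein distance between the new entry and every existing record, then keep the ten closest matches. This is only an illustrative sketch, not his actual code; the function names are hypothetical.

```python
# Sketch of the described approach: edit distance against every
# record, then the ten nearest matches. O(R * m * n) overall for
# R records, which is why it gets slow on long strings.
import heapq

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def top_matches(new_entry: str, records: list[str], n: int = 10) -> list[str]:
    """Return the n records closest to new_entry by edit distance."""
    return heapq.nsmallest(n, records, key=lambda r: levenshtein(new_entry, r))
```

Every record is scanned in full, so the cost grows with both the number of records and the length of the strings compared.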
If you want to do such a detailed comparison against each and every record
in the database, then of course it will be slow. What can we say? Get a
faster PC or buy more RAM?
If we knew what was being stored in the database, we might be able to make
more helpful suggestions. So suppose the database stores contact details and
you want to add a new contact:
Jane Smith, 117 Waterloo Road, London, SE1 8UL, United Kingdom
Would I really want to go through and evaluate how closely Jane's name
matches that of someone called Jan Smitt from Norway before I was sure this
was not a duplicate? A cheap exact comparison on the country field alone
would rule him out without any string matching. However, we know nothing
about your application.