By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,212 Members | 1,178 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,212 IT Pros & Developers. It's quick & easy.

Algorithm used by difflib.get_close_match

P: n/a

Hi all,

Does anyone know whether this function uses edit distance? If not,
which algorithm is it using?

Regards,

Guillermo
Sep 2 '08 #1
Share this Question
Share on Google+
2 Replies


P: n/a
On Sep 2, 2:17*pm, Guillermo <guillermo.lis...@googlemail.comwrote:
Hi all,

Does anyone know whether this function uses edit distance? If not,
which algorithm is it using?

Regards,

Guillermo
help(difflib.get_close_matches) will give you your first clue...
Sep 2 '08 #2

P: n/a
On Tue, 2 Sep 2008 06:17:37 -0700 (PDT), Guillermo wrote:
Does anyone know whether this function uses edit distance? If not,
which algorithm is it using?
The following passage comes from difflib.py:

SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable. The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the
hyperbolic name "gestalt pattern matching". The basic idea is to find
the longest contiguous matching subsequence that contains no "junk"
elements (R-O doesn't address junk). The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to
people.

HTH.

--
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/
Sep 2 '08 #3

This discussion thread is closed

Replies have been disabled for this discussion.