Bytes | Software Development & Data Engineering Community

Algorithm used by difflib.get_close_matches


Hi all,

Does anyone know whether this function uses edit distance? If not,
which algorithm is it using?

Regards,

Guillermo
Sep 2 '08 #1
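For reference, "edit distance" in the question usually means Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming version, included only to pin down that term. The function name `levenshtein` is hypothetical; difflib itself does not expose any such function.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper: classic Levenshtein distance, i.e. the minimum
    number of single-character insertions, deletions and substitutions
    that turn a into b. Not part of difflib."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory is linear in `len(b)`.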
On Sep 2, 2:17 pm, Guillermo <guillermo.lis...@googlemail.com> wrote:
> Hi all,
>
> Does anyone know whether this function uses edit distance? If not,
> which algorithm is it using?
>
> Regards,
>
> Guillermo
help(difflib.get_close_matches) will give you your first clue...
Sep 2 '08 #2
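As a rough illustration of where `help(difflib.get_close_matches)` leads, here is a sketch of how `get_close_matches(word, possibilities, n=3, cutoff=0.6)` can be expressed on top of `difflib.SequenceMatcher`. Each candidate is scored with `ratio()`. The cheaper upper bounds `real_quick_ratio()` and `quick_ratio()` are checked first, and the `n` best scores at or above `cutoff` are returned, best first. It is written to mirror the stdlib's behaviour, but `close_matches_sketch` is a hypothetical name and the real function's argument validation is left out.

```python
import difflib
import heapq


def close_matches_sketch(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical re-statement of difflib.get_close_matches: score every
    candidate with SequenceMatcher.ratio() and keep the n best >= cutoff."""
    scored = []
    s = difflib.SequenceMatcher()
    s.set_seq2(word)  # SequenceMatcher caches info about seq2, so fix word there
    for x in possibilities:
        s.set_seq1(x)
        # Cheap upper bounds first; only compute the real ratio if they pass.
        if (s.real_quick_ratio() >= cutoff
                and s.quick_ratio() >= cutoff
                and s.ratio() >= cutoff):
            scored.append((s.ratio(), x))
    # Highest scores first; equal scores are ordered by comparing the strings.
    return [x for score, x in heapq.nlargest(n, scored)]


words = ["ape", "apple", "peach", "puppy"]
print(close_matches_sketch("appel", words))       # ['apple', 'ape']
print(difflib.get_close_matches("appel", words))  # ['apple', 'ape']
```

Here "apple" scores 0.8 and "ape" 0.75, while "peach" and "puppy" both score 0.4 and fall below the default cutoff. So the answer to the original question is that candidates are ranked by this ratio, not by an edit distance.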
On Tue, 2 Sep 2008 06:17:37 -0700 (PDT), Guillermo wrote:
> Does anyone know whether this function uses edit distance? If not,
> which algorithm is it using?
The following passage comes from difflib.py:

SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable. The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the
hyperbolic name "gestalt pattern matching". The basic idea is to find
the longest contiguous matching subsequence that contains no "junk"
elements (R-O doesn't address junk). The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to
people.

HTH.

--
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/
Sep 2 '08 #3
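The passage quoted from difflib.py can be illustrated with a deliberately naive sketch. Find the longest common contiguous block, then recurse on the pieces to its left and to its right. Score the pair as 2*M/T, where M is the number of matched elements and T is the combined length of both sequences (the formula documented for `SequenceMatcher.ratio()`). Junk handling and the "autojunk" heuristic for sequences of 200+ elements are left out, and all function names are hypothetical. For short inputs without junk, it should agree with `SequenceMatcher(None, a, b).ratio()`, because ties between equally long matches are broken the same way: earliest position in `a`, then earliest in `b`.

```python
import difflib


def longest_match(a, b):
    """Hypothetical helper: brute-force longest common contiguous block.
    Returns (i, j, k) with a[i:i+k] == b[j:j+k]; among equally long blocks
    it prefers the smallest i, then the smallest j."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[2]:  # strict '>' keeps the earliest tie
                best = (i, j, k)
    return best


def matching_count(a, b):
    """Take the longest block, then recurse on the pieces to its left and
    to its right (Ratcliff/Obershelp idea, no junk handling)."""
    i, j, k = longest_match(a, b)
    if k == 0:
        return 0
    return (k
            + matching_count(a[:i], b[:j])
            + matching_count(a[i + k:], b[j + k:]))


def gestalt_ratio(a, b):
    """Similarity in [0, 1]: 2 * matched elements / total elements."""
    total = len(a) + len(b)
    return 2.0 * matching_count(a, b) / total if total else 1.0


print(gestalt_ratio("appel", "apple"))                           # 0.8
print(difflib.SequenceMatcher(None, "appel", "apple").ratio())   # 0.8
```

The brute-force search is far slower than difflib's own implementation, but it makes the recursion visible. It also works on lists of hashable items, e.g. lists of words, not just strings.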


Similar topics

- shuhsien: Hi, I am confused by the junk parameter in the difflib.sequencematcher. I thought it would simply ignore everything that's returned true by the junk function. However, I have results as follows:...
- Humpdydum: Can anyone try the following in their python interpreter? These give correct output: >>> print list(ndiff(,)) >>> print list(ndiff(,)) >>> print list(ndiff(,))
- John Henry: I am just wondering what's with get_close_matches() in difflib. What's the magic? How fuzzy do I need to get in order to get a match?
- Neilen Marais: Hi I'm trying to compare some text to find differences other than whitespace. I seem to be misunderstanding something, since I can't even get a basic example to work: In : d =...
- stefaan: Hello List, I am using difflib.HtmlDiff and it provides great functionality. Unfortunately it is too slow for my purpose. Is anyone aware of an alternative ? - a C-implementation lying around...
- whitewave: Hi Guys, I'm a bit confused in difflib. In most cases, the differences found using difflib works well but when I have come across the following set of text: .... problem even for the simple...
- krishnakant Mane: hello all, I have a bit of a confusing question. firstly I wanted a library which can do an svn like diff with two files. let's say I have file1 and file2 where file2 contains some thing which...
- erikcw: Hi, I'm trying to create an undo/redo feature for a webapp I'm working on (django based). I'd like to have an undo/redo function. My first thought was to use the difflib to generate a diff to...
- n00m: from random import randint s1 = '' s2 = '' for i in xrange(1000): s1 += chr(randint(97,122)) s2 += chr(randint(97,122)) print s1