
Approximate string matching

I'm searching for a library which does approximate string matching,
for example, searching a dictionary for the word "motorcycle" and
getting back similar strings like "motorcicle".

Is there such a library?

Nov 22 '05 #1


One algorithm for this is called Soundex. Here is one implementation:

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213

Here is another:
http://effbot.org/librarybook/soundex.htm

Nov 22 '05 #2

el*******@hotmail.com wrote:
This algorithm is called soundex. Here is one implementation example.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213

here is another:
http://effbot.org/librarybook/soundex.htm


Soundex is *one* particular algorithm for approximate
string matching. It is optimised for matching
Anglo-American names (like Smith/Smythe), and is
considered to be quite old and obsolete for all but the
most trivial applications -- or so I'm told.

Soundex will not match arbitrary changes -- it will
match both cat and cet, but it won't match cat and mat.

A more sophisticated approximate string matching
algorithm will use the Levenshtein distance. You can
find a Useless implementation here:

http://www.uselesspython.com/download.php?script_id=108

Given a function levenshtein(s1, s2) that returns the
distance between two strings, you could use it for
approximate matching like this:

def approx_matching(strlist, target, dist=1):
    """Approximately match strings in strlist to a target string.

    Returns a list of strings, where each string
    matched is no further than an edit distance of
    dist from the target.
    """
    found = []
    for s in strlist:
        # keep every string within dist edits of the target
        if levenshtein(s, target) <= dist:
            found.append(s)
    return found
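
The levenshtein(s1, s2) function assumed above is not shown in this post. As a rough sketch only (not the implementation at the link, which I have not reproduced), the classic dynamic-programming version could look like this. The word list at the bottom is made up for illustration.

```python
def levenshtein(s1, s2):
    """Return the edit distance between s1 and s2.

    Counts single-character insertions, deletions and substitutions.
    """
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the stored rows as short as possible

    # previous[j] is the distance between the first i-1 chars of s1
    # and the first j chars of s2
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (c1 != c2)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


# Hypothetical word list, filtered the same way as approx_matching does
words = ["motorcicle", "motorcycle", "motorboat", "bicycle"]
close = [w for w in words if levenshtein(w, "motorcycle") <= 1]
```

This keeps only two rows of the table in memory, so it needs space proportional to the shorter string, but it still compares the target against every word in the list.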

--
Steven.

Nov 22 '05 #4

"javuchi" <ja*****@gmail.com> wrote:

I'm searching for a library which does approximate string matching,
for example, searching a dictionary for the word "motorcycle" and
getting back similar strings like "motorcicle".

Is there such a library?


There is an algorithm called Soundex that replaces each word by a
4-character string, such that all words that are pronounced similarly
encode to the same string.

The algorithm is easy to implement; you can probably find one by Googling.
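
For illustration, here is a minimal sketch of one common variant (American Soundex). It assumes plain ASCII letters and is not taken from any of the implementations linked in this thread; the names soundex and _CODES are just illustrative.

```python
# Map each consonant to its Soundex digit; vowels, Y, H and W have no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit


def soundex(word):
    """Return a 4-character code: the first letter plus three digits."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    last_digit = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal digits
        digit = _CODES.get(ch, "")  # vowels and Y give "" and reset the run
        if digit and digit != last_digit:
            code += digit
        last_digit = digit
    return (code + "000")[:4]  # pad with zeros or truncate to 4 characters


same_code = soundex("motorcycle") == soundex("motorcicle")
```

Because the first letter is always kept as-is, two words that differ in their first letter never get the same code.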
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Nov 22 '05 #6

javuchi wrote:
I'm searching for a library which does approximate string matching,
for example, searching a dictionary for the word "motorcycle" and
getting back similar strings like "motorcicle".

Is there such a library?


agrep (approximate grep) allows for a certain number of errors, and
Python bindings exist (http://www.bio.cam.ac.uk/~mw263/pyagrep.html)

Or google for "agrep python".
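
If installing bindings is not an option, Python's standard library also has difflib, whose get_close_matches function ranks candidates by a similarity ratio rather than by counting errors. This is a different technique from agrep, shown only as a small sketch; the dictionary list below is made up for illustration.

```python
import difflib

# Hypothetical dictionary; in practice this would be your own word list
dictionary = ["motorcicle", "bicycle", "motorboat", "car"]

# n limits how many matches are returned; cutoff (0.0 to 1.0) is the
# minimum similarity ratio a candidate needs in order to be included
matches = difflib.get_close_matches("motorcycle", dictionary, n=3, cutoff=0.6)
```

Raising cutoff makes the match stricter; lowering it lets more distant words through.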

Daniel

Nov 22 '05 #8

Tim Roberts wrote:
I'm searching for a library which does approximate string matching,
for example, searching a dictionary for the word "motorcycle" and
getting back similar strings like "motorcicle".

Is there such a library?


There is an algorithm called Soundex that replaces each word by a
4-character string, such that all words that are pronounced similarly
encode to the same string.

The algorithm is easy to implement; you can probably find one by Googling.


Python used to ship with a soundex module, but it was removed
in 1.6, for various reasons. Here's a replacement:

http://orca.mojam.com/~skip/python/soundex.py

</F>

Nov 22 '05 #10
