I'm searching for a library that does approximate string matching:
for example, searching a dictionary for the word "motorcycle" and
getting back similar strings such as "motorcicle".
Is there such a library?
el*******@hotmail.com wrote: This algorithm is called soundex. Here is one implementation example.
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213
Here is another: http://effbot.org/librarybook/soundex.htm
Soundex is *one* particular algorithm for approximate
string matching. It is optimised for matching
Anglo-American names (like Smith/Smythe), and is
considered to be quite old and obsolete for all but the
most trivial applications -- or so I'm told.
Soundex will not match arbitrary changes -- it will
match both cat and cet, but it won't match cat and mat.
A more sophisticated approximate string matching
algorithm will use the Levenshtein distance. You can
find a Useless implementation here: http://www.uselesspython.com/download.php?script_id=108
Given a function levenshtein(s1, s2) that returns the
distance between two strings, you could use it for
approximate matching like this:
    def approx_matching(strlist, target, dist=1):
        """Approximately match strings in strlist against
        a target string.
        Returns a list of strings, where each string
        matched is no further than an edit distance of
        dist from the target.
        """
        found = []
        for s in strlist:
            if levenshtein(s, target) <= dist:
                found.append(s)
        # Return the accumulated matches, not the last string examined.
        return found
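For completeness, here is a minimal sketch of what such a levenshtein(s1, s2) could look like. It is not the implementation behind the link above, just the standard dynamic-programming formulation, keeping only two rows of the table at a time. The word list in the usage lines is made up for illustration.

```python
def levenshtein(s1, s2):
    """Return the edit distance (insertions, deletions, substitutions)
    between s1 and s2."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # previous[j] is the distance between the empty prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical word list, filtered at edit distance 1 from the target.
words = ["motorcicle", "motorcycles", "bicycle"]
close = [w for w in words if levenshtein(w, "motorcycle") <= 1]
print(close)  # ['motorcicle', 'motorcycles']
```

Note that this also catches the cat/mat case that Soundex misses, since a single substitution is an edit distance of 1.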
--
Steven.
"javuchi" <ja*****@gmail.com> wrote: I'm searching for a library which makes approximate string matching, for example, searching in a dictionary the word "motorcycle", but returns similar strings like "motorcicle".
Is there such a library?
There is an algorithm called Soundex that replaces each word by a
4-character code (a letter followed by three digits), such that words
that are pronounced similarly usually encode to the same string.
The algorithm is easy to implement; you can probably find one by Googling.
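To give an idea of how little code it takes, here is a minimal sketch of the usual American Soundex rules. It is for illustration only and is not the code behind any of the links in this thread; the names soundex and _CODES are my own choices.

```python
# Consonant groups that share a digit; vowels, H, W and Y get no digit.
_CODES = {}
for digit, letters in enumerate(
        ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character Soundex code, or '' if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                  # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # but its digit still blocks repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal digits
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev to ""
    return (result + "000")[:4]          # pad with zeros, cut to 4 characters


print(soundex("cat"), soundex("cet"), soundex("mat"))  # C300 C300 M300
print(soundex("motorcycle"), soundex("motorcicle"))    # M362 M362
```

Because the first letter is kept verbatim, two words that differ in their first letter never get the same code.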
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
javuchi wrote: I'm searching for a library which makes approximate string matching, for example, searching in a dictionary the word "motorcycle", but returns similar strings like "motorcicle".
Is there such a library?
agrep (approximate grep) tolerates a certain number of errors, and
Python bindings exist ( http://www.bio.cam.ac.uk/~mw263/pyagrep.html).
Or google for "agrep python".
Daniel
Tim Roberts wrote: I'm searching for a library which makes approximate string matching, for example, searching in a dictionary the word "motorcycle", but returns similar strings like "motorcicle".
Is there such a library?
There is an algorithm called Soundex that replaces each word by a 4-character string, such that all words that are pronounced similarly encode to the same string.
The algorithm is easy to implement; you can probably find one by Googling.
Python used to ship with a soundex module, but it was removed
in 1.6, for various reasons. Here's a replacement: http://orca.mojam.com/~skip/python/soundex.py
</F>