I'm searching for a library that does approximate string matching,
for example, looking up the word "motorcycle" in a dictionary and
getting back similar strings like "motorcicle".
Is there such a library?

el*******@hotma il.com wrote:
> This algorithm is called soundex. Here is one implementation example.
> http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213
> here is another: http://effbot.org/librarybook/soundex.htm

Soundex is *one* particular algorithm for approximate
string matching. It is optimised for matching
Anglo-American names (like Smith/Smythe), and is
considered to be quite old and obsolete for all but the
most trivial applications -- or so I'm told.
Soundex will not match arbitrary changes -- it will
match both cat and cet, but it won't match cat and mat.
A more sophisticated approximate string matching
algorithm will use the Levenshtein distance. You can
find a Useless implementation here: http://www.uselesspython.com/download.php?script_id=108
Given a function levenshtein(s1, s2) that returns the
distance between two strings, you could use it for
approximate matching like this:
def approx_matching(strlist, target, dist=1):
    """Matches approximately strings in strlist to
    a target string.

    Returns a list of strings, where each string
    matched is no further than an edit distance of
    dist from the target.
    """
    found = []
    for s in strlist:
        if levenshtein(s, target) <= dist:
            found.append(s)
    # Return the accumulated matches, not the last string examined.
    return found
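
The function above assumes a levenshtein(s1, s2) helper. What follows is only a minimal sketch of one, using the usual row-by-row dynamic programming approach; it is not the Useless Python script linked above, and the word list in the usage note is made up for illustration.

```python
def levenshtein(s1, s2):
    """Return the edit distance between s1 and s2.

    The distance is the minimum number of single-character
    insertions, deletions and substitutions that turn s1 into s2.
    """
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # iterate over the longer string, keep rows short
    # previous[j] is the distance between the s1 prefix seen so far and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, 1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # delete c1
                               current[j - 1] + 1,       # insert c2
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


# Hypothetical word list, just to show the helper in use.
words = ["motorcicle", "motorcycle", "bicycle"]
close = [w for w in words if levenshtein(w, "motorcycle") <= 1]
# close is ["motorcicle", "motorcycle"]
```

This takes time proportional to len(s1) * len(s2) per comparison, so scanning a large dictionary word by word may be slow; it is meant to show the idea rather than to be fast.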
--
Steven.

"javuchi" <ja*****@gmail. com> wrote:
> I'm searching for a library that does approximate string matching,
> for example, looking up the word "motorcycle" in a dictionary and
> getting back similar strings like "motorcicle".
> Is there such a library?

There is an algorithm called Soundex that replaces each word by a
4-character string, such that all words that are pronounced similarly
encode to the same string.
The algorithm is easy to implement; you can probably find one by Googling.
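
As a rough illustration of how little code it takes, here is a minimal sketch of the common American Soundex rules (first letter kept, consonants mapped to digits, repeats collapsed, result padded to four characters). It is not the code from any of the implementations mentioned in this thread, and the table name _CODES is just a name chosen for this example.

```python
# Digit assigned to each consonant group; vowels, y, h and w get no digit.
_CODES = {}
for digit, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                     ("4", "l"), ("5", "mn"), ("6", "r")):
    for letter in group:
        _CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of word (a letter plus 3 digits)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    last_code = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != last_code:
            result += code
        if c not in "hw":  # h and w do not separate two equal codes
            last_code = code
    return (result + "000")[:4]  # pad with zeros, then cut to 4 characters


print(soundex("motorcycle"), soundex("motorcicle"))  # M362 M362
print(soundex("cat"), soundex("cet"), soundex("mat"))  # C300 C300 M300
```

Because the first letter is always kept, two words that differ in their first letter never get the same code, which is why "cat" and "mat" do not match.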
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
javuchi wrote:
> I'm searching for a library that does approximate string matching,
> for example, looking up the word "motorcycle" in a dictionary and
> getting back similar strings like "motorcicle".
> Is there such a library?

agrep (approximate grep) allows for a certain number of errors, and
Python bindings exist ( http://www.bio.cam.ac.uk/~mw263/pyagrep.html ).
Or google for "agrep python".
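
To give an idea of what "allowing a certain number of errors" means, here is a small pure-Python sketch that tests whether a pattern occurs somewhere in a text with at most max_errors edits. It is only an illustration: the function name fuzzy_find is made up for this example, it does not use the pyagrep API, and it is not the algorithm agrep itself implements.

```python
def fuzzy_find(pattern, text, max_errors=1):
    """Return True if some substring of text is within max_errors
    insertions, deletions or substitutions of pattern."""
    # column[i] is the fewest edits needed to match pattern[:i] against
    # some substring of text that ends at the current position.
    column = list(range(len(pattern) + 1))
    if column[-1] <= max_errors:
        return True  # the pattern is short enough to match the empty string
    for ch in text:
        new_column = [0]  # an empty pattern prefix matches anywhere for free
        for i, p in enumerate(pattern, 1):
            cost = 0 if p == ch else 1
            new_column.append(min(column[i] + 1,          # extra text character
                                  new_column[i - 1] + 1,  # missing pattern character
                                  column[i - 1] + cost))  # match or substitute
        column = new_column
        if column[-1] <= max_errors:
            return True
    return False


print(fuzzy_find("motorcycle", "a red motorcicle for sale"))  # True
print(fuzzy_find("motorcycle", "a red bicycle for sale"))     # False
```

Unlike comparing whole words, this lets the match start anywhere in the text, which is closer to how a grep-style tool is used on lines of a file.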
Daniel
Tim Roberts wrote:
>> I'm searching for a library that does approximate string matching,
>> for example, looking up the word "motorcycle" in a dictionary and
>> getting back similar strings like "motorcicle".
>> Is there such a library?
>
> There is an algorithm called Soundex that replaces each word by a
> 4-character string, such that all words that are pronounced similarly
> encode to the same string.
> The algorithm is easy to implement; you can probably find one by Googling.

Python used to ship with a soundex module, but it was removed
in 1.6, for various reasons. Here's a replacement: http://orca.mojam.com/~skip/python/soundex.py
</F>