string similarity comparison

I'm looking for a method to compare two strings and grade them for
similarity.

My idea is to strip out common words and punctuation and create a checksum
of each remaining string. I would then compare the checksums and if they are
close then there's a potential match (to be judged by user interaction.)
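
A minimal sketch of the "strip out common words and punctuation" step, assuming ASCII text and a tiny, made-up stop-word list. The helper name normalize_for_comparison() is hypothetical. The sketch stops at normalization, because checksums such as crc32() or md5() generally do not give numerically close values for similar inputs, so the cleaned strings themselves are the better thing to compare.

```php
<?php
// Hypothetical helper: lowercases, strips punctuation, and drops common
// words so that only the distinguishing words are left. ASCII only.
function normalize_for_comparison(string $text, array $stopWords): string
{
    // Replace anything that is not a letter, digit, or whitespace with a space.
    $text = preg_replace('/[^a-z0-9\s]+/', ' ', strtolower($text));
    // Split on runs of whitespace, discarding empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Keep only the words that are not in the stop-word list.
    return implode(' ', array_diff($words, $stopWords));
}

$stopWords = ['the', 'a', 'an', 'of', 'and'];  // assumed list, far from complete
echo normalize_for_comparison('The Return of the King!', $stopWords), "\n";  // return king
```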

Can someone suggest an existing function or class to perform such a task?

Thanks!
Jul 17 '05 #1
Bosconian wrote:
I'm looking for a method to compare two strings and grade them for
similarity.


http://docs.php.net/en/function.levenshtein.html

or

http://docs.php.net/en/function.similar-text.html
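
A quick sketch of how the two functions differ, using made-up sample words. levenshtein() gives an edit distance (lower means closer), while similar_text() gives a count of matching characters plus an optional percentage (higher means closer).

```php
<?php
// Edit distance: the number of single-character inserts, deletes, or
// substitutions needed to turn one string into the other.
$distance = levenshtein('kitten', 'sitting');
echo "levenshtein: $distance\n";   // 3

// similar_text() returns the number of matching characters and fills the
// optional third argument with a similarity percentage.
$matching = similar_text('World', 'Word', $percent);
printf("similar_text: %d matching, %.1f%%\n", $matching, $percent);   // 4 matching, 88.9%
```

Both functions are case-sensitive, so it may be worth lowercasing both strings before comparing.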

Cheers,
Nicholas Sherlock
Jul 17 '05 #2
"Nicholas Sherlock" <n_********@hotmail.com> wrote in message
news:d4**********@lust.ihug.co.nz...
Bosconian wrote:
I'm looking for a method to compare two strings and grade them for
similarity.


http://docs.php.net/en/function.levenshtein.html

or

http://docs.php.net/en/function.similar-text.html

Cheers,
Nicholas Sherlock


Nicholas, I'm embarrassed to say that I didn't check php.net before posting
my message. Shame on me, but thanks for the tip!
Jul 17 '05 #3
BTW, would you happen to know if this can be done at the query level?

"Nicholas Sherlock" <n_********@hotmail.com> wrote in message
news:d4**********@lust.ihug.co.nz...
Bosconian wrote:
I'm looking for a method to compare two strings and grade them for
similarity.


http://docs.php.net/en/function.levenshtein.html

or

http://docs.php.net/en/function.similar-text.html

Cheers,
Nicholas Sherlock

Jul 17 '05 #4
Bosconian wrote:
"Nicholas Sherlock" <n_********@hotmail.com> wrote in message
news:d4**********@lust.ihug.co.nz...
Bosconian wrote:
I'm looking for a method to compare two strings and grade them for
similarity.


http://docs.php.net/en/function.levenshtein.html

or

http://docs.php.net/en/function.similar-text.html

BTW, would you happen to know if this can be done at the query level?


Ah, do you want to do something like this made-up query:

SELECT * FROM mytable WHERE text IS SORT OF SIMILAR TO $mysearch ?

If so, you can't do this with PHP functions. You may be able to find an
add-on for your database server which will add functionality like this,
but I don't think that it comes standard with any databases.

Cheers,
Nicholas Sherlock
Jul 17 '05 #6
Nicholas Sherlock wrote:
Bosconian wrote: <snip>
http://docs.php.net/en/function.similar-text.html

BTW, would you happen to know if this can be done at the query level?

Ah, do you want to do something like this made-up query:

SELECT * FROM mytable WHERE text IS SORT OF SIMILAR TO $mysearch ?

If so, you can't do this with PHP functions. You may be able to find an add-on for your database server which will add functionality like this, but I don't think that it comes standard with any databases.


MySQL supports user-defined functions too, but in SQLite it is easy
AFAIK (never tried): you can call PHP user functions from within a query.
<http://in2.php.net/sqlite>
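
A minimal sketch of that idea, assuming the sqlite3 extension is enabled and using its SQLite3 class (not necessarily the API the linked page documented at the time). The table contents and the cut-off of 2 are made up for illustration.

```php
<?php
// In-memory database with a made-up table, mirroring the mock query above.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE mytable (id INTEGER PRIMARY KEY, text TEXT)');
$db->exec("INSERT INTO mytable (text) VALUES ('color'), ('colour'), ('collar'), ('banana')");

// Register a PHP callback as a two-argument SQL function named levenshtein.
$db->createFunction('levenshtein', function ($a, $b) {
    return levenshtein((string) $a, (string) $b);
}, 2);

// The SQL function can now be used like any built-in one.
$stmt = $db->prepare(
    'SELECT text, levenshtein(text, :q) AS distance
       FROM mytable
      WHERE levenshtein(text, :q) <= 2
      ORDER BY distance, text'
);
$stmt->bindValue(':q', 'color', SQLITE3_TEXT);

$result = $stmt->execute();
$matches = [];
while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
    $matches[] = $row['text'];
}
print_r($matches);   // color, colour, collar
```

The callback runs once per row, so this is still a full scan of the table; it only moves the comparison into the query.
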
--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #7
R. Rajesh Jeba Anbiah wrote:
<http://in2.php.net/sqlite>


To avoid having everyone over for a party at _your_ local server, you
should trim your url: <http://php.net/sqlite> (it then jumps to a local
server appropriate for the visitor).
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #8
Ewoud Dronkert wrote:
R. Rajesh Jeba Anbiah wrote:
<http://in2.php.net/sqlite>
To avoid having everyone over for a party at _your_ local server, you should trim your url: <http://php.net/sqlite> (it then jumps to a local server appropriate for the visitor).


Oh, yes.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #9
"Ewoud Dronkert" <fi*******@lastname.net.invalid> wrote in message
news:42*********************@dreader4.news.xs4all.nl...
R. Rajesh Jeba Anbiah wrote:
<http://in2.php.net/sqlite>


To avoid having everyone over for a party at _your_ local server, you
should trim your url: <http://php.net/sqlite> (it then jumps to a local
server appropriate for the visitor).
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/


I usually use the server in Finland. Less traffic than the stateside
servers.
Jul 17 '05 #10
Chung Leong wrote:
"Ewoud Dronkert" <fi*******@lastname.net.invalid> wrote in message
news:42*********************@dreader4.news.xs4all.nl...
R. Rajesh Jeba Anbiah wrote:
<http://in2.php.net/sqlite>


To avoid having everyone over for a party at _your_ local server, you should trim your url: <http://php.net/sqlite> (it then jumps to a local server appropriate for the visitor).


I usually use the server in Finland. Less traffic than the stateside
servers.


I think the mirror redirection is random and buggy. For me, it
often redirects to a heavy-traffic mirror.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #11
"Nicholas Sherlock" <n_********@hotmail.com> wrote in message
news:d4**********@lust.ihug.co.nz...
Bosconian wrote:
"Nicholas Sherlock" <n_********@hotmail.com> wrote in message
news:d4**********@lust.ihug.co.nz...
Bosconian wrote:

I'm looking for a method to compare two strings and grade them for
similarity.

http://docs.php.net/en/function.levenshtein.html

or

http://docs.php.net/en/function.similar-text.html

BTW, would you happen to know if this can be done at the query level?


Ah, do you want to do something like this made-up query:

SELECT * FROM mytable WHERE text IS SORT OF SIMILAR TO $mysearch ?

If so, you can't do this with PHP functions. You may be able to find an
add-on for your database server which will add functionality like this,
but I don't think that it comes standard with any databases.

Cheers,
Nicholas Sherlock


Something like your mock query makes sense... kind of a LIKE clause on
steroids. I'm surprised MySQL doesn't support it.

In my case it's not a big deal. I'm only dealing with a couple hundred
records at the most. I can simply loop through the recordset (like the
levenshtein php.net example) and flag any/all records with a distance of
less than 10 or whatever.
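
Roughly what that loop could look like. The $records array is a hypothetical stand-in for the fetched recordset, and the cut-off of 3 is an assumption that would need tuning against real data.

```php
<?php
// Hypothetical stand-in for the rows fetched from the database.
$records = [
    'Acme Widgets Inc.',
    'Acme Widget Inc',
    'Apex Gadgets Ltd.',
    'Acme Wigdets Inc.',
];
$needle = 'Acme Widgets Inc.';
$maxDistance = 3;   // assumed cut-off

$candidates = [];
foreach ($records as $record) {
    // levenshtein() is case-sensitive, so compare lowercased copies.
    $distance = levenshtein(strtolower($needle), strtolower($record));
    if ($distance <= $maxDistance) {
        $candidates[$record] = $distance;
    }
}

asort($candidates);   // smallest distance (closest match) first
print_r($candidates);
```

The surviving candidates can then be shown to the user for the final yes/no decision.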
Jul 17 '05 #12
