string difference and similarity

I've got a series of data like this:

Long Sleeve White P/C Sm 32/33
Long Sleeve White P/C Med 32/33
....

What I'd like to do is extract the differences and the similarity. In
this case:

similar: Long Sleeve White P/C

difference: Med 32/33

If I were writing a function, I'd probably compare increasingly
longer substrings, but I suspect PHP already has a function for
this. What would it be?

I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

Jeff
Nov 13 '08 #1

"Jeff" <jeff@spam_me_not.com> wrote in message
news:Qd******************************@earthlink.com...
I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

I haven't looked into that function yet. As an alternative, you could
split the strings into arrays of words and use array_intersect() or
array_diff(), or loop through one array and check whether each value is
in the second with in_array(). I'm not sure how your strings are created,
so it's hard to tell what would be appropriate, since:

I'm a crotchety old man.

is different from

Old man, I'm crotchety.

and not just by two characters. :^)
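
A minimal sketch of the array idea, assuming the strings can be split on
whitespace. compareWords() is a hypothetical helper name, not a PHP
built-in. Note that array_intersect() and array_diff() ignore word order,
so the trailing "32/33" counts as similar here rather than as part of the
difference:

```php
<?php
// Hypothetical helper: split both strings into words and compare the lists.
function compareWords(string $a, string $b): array
{
    $wordsA = preg_split('/\s+/', trim($a));
    $wordsB = preg_split('/\s+/', trim($b));

    return [
        // words of $a that also occur anywhere in $b
        'similar'    => implode(' ', array_intersect($wordsA, $wordsB)),
        // words of $b that do not occur anywhere in $a
        'difference' => implode(' ', array_diff($wordsB, $wordsA)),
    ];
}

$r = compareWords(
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33'
);

printf("similar: %s\ndifference: %s\n", $r['similar'], $r['difference']);
// similar: Long Sleeve White P/C 32/33
// difference: Med
```

If the position of each word matters, a prefix comparison is probably a
better fit than this set-style comparison.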
Nov 13 '08 #2
On Thu, 13 Nov 2008 01:39:42 -0500, jeff@spam_me_not.com wrote:
If I were writing a function, I'd probably compare increasingly
longer substrings, but I'm thinking that php probably already has
functions for that. What is that?

I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

Jeff
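If your string comparison needs are all as simple as your example,
using strspn() could probably suit your needs. One caveat: called as
strspn($s1, $s2), it treats $s2 as a set of allowed characters and
does not compare the strings position by position, so apply it to the
XOR of the two strings instead. Perhaps something like:

<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// XOR yields "\0" at every position where both strings hold the same
// byte, so the run of leading "\0" bytes is the common prefix length
$matchlen = strspn($s1 ^ $s2, "\0");

// everything before the 1st non-matching char
$same = substr($s1, 0, $matchlen);

// everything from the 1st non-matching char onwards
$diff = substr($s2, $matchlen);

printf("Same: [%s]\nDiff: [%s]", $same, $diff);
?>

strspn() with a "\0" mask gives us the length of the initial matching
segment of the two strings. The comparison is byte by byte, so the
match can end in the middle of a word. When writing a function, I'd
check whether the strings are equal first and return early with the
string, or whatever suits your needs.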
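
For a whole series of strings, the same idea could be wrapped in a
function. This is only a sketch under some assumptions: commonPrefix()
is a hypothetical helper name, the third item in the list is made up
for illustration, and the comparison is byte-wise (not multibyte-aware).
It relies on the string XOR operator producing "\0" wherever two
strings hold the same byte:

```php
<?php
// Hypothetical helper: longest common prefix of a list of strings.
function commonPrefix(array $strings): string
{
    if ($strings === []) {
        return '';
    }

    $prefix = array_shift($strings);
    foreach ($strings as $s) {
        // Count the leading positions where $prefix and $s agree.
        $len = strspn($prefix ^ $s, "\0");
        $prefix = substr($prefix, 0, $len);
    }

    return $prefix;
}

$items = [
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33',
    'Long Sleeve White P/C Lg 34/35', // made-up item for illustration
];

$same = commonPrefix($items);
foreach ($items as $item) {
    // The differing part is whatever follows the shared prefix.
    printf("Same: [%s] Diff: [%s]\n", rtrim($same), substr($item, strlen($same)));
}
```

Identical strings need no special case here: the prefix is then the
whole string and the differing part is empty.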

If you need a more complex algorithm, see the manual:

<URL:http://php.net/manual/en/function.levenshtein.php>
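
As a rough illustration of what that function reports (a sketch, not a
recommendation for this particular data), levenshtein() returns an edit
distance and the related built-in similar_text() returns a count of
matching characters plus a percentage. Neither returns the common text
itself:

```php
<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// Number of single-character insertions, deletions and substitutions
// needed to turn $s1 into $s2.
$distance = levenshtein($s1, $s2);

// similar_text() returns the number of matching characters and stores
// a similarity percentage in its third argument.
$matching = similar_text($s1, $s2, $percent);

printf(
    "Edit distance: %d\nMatching chars: %d (%.1f%%)\n",
    $distance,
    $matching,
    $percent
);
```

These scores are useful for ranking how close two strings are, while the
prefix approach is what extracts the shared and differing parts.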
--
Curtis
$email = str_replace('sig.invalid', 'gmail.com', $from);
Nov 13 '08 #3
Jeff wrote:
What I'd like to do is extract the differences and the similarity. In
this case:

similar: Long Sleeve White P/C

difference: Med 32/33

The documentation looks somewhat sparse, but this package supports inline
diffs:

http://pear.php.net/package/Text_Diff


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My bain-marie humor site: http://www.demogracia.com
--
Nov 13 '08 #4
