
string difference and similarity

  #1  
November 13th, 2008, 06:45 AM
Jeff
I've got a series of data like this:

Long Sleeve White P/C Sm 32/33
Long Sleeve White P/C Med 32/33
....

What I'd like to do is extract the differences and the similarity. In
this case:

similar: Long Sleeve White P/C

difference: Med 32/33


If I were writing a function, I'd probably compare increasingly
longer substrings, but I'm thinking that PHP probably already has
functions for that. What are they?

I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

Jeff
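Jeff's idea of comparing increasingly longer substrings can be sketched directly. Comparing one character at a time is equivalent to comparing ever-longer prefixes, and it avoids re-checking the part already known to match. This is a minimal sketch: common_prefix_length() is a hypothetical helper name, not a PHP built-in, and the comparison is byte-wise, so it assumes single-byte text such as the sample product lines.

```php
<?php
// Compare two strings one character at a time, growing the matching
// prefix until the first mismatch. Byte-wise: assumes single-byte text.
// common_prefix_length() is a hypothetical helper, not a PHP built-in.
function common_prefix_length(string $a, string $b): int
{
    $max = min(strlen($a), strlen($b));
    $len = 0;
    while ($len < $max && $a[$len] === $b[$len]) {
        $len++;
    }
    return $len;
}

$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

$n = common_prefix_length($s1, $s2);

// rtrim() drops the space that separates the shared words from the rest.
printf("similar: %s\ndifference: %s\n", rtrim(substr($s1, 0, $n)), substr($s2, $n));
```

For the two sample lines the shared prefix is 22 bytes, giving "Long Sleeve White P/C" and "Med 32/33". A purely character-level prefix can stop in the middle of a word, though (for example "Sm" vs "Sml").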
  #2  
November 13th, 2008, 07:05 AM
Jessica Griego

"Jeff" <jeff@spam_me_not.comwrote in message
news:QdSdnUrAlu0sVobUnZ2dnUVZ_vWdnZ2d@earthlink.co m...
> I found "xdiff_string_diff", but I don't really understand it or how I
> would get the common text.

I haven't looked into that function yet. As an alternative, you could turn
the strings into arrays (for example with explode()) and use
array_intersect() or array_diff(), or loop through one of the arrays
checking whether each value is in_array() of the second. I'm not sure how
your strings are created, so it's hard to tell what would be appropriate,
since:

I'm a crotchety old man.

is different from

Old man, I'm crotchety.

and not just by two characters. :^)
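Jessica's array suggestion, sketched at the word level. This assumes the words are separated by single spaces, which holds for the sample data but may not hold for all of it. Because array_intersect() and array_diff() treat the words as sets, position is ignored, which is exactly the word-order caveat above.

```php
<?php
// Word-level comparison using explode(), array_intersect() and array_diff().
// Assumes single-space separators; word order is NOT taken into account.
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

$w1 = explode(' ', $s1);
$w2 = explode(' ', $s2);

// Words present in both strings (array_intersect keeps the keys of $w1).
$common = array_intersect($w1, $w2);

// Words found only in the first string, and only in the second.
$only1 = array_diff($w1, $w2);
$only2 = array_diff($w2, $w1);

printf("common: %s\nonly in s1: %s\nonly in s2: %s\n",
    implode(' ', $common), implode(' ', $only1), implode(' ', $only2));
```

For the sample lines this reports "Long Sleeve White P/C 32/33" as common: the trailing "32/33" counts as shared even though it comes after the differing size, while "Sm" and "Med" are reported as unique to each line.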


  #3  
November 13th, 2008, 08:55 AM
Curtis

On Thu, 13 Nov 2008 01:39:42 -0500, jeff@spam_me_not.com wrote:
> Long Sleeve White P/C Sm 32/33
> Long Sleeve White P/C Med 32/33
> ...
>
> similar: Long Sleeve White P/C
>
> difference: Med 32/33

If all your string comparisons are as simple as your example, strspn()
will probably do the job. Perhaps something like:

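<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// length of the common prefix: XOR gives "\0" wherever the bytes match
$matchlen = strspn($s1 ^ $s2, "\0");

// shared leading text (here it includes the trailing space)
$same = substr($s1, 0, $matchlen);

// rest of $s2, starting at the 1st non-matching char
$diff = substr($s2, $matchlen);

printf("Same: [%s]\nDiff: [%s]\n", $same, $diff);
?>

$s1 ^ $s2 XORs the two strings byte by byte, so every position where
they agree becomes a NUL byte, and strspn() with a "\0" mask returns the
length of the common prefix. (Calling strspn($s1, $s2) directly only
counts how many leading characters of $s1 appear anywhere in $s2, which
matches the prefix here only by coincidence.) When writing a function,
I'd check whether the strings are equal first, and preemptively return
the string or whatever suits your needs.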
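Building on the equality check suggested above, here is a sketch of a reusable function. split_common_prefix() is a hypothetical name. It takes the common-prefix length from strspn($a ^ $b, "\0") (XOR-ing two strings yields a NUL byte wherever they agree), then backs up to a word boundary so that pairs such as "Sm"/"Sml" are not split mid-word. It assumes space-separated words and PHP 7.1 or later for the [$x, $y] destructuring.

```php
<?php
// Hypothetical helper: split two strings into their shared leading words and
// the rest of the second string. Assumes words are separated by spaces.
function split_common_prefix(string $a, string $b): array
{
    // Equal strings: everything is shared, nothing differs.
    if ($a === $b) {
        return [$a, ''];
    }

    // Common byte prefix: $a ^ $b has "\0" wherever the two strings agree.
    $len = strspn($a ^ $b, "\0");

    // The prefix is a whole-word match only if, in both strings, it ends
    // at the end of the string or right before a space.
    $atBoundary = ($len === strlen($a) || $a[$len] === ' ')
        && ($len === strlen($b) || $b[$len] === ' ');

    if (!$atBoundary) {
        // Back up to just after the last space inside the prefix.
        $space = strrpos(substr($a, 0, $len), ' ');
        $len = ($space === false) ? 0 : $space + 1;
    }

    return [rtrim(substr($a, 0, $len)), ltrim(substr($b, $len))];
}

[$similar, $difference] = split_common_prefix(
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33'
);
printf("similar: %s\ndifference: %s\n", $similar, $difference);
```

With the sample lines this yields "Long Sleeve White P/C" and "Med 32/33", matching the split Jeff asked for.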

If you need a more complex algorithm, see the manual:

<URL:http://php.net/manual/en/function.levenshtein.php>
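For comparison, levenshtein() and PHP's related similar_text() both return numbers rather than the shared text, so they suit ranking how alike two lines are better than extracting the common part. A short sketch with the sample data:

```php
<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// Edit distance: the minimum number of single-character insertions,
// deletions and substitutions that turn $s1 into $s2 ("Sm" -> "Med").
$distance = levenshtein($s1, $s2);

// Number of matching characters; $percent receives a similarity percentage.
$matching = similar_text($s1, $s2, $percent);

printf("distance: %d\nmatching chars: %d (%.1f%%)\n", $distance, $matching, $percent);
```

Here the distance is 3 and similar_text() counts 28 matching characters, but neither result tells you which characters those are.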
--
Curtis
$email = str_replace('sig.invalid', 'gmail.com', $from);
  #4  
November 13th, 2008, 09:45 AM
Álvaro G. Vicario

Jeff wrote:
> Long Sleeve White P/C Sm 32/33
> Long Sleeve White P/C Med 32/33
> ...
>
> similar: Long Sleeve White P/C
>
> difference: Med 32/33

The documentation looks a bit scarce, but this package supports inline
diffs:

http://pear.php.net/package/Text_Diff
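Text_Diff's own API is not shown here. As a hedged illustration of what an inline diff does, the sketch below hand-rolls a word-level diff from the classic longest-common-subsequence (LCS) table and marks removed words as [-...-] and added ones as {+...+}. The function names word_diff() and render_inline() and the markup are hypothetical, and words are assumed to be space-separated.

```php
<?php
// Hand-rolled word-level diff based on an LCS table (NOT the Text_Diff API).
// Returns a list of [op, word] pairs: '=' shared, '-' only in $a, '+' only in $b.
function word_diff(string $a, string $b): array
{
    $x = explode(' ', $a);
    $y = explode(' ', $b);
    $m = count($x);
    $n = count($y);

    // $lcs[$i][$j] = length of the LCS of $x[$i..] and $y[$j..].
    $lcs = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));
    for ($i = $m - 1; $i >= 0; $i--) {
        for ($j = $n - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($x[$i] === $y[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the start, following the longer subsequence.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $m && $j < $n) {
        if ($x[$i] === $y[$j]) {
            $ops[] = ['=', $x[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $x[$i++]];
        } else {
            $ops[] = ['+', $y[$j++]];
        }
    }
    // Whatever is left over exists in only one of the strings.
    while ($i < $m) {
        $ops[] = ['-', $x[$i++]];
    }
    while ($j < $n) {
        $ops[] = ['+', $y[$j++]];
    }
    return $ops;
}

// Render the operations inline, marking removed and added words.
function render_inline(array $ops): string
{
    $out = [];
    foreach ($ops as [$op, $word]) {
        $out[] = $op === '=' ? $word : ($op === '-' ? "[-$word-]" : "{+$word+}");
    }
    return implode(' ', $out);
}

echo render_inline(word_diff('Long Sleeve White P/C Sm 32/33', 'Long Sleeve White P/C Med 32/33')), "\n";
```

On the sample lines this prints "Long Sleeve White P/C [-Sm-] {+Med+} 32/33". Unlike a plain prefix comparison, the LCS also recognizes the trailing "32/33" as shared text in its proper position.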




--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My web programming site: http://bits.demogracia.com
-- My "al baño María" (bain-marie) humor site: http://www.demogracia.com
--


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
String compare | Muhs | answers | 6 | March 13th, 2008 04:45 PM
Compute Similarity of Two Strings | Charles Law | answers | 5 | November 22nd, 2005 10:17 PM
Compute Similarity of Two Strings | Charles Law | answers | 5 | November 21st, 2005 08:04 PM
Compute Similarity of Two Strings | Charles Law | answers | 5 | July 26th, 2005 05:35 PM