Bytes IT Community

string difference and similarity

I've got a series of data like this:

Long Sleeve White P/C Sm 32/33
Long Sleeve White P/C Med 32/33
....

What I'd like to do is extract the differences and the similarity. In
this case:

similar: Long Sleeve White P/C

difference: Med 32/33

If I were writing a function, I'd probably compare increasingly
longer substrings, but I'm thinking that PHP probably already has
functions for that. What are they?

I found "xdiff_string_diff", but I don't really understand it or how I
would get the common text.

Jeff
Nov 13 '08 #1
3 Replies



"Jeff" <jeff@spam_me_not.com> wrote in message
news:Qd******************************@earthlink.com...
> I've got a series of data like this:
>
> Long Sleeve White P/C Sm 32/33
> Long Sleeve White P/C Med 32/33
> ...
>
> What I'd like to do is extract the differences and the similarity. In
> this case:
>
> similar: Long Sleeve White P/C
>
> difference: Med 32/33
> If I were writing a function, I'd probably compare increasingly longer
> substrings, but I'm thinking that php probably already has functions for
> that. What is that?
>
> I found "xdiff_string_diff", but I don't really understand it or how I
> would get the common text.

I haven't looked into that function yet. An alternative would be to
split the strings into arrays of words and use either array_intersect or
array_diff, or loop through one of the arrays and check each value with
in_array against the second. I'm not sure how your strings are created,
so it's hard to tell what would be appropriate, since word order is
ignored by these functions and:

I'm a crotchety old man.

is different than

Old man, I'm crotchety.

and not just by two characters. :^)
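
For what it's worth, here is a minimal sketch of the array idea, assuming the strings are split on single spaces (the variable names are only for illustration). It shows the catch mentioned above: position is ignored, so the trailing 32/33 counts as common text even though it comes after the differing word.

```php
<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// Assumption: words are separated by single spaces.
$w1 = explode(' ', $s1);
$w2 = explode(' ', $s2);

// Words of $s2 that also occur anywhere in $s1 (position is ignored).
$common = array_intersect($w2, $w1);

// Words of $s2 that do not occur anywhere in $s1.
$diff = array_diff($w2, $w1);

echo 'similar: ', implode(' ', $common), "\n";   // similar: Long Sleeve White P/C 32/33
echo 'difference: ', implode(' ', $diff), "\n";  // difference: Med
```

If the order of the words matters, as it seems to in the original data, a prefix comparison is probably a better fit than set operations.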
Nov 13 '08 #2

On Thu, 13 Nov 2008 01:39:42 -0500, jeff@spam_me_not.com wrote:
> I've got a series of data like this:
>
> Long Sleeve White P/C Sm 32/33
> Long Sleeve White P/C Med 32/33
> ...
>
> What I'd like to do is extract the differences and the similarity. In
> this case:
>
> similar: Long Sleeve White P/C
>
> difference: Med 32/33
> If I were writing a function, I'd probably compare increasingly
> longer substrings, but I'm thinking that php probably already has
> functions for that. What is that?
>
> I found "xdiff_string_diff", but I don't really understand it or how I
> would get the common text.
>
> Jeff

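If your string comparison needs are all as simple as your example,
finding the length of the common prefix would probably suit your
needs. Note that strspn($s1, $s2) on its own treats $s2 as a set of
allowed characters rather than comparing position by position, so XOR
the two strings first and count the leading NUL bytes. Perhaps
something like:

<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// XOR gives "\0" at every position where the two bytes are equal;
// strspn() then counts the leading run of "\0" bytes
$matchlen = strspn($s1 ^ $s2, "\0");

// everything before the 1st non-matching char
$same = substr($s1, 0, $matchlen);

// from the 1st non-matching char onwards
$diff = substr($s2, $matchlen);

printf("Same: [%s]\nDiff: [%s]", $same, $diff);
?>

$matchlen is the length of the initial segment that $s1 and $s2 have
in common. When writing a function, I'd check to see if the strings
are equal first, and preemptively return the string or whatever suits
your needs.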
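
Since the original data is a whole series of rows, here is a rough sketch of how the prefix idea might be wrapped in a function that compares word by word, so the shared part never ends in the middle of a word. The name split_common_prefix() is made up for this example (it is not a PHP built-in), and it assumes PHP 7 or later and single spaces between words.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits a list of strings into
// the words they all start with and each string's remaining words.
function split_common_prefix(array $strings): array
{
    // Assumption: words are separated by single spaces.
    $rows = array_map(function ($s) {
        return explode(' ', $s);
    }, array_values($strings));

    $first = $rows[0] ?? [];
    $n = count($first);
    foreach ($rows as $words) {
        $i = 0;
        while ($i < $n && $i < count($words) && $words[$i] === $first[$i]) {
            $i++;
        }
        $n = $i; // the shared prefix can only shrink
    }

    $rest = [];
    foreach ($rows as $words) {
        $rest[] = implode(' ', array_slice($words, $n));
    }

    return [
        'similar'    => implode(' ', array_slice($first, 0, $n)),
        'difference' => $rest,
    ];
}

$result = split_common_prefix([
    'Long Sleeve White P/C Sm 32/33',
    'Long Sleeve White P/C Med 32/33',
]);
print_r($result);
```

For the two sample rows this gives 'Long Sleeve White P/C' as the similar part and 'Sm 32/33' and 'Med 32/33' as the differences. Identical strings need no special case here: the similar part is the whole string and every difference is empty.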

If you need a more complex algorithm, see the manual:

<URL:http://php.net/manual/en/function.levenshtein.php>
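
As a quick illustration of what that function reports, here is a small sketch using the sample strings from the question. Keep in mind that levenshtein() and the related similar_text() only return numbers (an edit distance and a count of matching characters with a percentage); neither returns the common text itself.

```php
<?php
$s1 = 'Long Sleeve White P/C Sm 32/33';
$s2 = 'Long Sleeve White P/C Med 32/33';

// Minimum number of single-character inserts, deletes and replacements
// needed to turn $s1 into $s2.
$distance = levenshtein($s1, $s2);

// similar_text() returns the number of matching characters and fills
// $percent with a similarity percentage.
$matching = similar_text($s1, $s2, $percent);

printf("distance: %d, matching chars: %d (%.1f%%)\n", $distance, $matching, $percent);
```

Here the distance is 3, because 'Sm' becomes 'Med' with two replacements and one insertion. That makes these functions handy for ranking how close two rows are, but not for extracting the shared part.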
--
Curtis
$email = str_replace('sig.invalid', 'gmail.com', $from);
Nov 13 '08 #3

Jeff wrote:
> I've got a series of data like this:
>
> Long Sleeve White P/C Sm 32/33
> Long Sleeve White P/C Med 32/33
> ...
>
> What I'd like to do is extract the differences and the similarity. In
> this case:
>
> similar: Long Sleeve White P/C
>
> difference: Med 32/33

The documentation looks a bit sparse, but this package features inline
diffs:

http://pear.php.net/package/Text_Diff


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My bain-marie humor site: http://www.demogracia.com
--
Nov 13 '08 #4
