
Algorithm for detecting/highlighting changes in documents

Chung Leong
#1: Aug 19 '05
Does anyone know a good algorithm for detecting the differences between two
pieces of text? I'm working on a content management system and would
like to add the ability to highlight changes made between versions. I
don't think diff is suitable here, since I want word-level detection.
Besides, I'll need to handle a large number of relatively short
strings. Spawning a process for each would be too time-consuming.

AFAIK, PHP doesn't have a built-in function. If necessary, I can build
a wrapper extension.


Andy Hassall
#2: Aug 19 '05

re: Algorithm for detecting/highlighting changes in documents


On 19 Aug 2005 10:29:01 -0700, "Chung Leong" <chernyshevsky@hotmail.com> wrote:
>Does anyone know a good algorithm for detecting the differences between two
>pieces of text? I'm working on a content management system and would
>like to add the ability to highlight changes made between versions. I
>don't think diff is suitable here, since I want word-level detection.
>Besides, I'll need to handle a large number of relatively short
>strings. Spawning a process for each would be too time-consuming.
>
>AFAIK, PHP doesn't have a built-in function. If necessary, I can build
>a wrapper extension.

http://pear.php.net/package/Text_Diff

See in particular the "inline" renderer, which does word-level diffs.
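
For anyone who wants to see the idea behind a word-level diff before installing anything, here is a minimal sketch based on a longest-common-subsequence table. It is not Text_Diff's code or its API: word_diff() and render_inline() are hypothetical names made up for this example, the <del>/<ins> markup is just one possible output format, and it assumes PHP 7.1 or later. The table is quadratic in the number of words, which should be acceptable for the short strings described in the question.

```php
<?php
// Hypothetical helper (not part of Text_Diff): word-level diff via an LCS table.
function word_diff(string $old, string $new): array
{
    // Split on runs of whitespace; an empty string gives an empty word list.
    $a = preg_split('/\s+/', $old, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', $new, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] holds the LCS length of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the start, emitting [operation, word] pairs:
    // '=' unchanged, '-' removed from the old text, '+' added in the new text.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $ops[] = ['=', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $a[$i++]];
        } else {
            $ops[] = ['+', $b[$j++]];
        }
    }
    while ($i < $n) {
        $ops[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $b[$j++]];
    }
    return $ops;
}

// Hypothetical renderer: wrap removed and added words in <del> and <ins>.
function render_inline(array $ops): string
{
    $out = [];
    foreach ($ops as [$op, $word]) {
        $w = htmlspecialchars($word);
        if ($op === '-') {
            $out[] = "<del>$w</del>";
        } elseif ($op === '+') {
            $out[] = "<ins>$w</ins>";
        } else {
            $out[] = $w;
        }
    }
    return implode(' ', $out);
}

echo render_inline(word_diff('the quick brown fox', 'the slow brown fox jumps')), "\n";
// prints: the <del>quick</del> <ins>slow</ins> brown fox <ins>jumps</ins>
```

Note that this sketch collapses the original whitespace into single spaces when rendering; keeping the whitespace runs as tokens would avoid that at the cost of a slightly larger table.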

--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool

R. Rajesh Jeba Anbiah
#3: Aug 20 '05

re: Algorithm for detecting/highlighting changes in documents


Chung Leong wrote:
> Does anyone know a good algorithm for detecting the differences between two
> pieces of text? I'm working on a content management system and would
> like to add the ability to highlight changes made between versions. I
> don't think diff is suitable here, since I want word-level detection.
> Besides, I'll need to handle a large number of relatively short
> strings. Spawning a process for each would be too time-consuming.
>
> AFAIK, PHP doesn't have a built-in function. If necessary, I can build
> a wrapper extension.

Many of the wiki diffs I have seen use array_diff(): <http://in.php.net/array_diff>
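
A small sketch of what that looks like on word arrays, and where it falls short. The sample sentences are made up for illustration. array_diff() compares values only, so it reports which words appear in one version and not the other, but it does not see reordering or a change in how often a word repeats.

```php
<?php
// Split both versions into words (plain ASCII here, so explode() is enough).
$old = explode(' ', 'the cat sat on the mat');
$new = explode(' ', 'the dog sat on the mat');

// Words present in one version but absent from the other; keys are preserved.
$removed = array_diff($old, $new);   // [1 => 'cat']
$added   = array_diff($new, $old);   // [1 => 'dog']

// Limitation: a pure reordering contains the same set of words,
// so array_diff() reports no difference at all.
$reordered = explode(' ', 'mat the on sat cat the');
$missed = array_diff($old, $reordered);   // []

print_r($removed);
print_r($added);
var_dump(count($missed));   // int(0)
```

So this works for a rough "which words changed" list, but a sequence-aware diff is needed to highlight changes in place.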

[P.S. I will be away for the next couple of days and may not be able to
follow up.]

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com

Chung Leong
#4: Aug 26 '05

re: Algorithm for detecting/highlighting changes in documents


Thanks for the info. The code pointed me in the right direction. The
xdiff extension is just what I need. I'll have to tweak it though,
since I'm diffing UTF-8 text.
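
The thread does not say what the tweak was, so the following is only a guess at the kind of problem involved. One common pitfall when diffing UTF-8 below line granularity is that byte-oriented functions can cut a multibyte character in half. A minimal sketch, assuming PHP 7 or later and using a hypothetical helper named utf8_chars(), of splitting text into whole characters before comparing:

```php
<?php
// Hypothetical helper: split a UTF-8 string into whole characters.
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
function utf8_chars(string $text): array
{
    $parts = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8; treat that as "no tokens".
    return $parts === false ? [] : $parts;
}

$word = "caf\u{00E9}";          // "café"; the é takes two bytes in UTF-8

$bytes = str_split($word);      // byte-oriented: 5 pieces, é is cut in half
$chars = utf8_chars($word);     // character-oriented: 4 pieces

echo count($bytes), ' bytes, ', count($chars), " characters\n";
// prints: 5 bytes, 4 characters
echo implode('|', $chars), "\n";
// prints: c|a|f|é
```

Feeding character or word tokens produced this way into the diff keeps any highlighted span aligned with character boundaries.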
