
algorithm on comparing two html files

P: n/a
ddd
I am trying to build a diff tool that allows me to compare two HTML files. I
am looking for resources on how to achieve this. The main problem is that I do
not want to simply highlight the line of code where the change happened, but
rather the word/text that changed.

For example, say the HTML file contains a table with one row of three cells,
and all that changes between the two HTML files that I want to compare is the
value in the second cell. I need to be able to detect that this is what
changed, even if the actual HTML code is one single line (basically comparing
what is rendered and displayed by the HTML renderer).

Any ideas or suggestions on where I can start looking?

thanks
Nov 16 '05 #1
4 Replies


P: n/a
hi ddd,

this is mahesh here.
I would suggest using XML, XSL/XSLT and XML Schema.
If you simply want to check whether two HTML files are the same, you can use
regular expressions or a byte-by-byte comparison. But I think you want to
point out the differences between the contents of the two HTML files. In that
case I would suggest keeping the content as XML and using an XSLT
transformation to turn the XML into HTML.
Then it will be easier for you to point out the differences between the
contents of the two XML files (and so, indirectly, the HTML files),
since XML imposes a tree structure on documents, which you can describe with
an XML Schema (XSD).
By comparing nodes you can find the differences between the contents of the
documents.
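
A minimal sketch of the node-by-node comparison described above, in Python
with the standard xml.etree.ElementTree module. The function name
text_differences and the path format are hypothetical, chosen only for
illustration. It assumes both inputs are well-formed XML with the same element
structure, and it ignores attributes and tail text.

```python
import xml.etree.ElementTree as ET


def text_differences(old_xml, new_xml):
    """Return (path, old_text, new_text) for every element whose text differs.

    Assumption: both documents have the same element structure; a
    ValueError is raised as soon as the shapes diverge.
    """
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    diffs = []

    def walk(a, b, path):
        if a.tag != b.tag or len(a) != len(b):
            raise ValueError(f"structure differs at {path}")
        old_text = (a.text or "").strip()
        new_text = (b.text or "").strip()
        if old_text != new_text:
            diffs.append((path, old_text, new_text))
        # The index is the child's position among all children of its
        # parent (1-based), not an XPath same-tag index.
        for i, (child_a, child_b) in enumerate(zip(a, b), start=1):
            walk(child_a, child_b, f"{path}/{child_a.tag}[{i}]")

    walk(old_root, new_root, "/" + old_root.tag)
    return diffs


old = "<table><tr><td>1</td><td>2</td><td>3</td></tr></table>"
new = "<table><tr><td>1</td><td>9</td><td>3</td></tr></table>"
for path, before, after in text_differences(old, new):
    print(path, repr(before), "->", repr(after))
```

Because the comparison runs on the parsed tree, it does not matter whether the
markup was written on one line or many.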

I am also a newbie in the programming field, so my suggestion may be stupid;
if so, excuse me.
Bye, have a good day,
ma************@yahoo.com
akstech solutions pvt ltd

"ddd" wrote:
I am trying to build a diff tool that allows me to compare two HTML files. I
am looking for resources on how to achieve this. The main problem is that I do
not want to simply highlight the line of code where the change happened, but
rather the word/text that changed.

For example, say the HTML file contains a table with one row of three cells,
and all that changes between the two HTML files that I want to compare is the
value in the second cell. I need to be able to detect that this is what
changed, even if the actual HTML code is one single line (basically comparing
what is rendered and displayed by the HTML renderer).

Any ideas or suggestions on where I can start looking?

thanks

Nov 16 '05 #2

P: n/a
Not all HTML files are XML compliant. The majority of the web pages on
the internet aren't.
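
A small illustration of that point, as a sketch in Python using the standard
xml.etree.ElementTree parser. The helper name is_well_formed_xml and the two
sample strings are made up for the example. Markup with unclosed tags, which
browsers accept, is rejected by a strict XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if `markup` parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Every tag is closed, so this also happens to be well-formed XML.
xhtml_style = "<table><tr><td>1</td><td>2</td></tr></table>"

# Unclosed <br> and <p> tags: fine for a browser, not for an XML parser.
typical_html = "<p>line one<br>line two<p>next paragraph"

print(is_well_formed_xml(xhtml_style))
print(is_well_formed_xml(typical_html))
```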

Regards
Senthil

Nov 16 '05 #3

P: n/a
You can try using the MSHTML parser to parse the HTML files and compare the
contents.
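
MSHTML is a Windows component, so the sketch below does not use it. As a
stand-in it uses Python's standard html.parser module, which is also tolerant
of markup that is not valid XML. The names TextExtractor and visible_words are
hypothetical. It only approximates what a renderer displays: it collects the
text between tags and skips script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the words that appear between tags, skipping script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_words(html):
    """Return the list of text words found in `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


print(visible_words("<table><tr><td>a</td><td>b c</td></tr></table>"))
```

The two word lists obtained this way can then be passed to any sequence diff.
One simplification to be aware of: text split by inline tags, such as
foo<b>bar</b>, comes out as two words even though it renders as one.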

Regards
Senthil

Nov 16 '05 #4

P: n/a
JR
The algorithm was published in E. Myers, "An O(ND) difference algorithm and
its variations," Algorithmica, vol. 1, pp. 251-266, 1986.
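
For orientation, here is a compact sketch of the greedy forward search from
that paper, written in Python. It returns only the length D of the shortest
edit script (insertions plus deletions) and does not recover the script
itself. The function name myers_edit_distance is made up for the example, and
the inputs can be strings or lists of tokens.

```python
def myers_edit_distance(a, b):
    """Length of the shortest insert/delete script turning `a` into `b`."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] holds the furthest x reached so far on diagonal k (k = x - y).
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Choose whether to arrive by moving down (an insertion, from
            # diagonal k + 1) or right (a deletion, from diagonal k - 1).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(myers_edit_distance("ABCABBA", "CBABAC"))
print(myers_edit_distance(["a", "b", "c"], ["a", "x", "c"]))
```

The running time grows with the size of the difference D rather than only with
the input length, which is why the approach suits files that are mostly the
same.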

A good commercial product is Araxis Merge. It compares files, which is not
exactly what you asked for. You could tokenize your files and then compare
the tokenized data.
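
A minimal sketch of the tokenize-then-compare idea in Python. It uses the
standard difflib.SequenceMatcher, which is not the Myers algorithm but serves
the same purpose here. The names tokenize and token_changes and the regular
expression are assumptions made for the example: each tag is one token and
each whitespace-separated word is one token.

```python
import difflib
import re


def tokenize(html):
    """Split markup into tag tokens and word tokens."""
    return re.findall(r"<[^>]+>|[^<\s]+", html)


def token_changes(old_html, new_html):
    """Return (operation, old_tokens, new_tokens) for each changed run."""
    old, new = tokenize(old_html), tokenize(new_html)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# The whole table sits on a single line, as in the original question.
old = "<table><tr><td>1</td><td>2</td><td>3</td></tr></table>"
new = "<table><tr><td>1</td><td>9</td><td>3</td></tr></table>"
print(token_changes(old, new))
```

Since the comparison works on tokens rather than lines, only the changed cell
value is reported even though the markup is a single line.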

I once wrote a DOS shareware program (JDIF -
http://www.qsm.co.il/Software/jdif.htm) which could be modified to compare
tokens rather than source lines. But it is limited by the DOS memory size.

For a reference implementation you could look at the GNU diff source code
(remember the GPL).

JR

"ddd" <dd*@discussions.microsoft.com> wrote in message
news:77**********************************@microsof t.com...
I am trying to build a diff tool that allows me to compare two HTML files. I
am looking for resources on how to achieve this. The main problem is that I do
not want to simply highlight the line of code where the change happened, but
rather the word/text that changed.

For example, say the HTML file contains a table with one row of three cells,
and all that changes between the two HTML files that I want to compare is the
value in the second cell. I need to be able to detect that this is what
changed, even if the actual HTML code is one single line (basically comparing
what is rendered and displayed by the HTML renderer).

Any ideas or suggestions on where I can start looking?

thanks

Nov 16 '05 #5
