I've been tasked with developing a document/file versioning system of sorts,
similar to a very scaled-down source control system. We're integrating it
very closely with an existing application, so it must be 100% home-grown.
Like many source control systems, I'd like to save only the differences from
the original in each of the versions and then reassemble those differences
when the latest version is requested. I have some vague ideas, but I thought
I'd get some others' opinions. What makes it a little more complex is that I
have to apply this to files that may contain binary data. Can anyone point
me in the direction of some useful framework classes or hints? Thanks!
Josh
You can generate a hash on both files and compare the hashed values.
http://www.dotnetspider.com/technology/KBPages/397.aspx
http://dotnetjunkies.com/WebLog/darr.../20/19820.aspx
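A minimal sketch of the hash check, written in Python for illustration rather than with the .NET classes the links cover. The function names `file_digest` and `files_differ` and the choice of SHA-256 are assumptions, not something taken from the linked articles.

```python
import hashlib


def file_digest(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    # Binary mode, so text and binary files are treated the same way.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path_a, path_b):
    """True when the two files have different digests (hypothetical helper)."""
    return file_digest(path_a) != file_digest(path_b)
```

This only answers "did anything change?", which makes it a cheap first test before doing any more expensive comparison.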
--
W.G. Ryan MVP (Windows Embedded)
TiBA Solutions www.tibasolutions.com | www.devbuzz.com | www.knowdotnet.com
Thanks for the quick reply :)
This looks like a good starting point, although the hashing algorithm will
only tell me whether there is a difference, not what the differences are.
Nonetheless, it is a very useful check to run before an actual byte-by-byte
comparison. Are you aware of any algorithms that are helpful in this area? I
was initially considering a byte-for-byte comparison to identify the changes
(so only the changes are stored), but I wasn't sure if there was a better
way. I was about to check out the source code of some open source
WinDiff-type projects, but like most developers I'm looking for some
shortcuts :)
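For reference, here is a rough sketch of what the naive byte-for-byte comparison looks like, in Python for illustration. The name `changed_ranges` is made up for this example. It compares position by position, so it works on binary data, but a single inserted byte shifts everything after it and makes the rest of the file look changed, which is why the diff algorithms are worth a look.

```python
def changed_ranges(old, new):
    """Return (start, end) offsets where two byte strings differ.

    The comparison is positional. Offsets beyond the shorter input
    are reported as changed.
    """
    ranges = []
    start = None
    common = min(len(old), len(new))
    for i in range(common):
        if old[i] != new[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    longest = max(len(old), len(new))
    if start is not None:
        # The run reaches the end of the shared part; include any tail.
        ranges.append((start, longest))
    elif longest > common:
        ranges.append((common, longest))
    return ranges
```

An in-place edit such as `changed_ranges(b"abcdef", b"abXdef")` gives `[(2, 3)]`, while inserting one byte near the front, as in `changed_ranges(b"abcdef", b"aXbcdef")`, gives `[(1, 7)]`.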
Anyway thanks again for the two links and any further advice you may have
would be appreciated.
Josh
Hi Josh,
Comparing differences between files, especially binary files, is an
interesting topic. It has been described in various places. I did a little
Google searching and found this excerpt:
>> GNU diff was written by Mike Haertel, David Hayes, Richard Stallman, Len
Tower, and Paul Eggert. Wayne Davison designed and implemented the unified
output format. The basic algorithm is described in "An O(ND) Difference
Algorithm and its Variations", Eugene W. Myers, Algorithmica Vol. 1 No. 2,
1986, pp. 251--266; and in "A File Comparison Program", Webb Miller and
Eugene W. Myers, Software--Practice and Experience Vol. 15 No. 11, 1985, pp.
1025--1040. The algorithm was independently discovered as described in
"Algorithms for Approximate String Matching", E. Ukkonen, Information and
Control Vol. 64, 1985, pp. 100--118. <<
The cited articles are probably a good place to start. Unfortunately, I
have not read them, so I cannot comment on the algorithm itself.
I will say one thing though: modern code control systems do NOT store the
original file and then store the differences to get newer versions.
Modern systems store the MOST RECENT file and store the differences needed
to recreate Previous versions (since 99% of the time, you don't want the
first version... you want the most recent one.)
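To make the "latest in full, older versions as differences" idea concrete, here is a small sketch in Python. Everything in it is an assumption for illustration: the names `make_delta`, `apply_delta` and `ReverseDeltaStore` are hypothetical, and the matching is done with Python's `difflib.SequenceMatcher`, which is not the Myers algorithm from the cited papers and is slow on large inputs.

```python
import difflib


def make_delta(source, target):
    """Describe `target` as copies from `source` plus literal bytes."""
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))
        # "delete" needs no entry: those source bytes are just not copied.
    return delta


def apply_delta(source, delta):
    """Rebuild the target bytes from `source` and a delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ReverseDeltaStore:
    """Keeps the newest version whole and older ones as reverse deltas."""

    def __init__(self, first_version):
        self.latest = first_version
        # reverse_deltas[k] turns version k + 1 back into version k.
        self.reverse_deltas = []

    def commit(self, new_version):
        # Record how to get from the new version back to the old latest.
        self.reverse_deltas.append(make_delta(new_version, self.latest))
        self.latest = new_version

    def get(self, index):
        """Return version `index`, where 0 is the first version."""
        data = self.latest
        for delta in reversed(self.reverse_deltas[index:]):
            data = apply_delta(data, delta)
        return data


store = ReverseDeltaStore(b"hello world\x00\x01")
store.commit(b"hello brave world\x00\x01")
store.commit(b"goodbye world\x00\xff")
```

Fetching the newest version is a plain read of `store.latest`; only requests for older versions, such as `store.get(0)`, pay the cost of walking back through the deltas.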
Also, given the low cost of hard drive space and the ability to compress
prior versions, you may want to consider simply keeping the entire contents
of each version of each file and compressing old versions to save space.
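A sketch of that simpler approach, in Python for illustration, assuming zlib as the compressor (any general-purpose compression library would do). The class name `CompressedVersionStore` is hypothetical. The newest version stays uncompressed and each older one is compressed when it is superseded.

```python
import zlib


class CompressedVersionStore:
    """Keeps every version in full; older versions are zlib-compressed."""

    def __init__(self):
        self._old = []       # compressed older versions, oldest first
        self._latest = None  # newest version, kept uncompressed

    def add(self, data):
        # Compress the version that is about to stop being the latest.
        if self._latest is not None:
            self._old.append(zlib.compress(self._latest, 9))
        self._latest = data

    def get(self, index):
        """Return version `index`, where 0 is the first version added."""
        if index == len(self._old):
            if self._latest is None:
                raise IndexError("no versions stored")
            return self._latest
        return zlib.decompress(self._old[index])


versions = CompressedVersionStore()
versions.add(b"abc" * 1000)
versions.add(b"abc" * 1000 + b"\x00\x01")
```

There is no delta logic at all here, so any version can be read back independently of the others; the trade-off is that the space saved depends entirely on how compressible each file is.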
One more thing to look at: If you have Windows Server 2003, you can
download, for free, Windows SharePoint Services. This system gives you
simple document management capabilities, including the ability to set up a
virtual "folder" tree that contains "files" where you can store every
version of any or all files. It's pretty nice, and because it's free, you
would avoid most licensing issues. That's the upside. The downside: it
only runs on Windows Server 2003. If your customers cannot upgrade their
OS, then this can't be used as your back end. Still, it's worth
considering, if for no other reason than to simply Write Less Code.
Good Luck. I hope this helps,
--- Nick
Nick,
I actually downloaded the WinDiff code, so I'm going to take a look at that,
but GNU diff looks interesting also. Luckily I'm comfortable enough with C
and C++ to get the algorithms I need out of the code, so hopefully that will
be useful. Luckily we're not looking at some of the other features of most
source control systems (branching, merging, etc.), so I'm hoping to keep it
simple. Also, because of some of the unique aspects of what we are tying it
into, using a product like SharePoint isn't possible. However, you make a
very good point about alternative mechanisms. I had done some googling and
found some references about only storing differences, so not knowing any
better I started to pursue that route. Your statement that most modern
systems store the current version and only the differences for historic
versions is very valuable information and does make more sense. I had
originally thought of storing the complete versions but discounted it for
space concerns. I'm now actually heading in that direction, mainly to reduce
complexity, and as you say, the use of some compression (which I've used
with some remoting sinks in the past) makes it very feasible.
Thanks for your replies.
Josh