I am implementing a duplicate file finder by computing a hash (MD5 or
SHA-1) of each file. I know how to get the hash, but my problem is that
the files are big. Really big (~3-4 GB each). That is a lot of reading
from the HDD even if I am only comparing 20 files.
My solution is to hash only the first few MB of each file (e.g. the
first 10 MB). At least it cuts down the reading and the time a lot. It
does the job, but it is not a sure-shot way, because it is not 'really'
the true hash of the file: two files that differ only after the first
10 MB would still get the same hash. Thus I am not happy with this
solution.
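
A minimal Python sketch of this partial-hash idea might look like the
following. The names partial_hash and candidate_duplicates, the 1 MB
read size, and the use of MD5 are assumptions for illustration only:

```python
import hashlib
from collections import defaultdict

PREFIX_BYTES = 10 * 1024 * 1024  # assumed cutoff: first 10 MB of each file
CHUNK_BYTES = 1024 * 1024        # assumed read size: 1 MB per read call


def partial_hash(path, limit=PREFIX_BYTES):
    """Return the MD5 hex digest of at most the first `limit` bytes of `path`."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(CHUNK_BYTES, remaining))
            if not chunk:  # file is shorter than the limit
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def candidate_duplicates(paths, limit=PREFIX_BYTES):
    """Group paths whose first `limit` bytes hash to the same value.

    The groups are only candidates: files that differ after the
    prefix still land in the same group.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[partial_hash(path, limit)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Two files that share the first 10 MB but differ later end up in the
same group here, which is exactly the weakness described above.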
Can anyone think of a better way?
Thank you,
--
Po