Lars von Wedel <vo******@lfpt.rwth-aachen.de> wrote in
news:Xn**********************************@137.226.144.7:
> I have a rather large number of files and I would like to check for
> duplicates. Names and dates are not relevant.
> I found the MD5 class, which seems to be suitable. Can I use the returned
> byte[] as a key in a hashtable in order to create a mapping from
> that key to the identical files (i.e. their names)?
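A minimal sketch of the mapping asked about, written in Python with hashlib rather than the .NET MD5 class (an assumption made for illustration only). It reads each file in chunks, uses the hex digest as the dictionary key and collects the file names under it. The conversion to a string matters in .NET as well: a byte[] compares by reference, so two arrays with identical contents would not match as Hashtable keys, while a string from BitConverter.ToString or Convert.ToBase64String would. The function names below are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file as a hex string, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_md5(paths):
    """Map each shared MD5 hex digest to the list of file names that produce it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return {digest: names for digest, names in groups.items() if len(names) > 1}
```

Calling group_by_md5 on a list of paths returns only the groups with more than one member, so files with unique contents drop out.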
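Depending on your data, CRC32 might be enough, and its result fits in a 32-bit integer.
Identical files always have the same CRC32, so it only has to narrow down the
candidates. Different files rarely share a CRC32, but with only 2^32 possible
values collisions become likely as the file count grows (about a 50% chance of
at least one at roughly 77,000 files). So for any suspected duplicates, run a
secondary check using MD5 or a raw byte-by-byte comparison.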
The advantage is that CRC32 returns an integer, which is easier to use as a
hashtable key and faster to compute than MD5.
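A minimal sketch of the two-pass approach, again in Python as an illustrative assumption (zlib.crc32 and filecmp.cmp are standard-library calls; the function names are hypothetical). The first pass groups files by their integer CRC32; the second confirms each group with a raw comparison, since different files can share a CRC32. Hashing the candidates with MD5 instead would also work.

```python
import filecmp
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Feed the running CRC back in so the whole file is covered.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def find_duplicates(paths):
    """Return lists of files whose contents are byte-for-byte identical."""
    # First pass: cheap integer key; identical files always land together.
    by_crc = defaultdict(list)
    for path in paths:
        by_crc[crc32_of_file(path)].append(path)

    duplicates = []
    for candidates in by_crc.values():
        # Second pass: raw comparison, because a shared CRC32 is only a hint.
        remaining = list(candidates)
        while len(remaining) > 1:
            first = remaining[0]
            same = [first] + [p for p in remaining[1:]
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                duplicates.append(same)
            # Drop the files just grouped and continue with the rest.
            remaining = [p for p in remaining if p not in same]
    return duplicates
```

Most files end up alone in their CRC32 bucket, so the comparatively expensive raw comparison only runs on the few suspected duplicates.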
--
Chad Z. Hower (a.k.a. Kudzu) -
http://www.hower.org/Kudzu/
"Programming is an art form that fights back"