On Sat, 26 Feb 2005 23:53:10 +0100, Patrick Useldinger
<pu*********@gmail.com> wrote:
> I've tested it intensively

"Famous Last Words" :-)

> Thanks for your feedback!

Here's some more:
(1) Manic s/w producing lots of files all the same size: the Borland
C[++] compiler produces a debug symbol file (.tds) that's always
384KB; I have 144 of these on my HD, rarely more than 1 in the same
directory.
Here's a snippet from a duplicate detection run:
DUP|393216|2|\devel\delimited\build\lib.win32-1.5\delimited.tds|\devel\delimited\build\lib.win32-2.1\delimited.tds
DUP|393216|2|\devel\delimited\build\lib.win32-2.3\delimited.tds|\devel\delimited\build\lib.win32-2.4\delimited.tds
(2) There appears to be a flaw in your logic such that it will find
duplicates only if they are in the *SAME* directory and only when
there are no other directories with two or more files of the same
size. The above duplicates were detected only when I made the
following changes to your script:
--- fdups Sat Feb 26 06:41:36 2005
+++ fdups_jm.py Sun Feb 27 12:18:04 2005
@@ -29,13 +29,14 @@
         self.count = self.totalsize = self.inodecount = self.slinkcount = 0
         self.gain = self.bytescompared = self.bytesread = self.inodecount = 0
         for toplevel in args:
-            os.path.walk(toplevel, self.buildList, None)
+            os.path.walk(toplevel, self.updateDict, None)
         if self.count > 0:
             self.compare()

-    def buildList(self,arg,dirpath,namelist):
-        """ build a dictionnary of files to be analysed, indexed by length """
-        files = {}
+    def updateDict(self,arg,dirpath,namelist):
+        """ update a dictionary of files to be analysed, indexed by length """
+        # files = {}
+        files = self.compfiles
         for filepath in namelist:
             fullpath = os.path.join(dirpath,filepath)
             if os.path.isfile(fullpath):
@@ -51,20 +52,23 @@
                 if size >= MIN_FILESIZE:
                     self.count += 1
                     self.totalsize += size
+                    # is above totalling in the wrong place?
                     if size not in files:
                         files[size]=[fullpath]
                     else:
                         files[size].append(fullpath)
-        for size in files:
-            if len(files[size]) != 1:
-                self.compfiles[size]=files[size]
+        # for size in files:
+        #     if len(files[size]) != 1:
+        #         self.compfiles[size]=files[size]

     def compare(self):
         """ compare all files of the same size - outer loop """
         sizes=self.compfiles.keys()
         sizes.sort()
         for size in sizes:
-            self.comparefiles(size,self.compfiles[size])
+            list_of_filenames = self.compfiles[size]
+            if len(list_of_filenames) > 1:
+                self.comparefiles(size, list_of_filenames)

     def comparefiles(self,size,filelist):
         """ compare all files of the same size - inner loop """
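
To show the point of the patch in isolation, here is a minimal sketch in current Python 3 (not the fdups code itself). It accumulates one size-indexed dictionary over the whole walk and discards the singleton sizes only at the very end. The names group_by_size and min_size and the threshold value are placeholders of mine, and os.walk stands in for the os.path.walk callback used in the script, which no longer exists in Python 3.

```python
import os
from collections import defaultdict

MIN_FILESIZE = 1  # placeholder threshold; the script's real value is not shown here


def group_by_size(toplevels, min_size=MIN_FILESIZE):
    """Map file size -> list of paths, accumulated across every directory."""
    by_size = defaultdict(list)
    for toplevel in toplevels:
        for dirpath, _dirnames, filenames in os.walk(toplevel):
            for name in filenames:
                fullpath = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(fullpath) or not os.path.isfile(fullpath):
                    continue
                size = os.path.getsize(fullpath)
                if size >= min_size:
                    by_size[size].append(fullpath)
    # Filter only after the whole walk, so that same-size files living in
    # different directories remain candidates for comparison.
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}
```

Filtering per directory, as the unpatched callback did, would throw away a size as soon as a single directory held only one file of that size, which is exactly the .tds case above.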
(3) Your fdups-check gadget doesn't work on Windows; the commands
module works only on Unix but is supplied with Python on all
platforms. The results might just confuse a newbie:
(1, "'{' is not recognized as an internal or external command,\noperable program or batch file.")
Why not use the Python filecmp module?
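
A rough sketch of what a filecmp-based check could look like, assuming the goal is simply to confirm that two reported duplicates are byte-for-byte identical. The helper name same_content is hypothetical; filecmp.cmp and its shallow argument are in the standard library and behave the same on Windows and Unix.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two files have identical contents."""
    # shallow=False makes filecmp read and compare the file contents
    # instead of trusting matching os.stat() signatures alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Something like same_content(first, second) could then be called on each pair listed in a DUP line, with no shell involved.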
Cheers,
John