On Nov 24, 7:57 am, "Andre Meyer" <m...@acm.org> wrote:
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.
I wrote a script that does this using the dircmp class from the filecmp
module. It works roughly like this:
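import filecmp, os, shutil

def backup(d1, d2):
    """Recursively copy new and changed files from d1 into d2."""
    print('backing up %s to %s' % (d1, d2))
    compare = filecmp.dircmp(d1, d2)
    # Entries that exist only in the source tree: copy them across.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Files in both trees that filecmp judges different: overwrite the copy.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Subdirectories present in both trees: recurse into each pair.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])
    else:
        sys.exit('usage: %s SOURCE_DIR BACKUP_DIR' % sys.argv[0])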
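Since the original question was about os.walk(): the same one-way backup can also be written without explicit recursion by walking only the source tree and deriving each destination directory with os.path.relpath(). The following is a sketch assuming Python 3.2 or later (for os.makedirs(..., exist_ok=True)); backup_walk is a hypothetical name. It uses filecmp.cmp() with shallow=True, which treats files with identical os.stat() signatures (type, size, modification time) as equal and only compares contents otherwise. Symlinked directories are not followed.

```python
import filecmp
import os
import shutil


def backup_walk(src, dst):
    """One-way backup of src into dst, driven by os.walk() (hypothetical helper).

    Only the source tree is walked; each destination directory is derived
    from the source directory's path relative to src.
    """
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        # For the top-level directory rel is '.', so normpath collapses it to dst.
        target_dir = os.path.normpath(os.path.join(dst, rel))
        # Create the counterpart directory if it does not exist yet.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source_file = os.path.join(dirpath, name)
            target_file = os.path.join(target_dir, name)
            # Copy files that are missing from dst or that filecmp judges different.
            if not os.path.exists(target_file) or not filecmp.cmp(
                source_file, target_file, shallow=True
            ):
                shutil.copy2(source_file, target_file)
```

Called as backup_walk(source, target), it can stand in for backup(source, target). The trade-off is that it checks every file individually instead of letting dircmp copy whole new subtrees with copytree().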
My full script also does some error checking and keeps up to 5 previous
versions of each changed file. I find it very efficient, even with
recursion, because it only copies files that have changed; the recursion
depth grows with the depth of the directory tree, not with the number of
files. I sync roughly 5 GB of files nightly across the network and haven't
had any trouble.
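The versioning logic is not part of the code above. As a rough sketch of one way to keep up to five old copies, assuming Python 3.3 or later (for os.replace()) and a hypothetical naming scheme where name.1 is the newest old version and name.5 the oldest, the existing backup can be rotated before each overwrite. rotate_versions and backup_file are hypothetical helpers, not functions from the script described here.

```python
import os
import shutil


def rotate_versions(path, keep=5):
    """Shift old copies of `path` up by one slot, dropping the oldest.

    Hypothetical scheme: path.1 is the newest old version, path.<keep>
    the oldest. Call this before overwriting `path`.
    """
    if not os.path.exists(path):
        return  # nothing to preserve yet
    oldest = '%s.%d' % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Move path.(keep-1) -> path.keep, ..., path.1 -> path.2.
    for n in range(keep - 1, 0, -1):
        older = '%s.%d' % (path, n)
        if os.path.exists(older):
            os.replace(older, '%s.%d' % (path, n + 1))
    # Copy (rather than move) the current file, so the target still exists
    # if the subsequent copy from the source fails.
    shutil.copy2(path, path + '.1')


def backup_file(source_file, target_file, keep=5):
    """Overwrite target_file with source_file, keeping `keep` old versions."""
    rotate_versions(target_file, keep)
    shutil.copy2(source_file, target_file)
```

In backup(), the shutil.copy2() calls for diff_files would be the natural place to call something like backup_file() instead.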
Of course, if I just had rsync available, I would use that.
Hope this helps,
Pete