synching with os.walk()

Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

thanks for your help
André
Nov 24 '06 #1
8 Replies


> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.

I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

for root, dirs, files in os.walk( dir1 ):
    dir2_root = dir2 + root[len(dir1):]
    for f in files:
        dir1_path = os.path.join( root, f )
        dir2_path = os.path.join( dir2_root, f )
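
One caveat with slicing by len(dir1): the result depends on whether dir1 was passed with a trailing separator. A minimal sketch that sidesteps this with os.path.relpath (Python 2.6 and later) might look like the following. The name paired_paths is made up for illustration and is not part of the suggestion above.

```python
import os

def paired_paths(dir1, dir2):
    """Yield (path_in_dir1, matching_path_in_dir2) for every file under dir1."""
    for root, dirs, files in os.walk(dir1):
        # Directory currently being visited, relative to dir1 ('.' at the top).
        rel = os.path.relpath(root, dir1)
        # normpath removes the './' that appears at the top level.
        dir2_root = os.path.normpath(os.path.join(dir2, rel))
        for f in files:
            yield os.path.join(root, f), os.path.join(dir2_root, f)
```

Each yielded pair can then be compared or copied; the second path is only computed, so it may not exist yet on the dir2 side.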

Does this work for your needs?
-- Nils

Nov 24 '06 #2

Andre Meyer wrote:
> Hi all
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.
>
> thanks for your help
> André

Walk each tree individually gathering file names relative to the head
of the tree and modification data.

Compare the two sets of data to generate:
1. A list of what needs to be copied from the original to the copy.
2. A list of what needs to be copied from the copy to the original.

Do the copying.

You might want to show the user what needs to be done and give them
the option of aborting after generating the copy lists.
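
A rough sketch of the gather-and-compare steps, for illustration only. The function names snapshot and plan_copies are invented here, and the rule "a missing file or a newer modification time means copy" is an assumption; a real tool would also need a policy for conflicts and deletions.

```python
import os

def snapshot(top):
    """Map each file's path relative to top to its modification time."""
    found = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            full = os.path.join(root, name)
            found[os.path.relpath(full, top)] = os.path.getmtime(full)
    return found

def plan_copies(original, copy):
    """Return (to_copy, to_original): relative paths to copy in each direction."""
    a, b = snapshot(original), snapshot(copy)
    # Assumed rule: copy when the other side lacks the file or has an older one.
    to_copy = sorted(p for p in a if p not in b or a[p] > b[p])
    to_original = sorted(p for p in b if p not in a or b[p] > a[p])
    return to_copy, to_original
```

Because nothing is copied while the two lists are built, they can be printed for the user to confirm before any file is touched.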

- Paddy.

Nov 24 '06 #3

Paddy wrote:
> Andre Meyer wrote:
>> Hi all
>>
>> os.walk() is a nice generator for performing actions on all files in a
>> directory and subdirectories. However, how can one use os.walk() for walking
>> through two hierarchies at once? I want to synchronise two directories (just
>> backup for now), but cannot see how I can traverse a second one. I do this
>> now with os.listdir() recursively, which works fine, but I am afraid that
>> recursion can become inefficient for large hierarchies.
>>
>> thanks for your help
>> André
>
> Walk each tree individually gathering file names relative to the head
> of the tree and modification data.
>
> compare the two sets of data to generate:
> 1. A list of what needs to be copied from the original to the copy.
> 2. A list of what needs to be copied from the copy to the original
>
> Do the copying.
>
> You might want to show the user what needs to be done and give them
> the option of aborting after generating the copy lists.
>
> - Paddy.

P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,
and tar will take care of creating any needed directories in the
destination if you create new directories as well as new files.
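
For anyone who wants the same effect from inside Python rather than the tar command, the standard tarfile module can be used in a similar way. This is a swapped-in technique, not the command-line tar described above, and the helper names pack and unpack are made up for this sketch.

```python
import tarfile

def pack(src_dir, archive_path):
    """Write the contents of src_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to src_dir instead of the full path.
        tar.add(src_dir, arcname=".")

def unpack(archive_path, dest_dir):
    """Extract the archive into dest_dir, creating subdirectories as needed."""
    # Only extract archives you created yourself or otherwise trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

As with tar itself, new directories in the source end up in the destination without any explicit mkdir calls.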
- Paddy.

Nov 24 '06 #4

>> os.walk() is a nice generator for performing actions on all files in a
>> directory and subdirectories. However, how can one use os.walk() for walking
>> through two hierarchies at once? I want to synchronise two directories (just
>> backup for now), but cannot see how I can traverse a second one. I do this
>> now with os.listdir() recursively, which works fine, but I am afraid that
>> recursion can become inefficient for large hierarchies.
>
> I've run into wanting to work with parallel directory structures
> before, and what I generally do is something like:
>
> for root, dirs, files in os.walk( dir1 ):
>     dir2_root = dir2 + root[len(dir1):]
>     for f in files:
>         dir1_path = os.path.join( root, f )
>         dir2_path = os.path.join( dir2_root, f )

Wouldn't it be better to implement the tree traversal in a class? Then
you could traverse two directory trees at once and do funny things
with it.
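
One way such a class could look, purely as a sketch: the name PairedWalker and its behaviour (descending only into directories that exist on both sides) are assumptions made for this example. It uses an explicit stack instead of recursion, which also speaks to the concern about deep hierarchies.

```python
import os

class PairedWalker:
    """Walk two directory trees in lockstep, one directory at a time."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __iter__(self):
        # A stack of relative directory paths replaces recursive calls.
        pending = [""]
        while pending:
            rel = pending.pop()
            left_names = set(os.listdir(os.path.join(self.left, rel)))
            right_names = set(os.listdir(os.path.join(self.right, rel)))
            yield rel, left_names, right_names
            # Descend only into names that are directories on both sides.
            for name in sorted(left_names & right_names, reverse=True):
                sub = os.path.join(rel, name)
                if (os.path.isdir(os.path.join(self.left, sub))
                        and os.path.isdir(os.path.join(self.right, sub))):
                    pending.append(sub)
```

Iterating with `for rel, left_names, right_names in PairedWalker(dir1, dir2)` gives the entry names found on each side of every shared directory, so set differences show what is missing where.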

Thomas

Nov 24 '06 #5

Paddy wrote:
> P.S. If you are on a Unix type system you can use tar to do the copying
> as you can easily compress the data if it needs to go over a sow link,

Sow links, transfers your data and then may form a tasty sandwich when
cooked.

(The original should, of course, read ...slow...)
- Pad.

Nov 24 '06 #6

Andre Meyer wrote:
> Hi all
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for
> walking through two hierarchies at once? I want to synchronise two
> directories (just backup for now), but cannot see how I can traverse a
> second one. I do this now with os.listdir() recursively, which works
> fine, but I am afraid that recursion can become inefficient for large
> hierarchies.
>
> thanks for your help
> André

http://aspn.activestate.com/ASPN/Coo.../Recipe/191017
might be what you are looking for, or at least a starting point...

Regards,
antoine
Nov 24 '06 #7

Antoine De Groote wrote:
> http://aspn.activestate.com/ASPN/Coo.../Recipe/191017
> might be what you are looking for, or at least a starting point...

There's an updated version of this script on pages 403-04 of the Python
Cookbook, 2nd Edition.

rd

Nov 25 '06 #8

On Nov 24, 7:57 am, "Andre Meyer" <m...@acm.org> wrote:
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.

I wrote a script to perform this function using the dircmp class in the
filecmp module. I did something similar to this:

import filecmp, os, shutil

def backup(d1, d2):
    print('backing up %s to %s' % (d1, d2))
    compare = filecmp.dircmp(d1, d2)
    # Items that exist only in the source: copy whole trees or single files.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Files present on both sides that dircmp reports as different.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Recurse into directories present on both sides.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])

My script has some error checking and keeps up to 5 previous versions
of a changed file. I find it very efficient, even with recursion, as it
only actually copies those files that have changed. I sync somewhere
around 5 GB worth of files nightly across the network and I haven't had
any trouble.
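
The version-keeping part is not shown in the script above. One possible way to do it, as a guess rather than a description of the actual script, is to rotate numbered copies before overwriting. The function name copy_keeping_versions and the name.1 ... name.5 naming scheme are assumptions made for this sketch.

```python
import os
import shutil

def copy_keeping_versions(src, dst, keep=5):
    """Copy src over dst, first rotating dst into dst.1 ... dst.<keep>."""
    if os.path.exists(dst):
        oldest = "%s.%d" % (dst, keep)
        if os.path.exists(oldest):
            os.remove(oldest)  # drop the oldest saved version
        # Shift dst.n to dst.(n+1), starting from the highest number
        # so that no rename lands on an existing file.
        for n in range(keep - 1, 0, -1):
            older = "%s.%d" % (dst, n)
            if os.path.exists(older):
                os.rename(older, "%s.%d" % (dst, n + 1))
        os.rename(dst, dst + ".1")
    shutil.copy2(src, dst)
```

In the backup function, a call like this could stand in for the shutil.copy2 call in the diff_files loop, with dst set to os.path.join(d2, item).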

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete

Nov 27 '06 #9
