473,411 Members | 1,937 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,411 software developers and data experts.

synching with os.walk()

Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

thanks for your help
André
Nov 24 '06 #1
8 2264
os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.
I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

for root, dirs, files in os.walk( dir1 ):
dir2_root = dir2 + root[len(dir1):]
for f in files:
dir1_path = os.path.join( root, f )
dir2_path = os.path.join( dir2_root, f )

Does this work for your needs?
-- Nils

Nov 24 '06 #2

Andre Meyer wrote:
Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

thanks for your help
André
Walk each tree individually gathering file names relative to the head
of the tree and modification data.

compare the two sets of data to generate:
1. A list of what needs to be copied from the original to the copy.
2. A list of what needs to be copied from the copy to the original

Do the copying.

|You might want to show the user what needs to be done and give them
the option of aborting after generating the copy lists.

- Paddy.

Nov 24 '06 #3

Paddy wrote:
Andre Meyer wrote:
Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

thanks for your help
André

Walk each tree individually gathering file names relative to the head
of the tree and modification data.

compare the two sets of data to generate:
1. A list of what needs to be copied from the original to the copy.
2. A list of what needs to be copied from the copy to the original

Do the copying.

|You might want to show the user what needs to be done and give them
the option of aborting after generating the copy lists.

- Paddy.
P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,
and tar will take care of creating any needed directories in the
destination if you create new directories as well as new files.
- Paddy.

Nov 24 '06 #4
>os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

for root, dirs, files in os.walk( dir1 ):
dir2_root = dir2 + root[len(dir1):]
for f in files:
dir1_path = os.path.join( root, f )
dir2_path = os.path.join( dir2_root, f )
Wouldn't it be better to implement tree traversing into a class, then
you can traverse two directory trees at once and can do funny things
with it?

Thomas

Nov 24 '06 #5

Paddy wrote:
P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,
Sow links, transfers your data and then may form a tasty sandwich when
cooked.

(The original should, of course, read ...slow...)
- Pad.

Nov 24 '06 #6
Andre Meyer wrote:
Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for
walking through two hierarchies at once? I want to synchronise two
directories (just backup for now), but cannot see how I can traverse a
second one. I do this now with os.listdir() recursively, which works
fine, but I am afraid that recursion can become inefficient for large
hierarchies.

thanks for your help
André
http://aspn.activestate.com/ASPN/Coo.../Recipe/191017
might be what you are looking for, or at least a starting point...

Regards,
antoine
Nov 24 '06 #7
Antoine De Groote wrote:
>
http://aspn.activestate.com/ASPN/Coo.../Recipe/191017
might be what you are looking for, or at least a starting point...
There's an updated version of this script at pages 403-04 of the Python
Cookbook 2nd Edition.

rd

Nov 25 '06 #8


On Nov 24, 7:57 am, "Andre Meyer" <m...@acm.orgwrote:
>
os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.
I wrote a script to perform this function using the dircmp class in the
filecmp module. I did something similar to this:
import filecmp, os, shutil

def backup(d1,d2):
print 'backing up %s to %s' % (d1,d2)
compare = filecmp.dircmp(d1,d2)
for item in compare.left_only:
fullpath = os.path.join(d1, item)
if os.path.isdir(fullpath):
shutil.copytree(fullpath,os.path.join(d2,item))
elif os.path.isfile(fullpath):
shutil.copy2(fullpath,d2)
for item in compare.diff_files:
shutil.copy2(os.path.join(d1,item),d2)
for item in compare.common_dirs:
backup(os.path.join(d1,item),os.path.join(d2,item) )

if __name__ == '__main__':
import sys
if len(sys.argv) == 3:
backup(sys.argv[1], sys.argv[2])

My script has some error checking and keeps up to 5 previous versions
of a changed file. I find it very efficient, even with recursion, as it
only actually copies those files that have changed. I sync somewhere
around 5 GB worth of files nightly across the network and I haven't had
any trouble.

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete

Nov 27 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Marcello Pietrobon | last post by:
Hello, I am using Pyton 2.3 I desire to walk a directory without recursion this only partly works: def walk_files() : for root, dirs, files in os.walk(top, topdown=True): for filename in...
6
by: rbt | last post by:
More of an OS question than a Python question, but it is Python related so here goes: When I do os.walk('/') on a Linux computer, the entire file system is walked. On windows, however, I can...
1
by: John | last post by:
Hi We have an access db which records daily orders. We would like the orders to be downloaded to a windows mobile 2003 device via usb cradle. Drivers then take along the device and get...
7
by: KraftDiner | last post by:
The os.walk function walks the operating systems directory tree. This seems to work, but I don't quite understand the tupple that is returned... Can someone explain please? for root, dirs,...
6
by: Bruce | last post by:
Hi all, I have a question about traversing file systems, and could use some help. Because of directories with many files in them, os.walk appears to be rather slow. I`m thinking there is a...
9
by: silverburgh.meryl | last post by:
i am trying to use python to walk thru each subdirectory from a top directory. Here is my script: savedPagesDirectory = "/home/meryl/saved_pages/data" dir=open(savedPagesDirectory, 'r') ...
2
by: gregpinero | last post by:
In the example from help(os.walk) it lists this: from os.path import join, getsize for root, dirs, files in walk('python/Lib/email'): print root, "consumes", print sum(), print "bytes in",...
0
by: Jeff McNeil | last post by:
Your args are fine, that's just the way os.path.walk works. If you just need the absolute pathname of a directory when given a relative path, you can always use os.path.abspath, too. A couple...
4
by: Jeff Nyman | last post by:
Greetings all. I did some searching on this but I can't seem to find a specific solution. I have code like this: ========================================= def walker1(arg, dirname, names):...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.