os.walk()

rbt
Could someone demonstrate the correct/proper way to use os.walk() to skip certain
files and folders while walking a specified path? I've read the module docs and
googled to no avail and posted here about other os.walk issues, but I think I need to
back up to the basics or find another tool as this isn't going anywhere fast... I've
tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

    # NOW, ANALYZE THE FILES

And This:

files = [f for f in files if f not in file_skip_list]
dirs = [d for d in dirs if d not in dir_skip_list]

# NOW, ANALYZE THE FILES

The problem I run into is that some of the files and dirs are not removed while others
are. I can be more specific and give exact examples if needed. On WinXP,
'pagefile.sys' is always removed, while 'UsrClass.dat' is *never* removed, etc.
Jul 18 '05 #1

"rbt" <rb*@athop1.ath.vt.edu> wrote in message
news:cv**********@solaris.cc.vt.edu...
> for root, dirs, files in os.walk(path, topdown=True):
>
>     file_skip_list = ['file1', 'file2']
>     dir_skip_list = ['dir1', 'dir2']
>
>     for f in files:
>         if f in file_skip_list:
>             files.remove(f)
>
>     for d in dirs:
>         if d in dir_skip_list:
>             dirs.remove(d)

I think the problem here is that you are removing elements from a list while
traversing it. Try to use a copy for the traversal, like this:

for f in files[:]:
    if f in file_skip_list:
        files.remove(f)

for d in dirs[:]:
    if d in dir_skip_list:
        dirs.remove(d)

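To make the failure mode concrete, here is a minimal sketch in modern Python 3 using a plain list instead of os.walk (the function names and the sample names 'a' to 'd' are hypothetical). Removing an item shifts every later item one position to the left, so the loop never visits the item that came right after the removed one. That is one plausible explanation for a name like 'UsrClass.dat' never being removed: if it directly follows another skipped name in the listing, the loop steps over it.

```python
def remove_while_iterating(names, skip):
    """Buggy version: mutates `names` while looping over it."""
    for name in names:
        if name in skip:
            names.remove(name)  # later items shift left; the next one is skipped
    return names


def remove_with_copy(names, skip):
    """Fixed version: loop over a copy (names[:]) and mutate the original."""
    for name in names[:]:
        if name in skip:
            names.remove(name)
    return names


skip = ['b', 'c']
print(remove_while_iterating(['a', 'b', 'c', 'd'], skip))  # ['a', 'c', 'd']
print(remove_with_copy(['a', 'b', 'c', 'd'], skip))        # ['a', 'd']
```

In the buggy run, 'c' survives only because it sat right after 'b'.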
> And This:
>
> files = [f for f in files if f not in file_skip_list]
> dirs = [d for d in dirs if d not in dir_skip_list]


This is not doing what you want because it just creates new lists and it
doesn't modify the existing lists that the os.walk generator is using.
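A common way to filter the names and also change what os.walk does next is to assign to a slice of the existing list (`dirs[:] = ...`) rather than rebinding the name. The os.walk documentation notes that, with topdown=True, modifying the directory-names list in place limits which subdirectories are visited; with topdown=False it has no effect. A minimal sketch in Python 3, where `walk_skipping` is a hypothetical helper name and the skip lists are placeholders:

```python
import os


def walk_skipping(top, dir_skip_list=(), file_skip_list=()):
    """Yield full paths of files under `top`, skipping the given names.

    Hypothetical helper for illustration only.
    """
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list object os.walk is holding,
        # so skipped directories are never descended into.
        dirs[:] = [d for d in dirs if d not in dir_skip_list]
        # Rebinding `files` is fine here: os.walk does not reuse it.
        files = [f for f in files if f not in file_skip_list]
        for f in files:
            yield os.path.join(root, f)
```

For example, `for p in walk_skipping(path, ('dir1', 'dir2'), ('file1', 'file2')): print(p)` would print every file except the skipped names, without entering dir1 or dir2.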
Jul 18 '05 #2
rbt wrote:
> The problem I run into is that some of the files and dirs are not
> removed while others are. I can be more specific and give exact examples
> if needed. On WinXP, 'pagefile.sys' is always removed, while
> 'UsrClass.dat' is *never* removed, etc.


Keep in mind that the comparisons are done case-sensitively; are you sure
that there's no problem regarding uppercase/lowercase?
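If case does turn out to be the issue, one option is to normalize both sides before comparing. A minimal sketch in Python 3, assuming a case-insensitive filesystem such as Windows; it uses `str.casefold()`, and `os.path.normcase()` is an alternative that lowercases only on Windows. The helper names are hypothetical:

```python
def build_skip_set(names):
    """Return a set of case-folded names for case-insensitive lookups."""
    return {n.casefold() for n in names}


def is_skipped(name, skip_set):
    """True if `name` matches an entry in `skip_set`, ignoring case."""
    return name.casefold() in skip_set


file_skip = build_skip_set(['pagefile.sys', 'UsrClass.dat'])
print(is_skipped('USRCLASS.DAT', file_skip))  # True
print(is_skipped('ntuser.dat', file_skip))    # False
```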

--
"Codito ergo sum"
Roel Schroeven
Jul 18 '05 #3
rbt
Roel Schroeven wrote:
> rbt wrote:
>> The problem I run into is that some of the files and dirs are not
>> removed while others are. I can be more specific and give exact examples
>> if needed. On WinXP, 'pagefile.sys' is always removed, while
>> 'UsrClass.dat' is *never* removed, etc.
>
> Keep in mind that the comparisons are done case-sensitively; are you sure
> that there's no problem regarding uppercase/lowercase?


I've noticed that. I've tried almost every possible combination, with the same results.
Jul 18 '05 #4
<snip>

os.walk() is a generator. When you iterate over it in a for loop, as in

    for r, ds, fs in os.walk(...):

r, ds and fs are set to new values at the beginning of each iteration.
If you want to end up with a list of files or dirs, rather than
processing them in the bodies of the file and dir for loops, you need
to keep a list of the files and dirs that os.walk gives you:
import os

dir_skip_list = ['sub2']
file_skip_list = []
keptfiles = list()
keptdirs = list()
for root, ds, fs in os.walk('c:\\bin\\gtest\\'):
    for f in fs:
        if f not in file_skip_list:
            keptfiles.append(f)
    for d in ds:
        if d in dir_skip_list:
            ds.remove(d)
        else:
            keptdirs.append(d)

>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub5', 'sub6']

Something is going on above that I don't quite understand; there
should be more directories. If you can't get something working with
that, this gives you lists of files and dirs that you can then filter:
keptfiles = list()
keptdirs = list()
for r, ds, fs in os.walk('c:\\bin\\gtest'):
    keptfiles.extend(fs)
    keptdirs.extend(ds)

>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064026.JPG',
'Thumbs.db', 'Thumbs.db', 'Thumbs.db', 'P4064034.JPG', 'Thumbs.db',
'P3123878.JPG', 'P4064065.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub2', 'sub3', 'sub5', 'sub6', 'sub8', 'SubA', 'sub9',
'sub6']
#filter away...
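Part of the discrepancy in the first run is consistent with the remove-while-iterating problem discussed earlier in the thread: once 'sub2' is removed from ds, the inner loop never visits the entry after it, so that entry is never appended to keptdirs (although os.walk still descends into it, since it was not removed). Both runs also record bare names, so the repeated 'Thumbs.db' entries can't be told apart. A minimal Python 3 sketch that prunes with slice assignment and records full paths; `collect` is a hypothetical helper name:

```python
import os


def collect(top, dir_skip_list=(), file_skip_list=()):
    """Return (kept_files, kept_dirs) as full paths under `top`.

    Hypothetical helper for illustration only.
    """
    kept_files = []
    kept_dirs = []
    for root, ds, fs in os.walk(top, topdown=True):
        # Prune in place without removing items in the middle of a loop.
        ds[:] = [d for d in ds if d not in dir_skip_list]
        kept_dirs.extend(os.path.join(root, d) for d in ds)
        kept_files.extend(os.path.join(root, f) for f in fs
                          if f not in file_skip_list)
    return kept_files, kept_dirs
```

Called as `collect('c:\\bin\\gtest', ('sub2',), ('Thumbs.db',))`, it would skip everything under sub2 and drop the Thumbs.db files, leaving one full path per kept entry.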


Hope this helps,
max

Jul 18 '05 #5
rbt <rb*@athop1.ath.vt.edu> writes:
<snip>


As others have pointed out, the problem you are running into is that
you are modifying the lists while looping over them. You can fix this by
looping over copies of the lists. No one has presented the list
comprehension (LC) version yet:

for rl, dl, fl in os.walk(path, topdown=True):
    file_skip_list = ('file1', 'file2') #*
    dir_skip_list = ('dir1', 'dir2')

    files = [f for f in fl if f not in file_skip_list]
    dirs = [d for d in dl if d not in dir_skip_list]

    # Analyze files and dirs

If you're using Python 2.4, you might consider using generator expressions
instead of LCs to avoid creating a second copy of each list:

    files = (f for f in fl if f not in file_skip_list)
    dirs = (d for d in dl if d not in dir_skip_list)
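One thing to keep in mind with the generator-expression version: a generator can be consumed only once, so if the analysis step needs to loop over files twice (or call len() on it), the list comprehension is the safer choice. A minimal Python 3 illustration; the sample file names are hypothetical:

```python
fl = ['file1', 'notes.txt', 'file2', 'data.csv']
file_skip_list = ('file1', 'file2')

files = (f for f in fl if f not in file_skip_list)
print(list(files))  # ['notes.txt', 'data.csv']
print(list(files))  # []: the generator is already exhausted
```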

<mike

*) I changed the short lists to short tuples, because I use tuples when
I'm not going to modify the sequence.
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 18 '05 #6
