os.walk() - Python

rbt

Could someone demonstrate the correct/proper way to use os.walk() to skip certain
files and folders while walking a specified path? I've read the module docs and
googled to no avail and posted here about other os.walk issues, but I think I need to
back up to the basics or find another tool as this isn't going anywhere fast... I've
tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

    # NOW, ANALYZE THE FILES

And this:

    files = [f for f in files if f not in file_skip_list]
    dirs = [d for d in dirs if dir not in dir_skip_list]

    # NOW, ANALYZE THE FILES

The problem I run into is that some of the files and dirs are not removed while others
are. I can be more specific and give exact examples if needed. On WinXP,
'pagefile.sys' is always removed, while 'UsrClass.dat' is *never* removed, etc.

Jul 18 '05 #1


Dan Perl

"rbt" <rb*@athop1.ath.vt.edu> wrote in message
news:cv**********@solaris.cc.vt.edu...

Could someone demonstrate the correct/proper way to use os.walk() to skip
certain files and folders while walking a specified path? I've read the
module docs and googled to no avail and posted here about other os.walk
issues, but I think I need to back up to the basics or find another tool
as this isn't going anywhere fast... I've tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

I think the problem here is that you are removing elements from a list while
traversing it. Try to use a copy for the traversal, like this:

    for f in files[:]:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs[:]:
        if d in dir_skip_list:
            dirs.remove(d)

And this:

    files = [f for f in files if f not in file_skip_list]
    dirs = [d for d in dirs if dir not in dir_skip_list]

This is not doing what you want because it just creates new lists and it
doesn't modify the existing lists that the os.walk generator is using.
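
Both effects can be reproduced with plain lists, without walking any directory. This is only a sketch in current Python 3 syntax; the names `names`, `skip` and `walk_owned` are invented for illustration, and `walk_owned` merely stands in for the list that os.walk hands out.

```python
skip = ['b', 'c']

# 1. Removing while iterating: once 'b' is removed, 'c' slides into the
#    position the loop has already passed, so it is never examined.
names = ['a', 'b', 'c', 'd']
for n in names:
    if n in skip:
        names.remove(n)
buggy_result = list(names)

# 2. Iterating over a copy (names[:]) examines every element.
names = ['a', 'b', 'c', 'd']
for n in names[:]:
    if n in skip:
        names.remove(n)
copy_result = list(names)

# 3. Rebinding a name builds a new list; the original list object is
#    left untouched.
walk_owned = ['a', 'b', 'c', 'd']
dirs = walk_owned
dirs = [d for d in dirs if d not in skip]
rebinding_changed_original = (walk_owned != ['a', 'b', 'c', 'd'])

# 4. Slice assignment replaces the contents of the same list object.
dirs = walk_owned
dirs[:] = [d for d in dirs if d not in skip]

print(buggy_result, copy_result, rebinding_changed_original, walk_owned)
```

The first case leaves 'c' in the list even though it is in the skip list, which matches the "some are removed while others are not" symptom. Separately, the comprehension for dirs quoted above tests `dir` rather than the loop variable `d`. Unless `dir` was assigned elsewhere in the program, that is the built-in function, which is never in a list of strings, so that line would filter nothing.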

Jul 18 '05 #2

Roel Schroeven

rbt wrote:

The problem I run into is that some of the files and dirs are not
removed while others are. I can be more specific and give exact examples
if needed. On WinXP, 'pagefile.sys' is always removed, while
'UsrClass.dat' is *never* removed, etc.

Keep in mind that the comparisons are done case sensitive; are you sure
that there's no problem regarding uppercase/lowercase?
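
One way to take case out of the picture is to normalise both sides before comparing. A minimal sketch in Python 3 syntax, assuming a hypothetical helper named `should_skip`; it uses `str.lower()`, and whether plain lowercasing matches the file system's own case rules for every possible name is an assumption. The sample names in `found` are made up.

```python
def should_skip(name, skip_names):
    """Return True if name matches any entry in skip_names, ignoring case."""
    lowered = {s.lower() for s in skip_names}
    return name.lower() in lowered


file_skip_list = ['pagefile.sys', 'UsrClass.dat']

# Made-up directory listing whose spelling differs from the skip list.
found = ['PAGEFILE.SYS', 'usrclass.dat', 'ntuser.dat']
kept = [f for f in found if not should_skip(f, file_skip_list)]
print(kept)
```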

--
"Codito ergo sum"
Roel Schroeven

Jul 18 '05 #3

rbt

Roel Schroeven wrote:

rbt wrote:
The problem I run into is that some of the files and dirs are not
removed while others are. I can be more specific and give exact examples
if needed. On WinXP, 'pagefile.sys' is always removed, while
'UsrClass.dat' is *never* removed, etc.

Keep in mind that the comparisons are done case sensitive; are you sure
that there's no problem regarding uppercase/lowercase?

I've noticed that. I've tried most all combinations possible with the same results.

Jul 18 '05 #4

Max Erickson

<snip>

os.walk() is a generator. When you iterate over it, as in a for loop like

    for r, ds, fs in os.walk(...):

r, ds and fs are set to new values at the beginning of each iteration.
If you want to end up with a list of files or dirs, rather than
processing them in the bodies of the file and dir for loops, you need
to keep a list of the files and dirs that os.walk gives you:

>>> import os
>>> dir_skip_list = ['sub2']
>>> file_skip_list = []
>>> keptfiles = list()
>>> keptdirs = list()
>>> for root, ds, fs in os.walk('c:\\bin\\gtest\\'):
...     for f in fs:
...         if f not in file_skip_list:
...             keptfiles.append(f)
...     for d in ds:
...         if d in dir_skip_list:
...             ds.remove(d)
...         else:
...             keptdirs.append(d)
...
>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub5', 'sub6']

There is something going on above that I don't quite understand: there
should be more directories. If you can't get something working with
that, this gives you lists of files and dirs that you can then filter:

>>> keptfiles = list()
>>> keptdirs = list()
>>> for r, ds, fs in os.walk('c:\\bin\\gtest'):
...     keptfiles.extend(fs)
...     keptdirs.extend(ds)
...
>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064026.JPG',
'Thumbs.db', 'Thumbs.db', 'Thumbs.db', 'P4064034.JPG', 'Thumbs.db',
'P3123878.JPG', 'P4064065.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub2', 'sub3', 'sub5', 'sub6', 'sub8', 'SubA', 'sub9',
'sub6']
>>> #filter away...
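
Two things may account for the shorter directory list in the first attempt. With a top-down walk, removing 'sub2' from ds also prunes everything beneath it, and calling ds.remove(d) inside a loop over ds itself skips the entry that follows 'sub2'. Whether that is the whole story depends on the actual tree, which isn't shown here. The sketch below collects kept names while pruning with a slice assignment, so ds is never changed in the middle of a loop over it. It is written in Python 3 syntax and builds a throwaway tree with tempfile; the layout and file names are made up for illustration.

```python
import os
import tempfile

dir_skip_list = ['sub2']
file_skip_list = ['Thumbs.db']
keptfiles = []
keptdirs = []

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: sub1/, sub2/sub8/ and sub3/, two files in each.
    for rel in ('sub1', os.path.join('sub2', 'sub8'), 'sub3'):
        os.makedirs(os.path.join(top, rel))
        for fname in ('photo.jpg', 'Thumbs.db'):
            open(os.path.join(top, rel, fname), 'w').close()

    for root, ds, fs in os.walk(top, topdown=True):
        # Replace the contents of ds in place; os.walk then descends only
        # into the directories that remain.
        ds[:] = [d for d in ds if d not in dir_skip_list]
        keptdirs.extend(ds)
        # Record kept files as paths relative to the top of the walk.
        keptfiles.extend(
            os.path.relpath(os.path.join(root, f), top)
            for f in fs if f not in file_skip_list
        )

keptdirs.sort()
keptfiles.sort()
print(keptdirs)
print(keptfiles)
```

In this made-up tree neither 'sub2' nor 'sub8' ends up in keptdirs, because the walk never enters the pruned directory, while 'sub3' is kept.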

Hope this helps,
max

Jul 18 '05 #5

Mike Meyer

rbt <rb*@athop1.ath.vt.edu> writes:

Could someone demonstrate the correct/proper way to use os.walk() to
skip certain files and folders while walking a specified path? I've
read the module docs and googled to no avail and posted here about
other os.walk issues, but I think I need to back up to the basics or
find another tool as this isn't going anywhere fast... I've tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

    # NOW, ANALYZE THE FILES

And this:

    files = [f for f in files if f not in file_skip_list]
    dirs = [d for d in dirs if dir not in dir_skip_list]

    # NOW, ANALYZE THE FILES

The problem I run into is that some of the files and dirs are not
removed while others are. I can be more specific and give exact
examples if needed. On WinXP, 'pagefile.sys' is always removed, while
'UsrClass.dat' is *never* removed, etc.

As others have pointed out, the problem you are running into is that
you are modifying the list while looping over it. You can fix this by
creating copies of the list. No one has presented the LC version yet:

for rl, dl, fl in os.walk(path, topdown=True):
    file_skip_list = ('file1', 'file2') #*
    dir_skip_list = ('dir1', 'dir2')

    files = [f for f in fl if f not in file_skip_list]
    dirs = [d for d in dl if d not in dir_skip_list]

    # Analyze files and dirs

If you're using 2.4, you might consider using generator expressions
instead of LCs to avoid creating the second copy of the list:

    files = (f for f in fl if f not in file_skip_list)
    dirs = (d for d in dl if d not in dir_skip_list)

<mike

*) I changed the short lists to short tuples, because I use tuples if
I'm not going to modify the list.
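
Two caveats apply to the filtered copies. Building a new list or generator from dl filters what you analyze, but it leaves dl itself unchanged, so os.walk still descends into the skipped directories. A generator expression can also be consumed only once. The sketch below illustrates both in Python 3 syntax; the helper `visited` and the directory names 'inner' and 'keep' are made up for the example, and the tree is a temporary one.

```python
import os
import tempfile

dir_skip_list = ('dir1',)


def visited(top, prune_in_place):
    """Return sorted relative paths of every directory os.walk enters."""
    seen = []
    for root, dl, fl in os.walk(top, topdown=True):
        seen.append(os.path.relpath(root, top))
        if prune_in_place:
            # Same list object, new contents: the walk skips dir1.
            dl[:] = [d for d in dl if d not in dir_skip_list]
        else:
            # New list only: dl is unchanged, so the walk still enters dir1.
            dirs = [d for d in dl if d not in dir_skip_list]
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'dir1', 'inner'))
    os.makedirs(os.path.join(top, 'keep'))
    walked_with_new_list = visited(top, prune_in_place=False)
    walked_with_pruning = visited(top, prune_in_place=True)

# A generator expression is exhausted after a single pass.
files = (f for f in ['a.txt', 'file1'] if f not in ('file1',))
first_pass = list(files)
second_pass = list(files)

print(walked_with_new_list)
print(walked_with_pruning)
print(first_pass, second_pass)
```

If the filtered names are needed more than once, a list is the safer choice. If the goal is to keep the walk out of a directory altogether, dl has to be modified in place.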
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jul 18 '05 #6
