os.path.walk not pruning descent tree (and I'm not happy with that behavior?)

Good day, everybody! From what I can tell from the archives, this is
everyone's favorite method from the standard lib, and everyone loves
answering questions about it. Right? :)

Anyway, my question regards the way that the visit callback modifies
the names list. Basically, my simple example is:

##############################
def listUndottedDirs( d ):
    dots = re.compile( '\.' )

    def visit( arg, dirname, names ):
        for f in names:
            if dots.match( f ):
                i = names.index( f )
                del names[i]
            else:
                print "%s: %s" % ( dirname, f )

    os.path.walk( d, visit, None )
###############################

Basically, I don't want to visit any hidden subdirs (this is a unix
system), nor am I interested in dot-files. If I call the function
like, "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
the dotted directories; eg, if I have ".kde3/" in that directory, it
will begin listing the contents of /usr/home/ardent/.kde3/ . Here's
what the documentation says about this method:

"The visit function may modify names to influence the set of
directories visited below dirname, e.g. to avoid visiting certain
parts of the tree. (The object referred to by names must be modified
in place, using del or slice assignment.)"

So... What am I missing? Any help would be greatly appreciated.
--
Joe Ardent

May 28 '07 #1
4 Replies


Joe Ardent wrote:
> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

I don't know what to make of the smiley, so I'll be explicit: use os.walk()
instead of os.path.walk().

> Anyway, my question regards the way that the visit callback modifies
> the names list. Basically, my simple example is:
>
> ##############################
> def listUndottedDirs( d ):
>     dots = re.compile( '\.' )
>
>     def visit( arg, dirname, names ):
>         for f in names:
>             if dots.match( f ):
>                 i = names.index( f )
>                 del names[i]
>             else:
>                 print "%s: %s" % ( dirname, f )
>
>     os.path.walk( d, visit, None )
> ###############################
>
> Basically, I don't want to visit any hidden subdirs (this is a unix
> system), nor am I interested in dot-files. If I call the function
> like, "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
> IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
> the dotted directories; eg, if I have ".kde3/" in that directory, it
> will begin listing the contents of /usr/home/ardent/.kde3/ . Here's
> what the documentation says about this method:
>
> "The visit function may modify names to influence the set of
> directories visited below dirname, e.g. to avoid visiting certain
> parts of the tree. (The object referred to by names must be modified
> in place, using del or slice assignment.)"
>
> So... What am I missing? Any help would be greatly appreciated.

Your problem is that you are deleting items from a list while iterating over
it:

# WRONG
>>> names = [".alpha", ".beta", "gamma"]
>>> for name in names:
...     if name.startswith("."):
...         del names[names.index(name)]
...
>>> names
['.beta', 'gamma']

Here's one way to avoid that mess:

>>> names = [".alpha", ".beta", "gamma"]
>>> names[:] = [name for name in names if not name.startswith(".")]
>>> names
['gamma']

The slice [:] on the left side is necessary to change the list in-place.
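
Putting the two suggestions above together (os.walk() plus slice assignment), here is a minimal sketch in current Python 3, where os.path.walk no longer exists. The names list_undotted and demo, and the small temporary tree that demo builds, are made up for illustration and are not from the original post.

```python
import os
import tempfile


def list_undotted(top):
    """Return (dirpath, name) pairs, skipping dot-files and dot-directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the very list os.walk recurses from,
        # so hidden directories are never descended into.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(dirnames + filenames):
            if not name.startswith("."):
                found.append((dirpath, name))
    return found


def demo():
    # Hypothetical tree: a hidden directory, a hidden file, one visible directory.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".kde3", "share"))
        os.makedirs(os.path.join(top, "docs"))
        open(os.path.join(top, ".profile"), "w").close()
        open(os.path.join(top, "docs", "notes.txt"), "w").close()
        return [(os.path.relpath(d, top), n) for d, n in list_undotted(top)]
```

Calling demo() should list only docs and docs/notes.txt; nothing under .kde3 is visited because the directory was removed from dirnames before os.walk descended.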

Peter

May 28 '07 #2

On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:

> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

Well, in fact, the preferred (and easier) way is to use os.walk - but
os.path.walk is fine too.

> Anyway, my question regards the way that the visit callback modifies
> the names list. Basically, my simple example is:
>
> ##############################
> def listUndottedDirs( d ):
>     dots = re.compile( '\.' )
>
>     def visit( arg, dirname, names ):
>         for f in names:
>             if dots.match( f ):
>                 i = names.index( f )
>                 del names[i]
>             else:
>                 print "%s: %s" % ( dirname, f )
>
>     os.path.walk( d, visit, None )
> ###############################

There is nothing wrong with os.path.walk - you are iterating over the names
list *and* removing elements from it at the same time, and that's not
good... Some ways to avoid it:

- iterate over a copy (the [:] is important):

    for fname in names[:]:
        if fname[:1]=='.':
            names.remove(fname)

- iterate backwards:

    for i in range(len(names)-1, -1, -1):
        fname = names[i]
        if fname[:1]=='.':
            names.remove(fname)

- collect first and remove later:

    to_be_deleted = [fname for fname in names if fname[:1]=='.']
    for fname in to_be_deleted:
        names.remove(fname)

- filter and reassign in place (the [:] is important):

    names[:] = [fname for fname in names if fname[:1]!='.']

(Notice that I haven't used a regular expression, and that I used the
remove method.)

--
Gabriel Genellina

May 28 '07 #3

I'm really sorry for all those private mails; Thunderbird is awfully
stupid at dealing with mailing-list folders.

Gabriel Genellina wrote:
> On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:
> - iterate backwards:
>
>     for i in range(len(names)-1, -1, -1):
>         fname = names[i]
>         if fname[:1]=='.':
>             names.remove(fname)

This is not about iterating backward; it is about iterating over the
index of each element instead of over the elements themselves (which
must be done starting from the end). In fact, this code is both
inefficient and contains a subtle bug: if two objects in the list
compare equal, you will remove the wrong one.

It should be:

    for i in range(len(names)-1, -1, -1):
        if names[i][:1]=='.':
            del names[i]
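
To make the "wrong one" case concrete, here is a small sketch in Python 3. It deliberately uses a list of numbers rather than file names, since the names in a single directory listing are unique and should never trigger the problem; the function names and the sample data are made up for illustration.

```python
def drop_floats_with_remove(items):
    """Backward index loop that removes by value (the buggy variant)."""
    items = list(items)  # work on a copy so the argument is left alone
    for i in range(len(items) - 1, -1, -1):
        item = items[i]
        if isinstance(item, float):
            # list.remove deletes the FIRST element equal to item,
            # which is not necessarily the one at index i.
            items.remove(item)
    return items


def drop_floats_with_del(items):
    """Backward index loop that deletes by position (the corrected variant)."""
    items = list(items)
    for i in range(len(items) - 1, -1, -1):
        if isinstance(items[i], float):
            del items[i]
    return items
```

With the input [1, 2, 1.0], the remove variant first deletes the integer 1 (because 1 == 1.0) and then the float, leaving [2], while the del variant leaves [1, 2] as intended.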

> - filter and reassign in place

Seems the best here.

> (the [:] is important):

Not so. Unless "names" is referenced in another namespace, simple
assignment is enough.

> names[:] = [fname for fname in names if fname[:1]!='.']
>
> (Notice that I haven't used a regular expression, and the remove method)

May 28 '07 #4

On Mon, 28 May 2007 05:25:18 -0300, Maric Michaud <ma***@aristote.info>
wrote:

> Gabriel Genellina wrote:
>> - iterate backwards:
>>
>>     for i in range(len(names)-1, -1, -1):
>>         fname = names[i]
>>         if fname[:1]=='.':
>>             names.remove(fname)
>
> This is not about iterating backward; it is about iterating over the
> index of each element instead of over the elements themselves (which
> must be done starting from the end). In fact, this code is both
> inefficient and contains a subtle bug: if two objects in the list
> compare equal, you will remove the wrong one.
>
> It should be:
>
>     for i in range(len(names)-1, -1, -1):
>         if names[i][:1]=='.':
>             del names[i]

Yes, sure, this is what I should have written. Thanks for the correction!

>> - filter and reassign in place
>
> Seems the best here.
>
>> (the [:] is important):
>
> Not so. Unless "names" is referenced in another namespace, simple
> assignment is enough.

But this is exactly the case: the visit function is called from inside the
os.path.walk code, and you have to modify the names parameter in place for
the caller to notice the change (and skip the undesired files and folders).
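
A minimal Python 3 sketch of that difference follows. walk_one_level is a hypothetical stand-in for the caller (it is not how os.path.walk is actually implemented); it only shows that the caller keeps its own reference to the list it passed in.

```python
def prune_by_rebinding(names):
    # Plain assignment rebinds the local name only;
    # the list object the caller holds is left unchanged.
    names = [n for n in names if not n.startswith(".")]


def prune_in_place(names):
    # Slice assignment replaces the contents of the same list object,
    # so the caller sees the pruned result.
    names[:] = [n for n in names if not n.startswith(".")]


def walk_one_level(visit):
    """Hypothetical caller: builds a list, hands it to visit, then uses it."""
    names = [".kde3", ".profile", "docs"]
    visit(names)
    return names  # what the caller would go on to recurse into
```

walk_one_level(prune_by_rebinding) still returns all three names, whereas walk_one_level(prune_in_place) returns only ["docs"].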

--
Gabriel Genellina

May 28 '07 #5
