os.path.walk not pruning descent tree (and I'm not happy with that behavior?)

Good day, everybody! From what I can tell from the archives, this is
everyone's favorite method from the standard lib, and everyone loves
answering questions about it. Right? :)

Anyway, my question regards the way that the visit callback modifies
the names list. Basically, my simple example is:

##############################
def listUndottedDirs( d ):
    dots = re.compile( '\.' )

    def visit( arg, dirname, names ):
        for f in names:
            if dots.match( f ):
                i = names.index( f )
                del names[i]
            else:
                print "%s: %s" % ( dirname, f )

    os.path.walk( d, visit, None )
###############################

Basically, I don't want to visit any hidden subdirs (this is a unix
system), nor am I interested in dot-files. If I call the function
like, "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
the dotted directories; eg, if I have ".kde3/" in that directory, it
will begin listing the contents of /usr/home/ardent/.kde3/ . Here's
what the documentation says about this method:

"The visit function may modify names to influence the set of
directories visited below dirname, e.g. to avoid visiting certain
parts of the tree. (The object referred to by names must be modified
in place, using del or slice assignment.)"

So... What am I missing? Any help would be greatly appreciated.
--
Joe Ardent

May 28 '07 #1
Joe Ardent wrote:
> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

I don't know what to make of the smiley, so I'll be explicit: use os.walk()
instead of os.path.walk().

> Anyway, my question regards the way that the visit callback modifies
> the names list. Basically, my simple example is:
>
> ##############################
> def listUndottedDirs( d ):
>     dots = re.compile( '\.' )
>
>     def visit( arg, dirname, names ):
>         for f in names:
>             if dots.match( f ):
>                 i = names.index( f )
>                 del names[i]
>             else:
>                 print "%s: %s" % ( dirname, f )
>
>     os.path.walk( d, visit, None )
> ###############################
>
> Basically, I don't want to visit any hidden subdirs (this is a unix
> system), nor am I interested in dot-files. If I call the function
> like, "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
> IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
> the dotted directories; eg, if I have ".kde3/" in that directory, it
> will begin listing the contents of /usr/home/ardent/.kde3/ . Here's
> what the documentation says about this method:
>
> "The visit function may modify names to influence the set of
> directories visited below dirname, e.g. to avoid visiting certain
> parts of the tree. (The object referred to by names must be modified
> in place, using del or slice assignment.)"
>
> So... What am I missing? Any help would be greatly appreciated.

Your problem is that you are deleting items from a list while iterating over
it:

# WRONG
>>> names = [".alpha", ".beta", "gamma"]
>>> for name in names:
...     if name.startswith("."):
...         del names[names.index(name)]
...
>>> names
['.beta', 'gamma']

Here's one way to avoid that mess:

>>> names = [".alpha", ".beta", "gamma"]
>>> names[:] = [name for name in names if not name.startswith(".")]
>>> names
['gamma']

The slice [:] on the left side is necessary to change the list in-place.
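
A minimal sketch of the difference, using two hypothetical helpers (`prune_rebind` and `prune_in_place` are illustrative names, not part of any library). Rebinding the parameter leaves the caller's list untouched, while slice assignment changes the very list object the caller holds:

```python
def prune_rebind(names):
    # Rebinds the local name only; the caller's list object is unchanged.
    names = [name for name in names if not name.startswith(".")]


def prune_in_place(names):
    # Slice assignment replaces the contents of the caller's list object.
    names[:] = [name for name in names if not name.startswith(".")]


caller_list = [".alpha", ".beta", "gamma"]

prune_rebind(caller_list)
after_rebind = list(caller_list)    # snapshot: dotted names are still there

prune_in_place(caller_list)
after_in_place = list(caller_list)  # snapshot: only undotted names remain

print(after_rebind)
print(after_in_place)
```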

Peter

May 28 '07 #2
On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:
> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

Well, in fact, the preferred (and easier) way is to use os.walk - but
os.path.walk is fine too.

> Anyway, my question regards the way that the visit callback modifies
> the names list. Basically, my simple example is:
>
> ##############################
> def listUndottedDirs( d ):
>     dots = re.compile( '\.' )
>
>     def visit( arg, dirname, names ):
>         for f in names:
>             if dots.match( f ):
>                 i = names.index( f )
>                 del names[i]
>             else:
>                 print "%s: %s" % ( dirname, f )
>
>     os.path.walk( d, visit, None )
> ###############################

There is nothing wrong with os.path.walk - you are iterating over the names
list *and* removing elements from it at the same time, and that's not
good... Some ways to avoid it:

- iterate over a copy (the [:] is important):

    for fname in names[:]:
        if fname[:1]=='.':
            names.remove(fname)

- iterate backwards:

    for i in range(len(names)-1, -1, -1):
        fname = names[i]
        if fname[:1]=='.':
            names.remove(fname)

- collect first and remove later:

    to_be_deleted = [fname for fname in names if fname[:1]=='.']
    for fname in to_be_deleted:
        names.remove(fname)

- filter and reassign in place (the [:] is important):

    names[:] = [fname for fname in names if fname[:1]!='.']

(Notice that I haven't used a regular expression, and that I used the
remove method instead of index plus del.)
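
For comparison, a sketch of the same pruning with os.walk, which both replies recommend. It is written for Python 3, where os.path.walk no longer exists; the function name `list_undotted` and the choice to return a list rather than print are assumptions made for illustration. With its default top-down traversal, os.walk also honours in-place changes to the directory list it yields:

```python
import os


def list_undotted(top):
    """Return paths under top, skipping dot-files and never entering dot-directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place so os.walk does not descend into hidden directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in dirnames + filenames:
            if not name.startswith("."):
                found.append(os.path.join(dirpath, name))
    return found
```

Calling `list_undotted('/usr/home/ardent')` would then list the visible entries without ever entering a directory such as ".kde3".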

--
Gabriel Genellina

May 28 '07 #3
I'm really sorry for all those private mails; Thunderbird is awfully
stupid at dealing with mailing-list folders.
Gabriel Genellina wrote:
> On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:
> - iterate backwards:
>
> for i in range(len(names)-1, -1, -1):
>     fname = names[i]
>     if fname[:1]=='.':
>         names.remove(fname)

This is not about iterating backward; it is about iterating over the
index of each element instead of over the element itself (which must
be done starting from the end). In fact, this code is both inefficient and
contains a subtle bug: if two objects in the list compare equal, you
will remove the wrong one.

It should be:

    for i in range(len(names)-1, -1, -1):
        if names[i][:1]=='.':
            del names[i]
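
The "wrong one" bug is hard to see with file names, since equal strings are interchangeable. A sketch with values that compare equal but are still distinguishable (the integer 1 and the float 1.0, chosen purely for illustration) makes it visible; both helper names are hypothetical:

```python
def drop_floats_with_remove(items):
    # Buggy: list.remove() deletes the FIRST element equal to the value,
    # which is not necessarily the element at index i.
    for i in range(len(items) - 1, -1, -1):
        value = items[i]
        if isinstance(value, float):
            items.remove(value)


def drop_floats_with_del(items):
    # Correct: delete exactly the element at index i.
    for i in range(len(items) - 1, -1, -1):
        if isinstance(items[i], float):
            del items[i]


buggy = [1, 1.0]
drop_floats_with_remove(buggy)   # also loses the int, because 1 == 1.0

fixed = [1, 1.0]
drop_floats_with_del(fixed)      # keeps the int, drops only the float

print(buggy)
print(fixed)
```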

> - filter and reassign in place

Seems the best here.

> (the [:] is important):

Not so. Unless "names" is referenced in another namespace, simple
assignment is enough.

> names[:] = [fname for fname in names if fname[:1]!='.']
>
> (Notice that I haven't used a regular expression, and the remove method)

May 28 '07 #4
On Mon, 28 May 2007 05:25:18 -0300, Maric Michaud <ma***@aristote.info>
wrote:
> Gabriel Genellina wrote:
>> - iterate backwards:
>>
>> for i in range(len(names)-1, -1, -1):
>>     fname = names[i]
>>     if fname[:1]=='.':
>>         names.remove(fname)
>
> This is not about iterating backward; it is about iterating over the
> index of each element instead of over the element itself (which must
> be done starting from the end). In fact, this code is both inefficient and
> contains a subtle bug: if two objects in the list compare equal, you
> will remove the wrong one.
>
> It should be:
>
> for i in range(len(names)-1, -1, -1):
>     if names[i][:1]=='.':
>         del names[i]

Yes, sure, this is what I should have written. Thanks for the correction!

>> - filter and reassign in place
>
> Seems the best here.
>
>> (the [:] is important):
>
> Not so. Unless "names" is referenced in another namespace, simple
> assignment is enough.

But this is exactly the case; the visit function is called from inside the
os.path.walk code, and you have to modify the names parameter in place for
the caller to notice it (and skip the undesired files and folders).
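
To see why the caller only notices in-place changes, here is a simplified stand-in for the traversal, written for Python 3 (where os.path.walk is no longer available). `walk_with_callback` is a hypothetical helper loosely modelled on the os.path.walk calling convention, not the library's actual code. It lists a directory, hands that list to visit, and then descends only into the names still present in the same list object:

```python
import os


def walk_with_callback(top, visit, arg):
    """Hypothetical stand-in for the os.path.walk calling convention."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    visit(arg, top, names)
    # The loop reads the same list object that visit received, so only
    # in-place edits made by visit affect which directories are entered.
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            walk_with_callback(path, visit, arg)


def visit(seen, dirname, names):
    # Prune dotted entries in place, then record what is left.
    names[:] = [name for name in names if not name.startswith(".")]
    for name in names:
        seen.append(os.path.join(dirname, name))
```

If visit rebound names with a plain assignment instead, the loop in `walk_with_callback` would still see the unpruned list and would enter the dotted directories.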

--
Gabriel Genellina

May 28 '07 #5
