Bytes | Software Development & Data Engineering Community

os.path.walk not pruning descent tree (and I'm not happy with that behavior?)

Good day, everybody! From what I can tell from the archives, this is
everyone's favorite method from the standard lib, and everyone loves
answering questions about it. Right? :)

Anyway, my question regards the way that the visit callback modifies
the names list. Basically, my simple example is:

###############################
def listUndottedDirs( d ):
    dots = re.compile( '\.' )

    def visit( arg, dirname, names ):
        for f in names:
            if dots.match( f ):
                i = names.index( f )
                del names[i]
            else:
                print "%s: %s" % ( dirname, f )

    os.path.walk( d, visit, None )
###############################

Basically, I don't want to visit any hidden subdirs (this is a unix
system), nor am I interested in dot-files. If I call the function
like, "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
the dotted directories; eg, if I have ".kde3/" in that directory, it
will begin listing the contents of /usr/home/ardent/.kde3/ . Here's
what the documentation says about this method:

"The visit function may modify names to influence the set of
directories visited below dirname, e.g. to avoid visiting certain
parts of the tree. (The object referred to by names must be modified
in place, using del or slice assignment.)"

So... What am I missing? Any help would be greatly appreciated.
--
Joe Ardent

May 28 '07 #1
Joe Ardent wrote:
> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

I don't know what to make of the smiley, so I'll be explicit: use os.walk()
instead of os.path.walk().
Your problem is that you are deleting items from a list while iterating over
it:

# WRONG
>>> names = [".alpha", ".beta", "gamma"]
>>> for name in names:
...     if name.startswith("."):
...         del names[names.index(name)]
...
>>> names
['.beta', 'gamma']

Here's one way to avoid that mess:

>>> names = [".alpha", ".beta", "gamma"]
>>> names[:] = [name for name in names if not name.startswith(".")]
>>> names
['gamma']

The slice [:] on the left side is necessary to change the list in-place.
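
To see why the slice matters, here is a minimal sketch written for Python 3. The function fake_walk is a hypothetical stand-in for a walker that hands its own list to a callback; it is not the real os.path.walk, and the two visit functions are made up for illustration.

```python
def fake_walk(names, visit):
    """Hypothetical stand-in for a walker: call visit, then keep using names."""
    visit(names)
    # A real walker would now recurse into whatever is left in its list.
    return names


def visit_rebind(names):
    # Plain assignment only rebinds the local name; the caller's list
    # object is left unchanged.
    names = [n for n in names if not n.startswith(".")]


def visit_in_place(names):
    # Slice assignment replaces the contents of the very list object
    # that the caller is holding.
    names[:] = [n for n in names if not n.startswith(".")]


rebind_result = fake_walk([".alpha", ".beta", "gamma"], visit_rebind)
in_place_result = fake_walk([".alpha", ".beta", "gamma"], visit_in_place)
print(rebind_result)    # the caller still sees the dotted names
print(in_place_result)  # the caller sees only 'gamma'
```

Only the in-place version changes what the caller goes on to process, which is the behaviour the documentation quoted in the question asks for.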

Peter

May 28 '07 #2
On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:
> Good day, everybody! From what I can tell from the archives, this is
> everyone's favorite method from the standard lib, and everyone loves
> answering questions about it. Right? :)

Well, in fact, the preferred (and easier) way is to use os.walk - but
os.path.walk is fine too.
There is nothing wrong with os.path.walk - you are iterating over the names
list *and* removing elements from it at the same time, and that's not
good... Some ways to avoid it:

- iterate over a copy (the [:] is important):

    for fname in names[:]:
        if fname[:1]=='.':
            names.remove(fname)

- iterate backwards:

    for i in range(len(names)-1, -1, -1):
        fname = names[i]
        if fname[:1]=='.':
            names.remove(fname)

- collect first and remove later:

    to_be_deleted = [fname for fname in names if fname[:1]=='.']
    for fname in to_be_deleted:
        names.remove(fname)

- filter and reassign in place (the [:] is important):

    names[:] = [fname for fname in names if fname[:1]!='.']

(Notice that I haven't used a regular expression, and that I used the
remove method.)

--
Gabriel Genellina

May 28 '07 #3
I'm really sorry for all those private mails; Thunderbird is awfully
stupid at dealing with mailing list folders.

Gabriel Genellina wrote:
> On Sun, 27 May 2007 22:39:32 -0300, Joe Ardent <ar****@gmail.com> wrote:
> - iterate backwards:
>
>     for i in range(len(names)-1, -1, -1):
>         fname = names[i]
>         if fname[:1]=='.':
>             names.remove(fname)

This is not about iterating backward; it is about iterating over the
index of each element instead of over the elements themselves (which must
be done starting from the end). In fact, this code is both inefficient and
contains a subtle bug: if two objects in the list compare equal, you
will remove the wrong one.

It should be:

    for i in range(len(names)-1, -1, -1):
        if names[i][:1]=='.':
            del names[i]
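
The "wrong one" case can be reproduced with a small sketch (Python 3). Names within one directory listing are unique strings, so the example below is deliberately artificial: it uses a hypothetical list holding 1.0 and 1, which compare equal, and removes only the int entries. The function names are made up for illustration.

```python
def remove_ints_buggy(items):
    # Walk the indexes backwards, but delete by value.
    for i in range(len(items) - 1, -1, -1):
        item = items[i]
        if isinstance(item, int):
            # list.remove deletes the FIRST element equal to item,
            # which is not necessarily the one at index i.
            items.remove(item)
    return items


def remove_ints_fixed(items):
    # Walk the indexes backwards and delete by position.
    for i in range(len(items) - 1, -1, -1):
        if isinstance(items[i], int):
            del items[i]
    return items


buggy = remove_ints_buggy([1.0, "a", 1])
fixed = remove_ints_fixed([1.0, "a", 1])
print(buggy)  # the float 1.0 was removed as well
print(fixed)  # only the int 1 was removed
```

Deleting by index also avoids the extra scan from the front of the list that remove performs on every call.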

> - filter and reassign in place

Seems the best here.

> (the [:] is important):

Not so. Unless "names" is referenced in another namespace, simple
assignment is enough.

> names[:] = [fname for fname in names if fname[:1]!='.']
>
> (Notice that I haven't used a regular expression, and the remove method)

May 28 '07 #4
On Mon, 28 May 2007 05:25:18 -0300, Maric Michaud <ma***@aristote.info>
wrote:
> Gabriel Genellina wrote:
>> - iterate backwards:
>>
>>     for i in range(len(names)-1, -1, -1):
>>         fname = names[i]
>>         if fname[:1]=='.':
>>             names.remove(fname)
>
> This is not about iterating backward; it is about iterating over the
> index of each element instead of over the elements themselves (which must
> be done starting from the end). In fact, this code is both inefficient and
> contains a subtle bug: if two objects in the list compare equal, you
> will remove the wrong one.
>
> It should be:
>
>     for i in range(len(names)-1, -1, -1):
>         if names[i][:1]=='.':
>             del names[i]

Yes, sure, this is what I should have written. Thanks for the correction!

>> - filter and reassign in place
>
> Seems the best here.
>
>> (the [:] is important):
>
> Not so. Unless "names" is referenced in another namespace, simple
> assignment is enough.

But this is exactly the case; the visit function is called from inside the
os.path.walk code, and you have to modify the names parameter in-place for
the caller to notice it (and skip the undesired files and folders).

--
Gabriel Genellina

May 28 '07 #5
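
For readers on Python 3, where os.path.walk no longer exists, the os.walk approach recommended in replies #2 and #3 could look roughly like the sketch below. The function name list_undotted and the directory names in the throwaway tree are assumptions made for illustration; the pruning relies on os.walk's default top-down mode, in which changing dirnames in place controls which subdirectories are visited.

```python
import os
import tempfile


def list_undotted(top):
    """Return (dirpath, name) pairs, skipping dot-files and dot-directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place so os.walk does not descend into hidden directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in dirnames + filenames:
            if not name.startswith("."):
                found.append((dirpath, name))
    return found


# Build a small temporary tree (hypothetical names) to try the function on.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, ".kde3", "share"))
    os.makedirs(os.path.join(top, "src"))
    open(os.path.join(top, "src", "main.py"), "w").close()
    open(os.path.join(top, ".profile"), "w").close()
    names_seen = sorted(name for _, name in list_undotted(top))
    print(names_seen)  # nothing under .kde3 is listed
```

As in the thread, the slice assignment is the important part: rebinding dirnames to a new list would leave os.walk's own list untouched, and it would still descend into .kde3.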
