os.walk walks too much

Hello,
I am using Python 2.3.
I want to walk a directory without recursion.

This only partly works:

    def walk_files() :
        for root, dirs, files in os.walk(top, topdown=True):
            for filename in files:
                print( "file:" + os.path.join(root, filename) )
            for dirname in dirs:
                dirs.remove( dirname )

because it skips all the subdirectories but one.

This *does not* work at all:

    def walk_files() :
        for root, dirs, files in os.walk(top, topdown=True):
            for filename in files:
                print( "file:" + os.path.join(root, filename) )
            dirs = []

This is surprising to me.
Is this a glitch?

How should I implement this?
Maybe it would be good to put it in the os.walk documentation?

Cheers,
Marcello

Jul 18 '05 #1
9 Replies


Hi,

> Hello,
> I am using Python 2.3
> I desire to walk a directory without recursion
>
> this only partly works:
> def walk_files() :
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print( "file:" + os.path.join(root, filename) )
>         for dirname in dirs:
>             dirs.remove( dirname )

I don't know what this walk function does, but I think one problem
here is that you are changing the list while you are still iterating
over it. There was a similar message a few weeks ago, and the solution
was to make a copy of the list so that the list being looped over is
not the one being modified. I don't remember which function was used
to copy the list, but I'm sure that you can't use:

    dirs2 = dirs

because both names then refer to the same list object.

Regards,
Josef
Jul 18 '05 #2
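
A minimal sketch of the aliasing point made above (not code from the thread): plain assignment only gives the list a second name, while a full slice builds a new list. The names alias and snapshot are made up for the illustration.

```python
dirs = ["alpha", "beta", "gamma"]

alias = dirs         # second name for the very same list object
snapshot = dirs[:]   # new list holding the same items (list(dirs) works too)

for name in snapshot:    # loop over the copy ...
    dirs.remove(name)    # ... while emptying the original

print(alias is dirs)     # True: the alias is not a copy
print(dirs)              # []
print(snapshot)          # ['alpha', 'beta', 'gamma']
```

Looping over the snapshot is safe because nothing in the loop body changes it.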

Marcello Pietrobon wrote:

> I am using Python 2.3
> I desire to walk a directory without recursion
>
> this only partly works:
> def walk_files() :
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print( "file:" + os.path.join(root, filename) )
>         for dirname in dirs:
>             dirs.remove( dirname )

This is *bad*. If you want to change a list while you iterate over it, use a
copy (there may be worse side effects than you have seen):

        for dirname in dirs[:]:

> because it skips all the subdirectories but one.
>
> this *does not* work at all
> def walk_files() :
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print( "file:" + os.path.join(root, filename) )
>         dirs = []

You are rebinding dirs to a newly created list, leaving the old one (to
which os.walk() still holds a reference) unaltered. Using

        dirs[:] = []

instead should work as desired.

Here's what I do:

    def walk_files(root, recursive=False):
        for path, dirs, files in os.walk(root):
            for fn in files:
                yield os.path.join(path, fn)
                if not recursive:
                    break

Peter
Jul 18 '05 #3
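
For reference, a small self-contained sketch of the slice-assignment fix suggested in this reply. The function name top_level_files and the throwaway directory layout are invented for the demo; only os.walk and the in-place emptying of dirs come from the thread.

```python
import os
import shutil
import tempfile

def top_level_files(top):
    """Yield the files directly inside top, without descending."""
    for root, dirs, files in os.walk(top, topdown=True):
        for filename in files:
            yield os.path.join(root, filename)
        # Empty the very list os.walk holds, so it has nowhere to descend.
        dirs[:] = []

# Throwaway tree for the demo: a.txt at the top, b.txt inside sub/
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    open(os.path.join(demo, rel), "w").close()

found = [os.path.basename(p) for p in top_level_files(demo)]
shutil.rmtree(demo)
print(found)  # ['a.txt']
```

With dirs = [] in place of dirs[:] = [], the same demo would also report b.txt, because os.walk never sees the new empty list.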

Marcello Pietrobon wrote:

> Hello,
> I am using Python 2.3
> I desire to walk a directory without recursion

I am not sure what this means. Do you want to iterate over the
non-directory files in directory top? For this job I would use:

    def walk_files(top):
        names = os.listdir(top)
        for name in names:
            # join with top so the test also works when top is not
            # the current working directory
            if os.path.isfile(os.path.join(top, name)):
                yield name

> this only partly works:
> def walk_files() :
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print( "file:" + os.path.join(root, filename) )
>         for dirname in dirs:
>             dirs.remove( dirname )
> because it skips all the subdirectories but one.

Replace

        for dirname in dirs:
            dirs.remove( dirname )

with

        for i in range(len(dirs)-1, -1, -1):
            del dirs[i]

to make it work. Run

    seq = [0,1,2,3,4,5]
    for x in seq:
        seq.remove(x)
    print(seq)

to see the problem. If you are iterating through a list selectively
removing members, you should iterate in reverse. Never change the
positions in the list of elements that have not yet been reached by the
iterator.

> this *does not* work at all
> def walk_files() :
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print( "file:" + os.path.join(root, filename) )
>         dirs = []

There is a subtle point in the documentation.

"When topdown is true, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; ..."

The key word is "in-place". "dirs = []" does not change "dirs" in-place.
It replaces "dirs" with a different list. Either use "del"

        for i in range(len(dirs)-1, -1, -1):
            del dirs[i]

as I did above or use "slice assignment"

        dirs[:] = []
Jul 18 '05 #4
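
To make the reverse-iteration advice concrete, here is a short sketch. The first half repeats the seq experiment from this reply; the second half removes entries selectively, with directory names and a leading-dot rule that are invented for the example.

```python
# Forward removal: every removal shifts the unvisited items one slot left,
# so the loop steps over every second element.
seq = [0, 1, 2, 3, 4, 5]
for x in seq:
    seq.remove(x)
print(seq)  # [1, 3, 5]

# Reverse removal: deleting index i only moves items the loop has
# already visited, so nothing is skipped.
dirs = [".git", "src", ".cache", "docs"]
for i in range(len(dirs) - 1, -1, -1):
    if dirs[i].startswith("."):
        del dirs[i]
print(dirs)  # ['src', 'docs']
```

Because del dirs[i] changes the list in place, the second loop is also the kind of pruning os.walk will respect when topdown is true.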

Thank you everybody for all the answers.
They all have been useful :)

I have only two questions regarding Peter Otten's answer.

1)
What is the difference between

    for dirname in dirs:
        dirs.remove( dirname )

and

    for dirname in dirs[:]:
        dirs.remove( dirname )

( I understand and agree that there are better ways, and that at least a reverse iterator should be used )

2)

    def walk_files(root, recursive=False):
        for path, dirs, files in os.walk(root):
            for fn in files:
                yield os.path.join(path, fn)
                if not recursive:
                    break

seems not correct to me, because I tend to think of yield as a very
special return statement. So I think the following is correct:

    def walk_files(root, recursive=False):
        for path, dirs, files in os.walk(root):
            for fn in files:
                yield os.path.join(path, fn)
            if not recursive:
                break

Is that right?

Thank you very much,
Marcello

Jul 18 '05 #5

Marcello Pietrobon wrote:

> What is the difference between
>
> for dirname in dirs:
>     dirs.remove( dirname )
>
> and
>
> for dirname in dirs[:]:
>     dirs.remove( dirname )

dirs[:] makes a slice containing all elements, i.e. a shallow copy of the
complete list, so the loop is not affected by changes to the original:

>>> dirs = ["alpha", "beta", "gamma"]
>>> dirs == dirs[:] # equal
True
>>> dirs is dirs[:] # but not the same list
False

> def walk_files(root, recursive=False):
>     for path, dirs, files in os.walk(root):
>         for fn in files:
>             yield os.path.join(path, fn)
>             if not recursive:
>                 break
>
> seems not correct to me:
>
> because I tend to think of yield as a very special return statement
> so I think the following is correct
>
> def walk_files(root, recursive=False):
>     for path, dirs, files in os.walk(root):
>         for fn in files:
>             yield os.path.join(path, fn)
>         if not recursive:
>             break
> is that right ?

Oops, of course you're right.

Peter

Jul 18 '05 #6
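
The indentation of the two walk_files versions did not survive in this archive, so the following is only a sketch of what the exchange most plausibly turns on: whether the break sits inside the inner loop or after it. FAKE_WALK is a hypothetical stand-in for os.walk output, used so the result does not depend on a real directory.

```python
# Stand-in for os.walk: (path, dirs, files) tuples, top directory first.
FAKE_WALK = [
    ("top", ["sub"], ["a.txt", "b.txt"]),
    ("top/sub", [], ["c.txt", "d.txt"]),
]

def break_in_inner_loop(recursive=False):
    for path, dirs, files in FAKE_WALK:
        for fn in files:
            yield path + "/" + fn
            if not recursive:
                break  # ends only the inner loop, after one file

def break_in_outer_loop(recursive=False):
    for path, dirs, files in FAKE_WALK:
        for fn in files:
            yield path + "/" + fn
        if not recursive:
            break  # ends the outer loop, after the top directory

inner = list(break_in_inner_loop())
outer = list(break_in_outer_loop())
print(inner)  # ['top/a.txt', 'top/sub/c.txt']
print(outer)  # ['top/a.txt', 'top/b.txt']
```

A yield does not end the loop the way a return would: the generator resumes on the line after the yield, so the placement of the break decides which loop is left.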

On 2004-02-25, Peter Otten <__*******@web.de> wrote:
> dirs[:] makes a slice containing all elements, i.e. a shallow copy of the
> complete list, so the loop is not affected by changes to the original:
>
> >>> dirs = ["alpha", "beta", "gamma"]
> >>> dirs == dirs[:] # equal
> True
> >>> dirs is dirs[:] # but not the same list
> False

Better way to make it crystal clear.

{grey@teleute:~} python
Python 2.3.3 (#2, Jan 13 2004, 00:47:05)
[GCC 3.3.3 20040110 (prerelease) (Debian)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> real = [1, 2, 3]
>>> copy = real[:]
>>> real
[1, 2, 3]
>>> copy
[1, 2, 3]
>>> for x in real:
...     print(x)
...     real.remove(x)
...
1
3
>>> real
[2]
>>> real = [1, 2, 3]
>>> for x in real[:]: # IE, same as using copy
...     print(x)
...     real.remove(x)
...
1
2
3
>>> real
[]

To the original poster, the reason changing the list you're iterating over
causes trouble is that the index doesn't move along with the data in it. So
in the first loop 1 and 3 are printed, 2 is left. Why? Assign indexes to the
data. This is most likely not how Python does it internally but it is good
enough to show what happened.

[1, 2, 3]
 0  1  2

First run through, x is 1, and it got that from the first index, 0. Then
you remove 1 from the data set so now it looks like this.

[2, 3]
 0  1

So now Python moves on in the loop and grabs the next index, which is 1.
However, since you've changed the list, that index now points to 3, not 2.
It grabs 3, prints it, then removes it. So now we're left with:

[2]
 0

The next index would be 2, but the only index left is 0, which is already
done, so the loop ends. By using a copy you're using the copy to preserve
the indexing to the data while you manipulate the data. Hope this clears
it up. :)
--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
Jul 18 '05 #7
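
The index picture above can be replayed with an explicit counter. This is a sketch of the model described in the post, not a claim about the interpreter's exact internals; the names index and seen are made up for the example.

```python
real = [1, 2, 3]
seen = []

index = 0
while index < len(real):   # roughly what "for x in real" does
    x = real[index]
    seen.append(x)
    real.remove(x)         # items after x shift one slot to the left
    index += 1             # but the position still moves forward

print(seen)  # [1, 3]
print(real)  # [2]
```

The value 2 slides into slot 0 right after slot 0 has been visited, which is why it is never seen.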

Hi Steve,

I thought intuitively something like that, but your help has been...
helpful! :)

Can I ask you one more thing?

It is surprising to me that in

    for x in real[:]

dirs[:] creates a copy of dirs

while

    dirs[:] = []  - empties the original list
and
    dirs = []     - empties a copy of the original list

I understand (I think) the concept of slicing, but this is still
surprising to me.
It is as if, when I do

    for x in real[:]

this is not using slicing

while

    dirs[:] = []

is using slicing.
Maybe I am just making a big mess in my mind.
It looks like assignments in Python and C++ are pretty different.
Cheers,
Marcello

Jul 18 '05 #8

On Fri, Feb 27, 2004 at 02:51:04PM -0500, Marcello Pietrobon wrote:
> dirs[:] creates a copy of dirs

This creates a new list which contains the same items as the list named
by dirs.

> dirs[:] = [] - empty the original list

This changes the items in the list named by dirs. It replaces (mutates)
the range named on the left-hand side of = with the items on the
right-hand side.

> and
> dirs = [] - empty a copy of the original list

This makes dirs name a different list than it did before, but the value
of the list that dirs named a moment ago is unchanged.

In the case of os.walk (or anywhere you do something by mutating an item
passed in) you have to change the items in a particular list ("mutate
the list"), not change the list a particular local name refers to.

Jeff

Jul 18 '05 #9
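
The same distinction shows up with any function that receives a list, which is the position os.walk puts its caller in. A minimal sketch, with the function names rebind and mutate invented for the illustration:

```python
def rebind(names):
    # The local name now refers to a brand-new list;
    # the list the caller passed in is untouched.
    names = []

def mutate(names):
    # Replaces the contents of the list the caller passed in.
    names[:] = []

dirs = ["a", "b"]

rebind(dirs)
after_rebind = list(dirs)

mutate(dirs)
after_mutate = list(dirs)

print(after_rebind)  # ['a', 'b']
print(after_mutate)  # []
```

Only the second call is visible to the caller, just as only dirs[:] = [] is visible to os.walk.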

On 2004-02-27, Marcello Pietrobon <te*****@attglobal.net> wrote:
> Can I ask you one more thing ?

Sure. However I am a Python neophyte who happens to have a few years
experience so take everything I say with a large heaping of salt. :)

> It is surprising to me that in

Ah, took me a minute to see what you were saying.

> for x in real[:]
> dirs[:] creates a copy of dirs

Well, creating a copy is the shorthand. What both of these are doing is
"output the values from the array x from y to z." Since y and z are not
specified you get the whole array (or string, or any other sliceable
object).

> while dirs[:] = [] - empty the original list

This is "assign the range of x to y the list given". A better way to see
it would be to do this:

>>> foo = [1, 2, 3, 4]
>>> foo
[1, 2, 3, 4]
>>> foo[1:2] = [3, 2, 5]
>>> foo
[1, 3, 2, 5, 3, 4]

Hmmm, ok, even I'm scratching my head at that since I expected 1, 3, 2, 5,
4. Erm, but you get the idea. :)

> and
> dirs = [] - empty a copy of the original list

This is because here you're assigning the name to a new object.

So in order...

for x in real[:] - iterate over the results of the slice of real from y to z.

foo = dirs[:] - Assign foo to the results of the slice of dirs from y to z.

dirs[:] = [] - Assign the area of dirs defined by slice y to z with an
empty array.

dirs = [] - Assign the name dirs to a new, empty array.

Where most people get hung up is the difference between strings, which are
immutable, and lists/dictionaries, etc. which are mutable. :)

> I understand ( I think ) the concept of slicing, but this is still
> surprising to me. Like to say that when I do
>
> for x in real[:] this is not using slicing

Yes, it is. Take foo from above...

>>> id(foo)
1075943980

foo points to object 1075943980.

>>> id(foo)
1075943980

foo still points to object 1075943980.

>>> id(foo[:])
1075943308

However this is a different object, 1075943308.

So in the above example it is using a slice. real[:] is returning a slice
and it is that object which x is iterating over. Just because that slice
doesn't have a name assigned to it doesn't mean it doesn't exist. :)

> While
> dirs[:] = []
> is using slicing

Well, it is using it in a different manner. Above you're using slicing to
tell Python what to return. Here you're using slicing to tell Python what to
replace.

> Maybe I just making a big mess in my mind.
> It looks like assignments in Python and C++ are pretty different

Never touched C++ so I cannot say. :)

--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
Jul 18 '05 #10
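
On the foo[1:2] result that caused the head-scratching above: a slice's end index is exclusive, so 1:2 covers only the single item at index 1. A short sketch comparing it with 1:3 (the names one_item and two_items are made up for the comparison):

```python
foo = [1, 2, 3, 4]
foo[1:2] = [3, 2, 5]     # 1:2 covers index 1 only (the value 2)
one_item = list(foo)
print(one_item)          # [1, 3, 2, 5, 3, 4]

foo = [1, 2, 3, 4]
foo[1:3] = [3, 2, 5]     # 1:3 covers indexes 1 and 2 (the values 2 and 3)
two_items = list(foo)
print(two_items)         # [1, 3, 2, 5, 4]
```

The replacement does not need to be the same length as the slice, which is also why dirs[:] = [] can empty a list of any size.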
