Bytes | Software Development & Data Engineering Community

os.walk walks too much

Hello,
I am using Python 2.3
I want to walk a directory without recursion

This only partly works:
def walk_files():
    for root, dirs, files in os.walk(top, topdown=True):
        for filename in files:
            print("file:" + os.path.join(root, filename))
        for dirname in dirs:
            dirs.remove(dirname)
because it skips all the subdirectories but one.

This *does not* work at all:
def walk_files():
    for root, dirs, files in os.walk(top, topdown=True):
        for filename in files:
            print("file:" + os.path.join(root, filename))
        dirs = []

This is surprising to me.
Is this a glitch?

How should I implement this?
Maybe it would be good to mention this in the os.walk documentation?

Cheers,
Marcello

Jul 18 '05 #1
Hi,
> Hello,
> I am using Python 2.3
> I want to walk a directory without recursion
>
> This only partly works:
> def walk_files():
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print("file:" + os.path.join(root, filename))
>         for dirname in dirs:
>             dirs.remove(dirname)

I don't know what this walk function does, but anyway, I
think one problem here is that you are iterating over a
list that you are changing at the same time. There was a similar
message weeks ago, and the solution was to iterate over a copy of
the list while removing the elements from the original. I don't
remember which function was used to copy the list, but I'm sure
that you can't use:
dirs2 = dirs
because both names refer to the same list object in memory.
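A minimal sketch of the difference described above, written for a current Python 3 rather than 2.3 (the list contents are made up): plain assignment only adds a second name for the same list, while `dirs[:]` builds a separate list that is safe to iterate over while removing items from the original.

```python
# Plain assignment creates an alias; slicing creates a new list.
dirs = ["alpha", "beta", "gamma"]

alias = dirs    # a second name for the SAME list object
copy = dirs[:]  # a new list holding the same items (list(dirs) also works)

print(alias is dirs)  # True
print(copy is dirs)   # False

# Iterate over the copy and remove from the original: every item is visited.
for name in copy:
    dirs.remove(name)

print(dirs)   # []
print(alias)  # [] (the alias sees the change because it is the same list)
print(copy)   # ['alpha', 'beta', 'gamma']
```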

Regards,
Josef
Jul 18 '05 #2
Marcello Pietrobon wrote:
> I am using Python 2.3
> I want to walk a directory without recursion
>
> This only partly works:
> def walk_files():
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print("file:" + os.path.join(root, filename))
>         for dirname in dirs:
>             dirs.remove(dirname)

This is *bad*. If you want to change a list while you iterate over it, use a
copy (there may be worse side effects than you have seen):

for dirname in dirs[:]:
    dirs.remove(dirname)

> because it skips all the subdirectories but one.
>
> This *does not* work at all:
> def walk_files():
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print("file:" + os.path.join(root, filename))
>         dirs = []

You are rebinding dirs to a newly created list, leaving the old one (to
which os.walk() still holds a reference) unaltered. Using

dirs[:] = []

instead should work as desired.


Here's what I do:

def walk_files(root, recursive=False):
    for path, dirs, files in os.walk(root):
        for fn in files:
            yield os.path.join(path, fn)
            if not recursive:
                break

Peter
Jul 18 '05 #3
Marcello Pietrobon wrote:
> Hello,
> I am using Python 2.3
> I want to walk a directory without recursion
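I am not sure what this means. Do you want to iterate over the
non-directory files in directory top? For this job I would use:

def walk_files(top):
    for name in os.listdir(top):
        # listdir() returns bare names, so join them with top before
        # testing; otherwise isfile() looks in the current directory
        if os.path.isfile(os.path.join(top, name)):
            yield name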
> This only partly works:
> def walk_files():
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print("file:" + os.path.join(root, filename))
>         for dirname in dirs:
>             dirs.remove(dirname)
> because it skips all the subdirectories but one.
Replace
    for dirname in dirs:
        dirs.remove(dirname)
with
    for i in range(len(dirs)-1, -1, -1):
        del dirs[i]
to make it work. Run

seq = [0, 1, 2, 3, 4, 5]
for x in seq:
    seq.remove(x)
print(seq)

to see the problem. If you are iterating through a list and selectively
removing members, iterate in reverse. Never change the positions of
elements in the list that the iterator has not yet reached.
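The reverse-index technique can be sketched as follows. `remove_odd_in_place` is a hypothetical helper, and "odd numbers" merely stands in for whatever condition decides which members to remove. The forward loop from above is repeated at the end for comparison (Python 3 syntax assumed).

```python
def remove_odd_in_place(seq):
    """Delete odd numbers from seq in place by walking the indexes backwards."""
    # Deleting index i only shifts the elements *after* i, and those have
    # already been visited, so no element is skipped.
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] % 2:
            del seq[i]


numbers = [0, 1, 2, 3, 4, 5]
remove_odd_in_place(numbers)
print(numbers)  # [0, 2, 4]

# For comparison, the forward loop skips every other element:
seq = [0, 1, 2, 3, 4, 5]
for x in seq:
    seq.remove(x)
print(seq)  # [1, 3, 5]
```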
> This *does not* work at all:
> def walk_files():
>     for root, dirs, files in os.walk(top, topdown=True):
>         for filename in files:
>             print("file:" + os.path.join(root, filename))
>         dirs = []


There is a subtle point in the documentation.

"When topdown is true, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; ..."

The key word is "in-place". "dirs = []" does not change "dirs" in place;
it rebinds the name "dirs" to a different list. Either use "del":
    for i in range(len(dirs)-1, -1, -1):
        del dirs[i]
as I did above, or use slice assignment:
    dirs[:] = []
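To see the in-place rule in action, here is a self-contained sketch for Python 3 (it relies on `tempfile.TemporaryDirectory`, which 2.3 does not have). It builds a small throwaway tree with made-up directory names, walks it three ways, and records which directories `os.walk()` visits: rebinding `dirs` changes nothing, `dirs[:] = []` stops at the top level, and assigning a filtered list to `dirs[:]` prunes selectively. The helpers `make_tree` and `visited` are hypothetical.

```python
import os
import tempfile


def make_tree(base):
    """Create a small hypothetical tree: a/x, b/y and .hidden/z under base."""
    for sub in ("a/x", "b/y", ".hidden/z"):
        os.makedirs(os.path.join(base, *sub.split("/")))


def visited(top, mode):
    """Return the directories os.walk() visits, relative to top, sorted."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=True):
        seen.append(os.path.relpath(root, top).replace(os.sep, "/"))
        if mode == "rebind":
            dirs = []  # rebinds the local name only; os.walk keeps its own list
        elif mode == "clear":
            dirs[:] = []  # empties the very list os.walk is about to use
        elif mode == "skip_hidden":
            # keep, in place, only the names we still want to descend into
            dirs[:] = [d for d in dirs if not d.startswith(".")]
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    make_tree(top)
    walked_all = visited(top, "rebind")
    walked_top_only = visited(top, "clear")
    walked_visible = visited(top, "skip_hidden")

print(walked_all)       # ['.', '.hidden', '.hidden/z', 'a', 'a/x', 'b', 'b/y']
print(walked_top_only)  # ['.']
print(walked_visible)   # ['.', 'a', 'a/x', 'b', 'b/y']
```

The filtered form is handy when only some subdirectories should be skipped; `dirs[:] = []` is simply the special case that keeps none of them.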
Jul 18 '05 #4
Thank you everybody for all the answers.
They all have been useful :)

I have only two questions regarding Peter Otten's answer:

1)
What is the difference between

for dirname in dirs:
    dirs.remove(dirname)

and

for dirname in dirs[:]:
    dirs.remove(dirname)

(I understand and agree that there are better ways, and that at least a reverse iteration should be used.)

2)

def walk_files(root, recursive=False):
    for path, dirs, files in os.walk(root):
        for fn in files:
            yield os.path.join(path, fn)
            if not recursive:
                break

seems not correct to me,

because I tend to think of yield as a very special return statement,
so I think the following is correct:

def walk_files(root, recursive=False):
    for path, dirs, files in os.walk(root):
        for fn in files:
            yield os.path.join(path, fn)
        if not recursive:
            break
Is that right?

Thank you very much,
Marcello

Jul 18 '05 #5
Marcello Pietrobon wrote:
> What is the difference between
>
> for dirname in dirs:
>     dirs.remove(dirname)
>
> and
>
> for dirname in dirs[:]:
>     dirs.remove(dirname)
dirs[:] makes a slice containing all elements, i.e. a shallow copy of the
complete list, so the loop is not affected by changes to the original:

>>> dirs = ["alpha", "beta", "gamma"]
>>> dirs == dirs[:] # equal
True
>>> dirs is dirs[:] # but not the same list
False
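Because the copy is *shallow*, only the outer list is new; the elements themselves are shared. For a list of directory names (immutable strings) that is all you need, but the distinction shows up with mutable elements. A short Python 3 sketch with made-up data:

```python
original = [["a", "b"], ["c"]]
shallow = original[:]

print(shallow is original)        # False: a new outer list
print(shallow[0] is original[0])  # True: the inner lists are shared

shallow.append(["d"])             # changing the copy's own structure...
print(len(original))              # 2  (...leaves the original alone)

shallow[0].append("z")            # but mutating a shared element...
print(original[0])                # ['a', 'b', 'z']  (...shows through both)
```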
> def walk_files(root, recursive=False):
>     for path, dirs, files in os.walk(root):
>         for fn in files:
>             yield os.path.join(path, fn)
>             if not recursive:
>                 break
>
> seems not correct to me,
>
> because I tend to think of yield as a very special return statement,
> so I think the following is correct:
>
> def walk_files(root, recursive=False):
>     for path, dirs, files in os.walk(root):
>         for fn in files:
>             yield os.path.join(path, fn)
>         if not recursive:
>             break
> Is that right?


Oops, of course you're right.
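For reference, the corrected generator as a runnable Python 3 sketch. The small temporary tree and its file names are invented for the test, and `relative` is a hypothetical helper that reports paths relative to the top directory with `/` separators so the output does not depend on the platform.

```python
import os
import tempfile


def walk_files(root, recursive=False):
    """Yield paths of files under root; only the top directory unless recursive."""
    for path, dirs, files in os.walk(root):
        for fn in files:
            yield os.path.join(path, fn)
        # At the outer loop's level, this runs once every file in the current
        # directory has been yielded, and then ends the walk.
        if not recursive:
            break


def relative(paths, top):
    """Hypothetical helper: sorted paths relative to top, with '/' separators."""
    return sorted(os.path.relpath(p, top).replace(os.sep, "/") for p in paths)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for name in ("one.txt", "two.txt", os.path.join("sub", "three.txt")):
        with open(os.path.join(top, name), "w"):
            pass  # create an empty file
    flat = relative(walk_files(top), top)
    deep = relative(walk_files(top, recursive=True), top)

print(flat)  # ['one.txt', 'two.txt']
print(deep)  # ['one.txt', 'sub/three.txt', 'two.txt']
```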

Peter

Jul 18 '05 #6
On 2004-02-25, Peter Otten <__*******@web.de> wrote:
> dirs[:] makes a slice containing all elements, i.e. a shallow copy of the
> complete list, so the loop is not affected by changes to the original:
>
> >>> dirs = ["alpha", "beta", "gamma"]
> >>> dirs == dirs[:] # equal
> True
> >>> dirs is dirs[:] # but not the same list
> False


Better way to make it crystal clear.

{grey@teleute:~} python
Python 2.3.3 (#2, Jan 13 2004, 00:47:05)
[GCC 3.3.3 20040110 (prerelease) (Debian)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> real = [1, 2, 3]
>>> copy = real[:]
>>> real
[1, 2, 3]
>>> copy
[1, 2, 3]
>>> for x in real:
...     print(x)
...     real.remove(x)
...
1
3
>>> real
[2]
>>> real = [1, 2, 3]
>>> for x in real[:]: # i.e., same as using copy
...     print(x)
...     real.remove(x)
...
1
2
3
>>> real
[]

To the original poster: changing the list you're iterating over causes
trouble because the index doesn't move along with the data in it. That is
why the first loop prints 1 and 3 and leaves 2 behind. Why? Assign indexes
to the data. This may not be exactly how Python does it internally, but it
shows what happened.

[1, 2, 3]
 0  1  2

On the first pass through the loop, Python takes x from the first index, 0,
so x is 1. Then you remove 1 from the data set, so now it looks like this:

[2, 3]
 0  1

Now Python moves on in the loop and grabs the next index, which is 1.
However, since you've changed the list, that index now points to 3, not 2.
It grabs 3, prints it, then removes it. So now we're left with:

[2]
 0

The next index would be 2, but the list now has only one element, so the
loop ends. By iterating over a copy you preserve the indexing of the data
while you manipulate the original. Hope this clears it up. :)
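The index picture above can be written out as an explicit loop. `for_loop_remove_trace` is a hypothetical helper that models `for x in items: items.remove(x)` with a visible index. CPython's list iterator behaves in essentially the same way (it keeps an integer position and rechecks the list length at each step), but treat the function as a model rather than the interpreter's actual code.

```python
def for_loop_remove_trace(items):
    """Model `for x in items: items.remove(x)` with an explicit index.

    Returns the values the loop would see, in order; items is mutated.
    """
    seen = []
    index = 0
    while index < len(items):  # the loop stops once the index runs off the end
        x = items[index]
        seen.append(x)
        items.remove(x)        # later elements shift one position to the left
        index += 1             # ...but the index still moves forward
    return seen


real = [1, 2, 3]
print(for_loop_remove_trace(real))  # [1, 3]
print(real)                         # [2]
```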
--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
Jul 18 '05 #7
Hi Steve,

I had intuitively guessed something like that, but your help has been...
helpful! :)

Can I ask you one more thing?

It is surprising to me that in

for x in real[:]

dirs[:] creates a copy of dirs

while

dirs[:] = [] - empties the original list
and
dirs = [] - empties a copy of the original list

I understand (I think) the concept of slicing, but this is still
surprising to me.
It is as if, when I do

for x in real[:]

this is not using slicing,

while
dirs[:] = []
is using slicing.
Maybe I am just making a big mess in my mind.
It looks like assignments in Python and C++ are pretty different.
Cheers,
Marcello

Jul 18 '05 #8
On Fri, Feb 27, 2004 at 02:51:04PM -0500, Marcello Pietrobon wrote:
> dirs[:] creates a copy of dirs

This creates a new list which contains the same items as the list named
by dirs.

> dirs[:] = [] - empties the original list

This changes the items in the list named by dirs. It replaces (mutates)
the range named on the left-hand side of = with the items on the
right-hand side.

> and
> dirs = [] - empties a copy of the original list

This makes dirs name a different list than it did before, but the value
of the list that dirs named a moment ago is unchanged.

In the case of os.walk (or anywhere you do something by mutating an item
passed in) you have to change the items in a particular list ("mutate
the list"), not change which list a particular local name refers to.
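The same distinction applies to any function that receives a list. In this Python 3 sketch, `prune_rebind` and `prune_in_place` are hypothetical helpers and the names are made up: the first builds a new list and binds it to its local parameter, so the caller never sees it; the second uses slice assignment and therefore changes the caller's list, which is what `os.walk()` needs.

```python
def prune_rebind(names):
    # Builds a new list and rebinds the local parameter; the caller's list
    # is untouched.
    names = [n for n in names if n != "skip"]


def prune_in_place(names):
    # Slice assignment replaces the contents of the caller's list object.
    names[:] = [n for n in names if n != "skip"]


dirs = ["keep", "skip", "also_keep"]
prune_rebind(dirs)
print(dirs)  # ['keep', 'skip', 'also_keep']
prune_in_place(dirs)
print(dirs)  # ['keep', 'also_keep']
```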

Jeff

Jul 18 '05 #9
On 2004-02-27, Marcello Pietrobon <te*****@attglobal.net> wrote:
> Can I ask you one more thing?

Sure. However, I am a Python neophyte who happens to have a few years'
experience, so take everything I say with a large heaping of salt. :)

> It is surprising to me that in

Ah, took me a minute to see what you were saying.

> for x in real[:]
> dirs[:] creates a copy of dirs

Well, "creating a copy" is the shorthand. What both of these are doing is
"return the values of x from index y up to (but not including) index z."
Since y and z are not specified, you get the whole list (or string, or
tuple, or any other sliceable object).

> while dirs[:] = [] - empties the original list

This is "assign the given list to the range of x from y to z". A better way
to see it would be to do this:

>>> foo = [1, 2, 3, 4]
>>> foo
[1, 2, 3, 4]
>>> foo[1:2] = [3, 2, 5]
>>> foo
[1, 3, 2, 5, 3, 4]

Hmmm, ok, even I had to scratch my head at that, since I expected 1, 3, 2,
5, 4. But the end of a slice is exclusive: foo[1:2] covers only index 1
(the 2), so only that element is replaced and the 3 and 4 stay. Anyway, you
get the idea. :)
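A short Python 3 sketch of the slice-assignment rules used above: the end index is exclusive, the replacement may be longer or shorter than the slice it replaces, and `dirs[:] = []` empties the list object itself, so every name bound to it sees the change. The values are made up for illustration.

```python
foo = [1, 2, 3, 4]
print(foo[1:2])       # [2]: the end index is exclusive, so only index 1
foo[1:2] = [3, 2, 5]  # replace that one-element slice with three elements
print(foo)            # [1, 3, 2, 5, 3, 4]

bar = [1, 2, 3, 4]
bar[1:3] = [3, 2, 5]  # indexes 1 and 2 (the 2 and the 3) are replaced
print(bar)            # [1, 3, 2, 5, 4]

dirs = ["a", "b"]
same = dirs
dirs[:] = []          # the whole-list slice: empties the list in place
print(same)           # [] because both names refer to the one list
```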
> and
> dirs = [] - empties a copy of the original list

That's because here you're binding the name to a new object.

So in order...

for x in real[:] - iterate over the results of the slice of real from y to z.

foo = dirs[:] - bind foo to the results of the slice of dirs from y to z.

dirs[:] = [] - replace the area of dirs defined by the slice y to z with an
empty list.

dirs = [] - bind the name dirs to a new, empty list.

Where most people get hung up is the difference between strings, which are
immutable, and lists, dictionaries, etc., which are mutable. :)
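To illustrate the immutable/mutable distinction, a small Python 3 sketch with made-up values: slice assignment works on a list but raises `TypeError` on a string, so with strings you build a new object instead.

```python
s = "abc"
try:
    s[0:1] = "x"      # strings are immutable, so slice assignment fails
    assigned = True
except TypeError:
    assigned = False
print(assigned)       # False

t = "x" + s[1:]       # build a new string from slices instead
print(t, s)           # xbc abc

letters = list("abc")
letters[0:1] = ["x"]  # the same slice assignment works on a mutable list
print(letters)        # ['x', 'b', 'c']
```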
> I understand (I think) the concept of slicing, but this is still
> surprising to me.
> It is as if, when I do
>
> for x in real[:]
>
> this is not using slicing,

Yes, it is. Take foo from above...

>>> id(foo)
1075943980

foo points to object 1075943980.

>>> id(foo)
1075943980

foo still points to object 1075943980.

>>> id(foo[:])
1075943308

However, this is a different object, 1075943308.

So in the above example it is using a slice. real[:] returns a new list
(the slice), and it is that object which x iterates over. Just because that
slice doesn't have a name assigned to it doesn't mean it doesn't exist. :)

> while
> dirs[:] = []
> is using slicing.

Well, it is using it in a different manner. Above, you're using slicing to
tell Python what to return. Here you're using slicing to tell Python what
to replace.

> Maybe I am just making a big mess in my mind.
> It looks like assignments in Python and C++ are pretty different.


Never touched C++ so I cannot say. :)

Jul 18 '05 #10
