473,383 Members | 1,922 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,383 software developers and data experts.

Is there an alternative to os.walk?

Hi all,
I have a question about traversing file systems, and could use some
help. Because of directories with many files in them, os.walk appears
to be rather slow. I`m thinking there is a potential for speed-up since
I don`t need os.walk to report filenames of all the files in every
directory it visits. Is there some clever way to use os.walk or another
tool that would provide functionality like os.walk except for the
listing of the filenames?

Oct 4 '06 #1
6 5873
Bruce wrote:
Hi all,
I have a question about traversing file systems, and could use some
help. Because of directories with many files in them, os.walk appears
to be rather slow.
Provide more info/code. I suspect it is not os.walk itself that is slow,
but rather the code that processes its result...
I`m thinking there is a potential for speed-up since
I don`t need os.walk to report filenames of all the files in every
directory it visits. Is there some clever way to use os.walk or another
tool that would provide functionality like os.walk except for the
listing of the filenames?
You may want to take a look at os.path.walk then.

--Irmen
Oct 4 '06 #2
Bruce wrote:
Hi all,
I have a question about traversing file systems, and could use some
help. Because of directories with many files in them, os.walk appears
to be rather slow. I`m thinking there is a potential for speed-up since
I don`t need os.walk to report filenames of all the files in every
directory it visits. Is there some clever way to use os.walk or another
tool that would provide functionality like os.walk except for the
listing of the filenames?
You might want to check out the path module [1] (not os.path). The
following is from the docs:
The method path.walk() returns an iterator which steps recursively
through a whole directory tree. path.walkdirs() and path.walkfiles()
are the same, but they yield only the directories and only the files,
respectively.
Oh, and you can thank Paul Bissex for pointing me to path [2].

[1]: http://www.jorendorff.com/articles/python/path/
[2]: http://e-scribe.com/news/289

Oct 4 '06 #3
waylan wrote:
Bruce wrote:
Hi all,
I have a question about traversing file systems, and could use some
help. Because of directories with many files in them, os.walk appears
to be rather slow. I`m thinking there is a potential for speed-up since
I don`t need os.walk to report filenames of all the files in every
directory it visits. Is there some clever way to use os.walk or another
tool that would provide functionality like os.walk except for the
listing of the filenames?

You might want to check out the path module [1] (not os.path). The
following is from the docs:
The method path.walk() returns an iterator which steps recursively
through a whole directory tree. path.walkdirs() and path.walkfiles()
are the same, but they yield only the directories and only the files,
respectively.

Oh, and you can thank Paul Bissex for pointing me to path [2].
[1]: http://www.jorendorff.com/articles/python/path/
[2]: http://e-scribe.com/news/289
A little late but.. thanks for the replies, was very useful. Here`s
what I do in this case:

def search(a_dir):
valid_dirs = []
walker = os.walk(a_dir)
while 1:
try:
dirpath, dirnames, filenames = walker.next()
except StopIteration:
break
if dirtest(dirpath,filenames):
valid_dirs.append(dirpath)
return valid_dirs

def dirtest(a_dir):
testfiles = ['a','b','c']
for f in testfiles:
if not os.path.exists(os.path.join(a_dir,f)):
return 0
return 1

I think you`re right - it`s not os.walk that makes this slow, it`s the
dirtest method that takes so much more time when there are many files
in a directory. Also, thanks for pointing me to the path module, was
interesting.

Oct 7 '06 #4
"Bruce" <ep****@gmail.comwrote:
>
A little late but.. thanks for the replies, was very useful. Here`s
what I do in this case:

def search(a_dir):
valid_dirs = []
walker = os.walk(a_dir)
while 1:
try:
dirpath, dirnames, filenames = walker.next()
except StopIteration:
break
if dirtest(dirpath,filenames):
valid_dirs.append(dirpath)
return valid_dirs

def dirtest(a_dir):
testfiles = ['a','b','c']
for f in testfiles:
if not os.path.exists(os.path.join(a_dir,f)):
return 0
return 1

I think you`re right - it`s not os.walk that makes this slow, it`s the
dirtest method that takes so much more time when there are many files
in a directory. Also, thanks for pointing me to the path module, was
interesting.
Umm, may I point out that you don't NEED the "os.path.exists" call, because
you are already being HANDED a list of all the filenames in that directory?
You could "dirtest" with this much faster routinee:

def dirtest(a_dir,filenames):
for f in ['a','b','c']:
if not f in filenames:
return 0
return 1
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Oct 8 '06 #5
On 10/8/06, Tim Roberts <ti**@probo.comwrote:
Umm, may I point out that you don't NEED the "os.path.exists" call, because
you are already being HANDED a list of all the filenames in that directory?
You could "dirtest" with this much faster routinee:

def dirtest(a_dir,filenames):
for f in ['a','b','c']:
if not f in filenames:
return 0
return 1
Or False / True for sufficiently new versions of Python. :)

-- Theerasak
Oct 8 '06 #6
Ant
The idiomatic way of doing the tree traversal is:

def search(a_dir):
valid_dirs = []
for dirpath, dirnames, filenames in os.walk(a_dir):
if dirtest(filenames):
valid_dirs.append(dirpath)
return valid_dirs

Also since you are given a list of filenames in the directory, then why
not just check the list of those files for your test files:

def dirtest(filenames):
testfiles = ['a','b','c']
for f in testfiles:
if not f in filenames:
return False
return False

You'd have to test this to see if it made a difference in performance,
but it makes for more readable code

Oct 8 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: hokieghal99 | last post by:
This script is not recursive... in order to make it recursive, I have to call it several times (my kludge... hey, it works). I thought os.walk's sole purpose was to recursively walk a directory...
9
by: Marcello Pietrobon | last post by:
Hello, I am using Pyton 2.3 I desire to walk a directory without recursion this only partly works: def walk_files() : for root, dirs, files in os.walk(top, topdown=True): for filename in...
6
by: rbt | last post by:
More of an OS question than a Python question, but it is Python related so here goes: When I do os.walk('/') on a Linux computer, the entire file system is walked. On windows, however, I can...
7
by: KraftDiner | last post by:
The os.walk function walks the operating systems directory tree. This seems to work, but I don't quite understand the tupple that is returned... Can someone explain please? for root, dirs,...
9
by: silverburgh.meryl | last post by:
i am trying to use python to walk thru each subdirectory from a top directory. Here is my script: savedPagesDirectory = "/home/meryl/saved_pages/data" dir=open(savedPagesDirectory, 'r') ...
2
by: gregpinero | last post by:
In the example from help(os.walk) it lists this: from os.path import join, getsize for root, dirs, files in walk('python/Lib/email'): print root, "consumes", print sum(), print "bytes in",...
2
by: Martin Marcher | last post by:
Hello, I'm playing around with os.walk and I made up del_tree(path) which I think is correct (in terms of the algorithm, but not as python wants it :)). As soon as some directory is deleted...
0
by: Jeff McNeil | last post by:
Your args are fine, that's just the way os.path.walk works. If you just need the absolute pathname of a directory when given a relative path, you can always use os.path.abspath, too. A couple...
4
by: Jeff Nyman | last post by:
Greetings all. I did some searching on this but I can't seem to find a specific solution. I have code like this: ========================================= def walker1(arg, dirname, names):...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.