Bytes IT Community

Big speed boost in os.walk in Python 2.5

Hi,
I noticed a big speed improvement in some of my scripts that use os.walk,
and I wrote a small script to check it:

    import os
    for path, dirs, files in os.walk('D:\\FILES\\'):
        pass
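
The script above only walks the tree. To get a wall-clock figure from it, the loop could be wrapped in a timer along these lines. This is a sketch rather than the script that was actually run: `time_walk` is a made-up helper name, and `time.perf_counter` is a Python 3 function, so on Python 2.4 or 2.5 `time.time()` would have to stand in for it.

```python
import os
import time


def time_walk(root):
    """Walk `root` once; return (elapsed_seconds, n_dirs, n_files)."""
    n_dirs = 0
    n_files = 0
    start = time.perf_counter()
    for path, dirs, files in os.walk(root):
        # Counting the entries keeps the loop body cheap while still
        # reporting how much work the walk did.
        n_dirs += len(dirs)
        n_files += len(files)
    elapsed = time.perf_counter() - start
    return elapsed, n_dirs, n_files
```

Calling `time_walk('D:\\FILES\\')` several times and discarding the first result gives figures taken with a warm disk cache.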

Results on Windows XP after a few runs to fill the disk cache (with
~59000 files and ~3500 folders):
Python 2.4.3 : 45s
Python 2.5 : 10s

Very nice, but somewhat strange...
Is Python 2.4.3 os.walk buggy ???
Is this result only valid on Windows, or do *nix systems show the same
difference?
The profiler shows that most of the time is spent in ntpath.isdir, and
this function is *a lot* faster in Python 2.5.
Maybe this improvement could be backported to the Python 2.4 branch for
the next release?

Python 2.4.3:

         604295 function calls (587634 primitive calls) in 48.629 CPU seconds

   Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     62554    0.264    0.000    0.264    0.000 :0(append)
         1    0.001    0.001   48.593   48.593 :0(execfile)
     66074    0.197    0.000    0.197    0.000 :0(len)
      3521    5.219    0.001    5.219    0.001 :0(listdir)
         1    0.036    0.036    0.036    0.036 :0(setprofile)
     62554   38.812    0.001   38.812    0.001 :0(stat)
         1    0.000    0.000   48.593   48.593 <string>:1(?)
     66074    0.218    0.000    0.218    0.000 ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000 ntpath.py:246(islink)
     62554    0.767    0.000   40.137    0.001 ntpath.py:268(isdir)
     66074    0.433    0.000    0.650    0.000 ntpath.py:51(isabs)
     66074    0.880    0.000    1.726    0.000 ntpath.py:59(join)
20183/3522    1.217    0.000   48.573    0.014 os.py:211(walk)
         1    0.000    0.000   48.629   48.629 profile:0(execfile('test.py'))
         0    0.000             0.000          profile:0(profiler)
     62554    0.174    0.000    0.174    0.000 stat.py:29(S_IFMT)
     62554    0.385    0.000    0.559    0.000 stat.py:45(S_ISDIR)
         1    0.019    0.019   48.592   48.592 test.py:1(?)

Python 2.5:

         604295 function calls (587634 primitive calls) in 17.386 CPU seconds

   Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     62554    0.247    0.000    0.247    0.000 :0(append)
         1    0.001    0.001   17.315   17.315 :0(execfile)
     66074    0.168    0.000    0.168    0.000 :0(len)
      3521    5.287    0.002    5.287    0.002 :0(listdir)
         1    0.071    0.071    0.071    0.071 :0(setprofile)
     62554    7.812    0.000    7.812    0.000 :0(stat)
         1    0.000    0.000   17.315   17.315 <string>:1(<module>)
     66074    0.186    0.000    0.186    0.000 ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000 ntpath.py:245(islink)
     62554    0.712    0.000    9.013    0.000 ntpath.py:267(isdir)
     66074    0.394    0.000    0.581    0.000 ntpath.py:51(isabs)
     66074    0.815    0.000    1.564    0.000 ntpath.py:59(join)
20183/3522    1.176    0.000   17.296    0.005 os.py:218(walk)
         1    0.000    0.000   17.386   17.386 profile:0(execfile('test.py'))
         0    0.000             0.000          profile:0(profiler)
     62554    0.159    0.000    0.159    0.000 stat.py:29(S_IFMT)
     62554    0.331    0.000    0.489    0.000 stat.py:45(S_ISDIR)
         1    0.018    0.018   17.314   17.314 test.py:1(<module>)
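
For anyone wanting to repeat the comparison, a report of the same shape can be produced roughly as follows. The reports above come from the pure-Python `profile` module (the `profile:0(...)` rows show that); this sketch swaps in `cProfile`, which has the same report format and lower overhead but only exists from Python 2.5 onwards. `profile_walk` is a made-up helper name, and the code as written targets Python 3.

```python
import cProfile
import io
import os
import pstats


def profile_walk(root, limit=15):
    """Profile one full os.walk over `root`; return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for path, dirs, files in os.walk(root):
        pass
    profiler.disable()

    # Render the statistics into a string instead of printing to stdout.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

On Python 3.5 and later, os.walk is built on os.scandir, so the report will not contain the per-entry isdir and stat rows seen in the 2.4 and 2.5 output.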

Oct 13 '06 #1
3 Replies


looping wrote:
> Results on Windows XP after a few runs to fill the disk cache (with
> ~59000 files and ~3500 folders):
> Python 2.4.3 : 45s
> Python 2.5 : 10s
>
> Very nice, but somewhat strange...
> Is Python 2.4.3 os.walk buggy ???

No. A few "os" functions are now implemented in terms of Windows APIs,
instead of using Microsoft C's POSIX compatibility layer. This includes
os.stat(), which is what isdir() uses to check if something is a
directory. The code was rewritten to work around problems with
timestamps, so the speedup is purely a side effect.
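
The link between isdir() and os.stat() described here can be sketched as follows. This is an illustrative stand-in and not the standard library's source; `isdir_via_stat` is a made-up name.

```python
import os
import stat


def isdir_via_stat(path):
    """Return True if `path` is an existing directory, using one os.stat call."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing or inaccessible path: treat it as "not a directory".
        return False
    # S_ISDIR inspects the file-type bits of st_mode.
    return stat.S_ISDIR(st.st_mode)
```

Since the check costs one os.stat() call per entry, any change in the speed of os.stat() is multiplied by the number of entries os.walk examines, which is consistent with the 62554 stat calls in both profiles.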

> Is this result only valid on Windows, or do *nix systems show the same
> difference?

On Unix systems, Python uses POSIX APIs, not Windows APIs.

> The profiler shows that most of the time is spent in ntpath.isdir, and
> this function is *a lot* faster in Python 2.5.

Why are you asking if something's buggy when you've already figured out
what's been improved?

> Maybe this improvement could be backported to the Python 2.4 branch for
> the next release?

It's not really broken, so that's not very likely.

</F>

Oct 13 '06 #2

Fredrik Lundh wrote:
> looping wrote:
>> Very nice, but somewhat strange...
>> Is Python 2.4.3 os.walk buggy ???
>
> Why are you asking if something's buggy when you've already figured out
> what's been improved?

You're right, buggy isn't the right word...

Anyway, thanks for the detailed information. I'm very pleased with
the performance improvement, even if it's only a side effect and only on
Windows.

Oct 13 '06 #3

looping wrote:
> Maybe this improvement could be backported to the Python 2.4 branch for
> the next release?

As Fredrik explains, this is probably the side-effect of a from-scratch
rewrite of the relevant functions. Another (undesirable) side-effect is
that the resulting binary won't work on Windows 95 anymore. So
backporting it as-is is out of the question.

However, even if the patch were improved to still work on W9x, and to not
introduce the other behavioral changes that came with the rewrite, it
still couldn't go into 2.4.x. Likely, 2.4.4 is the final 2.4 release,
and the release candidate for that was already produced.

Regards,
Martin
Oct 13 '06 #4
