
Speed ain't bad


One of the posters inspired me to do profiling on my newbie script
(pasted below). After measurements I have found that the speed
of Python, at least in the area where my script works, is surprisingly
high.

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores compressed versions of the files from the
source hierarchy there (the script makes additional backups of the
file server's disk at the company where I work onto other disks,
compressing them to save space). The data was:

468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)

The time that WinRAR v3.20 (with ZIP format and normal compression
set) needed to compress all that was 119 seconds.

The Python script time (running under profiler) was, drumroll...

198 seconds.

Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so
it actually had more work to do than WinRAR did. The size of the
compressed data was basically the same, about 207 MB.

I find it very encouraging that in a real-world application area
a newbie script written in a very high-level language can have
performance not that far from that of a "shrinkwrap" pro archiver
(WinRAR is an excellent archiver, both in compression and in speed).
I do realize that this is mainly the result of all the "underlying
infrastructure" of Python. Great work, guys. Congrats.

The only thing I'm missing in this picture is knowing whether my script
could be further optimised (not that I actually need better
performance, I'm just curious what possible solutions could be).

Any takers among the experienced guys?

Profiling results:
p3.sort_stats('cumulative').print_stats(40)

Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.013    0.013  198.124  198.124 profile:0(z3())
         1    0.000    0.000  198.110  198.110 <string>:1(?)
         1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
         1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
     15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
     15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
     15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
     32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890    2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
     30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
     60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
     45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
     15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
     33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
    109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
     15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
     33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
     15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
     32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
     15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
     15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
       151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
       151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
       151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
        76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
     15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
     15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
     32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
        76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
        76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
        76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
        76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
        76    0.079    0.001    0.081    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
        76    0.018    0.000    0.020    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
        76    0.006    0.000    0.018    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
       227    0.003    0.000    0.012    0.000 C:\Python23\lib\Queue.py:172(get_nowait)
        76    0.007    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
       532    0.011    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
        76    0.001    0.000    0.010    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
      1888    0.009    0.000    0.009    0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
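For readers who want to reproduce this kind of report on a current Python, here is a minimal sketch using the standard cProfile and pstats modules. busy_work() is a hypothetical stand-in for the poster's z3() entry point, which is not shown in the post; profile_top() is likewise a hypothetical helper. It produces the same cumulative-time listing as the p3.sort_stats('cumulative').print_stats(40) call.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the profiled entry point (z3() in the post)."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def profile_top(func, *args, limit=40):
    """Run func(*args) under cProfile; return (result, report text).

    The report is sorted by cumulative time, like
    p3.sort_stats('cumulative').print_stats(40) in the post.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('cumulative').print_stats(limit)
    return result, buf.getvalue()


if __name__ == '__main__':
    value, report = profile_top(busy_work, 10000, limit=10)
    print(report)
```

Sorting by cumulative time is what makes the zipfile write() line stand out here: it shows where the 198 seconds went, including time spent in callees.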
---
Script:

#!/usr/bin/python

import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED

def zf(sfpath, targetdir):
    if (sys.platform[:3] == 'win'):
        tgfpath=sfpath[2:]
    else:
        tgfpath=sfpath
    zfdir=os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfpath=zfdir + os.path.sep + os.path.basename(tgfpath) + '.zip'
    if(not os.path.isdir(zfdir)):
        os.makedirs(zfdir)
    archive=ZipFile(zfpath, 'w', ZIP_DEFLATED)
    sfile=open(sfpath,'rb')
    zfname=os.path.basename(tgfpath)
    archive.write(sfpath, os.path.basename(zfpath), ZIP_DEFLATED)
    archive.close()
    ssize=os.stat(sfpath).st_size
    zsize=os.stat(zfpath).st_size
    return (ssize,zsize)

def zmtree(sdir,tdir):
    n=0
    ssize=0
    zsize=0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for file in files:
            res=zf(os.path.join(root,file),tdir)
            ssize+=res[0]
            zsize+=res[1]
            n=n+1
            #sys.stdout.write('.')
            if (n % 200 == 0):
                print " %.2fM (%.2fM)" % (ssize/1048576.0,
                                          zsize/1048576.0)
                #sys.stdout.write(' ')
    return (n, ssize, zsize)

if __name__=="__main__":
    if len(sys.argv) == 3:
        if(os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2])):
            (n,ssize,zsize)=zmtree(os.path.abspath(sys.argv[1]),os.path.abspath(sys.argv[2]))
            print "\n\n Summary:\n Number of files compressed: %d\n \
Total size of original files: %.2fM\n \
Total size of compressed files: %.2fM" % (n, ssize/1048576.0,
                                          zsize/1048576.0)
            sys.exit(0)
        else:
            print "Incorrect arguments."
            if (not os.path.isdir(sys.argv[1])): print sys.argv[1] + " is not directory."
            if (not os.path.isdir(sys.argv[2])): print sys.argv[2] + " is not directory."

    print "\n Usage:\n " + sys.argv[0] + " source-directory target-directory"
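One optimisation hinted at by the profile: isdir() is called 32002 times, and 15057 of those calls come from the per-file check in zf(), although there are only 1568 folders. The sketch below (modern Python 3, not the poster's code) creates each target directory once per directory visited by os.walk, using os.makedirs(..., exist_ok=True) (Python 3.2+), and closes every archive with a with-block. The name zip_tree is hypothetical; naming the archive member after the source file is an assumption (the original passes the .zip name as arcname and leaves zfname unused). Unlike the original, it also mirrors empty folders.

```python
import os
import zipfile


def zip_tree(sdir, tdir):
    """Mirror sdir under tdir, compressing each file into its own .zip.

    Returns (file_count, original_bytes, compressed_bytes), like zmtree().
    """
    n = ssize = zsize = 0
    for root, dirs, files in os.walk(sdir):
        # Create the target directory once per source directory,
        # instead of calling isdir() once per file as zf() does.
        rel = os.path.relpath(root, sdir)
        target_root = os.path.normpath(os.path.join(tdir, rel))
        os.makedirs(target_root, exist_ok=True)   # Python 3.2+
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name + '.zip')
            # The with-block closes the archive even if write() raises.
            with zipfile.ZipFile(dst, 'w', zipfile.ZIP_DEFLATED) as archive:
                # Assumption: the member is named after the source file.
                archive.write(src, arcname=name)
            ssize += os.path.getsize(src)
            zsize += os.path.getsize(dst)
            n += 1
    return n, ssize, zsize
```

Computing the target path from os.path.relpath() also avoids the sfpath[2:] drive-letter trick, since the source root is stripped rather than the first two characters.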

--
It's a man's life in a Python Programming Association.
Jul 18 '05
"Bulba!" <bu***@bulba.com> wrote:

> One of the posters inspired me to do profiling on my newbie script
> (pasted below). After measurements I have found that the speed
> of Python, at least in the area where my script works, is surprisingly
> high.

Pretty good code for someone who calls himself a newbie.

One line that puzzles me: sfile=open(sfpath,'rb')
You never use sfile again.
In any case, you should explicitly close every file you open, even
if an exception occurs:

sfile = open(sfpath, 'rb')
try:
    pass  # <stuff to do with the file open>
finally:
    sfile.close()
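Since Python 2.5 (later than the 2.3 used in this thread), the with statement packages this same try/finally pattern. A small sketch; read_header() is a hypothetical helper that also reports whether the file was closed, to make the guarantee visible:

```python
def read_header(path, n=4):
    """Return the first n bytes of path and whether the file got closed."""
    with open(path, 'rb') as sfile:
        data = sfile.read(n)
    # Leaving the with-block has already called sfile.close(), just as
    # the finally clause would; this also happens if read() raises.
    return data, sfile.closed
```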

> The only thing I'm missing in this picture is knowing whether my script
> could be further optimised (not that I actually need better
> performance, I'm just curious what possible solutions could be).
>
> Any takers among the experienced guys?


Basically, the way to optimise these things is to cut down on anything
that does I/O: use as few calls to os.path.is{dir,file}, os.stat, open
and the like as you can get away with.

One way to do that is caching; e.g. storing names of known directories
in a set (sets.Set()) and checking that set before calling
os.path.isdir. I haven't spotted any obvious opportunities for that
in your script, though.
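A minimal sketch of that caching idea, written for modern Python with the built-in set type (sets.Set() was the Python 2.3 spelling). DirCache and its isdir_calls counter are hypothetical, the counter existing only to make the saved checks visible. Whether it pays off depends on how often the same directory is checked; zf() checks zfdir once per file, so files sharing a folder would hit the cache.

```python
import os


class DirCache(object):
    """Remember directories already known to exist (hypothetical helper)."""

    def __init__(self):
        self.known = set()       # built-in set; sets.Set() in Python 2.3
        self.isdir_calls = 0     # counts real os.path.isdir() checks

    def ensure(self, path):
        path = os.path.abspath(path)
        if path in self.known:
            return               # cache hit: no isdir() call needed
        self.isdir_calls += 1
        if not os.path.isdir(path):
            os.makedirs(path)
        self.known.add(path)
```

The cache is only valid while nothing else deletes those directories during the run, which is a reasonable assumption for a one-shot backup script.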

Another way is the strategy of "it's easier to ask forgiveness than to
ask permission".
If you replace:

if(not os.path.isdir(zfdir)):
    os.makedirs(zfdir)

with:

try:
    os.makedirs(zfdir)
except EnvironmentError:
    pass

then not only will your script become a micron more robust, but
assuming zfdir typically does not exist, you will have saved the call
to os.path.isdir.

- Anders
Jul 18 '05 #11
On Sat, 1 Jan 2005 14:20:06 +0100, "Anders J. Munch"
<an******@inbound.dk> wrote:

>> One of the posters inspired me to do profiling on my newbie script
>> (pasted below). After measurements I have found that the speed
>> of Python, at least in the area where my script works, is surprisingly
>> high.
>
> Pretty good code for someone who calls himself a newbie.

<blush>

> One line that puzzles me:
> sfile=open(sfpath,'rb')
> You never use sfile again.

Right! It's a leftover from a previous implementation (that
used bzip2). Forgot to delete it, thanks.

> Another way is the strategy of "it's easier to ask forgiveness than to
> ask permission".
> If you replace:
> if(not os.path.isdir(zfdir)):
>     os.makedirs(zfdir)
> with:
> try:
>     os.makedirs(zfdir)
> except EnvironmentError:
>     pass
>
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.


Yes, this is the kind of habit that low-level languages like C, which
lack features such as exceptions, ingrain in the mind of a programmer...

Getting out of this straitjacket is kind of hard - it would not cross
my mind to try something like what you showed me, thanks!

Exceptions in Python are a GODSEND. I strongly recommend that any
former C programmer wanting to get out of the straitjacket read the
following, to get an idea of how not to write C code in Python and
instead exploit the better side of a VHLL:

http://gnosis.cx/TPiP/appendix_a.txt


Jul 18 '05 #12
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than to
> ask permission".
> If you replace:
> if(not os.path.isdir(zfdir)):
>     os.makedirs(zfdir)
> with:
> try:
>     os.makedirs(zfdir)
> except EnvironmentError:
>     pass
>
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.


... at the cost of an exception frame setup and an incomplete call to
os.makedirs(). It's an open question whether the exception setup and
recovery take less time than the call to isdir(), though I'd expect
not. The exception route definitely makes more sense if the
makedirs() call is likely to succeed; if it's likely to fail, then
things are murkier.

Since isdir() *is* a disk I/O operation, the exception route is
probably preferable in this case anyhow. Either way, one must touch
the disk; in the exception case there will only ever be one disk
access (which either succeeds or fails), while in the other case
there may be two. However, if it weren't for the extra disk I/O
operation, the 'if ...' might be slightly faster, even though the
exception-based route is more Pythonic.
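The pure control-flow part of this question can be measured without any disk I/O. The sketch below uses an in-memory dict lookup as a hypothetical stand-in for the isdir()/makedirs() pair, so it times only look-before-you-leap versus try/except, with the try/except variant both succeeding and failing. SETUP, the snippet names and compare() are illustrative; absolute numbers depend on the machine and Python version, so no results are quoted here.

```python
import timeit

SETUP = "d = {'present': 1}"

# Look before you leap: test first, then act.
LBYL = "if 'absent' in d:\n    x = d['absent']\nelse:\n    x = None"

# EAFP, failing case: pays for raising and catching KeyError.
EAFP_FAIL = "try:\n    x = d['absent']\nexcept KeyError:\n    x = None"

# EAFP, succeeding case: only the try-block setup is paid.
EAFP_OK = "try:\n    x = d['present']\nexcept KeyError:\n    x = None"


def compare(number=200000):
    """Return the best of three timings (seconds) for each variant."""
    variants = [('lbyl', LBYL), ('eafp_fail', EAFP_FAIL), ('eafp_ok', EAFP_OK)]
    return dict(
        (name, min(timeit.repeat(stmt, SETUP, number=number, repeat=3)))
        for name, stmt in variants)


if __name__ == '__main__':
    for name, secs in sorted(compare().items()):
        print('%-10s %.4f s' % (name, secs))
```

Taking the minimum of several repeats is the usual timeit convention, since larger values mostly reflect interference from other processes.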

Jeff Shannon
Technician/Programmer
Credit International

Jul 18 '05 #13
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than to ask permission".
> If you replace:
> if(not os.path.isdir(zfdir)):
>     os.makedirs(zfdir)
> with:
> try:
>     os.makedirs(zfdir)
> except EnvironmentError:
>     pass
>
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.


1. Robustness: Both versions will "crash" (in the sense of an unhandled
exception) in the situation where zfdir exists but is not a directory.
The revised version just crashes later than the OP's version :-(
Trapping EnvironmentError seems not very useful -- the result will not
distinguish (on Windows 2000 at least) between the 'existing dir' and
'existing non-directory' cases.
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
>>> import os, os.path
>>> os.path.exists('fubar_not_dir')
True
>>> os.path.isdir('fubar_not_dir')
False
>>> os.makedirs('fubar_not_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\Python24\lib\os.py", line 159, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'fubar_not_dir'
>>> try:
...     os.mkdir('fubar_not_dir')
... except EnvironmentError:
...     print 'trapped env err'
...
trapped env err
>>> os.mkdir('fubar_is_dir')
>>> os.mkdir('fubar_is_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 17] File exists: 'fubar_is_dir'
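One way to address this robustness point while keeping the "ask forgiveness" style is to try the creation first and, only when it fails, check whether what already exists is actually a directory. A sketch; ensure_dir is a hypothetical name. On Python 3.2+, os.makedirs(path, exist_ok=True) behaves the same way, still raising when the path exists as a non-directory.

```python
import errno
import os


def ensure_dir(path):
    """Make sure path exists as a directory, creating parents as needed.

    Try first (EAFP); only on failure look at what is already there.
    Raises OSError if path exists but is not a directory, or if the
    creation failed for any other reason.
    """
    try:
        os.makedirs(path)
    except OSError as e:
        # Harmless only if the entry already exists AND is a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

In the common "directory does not exist yet" case this costs a single makedirs() call, and there is no window between a check and the creation for another process to slip into.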


2. Efficiency: I don't see the disk I/O inefficiency in calling
os.path.isdir() before os.makedirs() -- if the relevant part of the
filesystem wasn't already in memory, the isdir() call would make it so,
and makedirs() would get a free ride, yes/no?

Jul 18 '05 #14
"John Machin" <sj******@lexicon.net> wrote:

> 1. Robustness: Both versions will "crash" (in the sense of an unhandled
> 2. Efficiency: I don't see the disk I/O inefficiency in calling

3. Don't itemise perceived flaws in other people's postings. It may
give off a hostile impression.

> 1. Robustness: Both versions will "crash" (in the sense of an unhandled
> exception) in the situation where zfdir exists but is not a directory.
> The revised version just crashes later than the OP's version :-(
> Trapping EnvironmentError seems not very useful -- the result will not
> distinguish (on Windows 2000 at least) between the 'existing dir' and
> 'existing non-directory' cases.
Good point; my version has room for improvement. But at least it fixes
the race condition between isdir and makedirs.

What I like about EnvironmentError is that it's easier to use than
figuring out which one of IOError or OSError applies (and whether that
can be relied on, cross-platform).
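A small demonstration of that point: in Python 2, EnvironmentError is the common base class of IOError and OSError, so one except clause covers failures from both open() and the os functions. Since Python 3.3 (PEP 3151), EnvironmentError and IOError are simply aliases of OSError, so the distinction disappears. try_both() is a hypothetical helper.

```python
import os


def try_both(missing_path):
    """Catch an open() failure and an os-module failure with one clause."""
    caught = []
    try:
        open(missing_path)            # IOError on Python 2
    except EnvironmentError as e:
        caught.append(type(e).__name__)
    try:
        os.rmdir(missing_path)        # OSError on Python 2
    except EnvironmentError as e:
        caught.append(type(e).__name__)
    return caught
```

On Python 3, both failures for a missing path are reported as FileNotFoundError, a subclass of OSError.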
> 2. Efficiency: I don't see the disk I/O inefficiency in calling
> os.path.isdir() before os.makedirs() -- if the relevant part of the
> filesystem wasn't already in memory, the isdir() call would make it
> so, and makedirs() would get a free ride, yes/no?


Perhaps. Looking stuff up in operating system tables and buffers takes
time too. And then there's network latency; how much local caching do
you get for an NFS mount or SMB share?

If you really want to know, measure.
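In that spirit, a minimal timing sketch comparing the two variants on a real filesystem, for both the already-exists case and the must-create case. lbyl(), eafp() and measure() are hypothetical names, and the directories live under a temporary folder that is removed afterwards. Run it on the disk or network share you actually care about, since caching behaviour differs between local disks, SMB shares and NFS mounts.

```python
import os
import shutil
import tempfile
import timeit


def lbyl(path):
    """Ask permission: check first, create only if missing."""
    if not os.path.isdir(path):
        os.makedirs(path)


def eafp(path):
    """Ask forgiveness: just try (OSError is EnvironmentError on Python 3)."""
    try:
        os.makedirs(path)
    except OSError:
        pass


def measure(func, existing, number=2000):
    """Seconds for `number` calls of func, on one existing dir or fresh paths."""
    base = tempfile.mkdtemp()
    try:
        if existing:
            target = os.path.join(base, 'd')
            os.mkdir(target)
            return timeit.timeit(lambda: func(target), number=number)
        names = iter(range(number))
        return timeit.timeit(
            lambda: func(os.path.join(base, 'd%d' % next(names))),
            number=number)
    finally:
        shutil.rmtree(base)


if __name__ == '__main__':
    for func in (lbyl, eafp):
        for existing in (True, False):
            print('%-5s existing=%-5s %.4f s'
                  % (func.__name__, existing, measure(func, existing)))
```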

- Anders
Jul 18 '05 #15
