
Speed ain't bad


One of the posters inspired me to do profiling on my newbie script
(pasted below). After taking measurements I found that the speed
of Python, at least in the area where my script works, is surprisingly
high.

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores the compressed versions of the files from
the source hierarchy there (the script does additional backups of the
disk of a file server at the company where I work onto other disks,
with compression for the sake of saving space). The data was:

468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)

The time that WinRAR v3.20 (with ZIP format and normal compression
set) needed to compress all that was 119 seconds.

The Python script time (running under profiler) was, drumroll...

198 seconds.

Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so it
actually had more work to do than WinRAR did. The size of the
compressed data was basically the same, about 207 MB.

I find it very encouraging that in a real-world area of application
a newbie script written in a very high-level language can have
performance not that far from the performance of a "shrinkwrapped"
pro archiver (WinRAR is an excellent archiver, both when it comes to
compression and to speed). I do realize that this is mainly
the result of all the "underlying infrastructure" of Python. Great
work, guys. Congrats.

The only thing I'm missing in this picture is whether my script
could be optimised further (not that I actually need better
performance; I'm just curious what the possible solutions might be).

Any takers among the experienced guys?

Profiling results:
p3.sort_stats('cumulative').print_stats(40)

Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013  198.124  198.124 profile:0(z3())
        1    0.000    0.000  198.110  198.110 <string>:1(?)
        1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
        1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
    15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
    15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
    15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
    32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890   2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
    30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
    60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
    45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
    15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
    33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
   109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
    15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
    33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
    15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
    32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
    15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
    15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
      151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
      151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
      151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
       76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
    15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
    15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
    32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
       76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
       76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
       76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
       76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
       76    0.079    0.001    0.081    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
       76    0.018    0.000    0.020    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
       76    0.006    0.000    0.018    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
      227    0.003    0.000    0.012    0.000 C:\Python23\lib\Queue.py:172(get_nowait)
       76    0.007    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
      532    0.011    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
       76    0.001    0.000    0.010    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
     1888    0.009    0.000    0.009    0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
---
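(For reference, a listing like the one above comes from the standard
library profiler. A minimal way to reproduce it, assuming a small z3()
wrapper around the zmtree() call -- the z3 name and the p3.tmp filename
appear in the session above; src and dst are placeholders:)

import profile
import pstats

def z3():
    zmtree(src, dst)    # src/dst: placeholder source and target dirs

profile.run('z3()', 'p3.tmp')    # run under the profiler, save raw stats
p3 = pstats.Stats('p3.tmp')
p3.sort_stats('cumulative').print_stats(40)
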
Script:

#!/usr/bin/python

import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED

def zf(sfpath, targetdir):
    # Map the source path into the target tree (dropping the drive
    # letter, e.g. "C:", on Windows).
    if sys.platform[:3] == 'win':
        tgfpath = sfpath[2:]
    else:
        tgfpath = sfpath
    zfdir = os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfpath = zfdir + os.path.sep + os.path.basename(tgfpath) + '.zip'
    if not os.path.isdir(zfdir):
        os.makedirs(zfdir)
    # Compress the single file into its own .zip archive.
    archive = ZipFile(zfpath, 'w', ZIP_DEFLATED)
    sfile = open(sfpath, 'rb')            # note: never used or closed
    zfname = os.path.basename(tgfpath)    # note: never used
    archive.write(sfpath, os.path.basename(zfpath), ZIP_DEFLATED)
    archive.close()
    ssize = os.stat(sfpath).st_size
    zsize = os.stat(zfpath).st_size
    return (ssize, zsize)

def zmtree(sdir, tdir):
    # Walk the source tree and zip every file into the target tree,
    # printing a running total every 200 files.
    n = 0
    ssize = 0
    zsize = 0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for file in files:
            res = zf(os.path.join(root, file), tdir)
            ssize += res[0]
            zsize += res[1]
            n = n + 1
            #sys.stdout.write('.')
            if n % 200 == 0:
                print " %.2fM (%.2fM)" % (ssize/1048576.0, zsize/1048576.0)
                #sys.stdout.write(' ')
    return (n, ssize, zsize)

if __name__ == "__main__":
    if len(sys.argv) == 3:
        if os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2]):
            (n, ssize, zsize) = zmtree(os.path.abspath(sys.argv[1]),
                                       os.path.abspath(sys.argv[2]))
            print "\n\n Summary:\n Number of files compressed: %d\n \
Total size of original files: %.2fM\n \
Total size of compressed files: %.2fM" % (n, ssize/1048576.0, zsize/1048576.0)
            sys.exit(0)
        else:
            print "Incorrect arguments."
            if not os.path.isdir(sys.argv[1]):
                print sys.argv[1] + " is not a directory."
            if not os.path.isdir(sys.argv[2]):
                print sys.argv[2] + " is not a directory."

    print "\n Usage:\n " + sys.argv[0] + " source-directory target-directory"

--
It's a man's life in a Python Programming Association.
Jul 18 '05 #1
On Fri, 31 Dec 2004 01:41:13 +0100, Bulba! wrote:

One of the posters inspired me to do profiling on my newbie script (pasted
below). After taking measurements I found that the speed of Python, at least
in the area where my script works, is surprisingly high.

This is the experiment: a script recreates the folder hierarchy somewhere
else and stores the compressed versions of the files from the source
hierarchy there (the script does additional backups of the disk of a file
server at the company where I work onto other disks, with compression for
the sake of saving space). The data was:


I did not study your script, but odds are it is strongly disk-bound.

This means that the disk access time is so large that it completely swamps
almost everything else.

I would point out a couple of other ideas, though you may be aware of
them: compressing all the files separately, if they are small, may greatly
reduce the final compression, since similarities between the files cannot
be exploited. You may not care. Also, the "zip" format can be updated on a
file-by-file basis; it may do all by itself what you are trying to do,
with just a single command line. Just a thought.
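
(A minimal sketch of that single-archive idea using the stdlib zipfile
module -- append_tree and its arguments are made-up names, and this is
untested against the poster's setup:)

import os
from zipfile import ZipFile, ZIP_DEFLATED

def append_tree(sdir, zippath):
    # One archive for the whole tree, grown file by file;
    # mode 'a' adds entries to an existing zip or creates a new one.
    archive = ZipFile(zippath, 'a', ZIP_DEFLATED)
    for root, dirs, files in os.walk(sdir):
        for name in files:
            path = os.path.join(root, name)
            # store relative to sdir to keep the hierarchy
            # (assumes sdir has no trailing separator)
            archive.write(path, path[len(sdir) + 1:])
    archive.close()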
Jul 18 '05 #2
On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:
I would point out a couple of other ideas, though you may be aware of
them: compressing all the files separately, if they are small, may greatly
reduce the final compression, since similarities between the files cannot
be exploited.


True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files that come after the damaged one. This cannot be
guaranteed when the archive is compressed as a single stream.

--
Craig Ringer

Jul 18 '05 #3
Craig Ringer wrote:
On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:
I would point out a couple of other ideas, though you may be aware of
them: compressing all the files separately, if they are small, may greatly
reduce the final compression, since similarities between the files cannot
be exploited.


True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files that come after the damaged one. This cannot be
guaranteed when the archive is compressed as a single stream.


With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.

regards,
Reinhold
Jul 18 '05 #4
On Fri, 31 Dec 2004 13:19:44 +0100, Reinhold Birkenfeld
<re************************@wolke7.net> wrote:
True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files that come after the damaged one. This cannot be
guaranteed when the archive is compressed as a single stream.
With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.


I actually wrote a version of that script with bzip2, but
it was so horribly slow that I chose the zip version.
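
(For the record, a per-file bzip2 variant would presumably have used the
stdlib bz2 module along these lines -- bzf is a made-up name; bzip2's much
higher CPU cost per byte is what makes it slow:)

import bz2

def bzf(sfpath, bzpath):
    # compress a single file to .bz2 (default compresslevel is 9)
    data = open(sfpath, 'rb').read()
    out = bz2.BZ2File(bzpath, 'w')
    out.write(data)
    out.close()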


--
It's a man's life in a Python Programming Association.
Jul 18 '05 #5
On Thu, 30 Dec 2004 22:17:10 -0500, Jeremy Bowers <je**@jerf.org>
wrote:
I would point out a couple of other ideas, though you may be aware of
them: compressing all the files separately, if they are small, may greatly
reduce the final compression, since similarities between the files cannot
be exploited. You may not care.


The problem is about easy recovery of individual files, plus storing --
and not deleting -- the older versions of files for some time (users
of the file servers tend to come around crying "I accidentally deleted
this important file I created a week ago, where can I find it?").

The way it is done, I can expose the directory hierarchy as read-only
to users and they can get the damn file themselves; they just need
to unzip it. If they had to search through one huge zipfile to find
it, that could be a problem for them.


--
It's a man's life in a Python Programming Association.
Jul 18 '05 #6
On Fri, 31 Dec 2004 13:19:44 +0100, Reinhold Birkenfeld
<re************************@wolke7.net> wrote:
True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files that come after the damaged one. This cannot be
guaranteed when the archive is compressed as a single stream.


With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.


A "good chance" sometimes is unacceptable -- I have to have a
guarantee that as long as the hardware isn't broken a user can
recover that old file. We've even thought about storing uncompressed
directory trees, but holding them would consume too much diskspace.
Hence compression had to be used.

(initially, that was just a shell script, but whitespaces and
strange chars that users love to enter into filenames break
just too many shell tools)


--
It's a man's life in a Python Programming Association.
Jul 18 '05 #7
Bulba! <bu***@bulba.com> writes:
The only thing I'm missing in this picture is knowledge if my script
could be further optimised (not that I actually need better
performance, I'm just curious what possible solutions could be).

Any takers among the experienced guys?


There's another compression program called LHN which is supposed to be
quite a bit faster than gzip, though with somewhat worse compression.
I haven't gotten around to trying it.
Jul 18 '05 #8
Bulba! <bu***@bulba.com> writes:
With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.
A "good chance" sometimes is unacceptable -- I have to have a
guarantee that as long as the hardware isn't broken a user can
recover that old file.


Well, we're talking about an archive that's been damaged, whether by
software or hardware. That damage isn't supposed to happen, but
sometimes it does anyway.
We've even thought about storing uncompressed directory trees, but
holding them would consume too much disk space. Hence compression
had to be used.
If these are typical files, compression gets you maybe 2:1 shrinkage, and
much less on larger files (e.g. multimedia files), which tend to be
incompressible. Disk space is cheap these days; buy more drives.
(initially this was just a shell script, but the whitespace and
strange characters that users love to put into filenames break
just too many shell tools)


I didn't look at your script, but why not just use info-zip?
Jul 18 '05 #9
On 31 Dec 2004 06:05:44 -0800, Paul Rubin
<http://ph****@NOSPAM.invalid> wrote:
(initially this was just a shell script, but the whitespace and
strange characters that users love to put into filenames break
just too many shell tools)
I didn't look at your script, but why not just use info-zip?


Because I need the users to be able to access the folder tree with old
versions of the files, not one big zipfile -- which they can't search
for their old files using, for instance, the standard Windows Explorer.

--
It's a man's life in a Python Programming Association.
Jul 18 '05 #10
