Simple py script to calc folder sizes

Hi everyone

[Short version: I put some code below: what changes can make it run
faster?]

Unless you have a nice tool handy, calculating many folder sizes for
clearing disk space can be a click-fest nightmare. Looking around, I
found Baobab (gui tool); the "du" linux/unix command-line tool; the
extremely impressive tkdu: http://unpythonic.net/jeff/tkdu/ ; a python
script I didn't really understand at
http://vsbabu.org/webdev/zopedev/foldersize.html (are these "folder
objects" zope thingies?); there are also tools that can add a
"foldersize" column into Explorer on Windows
(foldersize.sourceforge.net, for example); the superb freeCommander
file-manager (win32) has the functionality built in, and so on.

"du" is closest to what I was looking for, but is not immediately
cross-platform: I know I can probably get it through Cygwin, and there
is probably a win32 binary or clone around somewhere, but I thought a
simple python solution would be great. Maybe there already is one, but
I couldn't find it with a modest amount of searching.

Anyway, I made one that will produce a list of only the folders in the
current folder, along with their sizes. I am posting it for two
reasons: it might be useful for someone else, and I want to know if it
can be made faster (but in a cross-platform way); maybe you spot
something in the code that is obviously sub-optimal.

# Python script to list sizes of folders in current folder

import os, os.path

rootfolders = os.listdir('.')
rootfolders = [i for i in rootfolders if os.path.isdir(i)]

class counter:
    def __init__(self, rootfolder):
        self.count = 0
        self.rootfolder = rootfolder
    def inc(self, num):
        self.count = self.count + num
    def __str__(self):
        if self.count < 1024.:
            unit = ' bytes'
            scaler = 1.
        elif self.count < 1024.*1024.:
            unit = ' KB'
            scaler = 1/1024.
        elif self.count < 1024.*1024.*1024.:
            unit = ' MB'
            scaler = 1/1024./1024.
        else:
            unit = ' GB'
            scaler = 1/1024./1024./1024.
        return '%-20s - %8.2f%s' % (self.rootfolder, self.count*scaler, unit)

def visitfun(cntObj, dirname, names):
    for i in names:
        fullname = os.path.join(dirname, i)
        if os.path.isfile(fullname):
            cntObj.inc(os.path.getsize(fullname))
    return None

foldersizeobjects = []
for i in rootfolders:
    cntObj = counter(i)
    os.path.walk(i, visitfun, cntObj)
    foldersizeobjects.append(cntObj)

def cmpfunc(a, b):
    if a.count > b.count:
        return 1
    elif a.count == b.count:
        return 0
    else:
        return -1

foldersizeobjects.sort(cmpfunc)

tot = 0
for foldersize in foldersizeobjects:
    tot = tot + foldersize.count
    print foldersize
print 'Total: %.2f MB' % (tot/1024./1024.)

# End

regards
Caleb

Mar 21 '06 #1
Caleb Hattingh wrote:
Unless you have a nice tool handy, calculating many folder sizes for
clearing disk space can be a click-fest nightmare. Looking around, I
found Baobab (gui tool); the "du" linux/unix command-line tool; the
extremely impressive tkdu: http://unpythonic.net/jeff/tkdu/ ; a python
script I didn't really understand at
http://vsbabu.org/webdev/zopedev/foldersize.html (are these "folder
objects" zope thingies?); there are also tools that can add a
"foldersize" column into Explorer on Windows
(foldersize.sourceforge.net, for example); the superb freeCommander
file-manager (win32) has the functionality built in, and so on.
You also might want to take a look at KDirStat
(http://kdirstat.sourceforge.net/) and its win32 counterpart,
WinDirStat (http://windirstat.sourceforge.net/).
"du" is closest to what I was looking for, but is not immediately
cross-platform: I know I can probably get it through Cygwin, and there
is probably a win32 binary or clone around somewhere


Try http://unxutils.sourceforge.net/ ... much quicker to set up than
Cygwin.

A pure Python port of du (and other unix utilities) would be cool,
though.

--Ben

Mar 22 '06 #2
Caleb Hattingh wrote:
Hi everyone

[Short version: I put some code below: what changes can make it run
faster?]
On my slow notebook, your code takes about 1.5 seconds to do my
C:\Python24 dir. With a few changes my code does it in about 1 second.

Here is my code:

import os, os.path, math

def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    root, dirs, files = os.walk(fdir).next()
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))

suffixes = ['bytes','kb','mb','gb','tb']
def prettier(bytesize):
    """Convert a number in bytes to a string in MB, GB, etc"""
    # What power of 1024 is less than or equal to bytesize?
    exponent = int(math.log(bytesize, 1024))
    if exponent > 4:
        return "%d bytes" % bytesize
    return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])

rootfolders = [i for i in os.listdir('.') if os.path.isdir(i)]
results = [ (foldersize(folder), folder) for folder in rootfolders ]

for size, folder in sorted(results):
    print "%s\t%s" % (folder, prettier(size))

print
print "Total:\t%s" % prettier(sum ( size for size, folder in results ))

# End

The biggest change I made was to use os.walk rather than os.path.walk.
os.walk is newer, and a bit easier to understand; it takes just a single
directory path as an argument, and returns a nice generator object that
you can use in a for loop to walk the entire tree. I use it in a
somewhat unconventional way here. Look at the docs for a more
conventional application.
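
For reference, the more conventional way to use os.walk is to loop over everything it yields instead of taking only the first item and recursing by hand. The following is a minimal sketch in Python 3 syntax (the code in this thread is Python 2). The function name tree_size is made up for illustration, and the sketch assumes every file under the directory can be read and that there are no broken symlinks.

```python
import os


def tree_size(top):
    """Return the total size in bytes of all files under top."""
    total = 0
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # descending into subdirectories on its own
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Calling tree_size('.') gives the size of the current folder and everything below it, without any explicit recursion.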

The "map(os.path.getsize, files)" code should run a bit faster than a
for loop, because map only has to look up the getsize function once.

I use log in the "prettier" function rather than your chain of ifs. The
chain of ifs might actually be faster. But I spent so long studying
math in school that I like to use it whenever I get a chance.

Some other comments on your code:
def cmpfunc(a,b):
    if a.count > b.count:
        return 1
    elif a.count == b.count:
        return 0
    else:
        return -1
This could be just "return a.count - b.count". Cmp does not require -1
or +1, just a positive, negative, or zero.
foldersizeobjects.sort(cmpfunc)
You could also use the key parameter; it is usually faster than a cmp
function. As you can see, I used a tuple; the sort functions by default
sort on the first element of the tuples. Of course, sorting is not a
serious bottleneck in either program.
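
To illustrate the key parameter, here is a small sketch in Python 3 syntax, where sort() no longer accepts a cmp function at all. The folder names and sizes are invented sample data, not measurements from this thread.

```python
from operator import itemgetter

# (size_in_bytes, folder_name) pairs shaped like the results list; sample data only
results = [(2048, "Lib"), (512, "Doc"), (2048, "DLLs")]

# Plain tuple sorting compares sizes first, then names to break ties
by_tuple = sorted(results)

# key= extracts the sort value once per item; ties keep their original order
by_size = sorted(results, key=itemgetter(0))

# Largest folder first
largest_first = sorted(results, key=itemgetter(0), reverse=True)
```

With the counter objects from the original script, the equivalent key would be something like key=lambda c: c.count.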
tot=0
for foldersize in foldersizeobjects:
    tot=tot+foldersize.count
    print foldersize


"tot +=" is cooler than tot = tot + . And perhaps a bit faster.
Mar 22 '06 #3
Thanks John

I will use your code :) 30% improvement is not insignificant, and
that's what I was looking for.

I find the log function a little harder to read, but I guess that is a
limitation of me, not your code.

Caleb

Mar 28 '06 #4
Hi John

Your code works on some folders but not others. For example, it works
on my /usr/lib/python2.4 (the example you gave), but on other folders
it terminates early with StopIteration exception on the
os.walk().next() step.

I haven't really looked at this closely enough yet, but it looks as
though there may be an issue with permissions (and not having enough)
on subfolders within a tree.

I don't want you to work too hard on what is my problem, but are there
any ideas that jump out at you?

Regards
Caleb

Mar 30 '06 #5
Caleb Hattingh wrote:
Your code works on some folders but not others. For example, it works
on my /usr/lib/python2.4 (the example you gave), but on other folders
it terminates early with StopIteration exception on the
os.walk().next() step.

I haven't really looked at this closely enough yet, but it looks as
though there may be an issue with permissions (and not having enough)
on subfolders within a tree.


You're quite correct. Here's a version of John's code that handles
such cases:

import warnings
def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    try:
        root, dirs, files = os.walk(fdir).next()
    except StopIteration:
        warnings.warn("Could not access " + fdir)
        return 0
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))
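
An alternative to catching StopIteration is to let os.walk report unreadable directories through its onerror callback and carry on with the rest of the tree. This is a rough sketch in Python 3 syntax, assuming a warning plus a partial total is acceptable. The names foldersize_tolerant and report are invented for this example.

```python
import os
import warnings


def foldersize_tolerant(fdir):
    """Sum file sizes under fdir, warning about anything unreadable."""
    def report(error):
        # os.walk passes the OSError raised while listing a directory
        warnings.warn("Could not access %s" % error.filename)

    total = 0
    for root, dirs, files in os.walk(fdir, onerror=report):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or cannot be examined; skip it
                warnings.warn("Could not stat %s" % path)
    return total
```

Unlike the recursive version, this walks the whole tree in one generator, so a single unreadable subfolder only costs its own contents.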

There's also another bug in the prettier() function that barfs on empty
directories, as it's taking the log of 0. The fix:

exponent = int(math.log(max(1, bytesize), 1024))
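
Putting that fix into context, a complete prettier() might look like the sketch below, written in Python 3 syntax. Clamping the exponent to the last suffix is an addition not found in the thread; the posted code instead falls back to printing plain bytes for anything past 'tb'.

```python
import math

suffixes = ['bytes', 'kb', 'mb', 'gb', 'tb']


def prettier(bytesize):
    """Convert a number in bytes to a string in kb, mb, gb, etc."""
    # max(1, ...) avoids math.log(0) for empty directories
    exponent = int(math.log(max(1, bytesize), 1024))
    # Assumption: report very large sizes in the largest known unit
    exponent = min(exponent, len(suffixes) - 1)
    return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])
```

With this version an empty folder prints as 0.00 bytes instead of raising an error.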

--Ben

Mar 31 '06 #6
Ben,

Thank you.

Caleb

Mar 31 '06 #7
