Simple py script to calc folder sizes

Hi everyone

[Short version: I put some code below: what changes can make it run
faster?]

Unless you have a nice tool handy, calculating many folder sizes for
clearing disk space can be a click-fest nightmare. Looking around, I
found Baobab (gui tool); the "du" linux/unix command-line tool; the
extremely impressive tkdu: http://unpythonic.net/jeff/tkdu/ ; a python
script I didn't really understand at
http://vsbabu.org/webdev/zopedev/foldersize.html (are these "folder
objects" zope thingies?); there are also tools that can add a
"foldersize" column into Explorer on Windows
(foldersize.sourceforge.net, for example); the superb freeCommander
file-manager (win32) has the functionality built in, and so on.

"du" is closest to what I was looking for, but is not immediately
cross-platform: I know I can probably get it through Cygwin, and there
is probably a win32 binary or clone around somewhere, but I thought a
simple python solution would be great. Maybe there already is one, but
I couldn't find it with a modest amount of searching.

Anyway, I made one that will produce a list of only the folders in the
current folder, along with their sizes. I am posting it for two
reasons: it might be useful for someone else, and I want to know if it
can be made faster (but in a cross-platform way); maybe you spot
something in the code that is obviously sub-optimal.

# Python script to list sizes of folders in current folder

import os, os.path

rootfolders = os.listdir('.')
rootfolders = [i for i in rootfolders if os.path.isdir(i)]

class counter:
    def __init__(self,rootfolder):
        self.count = 0
        self.rootfolder = rootfolder
    def inc(self,num):
        self.count = self.count + num
    def __str__(self):
        # Pick a unit so the printed number stays readable
        if self.count<1024.:
            unit = ' bytes'
            scaler = 1.
        elif self.count<1024.*1024.:
            unit = ' KB'
            scaler = 1/1024.
        elif self.count<1024.*1024.*1024.:
            unit = ' MB'
            scaler = 1/1024./1024.
        else:
            unit = ' GB'
            scaler = 1/1024./1024./1024.
        return '%-20s - %8.2f%s'%(self.rootfolder,self.count*scaler,unit)

def visitfun(cntObj,dirname,names):
    # names lists files and subdirectories alike, so only add sizes of files
    for i in names:
        fullname = os.path.join(dirname,i)
        if os.path.isfile(fullname):
            cntObj.inc( os.path.getsize(fullname) )
    return None

foldersizeobjects = []
for i in rootfolders:
    cntObj = counter(i)
    # os.path.walk (Python 2 only) calls visitfun(cntObj, dirname, names)
    # once for every directory in the tree rooted at i
    os.path.walk(i,visitfun,cntObj)
    foldersizeobjects.append(cntObj)

def cmpfunc(a,b):
    if a.count > b.count:
        return 1
    elif a.count == b.count:
        return 0
    else:
        return -1

foldersizeobjects.sort(cmpfunc)

tot=0
for foldersize in foldersizeobjects:
    tot=tot+foldersize.count
    print foldersize
print 'Total: %.2f MB'%(tot/1024./1024.)

# End

regards
Caleb

Mar 21 '06 #1
Caleb Hattingh wrote:
> Unless you have a nice tool handy, calculating many folder sizes for
> clearing disk space can be a click-fest nightmare. Looking around, I
> found Baobab (gui tool); the "du" linux/unix command-line tool; the
> extremely impressive tkdu: http://unpythonic.net/jeff/tkdu/ ; a python
> script I didn't really understand at
> http://vsbabu.org/webdev/zopedev/foldersize.html (are these "folder
> objects" zope thingies?); there are also tools that can add a
> "foldersize" column into Explorer on Windows
> (foldersize.sourceforge.net, for example); the superb freeCommander
> file-manager (win32) has the functionality built in, and so on.
You also might want to take a look at KDirStat
(http://kdirstat.sourceforge.net/) and its win32 counterpart,
WinDirStat (http://windirstat.sourceforge.net/).
> "du" is closest to what I was looking for, but is not immediately
> cross-platform: I know I can probably get it through Cygwin, and there
> is probably a win32 binary or clone around somewhere


Try http://unxutils.sourceforge.net/ ... much quicker to set up than
Cygwin.

A pure Python port of du (and other unix utilities) would be cool,
though.

--Ben

Mar 22 '06 #2
Caleb Hattingh wrote:
> Hi everyone
>
> [Short version: I put some code below: what changes can make it run
> faster?]
On my slow notebook, your code takes about 1.5 seconds to do my
C:\Python24 dir. With a few changes my code does it in about 1 second.

Here is my code:

import os, os.path, math

def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    root, dirs, files = os.walk(fdir).next()
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))

suffixes = ['bytes','kb','mb','gb','tb']
def prettier(bytesize):
    """Convert a number in bytes to a string in MB, GB, etc"""
    # What power of 1024 is less than or equal to bytesize?
    exponent = int(math.log(bytesize, 1024))
    if exponent > 4:
        return "%d bytes" % bytesize
    return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])

rootfolders = [i for i in os.listdir('.') if os.path.isdir(i)]
results = [ (foldersize(folder), folder) for folder in rootfolders ]

for size, folder in sorted(results):
    print "%s\t%s" % (folder, prettier(size))

print
print "Total:\t%s" % prettier(sum ( size for size, folder in results ))

# End

The biggest change I made was to use os.walk rather than os.path.walk.
os.walk is newer, and a bit easier to understand; it takes just a single
directory path as an argument, and returns a nice generator object that
you can use in a for loop to walk the entire tree. I use it in a
somewhat unconventional way here: I call .next() only once, to get the
(root, dirs, files) triple for the top directory, and do the recursion
myself. Look at the docs for a more conventional application, where you
loop over the generator.
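For comparison, here is a minimal Python 3 sketch of the conventional use of os.walk: a plain for loop over the generator, with no manual recursion. The function name foldersize_walk is hypothetical. With the default onerror=None, os.walk silently skips directories it cannot list instead of raising. Skipping symlinked files is my own assumption, meant to avoid counting the same data twice.

```python
import os

def foldersize_walk(fdir):
    """Total size in bytes of the files under fdir, using os.walk in a for loop."""
    total = 0
    # os.walk yields one (root, dirs, files) triple per directory in the tree
    for root, dirs, files in os.walk(fdir):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: do not count symlinked files, so linked data is not counted twice
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Because the generator does the recursion, the function never needs to call itself.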

The "map(os.path.getsize, files)" code should run a bit faster than a
for loop, because map only has to look up the getsize function once.
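A small Python 3 sketch of the two styles being compared, plus a third common idiom that binds getsize to a local name. All three compute the same total; the function names are hypothetical, and any speed difference is modest and best measured with timeit on your own folders.

```python
import os

def total_size_loop(paths):
    """Plain for loop: os.path.getsize is looked up again on every iteration."""
    total = 0
    for p in paths:
        total += os.path.getsize(p)
    return total

def total_size_map(paths):
    """map() receives the getsize function object once, so the lookup happens a single time."""
    return sum(map(os.path.getsize, paths))

def total_size_local(paths):
    """Binding the function to a local name also does the lookup only once."""
    getsize = os.path.getsize
    return sum(getsize(p) for p in paths)
```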

I use log in the "prettier" function rather than your chain of ifs. The
chain of ifs might actually be faster. But I spent so long studying
math in school that I like to use it whenever I get a chance.
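A middle ground between the chain of ifs and the log is a loop that compares against successive powers of 1024. This is a swapped-in technique, not code from either post: it uses only integer comparisons, so it cannot hit a floating-point rounding edge at exact powers of 1024, and it handles 0 without special-casing. The name prettier_loop is hypothetical; the suffix list is copied from John's code.

```python
SUFFIXES = ['bytes', 'kb', 'mb', 'gb', 'tb']

def prettier_loop(bytesize):
    """Format a byte count, stepping up one unit while the value is at least 1024 of the next unit."""
    exponent = 0
    # Integer comparison against 1024**(exponent + 1); stop at the largest suffix
    while exponent < len(SUFFIXES) - 1 and bytesize >= 1024 ** (exponent + 1):
        exponent += 1
    return "%8.2f %s" % (bytesize / 1024 ** exponent, SUFFIXES[exponent])
```

For example, prettier_loop(1536) gives "    1.50 kb" and prettier_loop(0) gives "    0.00 bytes".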

Some other comments on your code:
> def cmpfunc(a,b):
>     if a.count > b.count:
>         return 1
>     elif a.count == b.count:
>         return 0
>     else:
>         return -1
This could be just "return a.count - b.count". A cmp function does not
have to return -1 or +1, just a negative number, zero, or a positive number.
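
In Python 3, list.sort no longer accepts a cmp function, so a sketch of the subtraction version has to wrap it with functools.cmp_to_key. The Counter class below is a hypothetical stand-in for Caleb's counter objects, holding only a folder name and a byte count.

```python
from functools import cmp_to_key

class Counter:
    """Hypothetical stand-in for the counter class: a folder name and a byte count."""
    def __init__(self, rootfolder, count):
        self.rootfolder = rootfolder
        self.count = count

def cmpfunc(a, b):
    # Any negative, zero or positive number is enough; -1/0/+1 are not required
    return a.count - b.count

objs = [Counter('b', 300), Counter('a', 10), Counter('c', 2048)]
# cmp_to_key turns the two-argument cmp function into a key usable by sort()
objs.sort(key=cmp_to_key(cmpfunc))
```
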
> foldersizeobjects.sort(cmpfunc)
You could also use the key parameter; it is usually faster than a cmp
function, because the key is computed once per item rather than once per
comparison. As you can see, I used a tuple; tuples sort by their first
element, and later elements only break ties. Of course, sorting is not a
serious bottleneck in either program.
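A short Python 3 sketch of both options: a key function for objects with a count attribute, and the tuple approach John used. FolderSize is a hypothetical namedtuple standing in for the counter objects.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the counter objects: a folder name plus a byte count
FolderSize = namedtuple('FolderSize', ['rootfolder', 'count'])

objs = [FolderSize('b', 300), FolderSize('a', 10), FolderSize('c', 2048)]
# key= extracts one sort key per item instead of calling a cmp function per comparison
objs.sort(key=attrgetter('count'))

# Tuples compare element by element: size first, folder name only breaks ties
results = [(300, 'b'), (10, 'z'), (10, 'a'), (2048, 'c')]
ordered = sorted(results)
```

Putting the size first in each (size, folder) tuple is what makes the plain sorted() call order by size.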
> tot=0
> for foldersize in foldersizeobjects:
>     tot=tot+foldersize.count
>     print foldersize


"tot += foldersize.count" is cooler than "tot = tot + foldersize.count", and perhaps a bit faster.
Mar 22 '06 #3
Thanks John

I will use your code :) 30% improvement is not insignificant, and
that's what I was looking for.

I find the log function a little harder to read, but I guess that is a
limitation of me, not your code.

Caleb

Mar 28 '06 #4
Hi John

Your code works on some folders but not others. For example, it works
on my /usr/lib/python2.4 (the example you gave), but on other folders
it terminates early with StopIteration exception on the
os.walk().next() step.

I haven't really looked at this closely enough yet, but it looks as
though there may be an issue with permissions (and not having enough)
on subfolders within a tree.

I don't want you to work too hard on what is my problem, but are there
any ideas that jump out at you?

Regards
Caleb

Mar 30 '06 #5
Caleb Hattingh wrote:
> Your code works on some folders but not others. For example, it works
> on my /usr/lib/python2.4 (the example you gave), but on other folders
> it terminates early with StopIteration exception on the
> os.walk().next() step.
>
> I haven't really looked at this closely enough yet, but it looks as
> though there may be an issue with permissions (and not having enough)
> on subfolders within a tree.


You're quite correct. Here's a version of John's code that handles
such cases:

import warnings
def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    try:
        root, dirs, files = os.walk(fdir).next()
    except StopIteration:
        warnings.warn("Could not access " + fdir)
        return 0
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))
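
For Python 3, a minimal sketch of the same idea swaps os.walk().next() for os.scandir and catches OSError directly (PermissionError is a subclass), rather than relying on os.walk silently yielding nothing. This is a substituted technique, not Ben's code; not following symlinks is my own assumption.

```python
import os
import warnings

def foldersize(fdir):
    """Total bytes of files under fdir; unreadable directories are warned about and counted as 0."""
    try:
        with os.scandir(fdir) as it:
            entries = list(it)
    except OSError:  # includes PermissionError and FileNotFoundError
        warnings.warn("Could not access " + fdir)
        return 0
    total = 0
    for entry in entries:
        try:
            # Assumption: do not follow symlinks, so nothing is counted twice
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                total += foldersize(entry.path)
        except OSError:
            warnings.warn("Could not stat " + entry.path)
    return total
```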

There's also another bug in the prettier() function: it barfs on empty
directories, because it takes the log of 0 (math.log raises ValueError). The fix:

exponent = int(math.log(max(1, bytesize), 1024))
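
Dropped into a Python 3 version of John's prettier(), the fix looks like the sketch below. One deliberate change, not from the thread: values beyond the largest suffix are clamped to "tb" instead of falling back to raw bytes.

```python
import math

SUFFIXES = ['bytes', 'kb', 'mb', 'gb', 'tb']

def prettier(bytesize):
    """Convert a number in bytes to a string in kb, mb, gb, etc."""
    # max(1, ...) keeps math.log away from 0, which would raise ValueError
    exponent = int(math.log(max(1, bytesize), 1024))
    # Deliberate change: clamp to the largest suffix rather than printing raw bytes
    exponent = min(exponent, len(SUFFIXES) - 1)
    return "%8.2f %s" % (bytesize / 1024.0 ** exponent, SUFFIXES[exponent])
```

With this, an empty folder prints as "    0.00 bytes" instead of crashing.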

--Ben

Mar 31 '06 #6
Ben,

Thank you.

Caleb

Mar 31 '06 #7
