emulating du with os.walk

Hrm, I'm a bit stumped on this.

I want to write a script that lists a directory hierarchy and gives me a
sorted list showing cumulative directory size. The example code for
os.walk gets me halfway there, but I can't quite figure out how to do
the hierarchical sum. Here is the output I'm getting:

/home/kirk/.gconf/apps/ggv/layout consumes 228 bytes in 1 non-directory files
/home/kirk/.gconf/apps/ggv consumes 0 bytes in 1 non-directory files
/home/kirk/.gconf/apps consumes 0 bytes in 1 non-directory files

However, what I want is:

/home/kirk/.gconf/apps/ggv/layout consumes 228 bytes in 1 non-directory files
/home/kirk/.gconf/apps/ggv consumes 228 bytes in 1 non-directory files
/home/kirk/.gconf/apps consumes 228 bytes in 1 non-directory files

There should be an easy way to get around this, or perhaps I'm better
off just parsing the output of du.
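If parsing du's output turns out to be the easier route, a minimal sketch could look like the following. It assumes Python 3.7+ (for `subprocess.run(..., capture_output=True)`) and a `du` that accepts `-k` and prints one size/path pair per line; the function name `parse_du` is made up for illustration.

```python
import subprocess


def parse_du(top):
    """Hypothetical helper: run `du -k top` and return {path: size_in_KiB}.

    du prints one line per directory: a size, whitespace, then the path.
    """
    # No check=True: du exits non-zero when it hits unreadable entries,
    # but it still prints totals for everything it could read.
    result = subprocess.run(["du", "-k", top], capture_output=True, text=True)
    sizes = {}
    for line in result.stdout.splitlines():
        size, path = line.split(None, 1)  # split once, so paths keep inner spaces
        sizes[path] = int(size)
    return sizes
```

Sorting with `sorted(parse_du(top).items(), key=lambda kv: kv[1], reverse=True)` then lists the largest directories first. Note that du reports disk usage in allocated blocks, so its numbers will not match a plain sum of file sizes exactly.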

--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Jul 18 '05 #1
First of all, I don't know how much you already know about os.walk, but it
can traverse trees either top-down or bottom-up (it has a 'topdown'
argument). The default is topdown=True. What you probably need in your
case is a bottom-up traversal (so pass topdown=False).

Then you have to keep track of all the directories (I can suggest a data
structure if you want) and add the du values of all the child directories
plus the sizes of all the files to determine the du value of a parent
directory.
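A minimal sketch of that bookkeeping, assuming Python 3 and using a plain dict keyed by directory path (the name `cumulative_sizes` is hypothetical). Because `topdown=False` yields every child directory before its parent, each parent can simply add up totals that are already in the dict.

```python
import os


def cumulative_sizes(top):
    """Return {directory: total bytes of files beneath it, recursively}."""
    totals = {}
    for root, dirs, files in os.walk(top, topdown=False):
        # Bytes in the files directly inside this directory.
        size = 0
        for name in files:
            try:
                size += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # vanished or unreadable: skip it
        # Bottom-up order means every child total is already recorded.
        # Symlinked directories are listed but never walked, hence .get(..., 0).
        for name in dirs:
            size += totals.get(os.path.join(root, name), 0)
        totals[root] = size
    return totals
```

With the tree from the original post, the 228 bytes in ggv/layout are then counted again in ggv and in apps.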

Without seeing your code, I'm guessing you are not doing one of these
things.

Dan

Jul 18 '05 #2
Kirk Job-Sluder wrote:
> There should be an easy way to get around this, or perhaps I'm better
> off just parsing the output of du.

I suggest that you don't use os.path.walk, but write a recursive
function yourself. You should find that the entire problem can
be solved in 12 lines of Python code.
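Along the lines suggested above, a recursive version might look like this. It is a sketch assuming Python 3.6+, since it uses `os.scandir` as a context manager rather than the `os.listdir` of the time; `du` and `totals` are illustrative names.

```python
import os


def du(path, totals):
    """Return the total bytes of regular files under path (recursively).

    Every directory visited is also recorded in totals as {path: bytes}.
    Symlinks are never followed, so a link back to a parent cannot loop.
    """
    size = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size += du(entry.path, totals)  # child total covers its subtree
            elif entry.is_file(follow_symlinks=False):
                size += entry.stat(follow_symlinks=False).st_size
    totals[path] = size
    return size
```

Calling `totals = {}; du(top, totals)` fills `totals` with every directory's cumulative size, ready to be sorted. Very deep trees could hit Python's recursion limit, which is one argument for the bottom-up `os.walk` approach instead.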

Regards,
Martin
Jul 18 '05 #3
On 2004-09-27, Martin v. Löwis <ma****@v.loewis.de> wrote:
> Kirk Job-Sluder wrote:
>> There should be an easy way to get around this, or perhaps I'm better
>> off just parsing the output of du.
> I suggest that you don't use os.path.walk, but write a recursive
> function yourself. You should find that the entire problem can
> be solved in 12 lines of Python code.

Yeah, I finally solved it with a recursive function. It took me 16
lines, including the bookkeeping.

--
Kirk Job-Sluder
Jul 18 '05 #4
"Martin v. Löwis" wrote:
> Kirk Job-Sluder wrote:
>> There should be an easy way to get around this, or perhaps I'm better
>> off just parsing the output of du.
>
> I suggest that you don't use os.path.walk, but write a recursive
> function yourself. You should find that the entire problem can
> be solved in 12 lines of Python code.

There are some nasty little problems which make it difficult.

First, what do you do with hardlinks? Suppose directory a/a, a/b and a/c
all contain the same 100 MiB file. Directory a/ only has 100 MiB, but a
naive script will report 300 MiB.
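One common way to avoid the double count, sketched here assuming Python 3 and a POSIX filesystem where `st_ino` is meaningful, is to remember each file's `(st_dev, st_ino)` pair and count it only the first time it is seen. `du_unique` is a hypothetical name.

```python
import os


def du_unique(top):
    """Total bytes under top, counting each hard-linked inode only once."""
    seen = set()  # (device, inode) pairs already counted
    total = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            # lstat: a symlink is measured itself, never its target.
            st = os.lstat(os.path.join(root, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total
```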

Most of the time, you'll want to stay in one filesystem.
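Staying on one filesystem (what GNU `du -x` does) can be sketched by comparing each subdirectory's `st_dev` with that of the starting directory and pruning mismatches from `dirs` in place, which `os.walk` honours in top-down mode. This assumes Python 3; `walk_one_fs` is a hypothetical helper.

```python
import os


def walk_one_fs(top):
    """Like os.walk(top), but never descend into another mounted filesystem."""
    top_dev = os.lstat(top).st_dev
    for root, dirs, files in os.walk(top):  # topdown=True (the default)
        # Prune in place before yielding, so os.walk skips mount points.
        dirs[:] = [d for d in dirs
                   if os.lstat(os.path.join(root, d)).st_dev == top_dev]
        yield root, dirs, files
```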

You don't want to get stuck in recursive symlinks. If a/b is a symlink
to a/, you quickly get into an infinite loop.
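By default `os.walk` does not descend into symlinked directories at all (`followlinks=False`), which already avoids the a/b -> a/ loop. If links should be followed, one sketch, assuming Python 3 and a hypothetical `walk_no_loops` helper, is to remember the `(st_dev, st_ino)` of every real directory entered and refuse to enter it twice.

```python
import os


def walk_no_loops(top):
    """os.walk that follows directory symlinks but enters each directory once."""
    visited = set()
    for root, dirs, files in os.walk(top, followlinks=True):
        st = os.stat(root)  # follows symlinks: identifies the real directory
        key = (st.st_dev, st.st_ino)
        if key in visited:
            dirs[:] = []  # seen before (e.g. a/b -> a/): do not descend again
            continue
        visited.add(key)
        yield root, dirs, files
```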

Directories have a size too.
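If the directories' own sizes should be included, each directory reported by `os.walk` can be `lstat`ed as well. A minimal sketch assuming Python 3, with a hypothetical `apparent_size` function that sums apparent sizes (`st_size`), not the allocated blocks du reports:

```python
import os


def apparent_size(top):
    """Total st_size under top, including each directory's own size."""
    total = 0
    for root, dirs, files in os.walk(top):
        total += os.lstat(root).st_size  # the directory entry itself
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total
```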

What do we do with files we can't read?

In /proc, even stranger subtleties exist which I don't understand -
ENOENT although listed by listdir() and that sort of thing.
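For unreadable entries, `os.walk` accepts an `onerror` callback that receives the `OSError` from a directory it could not list; files that disappear between listing and `lstat` (as can happen in /proc) can be caught with an ordinary `try`/`except`. A sketch assuming Python 3, with hypothetical names, that reports and skips such entries rather than aborting:

```python
import os
import sys


def du_tolerant(top):
    """Sum file sizes under top, reporting unreadable entries instead of aborting.

    Returns (total_bytes, list_of_errors).
    """
    errors = []

    def report(err):
        # Called by os.walk for directories it could not list,
        # and by the loop below for files that could not be lstat()ed.
        errors.append(err)
        print("skipping %s: %s" % (err.filename, err.strerror), file=sys.stderr)

    total = 0
    for root, dirs, files in os.walk(top, onerror=report):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError as err:  # e.g. ENOENT: listed, then gone
                report(err)
    return total, errors
```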

Together with more options, human-readable file sizes and documentation,
it took me ~200 LOC at
http://topjaklont.student.utwente.nl/creaties/dkus.py

Note that du doesn't solve these problems either.

yours,
Gerrit.

--
Weather in Twenthe, Netherlands 28/09 08:55:
15.0°C mist overcast wind 4.0 m/s SW (57 m above NAP)
--
In the councils of government, we must guard against the acquisition of
unwarranted influence, whether sought or unsought, by the
military-industrial complex. The potential for the disastrous rise of
misplaced power exists and will persist.
-Dwight David Eisenhower, January 17, 1961
Jul 18 '05 #5
On 2004-09-28, Gerrit <ge****@nl.linux.org> wrote:
> "Martin v. Löwis" wrote:
>> Kirk Job-Sluder wrote:
>>> There should be an easy way to get around this, or perhaps I'm better
>>> off just parsing the output of du.
>> I suggest that you don't use os.path.walk, but write a recursive
>> function yourself. You should find that the entire problem can
>> be solved in 12 lines of Python code.
>
> There are some nasty little problems which make it difficult.
>
> First, what do you do with hardlinks? Suppose directory a/a, a/b and a/c
> all contain the same 100 MiB file. Directory a/ only has 100 MiB, but a
> naive script will report 300 MiB.

Well, that is a good question. The primary goal of this script is to
construct lists of files that can be passed to cpio in order to make
multiple volumes of a certain size (in my case, to pack CD-ROM or
CD-RW disks efficiently). The other goal is to minimize splitting
directory hierarchies between volumes where possible. So, for example,
given a list of directories:

foo 500M
bar 400M
baz 100M
rab 200M

the script should construct file lists for two volumes:
volume1: foo baz
volume2: bar rab

(Of course, the actual volumes will be larger than 600M to allow for
compression.)
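One standard heuristic for this kind of bin packing is first-fit decreasing: sort the directories by size, largest first, and put each one into the first volume that still has room. This is not necessarily what Kirk's script does; it is a sketch assuming Python 3, sizes in megabytes, and a hypothetical `first_fit_decreasing` function. With a 600 MB capacity it reproduces the two volumes above:

```python
def first_fit_decreasing(sizes, capacity):
    """Group names into volumes of at most `capacity` using first-fit decreasing.

    sizes maps a name (e.g. a top-level directory) to its size. An item
    larger than capacity ends up alone in an over-full volume.
    """
    volumes = []  # each volume is [space_used, [names]]
    # Largest items first: they are the hardest to place.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for vol in volumes:
            if vol[0] + size <= capacity:  # first volume with enough room
                vol[0] += size
                vol[1].append(name)
                break
        else:  # no existing volume fits: start a new one
            volumes.append([size, [name]])
    return [names for _, names in volumes]


print(first_fit_decreasing({"foo": 500, "bar": 400, "baz": 100, "rab": 200}, 600))
# → [['foo', 'baz'], ['bar', 'rab']]
```

First-fit decreasing is fast and usually close to optimal, but it does not guarantee the minimum number of volumes.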

Since each volume should be independent of the others, it makes sense
to treat hard links as regular files: even though foo/a.txt and
bar/b.txt point to the same file, a full copy of both a.txt and b.txt is
required.

> Most of the time, you'll want to stay in one filesystem.
>
> You don't want to get stuck in recursive symlinks. If a/b is a symlink
> to a/, you quickly get into an infinite loop.

Good point. I should check for that.

> Directories have a size too.
>
> What do we do with files we can't read?

At the moment, I just throw an error and move on.

> In /proc, even stranger subtleties exist which I don't understand -
> ENOENT although listed by listdir() and that sort of thing.
>
> Together with more options, human-readable file sizes and documentation,
> it took me ~200 LOC at
> http://topjaklont.student.utwente.nl/creaties/dkus.py

Thanks!

> Note that du doesn't solve these problems either.

True, but I'm willing to sacrifice some precision for the sake of getting
it done. Getting volume sizes in the ballpark is good enough.

--
Kirk Job-Sluder
Jul 18 '05 #6
