How to get the size of a file?

Anyone have ideas which os command could be used to get the size of a
file without actually opening it? My intention is to write a script
that identifies duplicate files with different names. I have no
trouble getting the names of all the files in the directory using the
os.listdir() command, but that doesn't return the file size. In order
to be identical, files must be the same size, so I want to use file
size as the first criterion and then, if two files are the same size,
actually open them up and compare the contents.

I have written such a script in the past, but had to resort to
something like:

os.system('dir *.* >> trash.txt')

The next step was then to open up 'trash.txt' and piece together the
information I need to compare file sizes. The problems with this
approach are that it is very platform dependent (it worked on WIN 95,
but I don't know what else it will work on) and that the 8.3 filename
limitations of that environment apply. That is the reason I'm looking
for some other command to obtain file size before the files are ever
opened.
Jul 18 '05 #1
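The two-stage check described in the question (size first, contents only when the sizes match) can be sketched in current Python 3 roughly as follows. The helper name `same_file_contents` is made up for illustration; `os.path.getsize` and `filecmp.cmp` are standard-library calls.

```python
import filecmp
import os


def same_file_contents(path_a, path_b):
    """Return True if both files have the same size and identical bytes."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False makes filecmp compare the actual contents instead of
    # trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Only pairs that pass the size test are ever opened, which is the point of using size as the first filter.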
User wrote:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it?

>>> os.path.getsize("/etc/passwd")
722L


Andrew
da***@dalkescientific.com
Jul 18 '05 #2
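For readers on Python 3: the trailing `L` in `722L` is Python 2's marker for a long integer, and `os.path.getsize` now returns a plain `int`. A minimal sketch of combining it with `os.listdir`, as the question asks, might look like this; the function name `sizes_in_directory` is an assumption made for the example.

```python
import os


def sizes_in_directory(directory):
    """Map each regular file name in `directory` to its size in bytes."""
    sizes = {}
    for name in os.listdir(directory):
        # os.listdir() returns bare names, so join them with the directory
        # before asking the file system about them.
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes
```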
On Sun, 17 Oct 2004 03:43:32 GMT, Andrew Dalke <ad****@mindspring.com>
wrote:
> User wrote:
>> Anyone have ideas which os command could be used to get the size of a
>> file without actually opening it?
>
> >>> os.path.getsize("/etc/passwd")
> 722L
>
> Andrew
> da***@dalkescientific.com

That works. Thanks. I didn't think to look there for it.
Jul 18 '05 #3
User <1@2.3> writes:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it?
> [...]
> I have written such a script in the past, but had to resort to
> something like:
>
> os.system('dir *.* >> trash.txt')


You're looking for os.stat. It returns an object whose attributes have
info about the file.
>>> import os
>>> s = os.stat('somefile')
>>> print s.st_size
2821
(or whatever)

More info in the docs for the os module, of course.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #4
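On Python 3, the same `st_size` attribute is also reachable through `os.scandir`, which yields directory entries that carry a `stat()` method. This is only a sketch; `list_sizes` is a hypothetical helper name, not something from the thread.

```python
import os


def list_sizes(directory):
    """Return (name, size) pairs for regular files, sorted by name."""
    pairs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.stat() returns the same kind of object as os.stat();
            # on some platforms it can reuse data gathered while listing.
            if entry.is_file():
                pairs.append((entry.name, entry.stat().st_size))
    return sorted(pairs)
```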
On Sun, 17 Oct 2004 03:13:46 GMT, User <1@2.3> wrote:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it? My intention is to write a script
> that identifies duplicate files with different names.
> [...]


This should list duplicate files in the specified directory:
You can hack to suit. Not very tested. Just what you see ;-)
------------------------------------------------
# get_dupes.py
import os, md5
def get_dupes(thedir):
    finfo = {}
    for f in os.listdir(thedir):
        # NOTE: f is a bare file name, so these calls resolve it against
        # the current working directory rather than thedir.
        if os.path.isfile(f):
            finfo.setdefault(os.path.getsize(f), []).append(f)

    result = []
    for size, flist in finfo.items():
        if len(flist)>1:
            dupes = {}
            for name in flist:
                dupes.setdefault(md5.new(open(name, 'rb').read()).hexdigest(),[]).append(name)
            for digest, names in dupes.items():
                if len(names)>1: result.append((size, digest, names))
    return result

if __name__ == '__main__':
    import sys
    try:
        dupes = get_dupes(sys.argv[1])
        if dupes:
            print
            print '%8s %32s %s' % ('size','md5 digest','files with the given size, digest')
            print '%8s %32s %s' % ('----','-'*32 ,'---------------------------------')
            for duped in dupes:
                print '%8s %32s %s' % duped
        else:
            print 'No duplicate files in %r' % sys.argv[1]
    except:
        raise SystemExit, 'Usage: python get_dupes.py directory'
-------------------------------------------

(I was surprised at the amount of duplicated stuff ;-)

[23:23] C:\pywk\clp>python get_dupes.py .

    size                       md5 digest files with the given size, digest
    ---- -------------------------------- ---------------------------------
       0 d41d8cd98f00b204e9800998ecf8427e ['z3', 'zero_len.py']
     111 ea70a0f814917ef8861bebc085e5e7d0 ['MyConsts.py', 'MyConsts.py~']
     163 f8e4add20e45bb253bd46963f25a7057 ['ramb.txt', 'rambxx.txt']
    4096 d96633a4b58522ce5787ef80a18e9c7b ['yyy2', 'yyy3']
     786 05956208d5185259b47362afcf1812fd ['startmore.py', 'startmore.py~']
     851 3845f161fa93cbb9119c16fc43e7b62a ['quadratic.py', 'quadratic.py~']
    1536 72f5c05b7ea8dd6059bf59f50b22df33 ['virtest.txt', '~DF30EC.tmp']
    1028 fbedc511f9556a8a1dc2ecfa3d859621 ['PaulMoore.py', 'PaulMoore.py~']
    1515 568f9732866a9de698732616ae4f9c3b ['loopbreak.py', 'loopbreak.py~']
    1662 f54414637ed420fe61b78eeba59737b7 ['for_grodrigues.py', 'for_grodrigues.r1.py']
    1702 23fa57926e7fcf2487943acb10db7e2a ['bitfield.py', 'bitfield.py~', 'packbits.py']
    3765 e69bf6b018ba305cc3e190378f93e421 ['pythonHi.gif', 'showgif.gif']
    5874 bae87bbed53c1e6908bb5c37db9c4292 ['testyenc.py', 'testyenc.py~']
    3990 4a5096efaf136f901603a2e1be850eb3 ['pns.py', 'pns.r1.py']

Regards,
Bengt Richter
Jul 18 '05 #5
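The script above reads each candidate file into memory in one go with `open(name, 'rb').read()`. For large files, one possible variation is to feed the hash in chunks. The sketch below uses Python 3's `hashlib`, which replaced the old `md5` module; the name `file_digest` and the 64 KiB chunk size are arbitrary choices for the example.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is used here only to group files that are probably identical, as in the post; it is not meant as a security measure.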
On Sun, 17 Oct 2004 06:29:36 GMT, bo**@oz.net (Bengt Richter) wrote:
> [...]
>
> This should list duplicate files in the specified directory:
> You can hack to suit. Not very tested. Just what you see ;-)
> ------------------------------------------------
> # get_dupes.py

[... version which only worked for current working directory...]
Phooey. Hopefully better:

----------------------------------------------------------------------------
# get_dupes.py
import os, md5
def get_dupes(thedir):
    finfo = {}
    for f in os.listdir(thedir):
        p = os.path.join(thedir, f)
        if os.path.isfile(p):
            finfo.setdefault(os.path.getsize(p), []).append(f)

    result = []
    for size, flist in finfo.items():
        if len(flist)>1:
            dupes = {}
            for name in flist:
                data = open(os.path.join(thedir, name), 'rb').read()
                dupes.setdefault(md5.new(data).hexdigest(),[]).append(name)
            for digest, names in dupes.items():
                if len(names)>1: result.append((size, digest, names))
    return result

if __name__ == '__main__':
    import sys
    try:
        dupes = get_dupes(sys.argv[1])
        if dupes:
            print
            print '%8s %32s %s' % ('size','md5 digest','files with the given size, digest')
            print '%8s %32s %s' % ('----','-'*32 ,'---------------------------------')
            for duped in dupes:
                print '%8s %32s %s' % duped
        else:
            print 'No duplicate files in %r' % sys.argv[1]
    except:
        raise SystemExit, 'Usage: python get_dupes.py directory'
----------------------------------------------------------------------------------------------
Regards,
Bengt Richter
Jul 18 '05 #6
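The script above is Python 2 (`print` statements, the `md5` module). A rough Python 3 rendering of the same size-then-digest grouping might look like the following. It keeps the `get_dupes` name and return shape from the post, but the use of `defaultdict` and the sorted name lists are choices made for this sketch, not part of the original.

```python
import hashlib
import os
from collections import defaultdict


def get_dupes(thedir):
    """Return (size, digest, names) tuples for duplicate files in thedir."""
    by_size = defaultdict(list)
    for name in os.listdir(thedir):
        path = os.path.join(thedir, name)
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(name)

    result = []
    for size, names in by_size.items():
        if len(names) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for name in names:
            with open(os.path.join(thedir, name), 'rb') as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            by_digest[digest].append(name)
        for digest, same in by_digest.items():
            if len(same) > 1:
                result.append((size, digest, sorted(same)))
    return result
```

Like the original, this looks at a single directory only and does not descend into subdirectories.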
On Sun, 17 Oct 2004 03:13:46 GMT, User <1@2.3> declaimed the following
in comp.lang.python:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it? My intention is to write a script
> that identifies duplicate files with different names. I have no
> trouble getting the names of all the files in the directory using the
> os.listdir() command, but that doesn't return the file size. In order
> to be identical, files must be the same size, so I want to use file
> size as the first criteria, then, if they are the same size, actually
> open them up and compare the contents.

Watch out for line wraps... This one didn't actually check the
contents, only logged candidates (time for me to use it again too)... As
you can see, it is ancient, and no doubt could be improved with some of
the newer Python modules...

#
# DupCheck.py -- Scans a directory and all subdirectories
#                for duplicate file names, reporting conflicts
# March 22 1998 dl bieber <wu******@netcom.com>
#

import os
import sys
import string
from stat import *

Files = {}

def Scan_Dir(cd):
    global Files, logfile

    cur_files = os.listdir(cd)
    cur_files.sort()
    for f in cur_files:
        fib = os.stat("%s\\%s" % (cd, f))
        if S_ISDIR(fib[ST_MODE]):
            Scan_Dir("%s\\%s" % (cd, f))
        elif S_ISREG(fib[ST_MODE]):
            if Files.has_key(string.lower(f)):
                (aSize, aDir) = Files[string.lower(f)]
                if fib[ST_SIZE] == aSize:
                    logfile.write(
                        "***** Possible Duplicate File: %s\n" % (f))
                    logfile.write(
                        " %s\t%s\n" % (fib[ST_SIZE], cd))
                    logfile.write(
                        " %s\t%s\n\n" % (Files[string.lower(f)]))
            else:
                Files[string.lower(f)] = (fib[ST_SIZE], cd)
        else:
            logfile.write(
                "***** SKIPPED Not File or Dir: %s\n\n" % (f))

if __name__ == "__main__":
    Cur_Dir = raw_input("Root Directory -> ")
    Log_To = raw_input("Log File -> ")

    if Log_To:
        logfile = open(Log_To, "w")
    else:
        logfile = sys.stdout

    Scan_Dir(Cur_Dir)

    if Log_To:
        logfile.close()
--
============================================================== <
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
wu******@dm.net | Bestiaria Support Staff <
============================================================== <
Home Page: <http://www.dm.net/~wulfraed/> <
Overflow Page: <http://wlfraed.home.netcom.com/> <

Jul 18 '05 #7
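As the post says, newer standard-library modules can shorten this kind of scan. One possible Python 3 sketch uses `os.walk` for the recursion and `os.path.join` in place of the hard-coded backslashes. The generator name `find_name_size_dupes` is hypothetical, and the traversal order may differ from the recursive original.

```python
import os


def find_name_size_dupes(root):
    """Yield (name, size, first_dir, other_dir) for repeated name/size pairs."""
    seen = {}  # lowercased file name -> (size, directory of first sighting)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other non-regular entries
            size = os.path.getsize(path)
            key = name.lower()
            if key not in seen:
                seen[key] = (size, dirpath)
            elif seen[key][0] == size:
                yield name, size, seen[key][1], dirpath
```

As in the original script, a match on name and size only marks a candidate; the file contents are not compared.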
