Bytes | Software Development & Data Engineering Community

How to get the size of a file?

Anyone have ideas which os command could be used to get the size of a
file without actually opening it? My intention is to write a script
that identifies duplicate files with different names. I have no
trouble getting the names of all the files in the directory using the
os.listdir() command, but that doesn't return the file size. In order
to be identical, files must be the same size, so I want to use file
size as the first criterion, then, if they are the same size, actually
open them up and compare the contents.

I have written such a script in the past, but had to resort to
something like:

os.system('dir *.* >> trash.txt')

The next step was then to open up 'trash.txt', and piece together the
information I need to compare file sizes. The problems with this
approach are that it is very platform dependent (it worked on Win 95, but
I don't know what else it will work on) and that the 8.3 filename
limitations apply within this environment. That is the reason I'm looking
for some other command to obtain file size before the files are ever
opened.
Jul 18 '05 #1
User wrote:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it?

>>> import os
>>> os.path.getsize("/etc/passwd")
722L
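For anyone reading this on Python 3, here is a minimal sketch of how `os.path.getsize` might be combined with `os.listdir`. The helper name `sizes_in_dir` is made up for illustration and is not from the thread. Note that `listdir` returns bare names, so each one is joined to the directory before its size is requested.

```python
import os


def sizes_in_dir(thedir):
    """Map each regular file name in thedir to its size in bytes."""
    sizes = {}
    for name in os.listdir(thedir):
        # listdir() returns bare names, so build the full path first
        path = os.path.join(thedir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes
```

Calling `sizes_in_dir(".")` should return a dictionary of name to size for the current directory, with subdirectories left out.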


Andrew
da***@dalkescientific.com
Jul 18 '05 #2
On Sun, 17 Oct 2004 03:43:32 GMT, Andrew Dalke <ad****@mindspring.com>
wrote:
> User wrote:
>> Anyone have ideas which os command could be used to get the size of a
>> file without actually opening it?
>
> >>> os.path.getsize("/etc/passwd")
> 722L
>
> Andrew
> da***@dalkescientific.com


That works. Thanks. I didn't think to look there for it.
Jul 18 '05 #3
User <1@2.3> writes:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it? My intention is to write a script
> that identifies duplicate files with different names. I have no
> trouble getting the names of all the files in the directory using the
> os.listdir() command, but that doesn't return the file size. In order
> to be identical, files must be the same size, so I want to use file
> size as the first criterion, then, if they are the same size, actually
> open them up and compare the contents.
>
> I have written such a script in the past, but had to resort to
> something like:
>
> os.system('dir *.* >> trash.txt')


You're looking for os.stat. It returns an object whose attributes have
info about the file.
>>> import os
>>> s = os.stat('somefile')
>>> print s.st_size
2821
(or whatever)

More info in the docs for the os module, of course.
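
A minimal Python 3 sketch of the `os.stat` route, assuming you also want to skip anything that is not a regular file. `size_if_regular` is a hypothetical helper name. Both `st_mode` and `st_size` come from the same `os.stat` result, so one call answers both questions without opening the file.

```python
import os
import stat


def size_if_regular(path):
    """Return the size in bytes of a regular file, or None for anything else."""
    info = os.stat(path)  # metadata only; the file itself is never opened
    if stat.S_ISREG(info.st_mode):
        return info.st_size
    return None
```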

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #4
On Sun, 17 Oct 2004 03:13:46 GMT, User <1@2.3> wrote:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it? My intention is to write a script
> that identifies duplicate files with different names. I have no
> trouble getting the names of all the files in the directory using the
> os.listdir() command, but that doesn't return the file size. In order
> to be identical, files must be the same size, so I want to use file
> size as the first criterion, then, if they are the same size, actually
> open them up and compare the contents.
>
> I have written such a script in the past, but had to resort to
> something like:
>
> os.system('dir *.* >> trash.txt')
>
> The next step was then to open up 'trash.txt', and piece together the
> information I need to compare file sizes. The problems with this
> approach are that it is very platform dependent (it worked on Win 95, but
> I don't know what else it will work on) and that the 8.3 filename
> limitations apply within this environment. That is the reason I'm looking
> for some other command to obtain file size before the files are ever
> opened.


This should list duplicate files in the specified directory:
You can hack to suit. Not very tested. Just what you see ;-)
------------------------------------------------
# get_dupes.py
import os, md5
def get_dupes(thedir):
    finfo = {}
    for f in os.listdir(thedir):
        if os.path.isfile(f):
            finfo.setdefault(os.path.getsize(f), []).append(f)

    result = []
    for size, flist in finfo.items():
        if len(flist)>1:
            dupes = {}
            for name in flist:
                dupes.setdefault(md5.new(open(name, 'rb').read()).hexdigest(),[]).append(name)
            for digest, names in dupes.items():
                if len(names)>1: result.append((size, digest, names))
    return result

if __name__ == '__main__':
    import sys
    try:
        dupes = get_dupes(sys.argv[1])
        if dupes:
            print
            print '%8s %32s %s' % ('size','md5 digest','files with the given size, digest')
            print '%8s %32s %s' % ('----','-'*32 ,'---------------------------------')
            for duped in dupes:
                print '%8s %32s %s' % duped
        else:
            print 'No duplicate files in %r' % sys.argv[1]
    except:
        raise SystemExit, 'Usage: python get_dupes.py directory'
-------------------------------------------

(I was surprised at the amount of duplicated stuff ;-)

[23:23] C:\pywk\clp>python get_dupes.py .

    size                       md5 digest files with the given size, digest
    ---- -------------------------------- ---------------------------------
       0 d41d8cd98f00b204e9800998ecf8427e ['z3', 'zero_len.py']
     111 ea70a0f814917ef8861bebc085e5e7d0 ['MyConsts.py', 'MyConsts.py~']
     163 f8e4add20e45bb253bd46963f25a7057 ['ramb.txt', 'rambxx.txt']
    4096 d96633a4b58522ce5787ef80a18e9c7b ['yyy2', 'yyy3']
     786 05956208d5185259b47362afcf1812fd ['startmore.py', 'startmore.py~']
     851 3845f161fa93cbb9119c16fc43e7b62a ['quadratic.py', 'quadratic.py~']
    1536 72f5c05b7ea8dd6059bf59f50b22df33 ['virtest.txt', '~DF30EC.tmp']
    1028 fbedc511f9556a8a1dc2ecfa3d859621 ['PaulMoore.py', 'PaulMoore.py~']
    1515 568f9732866a9de698732616ae4f9c3b ['loopbreak.py', 'loopbreak.py~']
    1662 f54414637ed420fe61b78eeba59737b7 ['for_grodrigues.py', 'for_grodrigues.r1.py']
    1702 23fa57926e7fcf2487943acb10db7e2a ['bitfield.py', 'bitfield.py~', 'packbits.py']
    3765 e69bf6b018ba305cc3e190378f93e421 ['pythonHi.gif', 'showgif.gif']
    5874 bae87bbed53c1e6908bb5c37db9c4292 ['testyenc.py', 'testyenc.py~']
    3990 4a5096efaf136f901603a2e1be850eb3 ['pns.py', 'pns.r1.py']
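
A matching size and digest makes two files near-certain duplicates. If a byte-for-byte confirmation is wanted anyway, the standard `filecmp` module can do it. This is only a sketch in Python 3, and `same_contents` is a hypothetical helper, not part of the script above.

```python
import filecmp
import os


def same_contents(path_a, path_b):
    """Return True if both files have the same size and identical bytes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # cheap check first; neither file is opened
    # shallow=False makes filecmp read and compare the actual contents
    return filecmp.cmp(path_a, path_b, shallow=False)
```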

Regards,
Bengt Richter
Jul 18 '05 #5
On Sun, 17 Oct 2004 06:29:36 GMT, bo**@oz.net (Bengt Richter) wrote:
[...]

> This should list duplicate files in the specified directory:
> You can hack to suit. Not very tested. Just what you see ;-)
> ------------------------------------------------
> # get_dupes.py

[... version which only worked for current working directory...]
Phooey. Hopefully better:

----------------------------------------------------------------------------
# get_dupes.py
import os, md5
def get_dupes(thedir):
    finfo = {}
    for f in os.listdir(thedir):
        p = os.path.join(thedir, f)
        if os.path.isfile(p):
            finfo.setdefault(os.path.getsize(p), []).append(f)

    result = []
    for size, flist in finfo.items():
        if len(flist)>1:
            dupes = {}
            for name in flist:
                dupes.setdefault(md5.new(open(os.path.join(thedir, name), 'rb'
                    ).read()).hexdigest(),[]).append(name)
            for digest, names in dupes.items():
                if len(names)>1: result.append((size, digest, names))
    return result

if __name__ == '__main__':
    import sys
    try:
        dupes = get_dupes(sys.argv[1])
        if dupes:
            print
            print '%8s %32s %s' % ('size','md5 digest','files with the given size, digest')
            print '%8s %32s %s' % ('----','-'*32 ,'---------------------------------')
            for duped in dupes:
                print '%8s %32s %s' % duped
        else:
            print 'No duplicate files in %r' % sys.argv[1]
    except:
        raise SystemExit, 'Usage: python get_dupes.py directory'
----------------------------------------------------------------------------------------------
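
For current Python 3, where the `md5` module no longer exists, the same size-first approach might look like the sketch below. It is not Bengt's code: it uses `hashlib` instead, defaults to SHA-256 rather than MD5 (an assumption; any algorithm name that `hashlib.new` accepts can be passed), and hashes in chunks so large files are not read into memory all at once. `digest_of` and the `algorithm` and `chunk_size` parameters are names introduced for illustration.

```python
import hashlib
import os
from collections import defaultdict


def digest_of(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def get_dupes(thedir, algorithm="sha256"):
    """Return a list of (size, digest, names) for duplicated files in thedir."""
    # Pass 1: group by size, which needs no file to be opened.
    by_size = defaultdict(list)
    for name in os.listdir(thedir):
        path = os.path.join(thedir, name)
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(name)

    # Pass 2: hash only the files whose size is shared with another file.
    result = []
    for size, names in by_size.items():
        if len(names) < 2:
            continue
        by_digest = defaultdict(list)
        for name in names:
            by_digest[digest_of(os.path.join(thedir, name), algorithm)].append(name)
        for digest, same in by_digest.items():
            if len(same) > 1:
                result.append((size, digest, sorted(same)))
    return result
```

Passing `algorithm="md5"` should reproduce the kind of digests shown in the earlier sample output, on systems where MD5 is available in `hashlib`.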
Regards,
Bengt Richter
Jul 18 '05 #6
On Sun, 17 Oct 2004 03:13:46 GMT, User <1@2.3> declaimed the following
in comp.lang.python:
> Anyone have ideas which os command could be used to get the size of a
> file without actually opening it? My intention is to write a script
> that identifies duplicate files with different names. I have no
> trouble getting the names of all the files in the directory using the
> os.listdir() command, but that doesn't return the file size. In order
> to be identical, files must be the same size, so I want to use file
> size as the first criterion, then, if they are the same size, actually
> open them up and compare the contents.

Watch out for line wraps... This one didn't actually check the
contents, only logged candidates (time for me to use it again too)... As
you can see, it is ancient, and no doubt could be improved with some of
the newer Python modules...

#
# DupCheck.py -- Scans a directory and all subdirectories
#                for duplicate file names, reporting conflicts
# March 22 1998 dl bieber <wu******@netcom.com>
#

import os
import sys
import string
from stat import *

Files = {}

def Scan_Dir(cd):
    global Files, logfile

    cur_files = os.listdir(cd)
    cur_files.sort()
    for f in cur_files:
        fib = os.stat("%s\\%s" % (cd, f))
        if S_ISDIR(fib[ST_MODE]):
            Scan_Dir("%s\\%s" % (cd, f))
        elif S_ISREG(fib[ST_MODE]):
            if Files.has_key(string.lower(f)):
                (aSize, aDir) = Files[string.lower(f)]
                if fib[ST_SIZE] == aSize:
                    logfile.write(
                        "***** Possible Duplicate File: %s\n" % (f))
                    logfile.write(
                        " %s\t%s\n" %
                        (fib[ST_SIZE], cd))
                    logfile.write(
                        " %s\t%s\n\n" %
                        (Files[string.lower(f)]))
            else:
                Files[string.lower(f)] = (fib[ST_SIZE],
                                          cd)
        else:
            logfile.write(
                "***** SKIPPED Not File or Dir: %s\n\n" % (f))

if __name__ == "__main__":
    Cur_Dir = raw_input("Root Directory -> ")
    Log_To = raw_input("Log File -> ")

    if Log_To:
        logfile = open(Log_To, "w")
    else:
        logfile = sys.stdout

    Scan_Dir(Cur_Dir)

    if Log_To:
        logfile.close()
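
With `os.walk`, a similar scan can be sketched in a few lines of Python 3. This is not a line-for-line port of DupCheck.py: it groups files by lowercased name and size, returns the directories where each pair occurs more than once, and builds paths with `os.path.join` rather than a hard-coded backslash. `find_name_size_matches` is a hypothetical name.

```python
import os
from collections import defaultdict


def find_name_size_matches(root):
    """Return {(lowercased name, size): [directories]} for pairs seen more than once."""
    seen = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            seen[(name.lower(), os.path.getsize(path))].append(dirpath)
    # Keep only the candidates that turned up in more than one place.
    return {key: dirs for key, dirs in seen.items() if len(dirs) > 1}
```

Like the original, this only flags candidates by name and size; it does not compare contents.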
--
============================================================== <
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
wu******@dm.net | Bestiaria Support Staff <
============================================================== <
Home Page: <http://www.dm.net/~wulfraed/> <
Overflow Page: <http://wlfraed.home.netcom.com/> <

Jul 18 '05 #7
